Iframes and CSP are big problems. For the in-page version, I chose to leave out Shadow DOM, canvas, and iframes, although I know one developer forked a version to control same-origin iframes. I don't think it's practical to try to hack around browser security (and website security), which is why I built the browser extension. I'm hoping the bridge that lets a page call the extension can cover most use cases.
My original HTML dehydration script was ported from `browser-use`. You're absolutely right that it's getting heavier over time, and it's the key factor influencing the overall task success rate. I'm looking to refactor that part and add an extension system for developers to patch their own sites. Hope it turns out well.
Thank you for the feedback. I'll be extra cautious to keep the dehydration code maintainable.
The PageAgent has access to the security tokens of the currently logged-in user. It can do anything the user can on the site, including impersonating them. What is to prevent the PageAgent from being exploited to send these security tokens elsewhere? It would be trivial for some other package to look for your PageAgent and override key functions, and then it is all over.
PageAgent operates at the HTML/DOM level with the same privileges as any other JavaScript running on the page and nothing more. The security token concern you're describing applies equally to every third-party script, npm package, or browser extension that runs in-page. It's not unique to PageAgent.
The browser extension can be riskier because it's more privileged. I've designed a simple authorization mechanism so that only pages explicitly approved by the user can call the extension.
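To make the idea concrete, here is a minimal sketch of an origin-based approval check. It is not the extension's actual code: `OriginAllowlist`, `handlePageRequest`, and the result shape are hypothetical names, and it assumes approval is tracked per origin.

```typescript
// Hypothetical sketch: none of these names come from PageAgent itself.
// Assumption: the user approves whole origins, not individual URLs.
class OriginAllowlist {
  private approved = new Set<string>();

  // Call this only after the user has explicitly approved the page.
  approve(pageUrl: string): void {
    this.approved.add(new URL(pageUrl).origin);
  }

  revoke(pageUrl: string): void {
    this.approved.delete(new URL(pageUrl).origin);
  }

  isApproved(senderUrl: string): boolean {
    try {
      return this.approved.has(new URL(senderUrl).origin);
    } catch {
      return false; // malformed sender URL: deny by default
    }
  }
}

interface RequestResult {
  ok: boolean;
  reason?: string;
}

// Gate every page-to-extension request on the allowlist before doing any work.
function handlePageRequest(
  allowlist: OriginAllowlist,
  senderUrl: string,
  task: string
): RequestResult {
  if (!allowlist.isApproved(senderUrl)) {
    return { ok: false, reason: "origin not approved by user" };
  }
  if (task.trim().length === 0) {
    return { ok: false, reason: "empty task" };
  }
  return { ok: true };
}
```

In a real extension the sender URL should come from the browser's message metadata rather than from anything the page supplies in the message body.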
That said, I'd welcome more eyes on this. If anyone wants to review the security model, the code is fully open source.
This library does not include an LLM service. The one on the homepage is only for demonstration and testing. The npm package and the extension require your own LLM API config. Docs here: https://alibaba.github.io/page-agent/docs/features/models
PageAgent’s differentiator is that site developers can embed it directly into their own pages. In that scenario, with proper system instructions plus a built-in whitelist/blacklist API for interactive elements, the risk is pretty manageable.
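As a rough illustration of how an allow/block policy can narrow what the agent sees, here is a sketch. The types and the `filterInteractive` function are hypothetical and do not reflect PageAgent's real whitelist/blacklist API; it assumes elements are identified by a string id and that the blocklist takes precedence.

```typescript
// Hypothetical sketch: not PageAgent's actual API.
interface ElementInfo {
  id: string;    // assumed stable identifier for the element
  tag: string;   // e.g. "button", "input"
  label: string; // visible text or accessible name
}

interface ElementPolicy {
  allow?: string[]; // if present, only these ids are exposed to the agent
  block?: string[]; // these ids are never exposed, even if also allowed
}

function filterInteractive(
  elements: ElementInfo[],
  policy: ElementPolicy
): ElementInfo[] {
  const allow = policy.allow ? new Set(policy.allow) : null;
  const block = new Set(policy.block ?? []);
  return elements.filter(
    (el) => !block.has(el.id) && (allow === null || allow.has(el.id))
  );
}

const pageElements: ElementInfo[] = [
  { id: "search", tag: "input", label: "Search" },
  { id: "export", tag: "button", label: "Export CSV" },
  { id: "delete-account", tag: "button", label: "Delete account" },
];

// Only the block rule is set, so everything except "delete-account" remains.
const visibleToAgent = filterInteractive(pageElements, {
  block: ["delete-account"],
});
```

Filtering before the page is serialized for the model means a blocked element never reaches the prompt, so the model cannot pick it.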
For the general-agent case, operating on pages you don’t control, the risk is definitely higher. I’m currently working on the human-in-the-loop feature so the user can intervene before sensitive actions.
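One possible shape for that intervention point, as a sketch only: the feature is still in progress, so `runWithApproval`, the `Action` type, and the keyword heuristic below are all hypothetical. It assumes a synchronous confirmation callback (in a browser, something like `window.confirm` could fill that role).

```typescript
// Hypothetical sketch: not the actual human-in-the-loop implementation.
interface Action {
  kind: "click" | "type" | "navigate";
  label: string; // human-readable description of the target
}

type Outcome = "executed" | "rejected";

// Assumed heuristic: treat actions whose label contains these words as sensitive.
const SENSITIVE = /\b(pay|delete|transfer|purchase)\b/i;

function isSensitive(action: Action): boolean {
  return SENSITIVE.test(action.label);
}

function runWithApproval(
  action: Action,
  confirm: (action: Action) => boolean, // asks the user; true means go ahead
  execute: (action: Action) => void
): Outcome {
  // Non-sensitive actions run directly; sensitive ones need the user's consent.
  if (isSensitive(action) && !confirm(action)) {
    return "rejected";
  }
  execute(action);
  return "executed";
}
```

A keyword match is a deliberately crude trigger here; a real policy would probably also let the site developer mark sensitive elements explicitly.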
Would love to hear other approaches if anyone has ideas.
Glad it worked well! The Chrome extension is my focus right now. It handles simple tasks pretty reliably and fast, but still has a long way to go for more complex workflows. Lots to improve.
Currently the only dependency is zod for schema parsing.
I'm intentionally building on a lightweight, in-page JavaScript foundation to carve out some differentiation from the Python-heavy agent ecosystem.
The "protocol" layer of AG-UI does look interesting. I'll look into it to see if I can reuse something, although it seems to be evolving more toward an integration framework than an open protocol.
Really glad this resonates with your use case. Lightweight embedding is exactly my priority scenario. Would love to hear how the work goes!