Hacker News | simon_luv_pho's comments

Really appreciate the in-depth feedback.

Iframes and CSP are big problems. For the in-page version, I chose to leave out Shadow DOM, canvas, and iframes, although I know one developer forked a version to control same-origin iframes. I don't think it's practical to try to hack around browser security (and website security), which is why I built the browser extension. I'm hoping the bridge that lets a page call the extension can cover most use cases.

My original HTML dehydration script was ported from `browser-use`. You're absolutely right that it's getting heavier over time, and it's the key factor influencing the overall task success rate. I'm looking to refactor that part and add an extension system for developers to patch their own sites. Hope it turns out well.

Thank you for the feedback. I'll be extra cautious to keep the dehydration code maintainable.


Details of the testing LLM are listed here. https://github.com/alibaba/page-agent/blob/main/docs/terms-a...

The library does NOT include backend services. This is an open source project. I’m not selling any service here…


Could you elaborate on what kind of security problems you’re referring to? Like hallucination?

The PageAgent has access to the security tokens of the currently logged-in user. It can do anything the user can on the site, including impersonating them. What is to prevent the PageAgent from being exploited into sending these security tokens elsewhere? It would be trivial for some other package to look for your PageAgent and override key functions, and then it is all over.

PageAgent operates at the HTML/DOM level with the same privileges as any other JavaScript running on the page and nothing more. The security token concern you're describing applies equally to every third-party script, npm package, or browser extension that runs in-page. It's not unique to PageAgent.

The browser extension can be more risky because it's more privileged. I've designed a simple authorization mechanism so that only pages explicitly approved by the user can call the extension.
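To make the idea concrete, here is a minimal sketch of a per-origin approval check. It is not the extension's actual code: `OriginAllowlist` and its methods are hypothetical names, and it assumes the extension can see the URL of the page that is calling it.

```typescript
// Hypothetical sketch; none of these names come from the PageAgent codebase.
class OriginAllowlist {
  private approved = new Set<string>();

  // Assumed to be called only after the user explicitly approves a page.
  approve(pageUrl: string): void {
    this.approved.add(new URL(pageUrl).origin);
  }

  revoke(pageUrl: string): void {
    this.approved.delete(new URL(pageUrl).origin);
  }

  // Compares full origins (scheme + host + port), never substrings.
  isAllowed(senderUrl: string): boolean {
    try {
      return this.approved.has(new URL(senderUrl).origin);
    } catch {
      return false; // malformed URL: deny by default
    }
  }
}
```

Matching on the parsed origin rather than on a string prefix means a lookalike host such as `app.example.com.evil.test` does not pass once `app.example.com` is approved.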

That said, I'd welcome more eyes on this. If anyone wants to review the security model, the code is fully open source.


This library does not include an LLM service. The one on the homepage is only for demonstration and testing. The npm package and the extension require your own LLM API config. Docs here: https://alibaba.github.io/page-agent/docs/features/models

This is the problem every agent has to face.

PageAgent’s differentiator is that site developers can embed it directly into their own pages. In that scenario, with proper system instructions plus a built-in whitelist/blacklist API for interactive elements, the risk is pretty manageable.
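As a rough illustration of how an allow/block list for interactive elements could behave, here is a small sketch. It does not show PageAgent's real API: `ElementInfo`, `ElementPolicy`, and `isInteractable` are hypothetical, and the rule that block entries win over allow entries is an assumption.

```typescript
// Hypothetical sketch; this is not the PageAgent whitelist/blacklist API.
interface ElementInfo {
  tag: string;
  id?: string;
  classes: string[];
}

type Rule = (el: ElementInfo) => boolean;

interface ElementPolicy {
  allow: Rule[]; // empty means "everything not blocked is allowed"
  block: Rule[]; // assumed to take precedence over allow
}

function isInteractable(el: ElementInfo, policy: ElementPolicy): boolean {
  if (policy.block.some((rule) => rule(el))) return false;
  return policy.allow.length === 0 || policy.allow.some((rule) => rule(el));
}

// Example policy: the agent may use buttons and links, but never anything
// the site developer has marked with a "danger" class.
const examplePolicy: ElementPolicy = {
  allow: [(el) => el.tag === "button" || el.tag === "a"],
  block: [(el) => el.classes.includes("danger")],
};
```

With a policy like this, a site developer decides up front which controls the agent can ever touch, regardless of what the model asks for.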

For the general-agent case, operating on pages you don’t control, the risk is definitely higher. I’m currently working on the human-in-the-loop feature so the user can intervene before sensitive actions.
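One possible shape for that kind of gate is sketched below. The feature is still in progress, so this is only an illustration under assumptions: `AgentAction`, `isSensitive`, `runWithApproval`, and the keyword list are all hypothetical.

```typescript
// Hypothetical sketch; these names are illustrative and not part of PageAgent.
interface AgentAction {
  kind: "click" | "type";
  label: string; // visible text of the target element
}

// Assumed keyword list; a real deployment would tune this per site.
const SENSITIVE = /\b(delete|remove|pay|purchase|transfer|confirm)\b/i;

function isSensitive(action: AgentAction): boolean {
  return SENSITIVE.test(action.label);
}

// Runs the action, but asks the user first when it looks sensitive.
async function runWithApproval(
  action: AgentAction,
  execute: (a: AgentAction) => Promise<void>,
  askUser: (a: AgentAction) => Promise<boolean>,
): Promise<"executed" | "rejected"> {
  if (isSensitive(action) && !(await askUser(action))) {
    return "rejected";
  }
  await execute(action);
  return "executed";
}
```

Keyword matching is a crude heuristic; the point is only that the check sits between the model's decision and the DOM action, so the user can veto it.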

Would love to hear other approaches if anyone has ideas.


Glad it worked well! The Chrome extension is my focus right now. It handles simple tasks pretty reliably and fast, but still has a long way to go for more complex workflows. Lots to improve.

Currently the only dependency is zod for schema parsing.

I'm intentionally building on a lightweight, in-page JavaScript foundation to carve out some differentiation from the Python-heavy agent ecosystem.

The "protocol" layer of AG-UI does look interesting. I'll look into it to see if I can reuse something, although it seems to be evolving more toward an integration framework rather than an open protocol.

Really glad this resonates with your use case. Lightweight embedding is exactly my priority scenario. Would love to hear how the work goes!


No and please don’t do that.

If you only use it as a personal assistant, you can connect to your LLM service directly.

If you plan to integrate it into your web app, it's better to put a proxy API in front of the LLM and authenticate each request with a cookie or something similar.
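A minimal sketch of the proxy idea, assuming a session cookie and a server-held API key. Everything here is hypothetical (`ProxyConfig`, `prepareUpstream`, the cookie name, the upstream URL); it only shows the server-side decision, not an actual HTTP server.

```typescript
// Hypothetical sketch; names, cookie format, and URL are assumptions.
interface ProxyConfig {
  upstreamUrl: string; // the real LLM endpoint, known only to the server
  apiKey: string; // never sent to the browser
  cookieName: string;
}

type ProxyResult =
  | { ok: false; status: 401 }
  | { ok: true; url: string; headers: Record<string, string>; body: string };

function parseCookies(header: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const part of header.split(";")) {
    const idx = part.indexOf("=");
    if (idx === -1) continue;
    out.set(part.slice(0, idx).trim(), part.slice(idx + 1).trim());
  }
  return out;
}

// Decides whether to forward a browser request, and attaches the key if so.
function prepareUpstream(
  cookieHeader: string | undefined,
  body: string,
  validSessions: Set<string>,
  cfg: ProxyConfig,
): ProxyResult {
  const sessionId = parseCookies(cookieHeader ?? "").get(cfg.cookieName);
  if (!sessionId || !validSessions.has(sessionId)) {
    return { ok: false, status: 401 };
  }
  return {
    ok: true,
    url: cfg.upstreamUrl,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${cfg.apiKey}`,
    },
    body,
  };
}
```

The browser only ever talks to your own endpoint with its normal session cookie, so the LLM key stays on the server.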


It's in my plan. It should be easy since I use WXT as the extension framework.

WebMCP doesn’t seem to be available for use inside webpages or extensions.
