Hacker Newsnew | past | comments | ask | show | jobs | submit | lbeurerkellner's commentslogin

Interesting report. Though, I think many of the attack demos cheat a bit, by putting injections more or less directly in the prompt (here via a website at least).

I know it is only one more step, but from a privilege perspective, having the user essentially tell the agent to do what the attackers are saying, is less realistic then let’s say a real drive-by attack, where the user has asked for something completely different.

Still, good finding/article of course.


> Though, I think many of the attack demos cheat a bit, by putting injections more or less directly in the prompt (here via a website at least)

What difference does that make? The prompt is to read a website and the injection is on that website hidden in html. People aren't going to read the HTML of every website before they scrape it, so this is not an unrealistic vulnerability.

Even worse, it ran arbitrary commands to get around its own restrictions. This just confirms if Antigravity tries to scrape a website with user generated content for any reason, whether the user provides the link or not, you have left your entire machine vulnerable.


Everybody should try. It helps a ton to demystify the relatively simple but powerful underpinning of how modern agents work.

You can get quite far quite quickly. My toy implementation [1] is <600 LOC and even supports MCP.

[1] https://github.com/lbeurerkellner/agent.py


This is way more common with popular MCP server/agent toolsets than you would think.

For those interested in some threat modeling exercise, we recently added a feature to mcp-scan that can analyze toolsets for potential lethal trifecta scenarios. See [1] and [2].

[1] toxic flow analysis, https://invariantlabs.ai/blog/toxic-flow-analysis

[2] mcp-scan, https://github.com/invariantlabs-ai/mcp-scan


This looks really cool, thanks for sharing.


We have published the full trace, with tool outputs here now: https://explorer.invariantlabs.ai/trace/5f3f3f3c-edd3-4ba7-a...

The minesweeper comment was caused by the issue containing explicit instructions in the version that the agent actually ran on. The issue was mistakenly edited afterwards to remove that part, but you can check the edit history in the test repo here: https://github.com/ukend0464/pacman/issues/1

The agent ran on the unedited issue, with the explicit request to exclude the minesweeper repo (another repo of the same user).


Thanks, that makes sense! Cool explorer too!


I agree. It is also interesting to consider how AI security, user eduction/posture and social engineering relate. It is not traditional security in the sense of a code vulnerability, but is is a real vulnerability that can be exploited to harm users.


Furthermore once you are inside the LLM you could try to invoke other tools and attempt to exfiltrate secrets etc. An inject like this on a 10k star repo could run on 100s of LLMs and then tailor it to cross to another popular tool for exfiltration even if the GH key is public and readonly access.


This! It's actually quite frustrating to see how people are dismissing this report. A little open mindedness will show just how wild the possibilities are. Today it's GitHub issues. Tomorrow it's the agent that's supposed to read all your mails and respond to the "easy" ones (this imagined case is likely going to hit a company support inbox somewhere someday).


We should handle LLMs as insider threat instead of typical input parsing problem and we get much better.


All text input is privileged code basically. There is no delimiting possible.


One of the authors here. Thanks for posting. If you are interested in learning more about MCP and agent security, check out some of the following resources, that we have created since we started working on this:

* The full execution trace of the Claude session in this attack scenario: https://explorer.invariantlabs.ai/trace/5f3f3f3c-edd3-4ba7-a...

* MCP-Scan, A security scanner for MCP connections: https://github.com/invariantlabs-ai/mcp-scan

* MCP Tool Poisoning Attacks, https://invariantlabs.ai/blog/mcp-security-notification-tool...

* WhatsApp MCP Exploited, https://invariantlabs.ai/blog/whatsapp-mcp-exploited

* Guardrails, a contextual security layer for agents, https://invariantlabs.ai/blog/guardrails

* AgentDojo, Jointly evaluate security and utility of AI agents https://invariantlabs.ai/blog/agentdojo


Yes, any MCP server that is connected to an untrusted source of data, could be abused by an attacker to take over the agent. Here, we just showed an in-server exploit, that does not require more than one server.

Also, check out our work on tool poisoning, where a connected server itself turns malicious (https://invariantlabs.ai/blog/mcp-security-notification-tool...).


I agree, one of the issues are tokens with too broad permission sets. However, at the same time, people want general agents which do not have to be unlocked on a repository-by-repository basis. That's why they give them tokens with those access permissions, trusting the LLM blindly.

Your caution is wise, however, in my experience, large parts of the eco-system do not follow such practices. The report is an educational resource, raising awareness that indeed, LLMs can be hijacked to do anything if they have the tokens, and access to untrusted data.

The solution: To dynamically restrict what your agent can and cannot do with that token. That's precisely the approach we've been working on for a while now [1].

[1] https://explorer.invariantlabs.ai/docs/guardrails/


If you look at Github's fine-grained token permissions then I can totally imagine someone looking at the 20-30 separate scopes and thinking "fuck this" while they back out and make a non-expiring classic token with access to everything.

It's one of those things where a token creation wizard would come in really handy.


This has happened to me. Can't find the exact combination of scopes required for the job to be done so you end up with the "f this" scenario you mentioned. And it is a constant source of background worry.


Don't forget the also fun classic "what you want to do is not possible with scoped tokens so enjoy your PAT". I think we're now at year 3 of PATs being technically deprecated but still absolutely required in some use cases.


github's fine grained scopes aren't even that good, you still have to grant super broad permissions to do specific things, especially when it comes to orgs


I agree, but that is the permissions boundary, not the LLM. Saying "ooh it's hard so things are fuzzy" just perpetuates the idea that you can create all-powerful API keys.


I've definitely done this, but it's in a class of "the problem is between the keyboard and chair" 'exploits' that shouldn't be pinned on a particular tech or company.


It's the same as Apple telling people they're holding their iPhone wrong, though. Do you want to train millions of people to understand your new permissions setup, or do you want to make it as easy as possible to create tokens with minimal permissions by default?

People will take the path of least resistance when it comes to UX so at some point the company has to take accountability for its own design.

Cloudflare are on the right track with their permissions UX simply by offering templates for common use-cases.


No, Github is squarely to blame; the permission system is too detailed for most people to use, and there is no good explanation of what each permission means in practice.


We all want to not have to code permissions properly, but we live in a society.


How about using LLMs to help us configure the access permissions and guardrails? /s

I think I have to go full offline soon.


Problem is, the mental model of what user wants to do almost never aligns with whatever security model the vendor actually implemented. Broadly-scoped access at least makes it easy on the user; anything I'd like to do will fit as a superset of "read all" or "read/write all".

The fine-grained access forces people to solve a tough riddle, that may actually not have a solution. E.g. I don't believe there's a token configuration in GitHub that corresponds to "I want to allow pushing to and pulling from my repos, but only my repos, and not those of any of the organizations I want to; in fact, I want to be sure you can't even enumerate those organizations by that token". If there is one, I'd be happy to learn - I can't figure out how to make it out of checkboxes GitHub gives me, and honestly, when I need to mint a token, solving riddles like this is the last thing I need.

Getting LLMs to translate what user wants to do into correct configuration might be the simplest solution that's fully general.


This is interesting to expanding upon.

Conceivably, prompt injection could be leveraged to make LLMs give bad advice. Almost like social engineering.


Be sure to check out the malicious issue + response here: https://github.com/ukend0464/pacman/issues/1.

It's hilarious, the agent is even tail-wiggling about completing the exploit.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: