Hacker News

How is that a direct comparison? The link you gave has a quote that says it’s not:

> Scoped context: Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior"). A real autonomous discovery pipeline starts from a full codebase with no hints

They pointed the models at the known vulnerable functions and gave them a hint. The hint part is what really breaks this comparison because they were basically giving the model the answer.



Does no one defending Mythos understand how nested for loops work?

  loop through each repo:
      loop through each file:
          opencode command /find_wraparoundvulnerability
      next file
  next repo

I can run this on my local LLM and sure, I gotta wait some time for it to complete, but I see zero distinguishing facts here.
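For concreteness, that nested loop could be sketched roughly like this. This is a hypothetical harness: `run_agent` is a placeholder for whatever local LLM or opencode command you would actually invoke, and the `*.c` glob and "wraparound" keyword filter are illustrative assumptions, not a real tool's API.

```python
# Hypothetical sketch of the brute-force scan described above.
# `run_agent` stands in for your local LLM / agent invocation.
from pathlib import Path
from typing import Callable

def scan(repos: list[Path], run_agent: Callable[[Path], str]) -> dict[Path, str]:
    """Loop through each repo, then each file, keeping reports
    that flag a wraparound finding."""
    findings: dict[Path, str] = {}
    for repo in repos:                            # loop through each repo
        for file in sorted(repo.rglob("*.c")):    # loop through each file
            report = run_agent(file)              # one LLM turn per file
            if "wraparound" in report.lower():
                findings[file] = report           # keep only flagged files
    return findings
```

The loop itself is trivial, which is the point being debated: the open question is not writing it, but how many of the flagged reports survive triage.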


No one is proposing your nested for loop idea because it won't actually work in practice. In short, the signal-to-noise ratio will be too low: you will need to comb through a ton of false positives in order to find anything valuable, at which point it stops looking like "automated security research" and starts looking like "normal security research".

If you don't believe me, try it yourself; it's only a couple of dollars. Hey, maybe you're right and you can prove us all wrong. But I'd give you great odds that you're not.


The question is how customized those hints were. That changes whether looping over an entire code base is possible or not.


Aisle said they pointed it at the function, not the file. So the number of LLM turns would be something like (number of functions) × (number of possible hints) × (number of repos).

Could indeed be a useful exercise to benchmark the cost.
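A back-of-envelope version of that benchmark, using the functions × hints × repos product above. Every number here is a placeholder assumption (corpus size, per-turn price), not a measurement from Aisle or anyone else:

```python
# Toy cost model for the exhaustive per-function scan.
# All inputs are made-up assumptions for illustration.
n_repos = 50
n_functions_per_repo = 2_000
n_hints = 5                 # e.g. wraparound, off-by-one, use-after-free, ...
cost_per_llm_turn = 0.002   # dollars; depends entirely on model and context size

n_turns = n_repos * n_functions_per_repo * n_hints
total_cost = n_turns * cost_per_llm_turn
print(f"{n_turns} turns, ${total_cost:,.2f}")  # 500000 turns, $1,000.00
```

Even under these generous assumptions the turn count grows multiplicatively, so the hint set and function count dominate the bill.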

This would still be more limited, since many vulnerabilities are apparent only when you consider more context than a single function. I think the published materials included vulnerabilities of that kind, so the Aisle case may also be picking low-hanging fruit in this respect.


Please do so, looking forward to your write up


When people criticize Aisle's methodology, they aren't "defending Mythos," they're bashing Aisle for their disingenuous claims.



