TL;DR P0 collaborated with DeepMind to make "Big Sleep," which is an AI agent (u...

nickpsecurity · on Nov 2, 2024

Using a fuzzer was a terrible point of comparison. They’re the slowest, heaviest users of resources. They’d be better off comparing to static analyzers which find bugs fast. In this case, Infer might do since it’s designed to catch those errors.

My concept was running a bunch of open-source, static analyzers with the LLM’s essentially blocking false positives. They can do it analytically or by generating the test cases to prove the bug. It might also be easier to fine-tune open models for this since the job is narrower.