isege · 2026-05-04T07:22:01 1777879321

> Claude Code is the best autonomous coding agent.

If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.

So it's quite the opposite. Claude Code is arguably the worst harness to run models with.

DaanDL · 2026-05-04T07:37:11 1777880231

Okay, but not all results on there are valid. ForgeCode, for instance, has cheated in the past:

https://debugml.github.io/cheating-agents/#sneaking-the-answ...

cpursley · 2026-05-04T09:29:53 1777886993

Those benches are completely and totally meaningless when it comes down to real world work tasks, and everyone knows it.

andxor · 2026-05-04T16:08:55 1777910935

Then the benchmarks are wrong.

isege · 2026-04-30T08:54:32 1777539272

One I noticed with Gemini, especially 3 Flash: "this is the classic _____".

isege · 2026-04-27T20:44:52 1777322692

Isn't that what terminal-bench does?

isege · 2026-04-09T21:45:31 1775771131

Christmas has come early! Thank you for sharing this

isege · 2026-01-28T07:51:46 1769586706

This comment allows Y Combinator to steal ideas from its users' comments, using its huge mass news platform. Tremendous overlap indeed.

isege · 2025-12-26T19:36:30 1766777790

This is not just about timestamps but how the traditional chat UI is simply not a good interface for information retrieval and organization.

isege · 2025-11-23T21:10:14 1763932214

I've had this exact experience. I used GNOME for just one week before getting a MacBook, and after 3+ years of macOS I still find its multi-desktop handling absurd and unintuitive.

What makes this worse is Apple's refusal to expose any public APIs to control workspace behavior, so you can't even work around their shitty choices.

Instead of iterating on existing functionality, they launch flashy additions like Stage Manager only to abandon them immediately.

isege · 2025-10-24T11:48:31 1761306511

I'm also developing a similar branching interface, though mine is structured differently. I hope we can make a dent in the LLM space, best of luck!

jborland · 2025-10-24T12:32:18 1761309138

Nice! Excited to see what you come up with. Best of luck

isege · 2025-10-21T13:33:52 1761053632

The chat interface has regrettably become the universal mold for LLM interaction. There are no dissenters. Every provider has the exact same experience. Just off the top of my head I can think of more than a dozen different features that would make LLM interactions infinitely more intuitive and efficient.

isege · 2025-09-25T07:28:27 1758785307

a) He has an “I use the web in a very niche way that nobody cares about, and your browser sucks if it doesn’t meet those exact needs” way of thinking

b) He is an investor in helium