> Claude Code is the best autonomous coding agent.
If you look at the terminal-bench@2.0 leaderboard, you'll quickly see it's actually one of the weakest agentic harnesses. Anthropic's own models score lower with Claude Code than with virtually any other harness.
So it's quite the opposite. Claude Code is arguably the worst harness to run models with.
I've had this exact experience. I used GNOME for just one week before getting a MacBook, and after 3+ years of macOS I still find its multi-desktop handling absurd and unintuitive.
What makes this worse is Apple's refusal to expose any public APIs to control workspace behavior, so you can't even work around their shitty choices.
Instead of iterating on existing functionality, they launch flashy additions like Stage Manager only to abandon them immediately.
The chat interface has regrettably become the universal mold for LLM interaction. There are no dissenters. Every provider has the exact same experience. Just off the top of my head I can think of more than a dozen different features that would make LLM interactions infinitely more intuitive and efficient.