I recognize the sarcasm. The data I can find says it's performing at baseline ho... | Hacker News

MattSayar 38 days ago | on: Claude Opus 4.7

I recognize the sarcasm. The data I can find says it's performing at baseline, though: https://marginlab.ai/trackers/claude-code/

ACCount37 38 days ago

Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.
