
sigbottle · 2026-05-29T17:15:02 1780074902

Well no, the idea is a tradeoff between interfaces and telemetry.

OK, the agents don't click the same way humans do. Once you learn that, what about mouse-hover telemetry, time spent, etc.? And one of the most extreme options is to force biometrics - a lot of telemetry, breaks the interface a lot - but hey, you have assurance.

And none of these tradeoffs require understanding the deep processes of the human mind. Just: the map is not the territory, so how do you game the map harder and harder, and how do the mapmakers respond to that?

catsrus · 2026-05-29T17:18:24 1780075104

did you look at the paper? they specifically look at mini tasks involving cognitive processes (e.g. what dictates the strategy people use to solve tasks)

CamperBob2 · 2026-05-29T17:43:12 1780076592

LLMs can solve original math problems at the IMO level and beyond, and you might be talking to one now. I don't think they are going to have problems with any CAPTCHA short of separate device attestation.

Whatever mechanism the paper proposes, rest assured it can be trained on.

sigbottle · 2026-05-26T16:11:00 1779811860

Sad times ahead.

sigbottle · 2026-05-24T17:04:40 1779642280

Wait, isn't GPT 5.2 good? Or is it not thinking / not Codex? 5.2 was what sparked the late-2025 OpenAI agentic programming revolution.

mkozlows · 2026-05-25T03:50:49 1779681049

5.2 still had a Codex variant, which this doesn't describe using. It also notably is not using the Codex harness -- it does everything with open source harnesses (which obviously are worse). And while it uses two harnesses with its cheap models, it only uses the worse-performing one of those with GPT 5.2 for cost reasons. (They also don't specify effort/thinking level used for GPT 5.2, but given that it performs worse in their baseline testing than obviously non-SOTA models, I'm guessing it wasn't set to anything high.)

sigbottle · 2026-05-24T14:04:46 1779631486

What would unsupervised mean? Would unsupervised be something like AlphaGo playing against itself trillions of times?

Whereas self-supervised allows learning without explicit annotation of data; but that doesn't matter if the models are already trained on the entire Internet, and it's not like a game where a model can come up with effectively new training data for itself?

jmalicki · 2026-05-24T18:05:42 1779645942

Unsupervised is basically clustering. AlphaGo is RL - winning or losing a game is a form of supervision.

Unsupervised is something where there is no intrinsic reward signal. In pretraining, predicting the next token and seeing that it matches is a reward signal, hence it is self-supervised.
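To make the self-supervised part concrete, here is a minimal sketch in Python, assuming a toy bigram counter in place of a real language model. The names `next_token_pairs`, `train_bigram`, `predict`, and `accuracy` are hypothetical, made up for illustration. The point is only that the "label" for each position is the next token already sitting in the raw text, so nobody has to annotate anything.

```python
from collections import Counter, defaultdict

def next_token_pairs(tokens):
    """Turn raw text into (context, target) pairs; the target is simply the next token."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]

def train_bigram(tokens):
    """Count which token follows which (a toy stand-in for a language model)."""
    counts = defaultdict(Counter)
    for prev, nxt in next_token_pairs(tokens):
        counts[prev][nxt] += 1
    return counts

def predict(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never a context."""
    followers = counts.get(prev)
    return followers.most_common(1)[0][0] if followers else None

def accuracy(counts, tokens):
    """The supervision signal: does the prediction match the token that actually came next?"""
    pairs = next_token_pairs(tokens)
    hits = sum(predict(counts, prev) == target for prev, target in pairs)
    return hits / len(pairs)

corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram(corpus)
print(predict(model, "the"))
print(accuracy(model, corpus))
```

Clustering the same tokens would have no target to compare against at all, and an RL setup would get its signal from an outcome such as winning or losing rather than from the data itself.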

sigbottle · 2026-05-23T21:37:19 1779572239

And they said it wouldn't happen.

Everything is an accident, an anecdote, only trust the state with your authoritative quantitative data! There's surely no philosophical issues with that! There's no issues with definitional authority!

sigbottle · 2026-05-20T21:05:57 1779311157

And this is one of the many issues with invoking the logical positivists here...

I'm not even sure why they were invoked. Even disregarding the big technical debunkings such as "Two Dogmas", sociologically and even from talking to real mathematicians (see Lakatos, historically, but this is true anecdotally too), it's (ironically) a complete non-question to wonder about mathematics in a logical positivist way.

sigbottle · 2026-05-19T18:00:47 1779213647

Sorry, I thought this would've been a method for finding invariants, not one just for expressing them? I guess I should think about TLA+ as ultimately some kind of solver - give it a configuration, it tells me if it's defined well or not, the point is to make sure I'm not making mistakes, but not necessarily automated innovation?
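As a rough illustration of the "check, don't invent" framing, here is a minimal Python sketch of explicit-state invariant checking. It is not how TLA+ tooling is implemented; `check_invariant` and the toy counter system are hypothetical names for illustration. You supply the initial states, the next-state relation, and the invariant, and the checker either exhausts the reachable states or hands back a trace to a violation.

```python
from collections import deque

def check_invariant(init_states, next_states, invariant):
    """Breadth-first search over reachable states.

    Returns None if the invariant holds everywhere reachable,
    otherwise a shortest trace from an initial state to a violating state.
    """
    parent = {s: None for s in init_states}
    queue = deque(init_states)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            # Walk parent links back to an initial state to build the counterexample.
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Toy system (an assumption for illustration): a counter that wraps around modulo 4.
def step(x):
    return [(x + 1) % 4]

print(check_invariant([0], step, lambda x: x < 4))   # invariant holds
print(check_invariant([0], step, lambda x: x != 2))  # violated, returns a trace
```

Note that the invariant itself is still something the user writes down; the search only confirms or refutes it.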

sigbottle · 2026-05-18T16:35:37 1779122137

I've been working through the American pragmatist tradition - James, Peirce, Sellars, Rorty, Brandom.

Something about that background, all the discussions about definitions and representation, the original article talking about dualisms... It's certainly an experience.

sigbottle · 2026-05-15T19:21:12 1778872872

Sorry, wasn't there a post literally like a week ago about this being a long-term experimental branch and how we needed to not kick the hatchling while it's an egg?

1 week turnaround I guess is what they meant.

nicce · 2026-05-15T20:25:39 1778876739

Probably planned during the Anthropic acquisition. No way this would happen without their blessing. This is one way to reduce the community noise.

sigbottle · 2026-05-06T13:27:22 1778074042

I've noticed a certain type of comment on Hacker News in the past year: a deep suspicion that first calls out a surface behavior, then psychoanalyzes strangers with whatever the flavor-of-the-month "deep observation" is.

You can't be a dick on this platform without fancy prose I guess.