
The article feels, I don't know… maybe like someone calmly sitting in a rocking chair staring at the sea. Then the camera turns, and there's an erupting volcano in the background.

> If it was a life or death decision, would you trust the model? Judgement, yes, but decision? No, they are not capable of making a decision, at least important ones.

A self-driving car with a vision-language-action model inside buzzes by.

> It still fails when it comes to spatial relations within text, because everything is understood in terms of relations and correspondences between tokens as values themselves, and apparent spatial position is not a stored value.

A large multimodal model listens to your request and produces a picture.

> They'll always need someone to take a look under the hood, figure out how their machine ticks. A strong, fearless individual, the spanner in the works, the eddy in the stream!

GPT‑5.3‑Codex helps debug its own training.



> A self-driving car with a vision-language-action model inside buzzes by.

Vision-action maybe. Jamming language in the middle there is an indicator you should run for public office.


Language allows higher-level, longer-term planning and better generalization than purely vision-action models. Waymo uses a VLM in their Driver[1]. Tesla decided to add some kind of language model to their VA stack[2]. Nvidia's Alpamayo uses a VLA[3]. (A toy sketch of the idea follows the references.)

[1] https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto...

[2] https://x.com/aelluswamy/status/1981760576591393203

[3] https://www.nvidia.com/en-gb/solutions/autonomous-vehicles/a...
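
Here is a minimal, hypothetical sketch of what "language in the middle" means architecturally, assuming a generic VLA layout: vision tokens and instruction tokens share one transformer trunk whose output decodes to low-level controls. Every module name and dimension below is made up for illustration; this is not Waymo's, Tesla's, or Nvidia's code.

    # Hypothetical VLA sketch: vision encoder -> joint transformer -> action head.
    import torch
    import torch.nn as nn

    class ToyVLA(nn.Module):
        def __init__(self, d_model=256, n_actions=3):
            super().__init__()
            # Vision encoder: cut the camera frame into 16x16 patches,
            # embedding each patch as one token.
            self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
            # "Language" side: embed instruction token ids into the same space.
            self.text_embed = nn.Embedding(10_000, d_model)
            # Shared trunk: reasons jointly over vision and text tokens.
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.trunk = nn.TransformerEncoder(layer, num_layers=4)
            # Action head: pool the fused sequence and regress continuous
            # controls, e.g. (steer, throttle, brake).
            self.action_head = nn.Linear(d_model, n_actions)

        def forward(self, image, instruction_ids):
            vis = self.patch_embed(image)               # (B, D, 14, 14)
            vis = vis.flatten(2).transpose(1, 2)        # (B, 196, D)
            txt = self.text_embed(instruction_ids)      # (B, T, D)
            tokens = torch.cat([vis, txt], dim=1)       # one joint sequence
            fused = self.trunk(tokens)
            return self.action_head(fused.mean(dim=1))  # (B, n_actions)

    model = ToyVLA()
    frame = torch.randn(1, 3, 224, 224)                 # one camera frame
    instr = torch.randint(0, 10_000, (1, 8))            # tokenized instruction
    print(model(frame, instr))                          # three control values

The point of the language pathway in this layout is that the trunk can condition behavior on abstractions ("yield to the cyclist", "the lane is closed ahead") that never appear as pixels in any single frame.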


> Language allows

If a tree falls...

Waymo, X, Nvidia. You fucking nailed 'unbiased'.


See [0] if you want something more academic. What's your gripe with language, anyway?

[0] https://arxiv.org/abs/2506.24044


Language is a construct.

> allow[s] higher-level and longer-term planning and better generalization than purely vision-action model

People do that part. Languages don't plan, they don't think. They don't _do_ anything at all.


Sure. Language just seems to structure a general learning system in a way that allows it to do all those things.


> Sure. Language just seems to structure a general learning system in a way that allows it to do all those things.

That's like saying Python structures a general system.

Python does nothing at all. People use Python to create constructs. We've covered this. :)


Python is not a natural language. A natural language can communicate about things that are not here and now, which covers essentially everything that occurs in the natural world. Moreover, language can convey abstractions that exist only because of language itself.


> GPT‑5.3‑Codex helps debug its own training

Doesn't this support the author's point? It still required humans.


Is that the hang-up? Are people really so unimaginative that they can't see that none of this existed five years ago, and that now this machine is -- if still only in part -- assembling itself?

And the details involved in closing some of the rest of that loop do not seem THAT complicated.


You don't know how involved it was. I would imagine it helped debug some of the tools they used to create it. Getting it to produce a more capable model end-to-end, without any human help, absolutely is that complicated.



