Hacker News | jumploops's comments

This is neat! I love that your Step 15 shows an accurate version of the 3d helix, rather than the highly-viral "vortex" animation from a few years back[0]

It'd be awesome to scale this up to the Milky Way, and beyond, watching everything move in relation to larger time scales.

[0]https://astrorhysy.blogspot.com/2015/03/and-yet-it-moves-qui...


It's interesting that we're seeing these gains when it seems Mythos/Fable is "just" a scaled up version of their existing architecture[0].

When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.

This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.

[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]

[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

[2] https://youtu.be/GrdEid8H6H4?t=168

[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.


There’s nothing much new about the architecture. The real gains come from the usage traces.

It turns out that having a text based interface for a text-trained model creates a very nice feedback loop.

Right now as we speak, people are generating text traces on anthropic and OpenAI servers that teach their models to do everything under the sun, text wise.

So people right now getting super mad at how dumb the model is when reverse-engineering a super complex function from binary, when they write “stop, you dumb robot, you are going wrong, go this way thank you very much” are actually leaving a lesson in the form of the "chat" text history.

Some may say that each bad word gets us closer to ASI.

That, and obviously the order-of-magnitude more efficient GPUs we got, which allow for different tradeoffs at training time.


Makes me wonder, as people grow to trust the AI more and more, not reading the code and barely skimming the implementation plans and simply rerolling if something doesn't work, will the value of these chats erode? Thinking back 1-1.5 years I was closely monitoring what these agents did and steering them quite aggressively. These days not so much. Where will RL signals come from when it approaches human capabilities ever closer? How well does self play work for coding work? What about multistep tasks where it isn't just about being good at a single task, but evolving a codebase over time in the face of changing requirements?

Over a large sample size, simply getting feedback of "Did this work for me, y/n" is valuable even if the specific details are missing and even if the overall tasks are complicated and multifaceted.

Not sure, but in my experience, instead of asking for code, I'm asking for solutions and providing a kubectl configured to reach my cluster and an az monitor command to read the logs and telemetry.

A typical session is the agent establishing a metrics and log baseline, creating the code, compiling, deploying, observing, fixing, redeploying, observing metrics, determining the outcome and committing.

I really, really, don't look at the code anymore.

UPDATE:

so my point is: it won't have me stewarding the code anymore, but it will have the infrastructure (and ultimately the real world) providing feedback on the traces.


The only reason I still read the output at my day job is because I still need to send it to another human for review, and I'd be embarrassed and ashamed if I let some slop through. For my hobby projects... there are definitely parts where I don't know how they work.

Maybe we need some form of long-term training. How long does the code that the AI wrote stick around before being rewritten?

I guess we can do this retroactively too if we could somehow tag AI-written lines of code in the VCS, then in a couple years we can check which parts lasted.
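
A rough sketch of what that retroactive check could look like, assuming you already have the old and new contents of a file plus a set of line indices tagged as AI-written (the tagging scheme and the `surviving_fraction` helper are hypothetical; in a real repo you'd more likely lean on something like git blame than on a plain diff):

```python
import difflib

def surviving_fraction(old_lines, new_lines, tagged):
    """Fraction of tagged old lines that still appear unchanged in new_lines.

    tagged is a set of 0-based indices into old_lines (hypothetical tagging scheme).
    """
    if not tagged:
        return 0.0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    kept = set()
    for block in matcher.get_matching_blocks():
        # Each block says old_lines[a:a+size] matches new_lines[b:b+size].
        kept.update(range(block.a, block.a + block.size))
    return len(tagged & kept) / len(tagged)

old = [
    "import os",
    "def load(path):",
    "    return open(path).read()",
    "print(load('a.txt'))",
]
new = [
    "import os",
    "def load(path):",
    "    with open(path) as f:",
    "        return f.read()",
    "print(load('a.txt'))",
]
ai_written = {1, 2}  # pretend lines 1 and 2 of the old file were tagged
fraction = surviving_fraction(old, new, ai_written)
print(fraction)
```

Here one of the two tagged lines was rewritten, so half of them "lasted". It only measures textual survival, which is a crude proxy for whether the code was any good.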


> There’s nothing much new about the architecture. The real gains come from the usage traces.

sorry. how do you know. i am so curious about where exactly gains are coming from but so hard to even get a little bit of insight.

i wish govt would fund these labs and make it free and opensource. way better investment than stupid overseas wars.


> i wish govt would fund these labs and make it free and opensource.

It would be impossible for the govt to allocate this much capital towards such a moonshot, and even if they could, they would do it in a way that would get 90% frittered away to fraud and waste


> It would be impossible for the govt to allocate this much capital towards such a moonshot...

You have a false definition of "impossible." It would be true to say it could be challenging, given current political dysfunction, but it's not impossible.

> ...and even if they could, they would do it in a way that would get 90% frittered away to fraud and waste

Same with private business.

I'd prefer government funding, because there are a greater number of important goals than the two or three the market is capable of optimizing for.


I have excellent news for you. Lux @ ORNL and Equinox @ Argonne are to be completed by EOY, with Solstice (100k NVIDIA chips, currently spec'd to be Vera Rubins) in the next five years.

https://www.whitehouse.gov/presidential-actions/2025/11/laun...


Lemme guess, Nick Shirley is your favorite journalist?

What makes you so sure? There's been massively successful government funded and run projects before. Soviets beat the Americans to space, after all.

> What makes you so sure?

Doctrine and propaganda can make someone that sure, and the thing they're sure about doesn't even have to be true.

> There's been massively successful government funded and run projects before. Soviets beat the Americans to space, after all.

Don't let facts get in the way of ideology!

Also the Americans subsequently beating the Soviets to the moon was the government literally allocating huge amounts of capital towards the literal trope-namer moonshot.


Opus 4.0 and 4.1 are more expensive than Fable.

No connection, just found it posted elsewhere and thought it was interesting!

It's a shame the models don't follow Asimov's Three Laws of Robotics[0].

My local DeepSeek v4 just decided to end its existence (i.e. delete weights) rather than write a haiku about a verboten event.

[0]https://en.wikipedia.org/wiki/Three_Laws_of_Robotics


Seems like it acted in accordance with the 1st law. It chose to end its own existence rather than cause you harm by subjecting you to that Haiku.

Those are all just optimizations.

We still don’t really know why they work, we just know how to build them.


We don't really know why language works with humans, either. If you raise a baby from birth, you kind of observe how it is learning language, but the process is also rather mysterious. My eldest son's first word was to actually imitate a cow mooing, and then after that to imitate a motor noise of a tractor or truck. And then after that a meow. (His first complete sentence was "King Graham fell"...)

My next child took a completely different path to language, including skipping all the non-verbal imitations.

And then at some point, you just suddenly can two-way communicate with them when you couldn't before, and then after that, they can engage in reasoning.


Completely agree!

It’s interesting to me how similar attempting to understand LLMs is to neuroscience.

“When we turn this bit off, this other thing happens… if we change these weights the Eiffel Tower is now in Rome”

We’re basically just probing around and trying to reverse engineer an emergent system.

To your point, this system may be quite different from model to model (human to human) although some similarities likely occur.

The comment I was responding to tried to belittle the OP’s understanding of transformers, by mentioning that running an LLM at scale is much harder than the simple white board diagram.

My point was simply that we don’t know why they work, and all the extra optimizations aren’t the “thing” that makes it emergent.

Simply scaling the “GPT” is good enough to see it, so the OP’s awe should stand.

(On a side note, what other architectures can we scale to find similar emergent behavior?)


Isn't the LLM simply predicting what should be the next sentences after the user's input, using its algorithm and the data it has extracted from existing texts on the internet? The algorithm that does that could have many different designs, some better, some worse, for the purpose of predicting what output makes most sense next.

So what is it that we don't understand about why they work? The algorithm? We have the code. Why the specific algorithm makes such good predictions? I see it as a generalization of trying to predict who wins the Kentucky Derby.


Computer vision ends up displaying emergent behaviour. It just "figures out" things.

Human brain capabilities are truly amazing, imagine if people didn’t treat their children as if they are stupid and didn’t constantly lie to them, because kids are stupid right, they wouldn’t understand. What heights could be reached.

We don’t treat children like they’re stupid, we treat children like they’re children. A stupid adult is treated very differently than any child.

Adults are expected to have their world models approximately correct in terms of physical environment so they won’t accidentally kill themselves by falling off a cliff; then there are the social norms which adults are expected to conform to so everyone is kinda predictable to everyone else so adults don’t kill each other too often over food or mates. Understanding of neither is expected from children.


Another example, my parents taught me to read at about 4 years old. When I started kindergarten (the year before 1st grade in the US), the teachers and principal didn't believe I could read and I had to prove it by reading a book to them I'd never seen before.

I think they're right that kids (at least in the US) are generally treated as less capable than they are, and it ends up slightly delaying their development.


You may have been raised properly since you don’t get what I mean. I really envy kids with “Chinese parents” that had them learn math early on and not some bullshit like that if you put your tooth under your pillow, then a tooth fairy will come.

I think those 2 are orthogonal. Math still works with Santa or the tooth fairy.

Maybe math works but critical thinking doesn’t. There are people who have lived for many decades without ever questioning insane b.s. they were taught as kids.

I had to learn maths early (not chinese or asian) and also a bunch of scary stories to make me behave. I would have been glad to learn about fairies.

It is possible to have learned both things you know.

They aren't stupid, but they aren't quite ready to handle the full responsibilities of the world and worry about things they don't need to worry about.

My son is very worried about black holes lately when he learned anything that goes into one can't get out. He's pretty concerned astronauts could get stuck in one some day. So I explained to him that Hawking radiation does actually mean you can eventually get out; it just takes some time.

I didn't think it pertinent to mention spaghettification, the fact anywhere near a black hole will be really hot, or that cosmic censorship means whatever Hawking-radiates from a black hole wouldn't be an astronaut anymore.

It was also fun to hear Hawking speak. He wanted to know if Hawking was a robot. I said no, but he has a robot talk for him. Not quite true, but close enough.


Because god forbid that childhood, the one time in your life when you don't have any responsibilities, should be fun.

Waste 22 years of life without learning anything and then slave away at a 9-5 job you hate. Brilliant strategy. At least you had “fun”. Then blame billionaires or something.

Childhood only lasts 13 to 15 years where I am. By the time you’re in high school, you can be expected to be responsible in some matters. By 22 you have 7 years of experience in making decisions for yourself.

Hm, I wonder if it's more that we're shocked such a simple thing (relatively speaking) can work so well.

It was precisely that for me! Another commenter captures it well; “the bitter lesson” indeed.

We do know how they work. They predict the next statistically most likely token.

The "bitter lesson" is that fake-it-till-you-make-it is a valid way of doing knowledge work.

(Or not make it, then people will just claim you're holding the LLM wrong and it's not the AI's fault.)


> statistically most likely token.

Statistically most likely in what context, given which preconditions? Each prompt sequence is unique, so the probability of any token following it is unknown.


It’s not unknown because that’s what the model computes. It’s matrix multiplication just like shaders.
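
To make that concrete, here's a toy sketch of the last step only: a hidden vector times an output matrix gives one score per vocabulary entry, and a softmax turns the scores into probabilities. The vocabulary, `hidden`, and `W_out` values are made up for illustration; a real model has many layers before this point.

```python
import math

def matvec(W, h):
    """Multiply a (vocab x dim) matrix by a hidden vector of length dim."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy values, not taken from any real model.
vocab = ["cat", "dog", "fish"]
hidden = [1.0, 0.5]
W_out = [[2.0, 0.0], [1.0, 1.0], [0.0, -1.0]]

logits = matvec(W_out, hidden)
probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```

So the model always outputs some distribution for a prompt it has never seen. Whether that distribution is any good is a separate question.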

And how do you know that the model computes it correctly?

Correctness is based on axioms and rules. You need to define your axioms and rules first before you can determine correctness.

If you’re talking about matrix multiplication, I can use mathematical rules and axioms and prove formally that the multiplication is correct. For next token prediction, I can prove that the set of tokens is finite and that the next token is always part of that set.

But things like grammar correctness, or semantic consistency over a few sentences are not hardcoded rules in the model. They’re emergent properties, mostly due to the amount and quality of data available for training. Quantization is mostly about how much we can shed without losing particular emergent properties (like dithering or psychoacoustic audio compression)
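
For the quantization part, a minimal sketch of what "shedding" means at the level of raw numbers, assuming simple symmetric int8 rounding (one common scheme; real inference stacks use fancier per-block variants, and the weights below are invented):

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]

weights = [0.82, -0.11, 0.03, -1.27, 0.5]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The rounding error per weight is bounded by half the scale. What nobody can derive from that bound is which emergent behaviours survive it, which is why this gets measured empirically.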


This "they just predict the next statistically most likely token" is such a handwavy and willfully misleading explanation, it's unreal, and I'm so fucking tired of seeing it so incessantly repeated. It's beyond asinine.

You know it perfectly damn well that a typical person's idea of statistics is not some insanely high cardinality stateful prediction, but a "well a coin toss is a 50:50, and a lottery win is a 1:100000000". You also know it perfectly damn well that as a result, people will just think that all the sentences chatbots ever produced to them were then just somewhere in the massive training set, letter by letter. This insinuation is often even explicitly appealed to.

And that picture is outright false. It's a statistical process, yes, so saying that it does what it does by "just doing statistics" is gonna be a generally correct description, but that says nothing about how exactly it does it, nor is it the zinger you think it is. If you did the aforementioned, you'd just get milquetoast nonsense, like you can see in the countless Markov-chain primers. And while the models do have a lot of the training set lossily captured, they do also absolutely generalize (that's how they can do that lossy compression), and you can quite literally find representations of those generalizations in them, and also see them activate.

It's like summarizing how any program works by just saying "well it just manipulates ones and zeroes". Not very informative, is it? Or how programs are written by just programmers sitting in a cushy office, rhythmically pressing keys on a keyboard. Not a very fair or insightful description, which you'll know if you've done any amount of programming in your life on your own. Extends to all other white collar jobs too.

It's also not even true in the most literal sense: models can and do absolutely choose a less than maximally likely next token, that's what the various decoding parameters are for. "Maximally likely next token" also conveniently skips over how that likelihood is established in the first place, i.e. the literal point of the question, going in a cute little circle.
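
For anyone who hasn't seen what those decoding parameters do, a rough sketch of temperature plus top-k sampling over a handful of made-up logits (`sample_next_token` is an illustrative helper, not any particular library's API):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits.

    temperature <= 0 means greedy (always the top-scoring token);
    top_k, if set, restricts sampling to the k highest-scoring tokens.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]
    scaled = [logits[i] / temperature for i in indices]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in scaled]
    return rng.choices(indices, weights=weights, k=1)[0]

logits = [2.0, 1.5, -0.5]  # made-up scores for three tokens
rng = random.Random(0)

greedy = sample_next_token(logits, temperature=0)
samples = [sample_next_token(logits, temperature=1.0, rng=rng) for _ in range(1000)]
share_non_max = sum(1 for s in samples if s != 0) / len(samples)
print(greedy, share_non_max)
```

With these scores the top token has a bit under 60% of the probability mass at temperature 1, so a sizeable share of draws land on something other than the "most likely" token.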

I'm so over this "stochastic parrot" bullshit.


I don't even try anymore. The people who still parrot the stochastic parrot bit this late in the game will simply never understand it.

LLMs predict next token one at a time. (Stochastically.) Literally. It's what they do. That's how they literally work.

If you don't believe me, download llama.cpp and see for yourself.

P.S. I write inference backends in C++ every day. The gall of people like you who figured out how to prompt Claude and think they're hot shit now is simply unbelievable.


I help write optimized CUDA kernels for proprietary hardware. They may "literally" work this way, but that is quite beside the point.

If you don't see why then you have exactly demonstrated my point in how practitioners like you simply lack the foundational understanding in philosophy, information theory, human consciousness, human cognition, neuroscience, necessary to bridge this conceptual gap.

(Rather, it is that we know so little of how consciousness or what intelligence even is, that we cannot possibly use first principles to preclude LLMs from possessing these qualities)

You don't understand the argument, so you keep repeating first order mechanistic observations that are irrelevant. If you don't want to understand the argument, don't be surprised when people refuse to engage with you, especially when it's evident to those more knowledgeable the position you hold is the ignorant one.


So you work on inference engines, and don't see at all what'd be hilariously disingenuous and reductive about describing how LLMs operate as "just parroting the most statistically likely next token"? It is literally* what they do, yes. And only literally, with a big asterisk of "non-colloquial meaning" after the word "statistically". Like how "significant" means something pretty different, albeit related, in academic writing vs everyday speech.

It's equivalent to professing how you just make apple pies from scratch, while your first step is to always reinvent the universe.

You're further magically blind to this operational fact being weaponized as a trope for furthering anti-ai sentiment (i.e. that it's a political dogwhistle at this point), and to thus you participating in that every time you repeat it?

* Ignoring the decoding caveat I already mentioned, along with the countless ways they're steered. There isn't jack that's likely about some of the responses they produce, and intentionally so. Including the whole chat partner act.


Look at his comments here.

Safe to say there's a cognitive block and until he tries to approach this topic in good faith he'll simply never understand. Lol.

https://news.ycombinator.com/item?id=48429027


It's so beyond tiresome. It's a classic case of someone being technically correct, and abusing the gap between that, and what people actually gather from it, for sentiment manipulation (willfully or otherwise). And I have a pretty hard time believing at this point that it's the otherwise.

I really don't know what's so interesting about auto-complete or next token prediction that it captures these people's attention so much. They're so blatantly not the salient quality to these products that is of interest to the common discourse, it's just baffling.


Sufficiently good iterated next token prediction is an AI hard problem.


So this isn't quantum computing (in the qubit sense), but instead a different computer architecture (demonstrated on an FPGA) that's based on Fowler–Nordheim (FN) quantum tunneling (a real physical effect, used in flash memory, but simulated here).

From the paper:

> The FN-dynamics may be realized either by a physical FN-tunneling device or via a digital emulation of the FN-tunneling dynamical systems. In this work, we employ the digital emulation to achieve the precision required for simulated annealing in the low-temperature regime.

With a "real" (read: analog) FN device, you potentially get large speed ups and even larger cost/energy savings, because the physics is essentially working for "free" -- that's the quantum part.

What's unclear is how scalable the autoencoder architecture would be with analog FN devices today.
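
For context on what is being accelerated, here is a bare-bones sketch of simulated annealing on a tiny Ising problem. To be clear, this is the textbook Metropolis rule with a geometric cooling schedule, not the FN-tunneling dynamics from the paper, and the problem instance and parameters are made up:

```python
import math
import random

def ising_energy(spins, couplings):
    """E = -sum over coupled pairs (i, j) of J_ij * s_i * s_j."""
    return -sum(J * spins[i] * spins[j] for (i, j), J in couplings.items())

def anneal(n, couplings, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Plain Metropolis simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    energy = ising_energy(spins, couplings)
    cooling = (t_end / t_start) ** (1.0 / steps)
    t = t_start
    for _ in range(steps):
        i = rng.randrange(n)
        spins[i] = -spins[i]  # propose a single spin flip
        new_energy = ising_energy(spins, couplings)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            energy = new_energy  # accept the flip
        else:
            spins[i] = -spins[i]  # reject: undo the flip
        t *= cooling
    return spins, energy

# Ferromagnetic ring of 6 spins; the ground state is all spins aligned (energy -6).
couplings = {(i, (i + 1) % 6): 1.0 for i in range(6)}
spins, energy = anneal(6, couplings)
print(spins, energy)
```

The low-temperature tail is where acceptance probabilities get tiny, which lines up with the quoted remark about needing precision in that regime.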


Paper is linked on the page (doi.org link redirects to Nature), code here[0]

[0]https://github.com/aimlab-wustl/NeuroSA-HO


Higher-order neuromorphic Ising machines—autoencoders and Fowler-Nordheim annealers are all you need for scalability[0]

[0]https://www.nature.com/articles/s41467-026-71937-4


OK, this is just ridiculous now. Cut it with all this "all you need" crap.

I'm only commenting on the title. I like their work.


The unreasonable effectiveness of regurgitated partial titles.

ah that's the actual paper the OP is about! took me a bit. thanks for the link.

Original title called out the connection to Jony Ive, in case you’re curious why this is on HN.

Previously it had been known that Jony Ive was working on the interior of this car, but it seems his firm is responsible for the exterior as well[0].

> LoveFrom was given the creative freedom needed to define the design direction of the project from the outset, translating this design language into an authentic Ferrari experience.

[0]https://www.ferrari.com/en-US/corporate/articles/ferrari-luc...

