> they are incapable of even the simplest "out-of-distribution" deductive reasoning
But the link demonstrates the opposite: these models absolutely are able to reason out of distribution, just not with perfect fidelity. The fact that they can do better than random is itself really impressive. And o1-preview does impressively well, only very rarely getting the wrong answer on variants of that Alice in Wonderland problem.
If you listened to most of the people critical of LLMs who call them a "stochastic parrot", it should be impossible for the models to do better than random on any out-of-distribution problem. Even just changing one number to create a novel math problem should totally stump them and result in entirely random outputs, but it does not.
Overall, poor reasoning that is better than random but frequently gives the wrong answer is fundamentally and categorically different from being incapable of reasoning.
A good literary production; I would have been proud of it had I thought of it. But there's a strong "whataboutery" element here: if we use "stochastic parrot" as shorthand and you dislike the term, now you understand why we dislike the constant use of "infer", "reason" and "hallucinate".
Parrots are self-aware, with complex reasoning brains that can solve problems in geometry, tell lies, and act socially or asocially. They also have complex vocal cords and can perform mimicry. Very few aspects of a parrot's behaviour are stochastic, but that also underplays how complex stochastic systems can be in their production. If we label LLM products as Stochastic Parrots, it does not mean they like cuttlefish bones or are demonstrably modelled by Markov chains like Mark V Shaney.
Well, parrots can make more parrots; LLMs can't make their own GPUs. So parrots win. But LLMs can interpolate and even extrapolate a little. Have you ever heard a parrot do translation, hearing you say something in English and rendering it in Spanish? Yes, LLMs are not parrots. Besides their debatable abilities, they work with a human in the loop, which means humans push them outside their original distribution. That's not a parroting act; it's being able to do more than pattern matching and reproduction.
I don't like wading into this debate when semantics are so personal/subjective. But to me it seems almost like a sleight of hand to add the "stochastic" part, when actually they're possibly weighted more on the "parrot" part. Parrots are much more concrete, whereas the term LLM could refer to the general architecture.
The question to me seems: If we expand on this architecture (in some direction, compute, size etc.), will we get something much more powerful? Whereas if you give nature more time to iterate on the parrot, you'd probably still end up with a parrot.
There's a giant impedance mismatch here (time scale being one dimension of it). Unless people want to think of parrots as a subset of all animals, so that "stochastic animal" is what they mean. But then it's really the difference between "stochastic human" and "human". And I don't think people really want to face that particular distinction.
"Expand the architecture" .. "get something much more powerful" .. "more dilithium crystals, captain"
Like I said elsewhere in this overall thread, we've been here before. Yes, you do see improvements from larger datasets and models weighted over more inputs. I suggest, or rather I believe (to be more honest), that no amount of "bigger" here will magically produce AGI simply because of the scale effect.
There is no theory behind "more", which means there is no constructed sense of why, and the continued absence of abstract inductive reasoning says to me that this stuff isn't making a qualitative leap into emergent anything.
It's just better at being an LLM. Even "show your working" is pointing to complex causal chains, not actual inductive reasoning as I see it.
And that's actually a really honest answer. Whereas someone of the opposite opinion might argue that parroting, in the general copying-template sense, actually generalizes to all observable behaviours, because templating systems can be Turing-complete or something like that. It's templates all the way down, including complex induction: as long as there is a meta-template to match on its symptoms, it can be chained.
Induction is a hard problem, but humans can skip what would naively require unbounded compute (I don't think we have any reason to believe humans have infinite compute) and still give valid answers, because there's some (meta-)structure to be exploited.
Whether machines/NNs can architecturally exploit this same structure is the truer question.
> this stuff isn't making a qualitative leap into emergent anything.
The magical missing ingredient here is search. AlphaZero used search to surpass humans, and the whole Alpha family from DeepMind is surprisingly strong, though narrowly targeted. The AlphaProof model uses LLMs and Lean to solve hard math problems. The same kind of problem-solving CoT data is being used by current reasoning models, and they get much better results. The missing piece was search.
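A toy sketch of why search helps (this is illustrative only, not DeepMind's actual algorithm): a one-shot "policy" commits to the locally best move, while even exhaustive search over the same options can find a better overall plan. The task and numbers below are invented for illustration.

```python
# Toy two-round decision problem: pick a first option, which determines the
# second-round options. The greedy-looking first pick forecloses the best
# follow-up, so a pure policy (greedy) loses to search over the full tree.

# first_choice -> (immediate reward, {second_choice: reward}) -- invented data
ROUNDS = {
    "a": (3, {"c": 1, "d": 2}),   # tempting first pick, weak follow-ups
    "b": (1, {"e": 10, "f": 0}),  # worse first pick, much better follow-up
}

def greedy():
    # "policy only": always take the best immediate reward, then do your best
    first = max(ROUNDS, key=lambda k: ROUNDS[k][0])
    r1, followups = ROUNDS[first]
    return r1 + max(followups.values())

def search():
    # exhaustively evaluate the whole two-step tree and keep the best total
    return max(r1 + max(followups.values())
               for r1, followups in ROUNDS.values())

print(greedy())  # 5  (3 + 2): greedy locks in "a" and misses the 10
print(search())  # 11 (1 + 10): search finds the better full trajectory
```

AlphaZero-style systems do something analogous at scale: the learned policy proposes moves, and search over the game tree corrects the policy's short-sightedness.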
I'm sure both of you know this, but "stochastic parrot" refers to the title of a research article (Bender et al., "On the Dangers of Stochastic Parrots") that contained a particular argument about LLM limitations and had very little to do with actual parrots.
Firstly, this is meta ad hominem: you're ignoring the argument to target the speaker(s).
Secondly, you're ignoring the fact that the community of voices with experience in data science, computer science, and artificial intelligence is itself split on the qualities, or lack of them, in current AI. GPT and LLMs are very interesting but say little or nothing to me of new theory of mind, or display inductive logic and reasoning, or even meet the bar for a philosopher's cave solution to problems. We've been here before so many, many times. "Just a bit more power, captain" was very strong in connectionist theories of mind, fMRI brain-activity analytics, you name it.
So yes. There are a lot of "us" who are pushing back on the hype, and no we're not a mini cult.
> GPT and LLMs are very interesting but say little or nothing to me of new theory of mind, or display inductive logic and reasoning, or even meet the bar for a philosopher's cave solution to problems.
The simple fact that they can generate language so well makes me think... maybe language itself carries more weight than we originally thought. LLMs got to this point without personal experience or embodiment; it should not have been possible, but here we are.
I think philosophers are lagging behind science now. The RL paradigm of agent-environment-reward learning seems to me a better one than what we have in philosophy now. And if you look at how LLMs model language as high-dimensional embedding spaces... this could solve many intractable philosophical problems, like the infinite homunculus-regress problem. Relational representations straddle the midpoint between first and third person, offering a possible path over the hard-problem "gap".
There are a couple Twitter personalities that definitely fit this description.
There is also a much bigger group of people who haven't really tried anything beyond GPT-3.5, which for a long time was the best you could get without paying a monthly subscription. One of the biggest reasons for the R1 hype, besides the geopolitical angle, was that people could actually try a reasoning model for free for the first time.
i.e., the people who think AI is dumb? Or are you saying I'm in a cult for being pro it? I'm definitely part of that cult: the "we already have AGI and you have to contort yourself into a pretzel to believe otherwise" cult. Not sure if there is a leader, though.
> If you listened to most of the people critical of LLMs who call them a "stochastic parrot", it should be impossible for the models to do better than random on any out-of-distribution problem. Even just changing one number to create a novel math problem should totally stump them and result in entirely random outputs, but it does not.
You don't seem to understand how they work: they recurse on their solution, meaning that if they have remembered components, they parrot back sub-solutions. It's a bit like a natural-language computer; that way you can get them to do math etc., although the instruction set isn't that of a Turing-complete language.
They can't recurse on sub-sub-parts they haven't seen, but problems that have similar sub-parts can of course be solved; anyone understands that.
I don't think anyone understands how they work; these types of explanations aren't very complete or accurate. Such explanations/models should allow one to reason out what these systems are capable of versus incapable of in principle, regardless of scale or algorithm tweaks, yet those predictions and arguments never match reality and require constant goal-post shifting as the models are scaled up.
We understand how we brought them about via setting up an optimization problem in a specific way, that isn't the same at all as knowing how they work.
I tend to think, in the totally abstract philosophical sense and independent of the type of model, that at the limit of an increasingly capable function approximator trained on an increasingly large and diverse set of real-world cause/effect time-series data, you eventually develop an increasingly accurate and general predictive model of reality organically within the model. Some model types do have fundamental limits in their ability to scale like this, but we haven't yet found one with these models.
It is more appropriate to objectively test what they can and cannot do, and to avoid trying to infer what to expect from how we think they work.
Magic wasn’t mentioned here. We don’t understand the emerging behavior, in the sense that we can’t reason well about it and make good predictions about it (which would allow us to better control and develop it).
This is similar to how understanding chemistry doesn’t imply understanding biology, or understanding how a brain works.
There's no belief or magic required, the word 'reasoning' is used here to refer to an observed capability, not a particular underlying process.
We also don't understand exactly how humans reason, so any claim that humans are capable of reasoning is also mostly an observation about abilities/capabilities.
> We understand how we brought them about via setting up an optimization problem in a specific way, that isn't the same at all as knowing how they work.
You're mistaking "knowing how they work" with "understanding all of the emergent behaviors of them"
If I build a physics simulation, then I know how it works. But that's a separate question from whether I can mentally model and explain the precise way a ball will bounce given a set of initial conditions within the simulation, which is what you seem to be talking about.
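The analogy can be made concrete with a toy simulation (all constants here are illustrative, chosen only to make the point): the update rule is a handful of lines we wrote ourselves, so we fully "know how it works", yet the exact state after thousands of steps still has to be computed rather than intuited.

```python
# Toy physics sim: a ball dropped from height y under gravity, bouncing on
# the floor with energy loss, integrated with simple Euler steps. The rules
# fit in a few lines; the precise long-run state does not fit in your head.

def simulate_bounces(y=10.0, v=0.0, g=-9.8, restitution=0.8,
                     dt=0.01, steps=2000):
    """Return (final_height, number_of_bounces) after `steps` Euler steps."""
    bounces = 0
    for _ in range(steps):
        v += g * dt          # gravity accelerates the ball downward
        y += v * dt          # position update
        if y < 0.0:          # hit the floor: reflect velocity, lose energy
            y = 0.0
            v = -v * restitution
            bounces += 1
    return y, bounces

final_y, n = simulate_bounces()
# Knowing these ~10 lines is "knowing how it works"; predicting final_y and n
# without running them is the separate, much harder question.
```

That gap between knowing the rules and predicting the emergent trajectory is the distinction the comment above is drawing for LLMs.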
> You're mistaking "knowing how they work" with "understanding all of the emergent behaviors of them"
By knowing how they work I specifically mean understanding the emergent capabilities and behaviors, so I don't see how it is a mistake. If you understood physics but knew nothing about cars, you couldn't claim to understand how a car works: "simple, it's just atoms interacting according to the laws of physics." That would not let you, e.g., explain its engineering principles or its capabilities and limitations in any meaningful way.