
> just find the most probable word that follows next

Well, if in all situations you can predict which word Einstein would probably say next, then I think you're in a good spot.

This "most probable" stuff is just absurd handwaving. Every prompt of even a few words is unique, there simply is no trivially "most probable" continuation. Probable given what? What these machines learn to do is predicting what intelligence would do, which is the same as being intelligent.




>Probable given what?

The training data.

>predicting what intelligence would do

No, it just predicts what the next word would be if an intelligent entity translated its thoughts to words, because it is trained on text written by intelligent entities.

If it was trained on text written by someone who loves to rhyme, you would be getting all rhyming responses.

It imitates the behavior -- in text -- of whatever entity generated the training data. Here the training data was made by intelligent humans, so we get an imitation of the same.

It is a clever party trick that works often enough.
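To make that concrete, here is a toy sketch of what "predict the next word given the training data" means. This is a count-based model over a made-up corpus, not how any real LLM is implemented (real models replace the count table with a neural network), but the sampling step is the same idea:

    import random
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()

    # "Training": count which word follows each word in the corpus.
    follows = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follows[prev][nxt] += 1

    def next_word(prev):
        # Sample proportionally to how often each word followed `prev`.
        words, weights = zip(*follows[prev].items())
        return random.choices(words, weights=weights)[0]

    print(next_word("the"))  # "cat", "dog", "mat", or "rug"

Swap in a corpus of rhymes and it produces rhyme-shaped output, which is exactly the imitation point above.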


It is impossible to accurately imitate the action of intelligent beings without being intelligent. To believe otherwise is to believe that intelligence is a vacuous property.

An unintelligent device can accurately imitate the action of intelligent beings within a given scope, in the same way an actor can accurately imitate the action of a fictional character in a given scope (the stage or camera) without actually being that character.

If the idea is that something cannot accurately replicate the entirety of intelligence without being intelligent itself, then perhaps. But that isn't really what people talk about with LLMs given their obvious limitations.


So the actors who portray great thinkers are great thinkers?

No, actors recite a pre-written script. But scriptwriters do have to be great thinkers in order to know what the great thinker would actually say.

I suppose they really only have to be good at knowing what sort of thing the audience would believe a great thinker would say. As long as the audience does not consist of great thinkers they also cannot know for sure what a great thinker would say.

That's true for unverifiable "talk professions" where there is no grounding and it's all self-referential navel-gazing chatter.

But LLMs are already beyond that in writing code that passes actual tests, proving theorems that are checkable with formal methods, etc.

The people who still say LLMs are just parrots in 2026 will just keep saying this no matter what, so I don't think it makes sense to argue this point further.


No no, parrots are truly intelligent.

Which is why so many portrayals are unconvincing.

>It is impossible to accurately imitate the action of intelligent beings without being intelligent.

Wait, what? So a robot that accurately copies the actions of an intelligent human is intelligent?


That was probably phrased poorly. If a robot can independently accurately do what an intelligent person would do when placed in a novel situation, then yes, I would say it is intelligent.

If it's just basically being a puppet, then no. You tell me: is Claude Code more like a puppet, or a person?


It is neither a puppet nor a person. It is a computer program.

About as much as a bundle of an mp3 decoder and a terabyte of mp3 music is "just a program".

How can you distinguish intelligence from a sufficiently accurate imitation of intelligence?

By "sufficiently accurate" do you mean identical? Because if so, it's not an imitation of intelligence at all, and the question is thus nonsensical.

"it's not an imitation of intelligence at all"

But that is the key insight: how can you tell when an imitation of intelligence becomes the real thing?


When it stops hallucinating without explicit checks for that!

Making mistakes does not make people unintelligent.

People don't hallucinate. That is, they can pretty reliably assess whether they know or don't know something.

Your comment is a perfect example of a human hallucinating something and not knowing they are wrong about it. People are confidently wrong about things _all the time_.

You say this confidently, but you're wrong... ah no wait... ;)

No no, you don't understand. People can misunderstand. But they will not, for example, proceed to drive a car as if they have attended driving lessons when they have not.

They might misremember, but they can know for sure if they have NOT come across some information. So if you ask someone if they know where `x` is, they might have come across that info and still be wrong. But they will know if they have never come across it.

A neural network will happily produce an output even when the input is completely outside the range of the training data.


> if you ask someone if they know where `x` is, they might have come across that info and still be wrong. But they will know if they have never come across it.

False memories are super common. "I thought I had seen this thing there, but turns out it's not" is a perfectly normal, very frequent occurrence.


If someone asks "Hey, how do I do this thing in the Python programming language?", what are the chances that you will try to make up a solution if you have never tried to learn Python?

Just tell me.


Have you tried this with Claude?

---

Q: How do I reverse an array in the Navajo programming language?

A: I'm not familiar with a programming language called "Navajo." It's possible you might be thinking of a different language, or it could be something very niche that I don't have information about.

---

As for your question, the chances go from 0 to 100% depending on how many languages I already know and whether I have an idea (or think I have an idea) of what Python looks like. And LLMs have seen (and tried to "learn") pretty much everything.


Look up how these "fixes" are implemented, and please go back and read my original comment.


Some people can reliably assess what they know; others cannot.

> The training data

If the prompt is unique, it is not in the training data. True for basically every prompt. So how is this probability calculated?


The prompt is unique but the tokens aren't.

Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." because the prompt doesnt correlate to training data.


> The prompt is unique but the tokens aren't.

The tokens aren't unique, but the sequence is. Every input this model sees is unique. Even tokens are not as simple as they seem.

If you type "ejst os th xspitsl of fermaby?" into ChatGPT, it responds with:

> It looks like you typed “ejst os th xspitsl of fermaby?”, which seems like a garbled version of:

> "What is the capital of Germany?”

> The capital of Germany is Berlin.

> If you meant to ask something else, feel free to clarify!

edit: formatting


The prompt does correlate to its training data. In this case, since you sent random text, it generated the most likely response to random text.

Or is it because the text you sent was random and doesn't form a clear question?

...? What is the response supposed to be here?

Hamiltonian paths and previous work by Donald Knuth are more than likely in the training data.

The specific sequence of tokens that comprises Knuth's problem together with an answer to it is not in the training data. A naive probability distribution based on counting the token sequences present in the training data would assign zero probability to it. The trained network represents an extremely non-naive approach to estimating the ground-truth distribution (the distribution that corresponds to what a human brain might have produced).
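To illustrate why naive counting is hopeless, here is a toy model (made-up sentences) that only knows sequences it has literally seen:

    from collections import Counter

    training = [
        "the capital of france is paris",
        "the capital of germany is berlin",
    ]
    seen = Counter(tuple(s.split()) for s in training)

    # A sequence that never appears verbatim in the training set:
    query = tuple("the capital of spain is madrid".split())
    print(seen[query] / sum(seen.values()))  # 0.0

Whatever a trained network is doing to generalize, it is not this.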

>the distribution that corresponds to what a human brain might have produced

But the human brain (or any other intelligent brain) does not work by generating a probability distribution over the next word. Even beings that do not have a language can think and act intelligently.


You are always making predictions based on the context. That's why illusions can be so effective like these ones: https://illusionoftheyear.com/cat/top-10-finalists/2024/

LLMs also don't work by generating probability distributions over the next word. Your explanation can't account for why they can generate words, let alone sentences.

That is exactly how they work.

No, a token is not a word.

I mean, it is some text.

How do you get from a piece of text smaller than a word to an entire coherent sentence?
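For what it's worth, the standard answer is autoregression: emit one token, append it to the context, and repeat. A toy character-level sketch of the same loop (a Markov chain, not a real LLM):

    import random
    from collections import Counter, defaultdict

    text = "the cat sat on the mat. the cat ate the rat."

    # "Train": count which character follows each character.
    model = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        model[a][b] += 1

    # Generate: repeatedly sample the next character given the last one.
    out = "t"
    for _ in range(40):
        chars, weights = zip(*model[out[-1]].items())
        out += random.choices(chars, weights=weights)[0]
    print(out)

Pieces smaller than a word compose into word-like and sentence-like sequences purely through this append-and-predict loop; real models condition on the whole context rather than one character, which is what buys coherence.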

[Citation needed] Neuroscience isn't yet at a point where it can say this with any certainty.

Anyway, it's not a theorem that you can be intelligent only if you fully imitate biological processes, just as flight can be achieved without flapping wings.


>you can be intelligent only if you fully imitate biological processes

It is not that. It is about having an understanding of how it is trained. For example, if it were trained on ideas instead of words, it would be closer to intelligent behavior.

Someone will say that during training it builds ideas and concepts, but that is just a name we give to the internal representation that results from training; it is not actual ideas and concepts. When it learns the word "car", it does not actually understand it as a concept, only as a word and how that word relates to other words. This enables it to generate consistent text that includes "car", projecting an appearance of intelligence.

It is hard to propose a test for this, because it will become the next target for the AI companies to optimize for, and maybe the next model will pass it.


The latest models are mostly LMMs (large multimodal models). If a model builds an internal representation that integrates all the modalities we are dealing with (robotics even provides tactile inputs), it becomes harder and harder to imagine why those representations should be qualitatively different.

It can't, simply because the textual description of a concept is different from the concept itself.

Obviously, a concept (which is an abstraction in more ways than one) is different from a textual representation. But LLMs don't operate on the textual description of a concept when they are doing their thing. A textual description (which is associated with other modalities in the training data) serves as an input format. LLMs perform non-linear transformations of points in their latent space. These transformations and representations are useful not only for generating text but also for controlling robots, for example (see VLAs in robotics).
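To unpack "non-linear transformations of points in their latent space": the basic building block is just a linear map followed by a non-linearity, applied to a vector. A toy sketch with random weights and a made-up dimension, with no claim that this matches any particular model:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8                        # toy latent dimension

    x = rng.normal(size=d)       # a point in latent space
    W1 = rng.normal(size=(d, 4 * d))
    W2 = rng.normal(size=(4 * d, d))

    h = np.maximum(0, x @ W1)    # linear map + ReLU non-linearity
    y = h @ W2                   # projected back into the same space
    print(y.shape)               # (8,)

The point is that the model manipulates vectors, not strings; text is only the interface at the input and output ends.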

> don't operate on the textual description of a concept when they are doing their thing.

It could be mapping the text to some other internal representation, with connections to mappings from other text/tokens. But that does not stop text from being the ground truth. It has nothing else going on!

The "hallucination" behavior alone should be enough to reject any claims that these are at least minimally similar to animal intelligence.


> The "hallucination" behavior alone should be enough to reject any claims that these are at least minimally similar to animal intelligence.

Can you elaborate on why you think this is the case?


The internal representations happen to be useful not only for outputting text. What does that mean from your standpoint?

I didn't understand. Can you clarify?

If LLMs' internal representations are essentially one-to-one mappings of input texts with no additional structure, how can those representations be useful for tasks like object manipulation in robotics?

How is transfer learning possible when non-textual training data enhances performance on textual tasks?


I didn't mean it is a one-to-one mapping from tokens. But instead it might be mapping a corpus of input text to points in some multi-dimensional space (just like the input data of a linear regression), and then it just extends the line further across that space to get the output.

>How is transfer learning possible when non-textual training data enhances performance on textual tasks?

If non-textual training data can be mapped to the same multi-dimensional space (by using it alongside textual data during training, or something like that), then shouldn't it be possible to do what you describe?
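Taking the linear-regression analogy literally, "extending the line" looks like this (toy numbers, purely to illustrate the analogy, not a claim about how LLMs actually work):

    import numpy as np

    xs = np.array([1.0, 2.0, 3.0, 4.0])
    ys = np.array([2.1, 3.9, 6.2, 7.8])   # roughly y = 2x

    # Fit a line to the seen points, then read off the continuation.
    slope, intercept = np.polyfit(xs, ys, 1)
    print(slope * 5.0 + intercept)         # "extends the line" to x = 5

Whether hundreds of billions of parameters of non-linear transformations still count as "extending the line" is, of course, the whole disagreement.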


Obviously there is some level of memorisation involved. That's why you can even get LLMs to write parts of Harry Potter from scratch with perfect precision.

Just using a scaled-up and cleverly tweaked version of linear regression analysis...

That is, the probability distribution that the network should learn is defined by which probability distribution the network has learned. Brilliant!


