Does your Linux server decide what processes it should launch at what time, with a theory of what will happen next, in order to complete a goal you specified in natural language? If so, then yes, I reckon you sure have!
Claude does not have a "theory" of anything, and I'd argue applying that mental model to LLM+Tools is a major reason why Claude can delete a production database.
Well, humans also routinely delete production databases by accident. I think at this point, arguing that LLMs are just clueless automatons that have no idea what they are doing is a losing battle.
They’re not clueless; they just don’t have a memory and they don’t have judgement.
They create the illusion of being able to make decisions, but they are always just following a simple template. They do not consider nuance, and they cannot judge between two difficult options in any real sense.
Which is why they can delete prod databases and why they cannot do expert-level work.
Not sure if you are being pedantic, but mathematics is quite different from other fields: it is highly structured, its reasoning is explicit, and it has a dense volume of high-level training data. That structure also makes results easy to verify.
Even then, they are most effective as assistants and are not able to produce results independently. If you have proof otherwise, I would love to read up on it.
I like to think of LLMs as idiot savants. Exceptional at certain tasks, but might also eat the tablecloth if you stop paying attention at the wrong time.
With humans, you can kind of interview/select for a more normalized distribution of outcomes, with outliers being less probable, but not impossible.
I mean, maybe it’s a losing battle today, but it is correct. So in a few years, when the dust settles, we’ll probably all be using LLMs as clueless automatons that still do useful work as tools.
Maybe. But probably not. It doesn't matter if it's AGI though. If those other apps and tools do simple things that are predictable, then we can be pretty sure what will happen. If those tools can modify their own configuration and create new cron jobs, it becomes much harder to say anything about what will happen.
Most of us work on software that can modify its own configuration and create new jobs. I, too, have worked in Ansible and Terraform.
The key difference here is the lack of predictability, and I think it's important that we don't get too starry-eyed, and that we accept that it might be a weakness, not a strength.