YouTube literally allows trading scams into its ad program, which it forces onto users through its terms of service. That's far worse, since naive users may assume YouTube has vetted those ads.
One good use case is unit tests, since they can be trivial while at the same time being cumbersome to write. I could give the LLM code for React components, and it would write the tests and set up all the mocks, which is the most annoying part. Although getting "all the tests" typically means prompting the LLM again to think of more edge cases and make sure everything is covered.
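To make the "setting up mocks" part concrete, here's a minimal sketch of the kind of boilerplate an LLM can churn out. All names (`loadUser`, `fetchJson`, `makeMockFetch`) are hypothetical, and the hand-rolled mock just stands in for what a framework like Jest would give you with `jest.fn()`:

```typescript
// Hypothetical example: testing a data-loading function by injecting a mock fetcher.
// None of these names come from a real codebase; they only illustrate the pattern.

type FetchJson = (url: string) => Promise<unknown>;

// The "component logic" under test: fetches a user and extracts a field.
async function loadUser(id: number, fetchJson: FetchJson): Promise<string> {
  const data = (await fetchJson(`/api/users/${id}`)) as { name: string };
  return data.name;
}

// Hand-rolled mock: records every call and returns canned data,
// roughly what jest.fn().mockResolvedValue(...) would do for you.
function makeMockFetch(response: unknown) {
  const calls: string[] = [];
  const fn: FetchJson = async (url) => {
    calls.push(url);
    return response;
  };
  return { fn, calls };
}

async function main() {
  const mock = makeMockFetch({ name: "Ada" });
  const name = await loadUser(7, mock.fn);
  console.log(name);          // expected: Ada
  console.log(mock.calls[0]); // expected: /api/users/7
}

main();
```

Writing three or four variations of this (error responses, empty payloads, wrong IDs) is exactly the tedium that's easy to delegate.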
That's a lot of thinking they've done about LLMs, but how much did they actually try LLMs? I have long threads where ChatGPT refines solutions to coding problems. Their example of losing the thread after printing a tiny list of 10 philosophers seems really outdated. It also seems LLMs can handle nested contexts, for example breaking their own rules while telling a story or speaking hypothetically.
For a paper submitted on July 11, 2024, and with several references to other 2024 publications, it is indeed strange that it gives ChatGPT output from April 2023 to demonstrate that “LLMs lose the thread of a conversation with inhuman ease, as outputs are generated in response to prompts rather than a consistent, shared dialogue” (Figure 1). I have had many consistent, shared dialogues with recent versions of ChatGPT and Claude without any loss of conversation thread even after many back-and-forths.
Most LLM critics (and singularity-is-near influencers) don't actually use the systems enough to have relevant opinions about them. The only really good sources of truth are the chatbot-arena from lmsys and the comment section of r/localllama (I'm quoting Karpathy); both are "wisdom of the crowd", and often the crowd on r/localllama is getting that wisdom by spending hours with one hand on the keyboard and another under their clothes.
From what I heard, Crowdstrike just updated their DB file, which means the bug was already there, waiting for someone to trigger it with a "low-risk" quick rollout.