> 100% agree. We don't expect human developers to be perfect, so why should we expect AI assistants to be?
I think the issue is that we are currently being sold that it is. I'm blown away by how useful AI is, and how stupid it can be at the same time. Take a look at the following example:
https://app.gitsense.com/?doc=f7419bfb27c896&highlight=&othe...
If you click on the sentence, you can see how dumb Sonnet-3.5 and GPT-4 can be. Each model was asked to spell-check and grammar-check the sentence five times, and you can see that GPT-4o-mini was the only one that got it right all five times. The other models mostly got it comically wrong.
I believe LLMs are going to change things for the better for developers, but we need to set expectations properly. I suspect this will be difficult, since a lot of VC money is being pumped into AI.
I also think a lot of mistakes can be prevented if your prompt asks the model to explain how and why it did what it did. For example, the prompt that was used in the blog post should include "After writing the test, summarize how each rule was applied."
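To make that concrete, here's a minimal sketch of the idea (assuming the openai Python client; everything in the prompt except the final "summarize how each rule was applied" instruction is a made-up placeholder for illustration):

    # Minimal sketch: append a self-explanation instruction to the prompt.
    # Assumes the openai package is installed and OPENAI_API_KEY is set;
    # the task and rules below are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI()

    prompt = (
        "Write a unit test for the function below, following these rules:\n"
        "1. One assertion per behavior.\n"
        "2. No mocking of the function under test.\n\n"
        "<function under test here>\n\n"
        # The instruction suggested above: asking the model to justify
        # each rule makes silently misapplied rules visible in the output.
        "After writing the test, summarize how each rule was applied."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)

The summary doesn't guarantee correctness, but it gives you something cheap to eyeball before trusting the output.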
"I think the issue is that we are currently being sold that it is."
The message that these systems are flawed appears to be pretty universal to me:
ChatGPT footer: "ChatGPT can make mistakes. Check important info."
Claude footer: "Claude can make mistakes. Please double-check responses."
https://www.meta.ai/ "Messages are generated by AI and may be inaccurate or inappropriate."
etc etc etc.
I still think the problem here is science fiction. We have decades of sci-fi telling us that AI systems never make mistakes, but instead cause harm by following their rules too faithfully (paperclip maximizers, HAL in 2001: A Space Odyssey, etc.).
Turns out the actual AI systems we have make mistakes all the time.
But on the other hand, there are the commercials produced to sell new models or new model features, which FREQUENTLY lie about actual capabilities, use faked demos, and don't end with an equivalent amount of time going over how actual usage may be shit and completely unlike the advertisement.
I'd say parent is absolutely correct - we ARE being sold (quite literally, through promotional material, i.e. ads) that these models are way more capable than they actually are.
You do have to admit, the footer is extremely small and it's also not in the most prominent place. I think most "AI companies" probably don't go into a sales pitch saying "It's awesome, but it might be full of shit".
I do see your science fiction angle, but I think the bigger issue is the media, VCs, etc. are not clearly spelling out that we are nowhere near science fiction AI.
I appreciate the footer on Kagi Assistant: "Assistant can make mistakes. Think for yourself when using it" - a reminder that there's a tendency to outsource your own train of thought.
I would have to imagine 90+ percent of people use LLMs and AI to outsource their thinking, and most will not heed this warning. OpenAI might say "Check important info.", but they know most people probably won't do a Google search or visit their library to fact-check things.