LLMs perform poorly at tasks that require considering words as their component letters because they only see tokens. "strawberry" is split as [str][aw][berry] and becomes [496, 675, 15717]. To some degree they can learn from data what letters are in which tokens, but it's not direct like it is for us.
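To make that concrete, here is a minimal sketch using a toy three-entry vocabulary. The IDs are the ones quoted above; the greedy longest-match split and the names `TOY_VOCAB`, `toy_encode` and `count_letter` are assumptions for illustration, not how any real tokenizer is implemented. The point is only that the model receives the ID list, and the letters come back only if you map each ID to its string again.

```python
# Toy vocabulary (hypothetical): a real tokenizer has tens of thousands of entries.
TOY_VOCAB = {"str": 496, "aw": 675, "berry": 15717}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def toy_encode(word):
    """Split `word` into toy-vocabulary token IDs by greedy longest match."""
    ids = []
    pos = 0
    while pos < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), pos, -1):
            piece = word[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no toy token matches {word[pos:]!r}")
    return ids


def count_letter(ids, letter):
    """Count a letter by mapping each ID back to its string piece first."""
    return sum(ID_TO_PIECE[token_id].count(letter) for token_id in ids)


ids = toy_encode("strawberry")
print(ids)                     # [496, 675, 15717] -- this is all the model "sees"
print(count_letter(ids, "r"))  # 3, but only because we looked the pieces up
```

Nothing in the three integers says how many r's each piece contains; a model has to have learned that association from data.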
> There are two common ways to understand questions like "how many r's are in strawberry" [...]
> 2- How many r's in a particular subword/syllable (here "berry")
This seems a strange interpretation - why is "straw" ignored?
Have you never been in a conversation where someone asks "is it 1 or 2 r's in strawberry?", ignoring single instances of the same letter in other parts of the word?
Think about it: as I said below, usually it's about things like s vs ss, r vs rr, l vs ll, etc.
Ah so you think it's interpreting it as "is the second r-sound a single or double r?"
But even with that interpretation, I don't think it really explains the errors. Like just now I asked how many i's were in "disabilities": https://i.imgur.com/TZFByen.png - it gives the wrong answer, and there's no double i in "disabilities" to cause the ambiguity. The follow-on reveals that it generally struggles with individual letters.
Or, taking a word that is ambiguous in this way and adding another of the letter to it to show that it really is just undercounting: https://i.imgur.com/6PaetPK.png
> Ah so you think it's interpreting it as "is the second r-sound a single or double r?"
Right.
I agree it fails to actually count the letters, but I still think those two interpretations of the question are valid, so making sure the LLM addresses the intended one should be important.
I'm also not sure about the "disabilities" example: this type of question would be very uncommon I think, not least because there aren't really words with a double i, and people don't usually ask trivia questions about the number of characters in a word; rather it's usually about the spelling of a particular syllable (and among those cases, usually 1 vs 2).
Your second example is more convincing; however, again we must ask: is it understanding the question right? You didn't reset the context, so perhaps it based its last answer on its second-to-last answer in a way that could be valid (inside your last screenshot).
Idk if that makes sense the way I'm trying to explain what I mean.
* When asked for r's in strawberry it outputs 2 because it's interpreting your question as whether the second r-sound has one or two r's
* It also counts only 2 r's in strawberry when you're clear you don't mean that, or when you use strawberry as part of another phrase, due to how tokens work
But then that makes the part about interpreting the question as "r's in the second r-sound" superfluous (https://i.imgur.com/eYKYRfN.png), if it counts 2 r's in strawberry anyway. There's no need for it to also be misinterpreting what you meant.
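For reference, the two readings being argued over give different ground-truth answers, which is easy to check. This is a rough sketch; `count_in_word` and `count_in_part` are hypothetical helper names, and treating "berry" as the part in question is taken from the quoted comment rather than from any real syllable splitter.

```python
def count_in_word(word, letter):
    """Reading 1: count every occurrence of `letter` in the whole word."""
    return word.lower().count(letter.lower())


def count_in_part(word, part, letter):
    """Reading 2: count `letter` only inside one subword of `word`."""
    if part.lower() not in word.lower():
        raise ValueError(f"{part!r} is not part of {word!r}")
    return part.lower().count(letter.lower())


print(count_in_word("strawberry", "r"))           # 3
print(count_in_part("strawberry", "berry", "r"))  # 2
print(count_in_word("disabilities", "i"))         # 4
```

So an answer of 2 for strawberry fits the second reading, but "disabilities" has no subword where that reading changes the expected count of i's.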