First, students only get good after studying — education is not some magic spell cast by the teacher that only operates on a human's immortal soul. As we should not dismiss what students learn just because we could look it up, it is strange to dismiss what GPT has learned just because it could be looked up.
Second, the GPT-3 (and presumably also GPT-4) training set is roughly 500 billion tokens. What is that on disk? Something like just a few terabytes?
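The "few terabytes" figure is easy to sanity-check. A minimal back-of-envelope sketch, assuming a rough rule of thumb of about 4 bytes of English text per BPE token (the true ratio varies by tokenizer and corpus):

```python
# Back-of-envelope: how big is ~500 billion tokens as raw text?
# Assumption: ~4 characters (~4 bytes of UTF-8 English) per token,
# a common rule of thumb for BPE tokenizers; exact ratios vary.
TOKENS = 500e9
BYTES_PER_TOKEN = 4  # assumed average, not an exact figure

total_bytes = TOKENS * BYTES_PER_TOKEN
terabytes = total_bytes / 1e12

print(f"~{terabytes:.0f} TB")  # roughly 2 TB, i.e. "a few terabytes"
```

Under that assumption the whole corpus is on the order of 2 TB, comfortably within "fits in a pocket" territory.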
We've been able to carry that much data in a pocket for years without a computer being able to do almost any of the things GPT can do (arbitrary natural-language synthesis, let alone arbitrary natural-language queries), even when we programmed the rules by hand; in this case, the program learned the rules from the content itself.
Even just a few years ago, SOTA sentiment analysis in NLP was basically lexicon counting: "count how many good words and bad words are in the text; the sentiment score is total good minus total bad."
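That lexicon-counting approach can be sketched in a few lines. The word lists below are tiny illustrative stand-ins for a real sentiment lexicon:

```python
# Minimal sketch of lexicon-based sentiment scoring:
# score = (count of "good" words) - (count of "bad" words).
# GOOD/BAD are toy stand-ins for a real lexicon.
GOOD = {"great", "love", "excellent", "happy"}
BAD = {"terrible", "hate", "awful", "sad"}

def sentiment(text: str) -> int:
    """Return total good-word count minus total bad-word count."""
    words = text.lower().split()
    return sum(w in GOOD for w in words) - sum(w in BAD for w in words)

print(sentiment("I love this great phone"))   # 2
print(sentiment("terrible battery I hate it"))  # -2
```

Note how brittle this is: no notion of negation, context, or anything beyond word membership, which is exactly the gap the test scores are measuring.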
That difference is what these test scores are showing.