
... why does reversing all the digits help? could you please explain it? many thanks!


Arithmetic works right to left across the digits (carries start at the ones place), while we write numbers left to right. So if you see the digits 123... in an autoregressive manner, you don't really know anything about place value, since the number could be 12345 or 1234567. If you flip 12345 to 543..., you know the place value of each digit: the 5 you encounter first is in the ones place, the 4 is in the tens place, etc. It gives the LLM a better chance of learning arithmetic.
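
To make that concrete, here is a rough Python sketch (the function names are made up for illustration, and this is not how any particular model does it). With the least significant digit first, the place value of each digit is known the moment it arrives, and addition can emit its result one digit at a time without looking ahead:

```python
def place_values_reversed(digits_lsd_first):
    """Yield (digit, place_value) pairs as each digit arrives, ones place first."""
    for i, ch in enumerate(digits_lsd_first):
        yield int(ch), 10 ** i


def add_reversed(a_rev, b_rev):
    """Add two numbers given as reversed digit strings.

    The result is also reversed. Each output digit depends only on the
    digits seen so far plus a single carry, so no lookahead is needed.
    """
    out = []
    carry = 0
    for i in range(max(len(a_rev), len(b_rev))):
        da = int(a_rev[i]) if i < len(a_rev) else 0
        db = int(b_rev[i]) if i < len(b_rev) else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "".join(out)


# 12345 + 987 = 13332, written with the ones digit first
print(add_reversed("54321", "789"))
```

In the usual left-to-right order, the first output digit can depend on a carry that starts at the far end of the number, so it cannot be decided this way.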


ah, okay, thanks!

so basically reverse notation has the advantage of keeping magnitude of numbers (digits!) relative to each other constant (or at least anchored to the beginning of the number)

doesn't attention help with this? (or, it does help, but not much? or it falls out of autoregressive methods?)


Attention does help, which is why the model can learn arithmetic even with arbitrary tokenization. However, if you put numbers in a standard form, such as right-to-left groups of 3 digits, you make the problem easier for the LLM to learn, because all the examples it sees are in the same format. The issue here is that BLT operates in an autoregressive manner (strictly left to right), which makes it hard to group the digits that way: when it reads the first digits, it has not yet seen how long the number is. So making each digit its own token (Llama style) or flipping the digits might be the best option.
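
A small Python sketch of the grouping point, with hypothetical helper names and a toy chunking rule rather than any real tokenizer. Grouping from the right keeps the last chunk aligned to the ones/tens/hundreds places, but it needs the full length of the number up front, which a strictly left-to-right process does not have:

```python
def chunk_left_to_right(digits, size=3):
    """Greedy grouping from the left; chunk boundaries ignore place value."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def chunk_right_to_left(digits, size=3):
    """Grouping aligned to the ones place; requires knowing len(digits) first."""
    first = len(digits) % size  # length of the short leading chunk, if any
    chunks = [digits[:first]] if first else []
    chunks += [digits[i:i + size] for i in range(first, len(digits), size)]
    return chunks


for n in ("12345", "1234567"):
    print(n, chunk_left_to_right(n), chunk_right_to_left(n))
```

Both inputs start with the chunk "123" under left-to-right grouping, even though that chunk means something different in each number. Under right-to-left grouping the final chunk always holds the lowest three places.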



