Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Tokens are mapped to keys, values and queries.

Keys and values for past tokens are cached in modern systems, but the essence of the Transformer architecture is that each token can attend to every past token, so more tokens in a system prompt still consumes resources.



That makes sense, thanks!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: