
The KV cache is typically stored in a data structure external to the trained weights, often a buffer or set of tensors kept alongside the model's forward pass (in PyTorch, for example, one might store it in a tuple or dictionary-like container holding a key tensor and a value tensor per layer). It's not baked into the neural network parameters themselves. Instead, it's auxiliary memory that holds the key and value projections already computed for earlier tokens, so on each new inference step the model only has to compute keys and values for the newest token rather than recomputing them for the whole prefix.
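To make that concrete, here is a minimal sketch of the idea in plain Python (lists instead of tensors, one attention head, one layer). Everything in it is an assumption made for illustration: the `KVCache` class, the helper names, and the toy weight values are hypothetical and not taken from any particular library. It only shows that the cache lives outside the weights and that decoding with it gives the same result as recomputing every key and value from scratch.

```python
import math


class KVCache:
    """Hypothetical auxiliary buffer: lives outside the weights and grows by one entry per token."""

    def __init__(self):
        self.keys = []    # one key vector per token seen so far
        self.values = []  # one value vector per token seen so far

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Toy stand-ins for trained parameters (made-up values). They never change during decoding.
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [-0.2, 0.3]]
W_V = [[0.2, -0.4], [0.7, 0.1]]


def decode_step(x, cache):
    """Handle one new token: project only this token, store its k/v, attend over the cache."""
    q = matvec(W_Q, x)
    cache.append(matvec(W_K, x), matvec(W_V, x))
    return attend(q, cache.keys, cache.values)


def full_recompute(tokens):
    """Baseline without a cache: recompute k/v for every token, return the last token's output."""
    keys = [matvec(W_K, t) for t in tokens]
    values = [matvec(W_V, t) for t in tokens]
    q = matvec(W_Q, tokens[-1])
    return attend(q, keys, values)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy token embeddings

cache = KVCache()
for t in tokens:
    out_cached = decode_step(t, cache)  # one K/V projection per step

out_full = full_recompute(tokens)       # K/V projections for the whole prefix
```

After the loop, `cache` holds three key/value pairs while `W_Q`, `W_K`, and `W_V` are untouched, and `out_cached` matches `out_full`. In a real model the same pattern is repeated per layer and per head, with tensors in place of lists.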

