Keys and values for past tokens are cached in modern systems, but the essence of the Transformer architecture is that each token can attend to every past token, so more tokens in a system prompt still consumes resources.
Keys and values for past tokens are cached in modern systems, but the essence of the Transformer architecture is that each token can attend to every past token, so more tokens in a system prompt still consumes resources.