KV-Cache Reuse in Large Language Models

Key: A label for each previous token that describes what it contains.
Value: The actual information or context stored for that token.
Query: The question being asked about the preceding text. It is computed fresh for each new token and is not cached.
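
To make the three roles concrete, here is a minimal sketch of single-query scaled dot-product attention in plain Python. The function name `attend` and the toy vectors are illustrative assumptions, not taken from any particular library; real models use batched tensor operations and multiple heads.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query over cached keys and values."""
    d = len(query)
    # Score every cached key against the fresh (uncached) query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax over the scores, shifted by the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is the weighted sum of the cached values.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Toy example: two cached tokens with 2-dimensional keys and values (made-up numbers).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
query = [5.0, 0.0]  # aligned with the first key

output, weights = attend(query, keys, values)
```

Because the query lines up with the first key, most of the attention weight lands on the first cached value. Only `query` changes from step to step; `keys` and `values` are read back from the cache.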

May 20, 2026

An exploration of sparse attention and KV-Cache optimization for language models.

\begin{array}{|c|c|c|}
\hline
\textbf{Order} & \textbf{Tokens} & \textbf{KV Cache} \\
\hline
1 & \text{My} & \boxed{\text{My}} \\
\hline
2 & \text{My name} & \boxed{\text{My}}\;\boxed{\text{name}} \\
\hline
3 & \text{My name is} & \boxed{\text{My}}\;\boxed{\text{name}}\;\boxed{\text{is}} \\
\hline
4 & \text{My name is Bob} & \boxed{\text{My}}\;\boxed{\text{name}}\;\boxed{\text{is}}\;\boxed{\text{Bob}} \\
\hline
\end{array}
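
The growth pattern in the table can be sketched as a simple append-only structure. This is a toy illustration under stated assumptions: the `KVCache` class and the `toy_project` helper are hypothetical names, and `toy_project` merely stands in for a model's learned key and value projections.

```python
class KVCache:
    """Toy cache holding one (key, value) pair per token seen so far."""

    def __init__(self):
        self.tokens = []
        self.keys = []
        self.values = []

    def append(self, token, key, value):
        self.tokens.append(token)
        self.keys.append(key)
        self.values.append(value)


def toy_project(token):
    # Hypothetical stand-in for learned projections: derives small
    # vectors from the token text so the example stays self-contained.
    key = [float(len(token)), 1.0]
    value = [float(ord(token[0])), 0.0]
    return key, value


cache = KVCache()
history = []
for step, token in enumerate(["My", "name", "is", "Bob"], start=1):
    # Keys and values are computed for the new token only;
    # entries for earlier tokens are reused from the cache.
    key, value = toy_project(token)
    cache.append(token, key, value)
    history.append((step, list(cache.tokens)))
    print(step, cache.tokens)
```

Each step adds exactly one entry, so the cache after step 4 holds entries for all four tokens while only one projection was computed per step.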