New approaches apply statistical inference, rate-distortion theory, and learned eviction policies to reduce the memory cost of long-context LLM inference.
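To make the eviction idea concrete, the sketch below compresses a toy key-value cache by keeping only the entries that received the most attention mass. This is a minimal illustrative heuristic, not any specific published method; the function name `evict_kv_cache` and the score-sum scoring rule are assumptions for the example.

```python
import numpy as np

def evict_kv_cache(keys, values, attn_scores, budget):
    """Keep the `budget` cache entries with the highest cumulative
    attention mass and evict the rest (a simple score-based stand-in
    for learned eviction policies)."""
    # Total attention each cached token received across query steps.
    importance = attn_scores.sum(axis=0)              # shape: (seq_len,)
    # Indices of the top-`budget` tokens, restored to original order.
    keep = np.sort(np.argsort(importance)[-budget:])
    return keys[keep], values[keep], keep

# Toy example: 6 cached tokens, head dimension 4, 3 query steps.
rng = np.random.default_rng(0)
K = rng.normal(size=(6, 4))
V = rng.normal(size=(6, 4))
A = rng.random(size=(3, 6))

K2, V2, kept = evict_kv_cache(K, V, A, budget=4)
print(K2.shape)  # compressed cache has shape (4, 4)
```

A learned variant would replace the attention-sum score with a small model predicting each entry's future usefulness; the surrounding mechanics stay the same.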