Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
Overview
Overall Novelty Assessment
The paper introduces Log-Augmented Generation (LAG), a framework that directly reuses prior computation and reasoning from past task logs at test time by representing logs as key-value caches. Within the taxonomy, LAG occupies the 'Direct Computation Reuse at Test Time' leaf, which currently contains only this single paper. This positioning indicates a relatively sparse research direction focused specifically on runtime retrieval of prior reasoning without training updates, distinguishing it from the more populated branches addressing memory-based agent systems or training-based experience integration.
The taxonomy reveals that LAG's nearest conceptual neighbors reside in 'Memory-Based Experience Reuse for Agent Systems,' which includes procedural memory frameworks, trajectory-level storage, and vector-based memory systems across multiple papers. However, those approaches typically involve abstraction, distillation, or structured knowledge graphs rather than verbatim reuse of computation. Another related branch, 'Training-Based Experience Integration,' encompasses reasoning trace distillation and reinforcement learning with experience replay, but these methods incorporate experiences during model training rather than at inference time. LAG's approach diverges by avoiding both abstraction and training overhead.
Among the three identified contributions, the analysis examined ten candidate papers each for the core LAG framework and the KV cache representation and found zero refutable instances, suggesting limited direct prior work on this specific formulation. For the positional embedding adjustment mechanism, however, five of the ten examined candidates were refutable, indicating more substantial overlap with existing KV caching techniques. This pattern suggests that while the overall framework appears relatively novel within the limited search scope of thirty candidates, certain technical components build on established methods in transformer optimization and context extension.
Based on the top-thirty semantic matches examined, LAG appears to occupy a distinct niche combining test-time retrieval with verbatim reasoning reuse. The analysis does not cover exhaustive literature on KV caching efficiency techniques or broader memory-augmented generation methods, which may contain additional relevant prior work. The framework's novelty primarily stems from its integration of log-based retrieval with direct computation reuse, rather than from individual technical components.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose a framework that enables large language models to reuse prior reasoning and computation from past task executions at inference time. Unlike reflection-based methods that extract and distill insights from logs, LAG reuses past reasoning verbatim, improving both accuracy and efficiency on new tasks.
The authors introduce a method to represent logs as key-value (KV) caches that capture the full reasoning context: the model encodes all of its responses, but KV values are stored only for selected tokens (e.g., those of the last model response). This leverages the fact that a token's KV representation is computed from hidden states that have already attended to the entire preceding context, enabling compact yet semantically rich log storage.
The authors develop a technique to handle the positional dependency of KV values when reusing them in new contexts. By stripping the original RoPE positional rotations and reapplying new ones based on updated position IDs, the method integrates retrieved KV caches cleanly into the current generation context.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
Contribution Analysis
Detailed comparisons for each claimed contribution
Log-Augmented Generation (LAG) framework
The authors propose a framework that enables large language models to reuse prior reasoning and computation from past task executions at inference time. Unlike reflection-based methods that extract and distill insights from logs, LAG reuses past reasoning verbatim, improving both accuracy and efficiency on new tasks.
[61] ServerlessLLM: Low-latency serverless inference for large language models
[62] InfiniGen: Efficient generative inference of large language models with dynamic KV cache management
[63] Cumulative Reasoning with Large Language Models
[64] Recycled attention: Efficient inference for long-context language models
[65] ArcMemo: Abstract Reasoning Composition with Lifelong LLM Memory
[66] SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
[67] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
[68] PLD+: Accelerating LLM inference by leveraging language model artifacts
[69] KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
[70] Reusing, recycling and reducing large models for developing green and responsible language technology
KV cache representation for logs
The authors introduce a method to represent logs as key-value (KV) caches that capture the full reasoning context: the model encodes all of its responses, but KV values are stored only for selected tokens (e.g., those of the last model response). This leverages the fact that a token's KV representation is computed from hidden states that have already attended to the entire preceding context, enabling compact yet semantically rich log storage.
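As a rough illustration of why KV entries stored only for the final response can still carry the earlier context, here is a toy single-head attention sketch in pure Python. The scalar "embeddings," the function names, and the one-layer setup are invented for illustration; this is not the paper's implementation.

```python
import math

def softmax_mix(query, keys, values):
    """Attention for one query over scalar keys/values (toy, 1-D embeddings)."""
    scores = [query * k for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(w / z * v for w, v in zip(weights, values))

def encode(tokens):
    """One toy self-attention layer: each position's state mixes its full prefix."""
    return [softmax_mix(t, tokens[: i + 1], tokens[: i + 1])
            for i, t in enumerate(tokens)]

full_log = [0.1, 0.9, 0.4, 0.7, 0.2]   # toy embeddings for an entire task log
last_segment = full_log[-2:]           # tokens of the final model response

# Encode the ENTIRE log, then retain states (the source of KV) only for the
# last segment -- mirroring "encode everything, store a subset":
kept = encode(full_log)[-2:]

# Encoding the last segment in isolation loses the earlier context:
isolated = encode(last_segment)

assert kept != isolated  # retained states carry information from the full log
```

The retained states differ from those produced by encoding the last segment alone, which is exactly the property the compact log representation relies on.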
[51] Think clearly: Improving reasoning via redundant token pruning
[52] VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
[53] FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
[54] Cache what lasts: Token retention for memory-bounded KV cache in LLMs
[55] ThinK: Thinner key cache by query-driven pruning
[56] Keyformer: KV Cache Reduction through Key Tokens Selection for Efficient Generative Inference
[57] KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
[58] Segcache: A memory-efficient and scalable in-memory key-value cache for small objects
[59] Trellis: Learning to Compress Key-Value Memory in Attention Models
[60] Cognitive memory in large language models
Positional embedding adjustment for KV reuse
The authors develop a technique to handle the positional dependency of KV values when reusing them in new contexts. By stripping the original RoPE positional rotations and reapplying new ones based on updated position IDs, the method integrates retrieved KV caches cleanly into the current generation context.
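To make the rotation bookkeeping concrete, the following is a minimal 2-D sketch of the kind of RoPE adjustment described above. The frequency, positions, and function names are illustrative assumptions, not the paper's code; real RoPE applies such rotations per frequency band across the head dimension.

```python
import math

def rope_rotate(x, y, pos, freq=0.1):
    """Rotate one 2-D key sub-vector by the RoPE angle for position `pos`."""
    a = pos * freq
    c, s = math.cos(a), math.sin(a)
    return (c * x - s * y, s * x + c * y)

# A cached key was rotated for its original position in the old log...
old_pos, new_pos = 3, 12
raw_key = (0.5, -0.8)
cached = rope_rotate(*raw_key, old_pos)

# ...to reuse it at a new position: undo the old rotation, apply the new one.
unrotated = rope_rotate(*cached, -old_pos)
readjusted = rope_rotate(*unrotated, new_pos)

# Because rotations compose, this equals one relative rotation by the offset:
direct = rope_rotate(*cached, new_pos - old_pos)
assert all(abs(a - b) < 1e-9 for a, b in zip(readjusted, direct))
```

The final assertion shows why the adjustment is cheap: remove-then-reapply collapses to a single rotation by the position offset, so cached keys never need to be re-encoded from tokens.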