Abstract:

While humans naturally learn and adapt from past experiences, large language models (LLMs) and their agentic counterparts often fail to retain reasoning from previous tasks and apply it in future contexts. We introduce Log-Augmented Generation (LAG), a novel framework that directly reuses prior computation and reasoning from past logs at test time, enabling models to learn from previous tasks and perform better on new, unseen challenges, without sacrificing the system's efficiency or scalability. Our approach represents task logs as key-value (KV) caches that encode the full reasoning context of prior tasks, while storing KV values for only a selected subset of tokens. When a new task arises, LAG retrieves KV values from relevant logs to augment generation. Unlike reflection-based memory mechanisms, which require additional extraction or distillation steps, LAG reuses prior reasoning verbatim. Moreover, it extends beyond existing KV caching techniques, which have primarily targeted efficiency, by explicitly improving accuracy through log reuse. Experiments on knowledge- and reasoning-intensive datasets demonstrate that our method significantly outperforms standard agentic systems that do not utilize logs, as well as existing solutions based on reflection and KV cache techniques.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Log-Augmented Generation (LAG), a framework that directly reuses prior computation and reasoning from past task logs at test time by representing logs as key-value caches. Within the taxonomy, LAG occupies the 'Direct Computation Reuse at Test Time' leaf, which currently contains only this single paper. This positioning indicates a relatively sparse research direction focused specifically on runtime retrieval of prior reasoning without training updates, distinguishing it from the more populated branches addressing memory-based agent systems or training-based experience integration.

The taxonomy reveals that LAG's nearest conceptual neighbors reside in 'Memory-Based Experience Reuse for Agent Systems,' which includes procedural memory frameworks, trajectory-level storage, and vector-based memory systems across multiple papers. However, those approaches typically involve abstraction, distillation, or structured knowledge graphs rather than verbatim reuse of computation. Another related branch, 'Training-Based Experience Integration,' encompasses reasoning trace distillation and reinforcement learning with experience replay, but these methods incorporate experiences during model training rather than at inference time. LAG's approach diverges by avoiding both abstraction and training overhead.

Among the three identified contributions, the core LAG framework and the KV cache representation were each compared against ten candidate papers, with zero refutable instances, suggesting limited direct prior work on this specific formulation. The positional embedding adjustment mechanism, however, encountered five refutable candidates among its ten, indicating more substantial overlap with existing KV caching techniques. This pattern suggests that while the overall framework appears relatively novel within the limited search scope of thirty candidates, certain technical components build on established methods in transformer optimization and context extension.

Based on the top-thirty semantic matches examined, LAG appears to occupy a distinct niche combining test-time retrieval with verbatim reasoning reuse. The analysis does not cover exhaustive literature on KV caching efficiency techniques or broader memory-augmented generation methods, which may contain additional relevant prior work. The framework's novelty primarily stems from its integration of log-based retrieval with direct computation reuse, rather than from individual technical components.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 5

Research Landscape Overview

Core task: reusing prior reasoning and computation from past task logs. The field encompasses diverse strategies for leveraging historical execution traces, learned experiences, and intermediate computations to improve efficiency and performance in agent systems and reasoning tasks.

The taxonomy reveals several major branches:

- Memory-Based Experience Reuse for Agent Systems: storing and retrieving episodic or procedural knowledge from past interactions, often using replay mechanisms similar to those in reinforcement learning (e.g., Hindsight Experience Replay[33], Contextual Experience Replay[29]).
- Training-Based Experience Integration: incorporating logged data into model training pipelines.
- Reasoning Strategy Optimization and Adaptation: refining problem-solving approaches by analyzing prior successes and failures.
- Execution Trace Analysis and Reuse: directly parsing and repurposing computational traces.
- Embodied and Interactive Task Learning: situated agents that learn from physical or simulated environments.
- Specialized Learning and Reasoning Applications: domain-specific scenarios.
- Direct Computation Reuse at Test Time: runtime retrieval and application of previously computed solutions without additional training.

A particularly active contrast emerges between training-time integration methods, which distill experience into model weights (e.g., ExGRPO[4], ARPO[20]), and test-time retrieval approaches that dynamically access stored computations on demand. Log-Augmented Generation[0] sits squarely within the Direct Computation Reuse at Test Time branch, emphasizing runtime lookup of relevant prior reasoning traces to guide current inference without retraining. This contrasts with memory-based agent systems such as Agent KB[16] and Legomem[11], which maintain evolving knowledge bases updated through interaction, and with training-focused methods such as Reasoningbank[5] that curate datasets for offline learning. The central trade-off balances the flexibility and immediacy of test-time reuse against the generalization and compactness achievable through training-based distillation; Log-Augmented Generation[0] prioritizes rapid adaptation and interpretability by directly referencing historical logs.

Claimed Contributions

Log-Augmented Generation (LAG) framework

The authors propose a framework that enables large language models to reuse prior reasoning and computation from past task executions at inference time. Unlike reflection-based methods that extract and distill logs, LAG directly reuses past reasoning verbatim to improve both accuracy and efficiency on new tasks.
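The retrieval side of such a framework can be sketched in a few lines. A minimal, hedged illustration, assuming each log is indexed by a task embedding and paired with the cached material to be reused at generation time (the `LogStore` class and its method names are illustrative, not the authors' implementation):

```python
import numpy as np

class LogStore:
    """Minimal sketch of a LAG-style log store: each entry pairs a task
    embedding (used for retrieval) with cached material from a past task."""
    def __init__(self):
        self.embeddings, self.kv_entries = [], []

    def add(self, task_embedding, kv_entry):
        # Normalize once so retrieval reduces to a dot product.
        self.embeddings.append(task_embedding / np.linalg.norm(task_embedding))
        self.kv_entries.append(kv_entry)

    def retrieve(self, query_embedding, top_k=1):
        # Cosine similarity between the new task and every stored log.
        q = query_embedding / np.linalg.norm(query_embedding)
        sims = np.stack(self.embeddings) @ q
        top = np.argsort(sims)[::-1][:top_k]
        return [self.kv_entries[i] for i in top]

# Usage: two stored logs; a query close to task A retrieves A's entry.
store = LogStore()
store.add(np.array([1.0, 0.0]), "kv_entry_for_task_A")
store.add(np.array([0.0, 1.0]), "kv_entry_for_task_B")
best = store.retrieve(np.array([0.9, 0.1]))
```

In the actual framework the retrieved entries would be KV caches prepended to the current context rather than opaque strings; the sketch only shows the test-time lookup step.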

10 retrieved papers
KV cache representation for logs

The authors introduce a method to represent logs using key-value caches that capture the full reasoning context by encoding all model responses while storing KV values only for a selected subset of tokens (e.g., the last model response). This approach leverages the fact that, under causal attention, a token's KV representation is computed from hidden states that have attended to the entire preceding context, enabling compact yet semantically rich log storage.
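The storage rule can be illustrated with a toy cache. A hedged sketch, assuming a per-layer cache of shape `(num_tokens, head_dim)` and that the final response occupies the last span of token positions (the shapes and the `build_log_entry` helper are illustrative, not taken from the paper):

```python
import numpy as np

def build_log_entry(keys, values, keep):
    """Keep only the KV rows for a selected token span (e.g. the final
    model response). Under causal attention these rows were computed
    while attending to every earlier log token, so the retained slice
    still reflects the full reasoning context."""
    return {"keys": keys[keep].copy(), "values": values[keep].copy()}

# Toy per-layer cache: 120 log tokens, head dimension 64.
rng = np.random.default_rng(0)
keys = rng.standard_normal((120, 64))
values = rng.standard_normal((120, 64))

# Suppose the final response spans the last 20 tokens: store only those.
entry = build_log_entry(keys, values, slice(100, 120))
```

The stored entry is 6x smaller than the full cache in this toy setting, which is the efficiency/expressiveness trade-off the selective-storage design targets.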

10 retrieved papers
Positional embedding adjustment for KV reuse

The authors develop a technique to handle the positional dependency of KV values when reusing them in new contexts. By removing original RoPE positional embeddings and reapplying new ones based on updated positional IDs, the method enables effective integration of retrieved KV caches into current generation contexts.
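Because RoPE applies a position-dependent rotation to each key/query dimension pair, removing the old rotation and applying a new one composes into a single rotation by the position delta. A minimal numeric sketch under standard RoPE conventions (function names are illustrative; real implementations operate per attention head, and value vectors carry no rotation):

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply the RoPE rotation for position `pos` to a vector of even
    dimension d: each pair (2i, 2i+1) is rotated by pos * base**(-2i/d)."""
    d = x.shape[-1]
    freqs = base ** (-np.arange(0, d, 2) / d)
    cos, sin = np.cos(pos * freqs), np.sin(pos * freqs)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def reposition_key(k_cached, old_pos, new_pos):
    """Strip the rotation applied at old_pos and reapply at new_pos.
    RoPE rotations compose additively, so this is a single rotation
    by the delta (new_pos - old_pos)."""
    return rope_rotate(k_cached, new_pos - old_pos)

# Repositioning a cached key from position 7 to position 2 matches
# encoding the raw (unrotated) key at position 2 directly.
rng = np.random.default_rng(1)
k_raw = rng.standard_normal(8)
repositioned = reposition_key(rope_rotate(k_raw, 7), 7, 2)
```

This additive-composition property is what makes retrieved KV caches relocatable: the stored keys can be shifted to wherever the retrieved log lands in the new context without re-running the model.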

10 retrieved papers (5 can refute)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, a partial signal of novelty that remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Log-Augmented Generation (LAG) framework

Ten candidate papers were examined for this contribution; none was judged refutable. Within the retrieved set, no prior work combines test-time retrieval of task logs with verbatim reuse of prior reasoning.

Contribution

KV cache representation for logs

Ten candidate papers were examined; none was judged refutable, suggesting limited direct prior work on representing logs as selectively stored KV caches.

Contribution

Positional embedding adjustment for KV reuse

Ten candidate papers were examined; five were judged refutable, indicating substantial overlap with established KV caching and context-extension techniques.