The Markovian Thinker
Overview
Overall Novelty Assessment
The paper introduces Delethink, a thinking algorithm implementing the Markovian Thinking Paradigm for reasoning LLMs. It decomposes reasoning into sequential chunks where each chunk references only a fixed number of prior tokens as Markovian state, deleting the rest to achieve linear compute and constant memory. Within the taxonomy, this work resides in the 'Markovian and Chunked Reasoning Paradigms' leaf, which contains only two papers total. This represents a sparse, emerging research direction focused specifically on structured reasoning under strict memory constraints, contrasting with the more populated branches addressing attention mechanisms or state-space models.
The taxonomy reveals that neighboring approaches pursue efficiency through different mechanisms. The 'State-Space and Recurrent Architectures' branch (containing models like Mamba and xLSTM across five subcategories) achieves linear complexity through sequential state updates but maintains implicit memory in parameters. The 'Memory-Augmented Architectures' branch (five subcategories including Infinite Memory Transformer and Memformer) uses external storage to extend context dynamically. Delethink diverges by imposing explicit Markovian constraints on reasoning traces rather than approximating full attention or managing external memory, positioning it as a fundamentally different paradigm for bounded-state reasoning.
Among the thirty candidates examined, the contribution-level analysis shows varied novelty. For both the core Markovian Thinking Paradigm and zero-shot Delethink inference, the analysis examined ten candidates each and found zero refutations, suggesting these contributions occupy relatively unexplored territory within the limited search scope. For Delethink training with reinforcement learning, however, it examined ten candidates and found one refutable overlap, indicating more substantial prior work on efficient RL training methods. This pattern suggests the paradigm itself is novel, while its application to RL training connects to existing efficiency techniques in that domain.
Based on the limited search of thirty semantically similar papers, Delethink appears to introduce a distinctive approach to reasoning efficiency. The sparse population of its taxonomy leaf and low refutation rates for core contributions suggest novelty, though the analysis cannot claim exhaustiveness. The single refutation in RL training highlights that while the Markovian paradigm is fresh, its integration with established training methods naturally encounters existing work in that intersection.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose a new reasoning paradigm where models think in fixed-size chunks, retaining only a minimal Markovian state from prior reasoning. This enables linear compute scaling and constant memory usage during both training and inference, in contrast to the quadratic costs of standard long chain-of-thought approaches.
The authors present an inference method that can be applied directly to existing reasoning models without additional training or prompting, allowing them to function as Markovian thinkers by reasoning in chunks while maintaining fixed context size.
The authors develop a reinforcement learning training procedure that explicitly trains models to reason in the Markovian manner. This approach achieves comparable performance to standard long chain-of-thought training while requiring significantly less compute and memory resources.
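The chunked generation described in the first two contributions can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: `generate_chunk` is a hypothetical stand-in for a single bounded-context model call, and `chunk_size`, `carryover`, and the `<eos>` sentinel are illustrative names and values.

```python
def markovian_generate(generate_chunk, prompt, chunk_size=8, carryover=4, max_chunks=10):
    """Generate a reasoning trace in fixed-size chunks, keeping only the
    last `carryover` tokens of each chunk as the Markovian state.

    `generate_chunk(context, n)` is a hypothetical bounded-context model
    call that returns up to `n` new tokens given `context`.
    """
    state = list(prompt)  # context for the next call: prompt + carried-over tail
    trace = []
    for _ in range(max_chunks):
        chunk = generate_chunk(state, chunk_size)
        trace.extend(chunk)
        if chunk and chunk[-1] == "<eos>":
            break
        # Delete everything except a short tail of the chunk; the context
        # passed to the model therefore stays constant in size.
        state = list(prompt) + chunk[-carryover:]
    return trace
```

The key property is that the model's context never grows with the length of the trace: each call sees only the prompt plus at most `carryover` tokens, which is what yields constant memory and linear compute as the trace lengthens.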
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[12] Working-Memory-Correct Long-Horizon Expert-Retrieval TTT Dialogue
Contribution Analysis
Detailed comparisons for each claimed contribution
Markovian Thinking Paradigm and Delethink Algorithm
The authors propose a new reasoning paradigm where models think in fixed-size chunks, retaining only a minimal Markovian state from prior reasoning. This enables linear compute scaling and constant memory usage during both training and inference, in contrast to the quadratic costs of standard long chain-of-thought approaches.
[51] Titans: Learning to memorize at test time
[52] Scaling Reasoning without Attention
[53] JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation
[54] Random-access infinite context length for transformers
[55] Lserve: Efficient long-sequence llm serving with unified sparse attention
[56] Artificial hippocampus networks for efficient long-context modeling
[57] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
[58] Lococo: Dropping in convolutions for long context compression
[59] Trellis: Learning to Compress Key-Value Memory in Attention Models
[60] MEDCnet: A Memory Efficient Approach for Processing High-Resolution Fundus Images for Diabetic Retinopathy Classification Using CNN
Delethink Inference for Zero-Shot Markovian Thinking
The authors present an inference method that can be applied directly to existing reasoning models without additional training or prompting, allowing them to function as Markovian thinkers by reasoning in chunks while maintaining fixed context size.
[71] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
[72] Art: Automatic multi-step reasoning and tool-use for large language models
[73] A comprehensive survey of prompt engineering techniques in large language models
[74] Lisa: Reasoning segmentation via large language model
[75] Recursive decomposition of logical thoughts: Framework for superior reasoning and knowledge propagation in large language models
[76] PAL: Program-aided Language Models
[77] A systematic survey of prompt engineering in large language models: Techniques and applications
[78] SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
[79] Reasoning with large language models, a survey
[80] Decomposed Prompting: A Modular Approach for Solving Complex Tasks
Delethink Training for Efficient Reinforcement Learning
The authors develop a reinforcement learning training procedure that explicitly trains models to reason in the Markovian manner. This approach achieves comparable performance to standard long chain-of-thought training while requiring significantly less compute and memory resources.
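A back-of-envelope count of pairwise attention interactions illustrates why the Markovian constraint reduces compute from quadratic to linear. This is a rough sketch under stated assumptions, not the paper's accounting: the `chunk` and `carryover` sizes below are illustrative, and the model counts only token-pair attention interactions, ignoring all other costs.

```python
def full_attention_cost(n):
    """Standard long chain-of-thought: token i attends to all i prior
    tokens, so total interactions grow as ~ n^2 / 2."""
    return sum(i for i in range(n))

def chunked_attention_cost(n, chunk=512, carryover=128):
    """Markovian chunked reasoning: each new token attends only to the
    carried-over state plus earlier tokens of its own chunk, so the total
    grows linearly in n."""
    cost = 0
    produced = 0
    while produced < n:
        size = min(chunk, n - produced)
        cost += sum(carryover + i for i in range(size))
        produced += size
    return cost
```

Doubling the trace length doubles the chunked cost exactly (each chunk's cost is constant) but roughly quadruples the full-attention cost, which is the compute asymmetry the RL training procedure exploits.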