ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
Overview
Overall Novelty Assessment
ProofOptimizer introduces the first language model trained specifically to simplify Lean proofs without additional human supervision, using expert iteration and reinforcement learning with Lean-based verification as the training signal. The paper sits in the 'Proof Simplification and Compression' leaf under 'Proof Optimization and Repair', a leaf that contains only two of the 50 papers in the taxonomy. This sparsity suggests that proof optimization has received far less attention than proof generation or autoformalization.
The taxonomy reveals that most research effort concentrates in adjacent branches: 'Formal Proof Generation and Verification' contains multiple dense subtopics with 15 papers across six leaves, while 'Autoformalization and Translation' addresses informal-to-formal conversion with seven papers. The 'Proof Repair and Error Correction' sibling leaf focuses on fixing incorrect proofs rather than simplifying correct ones. ProofOptimizer's work diverges from these neighboring directions by assuming correct input proofs and targeting length reduction, rather than initial generation, translation, or error correction.
Among the 30 candidates examined, 10 per claimed contribution, none was identified as clearly refuting the paper's claims. For the first contribution (ProofOptimizer as the first trained simplification model), no refuting match was found among its 10 candidates; the training-methodology and iterative-inference-workflow contributions yielded similar results. Within this limited search scope, no prior work directly addresses learned proof simplification in Lean, though the small candidate pool means the search cannot be considered exhaustive.
Based on the limited literature search covering 30 semantically similar papers, ProofOptimizer appears to occupy a genuinely sparse research area. The taxonomy structure confirms that proof optimization receives minimal attention compared to proof generation. However, the analysis cannot rule out relevant work outside the top-30 semantic matches or in adjacent communities not captured by this search methodology.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present ProofOptimizer, a language model specifically trained for proof simplification in Lean using expert iteration and reinforcement learning, without needing human-annotated simplification data. The model uses Lean's verification to provide training signals and operates within an iterative proof-shortening workflow at inference time.
The authors develop a training approach that combines expert iteration (where the model proposes simplifications verified by Lean and incorporated into training data) and online reinforcement learning (using proof length and correctness as reward signals) to enable continual improvement in proof simplification.
The authors introduce an inference-time algorithm that progressively shortens a proof: at each round, the model samples multiple candidate simplifications of the current shortest verified proof, and the shortest verified candidate becomes the input to the next round, achieving substantial compression on benchmark datasets.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[7] FVEL: Interactive formal verification environment with large language models via theorem proving PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
ProofOptimizer: first language model trained to simplify Lean proofs without human supervision
The authors present ProofOptimizer, a language model specifically trained for proof simplification in Lean using expert iteration and reinforcement learning, without needing human-annotated simplification data. The model uses Lean's verification to provide training signals and operates within an iterative proof-shortening workflow at inference time.
[17] Generative language modeling for automated theorem proving PDF
[20] Baldur: Whole-proof generation and repair with large language models PDF
[24] APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning PDF
[43] Automating mathematical proof generation using large language model agents and knowledge graphs PDF
[70] Draft, sketch, and prove: Guiding formal theorem provers with informal proofs PDF
[71] Kimina-prover preview: Towards large formal reasoning models with reinforcement learning PDF
[72] Autoformalization with large language models PDF
[73] Formal theorem proving by rewarding LLMs to decompose proofs hierarchically PDF
[74] Position: Formal Mathematical Reasoning—A New Frontier in AI PDF
[75] Towards automating formalisation of theorem statements using large language models PDF
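The training signal described above, Lean acceptance combined with length reduction, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `lean_checks` assumes a configured Lean project with `lake` on the PATH, and `simplification_reward` takes the checker as an injected callable so a different verifier (or a mock) can be substituted.

```python
import subprocess
import tempfile
from pathlib import Path

def lean_checks(file_text: str, timeout: int = 60) -> bool:
    """Return True iff Lean accepts the file. Assumes `lake env lean`
    works from the current directory (i.e. a configured Lean project)."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(file_text)
        path = f.name
    try:
        proc = subprocess.run(
            ["lake", "env", "lean", path],
            capture_output=True, timeout=timeout,
        )
        return proc.returncode == 0
    finally:
        Path(path).unlink(missing_ok=True)

def simplification_reward(original: str, candidate: str, checks) -> float:
    """Reward = relative length reduction if the candidate still checks,
    else 0. `checks` is an injected verifier callable (e.g. lean_checks)."""
    if not checks(candidate):
        return 0.0
    saved = len(original) - len(candidate)
    return max(saved, 0) / max(len(original), 1)
```

Separating the verifier from the reward keeps the reward function deterministic and unit-testable, while the expensive Lean call stays behind a single boundary.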
Training methodology combining expert iteration and reinforcement learning for proof simplification
The authors develop a training approach that combines expert iteration (where the model proposes simplifications verified by Lean and incorporated into training data) and online reinforcement learning (using proof length and correctness as reward signals) to enable continual improvement in proof simplification.
[22] AI for Mathematics PDF
[51] ABEL: Sample efficient online reinforcement learning for neural theorem proving PDF
[52] Lean-STaR: Learning to interleave thinking and proving PDF
[53] InternLM2.5-StepProver: Advancing automated theorem proving via expert iteration on large-scale Lean problems PDF
[54] Formal mathematics statement curriculum learning PDF
[55] BFS-Prover: Scalable best-first tree search for LLM-based automatic theorem proving PDF
[56] Contributions to Neural Theorem Proving PDF
[57] STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving PDF
[58] GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving PDF
[59] Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction PDF
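One round of the expert-iteration scheme described above can be sketched as follows. This is a hedged sketch, not the authors' code: `sample_candidate` stands in for the policy model and `verify` for the Lean checker, both hypothetical callables injected so the loop itself is testable.

```python
def expert_iteration_round(sample_candidate, verify, proofs, k=8):
    """One expert-iteration round: sample k candidate simplifications per
    input proof, keep only verified candidates that are strictly shorter,
    and return the best per proof as new (long, short) training pairs."""
    new_pairs = []
    for proof in proofs:
        candidates = [sample_candidate(proof) for _ in range(k)]
        # Length filter is cheap, so apply it before the expensive verifier.
        kept = [c for c in candidates if len(c) < len(proof) and verify(c)]
        if kept:
            new_pairs.append((proof, min(kept, key=len)))
    return new_pairs
```

The returned pairs would then be used to fine-tune the model, closing the loop; a subsequent online RL phase could reuse the same verifier inside a reward such as the length-based one.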
Iterative proof-shortening inference workflow
The authors introduce an inference-time algorithm that progressively shortens a proof: at each round, the model samples multiple candidate simplifications of the current shortest verified proof, and the shortest verified candidate becomes the input to the next round, achieving substantial compression on benchmark datasets.
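The iterative shortening loop can be sketched as follows. Again a minimal sketch under assumptions, not the paper's implementation: `propose` (a model sample) and `verify` (a Lean check) are hypothetical injected callables, and the loop stops early once a round produces no verified improvement.

```python
def iterative_shorten(proof, propose, verify, rounds=4, samples=16):
    """Repeatedly sample candidate simplifications of the current shortest
    verified proof, keeping the shortest verified candidate each round."""
    best = proof
    for _ in range(rounds):
        candidates = [propose(best) for _ in range(samples)]
        shorter = [c for c in candidates if len(c) < len(best) and verify(c)]
        if not shorter:
            break  # no verified improvement this round: treat as converged
        best = min(shorter, key=len)
    return best
```

Because the input proof is assumed correct, the algorithm can only improve or return its input: an unverified or longer candidate is never accepted.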