Achieving Expert-Level Agent from Foundation Model via Complexity Curriculum Reinforcement Learning with Synthetic Data

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Large Language Model, Reinforcement Learning, Geometry Agent Prover
Abstract:

Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of formal proof systems. However, due to weak heuristics for auxiliary constructions, AI for geometry problem solving remains dominated by expert models such as AlphaGeometry 2, which rely heavily on large-scale data synthesis and search for both training and evaluation. In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry. InternGeometry overcomes the heuristic limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on the engine’s feedback to guide subsequent proposals. A dynamic memory mechanism enables InternGeometry to conduct more than two hundred interactions with the symbolic engine per problem. To further accelerate learning, we introduce Complexity-Boosting Reinforcement Learning (CBRL), which gradually increases the complexity of synthesized problems across training stages. Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems (2000–2024), exceeding the average gold medalist score (40.9), using only 13K training examples, just 0.004% of the data used by AlphaGeometry 2, demonstrating the potential of LLM agents on expert-level geometry tasks. InternGeometry can also propose novel auxiliary constructions for IMO problems that do not appear in human solutions. We will release the model, data, and symbolic engine to support future research.
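The propose-verify-reflect loop the abstract describes can be sketched as follows. This is a minimal illustration with a toy stand-in for the symbolic engine; every class, method, and data structure here (ToyEngine, solve, the memory dict) is an illustrative assumption for exposition, not InternGeometry's actual interface.

```python
# Hypothetical sketch of a propose-verify-reflect agent loop.
# ToyEngine stands in for the symbolic engine: it "verifies" a proposal
# if the proposal is in a known fact base, and declares the goal proved
# once the goal appears among the verified facts.

class ToyEngine:
    def __init__(self, facts, goal):
        self.facts, self.goal = set(facts), goal

    def verify(self, proposal):
        ok = proposal in self.facts
        return ok, "ok" if ok else f"cannot derive {proposal}"

    def goal_proved(self, verified):
        return self.goal in verified

def solve(candidates, engine, max_steps=200):
    """Iterate over proposed propositions/constructions, verify each
    with the engine, and keep a memory of verified and failed attempts.
    In the real agent, the failed-attempt feedback would condition the
    LLM's next proposal (the reflection step)."""
    memory = {"verified": [], "failed": []}
    for step, proposal in enumerate(candidates):
        if step >= max_steps:  # interaction budget (200+ in the paper)
            break
        ok, feedback = engine.verify(proposal)
        if ok:
            memory["verified"].append(proposal)
            if engine.goal_proved(memory["verified"]):
                return memory["verified"]  # proof trace
        else:
            memory["failed"].append((proposal, feedback))
    return None  # unsolved within the budget
```

The dynamic memory in the paper presumably compresses this history so the model can sustain hundreds of engine interactions; the plain dict above only records it.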

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 3

Research Landscape Overview

Core task: Automated geometry theorem proving. The field encompasses a diverse set of approaches for mechanically establishing the validity of geometric statements. At the highest level, the taxonomy distinguishes between classical algebraic and symbolic methods—such as the Wu method[5] and Gröbner basis techniques[18]—that reduce geometric problems to polynomial algebra, and formal verification frameworks that integrate geometry into interactive proof assistants like Coq[6,13]. Alongside these traditional pillars, neural and learning-based approaches have emerged, leveraging reinforcement learning and neural proof search to navigate large search spaces. Additional branches address proof generation and readability[11,17], theorem discovery and generation[20,32], interactive and dynamic geometry systems[23,24], and specialized domains including solid geometry[39] and olympiad-level problems[47]. Surveys and foundational studies[14] provide historical context and methodological overviews, tying together decades of research from symbolic engines to modern machine learning.

Recent work has intensified the use of reinforcement learning for proof search, exploring how agents can learn effective strategies in complex geometric environments. Expert Agent Curriculum[0] sits squarely within this neural and learning-based branch, focusing on curriculum design and expert guidance to improve RL-driven proof discovery. It contrasts with Aristotle IMO[2], which also targets challenging competition-level geometry but may emphasize different training regimes or neural architectures. Both efforts reflect a broader trend of applying deep learning to domains once dominated by symbolic reasoning, raising questions about the trade-offs between interpretability—where classical methods like Wu's algorithm[5] produce verifiable algebraic certificates—and the flexibility of learned heuristics.

As the field matures, a key open question is how to blend symbolic guarantees with neural scalability, ensuring that automated provers remain both powerful and trustworthy across diverse geometric settings.

Claimed Contributions

InternGeometry: a medalist-level LLM agent for geometry problem solving

The authors introduce InternGeometry, an LLM-based agent that solves IMO-level geometry problems by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on feedback. A dynamic memory mechanism enables the agent to conduct over 200 interactions per problem.

10 retrieved papers
Complexity-Boosting Reinforcement Learning (CBRL)

The authors propose CBRL, a multi-stage curriculum reinforcement learning framework that progressively increases the difficulty of synthesized geometry problems during training. This approach accelerates learning by adapting problem complexity to the current model capability.
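The staged curriculum this describes can be sketched as a simple training schedule that advances to harder synthesized problems once the current stage is mastered. The stage complexities, the solve-rate threshold, and the `synthesize`/`evaluate` callables below are all illustrative assumptions, not the paper's actual CBRL training setup.

```python
import random

def cbrl_schedule(synthesize, evaluate, stages=(2, 4, 6), threshold=0.7,
                  batch=8, max_rounds=20, seed=0):
    """Complexity-boosting curriculum sketch.

    synthesize(complexity, rng) -> problem   (assumed problem generator)
    evaluate(problem) -> bool                (assumed: did the model solve it?)

    For each complexity stage, sample batches of synthesized problems and
    measure the solve rate; once it clears the threshold, boost complexity
    to the next stage. Returns the (complexity, solve_rate) history.
    """
    rng = random.Random(seed)
    history = []
    for complexity in stages:
        for _ in range(max_rounds):
            problems = [synthesize(complexity, rng) for _ in range(batch)]
            solve_rate = sum(evaluate(p) for p in problems) / batch
            history.append((complexity, solve_rate))
            if solve_rate >= threshold:
                break  # capability reached: move to harder problems
    return history
```

The key design choice the contribution claims, adapting problem complexity to current model capability, corresponds here to the threshold test that gates each stage transition.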

9 retrieved papers
Can Refute
InternGeometry-DDAR: an interactive geometric proof engine

The authors develop InternGeometry-DDAR, an enhanced interactive geometric proof engine based on the open-source DDAR system. It includes advanced definition strategies and a rich theorem library whose search space theoretically covers complete solutions for most IMO geometry problems.
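As a rough illustration of what a DDAR-style deduction engine does at its core, the following sketch computes the deductive closure of a fact set under a toy theorem library by forward chaining. The rule format (premise tuple, conclusion) and function name are assumptions for exposition; the real InternGeometry-DDAR engine's interface and theorem library are far richer.

```python
def deductive_closure(facts, rules, max_iters=100):
    """Forward-chaining sketch: repeatedly apply rules of the form
    (premises, conclusion) until no new fact can be derived. The
    returned fixed point is the deductive closure, which is the search
    space a DDAR-style engine explores when checking a proposed
    proposition."""
    known = set(facts)
    for _ in range(max_iters):
        new = {conclusion for premises, conclusion in rules
               if set(premises) <= known and conclusion not in known}
        if not new:
            return known  # fixed point reached
        known |= new
    return known
```

The contribution's claim that the theorem library "theoretically covers complete solutions for most IMO geometry problems" amounts, in this picture, to the closure of the initial facts plus auxiliary constructions containing the goal statement.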

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
InternGeometry: a medalist-level LLM agent for geometry problem solving

Contribution
Complexity-Boosting Reinforcement Learning (CBRL)

Contribution
InternGeometry-DDAR: an interactive geometric proof engine