Achieving Expert-Level Agent from Foundation Model via Complexity Curriculum Reinforcement Learning with Synthetic Data

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Large Language Model, Reinforcement Learning, Geometry Agent Prover
Abstract:

Large language model (LLM) agents exhibit strong mathematical problem-solving abilities and can even solve International Mathematical Olympiad (IMO) level problems with the assistance of formal proof systems. However, due to weak heuristics for auxiliary constructions, AI for geometry problem solving remains dominated by expert models such as AlphaGeometry 2, which rely heavily on large-scale data synthesis and search for both training and evaluation. In this work, we make the first attempt to build a medalist-level LLM agent for geometry and present InternGeometry. InternGeometry overcomes the heuristic limitations in geometry by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on the engine’s feedback to guide subsequent proposals. A dynamic memory mechanism enables InternGeometry to conduct more than two hundred interactions with the symbolic engine per problem. To further accelerate learning, we introduce Complexity-Boosting Reinforcement Learning (CBRL), which gradually increases the complexity of synthesized problems across training stages. Built on InternThinker-32B, InternGeometry solves 44 of 50 IMO geometry problems (2000–2024), exceeding the average gold medalist score (40.9), using only 13K training examples, just 0.004% of the data used by AlphaGeometry 2, demonstrating the potential of LLM agents on expert-level geometry tasks. InternGeometry can also propose novel auxiliary constructions for IMO problems that do not appear in human solutions. We will release the model, data, and symbolic engine to support future research.
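The propose-verify-reflect loop the abstract describes can be sketched as follows. This is a minimal illustration with a toy stand-in for the symbolic engine; every class, method, and data structure here (ToyEngine, solve, the memory dict) is an illustrative assumption for exposition, not InternGeometry's actual interface.

```python
# Hypothetical sketch of a propose-verify-reflect agent loop.
# ToyEngine stands in for the symbolic engine: it "verifies" a proposal
# if the proposal is in a known fact base, and declares the goal proved
# once the goal appears among the verified facts.

class ToyEngine:
    def __init__(self, facts, goal):
        self.facts, self.goal = set(facts), goal

    def verify(self, proposal):
        ok = proposal in self.facts
        return ok, "ok" if ok else f"cannot derive {proposal}"

    def goal_proved(self, verified):
        return self.goal in verified

def solve(candidates, engine, max_steps=200):
    """Iterate over proposed propositions/constructions, verify each
    with the engine, and keep a memory of verified and failed attempts.
    In the real agent, the failed-attempt feedback would condition the
    LLM's next proposal (the reflection step)."""
    memory = {"verified": [], "failed": []}
    for step, proposal in enumerate(candidates):
        if step >= max_steps:  # interaction budget (200+ in the paper)
            break
        ok, feedback = engine.verify(proposal)
        if ok:
            memory["verified"].append(proposal)
            if engine.goal_proved(memory["verified"]):
                return memory["verified"]  # proof trace
        else:
            memory["failed"].append((proposal, feedback))
    return None  # unsolved within the budget
```

The dynamic memory in the paper presumably compresses this history so the model can sustain hundreds of engine interactions; the plain dict above only records it.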

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 3

Research Landscape Overview

Core task: Automated geometry theorem proving. The field encompasses a diverse set of approaches for mechanically establishing the validity of geometric statements. At the highest level, the taxonomy distinguishes between classical algebraic and symbolic methods—such as the Wu method[5] and Gröbner basis techniques[18]—that reduce geometric problems to polynomial algebra, and formal verification frameworks that integrate geometry into interactive proof assistants like Coq[6,13]. Alongside these traditional pillars, neural and learning-based approaches have emerged, leveraging reinforcement learning and neural proof search to navigate large search spaces. Additional branches address proof generation and readability[11,17], theorem discovery and generation[20,32], interactive and dynamic geometry systems[23,24], and specialized domains including solid geometry[39] and olympiad-level problems[47]. Surveys and foundational studies[14] provide historical context and methodological overviews, tying together decades of research from symbolic engines to modern machine learning.

Recent work has intensified the use of reinforcement learning for proof search, exploring how agents can learn effective strategies in complex geometric environments. Expert Agent Curriculum[0] sits squarely within this neural and learning-based branch, focusing on curriculum design and expert guidance to improve RL-driven proof discovery. It contrasts with Aristotle IMO[2], which also targets challenging competition-level geometry but may emphasize different training regimes or neural architectures. Both efforts reflect a broader trend of applying deep learning to domains once dominated by symbolic reasoning, raising questions about the trade-offs between interpretability—where classical methods like Wu's algorithm[5] produce verifiable algebraic certificates—and the flexibility of learned heuristics.

As the field matures, a key open question is how to blend symbolic guarantees with neural scalability, ensuring that automated provers remain both powerful and trustworthy across diverse geometric settings.

Claimed Contributions

InternGeometry: a medalist-level LLM agent for geometry problem solving

The authors introduce InternGeometry, an LLM-based agent that solves IMO-level geometry problems by iteratively proposing propositions and auxiliary constructions, verifying them with a symbolic engine, and reflecting on feedback. A dynamic memory mechanism enables the agent to conduct over 200 interactions per problem.

10 retrieved papers
Complexity-Boosting Reinforcement Learning (CBRL)

The authors propose CBRL, a multi-stage curriculum reinforcement learning framework that progressively increases the difficulty of synthesized geometry problems during training. This approach accelerates learning by adapting problem complexity to the current model capability.
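The staged curriculum this describes can be sketched as a simple training schedule that advances to harder synthesized problems once the current stage is mastered. The stage complexities, the solve-rate threshold, and the `synthesize`/`evaluate` callables below are all illustrative assumptions, not the paper's actual CBRL training setup.

```python
import random

def cbrl_schedule(synthesize, evaluate, stages=(2, 4, 6), threshold=0.7,
                  batch=8, max_rounds=20, seed=0):
    """Complexity-boosting curriculum sketch.

    synthesize(complexity, rng) -> problem   (assumed problem generator)
    evaluate(problem) -> bool                (assumed: did the model solve it?)

    For each complexity stage, sample batches of synthesized problems and
    measure the solve rate; once it clears the threshold, boost complexity
    to the next stage. Returns the (complexity, solve_rate) history.
    """
    rng = random.Random(seed)
    history = []
    for complexity in stages:
        for _ in range(max_rounds):
            problems = [synthesize(complexity, rng) for _ in range(batch)]
            solve_rate = sum(evaluate(p) for p in problems) / batch
            history.append((complexity, solve_rate))
            if solve_rate >= threshold:
                break  # capability reached: move to harder problems
    return history
```

The key design choice the contribution claims, adapting problem complexity to current model capability, corresponds here to the threshold test that gates each stage transition.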

9 retrieved papers
Can Refute
InternGeometry-DDAR: an interactive geometric proof engine

The authors develop InternGeometry-DDAR, an enhanced interactive geometric proof engine based on the open-source DDAR system. It includes advanced definition strategies and a rich theorem library whose search space theoretically covers complete solutions for most IMO geometry problems.
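As a rough illustration of what a DDAR-style deduction engine does at its core, the following sketch computes the deductive closure of a fact set under a toy theorem library by forward chaining. The rule format (premise tuple, conclusion) and function name are assumptions for exposition; the real InternGeometry-DDAR engine's interface and theorem library are far richer.

```python
def deductive_closure(facts, rules, max_iters=100):
    """Forward-chaining sketch: repeatedly apply rules of the form
    (premises, conclusion) until no new fact can be derived. The
    returned fixed point is the deductive closure, which is the search
    space a DDAR-style engine explores when checking a proposed
    proposition."""
    known = set(facts)
    for _ in range(max_iters):
        new = {conclusion for premises, conclusion in rules
               if set(premises) <= known and conclusion not in known}
        if not new:
            return known  # fixed point reached
        known |= new
    return known
```

The contribution's claim that the theorem library "theoretically covers complete solutions for most IMO geometry problems" amounts, in this picture, to the closure of the initial facts plus auxiliary constructions containing the goal statement.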

10 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
InternGeometry: a medalist-level LLM agent for geometry problem solving

Contribution
Complexity-Boosting Reinforcement Learning (CBRL)

Contribution
InternGeometry-DDAR: an interactive geometric proof engine