Premise Selection for a Lean Hammer

ICLR 2026 Conference Submission • Anonymous Authors

OpenReview Score: 6.5

Keywords: premise selection, interactive theorem proving, automated reasoning, contrastive learning

Abstract:

Neural methods are transforming automated reasoning for proof assistants, yet integrating these advances into practical verification workflows remains challenging. A $\textit{hammer}$ is a tool that integrates premise selection, translation to external automatic theorem provers, and proof reconstruction into one overarching tool to automate tedious reasoning steps. We present LeanPremise, a novel neural premise selection system, and we combine it with existing translation and proof reconstruction components to create LeanHammer, the first end-to-end domain-general hammer for the Lean proof assistant. Unlike existing Lean premise selectors, LeanPremise is specifically trained for use with a hammer in dependent type theory. It also dynamically adapts to user-specific contexts, enabling it to effectively recommend premises from libraries outside LeanPremise's training data as well as lemmas defined by the user locally. With comprehensive evaluations, we show that LeanPremise enables LeanHammer to solve 21% more goals than existing premise selectors and generalizes well to diverse domains. Our work helps bridge the gap between neural retrieval and symbolic reasoning, making formal verification more accessible to researchers and practitioners.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LeanPremise, a neural premise selection system, and LeanHammer, an end-to-end hammer for Lean. It resides in the 'Lean-Based Systems' leaf under 'Retrieval-Augmented Proof Assistants', which contains four papers total. This leaf sits within a moderately populated branch of integrated systems for proof assistants, suggesting active but not overcrowded research. The work targets a specific niche: combining premise selection with translation and proof reconstruction into a unified hammer tool for Lean's dependent type theory.

The taxonomy reveals neighboring leaves for Coq-Based Systems (three papers) and Other Proof Assistant Systems (four papers), indicating parallel efforts across different proof assistants. The parent category 'Retrieval-Augmented Proof Assistants' excludes automated theorem provers without interactive assistance, clarifying that this work emphasizes integration with user workflows rather than standalone automation. Sibling papers like LeanDojo and Lean Copilot focus on data extraction and tactic generation respectively, while this work emphasizes premise retrieval tailored for hammer use, suggesting complementary rather than overlapping goals.

Among the 26 candidates examined, the analysis found one refutable pair, for the 'end-to-end hammer' contribution (six candidates examined). The neural premise selection system (ten candidates examined) and the hammer-aware data extraction techniques (ten candidates examined) showed no clear refutations. Because the search scope is limited, these statistics reflect top-K semantic matches rather than exhaustive coverage. Within this sample, the premise selection component appears more novel, whereas the hammer integration claim is challenged by at least one prior work; the scale of examination remains modest.

Given the limited literature search (26 candidates from semantic retrieval), the work appears to occupy a recognizable but not densely populated research direction. The taxonomy structure suggests this is an active area with established sibling systems, yet the specific combination of hammer-aware training and dynamic context adaptation may differentiate it. A broader search would be needed to assess whether similar hammer implementations exist outside the top-K matches examined here.

Taxonomy

Research Landscape Overview

Core task: neural premise selection for automated theorem proving. The field has evolved into a rich ecosystem organized around several complementary dimensions. At the highest level, one finds dedicated Neural Premise Selection Methods that develop specialized architectures and embeddings for ranking candidate premises, alongside Integrated Theorem Proving Systems that embed these selection modules into end-to-end proof assistants. Training Data and Benchmarks provide the empirical foundation, while Premise Selection Strategies and Optimization explore algorithmic refinements such as active learning and Bayesian tuning. Representation Learning for Logical Formulas focuses on encoding syntactic and semantic structure (ranging from graph embeddings to property-invariant representations), and Hybrid and Auxiliary Techniques combine symbolic reasoning with neural guidance. Surveys and Foundational Work (e.g., Deep Learning Theorem Proving Survey[2], Guided Automated Reasoning Survey[5]) offer broad perspectives, and Specialized Applications and Extensions address domain-specific challenges in systems like Metamath or commonsense reasoning benchmarks.

Within Integrated Theorem Proving Systems, a particularly active line centers on Retrieval-Augmented Proof Assistants for the Lean proof assistant. Works such as LeanDojo[4] and Lean Copilot[6] provide infrastructure and interactive tooling that tightly couple premise retrieval with tactic suggestion, enabling rapid prototyping of neural methods in a production environment. Premise Selection Lean Hammer[0] sits squarely in this cluster, emphasizing efficient retrieval mechanisms tailored to Lean's library structure. Compared to LeanDojo[4], which prioritizes data extraction and benchmarking, and Lean Copilot[6], which integrates language-model-driven tactic generation, Premise Selection Lean Hammer[0] focuses more narrowly on the premise-ranking component itself.

This specialization reflects a broader trade-off in the field: some systems aim for holistic proof search (blending premise selection with clause selection and strategy scheduling), while others isolate premise retrieval to achieve higher precision and interpretability within a single proof assistant.

Claimed Contributions

LEANPREMISE: Neural premise selection system for Lean hammer

10 retrieved papers

The authors develop LEANPREMISE, a neural premise selection tool specifically designed for use with a hammer in dependent type theory. Unlike existing Lean premise selectors, it is trained for hammer integration and dynamically adapts to user-specific contexts, enabling effective recommendation of premises from libraries outside its training data as well as user-defined local lemmas.
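To make the retrieval idea concrete, here is a minimal sketch of similarity-based premise ranking, under stated assumptions. The `embed` function is a toy bag-of-tokens stand-in for a learned encoder, and `PremiseIndex`, `top_k`, and `my_local_lemma` are hypothetical names introduced for illustration; the premise statements are simplified strings, and none of this is LeanPremise's actual implementation. What it illustrates is that a lemma added to the index at query time (for instance, one the user has just defined) can be ranked alongside library premises without retraining.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned encoder: a bag-of-tokens count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class PremiseIndex:
    """Hypothetical index mapping premise names to embeddings."""

    def __init__(self):
        self.entries = {}

    def add(self, name, statement):
        # New premises can be added at any time, e.g. user-local lemmas.
        self.entries[name] = embed(statement)

    def top_k(self, goal, k):
        # Rank by descending similarity; break ties by name for determinism.
        query = embed(goal)
        ranked = sorted(
            self.entries,
            key=lambda name: (-cosine(query, self.entries[name]), name),
        )
        return ranked[:k]


index = PremiseIndex()
index.add("Nat.add_comm", "a + b = b + a")
index.add("Nat.mul_comm", "a * b = b * a")
index.add("List.length_append", "length ( xs ++ ys ) = length xs + length ys")
# A lemma defined locally by the user, added after the library was indexed.
index.add("my_local_lemma", "foo a + foo b = foo b + foo a")

result = index.top_k("foo x + foo y = foo y + foo x", 2)
print(result)
```

In a real system the encoder would be a trained neural model and the index would hold dense vectors, but the add-then-rank flow would be similar in spirit.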

LEANHAMMER: First end-to-end domain general hammer for Lean

Can Refute

6 retrieved papers

The authors combine LEANPREMISE with Lean-auto (translation tool), Duper (proof-producing tactic), and Aesop (proof search tool) to create LEANHAMMER, which is the first domain-general hammer for the Lean proof assistant. This unified pipeline integrates premise selection, translation to external automatic theorem provers, and proof reconstruction.
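The three-stage pipeline described above can be sketched as a simple control flow, assuming each stage is supplied as a callable. The names `run_hammer`, `HammerResult`, `toy_select`, `toy_solve`, and `toy_reconstruct` are hypothetical and exist only for this illustration; they do not correspond to the interfaces of Lean-auto, Duper, or Aesop, and the "proof script" is a placeholder string rather than real Lean syntax.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HammerResult:
    proved: bool
    used_premises: List[str]
    proof_script: Optional[str]


def run_hammer(
    goal: str,
    select: Callable[[str, int], List[str]],
    translate_and_solve: Callable[[str, List[str]], Optional[List[str]]],
    reconstruct: Callable[[str, List[str]], Optional[str]],
    k: int = 16,
) -> HammerResult:
    """Hypothetical hammer loop: select premises, call a prover, rebuild the proof."""
    premises = select(goal, k)                  # stage 1: premise selection
    used = translate_and_solve(goal, premises)  # stage 2: translation + external prover
    if used is None:
        return HammerResult(False, [], None)
    script = reconstruct(goal, used)            # stage 3: proof reconstruction
    if script is None:
        return HammerResult(False, used, None)
    return HammerResult(True, used, script)


# Toy stand-ins for the three stages (assumptions, not real tools).
def toy_select(goal, k):
    library = ["Nat.add_comm", "Nat.mul_comm", "Nat.add_zero"]
    return library[:k]


def toy_solve(goal, premises):
    # Pretend the external prover succeeds only when Nat.add_comm is available.
    return ["Nat.add_comm"] if "Nat.add_comm" in premises else None


def toy_reconstruct(goal, used):
    return "<proof of %s using %s>" % (goal, ", ".join(used))


ok = run_hammer("a + b = b + a", toy_select, toy_solve, toy_reconstruct, k=3)
failed = run_hammer(
    "a + b = b + a", lambda g, k: ["Nat.mul_comm"], toy_solve, toy_reconstruct
)
print(ok)
print(failed)
```

The second call shows why premise selection matters for the whole pipeline: if the selector misses the needed premise, the later stages cannot recover.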

Hammer-aware data extraction techniques

10 retrieved papers

The authors develop novel data extraction methods specifically designed for hammer integration, including normalized signature serialization, extraction from both term-style and tactic-style proofs, collection of implicit premises from automation, and training the model to select premises for closing goals rather than just modifying them.
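One possible reading of "select premises for closing goals rather than just modifying them" is that each proof state is labeled with every premise used from that state to the end of the proof, including premises that automation used implicitly, rather than only the premises named in the next step. The sketch below illustrates that reading on a toy proof record. The `Step` structure, both labeling functions, and the example proof are assumptions made for illustration, not the authors' extraction code.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Step:
    state: str           # goal state before the step (toy string)
    explicit: List[str]  # premises named in the step
    implicit: List[str] = field(default_factory=list)  # premises used inside automation


def step_level_pairs(steps: List[Step]) -> List[Tuple[str, Set[str]]]:
    """Label each state with only the premises explicitly named in the next step."""
    return [(s.state, set(s.explicit)) for s in steps]


def goal_closing_pairs(steps: List[Step]) -> List[Tuple[str, Set[str]]]:
    """Label each state with all premises (explicit and implicit) used until the proof ends."""
    pairs = []
    remaining: Set[str] = set()
    for s in reversed(steps):
        remaining |= set(s.explicit) | set(s.implicit)
        pairs.append((s.state, set(remaining)))
    pairs.reverse()
    return pairs


# Toy two-step proof: a rewrite, then an automation call that uses a premise implicitly.
proof = [
    Step("|- a + b + 0 = b + a", explicit=["Nat.add_zero"]),
    Step("|- a + b = b + a", explicit=[], implicit=["Nat.add_comm"]),
]

step_labels = step_level_pairs(proof)
closing_labels = goal_closing_pairs(proof)
print(step_labels)
print(closing_labels)
```

Under the step-level labeling, the second state has no premises at all; under the goal-closing labeling, the first state is paired with both premises a hammer would need to finish the proof from there.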

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[4] LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar (2023) • Neural Information Processing Systems

[6] Lean copilot: Large language models as copilots for theorem proving in lean

Peiyang Song, Kaiyu Yang, Anima Anandkumar (2024)

[29] Machine-Learned Premise Selection for Lean

Bartosz Piotrowski, Ramon Fernández Mir, Edward Ayers (2023)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

LEANPREMISE: Neural premise selection system for Lean hammer

[23] Learning proof search in proof assistants • Cannot Refute
[51] Hammer for Coq: Automation for dependent type theory • Cannot Refute
[52] Integrating Deep Neural Networks with Dependent Type Semantics • Cannot Refute
[53] Towards neural synthesis for smt-assisted proof-oriented programming • Cannot Refute
[54] Holist: An environment for machine learning of higher order logic theorem proving • Cannot Refute
[55] Neural Networks for Mathematical Reasoning – Evaluations, Capabilities, and Techniques • Cannot Refute
[56] Learning Structure-Aware Representations of Dependent Types • Cannot Refute
[57] Proof searching and prediction in HOL4 with evolutionary/heuristic and deep learning techniques • Cannot Refute
[58] Learning-Assisted Reasoning within Proof Assistants via Symbolic, Statistical, and Neural Guidance • Cannot Refute
[59] Dependent type networks: a probabilistic logic via the curry-howard correspondence in a system of probabilistic dependent types • Cannot Refute

Contribution

LEANHAMMER: First end-to-end domain general hammer for Lean

[51] Hammer for Coq: Automation for dependent type theory • Can Refute
[38] Automated Theorem Proving for Metamath • Cannot Refute
[43] Premise Selection and External Provers for HOL4 • Cannot Refute
[46] The Isabelle ENIGMA • Cannot Refute
[64] Language Models for Verifiable Mathematical Automation Interaction, Integration, and Autoformalization • Cannot Refute
[65] Goal translation for a hammer for Coq • Cannot Refute

Contribution

Hammer-aware data extraction techniques

[1] Search Strategy Selection for Automated Theorem Proving • Cannot Refute
[2] A survey on deep learning for theorem proving • Cannot Refute
[4] LeanDojo: Theorem Proving with Retrieval-Augmented Language Models • Cannot Refute
[9] Rango: Adaptive retrieval-augmented proving for automated software verification • Cannot Refute
[13] Property invariant embedding for automated reasoning • Cannot Refute
[29] Machine-Learned Premise Selection for Lean • Cannot Refute
[60] REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning • Cannot Refute
[61] Magnushammer: A Transformer-based Approach to Premise Selection • Cannot Refute
[62] MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation • Cannot Refute
[63] Towards AI-assisted correctness-by-construction software development • Cannot Refute
