Premise Selection for a Lean Hammer

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: premise selection, interactive theorem proving, automated reasoning, contrastive learning
Abstract:

Neural methods are transforming automated reasoning for proof assistants, yet integrating these advances into practical verification workflows remains challenging. A hammer is a tool that integrates premise selection, translation to external automatic theorem provers, and proof reconstruction into one overarching tool to automate tedious reasoning steps. We present LeanPremise, a novel neural premise selection system, and we combine it with existing translation and proof reconstruction components to create LeanHammer, the first end-to-end domain-general hammer for the Lean proof assistant. Unlike existing Lean premise selectors, LeanPremise is specifically trained for use with a hammer in dependent type theory. It also dynamically adapts to user-specific contexts, enabling it to effectively recommend premises from libraries outside LeanPremise's training data as well as lemmas defined by the user locally. With comprehensive evaluations, we show that LeanPremise enables LeanHammer to solve 21% more goals than existing premise selectors and generalizes well to diverse domains. Our work helps bridge the gap between neural retrieval and symbolic reasoning, making formal verification more accessible to researchers and practitioners.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LeanPremise, a neural premise selection system, and LeanHammer, an end-to-end hammer for Lean. It resides in the 'Lean-Based Systems' leaf under 'Retrieval-Augmented Proof Assistants', which contains four papers total. This leaf sits within a moderately populated branch of integrated systems for proof assistants, suggesting active but not overcrowded research. The work targets a specific niche: combining premise selection with translation and proof reconstruction into a unified hammer tool for Lean's dependent type theory.

The taxonomy reveals neighboring leaves for Coq-Based Systems (three papers) and Other Proof Assistant Systems (four papers), indicating parallel efforts across different proof assistants. The parent category 'Retrieval-Augmented Proof Assistants' excludes automated theorem provers without interactive assistance, clarifying that this work emphasizes integration with user workflows rather than standalone automation. Sibling papers like LeanDojo and Lean Copilot focus on data extraction and tactic generation respectively, while this work emphasizes premise retrieval tailored for hammer use, suggesting complementary rather than overlapping goals.

Of the 26 candidates examined, the analysis found one refutable pair, for the 'end-to-end hammer' contribution (six candidates examined). The neural premise selection system and the hammer-aware data extraction techniques (ten candidates examined each) showed no clear refutations. Because the search scope was limited, these statistics reflect top-K semantic matches rather than exhaustive coverage. Within this sample the premise selection component appears more novel, whereas the hammer integration claim is challenged by at least one prior work; the scale of examination remains modest.

Given the limited literature search (26 candidates from semantic retrieval), the work appears to occupy a recognizable but not densely populated research direction. The taxonomy structure suggests this is an active area with established sibling systems, yet the specific combination of hammer-aware training and dynamic context adaptation may differentiate it. A broader search would be needed to assess whether similar hammer implementations exist outside the top-K matches examined here.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 26
Refutable papers: 1

Research Landscape Overview

Core task: neural premise selection for automated theorem proving. The field has evolved into a rich ecosystem organized around several complementary dimensions. At the highest level, one finds dedicated Neural Premise Selection Methods that develop specialized architectures and embeddings for ranking candidate premises, alongside Integrated Theorem Proving Systems that embed these selection modules into end-to-end proof assistants. Training Data and Benchmarks provide the empirical foundation, while Premise Selection Strategies and Optimization explore algorithmic refinements such as active learning and Bayesian tuning. Representation Learning for Logical Formulas focuses on encoding syntactic and semantic structure (ranging from graph embeddings to property-invariant representations), and Hybrid and Auxiliary Techniques combine symbolic reasoning with neural guidance. Surveys and Foundational Work (e.g., Deep Learning Theorem Proving Survey[2], Guided Automated Reasoning Survey[5]) offer broad perspectives, and Specialized Applications and Extensions address domain-specific challenges in systems like Metamath or commonsense reasoning benchmarks.

Within Integrated Theorem Proving Systems, a particularly active line centers on Retrieval-Augmented Proof Assistants for the Lean proof assistant. Works such as LeanDojo[4] and Lean Copilot[6] provide infrastructure and interactive tooling that tightly couple premise retrieval with tactic suggestion, enabling rapid prototyping of neural methods in a production environment. Premise Selection Lean Hammer[0] sits squarely in this cluster, emphasizing efficient retrieval mechanisms tailored to Lean's library structure. Compared to LeanDojo[4], which prioritizes data extraction and benchmarking, and Lean Copilot[6], which integrates language-model-driven tactic generation, Premise Selection Lean Hammer[0] focuses more narrowly on the premise-ranking component itself. This specialization reflects a broader trade-off in the field: some systems aim for holistic proof search (blending premise selection with clause selection and strategy scheduling), while others isolate premise retrieval to achieve higher precision and interpretability within a single proof assistant.

Claimed Contributions

LEANPREMISE: Neural premise selection system for Lean hammer

The authors develop LEANPREMISE, a neural premise selection tool specifically designed for use with a hammer in dependent type theory. Unlike existing Lean premise selectors, it is trained for hammer integration and dynamically adapts to user-specific contexts, enabling effective recommendation of premises from libraries outside its training data as well as user-defined local lemmas.

10 retrieved papers
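
The retrieval idea behind this contribution (rank library premises against the current goal, and allow new or user-defined premises to be added without retraining) can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function below is a toy bag-of-tokens stand-in for a learned neural encoder, and `PremiseIndex`, `add`, and `top_k` are hypothetical names introduced only for this illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a neural encoder (assumption): bag-of-tokens counts."""
    pattern = r"[A-Za-z_][A-Za-z0-9_.']*|[^\sA-Za-z0-9_]"
    return Counter(re.findall(pattern, text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class PremiseIndex:
    """Hypothetical in-memory index; premises can be added at any time."""

    def __init__(self):
        self._vectors = {}

    def add(self, name, signature):
        # New premises (for example, local user lemmas) only need to be
        # embedded; nothing is retrained in this sketch.
        self._vectors[name] = embed(signature)

    def top_k(self, goal, k):
        query = embed(goal)
        # Sort by descending similarity, breaking ties by name.
        ranked = sorted(
            self._vectors,
            key=lambda name: (-cosine(query, self._vectors[name]), name),
        )
        return ranked[:k]


index = PremiseIndex()
index.add("Nat.add_comm", "theorem Nat.add_comm (n m : Nat) : n + m = m + n")
# A lemma the user defined locally, unseen by any training procedure.
index.add("my_local_lemma", "theorem my_local_lemma (x : Foo) : bar x = baz x")
print(index.top_k("bar y = baz y", k=1))
```

In a real system the encoder would be a trained model and the index an approximate nearest-neighbor structure; the sketch only shows why embedding-based retrieval extends naturally to premises outside the training data.
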
LEANHAMMER: First end-to-end domain-general hammer for Lean

The authors combine LEANPREMISE with Lean-auto (translation tool), Duper (proof-producing tactic), and Aesop (proof search tool) to create LEANHAMMER, which is the first domain-general hammer for the Lean proof assistant. This unified pipeline integrates premise selection, translation to external automatic theorem provers, and proof reconstruction.

6 retrieved papers
Can Refute
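
The three-stage pipeline named in this contribution (premise selection, translation to an external prover, proof reconstruction) can be sketched as plain control flow. This is a hedged illustration, not the LEANHAMMER code: every stage is a caller-supplied function, the demo stages are toy stubs, and the sketch assumes the common hammer design in which the external prover reports which premises it actually used so that reconstruction can work with that smaller set.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class HammerResult:
    proved: bool
    used_premises: tuple
    proof_script: Optional[str]


def run_hammer(
    goal: str,
    select_premises: Callable[[str, int], Sequence[str]],
    translate: Callable[[str, Sequence[str]], dict],
    call_atp: Callable[[dict], Optional[Sequence[str]]],
    reconstruct: Callable[[str, Sequence[str]], Optional[str]],
    k: int = 16,
) -> HammerResult:
    """Run the stages in order, stopping at the first stage that fails."""
    premises = select_premises(goal, k)    # stage 1: premise selection
    problem = translate(goal, premises)    # stage 2: translation
    core = call_atp(problem)               # external prover; None means failure
    if core is None:
        return HammerResult(False, (), None)
    script = reconstruct(goal, core)       # stage 3: proof reconstruction
    if script is None:
        return HammerResult(False, tuple(core), None)
    return HammerResult(True, tuple(core), script)


# Toy stubs (all hypothetical) so the sketch runs end to end.
def demo_select(goal, k):
    return ["lemma_a", "lemma_b"][:k]


def demo_translate(goal, premises):
    return {"conjecture": goal, "axioms": list(premises)}


def demo_atp(problem):
    # Pretend the prover only needed lemma_a.
    return [a for a in problem["axioms"] if a == "lemma_a"] or None


def demo_reconstruct(goal, core):
    # "toy_tactic" is a placeholder string, not real Lean syntax.
    return "toy_tactic [" + ", ".join(core) + "]"


result = run_hammer("goal", demo_select, demo_translate, demo_atp, demo_reconstruct)
print(result)
```

Keeping each stage behind a function boundary mirrors the report's description of LEANHAMMER as a combination of separately developed components, and it makes clear that a failure in any one stage is a failure of the whole call.
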
Hammer-aware data extraction techniques

The authors develop novel data extraction methods specifically designed for hammer integration, including normalized signature serialization, extraction from both term-style and tactic-style proofs, collection of implicit premises from automation, and training the model to select premises for closing goals rather than just modifying them.

10 retrieved papers
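
One possible reading of "select premises for closing goals rather than just modifying them" is that each intermediate proof state is labeled with every premise used from that point to the end of the proof, not only the premises of the next step. The sketch below illustrates that reading only; the step format (state, explicit premises, implicit premises) and the function name are assumptions made for this example and are not taken from the paper.

```python
def closing_premise_labels(steps):
    """Label each state with all premises used from that step onward.

    `steps` is a list of (state, explicit_premises, implicit_premises)
    tuples in proof order. Implicit premises stand for lemmas invoked by
    automation without being written out in the proof script.
    """
    labels = []
    remaining = set()
    # Walk backwards so each state accumulates everything used after it.
    for state, explicit, implicit in reversed(steps):
        remaining |= set(explicit) | set(implicit)
        labels.append((state, sorted(remaining)))
    labels.reverse()
    return labels


proof = [
    ("state_0", ["lemma_a"], []),
    ("state_1", ["lemma_b"], ["simp_lemma_c"]),
    ("state_2", [], []),
]
for state, premises in closing_premise_labels(proof):
    print(state, premises)
```

Under this labeling the first state is paired with all three premises, which matches what a hammer needs: a premise set sufficient to close the goal in one call.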

