ProofFlow: A Dependency Graph Approach to Faithful Proof Autoformalization

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Autoformalization, Large Language Models, Dependency Graph, Lean (Formal Language), Structural Fidelity, Semantic Faithfulness

Abstract:

Proof autoformalization, the task of translating natural language theorems and proofs into machine-verifiable code, is a critical step for integrating large language models into rigorous mathematical workflows. Current approaches focus on producing executable code, but they frequently fail to preserve the semantic meaning and logical structure of the original human-written argument. To address this, we introduce ProofFlow, a novel pipeline that treats structural fidelity as a primary objective. ProofFlow first constructs a directed acyclic graph (DAG) to map the logical dependencies between proof steps. Then, it employs a novel lemma-based approach to systematically formalize each step as an intermediate lemma, preserving the logical structure of the original argument. To facilitate evaluation, we present a new benchmark of 184 undergraduate-level problems, manually annotated with step-by-step solutions and logical dependency graphs, and introduce ProofScore, a new composite metric to evaluate syntactic correctness, semantic faithfulness, and structural fidelity. Experimental results show our pipeline sets a new state-of-the-art for autoformalization, achieving a ProofScore of 0.545, substantially exceeding baselines like full-proof formalization (0.279), which processes the entire proof at once, and step-proof formalization (0.046), which handles each step independently. Our pipeline, benchmark, and score metric are open-sourced to encourage further progress at https://anonymous.4open.science/r/ProofFlow-351E.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

ProofFlow introduces a pipeline that constructs directed acyclic graphs to map logical dependencies between proof steps, then formalizes each step as an intermediate lemma to preserve structural fidelity. The taxonomy places this work in the 'Dependency Graph-Based Formalization' leaf under 'Structure-Aware Autoformalization', of which this paper is currently the only member. This indicates a sparse research direction within the broader autoformalization landscape and suggests that explicit graph-based structural modeling is not yet widely explored in the literature.

The taxonomy reveals that ProofFlow's parent branch, 'Structure-Aware Autoformalization', sits alongside 'End-to-End Neural Autoformalization' and 'Controlled Natural Language Formalization' as major methodological divisions. Neighboring leaves include 'Incremental Step-by-Step Formalization' (which processes proofs sequentially with verification feedback) and 'Full-Proof Autoformalization' (which translates complete proofs without decomposition). ProofFlow diverges from these by explicitly modeling dependency graphs before formalization, occupying a distinct methodological niche that bridges structural analysis and systematic translation.

Among the 24 candidates examined through semantic search, none was found to clearly refute any of ProofFlow's three contributions. For the ProofFlow pipeline, 10 candidates were examined with no refutable overlap; for the ProofScore metric, 6 candidates with no refutation; and for the ProofFlowBench benchmark, 8 candidates with no refutation. This suggests that, within the limited search scope, the combination of dependency graph construction, lemma-based formalization, and the specific evaluation framework appears relatively novel, though the analysis does not cover the entire field.

Based on the top-24 semantic matches and the taxonomy structure, ProofFlow appears to occupy a sparsely populated methodological space. The absence of sibling papers in its taxonomy leaf and the lack of clear prior work overlap in the examined candidates suggest meaningful novelty, though this assessment is constrained by the limited search scope. A more comprehensive literature review covering additional venues and earlier foundational work in proof structure analysis would strengthen confidence in this preliminary assessment.

Taxonomy

[Taxonomy figure not reproduced. Legend: Core-task Taxonomy Papers; Claimed Contributions; Contribution Candidate Papers Compared; Refutable Paper]

Research Landscape Overview

Core task: Translating natural language mathematical proofs into formal verification code.

The field has evolved into several distinct branches that address different facets of this challenge. Autoformalization Methods and Systems explore techniques for converting informal mathematics into machine-checkable code, ranging from structure-aware approaches like dependency graph-based methods to controlled natural language interfaces (Naproche[46], Controlled Natural Language[38]). Formal Proof Synthesis and Search focus on generating and discovering proofs within formal systems, often leveraging neural methods (Deep Learning Theorem Proving[9]) or structured search strategies (Draft Sketch Prove[14]). Benchmarks and Datasets provide the empirical foundation, with resources like ProofNet[5] and Lean Workbook[7] enabling systematic evaluation. Verification and Alignment Evaluation address the correctness and fidelity of translations (FormalAlign[15]), while Domain-Specific and Applied Formalization target particular mathematical areas such as Euclidean geometry (Euclidean Geometry Proofs[12]). Surveys and Overviews (Autoformalization Survey[10]) synthesize progress, and Foundations and Theoretical Perspectives examine the underlying principles of proof and computation (Proof and Computation[28]).

Recent work has intensified around structure-aware autoformalization, where methods exploit the logical dependencies and hierarchical organization of proofs rather than treating them as flat text. ProofFlow[0] exemplifies this trend by using dependency graphs to guide formalization, situating itself within a small cluster of works that parse proof structure explicitly. This contrasts with earlier efforts like Autoformalization LLMs[2], which rely more heavily on end-to-end neural translation, and with step-by-step approaches such as StepProof[4] that incrementally build formal statements.

A key trade-off emerges between preserving the natural proof's modularity (which enables easier debugging and human readability) and achieving high automation with minimal user intervention. ProofFlow[0] leans toward the former, emphasizing how dependency-aware decomposition can improve both correctness and interpretability, while neighboring systems like Informal to Formal[3] explore hybrid strategies that balance structure and flexibility. Open questions remain about scalability to complex, multi-layered arguments and the extent to which graph-based representations generalize across diverse mathematical domains.

Claimed Contributions

ProofFlow pipeline for structure-preserving proof autoformalization

10 retrieved papers

The authors propose a three-stage pipeline that constructs a directed acyclic graph (DAG) to map logical dependencies between proof steps, then employs a lemma-based approach to systematically formalize each step as an intermediate lemma, preserving the logical structure of the original natural language proof.
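To make the dependency-graph idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy proof whose steps and dependencies are already known (the step names, statements, and the helper functions `formalization_order` and `lemma_stub` are all hypothetical), orders the steps so that each one follows the steps it depends on, and renders each step as a plain-text lemma stub that cites only its direct dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical proof steps: name -> (statement, names of steps it depends on).
steps = {
    "h1": ("n is even", []),
    "s1": ("n = 2k for some integer k", ["h1"]),
    "s2": ("n^2 = 4k^2", ["s1"]),
    "goal": ("n^2 is even", ["s2"]),
}


def formalization_order(steps):
    """Return step names so that every step comes after its dependencies."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(graph).static_order())


def lemma_stub(name, steps):
    """Render one step as a text stub that lists only its direct dependencies."""
    statement, deps = steps[name]
    premises = "; ".join(steps[d][0] for d in deps) or "(none)"
    return f"lemma {name}: from [{premises}] conclude [{statement}]"


order = formalization_order(steps)
stubs = [lemma_stub(name, steps) for name in order]
for stub in stubs:
    print(stub)
```

Restricting each stub to its direct dependencies is the design point here: the formal statement of a step then mirrors the edges of the graph rather than the full preceding context.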


ProofScore metric for comprehensive autoformalization evaluation

6 retrieved papers

The authors develop a unified scoring method that explicitly measures three key properties of autoformalized proofs: syntactic correctness (no compilation errors), semantic faithfulness (preserving mathematical meaning), and structural fidelity (preserving the proof's dependency graph).
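The report does not give the ProofScore formula, so the sketch below is only an illustration of how the three properties could be combined, not the paper's definition. It assumes structural fidelity is approximated by an F1 score over dependency edges and that a proof failing to compile scores zero; the functions `edge_f1` and `toy_composite` and all inputs are hypothetical.

```python
def edge_f1(reference_edges, predicted_edges):
    """F1 overlap between two sets of (source, target) dependency edges."""
    ref, pred = set(reference_edges), set(predicted_edges)
    if not ref and not pred:
        return 1.0  # two empty graphs agree trivially
    overlap = len(ref & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def toy_composite(compiles, faithful_fraction, structure_score):
    """Hypothetical aggregate (an assumption, not ProofScore itself):
    zero when the code does not compile, otherwise the product of the
    fraction of faithful steps and the structural score."""
    if not compiles:
        return 0.0
    return faithful_fraction * structure_score


reference = [("s1", "s2"), ("s2", "goal")]
predicted = [("s1", "s2"), ("s1", "goal")]
structure = edge_f1(reference, predicted)
score = toy_composite(True, 1.0, structure)
```

A multiplicative aggregate is one possible choice because it lets any single failing property pull the total down; other aggregation rules are equally plausible.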


ProofFlowBench benchmark dataset with annotated dependency graphs

8 retrieved papers

The authors introduce a curated benchmark dataset containing 184 undergraduate-level mathematics theorems and proofs from six key areas, each manually annotated with proof steps divided into logical components and their respective dependency graphs for evaluating structural fidelity.
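As a rough picture of what one annotated entry might look like, here is a small Python sketch. The record layout, field names, and example content are assumptions made for illustration and are not taken from the released dataset. It stores the proof steps with their dependencies, exposes the dependency edges, and checks that every dependency points to an earlier step, which is sufficient for the graph to be acyclic.

```python
from dataclasses import dataclass, field


@dataclass
class ProofStep:
    step_id: str
    text: str
    depends_on: list = field(default_factory=list)  # ids of earlier steps


@dataclass
class BenchmarkEntry:
    theorem: str
    area: str
    steps: list

    def edges(self):
        """List dependency edges as (dependency, dependent) pairs."""
        return [(dep, step.step_id) for step in self.steps for dep in step.depends_on]

    def validate(self):
        """Require each dependency to be an earlier step; this implies a DAG."""
        seen = set()
        for step in self.steps:
            missing = [dep for dep in step.depends_on if dep not in seen]
            if missing:
                raise ValueError(
                    f"{step.step_id} depends on unknown or later steps: {missing}"
                )
            seen.add(step.step_id)
        return True


# Hypothetical entry; the area label is a placeholder, not one of the six areas.
entry = BenchmarkEntry(
    theorem="If n is even then n^2 is even.",
    area="(placeholder area)",
    steps=[
        ProofStep("s1", "Write n = 2k for some integer k."),
        ProofStep("s2", "Then n^2 = 4k^2.", ["s1"]),
        ProofStep("s3", "So n^2 = 2(2k^2) is even.", ["s2"]),
    ],
)
```

Calling `entry.validate()` before evaluation would catch annotations whose dependencies are misnumbered or circular.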


Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ProofFlow pipeline for structure-preserving proof autoformalization

[56] Rethinking and improving autoformalization: towards a faithful metric and a dependency retrieval-based approach

Cannot Refute

[57] Aria: An Agent For Retrieval and Iterative Auto-Formalization via Dependency Graph

Cannot Refute

[58] Toward AI-Augmented Formal Verification: A Preliminary Investigation of ENGRU and Its Challenges

Cannot Refute

[59] Automated Enrichment of Logical Attack Graphs via Formal Ontologies

Cannot Refute

[60] Dependency-Graph Enabled Formal Analysis for 5G AKA Protocols: Assumption Propagation and Verification

Cannot Refute

[61] FVEL: Interactive formal verification environment with large language models via theorem proving

Cannot Refute

[62] Toward Auto-Modeling of Formal Verification for NextG Protocols: A Multimodal Cross- and Self-Attention Large Language Model Approach

Cannot Refute

[63] Binary-Level Formal Verification Based Automatic Security Ensurement for PLC in Industrial IoT

Cannot Refute

[64] Subgoal-based demonstration learning for formal theorem proving

Cannot Refute

[65] Graph of Logic: Enhancing LLM Reasoning with Graphs and Symbolic Logic

Cannot Refute

Contribution

ProofScore metric for comprehensive autoformalization evaluation

[32] ProofBridge: Auto-Formalization of Natural Language Proofs in Lean via Joint Embeddings

Cannot Refute

[51] Criticlean: Critic-guided reinforcement learning for mathematical formalization

Cannot Refute

[52] ASSESS: A Semantic and Structural Evaluation Framework for Statement Similarity

Cannot Refute

[53] ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization

Cannot Refute

[54] KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment

Cannot Refute

[55] Automatic Translation of Natural Language Requirements into CTL Specifications Using Large Language Models: A Multi-Approach Evaluation

Cannot Refute

Contribution

ProofFlowBench benchmark dataset with annotated dependency graphs

[56] Rethinking and improving autoformalization: towards a faithful metric and a dependency retrieval-based approach

Cannot Refute

[57] Aria: An Agent For Retrieval and Iterative Auto-Formalization via Dependency Graph

Cannot Refute

[66] AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

Cannot Refute

[67] Improving Autoformalization Using Direct Dependency Retrieval

Cannot Refute

[68] Formal entity graphs as complex networks: assessing centrality metrics of the archive of formal proofs

Cannot Refute

[69] Structure in Theorem Proving: Analyzing and Improving the Isabelle Archive of Formal Proofs

Cannot Refute

[70] Autoformalization of Mathematical Proofs from Natural Language to Proof Assistants

Cannot Refute

[71] DAG-MATH: GRAPH-GUIDED MATHEMATICAL REA

Cannot Refute
