Verifying Chain-of-Thought Reasoning via its Computational Graph

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Mechanistic Interpretability, Chain-of-Thought Reasoning, Attribution Graphs
Abstract:

Current Chain-of-Thought (CoT) verification methods predict reasoning correctness based on outputs (black-box) or activations (gray-box), but offer limited insight into why a computation fails. We introduce a white-box method: Circuit-based Reasoning Verification (CRV). We hypothesize that attribution graphs of correct CoT steps, viewed as execution traces of the model's latent reasoning circuits, possess distinct structural fingerprints from those of incorrect steps. By training a classifier on structural features of these graphs, we show that these traces contain a powerful signal of reasoning errors. Our white-box approach yields novel scientific insights unattainable by other methods. (1) We demonstrate that structural signatures of error are highly predictive, establishing the viability of verifying reasoning directly via its computational graph. (2) We find these signatures to be highly domain-specific, revealing that failures in different reasoning tasks manifest as distinct computational patterns. (3) We provide evidence that these signatures are not merely correlational; by using our analysis to guide targeted interventions on individual transcoder features, we successfully correct the model's faulty reasoning. Our work shows that, by scrutinizing a model's computational process, we can move from simple error detection to a deeper, causal understanding of LLM reasoning.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Circuit-based Reasoning Verification (CRV), which analyzes attribution graphs of chain-of-thought steps as execution traces of latent reasoning circuits. Within the taxonomy, it resides in the 'Attribution Graph and Circuit Analysis' leaf under 'Computational Graph and Circuit-Based Verification'. This leaf contains only three papers total, including the original work, indicating a relatively sparse and emerging research direction. The approach represents a white-box verification method that examines internal computational structures rather than relying solely on output analysis or external knowledge augmentation.

The taxonomy reveals that the broader field encompasses multiple verification paradigms. Neighboring branches include 'Structural Pattern Analysis in Reasoning Chains' (2 papers) within the same parent category, and more populated areas like 'External Knowledge Graph Augmented Reasoning' (15+ papers across multiple leaves) and 'Verification via Output Analysis' (8 papers). The scope note for the paper's leaf explicitly focuses on 'mechanistic circuits within transformer models', distinguishing it from methods that use external knowledge graphs or analyze only final outputs. This positioning suggests the work explores a less-traveled path compared to knowledge-graph-based or black-box verification approaches.

Among the 30 candidates examined across the three contributions, none was found to clearly refute any claimed novelty. For the core CRV method, 10 candidates were examined with 0 refutable overlaps; the domain-specific structural signatures and the causal interventions likewise each had 10 candidates examined with no clear prior work. Within this bounded search, the top-30 semantically similar papers contain no instance of the specific combination of attribution graph analysis, structural fingerprinting of errors, and domain-specific patterns, suggesting that the combination is distinctive. The analysis nevertheless reflects a bounded literature search rather than exhaustive coverage.

Given the sparse population of the attribution graph analysis leaf and the absence of refuting work among examined candidates, the approach appears to occupy a relatively novel position within the limited search scope. The mechanistic focus on computational graph structures for verification contrasts with the field's heavier emphasis on external knowledge integration and output-based validation. However, the analysis is constrained by examining only 30 candidates, leaving open the possibility of relevant work outside this semantic neighborhood.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Verifying chain-of-thought reasoning correctness via computational graph analysis. The field has evolved into a rich landscape organized around several complementary perspectives. At the highest level, one branch focuses on computational graph and circuit-based verification, examining how reasoning steps form analyzable structures and how internal model circuits can be inspected for correctness. Another major direction augments reasoning with external knowledge graphs, integrating structured world knowledge to ground and validate intermediate steps. Graph-structured prompting frameworks explore how to organize reasoning itself as a graph of interconnected thoughts, while verification via output analysis emphasizes post-hoc checking of generated reasoning chains. Additional branches address reasoning process optimization through training, compositional and logical evaluation benchmarks, interactive human-in-the-loop methods, specialized domain applications, and theoretical foundations.

Works such as Graph of Thought[3] and Think on Graph[11] illustrate how graph representations can guide the reasoning process, while Faith and Fate[1] and GraphCheck[14] exemplify efforts to validate reasoning outputs. Within this ecosystem, particularly active lines of work contrast mechanistic analysis of model internals with external validation strategies. Some studies, like Mechanistic Unveiling[21] and Uncovering Graph Reasoning[44], probe the internal circuits and attribution graphs that underlie reasoning steps, seeking to understand what computational structures emerge during chain-of-thought generation. Others, such as GraphReason[7] and Reasoning on Graphs[10], leverage external knowledge graphs to anchor reasoning in verifiable facts.

The original paper, Verifying CoT Graph[0], sits squarely within the computational graph and circuit-based verification branch, specifically focusing on attribution graph and circuit analysis. Its emphasis on analyzing the computational graph structure of reasoning chains aligns it closely with mechanistic approaches like Mechanistic Unveiling[21] and Uncovering Graph Reasoning[44], which similarly dissect internal reasoning pathways. This contrasts with works that primarily validate outputs against external references, positioning Verifying CoT Graph[0] as part of an emerging effort to make reasoning verification more intrinsic and interpretable through graph-theoretic analysis of the reasoning process itself.

Claimed Contributions

Circuit-based Reasoning Verification (CRV) method

The authors propose CRV, a white-box verification method that analyzes the structural properties of attribution graphs constructed from interpretable transcoder features. By training a classifier on graph-based structural fingerprints, the method detects reasoning errors by examining the computational process rather than just outputs or raw activations.

(10 retrieved papers)

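The report's description of CRV can be illustrated with a minimal sketch: extract structural statistics ("fingerprints") from each attribution graph and train a classifier on them. The feature set, graph representation, and classifier choice below are assumptions for illustration, not the authors' implementation; toy random graphs stand in for real attribution graphs.

```python
# Hypothetical sketch of a CRV-style pipeline (assumed details, not the paper's code):
# summarize each attribution graph with structural features, then classify steps.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def structural_fingerprint(graph: nx.DiGraph) -> np.ndarray:
    """Summarize an attribution graph with simple structural statistics."""
    n = graph.number_of_nodes()
    e = graph.number_of_edges()
    density = nx.density(graph) if n > 1 else 0.0
    # Mean absolute attribution strength on edges (assumed 'weight' attribute).
    weights = [abs(w) for _, _, w in graph.edges(data="weight", default=0.0)]
    mean_w = float(np.mean(weights)) if weights else 0.0
    return np.array([n, e, density, mean_w])

# Toy data: random graphs standing in for traces of correct vs. incorrect steps,
# differing here only in edge density.
rng = np.random.default_rng(0)

def random_trace(p: float) -> nx.DiGraph:
    g = nx.gnp_random_graph(20, p, seed=int(rng.integers(1_000_000)), directed=True)
    nx.set_edge_attributes(g, {edge: float(rng.normal()) for edge in g.edges}, "weight")
    return g

X = np.stack([structural_fingerprint(random_trace(p))
              for p in [0.1] * 50 + [0.4] * 50])
y = np.array([0] * 50 + [1] * 50)  # 0 = correct step, 1 = reasoning error

clf = GradientBoostingClassifier().fit(X, y)
```

The point of the sketch is only the shape of the method: graph in, fixed-length structural feature vector out, supervised error classifier on top.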
Domain-specific structural signatures of reasoning errors

The authors demonstrate through cross-domain experiments that error signatures in attribution graphs are task-specific. Failures in different reasoning domains (e.g., boolean logic versus arithmetic) produce distinct structural patterns, though a combined classifier can learn multiple failure geometries simultaneously.

(10 retrieved papers)

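The cross-domain claim can be illustrated with a toy protocol (an assumption about the experimental setup, not the paper's code): train an error classifier on fingerprints from one domain, test it on another, then check that a single combined classifier can still represent both error geometries. Synthetic clusters stand in for real fingerprints, with errors pointing in opposite directions per domain.

```python
# Toy cross-domain protocol: domain-specific error signatures transfer poorly,
# but a combined nonlinear classifier can learn both failure geometries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_domain(center_err: float, n: int = 200, dim: int = 4):
    """Toy fingerprints: correct steps cluster near 0, errors near a domain-specific center."""
    X = np.vstack([rng.normal(0.0, 0.5, (n, dim)),
                   rng.normal(center_err, 0.5, (n, dim))])
    y = np.array([0] * n + [1] * n)  # 0 = correct, 1 = error
    return X, y

Xa, ya = make_domain(+2.0)  # stand-in for, e.g., boolean logic
Xb, yb = make_domain(-2.0)  # stand-in for, e.g., arithmetic

in_domain = LogisticRegression().fit(Xa, ya).score(Xa, ya)
transfer = LogisticRegression().fit(Xa, ya).score(Xb, yb)  # cross-domain test
# A nonlinear model fit on both domains can represent both error directions.
combined = RandomForestClassifier(random_state=0).fit(
    np.vstack([Xa, Xb]), np.concatenate([ya, yb])).score(Xb, yb)
```

Because the two domains place errors on opposite sides of the correct cluster, the linear classifier trained on one domain fails on the other, while the combined model recovers both patterns, mirroring the report's "distinct structural patterns, one combined classifier" finding.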
Causal interventions guided by mechanistic analysis

The authors show that structural error signatures are causally implicated in reasoning failures by performing targeted interventions on specific transcoder features identified through their analysis. These interventions successfully correct computational errors, demonstrating that the method enables actionable model debugging beyond simple error detection.

(10 retrieved papers)
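A feature-level intervention of the kind described here can be sketched with a toy transcoder-style module: clamp one latent feature during the forward pass and measure how the output changes. The module, shapes, and clamping mechanism are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a targeted intervention on a single transcoder-style feature
# (illustrative module, not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyTranscoder(nn.Module):
    """Toy transcoder: encode to a sparse feature space, decode back."""
    def __init__(self, d_model: int = 8, d_feat: int = 16):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feat)
        self.dec = nn.Linear(d_feat, d_model)
        self.clamp = {}  # feature index -> forced activation value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = torch.relu(self.enc(x))
        if self.clamp:
            f = f.clone()
            for idx, val in self.clamp.items():
                f[..., idx] = val  # targeted intervention on one feature
        return self.dec(f)

tc = ToyTranscoder()
x = torch.randn(1, 8)
baseline = tc(x)

# Intervene on feature 3, as one might steer a feature implicated in a
# faulty reasoning step, and measure the downstream effect.
tc.clamp = {3: 5.0}
intervened = tc(x)
effect = (baseline - intervened).abs().sum().item()
```

The same clamping idea, applied inside a real model's transcoder layer, is what turns a correlational error signature into a causal test: if forcing the implicated feature changes the model's answer, the feature is causally involved.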

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Circuit-based Reasoning Verification (CRV) method

Contribution: Domain-specific structural signatures of reasoning errors

Contribution: Causal interventions guided by mechanistic analysis