Verifying Chain-of-Thought Reasoning via its Computational Graph

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Mechanistic Interpretability, Chain-of-Thought Reasoning, Attribution Graphs
Abstract:

Current Chain-of-Thought (CoT) verification methods predict reasoning correctness based on outputs (black-box) or activations (gray-box), but offer limited insight into why a computation fails. We introduce a white-box method: Circuit-based Reasoning Verification (CRV). We hypothesize that attribution graphs of correct CoT steps, viewed as execution traces of the model's latent reasoning circuits, possess distinct structural fingerprints from those of incorrect steps. By training a classifier on structural features of these graphs, we show that these traces contain a powerful signal of reasoning errors. Our white-box approach yields novel scientific insights unattainable by other methods. (1) We demonstrate that structural signatures of error are highly predictive, establishing the viability of verifying reasoning directly via its computational graph. (2) We find these signatures to be highly domain-specific, revealing that failures in different reasoning tasks manifest as distinct computational patterns. (3) We provide evidence that these signatures are not merely correlational; by using our analysis to guide targeted interventions on individual transcoder features, we successfully correct the model's faulty reasoning. Our work shows that, by scrutinizing a model's computational process, we can move from simple error detection to a deeper, causal understanding of LLM reasoning.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Circuit-based Reasoning Verification (CRV), which analyzes attribution graphs of chain-of-thought steps as execution traces of latent reasoning circuits. Within the taxonomy, it resides in the 'Attribution Graph and Circuit Analysis' leaf under 'Computational Graph and Circuit-Based Verification'. This leaf contains only three papers total, including the original work, indicating a relatively sparse and emerging research direction. The approach represents a white-box verification method that examines internal computational structures rather than relying solely on output analysis or external knowledge augmentation.

The taxonomy reveals that the broader field encompasses multiple verification paradigms. Neighboring branches include 'Structural Pattern Analysis in Reasoning Chains' (2 papers) within the same parent category, and more populated areas like 'External Knowledge Graph Augmented Reasoning' (15+ papers across multiple leaves) and 'Verification via Output Analysis' (8 papers). The scope note for the paper's leaf explicitly focuses on 'mechanistic circuits within transformer models', distinguishing it from methods that use external knowledge graphs or analyze only final outputs. This positioning suggests the work explores a less-traveled path compared to knowledge-graph-based or black-box verification approaches.

Among the 30 candidates examined across the three contributions, none was found to clearly refute any claimed novelty. For the core CRV method, 10 candidates were examined with 0 refutable overlaps; the domain-specific structural signatures and the causal interventions likewise each had 10 candidates examined with no clear prior work. Within this bounded search, the top-30 semantically similar papers contain no instance of the specific combination of attribution graph analysis, structural fingerprinting of errors, and domain-specific patterns, suggesting that the combination is distinctive. The analysis nevertheless reflects a bounded literature search rather than exhaustive coverage.

Given the sparse population of the attribution graph analysis leaf and the absence of refuting work among examined candidates, the approach appears to occupy a relatively novel position within the limited search scope. The mechanistic focus on computational graph structures for verification contrasts with the field's heavier emphasis on external knowledge integration and output-based validation. However, the analysis is constrained by examining only 30 candidates, leaving open the possibility of relevant work outside this semantic neighborhood.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Verifying chain-of-thought reasoning correctness via computational graph analysis. The field has evolved into a rich landscape organized around several complementary perspectives. At the highest level, one branch focuses on computational graph and circuit-based verification, examining how reasoning steps form analyzable structures and how internal model circuits can be inspected for correctness. Another major direction augments reasoning with external knowledge graphs, integrating structured world knowledge to ground and validate intermediate steps. Graph-structured prompting frameworks explore how to organize reasoning itself as a graph of interconnected thoughts, while verification via output analysis emphasizes post-hoc checking of generated reasoning chains. Additional branches address reasoning process optimization through training, compositional and logical evaluation benchmarks, interactive human-in-the-loop methods, specialized domain applications, and theoretical foundations.

Works such as Graph of Thought[3] and Think on Graph[11] illustrate how graph representations can guide the reasoning process, while Faith and Fate[1] and GraphCheck[14] exemplify efforts to validate reasoning outputs. Within this ecosystem, particularly active lines of work contrast mechanistic analysis of model internals with external validation strategies. Some studies, like Mechanistic Unveiling[21] and Uncovering Graph Reasoning[44], probe the internal circuits and attribution graphs that underlie reasoning steps, seeking to understand what computational structures emerge during chain-of-thought generation. Others, such as GraphReason[7] and Reasoning on Graphs[10], leverage external knowledge graphs to anchor reasoning in verifiable facts.

The original paper, Verifying CoT Graph[0], sits squarely within the computational graph and circuit-based verification branch, specifically focusing on attribution graph and circuit analysis. Its emphasis on analyzing the computational graph structure of reasoning chains aligns it closely with mechanistic approaches like Mechanistic Unveiling[21] and Uncovering Graph Reasoning[44], which similarly dissect internal reasoning pathways. This contrasts with works that primarily validate outputs against external references, positioning Verifying CoT Graph[0] as part of an emerging effort to make reasoning verification more intrinsic and interpretable through graph-theoretic analysis of the reasoning process itself.

Claimed Contributions

Circuit-based Reasoning Verification (CRV) method

The authors propose CRV, a white-box verification method that analyzes the structural properties of attribution graphs constructed from interpretable transcoder features. By training a classifier on graph-based structural fingerprints, the method detects reasoning errors by examining the computational process rather than just outputs or raw activations.

(10 retrieved papers)

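The report's description of CRV can be illustrated with a minimal sketch: extract structural statistics ("fingerprints") from each attribution graph and train a classifier on them. The feature set, graph representation, and classifier choice below are assumptions for illustration, not the authors' implementation; toy random graphs stand in for real attribution graphs.

```python
# Hypothetical sketch of a CRV-style pipeline (assumed details, not the paper's code):
# summarize each attribution graph with structural features, then classify steps.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def structural_fingerprint(graph: nx.DiGraph) -> np.ndarray:
    """Summarize an attribution graph with simple structural statistics."""
    n = graph.number_of_nodes()
    e = graph.number_of_edges()
    density = nx.density(graph) if n > 1 else 0.0
    # Mean absolute attribution strength on edges (assumed 'weight' attribute).
    weights = [abs(w) for _, _, w in graph.edges(data="weight", default=0.0)]
    mean_w = float(np.mean(weights)) if weights else 0.0
    return np.array([n, e, density, mean_w])

# Toy data: random graphs standing in for traces of correct vs. incorrect steps,
# differing here only in edge density.
rng = np.random.default_rng(0)

def random_trace(p: float) -> nx.DiGraph:
    g = nx.gnp_random_graph(20, p, seed=int(rng.integers(1_000_000)), directed=True)
    nx.set_edge_attributes(g, {edge: float(rng.normal()) for edge in g.edges}, "weight")
    return g

X = np.stack([structural_fingerprint(random_trace(p))
              for p in [0.1] * 50 + [0.4] * 50])
y = np.array([0] * 50 + [1] * 50)  # 0 = correct step, 1 = reasoning error

clf = GradientBoostingClassifier().fit(X, y)
```

The point of the sketch is only the shape of the method: graph in, fixed-length structural feature vector out, supervised error classifier on top.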
Domain-specific structural signatures of reasoning errors

The authors demonstrate through cross-domain experiments that error signatures in attribution graphs are task-specific. Failures in different reasoning domains (e.g., boolean logic versus arithmetic) produce distinct structural patterns, though a combined classifier can learn multiple failure geometries simultaneously.

(10 retrieved papers)

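The cross-domain claim can be illustrated with a toy protocol (an assumption about the experimental setup, not the paper's code): train an error classifier on fingerprints from one domain, test it on another, then check that a single combined classifier can still represent both error geometries. Synthetic clusters stand in for real fingerprints, with errors pointing in opposite directions per domain.

```python
# Toy cross-domain protocol: domain-specific error signatures transfer poorly,
# but a combined nonlinear classifier can learn both failure geometries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_domain(center_err: float, n: int = 200, dim: int = 4):
    """Toy fingerprints: correct steps cluster near 0, errors near a domain-specific center."""
    X = np.vstack([rng.normal(0.0, 0.5, (n, dim)),
                   rng.normal(center_err, 0.5, (n, dim))])
    y = np.array([0] * n + [1] * n)  # 0 = correct, 1 = error
    return X, y

Xa, ya = make_domain(+2.0)  # stand-in for, e.g., boolean logic
Xb, yb = make_domain(-2.0)  # stand-in for, e.g., arithmetic

in_domain = LogisticRegression().fit(Xa, ya).score(Xa, ya)
transfer = LogisticRegression().fit(Xa, ya).score(Xb, yb)  # cross-domain test
# A nonlinear model fit on both domains can represent both error directions.
combined = RandomForestClassifier(random_state=0).fit(
    np.vstack([Xa, Xb]), np.concatenate([ya, yb])).score(Xb, yb)
```

Because the two domains place errors on opposite sides of the correct cluster, the linear classifier trained on one domain fails on the other, while the combined model recovers both patterns, mirroring the report's "distinct structural patterns, one combined classifier" finding.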
Causal interventions guided by mechanistic analysis

The authors show that structural error signatures are causally implicated in reasoning failures by performing targeted interventions on specific transcoder features identified through their analysis. These interventions successfully correct computational errors, demonstrating that the method enables actionable model debugging beyond simple error detection.

(10 retrieved papers)
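A feature-level intervention of the kind described here can be sketched with a toy transcoder-style module: clamp one latent feature during the forward pass and measure how the output changes. The module, shapes, and clamping mechanism are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a targeted intervention on a single transcoder-style feature
# (illustrative module, not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyTranscoder(nn.Module):
    """Toy transcoder: encode to a sparse feature space, decode back."""
    def __init__(self, d_model: int = 8, d_feat: int = 16):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feat)
        self.dec = nn.Linear(d_feat, d_model)
        self.clamp = {}  # feature index -> forced activation value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = torch.relu(self.enc(x))
        if self.clamp:
            f = f.clone()
            for idx, val in self.clamp.items():
                f[..., idx] = val  # targeted intervention on one feature
        return self.dec(f)

tc = ToyTranscoder()
x = torch.randn(1, 8)
baseline = tc(x)

# Intervene on feature 3, as one might steer a feature implicated in a
# faulty reasoning step, and measure the downstream effect.
tc.clamp = {3: 5.0}
intervened = tc(x)
effect = (baseline - intervened).abs().sum().item()
```

The same clamping idea, applied inside a real model's transcoder layer, is what turns a correlational error signature into a causal test: if forcing the implicated feature changes the model's answer, the feature is causally involved.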

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Circuit-based Reasoning Verification (CRV) method

Contribution: Domain-specific structural signatures of reasoning errors

Contribution: Causal interventions guided by mechanistic analysis