Segment-Level Attribution for Selective Learning of Long Reasoning Traces

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: long CoTs, selective learning, integrated gradients, segment attributions
Abstract:

Large Reasoning Models (LRMs) achieve strong reasoning performance by generating long chains of thought (CoTs), yet only a small fraction of these traces meaningfully contributes to answer prediction, while the majority contains repetitive or truncated content. Such output redundancy is further propagated after supervised finetuning (SFT), as models learn to imitate verbose but uninformative patterns, which can degrade performance. To address this, we incorporate integrated-gradient attribution to quantify each token's influence on the final answer and aggregate token attributions into two segment-level metrics: (1) attribution strength, which measures the overall attribution magnitude; and (2) direction consistency, which captures whether the token attributions within a segment are uniformly positive or negative (high consistency) or a mixture of both (moderate consistency). Based on these two metrics, we propose a segment-level selective learning framework that identifies important segments, namely those with high attribution strength but moderate consistency, which indicate reflective rather than shallow reasoning. The framework then applies selective SFT on these important segments while masking the loss for unimportant ones. Experiments across multiple models and datasets show that our approach improves accuracy and output efficiency, enabling more effective learning from long reasoning traces.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes segment-level attribution metrics (attribution strength and direction consistency) to identify important reasoning segments for selective supervised finetuning. It occupies the 'Segment-Level Attribution for Selective Learning' leaf within the 'Segment-Level Credit Assignment and Optimization' branch. Notably, this leaf contains only the original paper itself, with no sibling papers identified in the taxonomy. This suggests the specific combination of integrated gradient attribution with segment-level selective SFT represents a relatively sparse research direction within the broader field of selective learning from long reasoning traces.

The taxonomy reveals that neighboring work pursues related but distinct approaches. The sibling leaf 'Reinforcement Learning with Segment-Level Advantage Estimation' focuses on RL-based policy optimization rather than attribution-guided supervised learning. Another sibling, 'Direct Reasoning Optimization for Open-Ended Tasks', addresses RL without verifiable rewards. The broader 'Attribution-Enhanced Reasoning and Explainability' branch emphasizes interpretability and faithfulness rather than optimization efficiency. The paper thus bridges attribution techniques (typically used for explainability) with selective learning objectives (typically addressed via RL), occupying a distinct methodological niche between these established directions.

Among 30 candidates examined across three contributions, none were found to clearly refute the paper's claims. For the segment-level attribution metrics using integrated gradients, 10 candidates were examined with 0 refutable matches. Similarly, the segment-level selective learning framework and the principled importance definition each examined 10 candidates with no refutations. This suggests that within the limited search scope, the specific combination of integrated gradient attribution at segment granularity for selective SFT appears relatively novel. However, the modest search scale means potentially relevant work outside the top-30 semantic matches may exist.

Based on the limited literature search of 30 candidates, the work appears to introduce a distinctive approach combining attribution-based segment selection with supervised finetuning. The absence of sibling papers in its taxonomy leaf and zero refutations across contributions suggest novelty within the examined scope. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of related work in adjacent communities or recent preprints not captured by semantic search.

Taxonomy

- Core-task Taxonomy Papers: 8
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 0

Research Landscape Overview

Core task: selective learning from long reasoning traces using segment-level attribution. This field addresses the challenge of training models on extended reasoning sequences by identifying which segments contribute most to correct outcomes. The taxonomy organizes work into four main branches. Segment-Level Credit Assignment and Optimization focuses on methods that attribute credit to individual reasoning steps and optimize learning accordingly, often using reinforcement learning or gradient-based techniques to selectively reinforce valuable segments. Attribution-Enhanced Reasoning and Explainability emphasizes interpretability, developing frameworks that trace causal relationships between reasoning steps and final answers. Structured Reasoning with Hierarchical Segmentation explores how to decompose complex reasoning into meaningful units, often leveraging hierarchical structures or knowledge graphs. Safety Assessment in Reasoning Processes examines how to evaluate and ensure the reliability of multi-step reasoning, particularly in high-stakes domains.

Several active lines of work reveal key trade-offs in this landscape. One cluster, including Direct Reasoning Optimization[3] and Segment Policy Optimization[4], pursues fine-grained credit assignment through policy-gradient methods that reward or penalize individual reasoning segments based on their contribution to task success. Another direction, represented by Causal Attribution Distillation[2] and KG-TRACES[7], emphasizes tracing causal dependencies to understand how intermediate steps influence outcomes.

The original paper, Segment Attribution Learning[0], sits squarely within the credit assignment branch, sharing the optimization focus of works like Segment Policy Optimization[4] but distinguishing itself through its particular approach to attributing value across long traces. Meanwhile, SafeRBench[5] highlights an orthogonal concern: ensuring that segment-level learning does not compromise reasoning safety. This illustrates how selective learning must balance efficiency gains against reliability requirements.

Claimed Contributions

Segment-level attribution metrics using integrated gradients

The authors introduce two segment-level metrics derived from integrated gradient attribution: attribution strength, which measures the overall magnitude of a segment's influence on model predictions, and direction consistency, which captures whether a segment exhibits uniform or mixed attribution directions. These metrics enable identification of important reasoning segments.

10 retrieved papers
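As a minimal, dependency-free illustration (not the authors' implementation), the two metrics can be sketched with NumPy: integrated gradients are approximated by a Riemann sum along the baseline-to-input path of a toy differentiable scoring function, and the resulting per-token attributions are aggregated into segment-level strength and consistency. The function `f`, the baseline, and the step count are assumptions for demonstration only.

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=64):
    """Approximate IG with a midpoint Riemann sum over the path
    baseline -> x. f maps a vector (one scalar per token here) to a
    scalar score; gradients are estimated by central differences to
    keep the sketch free of autodiff dependencies."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    eps = 1e-5
    for a in alphas:
        point = baseline + a * (x - baseline)
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (f(point + e) - f(point - e)) / (2 * eps)
        total += grad
    # Per-token attribution: (x_i - x'_i) * average path gradient.
    return (x - baseline) * total / steps

def segment_metrics(attributions):
    """Aggregate per-token attributions into the two segment metrics:
    strength    = mean absolute attribution (overall magnitude);
    consistency = |sum| / sum(|.|), which is 1.0 when all tokens share
                  one sign and approaches 0 when signs cancel."""
    a = np.asarray(attributions, dtype=float)
    strength = np.abs(a).mean()
    consistency = np.abs(a.sum()) / (np.abs(a).sum() + 1e-12)
    return strength, consistency
```

For a linear scoring function the IG attributions are exact, which makes the sketch easy to sanity-check; a segment mixing positive and negative attributions then yields the moderate consistency the paper singles out.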
Segment-level selective learning framework

The authors propose a framework that identifies important segments based on high attribution strength and moderate consistency, then applies selective supervised finetuning only on these segments while masking loss for unimportant ones. This approach acts as implicit regularization by preventing overfitting to uninformative content.

10 retrieved papers
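The loss-masking step can be sketched as follows. This is a hypothetical illustration under simple assumptions: per-token cross-entropy values are already computed, each token carries a segment id, and the set of important segments has been chosen by the attribution metrics; the function names are not from the paper.

```python
import numpy as np

def selective_sft_loss(token_losses, segment_ids, important_segments):
    """Average per-token loss only over tokens in important segments.

    token_losses:       per-token cross-entropy values, shape (T,)
    segment_ids:        segment index of each token, shape (T,)
    important_segments: ids selected by the attribution metrics; all
                        other tokens have their loss masked out.
    """
    losses = np.asarray(token_losses, dtype=float)
    mask = np.isin(segment_ids, list(important_segments)).astype(float)
    denom = mask.sum()
    if denom == 0:  # no important tokens: contribute no gradient
        return 0.0
    return float((losses * mask).sum() / denom)
```

Masking the denominator as well as the numerator keeps the per-token loss scale comparable across examples with different numbers of important tokens.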
Principled importance definition using integrated gradients

The authors adopt integrated gradients as a principled method to measure segment importance, capturing both direct and indirect influences on final answer prediction. This approach addresses limitations of sequential appending or leave-one-out methods that underestimate indirectly contributive segments.

10 retrieved papers
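For reference, this contribution builds on the standard integrated-gradients definition (Sundararajan et al., 2017), which attributes to input dimension $i$ of a model $F$, given input $x$ and baseline $x'$:

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha
$$

In practice the integral is approximated by a Riemann sum over a few dozen interpolation steps. Because the path integral accumulates gradients at every point between baseline and input, it can register a segment's indirect influence on the answer, which leave-one-out ablation or sequential appending may miss. The choice of baseline $x'$ (e.g., zero or padding embeddings) is not specified in this report.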

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though a signal constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Segment-level attribution metrics using integrated gradients

The authors introduce two segment-level metrics derived from integrated gradient attribution: attribution strength, which measures the overall magnitude of a segment's influence on model predictions, and direction consistency, which captures whether a segment exhibits uniform or mixed attribution directions. These metrics enable identification of important reasoning segments.

Contribution: Segment-level selective learning framework

The authors propose a framework that identifies important segments based on high attribution strength and moderate consistency, then applies selective supervised finetuning only on these segments while masking loss for unimportant ones. This approach acts as implicit regularization by preventing overfitting to uninformative content.

Contribution: Principled importance definition using integrated gradients

The authors adopt integrated gradients as a principled method to measure segment importance, capturing both direct and indirect influences on final answer prediction. This approach addresses limitations of sequential appending or leave-one-out methods that underestimate indirectly contributive segments.