Segment-Level Attribution for Selective Learning of Long Reasoning Traces

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: long CoTs, selective learning, integrated gradients, segment attributions
Abstract:

Large Reasoning Models (LRMs) achieve strong reasoning performance by generating long chains of thought (CoTs), yet only a small fraction of these traces meaningfully contributes to answer prediction, while the majority contains repetitive or truncated content. Such output redundancy is further propagated after supervised finetuning (SFT), as models learn to imitate verbose but uninformative patterns, which can degrade performance. To address this, we incorporate integrated-gradient attribution to quantify each token's influence on the final answer and aggregate token attributions into two segment-level metrics: (1) attribution strength, which measures the overall attribution magnitude; and (2) direction consistency, which captures whether the token attributions within a segment are uniformly positive or negative (high consistency) or a mixture of both (moderate consistency). Based on these two metrics, we propose a segment-level selective learning framework that identifies important segments, namely those with high attribution strength but moderate consistency, which indicate reflective rather than shallow reasoning. The framework then applies selective SFT on these important segments while masking the loss for unimportant ones. Experiments across multiple models and datasets show that our approach improves accuracy and output efficiency, enabling more effective learning from long reasoning traces.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes segment-level attribution metrics (attribution strength and direction consistency) to identify important reasoning segments for selective supervised finetuning. It occupies the 'Segment-Level Attribution for Selective Learning' leaf within the 'Segment-Level Credit Assignment and Optimization' branch. Notably, this leaf contains only the original paper itself, with no sibling papers identified in the taxonomy. This suggests the specific combination of integrated gradient attribution with segment-level selective SFT represents a relatively sparse research direction within the broader field of selective learning from long reasoning traces.

The taxonomy reveals that neighboring work pursues related but distinct approaches. The sibling leaf 'Reinforcement Learning with Segment-Level Advantage Estimation' focuses on RL-based policy optimization rather than attribution-guided supervised learning. Another sibling, 'Direct Reasoning Optimization for Open-Ended Tasks', addresses RL without verifiable rewards. The broader 'Attribution-Enhanced Reasoning and Explainability' branch emphasizes interpretability and faithfulness rather than optimization efficiency. The paper thus bridges attribution techniques (typically used for explainability) with selective learning objectives (typically addressed via RL), occupying a distinct methodological niche between these established directions.

Among 30 candidates examined across three contributions, none were found to clearly refute the paper's claims. For the segment-level attribution metrics using integrated gradients, 10 candidates were examined with 0 refutable matches. Similarly, the segment-level selective learning framework and the principled importance definition each examined 10 candidates with no refutations. This suggests that within the limited search scope, the specific combination of integrated gradient attribution at segment granularity for selective SFT appears relatively novel. However, the modest search scale means potentially relevant work outside the top-30 semantic matches may exist.

Based on the limited literature search of 30 candidates, the work appears to introduce a distinctive approach combining attribution-based segment selection with supervised finetuning. The absence of sibling papers in its taxonomy leaf and zero refutations across contributions suggest novelty within the examined scope. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of related work in adjacent communities or recent preprints not captured by semantic search.

Taxonomy

- Core-task Taxonomy Papers: 8
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 30
- Refutable Papers: 0

Research Landscape Overview

Core task: selective learning from long reasoning traces using segment-level attribution. This field addresses the challenge of training models on extended reasoning sequences by identifying which segments contribute most to correct outcomes. The taxonomy organizes work into four main branches. Segment-Level Credit Assignment and Optimization focuses on methods that attribute credit to individual reasoning steps and optimize learning accordingly, often using reinforcement learning or gradient-based techniques to selectively reinforce valuable segments. Attribution-Enhanced Reasoning and Explainability emphasizes interpretability, developing frameworks that trace causal relationships between reasoning steps and final answers. Structured Reasoning with Hierarchical Segmentation explores how to decompose complex reasoning into meaningful units, often leveraging hierarchical structures or knowledge graphs. Safety Assessment in Reasoning Processes examines how to evaluate and ensure the reliability of multi-step reasoning, particularly in high-stakes domains.

Several active lines of work reveal key trade-offs in this landscape. One cluster, including Direct Reasoning Optimization[3] and Segment Policy Optimization[4], pursues fine-grained credit assignment through policy-gradient methods that reward or penalize individual reasoning segments based on their contribution to task success. Another direction, represented by Causal Attribution Distillation[2] and KG-TRACES[7], emphasizes tracing causal dependencies to understand how intermediate steps influence outcomes.

The original paper, Segment Attribution Learning[0], sits squarely within the credit assignment branch, sharing the optimization focus of works like Segment Policy Optimization[4] but distinguishing itself through its particular approach to attributing value across long traces. Meanwhile, SafeRBench[5] highlights an orthogonal concern: ensuring that segment-level learning does not compromise reasoning safety. This illustrates how selective learning must balance efficiency gains against reliability requirements.

Claimed Contributions

Segment-level attribution metrics using integrated gradients

The authors introduce two segment-level metrics derived from integrated gradient attribution: attribution strength, which measures the overall magnitude of a segment's influence on model predictions, and direction consistency, which captures whether a segment exhibits uniform or mixed attribution directions. These metrics enable identification of important reasoning segments.

10 retrieved papers
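As a minimal, dependency-free illustration (not the authors' implementation), the two metrics can be sketched with NumPy: integrated gradients are approximated by a Riemann sum along the baseline-to-input path of a toy differentiable scoring function, and the resulting per-token attributions are aggregated into segment-level strength and consistency. The function `f`, the baseline, and the step count are assumptions for demonstration only.

```python
import numpy as np

def integrated_gradients(f, x, baseline, steps=64):
    """Approximate IG with a midpoint Riemann sum over the path
    baseline -> x. f maps a vector (one scalar per token here) to a
    scalar score; gradients are estimated by central differences to
    keep the sketch free of autodiff dependencies."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    eps = 1e-5
    for a in alphas:
        point = baseline + a * (x - baseline)
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (f(point + e) - f(point - e)) / (2 * eps)
        total += grad
    # Per-token attribution: (x_i - x'_i) * average path gradient.
    return (x - baseline) * total / steps

def segment_metrics(attributions):
    """Aggregate per-token attributions into the two segment metrics:
    strength    = mean absolute attribution (overall magnitude);
    consistency = |sum| / sum(|.|), which is 1.0 when all tokens share
                  one sign and approaches 0 when signs cancel."""
    a = np.asarray(attributions, dtype=float)
    strength = np.abs(a).mean()
    consistency = np.abs(a.sum()) / (np.abs(a).sum() + 1e-12)
    return strength, consistency
```

For a linear scoring function the IG attributions are exact, which makes the sketch easy to sanity-check; a segment mixing positive and negative attributions then yields the moderate consistency the paper singles out.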
Segment-level selective learning framework

The authors propose a framework that identifies important segments based on high attribution strength and moderate consistency, then applies selective supervised finetuning only on these segments while masking loss for unimportant ones. This approach acts as implicit regularization by preventing overfitting to uninformative content.

10 retrieved papers
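The loss-masking step can be sketched as follows. This is a hypothetical illustration under simple assumptions: per-token cross-entropy values are already computed, each token carries a segment id, and the set of important segments has been chosen by the attribution metrics; the function names are not from the paper.

```python
import numpy as np

def selective_sft_loss(token_losses, segment_ids, important_segments):
    """Average per-token loss only over tokens in important segments.

    token_losses:       per-token cross-entropy values, shape (T,)
    segment_ids:        segment index of each token, shape (T,)
    important_segments: ids selected by the attribution metrics; all
                        other tokens have their loss masked out.
    """
    losses = np.asarray(token_losses, dtype=float)
    mask = np.isin(segment_ids, list(important_segments)).astype(float)
    denom = mask.sum()
    if denom == 0:  # no important tokens: contribute no gradient
        return 0.0
    return float((losses * mask).sum() / denom)
```

Masking the denominator as well as the numerator keeps the per-token loss scale comparable across examples with different numbers of important tokens.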
Principled importance definition using integrated gradients

The authors adopt integrated gradients as a principled method to measure segment importance, capturing both direct and indirect influences on final answer prediction. This approach addresses limitations of sequential appending or leave-one-out methods that underestimate indirectly contributive segments.

10 retrieved papers
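For reference, this contribution builds on the standard integrated-gradients definition (Sundararajan et al., 2017), which attributes to input dimension $i$ of a model $F$, given input $x$ and baseline $x'$:

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha
$$

In practice the integral is approximated by a Riemann sum over a few dozen interpolation steps. Because the path integral accumulates gradients at every point between baseline and input, it can register a segment's indirect influence on the answer, which leave-one-out ablation or sequential appending may miss. The choice of baseline $x'$ (e.g., zero or padding embeddings) is not specified in this report.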

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the retrieved top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though a signal constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Segment-level attribution metrics using integrated gradients

The authors introduce two segment-level metrics derived from integrated gradient attribution: attribution strength, which measures the overall magnitude of a segment's influence on model predictions, and direction consistency, which captures whether a segment exhibits uniform or mixed attribution directions. These metrics enable identification of important reasoning segments.

Contribution: Segment-level selective learning framework

The authors propose a framework that identifies important segments based on high attribution strength and moderate consistency, then applies selective supervised finetuning only on these segments while masking loss for unimportant ones. This approach acts as implicit regularization by preventing overfitting to uninformative content.

Contribution: Principled importance definition using integrated gradients

The authors adopt integrated gradients as a principled method to measure segment importance, capturing both direct and indirect influences on final answer prediction. This approach addresses limitations of sequential appending or leave-one-out methods that underestimate indirectly contributive segments.