Abstract:

Practical classification requires both high predictive accuracy and reliable confidence for human-AI collaboration. Because a high-quality dataset is expensive and sometimes impossible to obtain, learning with noisy labels (LNL) is of great importance. State-of-the-art works propose many denoising approaches that correct label noise categorically, i.e., by changing a label from one class to another. While effective at improving accuracy, they are less effective for learning reliable confidence, especially as the number of classes grows and ambiguous samples become more common. In addition, traditional approaches usually curate the training dataset (e.g., by reweighting samples or correcting labels) by learning normalities intrinsically from the noisy dataset itself. Curation performance can suffer when the noise ratio is high enough to form a polluting normality.

In this work, we propose a training-time data-curation framework, TrainRef, which uniformly addresses predictive accuracy and confidence calibration by (1) using an extrinsic small set of reference samples D_ref to avoid normality pollution and (2) curating labels into class distributions instead of categorical classes to handle sample ambiguity. Our insight is that the extrinsic information allows us to select clean samples more precisely even when |D_ref| equals the number of classes (i.e., one sample per class). Technically, we design (1) a reference augmentation technique that selects clean samples from the dataset based on D_ref, and (2) a model-dataset co-evolving technique that yields a near-perfect embedding space, which is used to vote on the class distribution serving as the label of a noisy sample. Extensive experiments on CIFAR-100, Animal10N, and WebVision demonstrate that TrainRef outperforms state-of-the-art denoising techniques (DISC, L2B, and DivideMix) and model calibration techniques (label smoothing, Mixup, and temperature scaling). Furthermore, our user study shows that the model confidence trained with TrainRef aligns well with human intuition. Further demonstrations, proofs, and experimental details are available at https://sites.google.com/view/train-ref.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. The results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes TrainRef, a framework that curates noisy labels into class distributions rather than categorical corrections, using an extrinsic reference set to avoid normality pollution. It resides in the 'Soft Label and Distribution-Based Correction' leaf, which contains only three papers total, including this work. This leaf sits within the broader 'Label Correction and Soft Label Learning' branch, indicating a moderately sparse research direction focused on probabilistic label refinement. The small sibling count suggests this specific approach—combining distributional curation with extrinsic reference data—occupies a relatively underexplored niche within the noisy label learning landscape.

The taxonomy reveals neighboring leaves addressing related but distinct strategies: 'Noise Modeling and Transition Estimation' explicitly models corruption processes, while 'Data Ambiguation and Regularization Techniques' employs label smoothing and mixup. TrainRef diverges from these by introducing an external reference set rather than learning noise patterns intrinsically from the corrupted dataset. Nearby branches like 'Sample Selection and Confidence Estimation' prioritize identifying clean samples over correcting labels, and 'Uncertainty Estimation and Calibration' focuses on post-hoc confidence adjustment. The framework's dual emphasis on accuracy and calibration positions it at the intersection of label correction and uncertainty quantification, bridging typically separate research threads.

Across three identified contributions, the analysis examined thirteen candidate papers with no clear refutations found. The core TrainRef framework and reference augmentation technique each faced six candidates without overlapping prior work, while the co-evolving embedding technique examined one candidate. This limited search scope—thirteen papers from semantic matching—suggests the specific combination of distributional curation, extrinsic reference sets, and joint accuracy-calibration objectives has not been extensively explored in the examined literature. However, the modest candidate pool means the analysis captures top semantic matches rather than an exhaustive field survey, leaving open the possibility of related work in less semantically similar papers.

Given the sparse taxonomy leaf and absence of refutations among examined candidates, the work appears to occupy a distinct position within soft label correction methods. The limited search scope—thirteen candidates rather than hundreds—means this assessment reflects novelty relative to closely related prior work, not the entire field. The framework's integration of extrinsic reference data with distributional label curation represents a methodological departure from intrinsic noise modeling approaches, though broader field coverage would strengthen confidence in this assessment.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 13
Refutable Papers: 0

Research Landscape Overview

Core task: Learning with noisy labels for accurate prediction and reliable confidence. The field addresses the challenge of training robust models when training labels are corrupted, while simultaneously ensuring that model predictions come with trustworthy confidence estimates.

The taxonomy reveals a rich landscape organized around several complementary strategies. Sample Selection and Confidence Estimation methods (e.g., Confident Learning[8], Confidence Tracking Selection[16]) identify and filter unreliable examples based on model confidence signals. Label Correction and Soft Label Learning approaches, including works like Soft Labels[4] and Dirichlet Calibration[29], refine or soften noisy annotations to better reflect underlying uncertainty. Robust Loss Design and Optimization branches focus on loss functions that downweight or resist label corruption, while Meta-Learning and Adaptive Strategies (e.g., Meta Learning Modulation[9]) dynamically adjust training procedures. Domain-Specific Applications and Specialized Learning Settings address unique noise patterns in areas such as medical imaging (Dual Uncertainty Medical[32]) and remote sensing (Earth Observation Noise[14]), and Uncertainty Estimation and Calibration ensures that confidence scores remain well-calibrated even under noise.

Recent work highlights tensions between correcting labels versus learning to trust model confidence under corruption. TrainRef[0] sits within the Soft Label and Distribution-Based Correction cluster, emphasizing probabilistic label refinement rather than hard sample rejection. This contrasts with nearby confidence-driven selection methods like Confident Classifiers[3], which prioritize identifying clean samples, and calibration-focused approaches such as Dirichlet Calibration[29], which adjust output distributions post-hoc.
A central open question is whether soft label strategies can maintain reliable confidence without explicit calibration mechanisms, especially when noise is instance-dependent (Instance Dependent Confidence[34]) or when fairness concerns arise (Fairness Robustness Selection[12]). TrainRef[0] contributes to this dialogue by integrating label correction with confidence reliability, bridging the gap between purely corrective and purely selective paradigms.

Claimed Contributions

TrainRef framework for distributional label curation with extrinsic reference set

The authors introduce TrainRef, a framework that curates noisy training data by converting categorical labels into class distributions and using a small trusted reference set to avoid learning polluted normalities from the noisy dataset itself. This approach uniformly improves both prediction accuracy and confidence calibration.

6 retrieved papers
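The practical effect of distributional curation is that the training target becomes a class distribution rather than a one-hot label. As a minimal sketch of how a model could be trained against such curated soft labels (this is not the authors' actual implementation; the function and variable names are hypothetical), a soft cross-entropy loss suffices:

```python
import numpy as np

def soft_cross_entropy(logits, label_dist):
    """Cross-entropy between predicted logits and a curated class
    distribution (soft label) rather than a one-hot target.

    logits:     (N, C) raw model outputs
    label_dist: (N, C) curated class distributions, rows summing to 1
    """
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # expected negative log-likelihood under the soft label
    return float(-(label_dist * log_probs).sum(axis=1).mean())
```

With a one-hot `label_dist` this reduces to ordinary cross-entropy; a flatter distribution on an ambiguous sample no longer penalizes the model for spreading probability mass, which is the mechanism by which calibration can improve alongside accuracy.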
Reference augmentation technique for clean sample selection

The authors develop a reference augmentation technique that leverages the extrinsic reference set D_ref to identify and select clean samples from the noisy dataset, enabling effective denoising even when the reference set contains only one sample per class.

6 retrieved papers
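A minimal sketch of how reference-based clean-sample selection could work (a simplified stand-in for the paper's technique, which additionally augments the reference set; the names and the threshold are assumptions): a sample is kept only if its embedding is sufficiently similar to the reference embedding of its labeled class.

```python
import numpy as np

def select_clean(embeddings, noisy_labels, ref_embeddings, threshold=0.7):
    """Select samples whose embedding is close to the reference
    embedding of their (possibly noisy) labeled class.

    embeddings:     (N, d) sample embeddings
    noisy_labels:   (N,) integer labels, possibly corrupted
    ref_embeddings: (C, d) one reference embedding per class
    Returns a boolean mask over the N samples.
    """
    # L2-normalize so the dot product becomes cosine similarity
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    ref = ref_embeddings / np.linalg.norm(ref_embeddings, axis=1, keepdims=True)
    # similarity of each sample to the reference of its own labeled class
    sims = np.einsum("nd,nd->n", emb, ref[noisy_labels])
    return sims >= threshold
```

A mislabeled sample tends to sit far from the reference of its assigned class in a reasonable embedding space, which is why even one reference per class can drive selection.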
Model-dataset co-evolving technique for near-perfect embedding space

The authors propose a co-evolving technique that iteratively refines both the model embedding space and the curated dataset, producing a high-quality embedding space used to vote on distributional labels for noisy samples.

1 retrieved paper
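The voting step described above can be sketched as a similarity-weighted k-nearest-neighbour vote in the learned embedding space (a simplified illustration, not the authors' implementation; `k`, the temperature, and all names are assumptions):

```python
import numpy as np

def vote_distribution(sample_emb, clean_embs, clean_labels, num_classes,
                      k=5, temp=0.1):
    """Derive a class-distribution label for one noisy sample from its
    k most similar clean samples in the embedding space."""
    # cosine similarity of the sample to every clean embedding
    sims = clean_embs @ sample_emb
    sims = sims / (np.linalg.norm(clean_embs, axis=1)
                   * np.linalg.norm(sample_emb) + 1e-12)
    topk = np.argsort(-sims)[:k]
    weights = np.exp(sims[topk] / temp)  # sharper votes for closer samples
    dist = np.zeros(num_classes)
    for idx, w in zip(topk, weights):
        dist[clean_labels[idx]] += w
    return dist / dist.sum()
```

An unambiguous sample collects nearly all its votes from one class and ends up with a near-one-hot label, while an ambiguous sample sitting between clusters receives a genuinely mixed distribution.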

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

TrainRef framework for distributional label curation with extrinsic reference set

The authors introduce TrainRef, a framework that curates noisy training data by converting categorical labels into class distributions and using a small trusted reference set to avoid learning polluted normalities from the noisy dataset itself. This approach uniformly improves both prediction accuracy and confidence calibration.

Contribution

Reference augmentation technique for clean sample selection

The authors develop a reference augmentation technique that leverages the extrinsic reference set D_ref to identify and select clean samples from the noisy dataset, enabling effective denoising even when the reference set contains only one sample per class.

Contribution

Model-dataset co-evolving technique for near-perfect embedding space

The authors propose a co-evolving technique that iteratively refines both the model embedding space and the curated dataset, producing a high-quality embedding space used to vote on distributional labels for noisy samples.