Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms
Overview
Overall Novelty Assessment
The paper proposes the first systematic benchmarking framework for positive-unlabeled (PU) learning algorithms, addressing the inconsistent experimental settings that hinder fair comparison. Within the taxonomy, it resides in the 'Comprehensive Benchmarking and Evaluation Protocols' leaf under 'Evaluation Methodologies and Benchmarking Frameworks'. This leaf contains only two papers: the work under review and one sibling ('Evaluating PU Learning'). The sparse population suggests this is an emerging research direction rather than a crowded subfield, with limited prior work establishing standardized evaluation protocols for PU learning.
The taxonomy reveals that while core PU algorithms and specialized paradigms are well-developed (with multiple leaves containing 3-5 papers each), the evaluation methodology branch remains relatively underpopulated. Neighboring leaves address 'Performance Metrics and Measurement' (3 papers) and 'Model Selection and Validation Strategies' (1 paper), indicating that measurement and validation are recognized challenges but lack comprehensive benchmarking frameworks. The paper bridges evaluation infrastructure with algorithmic diversity across risk estimation, iterative methods, and robustness challenges, connecting to multiple branches while focusing specifically on systematic comparison protocols.
Among the 30 candidates examined, none clearly refutes any of the three main contributions. For the first contribution (the systematic benchmark), 10 candidates were examined and none constituted a refuting match, suggesting novelty in establishing comprehensive evaluation infrastructure. For the second contribution (model selection without negative validation data), the 10 candidates examined likewise yielded no refutations, indicating that it addresses a previously unresolved practical challenge. For the third contribution (identification of internal label shift and its calibration), no overlapping prior work was found among the 10 candidates examined. These findings reflect the top-30 semantic matches rather than exhaustive coverage, but the absence of refutations across all three contributions suggests substantive novelty within the examined literature.
Given the sparse taxonomy leaf (2 papers total) and zero refutations across 30 candidates examined, the work appears to address a recognized gap in PU learning evaluation. The analysis covers top-K semantic matches and does not claim exhaustive field coverage, so additional related work may exist beyond this scope. However, the convergence of taxonomy structure and contribution-level statistics suggests the paper occupies relatively unexplored territory in establishing standardized, fair benchmarking protocols for a mature algorithmic landscape.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors develop a unified experimental framework that enables systematic and fair comparison of state-of-the-art positive-unlabeled learning algorithms. This benchmark provides careful and unified implementations of data generation, algorithm training, and evaluation processes.
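To make the intended scope of such a framework concrete, the following is a minimal sketch of a unified PU benchmark loop, assuming SCAR labeling. The dataset, the function names (`make_pu_split`, `naive_pu_learner`), and the treat-unlabeled-as-negative baseline are illustrative stand-ins, not the paper's actual components or API.

```python
# Illustrative sketch of a unified PU benchmark loop (not the paper's API).
# Assumes SCAR labeling; the naive learner is a stand-in for any PU algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

def make_pu_split(X, y, label_frequency=0.5, scenario="one-sample"):
    """Turn fully labeled data into PU data under SCAR.

    one-sample: positives are labeled with probability c inside a single
                sample; the remainder of that sample becomes the unlabeled set.
    two-sample: the unlabeled set is drawn as an independent sample of the
                marginal distribution (so it keeps the original class prior).
    """
    pos = np.flatnonzero(y == 1)
    labeled = pos[rng.random(pos.size) < label_frequency]
    if scenario == "one-sample":
        unlabeled = np.setdiff1d(np.arange(len(y)), labeled)
    else:  # two-sample: bootstrap an independent copy of the marginal
        unlabeled = rng.choice(len(y), size=len(y), replace=True)
    return X[labeled], X[unlabeled]

def naive_pu_learner(X_pos, X_unl):
    """Baseline PU learner: treat all unlabeled points as negatives."""
    X = np.vstack([X_pos, X_unl])
    s = np.r_[np.ones(len(X_pos), dtype=int), np.zeros(len(X_unl), dtype=int)]
    return LogisticRegression(max_iter=1000).fit(X, s)

X, y = make_classification(n_samples=4000, n_informative=8,
                           weights=[0.6], random_state=0)
X_train, y_train, X_test, y_test = X[:3000], y[:3000], X[3000:], y[3000:]

for scenario in ["one-sample", "two-sample"]:
    X_pos, X_unl = make_pu_split(X_train, y_train, 0.5, scenario)
    clf = naive_pu_learner(X_pos, X_unl)  # swap in any PU algorithm here
    score = balanced_accuracy_score(y_test, clf.predict(X_test))
    print(f"{scenario:10s}  naive-PU  balanced acc = {score:.3f}")
```

A full benchmark would iterate the outer loop over datasets, label frequencies, class priors, and algorithm families under a shared evaluation protocol; the sketch only fixes the shape of that loop.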
The authors address the unrealistic practice of using negative data in validation sets by proposing and analyzing model selection criteria (proxy accuracy and proxy AUC score) that rely only on positive and unlabeled validation data, with theoretical and empirical validation.
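As an illustration of how such criteria can be computed in practice, the sketch below estimates accuracy and AUC from positive and unlabeled validation data alone, assuming SCAR, a scikit-learn-style classifier (exposing `predict` and `decision_function`), and a known or separately estimated class prior `pi`. The identities used are standard PU relations and should not be read as the paper's exact definitions of proxy accuracy and proxy AUC.

```python
# Illustrative estimators of accuracy and AUC from positive/unlabeled
# validation data only (SCAR assumed, class prior `pi` known or estimated).
# These use standard PU identities, not necessarily the paper's proxies.
import numpy as np

def proxy_accuracy(clf, X_val_pos, X_val_unl, pi):
    """Estimate accuracy using only P and U validation data.

    err = P(f=1, y=0) + P(f=0, y=1)
        = [P(f=1) - pi * recall] + pi * (1 - recall),
    where P(f=1) is estimated on U (assumed to follow the marginal p(x))
    and recall is estimated on the labeled positives.
    """
    recall = np.mean(clf.predict(X_val_pos) == 1)
    pred_pos_rate = np.mean(clf.predict(X_val_unl) == 1)
    err = (pred_pos_rate - pi * recall) + pi * (1.0 - recall)
    return 1.0 - err

def proxy_auc(clf, X_val_pos, X_val_unl, pi):
    """Estimate the P-vs-N AUC from the P-vs-U ranking AUC.

    Since U is a pi / (1 - pi) mixture of positives and negatives,
    AUC_PU = pi * 0.5 + (1 - pi) * AUC_PN, hence
    AUC_PN = (AUC_PU - pi / 2) / (1 - pi).
    """
    s_pos = clf.decision_function(X_val_pos)
    s_unl = clf.decision_function(X_val_unl)
    # AUC_PU: probability that a positive outranks an unlabeled point.
    auc_pu = np.mean(s_pos[:, None] > s_unl[None, :])
    return (auc_pu - pi / 2.0) / (1.0 - pi)
```

Both estimators assume the unlabeled validation set follows the marginal distribution; under the one-sample setting discussed next, the shifted prior would first need to be corrected.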
The authors identify for the first time that the one-sample setting causes an internal label shift in unlabeled training data, which degrades performance of two-sample algorithms. They propose a calibration technique (Algorithm 1) with theoretical guarantees to ensure fair cross-family comparisons.
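The prior-shift arithmetic underlying this observation can be made explicit. The sketch below computes how the positive-class prior of the unlabeled set changes once labeled positives are removed from a single sample; it illustrates the quantity a calibration step must correct for, not the paper's Algorithm 1, whose details are not reproduced in this report.

```python
# Sketch of the prior shift behind the internal label shift in the one-sample
# setting; this is NOT the paper's Algorithm 1, only the standard relationship
# a calibration step has to account for.

def shifted_unlabeled_prior(pi: float, label_frequency: float) -> float:
    """Positive-class prior inside the unlabeled set of a one-sample split.

    A single sample with true prior `pi` has a fraction pi * c of its points
    labeled as positives (c = label frequency).  Removing them leaves an
    unlabeled set whose positive fraction is  pi * (1 - c) / (1 - pi * c),
    which is what a two-sample algorithm actually sees instead of pi.
    """
    c = label_frequency
    return pi * (1.0 - c) / (1.0 - pi * c)

# Example: true prior 0.4, half of the positives labeled.
pi, c = 0.4, 0.5
pi_shifted = shifted_unlabeled_prior(pi, c)   # 0.2 / 0.8 = 0.25
print(f"prior seen by a two-sample algorithm: {pi_shifted:.2f} (true prior {pi})")
# One simple correction is to hand a two-sample risk estimator (e.g. uPU/nnPU)
# `pi_shifted` rather than `pi` when it is trained on one-sample unlabeled
# data, or to reweight/resample U back toward the original prior.
```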
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Evaluating the Positive Unlabeled Learning Problem PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
First PU learning benchmark for systematic algorithm comparison
The authors develop a unified experimental framework that enables systematic and fair comparison of state-of-the-art positive-unlabeled learning algorithms. This benchmark provides careful and unified implementations of data generation, algorithm training, and evaluation processes.
[2] A novel observation points-based positive-unlabeled learning algorithm PDF
[3] Self-PU: Self boosted and calibrated positive-unlabeled training PDF
[4] Online positive and unlabeled learning PDF
[5] Meta-learning for Positive-unlabeled Classification PDF
[7] Theoretical comparisons of positive-unlabeled learning against positive-negative learning PDF
[10] Angular Regularization for Positive-Unlabeled Learning on the Hypersphere PDF
[36] An adaptive asymmetric loss function for positive unlabeled learning PDF
[48] PULNS: Positive-unlabeled learning with effective negative sample selector PDF
[51] Re-Examine Distantly Supervised NER: A New Benchmark and a Simple Approach PDF
[52] Global and local learning from positive and unlabeled examples PDF
Model selection criteria for PU learning without negative validation data
The authors address the unrealistic practice of using negative data in validation sets by proposing and analyzing model selection criteria (proxy accuracy and proxy AUC score) that rely only on positive and unlabeled validation data, with theoretical and empirical validation.
[53] Analysis of learning from positive and unlabeled data PDF
[54] Building text classifiers using positive and unlabeled examples PDF
[55] Robust model selection for positive and unlabeled learning with constraints PDF
[56] Fairness-aware model-agnostic positive and unlabeled learning PDF
[57] Learning from positive and unlabeled data with a selection bias PDF
[58] Learning from positive and unlabeled examples PDF
[59] Towards accurate model selection in deep unsupervised domain adaptation PDF
[60] PLUS: Predicting cancer metastasis potential based on positive and unlabeled learning PDF
[61] A bagging SVM to learn from positive and unlabeled examples PDF
[62] Positive and Unlabeled Data: Model, Estimation, Inference, and Classification PDF
Identification of internal label shift problem and calibration approach
The authors identify for the first time that the one-sample setting causes an internal label shift in unlabeled training data, which degrades performance of two-sample algorithms. They propose a calibration technique (Algorithm 1) with theoretical guarantees to ensure fair cross-family comparisons.