PU-BENCH: A UNIFIED BENCHMARK FOR RIGOROUS AND REPRODUCIBLE PU LEARNING
Overview
Overall Novelty Assessment
The paper introduces PU-Bench, a unified benchmarking framework for positive-unlabeled learning that standardizes data generation, integrates 16 state-of-the-art methods, and provides reproducible evaluation protocols. Within the taxonomy, it resides in the 'Theoretical Analysis and Comparative Studies' leaf under 'Core PU Learning Methodologies and Theoretical Frameworks,' alongside only one sibling paper (PU versus PN theoretical comparison). This leaf represents a sparse research direction focused on rigorous comparative analysis rather than novel algorithmic contributions, suggesting that systematic benchmarking efforts remain underexplored in the PU learning literature despite the proliferation of methods across other branches.
The taxonomy reveals a densely populated field with over fifty papers distributed across methodological innovations (risk estimation, sample selection, representation learning), data challenges (selection bias, class prior estimation), and domain applications (biomedical, fraud detection, computer vision). PU-Bench connects most directly to the 'Core Methodologies' branch by evaluating methods from multiple subtopics—unbiased risk estimators, pseudo-labeling strategies, and contrastive approaches—but diverges by focusing on empirical comparison rather than proposing new algorithms. Neighboring leaves like 'Unbiased Risk Estimation Approaches' and 'Reliable Negative Identification' contain the algorithmic work that PU-Bench evaluates, positioning this contribution as infrastructure for the broader research ecosystem.
Among thirty candidates examined through semantic search and citation expansion, none clearly refutes any of the three core contributions: the unified benchmarking framework (ten candidates examined, none refuting), the large-scale empirical study (ten candidates, none refuting), and the analysis with actionable guidelines (ten candidates, none refuting). This absence of overlapping prior work within the limited search scope suggests that comprehensive, standardized benchmarking infrastructure for PU learning has not been previously established at this scale. The framework contribution appears most distinctive, as existing comparative studies like the sibling paper focus on theoretical guarantees rather than reproducible empirical evaluation across diverse methods and datasets.
Based on the top-thirty semantic matches and taxonomy structure, the work addresses a recognized gap in PU learning research: the lack of standardized evaluation infrastructure that has led to inconsistent experimental settings and irreproducible findings. While the limited search scope cannot guarantee exhaustiveness, the absence of refuting candidates across all contributions and the sparse population of the 'Theoretical Analysis and Comparative Studies' leaf suggest that this benchmarking effort occupies relatively uncontested ground within the field's current landscape.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors present PU-Bench, an open-source framework that provides a configurable PU data generator, an integrated suite of 16 state-of-the-art PU methods, and standardized protocols for reproducible assessment of positive-unlabeled learning algorithms.
The authors perform a systematic evaluation benchmarking 16 representative PU methods across 8 diverse datasets with 15 distinct labeling ratios under 4 labeling assumptions, totaling more than 2,560 evaluations.
The authors deliver comprehensive analysis revealing strengths and limitations of current PU methods and propose practical, data-driven guidelines for algorithm selection and design based on effectiveness, efficiency, and robustness considerations.
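To make the first contribution concrete, the sketch below illustrates what a configurable PU data generator typically does under the SCAR (Selected Completely At Random) labeling assumption, one of the standard assumptions such a generator would support. The function name `make_pu_labels` and its signature are illustrative placeholders, not PU-Bench's actual API.

```python
import numpy as np

def make_pu_labels(y, label_ratio, rng=None):
    """Convert fully supervised binary labels into PU labels under SCAR:
    each positive example is labeled with the same probability,
    independent of its features."""
    rng = np.random.default_rng(rng)
    y = np.asarray(y)
    positives = np.flatnonzero(y == 1)
    n_labeled = int(round(label_ratio * positives.size))
    labeled = rng.choice(positives, size=n_labeled, replace=False)
    s = np.zeros_like(y)  # s == 1: observed positive; s == 0: unlabeled
    s[labeled] = 1
    return s

# Example: 6 positives at labeling ratio 0.5 yields 3 observed positives;
# the remaining positives and all negatives become "unlabeled".
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
s = make_pu_labels(y, 0.5, rng=0)
```

Other labeling assumptions (e.g. SAR, where the labeling probability depends on the features) would replace the uniform `rng.choice` with an instance-dependent sampling scheme.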
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[16] Theoretical comparisons of positive-unlabeled learning against positive-negative learning
Contribution Analysis
Detailed comparisons for each claimed contribution
PU-Bench unified open-source benchmarking framework
The authors present PU-Bench, an open-source framework that provides a configurable PU data generator, an integrated suite of 16 state-of-the-art PU methods, and standardized protocols for reproducible assessment of positive-unlabeled learning algorithms.
[51] Unleashing the strengths of unlabelled data in deep learning-assisted pan-cancer abdominal organ quantification: the FLARE22 challenge
[52] Usb: A unified semi-supervised learning benchmark for classification
[53] Multiscale positive-unlabeled detection of ai-generated texts
[54] Advancing emotional analysis with large language models
[55] Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform
[56] Learning Gait Representation From Massive Unlabelled Walking Videos: A Benchmark
[57] SoK: The Impact of Unlabelled Data in Cyberthreat Detection
[58] Deep representation features from DreamDIAXMBD improve the analysis of data-independent acquisition proteomics
[59] Queryable and Interpretable PU Learning Through Probabilistic Circuits
[60] Benchmarking anomaly detection algorithms in an industrial context: dealing with scarce labels and multiple positive types
Large-scale comprehensive empirical study
The authors perform a systematic evaluation benchmarking 16 representative PU methods across 8 diverse datasets with 15 distinct labeling ratios under 4 labeling assumptions, totaling more than 2,560 evaluations.
[26] Conditional generative positive and unlabeled learning
[28] A novel observation points-based positive-unlabeled learning algorithm
[63] BiCSA-PUL: binary crow search algorithm for enhancing positive and unlabeled learning
[68] Towards Improved Illicit Node Detection with Positive-Unlabelled Learning
[69] A recent survey on instance-dependent positive and unlabeled learning
[70] ESA: Example Sieve Approach for Multi-Positive and Unlabeled Learning
[71] Spotting fake reviews via collective positive-unlabeled learning
[72] Weighted Contrastive Learning With Hard Negative Mining for Positive and Unlabeled Learning
[73] A Source Code Vulnerability Detection Method Based on Positive-Unlabeled Learning
[74] Uncertainty-Aware Neighbor Calibration for Positive and Unlabeled Learning in Large Machine Learning Models
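The scale of the reported study (16 methods, 8 datasets, 15 labeling ratios, 4 labeling assumptions) can be sanity-checked by enumerating the full experimental cross-product. All names below are illustrative placeholders, not PU-Bench's actual method, dataset, or assumption identifiers; the exact accounting behind the paper's figure of "more than 2,560 evaluations" is not given in this summary, but the full grid comfortably exceeds it.

```python
from itertools import product

# Placeholder identifiers matching the reported study dimensions.
methods = [f"method_{i}" for i in range(16)]
datasets = [f"dataset_{i}" for i in range(8)]
label_ratios = [round(0.01 + i * 0.035, 3) for i in range(15)]  # illustrative grid
assumptions = [f"assumption_{i}" for i in range(4)]  # e.g. SCAR, SAR, ...

# Every (method, dataset, ratio, assumption) combination is one evaluation cell.
configs = list(product(methods, datasets, label_ratios, assumptions))
print(len(configs))  # 7680: full cross-product, consistent with ">2,560 evaluations"
```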
In-depth analysis and actionable guidelines
The authors deliver comprehensive analysis revealing strengths and limitations of current PU methods and propose practical, data-driven guidelines for algorithm selection and design based on effectiveness, efficiency, and robustness considerations.