Accessible, Realistic, and Fair Evaluation of Positive-Unlabeled Learning Algorithms

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: positive-unlabeled learning, weakly supervised learning.
Abstract:

Positive-unlabeled (PU) learning is a weakly supervised binary classification problem in which the goal is to learn a binary classifier from only positive and unlabeled data, without access to negative data. In recent years, many PU learning algorithms have been developed to improve model performance. However, their experimental settings are highly inconsistent, making it difficult to identify which algorithm actually performs better. In this paper, we propose the first PU learning benchmark to systematically compare PU learning algorithms. During our implementation, we identify subtle yet critical factors that affect the realistic and fair evaluation of PU learning algorithms. On the one hand, many PU learning algorithms rely on a validation set that includes negative data for model selection. This is unrealistic in traditional PU learning settings, where no negative data are available. To handle this problem, we systematically investigate model selection criteria for PU learning. On the other hand, the problem settings and solutions of PU learning fall into two distinct families, i.e., the one-sample and two-sample settings. Existing evaluation protocols, however, are heavily biased towards the one-sample setting and neglect the significant difference between the two. We identify the internal label shift problem of unlabeled training data in the one-sample setting and propose a simple yet effective calibration approach to ensure fair comparisons within and across families. We hope our framework will provide an accessible, realistic, and fair environment for evaluating PU learning algorithms in the future.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes the first systematic benchmarking framework for positive-unlabeled learning algorithms, addressing inconsistent experimental settings that hinder fair comparison. Within the taxonomy, it resides in the 'Comprehensive Benchmarking and Evaluation Protocols' leaf under 'Evaluation Methodologies and Benchmarking Frameworks'. This leaf contains only two papers, including the original work and one sibling ('Evaluating PU Learning'). The sparse population suggests this is an emerging research direction rather than a crowded subfield, with limited prior work establishing standardized evaluation protocols for PU learning.

The taxonomy reveals that while core PU algorithms and specialized paradigms are well-developed (with multiple leaves containing 3-5 papers each), the evaluation methodology branch remains relatively underpopulated. Neighboring leaves address 'Performance Metrics and Measurement' (3 papers) and 'Model Selection and Validation Strategies' (1 paper), indicating that measurement and validation are recognized challenges but lack comprehensive benchmarking frameworks. The paper bridges evaluation infrastructure with algorithmic diversity across risk estimation, iterative methods, and robustness challenges, connecting to multiple branches while focusing specifically on systematic comparison protocols.

Among 30 candidates examined, none clearly refute the three main contributions. The first contribution (systematic benchmark) examined 10 candidates with zero refutable matches, suggesting novelty in establishing comprehensive evaluation infrastructure. The second contribution (model selection without negative validation data) also examined 10 candidates with no refutations, indicating this addresses a previously unresolved practical challenge. The third contribution (internal label shift identification and calibration) similarly found no overlapping prior work among 10 candidates. The limited search scope means these findings reflect top-30 semantic matches rather than exhaustive coverage, but the absence of refutations across all contributions suggests substantive novelty within the examined literature.

Given the sparse taxonomy leaf (2 papers total) and zero refutations across 30 candidates examined, the work appears to address a recognized gap in PU learning evaluation. The analysis covers top-K semantic matches and does not claim exhaustive field coverage, so additional related work may exist beyond this scope. However, the convergence of taxonomy structure and contribution-level statistics suggests the paper occupies relatively unexplored territory in establishing standardized, fair benchmarking protocols for a mature algorithmic landscape.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: benchmarking and evaluation of positive-unlabeled learning algorithms. The field of positive-unlabeled (PU) learning addresses scenarios where only a subset of positive examples is labeled while the remaining data is unlabeled, mixing hidden positives with true negatives. The taxonomy reflects a mature research landscape organized around several complementary dimensions. Evaluation Methodologies and Benchmarking Frameworks focus on systematic protocols for comparing algorithms, as seen in works like Evaluating PU Learning[1] and Accessible Fair PU Evaluation[0]. Core PU Learning Algorithms and Theoretical Foundations encompass foundational methods and theoretical guarantees, including risk estimators and class-prior estimation techniques such as those explored in PU versus PN Theory[7] and Mixture Proportion Estimation[26]. Specialized PU Learning Paradigms branch into extensions like online learning (Online PU Learning[4]), meta-learning (Meta PU Learning[5]), and self-paced strategies (Self PU[3]). Robustness and Data Quality Challenges address noise, label corruption, and distribution shifts, while Advanced Representation and Regularization Techniques explore contrastive learning (PU Contrastive Learning[11]) and geometric constraints (Angular Regularization Hypersphere[10]). Domain-Specific Applications span cybersecurity, healthcare, and recommendation systems, and Methodological Surveys provide comprehensive overviews like PU Classifier Review[6].

Recent activity highlights tensions between algorithmic innovation and rigorous evaluation. Many studies introduce novel loss functions, regularization schemes, or architectural designs, yet standardized benchmarking remains challenging due to varying assumptions about class priors, label noise, and data distributions.
Works addressing robustness, such as Positive Distribution Pollution[12] and Confidence Instance Noise[15], underscore the difficulty of maintaining performance when positive labels are corrupted or when unlabeled data deviates from training assumptions. Accessible Fair PU Evaluation[0] sits squarely within the Evaluation Methodologies branch, emphasizing the need for transparent, reproducible benchmarking protocols that can fairly compare diverse PU algorithms. Its focus on accessibility and fairness in evaluation complements earlier efforts like Evaluating PU Learning[1], which laid groundwork for systematic comparison, and contrasts with algorithm-centric works such as Meta PU Learning[5] or Self PU[3] that prioritize novel learning strategies over evaluation infrastructure. By addressing gaps in how PU methods are assessed, Accessible Fair PU Evaluation[0] aims to provide the community with tools to navigate the growing diversity of approaches and ensure that empirical claims rest on solid experimental foundations.
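The risk-estimation line referenced above (e.g., PU versus PN Theory[7]) centers on rewriting the classification risk using only positive and unlabeled data plus an assumed class prior. As a hedged illustration (standard non-negative PU risk in the literature, not code from the paper under review), the estimate can be sketched as:

```python
import numpy as np

def sigmoid_loss(scores, label):
    # Sigmoid surrogate loss l(z, y) = 1 / (1 + exp(y * z)):
    # small when the score agrees in sign with the label y in {+1, -1}.
    return 1.0 / (1.0 + np.exp(label * scores))

def nnpu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk estimate from positive/unlabeled scores.

    scores_p: classifier scores on labeled-positive data
    scores_u: classifier scores on unlabeled data
    prior:    assumed class prior pi = P(y = +1)
    """
    risk_p_pos = np.mean(sigmoid_loss(scores_p, +1))  # positives treated as +1
    risk_p_neg = np.mean(sigmoid_loss(scores_p, -1))  # positives treated as -1
    risk_u_neg = np.mean(sigmoid_loss(scores_u, -1))  # unlabeled treated as -1
    # Clamp the implied negative-class risk at zero; the unbiased version
    # (without max) can go negative and cause severe overfitting.
    neg_part = max(0.0, risk_u_neg - prior * risk_p_neg)
    return prior * risk_p_pos + neg_part
```

Varying assumptions about `prior` are exactly one of the inconsistencies a benchmark must control for: two papers using the same estimator with different priors report incomparable numbers.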

Claimed Contributions

First PU learning benchmark for systematic algorithm comparison

The authors develop a unified experimental framework that enables systematic and fair comparison of state-of-the-art positive-unlabeled learning algorithms. This benchmark provides careful and unified implementations of data generation, algorithm training, and evaluation processes.

10 retrieved papers
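The unified pipeline described above (data generation, algorithm training, evaluation) can be pictured as a small registry-driven loop. The names below are illustrative stand-ins, not the benchmark's actual API:

```python
from typing import Callable, Dict

# Hypothetical registries; the real benchmark's interfaces will differ.
ALGORITHMS: Dict[str, Callable] = {}
DATASETS: Dict[str, Callable] = {}

def register(registry, name):
    def deco(fn):
        registry[name] = fn
        return fn
    return deco

@register(DATASETS, "toy_pu")
def toy_pu():
    # Data generation: one fixed protocol producing (positives, unlabeled).
    return [1.0, 2.0], [0.5, -1.0, -2.0]

@register(ALGORITHMS, "threshold")
def threshold(pos, unl):
    # Training: a trivial stand-in "algorithm" thresholding scores at 0.
    return lambda x: 1 if x > 0 else 0

def run_benchmark():
    # Evaluation: every algorithm on every dataset, same protocol and metric.
    results = {}
    for dname, make_data in DATASETS.items():
        pos, unl = make_data()
        for aname, fit in ALGORITHMS.items():
            clf = fit(pos, unl)
            recall = sum(clf(x) for x in pos) / len(pos)
            results[(aname, dname)] = recall
    return results
```

The point of the registry pattern is that every algorithm sees identical data splits and is scored by identical code, which is what makes cross-algorithm comparisons fair.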
Model selection criteria for PU learning without negative validation data

The authors address the unrealistic practice of using negative data in validation sets by proposing and analyzing model selection criteria (proxy accuracy and proxy AUC score) that rely only on positive and unlabeled validation data, with theoretical and empirical validation.

10 retrieved papers
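The paper's exact proxy accuracy and proxy AUC criteria are its own; as a hedged example in the same spirit, a classic PU-only model-selection score (due to Lee and Liu) combines recall on a positive-only validation set with the fraction of unlabeled validation points predicted positive, requiring no negatives:

```python
import numpy as np

def pu_f1_proxy(preds_pos_val, preds_unl_val):
    """Lee-Liu style PU criterion r^2 / P(y_hat = 1).

    preds_pos_val: 0/1 predictions on a positive-only validation set
    preds_unl_val: 0/1 predictions on an unlabeled validation set
    The score tracks F1 on fully labeled data, yet uses no negatives.
    """
    recall = np.mean(preds_pos_val)          # r = P(y_hat = 1 | y = 1)
    frac_pred_pos = np.mean(preds_unl_val)   # estimate of P(y_hat = 1)
    if frac_pred_pos == 0.0:
        return 0.0                           # degenerate all-negative model
    return recall ** 2 / frac_pred_pos
```

Both quantities are estimable from positive and unlabeled data alone, which is what makes such criteria admissible for realistic PU model selection.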
Identification of internal label shift problem and calibration approach

The authors identify for the first time that the one-sample setting causes an internal label shift in unlabeled training data, which degrades performance of two-sample algorithms. They propose a calibration technique (Algorithm 1) with theoretical guarantees to ensure fair cross-family comparisons.

10 retrieved papers
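The paper's Algorithm 1 is not reproduced here. As a minimal sketch of why the shift arises, under the standard selected-completely-at-random assumption: in the two-sample setting the unlabeled set keeps the true positive fraction pi, whereas in the one-sample setting labeled positives are removed from the single draw, depressing the positive fraction of what remains:

```python
def unlabeled_prior_one_sample(prior, label_freq):
    """Positive fraction left in the unlabeled pool of a one-sample draw.

    prior:      true class prior pi = P(y = +1)
    label_freq: probability c that a positive example gets labeled (SCAR)

    P(y = +1 | unlabeled) = pi * (1 - c) / (1 - pi * c), which is
    strictly below pi whenever 0 < c and 0 < pi < 1: this gap is the
    internal label shift that hurts two-sample algorithms.
    """
    return prior * (1.0 - label_freq) / (1.0 - prior * label_freq)
```

A calibration in this spirit would hand a two-sample algorithm the shifted prior above (rather than pi) when it is trained on a one-sample unlabeled pool; the paper's actual calibration technique may differ in its details.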

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: First PU learning benchmark for systematic algorithm comparison

Contribution 2: Model selection criteria for PU learning without negative validation data

Contribution 3: Identification of internal label shift problem and calibration approach

(Full contribution descriptions appear under Claimed Contributions above; 10 candidate papers were compared per contribution, with no refutations found.)