Abstract:

Selective classification enhances the reliability of predictive models by allowing them to abstain from making uncertain predictions. In this work, we revisit the design of optimal selection functions through the lens of the Neyman–Pearson lemma, a classical result in statistics that characterizes the optimal rejection rule as a likelihood ratio test. We show that this perspective not only unifies the behavior of several post-hoc selection baselines, but also motivates the new approaches to selective classification that we propose here. A central focus of our work is the setting of covariate shift, where the input distribution at test time differs from that at training. This realistic and challenging scenario remains relatively underexplored in the context of selective classification. We evaluate our proposed methods across a range of vision and language tasks, including both supervised learning and vision-language models. Our experiments demonstrate that our Neyman–Pearson-informed methods consistently outperform existing baselines, indicating that likelihood ratio-based selection offers a robust mechanism for improving selective classification under covariate shift.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a Neyman–Pearson framework for selective classification under covariate shift, introducing two distance-based selector methods (∆-MDS and ∆-KNN) and providing comprehensive empirical evaluation. It resides in the 'Likelihood Ratio and Neyman-Pearson Approaches' leaf, which contains only two papers total. This represents a relatively sparse research direction within the broader taxonomy of fifty papers across thirty-six topics, suggesting the theoretical grounding of selective classification in classical statistical decision theory remains underexplored compared to post-hoc confidence methods or OOD detection approaches.

The taxonomy reveals neighboring work in sibling leaves under 'Theoretical Foundations and Optimal Selection Mechanisms,' including minimax analysis using transfer exponents and unified rejection frameworks across multiple loss functions. The paper's emphasis on likelihood ratios distinguishes it from the more crowded 'Post-Hoc Selection Strategies' branch (six papers across three leaves) and 'Selective Classification with Out-of-Distribution Detection' branch (thirteen papers across four leaves). The scope note for this leaf explicitly excludes heuristic confidence-based methods, positioning the work as a principled alternative to empirical baselines that dominate neighboring branches.

Among thirty candidates examined through limited semantic search, none clearly refuted the three main contributions. Ten candidates were examined for the Neyman–Pearson framework contribution with zero refutable matches, and the same held for the two novel distance-based methods and for the comprehensive evaluation under covariate shift. This absence of overlapping prior work within the examined scope suggests that the specific combination of classical statistical optimality theory with modern selective classification under distribution shift has received limited direct attention, though the search scale means potentially relevant work outside the top thirty semantic matches may exist.

The analysis reflects a bounded literature search rather than exhaustive coverage of the field. The sparse population of the taxonomy leaf and lack of refutable candidates among thirty examined papers suggest novelty within the explored scope, though the limited search depth means the assessment cannot definitively rule out related work in adjacent statistical learning theory or domain adaptation literature not captured by semantic similarity to the paper's abstract and introduction.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: selective classification under covariate shift. The field addresses scenarios in which a classifier must decide when to abstain from predicting because the test distribution differs from the training distribution. The taxonomy reveals several complementary perspectives. Theoretical Foundations and Optimal Selection Mechanisms explores principled approaches such as likelihood ratio tests and Neyman–Pearson frameworks for deriving provably optimal rejection rules. Selective Classification with Out-of-Distribution Detection focuses on identifying and rejecting inputs that fall outside the training manifold, often leveraging density estimation or self-supervised signals. Post-Hoc Selection Strategies and Confidence-Based Methods apply rejection mechanisms after standard training, typically using model confidence scores or ensembles. Calibration and Uncertainty Quantification emphasizes ensuring that predictive uncertainties align with true error rates, which is critical when distributions shift. End-to-End Training for Selective Classification integrates the rejection decision directly into the learning objective, while Learning Under Arbitrary Covariate Shift develops methods robust to general distributional changes. Domain-Specific Applications and Empirical Studies grounds these ideas in real-world contexts such as medical diagnosis, autonomous driving, and question answering.

Recent work highlights tensions between theoretical optimality and practical robustness. Plugin Selective Classification[7] and Calibrated Selective[8] illustrate post-hoc approaches that are simple to deploy but may lack guarantees under severe shift. In contrast, Optimal Selective Classification[0] resides within the Theoretical Foundations branch, emphasizing likelihood ratio and Neyman–Pearson principles to achieve provably optimal selection under known shift. Neighboring work such as Weighted Conformal[2] also leverages distributional knowledge to construct valid prediction sets, sharing the emphasis on principled uncertainty quantification. Meanwhile, branches like Selective Classification with Out-of-Distribution Detection (e.g., Self-supervised OOD[9], Reject OOD Models[13]) prioritize detecting novel inputs rather than optimizing coverage-risk trade-offs. The interplay between calibration (Calibration Multiclass Rejection[35]), domain adaptation (Efficient Covariate Shift[15]), and end-to-end learning (Learning to Reject[34]) remains an active area, with open questions around scalability, label shift, and the design of rejection losses that generalize across diverse covariate shifts.

Claimed Contributions

Neyman–Pearson framework for optimal selective classification

The authors apply the classical Neyman–Pearson lemma from statistics to selective classification, showing that the optimal selection function is a likelihood ratio test between correct and incorrect predictions. This theoretical framework provides principled guidance for designing selector functions in modern deep networks.

10 retrieved papers
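
The likelihood-ratio rule this contribution describes can be sketched in a few lines. Everything below is illustrative, not the paper's actual estimator: two Gaussians are fit to a hypothetical 1-D confidence score of correctly and incorrectly classified validation samples, and a prediction is accepted when the estimated ratio p(score | correct) / p(score | incorrect) clears a threshold tau.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical confidence scores for correctly and incorrectly
# classified validation samples (synthetic data for illustration).
rng = np.random.default_rng(0)
scores_correct = rng.normal(loc=2.0, scale=1.0, size=1000)
scores_incorrect = rng.normal(loc=0.0, scale=1.0, size=200)

# Fit a simple Gaussian density to each group.
mu_c, sd_c = scores_correct.mean(), scores_correct.std()
mu_i, sd_i = scores_incorrect.mean(), scores_incorrect.std()

def np_select(score, tau=1.0):
    """Accept the prediction iff the estimated likelihood ratio
    p(score | correct) / p(score | incorrect) is at least tau."""
    ratio = norm.pdf(score, mu_c, sd_c) / norm.pdf(score, mu_i, sd_i)
    return ratio >= tau

# A high score (typical of correct predictions) is accepted,
# a low score is rejected.
print(np_select(2.5), np_select(-1.0))  # True False
```

By the Neyman–Pearson lemma, no other selector with the same rejection rate on incorrect predictions accepts a larger fraction of correct ones, which is what makes this thresholded ratio the natural target for the selector designs below.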
Two novel distance-based selector methods: ∆-MDS and ∆-KNN

The authors propose ∆-MDS and ∆-KNN, which are modified versions of Mahalanobis distance and k-nearest neighbors methods that explicitly estimate separate distributions for correctly and incorrectly classified samples. They also introduce a linear combination strategy to merge distance-based and logit-based scores, all motivated by the Neyman–Pearson framework.

10 retrieved papers
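
A minimal sketch of the ∆-MDS idea, under the assumption (not verified against the paper) that penultimate-layer features of correctly and incorrectly classified validation samples are each fit with a Gaussian, and that a test point is scored by the difference of its squared Mahalanobis distances to the two populations. The `combined_score` helper illustrates the linear blend with a logit-based score such as maximum softmax probability; all names and data here are hypothetical.

```python
import numpy as np

def fit_gaussian(feats):
    """Mean and precision (inverse of a regularized covariance)."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_sq(x, mu, prec):
    d = x - mu
    return d @ prec @ d

# Hypothetical penultimate-layer features of validation samples,
# split by whether the classifier predicted them correctly.
rng = np.random.default_rng(1)
feats_correct = rng.normal(1.0, 1.0, size=(500, 8))
feats_incorrect = rng.normal(-1.0, 1.0, size=(100, 8))

mu_c, prec_c = fit_gaussian(feats_correct)
mu_i, prec_i = fit_gaussian(feats_incorrect)

def delta_mds_score(x):
    """Higher when x looks more like the 'correct' population."""
    return mahalanobis_sq(x, mu_i, prec_i) - mahalanobis_sq(x, mu_c, prec_c)

def combined_score(x, logit_score, alpha=0.5):
    """Linear blend of the distance-based and logit-based scores."""
    return alpha * delta_mds_score(x) + (1 - alpha) * logit_score

x_good = np.ones(8)    # near the 'correct' cluster
x_bad = -np.ones(8)    # near the 'incorrect' cluster
print(delta_mds_score(x_good) > delta_mds_score(x_bad))  # True
```

Replacing the two Gaussians with the distance to each population's k nearest neighbors gives the analogous ∆-KNN variant; the key design choice in both is modeling correct and incorrect samples separately rather than a single in-distribution density.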
Comprehensive evaluation under covariate shift

The authors provide extensive experiments on covariate shift scenarios across vision (ImageNet variants) and language (Amazon Reviews) tasks, evaluating both vision-language models like CLIP and supervised classifiers. This addresses an underexplored setting in selective classification where input distributions change while label spaces remain fixed.

10 retrieved papers
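
Evaluations of this kind typically report risk-coverage behavior: predictions are sorted by selector score, and the selective risk (error rate among accepted points) is tracked as coverage grows. A small sketch with toy data, where the generic `confidences` array stands in for any selector score; the function name and inputs are illustrative, not taken from the paper.

```python
import numpy as np

def risk_coverage_curve(confidences, correct):
    """Selective risk at each coverage level, rejecting the
    least-confident predictions first."""
    order = np.argsort(-confidences)           # most confident first
    errors = (~correct[order]).astype(float)
    n = len(confidences)
    accepted = np.arange(1, n + 1)
    coverage = accepted / n
    risk = np.cumsum(errors) / accepted        # error rate among accepted
    return coverage, risk

# Toy example: four predictions with confidences and correctness flags.
conf = np.array([0.9, 0.8, 0.6, 0.3])
corr = np.array([True, True, False, True])
cov, risk = risk_coverage_curve(conf, corr)
print(risk)  # selective risk is [0, 0, 1/3, 1/4] as coverage grows
```

Averaging this curve over coverage gives the standard area-under-risk-coverage summary; a better selector pushes errors toward high-coverage regions, lowering the curve, which is how the distance-based and baseline selectors can be compared across the shifted test sets.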

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Neyman–Pearson framework for optimal selective classification

The authors apply the classical Neyman–Pearson lemma from statistics to selective classification, showing that the optimal selection function is a likelihood ratio test between correct and incorrect predictions. This theoretical framework provides principled guidance for designing selector functions in modern deep networks.

Contribution

Two novel distance-based selector methods: ∆-MDS and ∆-KNN

The authors propose ∆-MDS and ∆-KNN, which are modified versions of Mahalanobis distance and k-nearest neighbors methods that explicitly estimate separate distributions for correctly and incorrectly classified samples. They also introduce a linear combination strategy to merge distance-based and logit-based scores, all motivated by the Neyman–Pearson framework.

Contribution

Comprehensive evaluation under covariate shift

The authors provide extensive experiments on covariate shift scenarios across vision (ImageNet variants) and language (Amazon Reviews) tasks, evaluating both vision-language models like CLIP and supervised classifiers. This addresses an underexplored setting in selective classification where input distributions change while label spaces remain fixed.