Abstract:

Selective classification enhances the reliability of predictive models by allowing them to abstain from making uncertain predictions. In this work, we revisit the design of optimal selection functions through the lens of the Neyman–Pearson lemma, a classical result in statistics that characterizes the optimal rejection rule as a likelihood ratio test. We show that this perspective not only unifies the behavior of several post-hoc selection baselines, but also motivates the new approaches to selective classification that we propose here. A central focus of our work is the setting of covariate shift, where the input distribution at test time differs from that at training. This realistic and challenging scenario remains relatively underexplored in the context of selective classification. We evaluate our proposed methods across a range of vision and language tasks, including both supervised learning and vision-language models. Our experiments demonstrate that our Neyman–Pearson-informed methods consistently outperform existing baselines, indicating that likelihood ratio-based selection offers a robust mechanism for improving selective classification under covariate shift.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a Neyman–Pearson framework for selective classification under covariate shift, introducing two distance-based selector methods (∆-MDS and ∆-KNN) and providing comprehensive empirical evaluation. It resides in the 'Likelihood Ratio and Neyman-Pearson Approaches' leaf, which contains only two papers total. This represents a relatively sparse research direction within the broader taxonomy of fifty papers across thirty-six topics, suggesting the theoretical grounding of selective classification in classical statistical decision theory remains underexplored compared to post-hoc confidence methods or OOD detection approaches.

The taxonomy reveals neighboring work in sibling leaves under 'Theoretical Foundations and Optimal Selection Mechanisms,' including minimax analysis using transfer exponents and unified rejection frameworks across multiple loss functions. The paper's emphasis on likelihood ratios distinguishes it from the more crowded 'Post-Hoc Selection Strategies' branch (six papers across three leaves) and 'Selective Classification with Out-of-Distribution Detection' branch (thirteen papers across four leaves). The scope note for this leaf explicitly excludes heuristic confidence-based methods, positioning the work as a principled alternative to empirical baselines that dominate neighboring branches.

Among thirty candidates examined through limited semantic search, none clearly refuted the three main contributions. Ten candidates were examined for the Neyman–Pearson framework contribution with zero refutable matches, and the same held for the two novel distance-based methods and for the comprehensive evaluation under covariate shift. This absence of overlapping prior work within the examined scope suggests that the specific combination of classical statistical optimality theory with modern selective classification under distribution shift has received limited direct attention, though the search scale means potentially relevant work outside the top thirty semantic matches may exist.

The analysis reflects a bounded literature search rather than exhaustive coverage of the field. The sparse population of the taxonomy leaf and lack of refutable candidates among thirty examined papers suggest novelty within the explored scope, though the limited search depth means the assessment cannot definitively rule out related work in adjacent statistical learning theory or domain adaptation literature not captured by semantic similarity to the paper's abstract and introduction.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: selective classification under covariate shift. The field addresses scenarios in which a classifier must decide when to abstain from predicting because the test distribution differs from the training distribution. The taxonomy reveals several complementary perspectives. Theoretical Foundations and Optimal Selection Mechanisms explores principled approaches such as likelihood ratio tests and Neyman–Pearson frameworks for deriving provably optimal rejection rules. Selective Classification with Out-of-Distribution Detection focuses on identifying and rejecting inputs that fall outside the training manifold, often leveraging density estimation or self-supervised signals. Post-Hoc Selection Strategies and Confidence-Based Methods apply rejection mechanisms after standard training, typically using model confidence scores or ensembles. Calibration and Uncertainty Quantification emphasizes ensuring that predictive uncertainties align with true error rates, which is critical when distributions shift. End-to-End Training for Selective Classification integrates the rejection decision directly into the learning objective, while Learning Under Arbitrary Covariate Shift develops methods robust to general distributional changes. Domain-Specific Applications and Empirical Studies grounds these ideas in real-world contexts such as medical diagnosis, autonomous driving, and question answering.

Recent work highlights tensions between theoretical optimality and practical robustness. Plugin Selective Classification[7] and Calibrated Selective[8] illustrate post-hoc approaches that are simple to deploy but may lack guarantees under severe shift. In contrast, Optimal Selective Classification[0] resides within the Theoretical Foundations branch, emphasizing likelihood ratio and Neyman–Pearson principles to achieve provably optimal selection under known shift. Neighboring work such as Weighted Conformal[2] also leverages distributional knowledge to construct valid prediction sets, sharing the emphasis on principled uncertainty quantification. Meanwhile, branches like Selective Classification with Out-of-Distribution Detection (e.g., Self-supervised OOD[9], Reject OOD Models[13]) prioritize detecting novel inputs rather than optimizing coverage-risk trade-offs. The interplay between calibration (Calibration Multiclass Rejection[35]), domain adaptation (Efficient Covariate Shift[15]), and end-to-end learning (Learning to Reject[34]) remains an active area, with open questions around scalability, label shift, and the design of rejection losses that generalize across diverse covariate shifts.

Claimed Contributions

Neyman–Pearson framework for optimal selective classification

The authors apply the classical Neyman–Pearson lemma from statistics to selective classification, showing that the optimal selection function is a likelihood ratio test between correct and incorrect predictions. This theoretical framework provides principled guidance for designing selector functions in modern deep networks.

10 retrieved papers
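
The likelihood-ratio rule this contribution describes can be sketched in a few lines. Everything below is illustrative, not the paper's actual estimator: two Gaussians are fit to a hypothetical 1-D confidence score of correctly and incorrectly classified validation samples, and a prediction is accepted when the estimated ratio p(score | correct) / p(score | incorrect) clears a threshold tau.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical confidence scores for correctly and incorrectly
# classified validation samples (synthetic data for illustration).
rng = np.random.default_rng(0)
scores_correct = rng.normal(loc=2.0, scale=1.0, size=1000)
scores_incorrect = rng.normal(loc=0.0, scale=1.0, size=200)

# Fit a simple Gaussian density to each group.
mu_c, sd_c = scores_correct.mean(), scores_correct.std()
mu_i, sd_i = scores_incorrect.mean(), scores_incorrect.std()

def np_select(score, tau=1.0):
    """Accept the prediction iff the estimated likelihood ratio
    p(score | correct) / p(score | incorrect) is at least tau."""
    ratio = norm.pdf(score, mu_c, sd_c) / norm.pdf(score, mu_i, sd_i)
    return ratio >= tau

# A high score (typical of correct predictions) is accepted,
# a low score is rejected.
print(np_select(2.5), np_select(-1.0))  # True False
```

By the Neyman–Pearson lemma, no other selector with the same rejection rate on incorrect predictions accepts a larger fraction of correct ones, which is what makes this thresholded ratio the natural target for the selector designs below.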
Two novel distance-based selector methods: ∆-MDS and ∆-KNN

The authors propose ∆-MDS and ∆-KNN, which are modified versions of Mahalanobis distance and k-nearest neighbors methods that explicitly estimate separate distributions for correctly and incorrectly classified samples. They also introduce a linear combination strategy to merge distance-based and logit-based scores, all motivated by the Neyman–Pearson framework.

10 retrieved papers
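
A minimal sketch of the ∆-MDS idea, under the assumption (not verified against the paper) that penultimate-layer features of correctly and incorrectly classified validation samples are each fit with a Gaussian, and that a test point is scored by the difference of its squared Mahalanobis distances to the two populations. The `combined_score` helper illustrates the linear blend with a logit-based score such as maximum softmax probability; all names and data here are hypothetical.

```python
import numpy as np

def fit_gaussian(feats):
    """Mean and precision (inverse of a regularized covariance)."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis_sq(x, mu, prec):
    d = x - mu
    return d @ prec @ d

# Hypothetical penultimate-layer features of validation samples,
# split by whether the classifier predicted them correctly.
rng = np.random.default_rng(1)
feats_correct = rng.normal(1.0, 1.0, size=(500, 8))
feats_incorrect = rng.normal(-1.0, 1.0, size=(100, 8))

mu_c, prec_c = fit_gaussian(feats_correct)
mu_i, prec_i = fit_gaussian(feats_incorrect)

def delta_mds_score(x):
    """Higher when x looks more like the 'correct' population."""
    return mahalanobis_sq(x, mu_i, prec_i) - mahalanobis_sq(x, mu_c, prec_c)

def combined_score(x, logit_score, alpha=0.5):
    """Linear blend of the distance-based and logit-based scores."""
    return alpha * delta_mds_score(x) + (1 - alpha) * logit_score

x_good = np.ones(8)    # near the 'correct' cluster
x_bad = -np.ones(8)    # near the 'incorrect' cluster
print(delta_mds_score(x_good) > delta_mds_score(x_bad))  # True
```

Replacing the two Gaussians with the distance to each population's k nearest neighbors gives the analogous ∆-KNN variant; the key design choice in both is modeling correct and incorrect samples separately rather than a single in-distribution density.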
Comprehensive evaluation under covariate shift

The authors provide extensive experiments on covariate shift scenarios across vision (ImageNet variants) and language (Amazon Reviews) tasks, evaluating both vision-language models like CLIP and supervised classifiers. This addresses an underexplored setting in selective classification where input distributions change while label spaces remain fixed.

10 retrieved papers
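
Evaluations of this kind typically report risk-coverage behavior: predictions are sorted by selector score, and the selective risk (error rate among accepted points) is tracked as coverage grows. A small sketch with toy data, where the generic `confidences` array stands in for any selector score; the function name and inputs are illustrative, not taken from the paper.

```python
import numpy as np

def risk_coverage_curve(confidences, correct):
    """Selective risk at each coverage level, rejecting the
    least-confident predictions first."""
    order = np.argsort(-confidences)           # most confident first
    errors = (~correct[order]).astype(float)
    n = len(confidences)
    accepted = np.arange(1, n + 1)
    coverage = accepted / n
    risk = np.cumsum(errors) / accepted        # error rate among accepted
    return coverage, risk

# Toy example: four predictions with confidences and correctness flags.
conf = np.array([0.9, 0.8, 0.6, 0.3])
corr = np.array([True, True, False, True])
cov, risk = risk_coverage_curve(conf, corr)
print(risk)  # selective risk is [0, 0, 1/3, 1/4] as coverage grows
```

Averaging this curve over coverage gives the standard area-under-risk-coverage summary; a better selector pushes errors toward high-coverage regions, lowering the curve, which is how the distance-based and baseline selectors can be compared across the shifted test sets.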

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Neyman–Pearson framework for optimal selective classification

The authors apply the classical Neyman–Pearson lemma from statistics to selective classification, showing that the optimal selection function is a likelihood ratio test between correct and incorrect predictions. This theoretical framework provides principled guidance for designing selector functions in modern deep networks.

Contribution

Two novel distance-based selector methods: ∆-MDS and ∆-KNN

The authors propose ∆-MDS and ∆-KNN, which are modified versions of Mahalanobis distance and k-nearest neighbors methods that explicitly estimate separate distributions for correctly and incorrectly classified samples. They also introduce a linear combination strategy to merge distance-based and logit-based scores, all motivated by the Neyman–Pearson framework.

Contribution

Comprehensive evaluation under covariate shift

The authors provide extensive experiments on covariate shift scenarios across vision (ImageNet variants) and language (Amazon Reviews) tasks, evaluating both vision-language models like CLIP and supervised classifiers. This addresses an underexplored setting in selective classification where input distributions change while label spaces remain fixed.