Contrastive Predictive Coding Done Right for Mutual Information Estimation

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: information estimation, contrastive predictive coding, representation learning, noise contrastive estimation, density ratio estimation
Abstract:

The InfoNCE objective, originally introduced for contrastive representation learning, has become a popular choice for mutual information (MI) estimation, despite its indirect connection to MI. In this paper, we demonstrate why InfoNCE should not be regarded as a valid MI estimator, and we introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. Our modification introduces an auxiliary "anchor" class, enabling consistent density ratio estimation and yielding a plug-in MI estimator with significantly reduced bias. Beyond this, we generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed. This formulation unifies a broad spectrum of contrastive objectives, including NCE, InfoNCE, and f-divergence variants, under a single principled framework. Empirically, InfoNCE-anchor with the log score achieves the most accurate MI estimates; in self-supervised representation learning experiments, however, the anchor does not improve downstream task performance. These findings support the view that contrastive representation learning benefits not from accurate MI estimation per se, but from the learning of structured density ratios.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes InfoNCE-anchor, a modified contrastive objective that introduces an auxiliary anchor class to enable consistent density ratio estimation and reduce bias in mutual information estimation. It also provides a tight characterization of InfoNCE as a K-way Jensen–Shannon divergence bound and unifies contrastive objectives through proper scoring rules. Within the taxonomy, this work resides in the 'Contrastive MI Bounds and Estimators' leaf under 'MI Estimation Theory and Bounds', alongside three sibling papers that similarly develop novel bounds and estimators for MI using contrastive principles. This leaf represents a focused but not overcrowded research direction within the broader fifty-paper taxonomy.

The taxonomy reveals that MI estimation theory branches into three subcategories: foundational bounds and estimators, refinement techniques addressing variance and optimization, and theoretical unification frameworks. The paper's leaf sits at the foundational level, while neighboring leaves address decomposition methods and energy-based refinements. The broader 'MI Estimation Theory and Bounds' branch contrasts with application-oriented branches such as Graph Representation Learning and Visual Representation Learning, which apply contrastive MI principles to domain-specific data. The scope note for this leaf explicitly excludes application-specific methods, positioning the work as a core theoretical contribution rather than an empirical extension to particular data modalities.

Among ten candidates examined across three contributions, two refutable pairs emerged. The InfoNCE-anchor objective faced one refutable candidate out of the single paper examined, indicating at least some closely related prior work on density ratio estimation modifications. The Jensen–Shannon divergence characterization encountered no refutations across five candidates, indicating relative novelty in this theoretical framing. The proper scoring rules unification found one refutable candidate among the four examined, pointing to some existing work on unifying contrastive frameworks. The limited search scope (ten candidates in total rather than hundreds) means these statistics reflect top semantic matches and immediate citations, not an exhaustive field survey. Contributions two and three appear more novel within this constrained examination.

Based on the top-ten semantic matches and citation expansion, the work appears to offer incremental theoretical refinements in a moderately explored area. The anchor modification and proper scoring rules framework show partial overlap with prior efforts, while the Jensen–Shannon characterization may represent a more distinctive contribution. The analysis does not cover the full landscape of contrastive MI estimation, so conclusions remain provisional pending broader literature review.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 10
Refutable Papers: 2

Research Landscape Overview

Core task: mutual information estimation using contrastive objectives. The field organizes around several major branches that reflect both theoretical foundations and diverse application contexts. MI Estimation Theory and Bounds develops the mathematical underpinnings, including novel contrastive bounds and estimators that enable tractable optimization in high-dimensional spaces. Graph Representation Learning and Visual Representation Learning apply these principles to structured and perceptual data, respectively, often employing contrastive frameworks to learn embeddings that preserve semantic relationships. Multimodal and Cross-Domain Learning extends MI-based objectives to settings where heterogeneous data sources must be aligned or transferred, while Knowledge Transfer and Distillation leverages mutual information to guide model compression and teacher-student paradigms. Specialized Application Domains capture niche uses ranging from fairness-aware clustering to trajectory modeling, illustrating the breadth of contrastive MI techniques across problem settings.

Within the theoretical branch, a handful of works focus on tightening variational bounds and improving sample efficiency of MI estimators. Club[2] and Tight mutual information estimation[9] exemplify efforts to refine upper and lower bounds, addressing bias-variance trade-offs that plague naive contrastive methods. Contrastive Predictive Coding Done[0] situates itself in this lineage of contrastive MI bounds and estimators, emphasizing rigorous derivation of objectives that balance computational tractability with statistical accuracy. Nearby, On mutual information in[16] explores foundational questions about what contrastive losses actually optimize, while works like Infogcl[5] and Graph contrastive learning with[1] demonstrate how similar principles adapt to graph-structured domains.
The interplay between theoretical guarantees and empirical performance remains an open question, with some studies prioritizing provable convergence and others favoring scalable heuristics that work well in practice despite weaker formal assurances.

Claimed Contributions

InfoNCE-anchor objective for consistent density ratio estimation

The authors propose InfoNCE-anchor, a modification of the InfoNCE objective that adds an auxiliary anchor class. This enables the critic to estimate the density ratio directly without multiplicative ambiguity, facilitating consistent density ratio estimation and producing a plug-in MI estimator with lower bias than InfoNCE.

Retrieved papers: 1 (Can Refute)
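The identifiability issue the anchor addresses can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's implementation: the fixed zero-logit anchor class is an assumption about how the auxiliary class enters the softmax. Plain InfoNCE is invariant to adding a constant to all critic scores (the multiplicative ambiguity in the implied density ratio), while the anchored version is not:

```python
import numpy as np

def infonce_loss(scores, pos=0):
    # scores: critic scores for one positive (index `pos`) and K-1 negatives
    return -(scores[pos] - np.log(np.sum(np.exp(scores))))

def infonce_anchor_loss(scores, pos=0):
    # Hypothetical anchor construction: an extra class with its logit fixed
    # at 0 joins the softmax, so the loss is no longer invariant to shifting
    # all scores by a constant -- the critic's scale is pinned down.
    return -(scores[pos] - np.log(1.0 + np.sum(np.exp(scores))))

scores = np.array([2.0, 0.5, -1.0, 0.3])
shifted = scores + 3.0  # the same critic up to a constant offset

# Plain InfoNCE cannot distinguish the two critics...
assert np.isclose(infonce_loss(scores), infonce_loss(shifted))
# ...while the anchored objective can.
assert not np.isclose(infonce_anchor_loss(scores), infonce_anchor_loss(shifted))
```

Under this sketch, a plug-in MI estimate would average the learned critic over positive pairs; the example only demonstrates that the anchor removes the additive degree of freedom that makes the plain InfoNCE critic an inconsistent density ratio estimator.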
Tight characterization of InfoNCE as K-way Jensen–Shannon divergence bound

The authors establish that the InfoNCE objective is a tight variational lower bound of a K-way generalization of Jensen–Shannon divergence, not mutual information. This clarifies why InfoNCE cannot serve as a direct MI estimator and reveals its fundamental limitation for MI estimation.

Retrieved papers: 5
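The report does not reproduce the paper's definition of the K-way quantity. One standard multi-distribution generalization of the Jensen–Shannon divergence, given here as a plausible sketch of what is meant (the paper's exact definition may differ), is

```latex
% A standard K-way generalization of JSD (sketch; the paper's exact
% definition may differ):
\mathrm{JS}_K(p_1, \dots, p_K)
  \;=\; H\!\left(\frac{1}{K}\sum_{k=1}^{K} p_k\right)
  \;-\; \frac{1}{K}\sum_{k=1}^{K} H(p_k),
\qquad 0 \;\le\; \mathrm{JS}_K \;\le\; \log K.
```

The log K ceiling of such a quantity matches the well-known saturation of InfoNCE estimates at the log of the number of contrastive samples, which is consistent with the claim that InfoNCE bounds a JSD-type divergence rather than MI.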
Unification of contrastive objectives via proper scoring rules

The authors generalize InfoNCE-anchor using proper scoring rules from statistical decision theory, showing that InfoNCE-anchor corresponds to the log score. This framework unifies various contrastive objectives such as NCE, InfoNCE, and f-divergence variants under a single principled approach for density ratio estimation.

Retrieved papers: 4 (Can Refute)
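The scoring-rule view can be illustrated with a pluggable loss. The sketch below is an assumption-laden toy, not the paper's construction: the zero-logit anchor and the Brier score as the alternative proper scoring rule are illustrative choices. Swapping the scoring rule changes which contrastive objective (and implied divergence) is optimized, while the log score reproduces the anchored InfoNCE loss from the same construction:

```python
import numpy as np

def softmax_with_anchor(scores):
    # Hypothetical anchor: append a class whose logit is fixed at 0.
    z = np.concatenate([scores, [0.0]])
    e = np.exp(z - z.max())  # stabilized softmax
    return e / e.sum()

def log_score(p, y):
    # Log score: a proper scoring rule; with the anchored softmax this
    # reproduces the InfoNCE-anchor objective under the sketch's assumptions.
    return -np.log(p[y])

def brier_score(p, y):
    # Brier (quadratic) score: another proper scoring rule, standing in for
    # the f-divergence-style variants the framework is said to cover.
    onehot = np.eye(len(p))[y]
    return np.sum((p - onehot) ** 2)

def contrastive_loss(scores, pos, scoring_rule):
    # Generic objective: classify the positive among negatives plus the
    # anchor class, penalized by any proper scoring rule.
    return scoring_rule(softmax_with_anchor(scores), pos)

scores = np.array([2.0, 0.5, -1.0])
loss_log = contrastive_loss(scores, 0, log_score)
loss_brier = contrastive_loss(scores, 0, brier_score)
```

Because every proper scoring rule is minimized by the true class posterior, each choice yields a consistent estimate of the same density ratio under this construction; the rules differ in the implied divergence and in optimization behavior.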

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

InfoNCE-anchor objective for consistent density ratio estimation

The authors propose InfoNCE-anchor, a modification of the InfoNCE objective that adds an auxiliary anchor class. This enables the critic to estimate the density ratio directly without multiplicative ambiguity, facilitating consistent density ratio estimation and producing a plug-in MI estimator with lower bias than InfoNCE.

Contribution 2

Tight characterization of InfoNCE as K-way Jensen–Shannon divergence bound

The authors establish that the InfoNCE objective is a tight variational lower bound of a K-way generalization of Jensen–Shannon divergence, not mutual information. This clarifies why InfoNCE cannot serve as a direct MI estimator and reveals its fundamental limitation for MI estimation.

Contribution 3

Unification of contrastive objectives via proper scoring rules

The authors generalize InfoNCE-anchor using proper scoring rules from statistical decision theory, showing that InfoNCE-anchor corresponds to the log score. This framework unifies various contrastive objectives such as NCE, InfoNCE, and f-divergence variants under a single principled approach for density ratio estimation.