Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: privacy, model inversion, adversarial examples, memorization
Abstract:

In this paper, we show that the prevailing view that information dependency (including rote memorization) drives training data exposure to image reconstruction attacks is incomplete. We find that extensive exposure can persist without rote memorization, driven instead by a tunable connection to adversarial robustness. We begin by presenting three surprising results: (1) recent defenses that inhibit reconstruction by Model Inversion Attacks (MIAs), which evaluate leakage under an idealized attacker, do not reduce standard measures of information dependency (HSIC); (2) models that maximally memorize their training datasets remain robust to MIA reconstruction; and (3) models trained without seeing 97% of the training pixels, where recent information-theoretic bounds give arbitrarily strong privacy guarantees under standard assumptions, can still be devastatingly reconstructed by MIA. To explain these findings, we provide causal evidence that privacy under MIA arises from what the adversarial examples literature calls "non-robust" features (generalizable but imperceptible and unstable features). We further show that recent MIA defenses obtain their privacy improvements by unintentionally shifting models toward such features. We leverage this mechanism to introduce Anti-Adversarial Training (AT-AT), a training regime that intentionally learns non-robust features to obtain both superior reconstruction defense and higher accuracy than state-of-the-art defenses. Our results revise the prevailing understanding of training data exposure and reveal a new privacy-robustness tradeoff.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper challenges the prevailing assumption that information dependency drives training data exposure to model inversion attacks, proposing instead that adversarial robustness plays a central role. It resides in the 'Attack Surface Analysis and Vulnerability Factors' leaf, which contains only three papers total. This leaf sits under the broader 'Model Inversion Attack Mechanisms and Characterization' branch, indicating the work contributes to understanding what makes models vulnerable rather than proposing new attacks or defenses. The sparse population of this specific leaf suggests that systematic analysis of vulnerability factors remains an underexplored direction within the field.

The taxonomy reveals that most research effort concentrates on defense mechanisms, with four major branches dedicated to training-time interventions, deployment-time protections, federated learning privacy, and domain-specific solutions. The original paper's branch on attack mechanisms contains only two subtopics: attack methodology and vulnerability analysis. Neighboring leaves focus on reconstruction techniques and attack algorithms, while the paper's leaf specifically examines architectural features and training configurations that increase vulnerability. The scope note clarifies that this leaf excludes general attack methods, positioning the work as analytical rather than adversarial. This structural context suggests the paper addresses a gap in understanding root causes of vulnerability rather than iterating on existing attack or defense paradigms.

Among twenty-nine candidates examined, the contribution on privacy-adversarial robustness tradeoff shows one refutable candidate, while the other two contributions (evidence against information dependency and the AT-AT defense method) show no clear refutations across ten and nine candidates respectively. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The finding that information dependency does not drive leakage appears more novel within this search window, with no examined candidates directly contradicting it. The adversarial robustness connection has at least one overlapping prior work among the candidates reviewed, suggesting this mechanism has received some prior attention. The AT-AT defense method shows no refutations among nine candidates, though this may reflect the method's specificity rather than fundamental novelty.

Based on the limited search scope of twenty-nine candidates, the work appears to offer fresh perspective on vulnerability factors in a relatively sparse research direction. The taxonomy structure indicates that systematic vulnerability analysis receives less attention than defense development, and the sibling papers in the same leaf focus on different aspects of attack surfaces. However, the analysis cannot rule out relevant prior work outside the top-K semantic matches examined, particularly in adjacent fields like adversarial robustness or information theory that may not surface in model inversion literature searches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Paper: 1

Research Landscape Overview

Core task: training data privacy against model inversion attacks. The field addresses how adversaries can reconstruct sensitive training samples from deployed models and how to defend against such threats. The taxonomy organizes research into several major branches: understanding attack mechanisms and vulnerability factors, developing training-time defenses that modify learning procedures, deploying runtime protections that limit information leakage, securing federated and distributed settings where gradients are shared, tailoring solutions to domain-specific constraints, establishing evaluation frameworks for comparing methods, and connecting to the broader privacy landscape including differential privacy and related threats. Representative works span attack characterization such as Be Careful What You[18] and On the Vulnerability of[22], training-time interventions like Bilateral dependency optimization[9], deployment-time protections including Neural Honeypoint[8] and TrapMI[39], federated defenses such as RVE-PFL[12], and evaluation benchmarks like MIBench[25].

Several active lines of work reveal key trade-offs and open questions. One cluster focuses on understanding what makes models vulnerable, examining factors like model architecture, training data characteristics, and the information dependency between inputs and outputs. Another explores training-time mitigations that balance privacy with utility, ranging from differential privacy approaches like Broadening differential privacy for[1] to gradient masking strategies such as Adaptive Hybrid Masking Strategy[14]. Deployment-time defenses offer an alternative by injecting noise or using adversarial perturbations at inference, as seen in Combining stochastic defenses to[13] and Get your foes fooled[19].
The original paper Reducing information dependency does[0] sits within the attack surface analysis cluster, closely examining vulnerability factors alongside Be Careful What You[18] and On the Vulnerability of[22]. While those neighbors characterize specific attack scenarios or model weaknesses, Reducing information dependency does[0] emphasizes how reducing the dependency between model outputs and training inputs can fundamentally limit inversion risk, offering a complementary perspective on what drives vulnerability.

Claimed Contributions

Evidence that information dependency does not cause training data privacy leakage

The authors present three experimental findings demonstrating that reducing information dependency or memorization does not prevent Model Inversion Attack (MIA) reconstructions. They show that effective defenses do not reduce HSIC metrics, models with maximal memorization remain robust to MIA, and models trained on heavily censored data can still be reconstructed.

10 retrieved papers
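The first finding turns on measuring dependence with HSIC (the Hilbert-Schmidt Independence Criterion). As a reference point for that metric, here is a minimal NumPy sketch of the biased empirical HSIC estimator with Gaussian kernels; the kernel bandwidth, sample sizes, and estimator variant are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def rbf_kernel(A, sigma=1.0):
    # Pairwise squared distances, then a Gaussian (RBF) kernel matrix.
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC: trace(K H L H) / (n-1)^2, H the centering matrix.
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
hsic_dep = hsic(X, X + 0.1 * rng.normal(size=(200, 5)))  # dependent pair
hsic_ind = hsic(X, rng.normal(size=(200, 5)))            # independent pair
print(hsic_dep > hsic_ind)
```

A dependent pair yields a markedly larger HSIC than an independent one; the paper's surprising result is that effective MIA defenses leave this kind of dependency measure essentially unchanged.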
Privacy-adversarial robustness tradeoff mechanism

The authors establish that MIA privacy improvements in recent defenses correlate strongly with increased vulnerability to adversarial examples. They demonstrate that privacy leakage can be predicted almost perfectly from robust accuracy alone, revealing an unintentional reliance on non-robust features for privacy.

10 retrieved papers
Can Refute
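The claim that leakage can be predicted "almost perfectly from robust accuracy alone" implies a simple univariate regression across trained models. The sketch below shows the shape of such a check; the numbers are entirely synthetic and invented for illustration, carrying no information about the paper's actual measurements:

```python
import numpy as np

# Hypothetical per-model measurements: robust accuracy under some fixed
# adversarial attack, and a reconstruction-leakage score for the same model.
robust_acc = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55])
leakage    = np.array([0.12, 0.24, 0.41, 0.52, 0.68, 0.81])

# Fit leakage ~ a * robust_acc + b and report the coefficient of determination.
a, b = np.polyfit(robust_acc, leakage, deg=1)
pred = a * robust_acc + b
ss_res = np.sum((leakage - pred) ** 2)
ss_tot = np.sum((leakage - leakage.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(a > 0, r2 > 0.9)
```

An R^2 near 1 with a positive slope would mean more robust models leak more under MIA, which is the tradeoff direction the report attributes to the paper.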
Anti-Adversarial Training (AT-AT) defense method

The authors propose AT-AT, a novel training approach that deliberately shifts models toward non-robust but generalizable features by reversing standard adversarial training. This method achieves superior reconstruction defense and higher accuracy than state-of-the-art defenses while making the privacy-robustness tradeoff a tunable parameter.

9 retrieved papers
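The report describes AT-AT only as "reversing standard adversarial training," so the paper's actual objective is unknown here. The toy NumPy loop below is a guess at the mechanics under that description: where FGSM-style adversarial training perturbs inputs to ascend the loss before each update, this loop descends it (an "anti-adversarial" perturbation). The model, data, and step sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy, linearly separable binary classification data.
X = rng.normal(size=(400, 2)) + np.where(rng.random(400) < 0.5, 1.0, -1.0)[:, None]
y = (X.sum(axis=1) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, eps, lr = np.zeros(2), 0.0, 0.1, 0.5
for _ in range(200):
    # Input gradient of the logistic loss for the current model.
    p = sigmoid(X @ w + b)
    gx = (p - y)[:, None] * w[None, :]
    # Standard adversarial training would ASCEND the loss: X + eps * sign(gx).
    # The reversed regime sketched here descends it instead.
    Xa = X - eps * np.sign(gx)
    # Ordinary gradient step on the perturbed batch.
    pa = sigmoid(Xa @ w + b)
    w -= lr * Xa.T @ (pa - y) / len(y)
    b -= lr * np.mean(pa - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(acc > 0.9)
```

Training on loss-decreasing perturbations makes each batch "easier," which is one plausible way to bias a model toward the non-robust features the paper associates with privacy; on this separable toy problem clean accuracy stays high, echoing the claim that AT-AT preserves utility.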

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
Evidence that information dependency does not cause training data privacy leakage

Contribution
Privacy-adversarial robustness tradeoff mechanism

Contribution
Anti-Adversarial Training (AT-AT) defense method