Reducing information dependency does not cause training data privacy. Adversarially non-robust features do.

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: privacy, model inversion, adversarial examples, memorization
Abstract:

In this paper, we show that the prevailing view that information dependency (including rote memorization) drives training data exposure to image reconstruction attacks is incomplete. We find that extensive exposure can persist without rote memorization, driven instead by a tunable connection to adversarial robustness. We begin by presenting three surprising results: (1) recent defenses that inhibit reconstruction by Model Inversion Attacks (MIAs), which evaluate leakage under an idealized attacker, do not reduce standard measures of information dependency (HSIC); (2) models that maximally memorize their training datasets remain robust to MIA reconstruction; and (3) models trained without seeing 97% of the training pixels, where recent information-theoretic bounds give arbitrarily strong privacy guarantees under standard assumptions, can still be devastatingly reconstructed by MIA. To explain these findings, we provide causal evidence that privacy under MIA arises from what the adversarial examples literature calls "non-robust" features (generalizable but imperceptible and unstable features). We further show that recent MIA defenses obtain their privacy improvements by unintentionally shifting models toward such features. We leverage this mechanism to introduce Anti-Adversarial Training (AT-AT), a training regime that intentionally learns non-robust features to obtain both superior reconstruction defense and higher accuracy than state-of-the-art defenses. Our results revise the prevailing understanding of training data exposure and reveal a new privacy-robustness tradeoff.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper challenges the prevailing assumption that information dependency drives training data exposure to model inversion attacks, proposing instead that adversarial robustness plays a central role. It resides in the 'Attack Surface Analysis and Vulnerability Factors' leaf, which contains only three papers total. This leaf sits under the broader 'Model Inversion Attack Mechanisms and Characterization' branch, indicating the work contributes to understanding what makes models vulnerable rather than proposing new attacks or defenses. The sparse population of this specific leaf suggests that systematic analysis of vulnerability factors remains an underexplored direction within the field.

The taxonomy reveals that most research effort concentrates on defense mechanisms, with four major branches dedicated to training-time interventions, deployment-time protections, federated learning privacy, and domain-specific solutions. The original paper's branch on attack mechanisms contains only two subtopics: attack methodology and vulnerability analysis. Neighboring leaves focus on reconstruction techniques and attack algorithms, while the paper's leaf specifically examines architectural features and training configurations that increase vulnerability. The scope note clarifies that this leaf excludes general attack methods, positioning the work as analytical rather than adversarial. This structural context suggests the paper addresses a gap in understanding root causes of vulnerability rather than iterating on existing attack or defense paradigms.

Among twenty-nine candidates examined, the contribution on privacy-adversarial robustness tradeoff shows one refutable candidate, while the other two contributions (evidence against information dependency and the AT-AT defense method) show no clear refutations across ten and nine candidates respectively. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The finding that information dependency does not drive leakage appears more novel within this search window, with no examined candidates directly contradicting it. The adversarial robustness connection has at least one overlapping prior work among the candidates reviewed, suggesting this mechanism has received some prior attention. The AT-AT defense method shows no refutations among nine candidates, though this may reflect the method's specificity rather than fundamental novelty.

Based on the limited search scope of twenty-nine candidates, the work appears to offer fresh perspective on vulnerability factors in a relatively sparse research direction. The taxonomy structure indicates that systematic vulnerability analysis receives less attention than defense development, and the sibling papers in the same leaf focus on different aspects of attack surfaces. However, the analysis cannot rule out relevant prior work outside the top-K semantic matches examined, particularly in adjacent fields like adversarial robustness or information theory that may not surface in model inversion literature searches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Paper: 1

Research Landscape Overview

Core task: training data privacy against model inversion attacks. The field addresses how adversaries can reconstruct sensitive training samples from deployed models and how to defend against such threats. The taxonomy organizes research into several major branches: understanding attack mechanisms and vulnerability factors, developing training-time defenses that modify learning procedures, deploying runtime protections that limit information leakage, securing federated and distributed settings where gradients are shared, tailoring solutions to domain-specific constraints, establishing evaluation frameworks for comparing methods, and connecting to the broader privacy landscape including differential privacy and related threats. Representative works span attack characterization such as Be Careful What You[18] and On the Vulnerability of[22], training-time interventions like Bilateral dependency optimization[9], deployment-time protections including Neural Honeypoint[8] and TrapMI[39], federated defenses such as RVE-PFL[12], and evaluation benchmarks like MIBench[25].

Several active lines of work reveal key trade-offs and open questions. One cluster focuses on understanding what makes models vulnerable, examining factors like model architecture, training data characteristics, and the information dependency between inputs and outputs. Another explores training-time mitigations that balance privacy with utility, ranging from differential privacy approaches like Broadening differential privacy for[1] to gradient masking strategies such as Adaptive Hybrid Masking Strategy[14]. Deployment-time defenses offer an alternative by injecting noise or using adversarial perturbations at inference, as seen in Combining stochastic defenses to[13] and Get your foes fooled[19].
The original paper Reducing information dependency does[0] sits within the attack surface analysis cluster, closely examining vulnerability factors alongside Be Careful What You[18] and On the Vulnerability of[22]. While those neighbors characterize specific attack scenarios or model weaknesses, Reducing information dependency does[0] emphasizes how reducing the dependency between model outputs and training inputs can fundamentally limit inversion risk, offering a complementary perspective on what drives vulnerability.

Claimed Contributions

Evidence that information dependency does not cause training data privacy leakage

The authors present three experimental findings demonstrating that reducing information dependency or memorization does not prevent Model Inversion Attack (MIA) reconstructions. They show that effective defenses do not reduce HSIC metrics, models with maximal memorization remain robust to MIA, and models trained on heavily censored data can still be reconstructed.

10 retrieved papers
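The first finding turns on measuring dependence with HSIC (the Hilbert-Schmidt Independence Criterion). As a reference point for that metric, here is a minimal NumPy sketch of the biased empirical HSIC estimator with Gaussian kernels; the kernel bandwidth, sample sizes, and estimator variant are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def rbf_kernel(A, sigma=1.0):
    # Pairwise squared distances, then a Gaussian (RBF) kernel matrix.
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * A @ A.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC: trace(K H L H) / (n-1)^2, H the centering matrix.
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
hsic_dep = hsic(X, X + 0.1 * rng.normal(size=(200, 5)))  # dependent pair
hsic_ind = hsic(X, rng.normal(size=(200, 5)))            # independent pair
print(hsic_dep > hsic_ind)
```

A dependent pair yields a markedly larger HSIC than an independent one; the paper's surprising result is that effective MIA defenses leave this kind of dependency measure essentially unchanged.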
Privacy-adversarial robustness tradeoff mechanism

The authors establish that MIA privacy improvements in recent defenses correlate strongly with increased vulnerability to adversarial examples. They demonstrate that privacy leakage can be predicted almost perfectly from robust accuracy alone, revealing an unintentional reliance on non-robust features for privacy.

10 retrieved papers
Can Refute
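The claim that leakage can be predicted "almost perfectly from robust accuracy alone" implies a simple univariate regression across trained models. The sketch below shows the shape of such a check; the numbers are entirely synthetic and invented for illustration, carrying no information about the paper's actual measurements:

```python
import numpy as np

# Hypothetical per-model measurements: robust accuracy under some fixed
# adversarial attack, and a reconstruction-leakage score for the same model.
robust_acc = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55])
leakage    = np.array([0.12, 0.24, 0.41, 0.52, 0.68, 0.81])

# Fit leakage ~ a * robust_acc + b and report the coefficient of determination.
a, b = np.polyfit(robust_acc, leakage, deg=1)
pred = a * robust_acc + b
ss_res = np.sum((leakage - pred) ** 2)
ss_tot = np.sum((leakage - leakage.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(a > 0, r2 > 0.9)
```

An R^2 near 1 with a positive slope would mean more robust models leak more under MIA, which is the tradeoff direction the report attributes to the paper.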
Anti-Adversarial Training (AT-AT) defense method

The authors propose AT-AT, a novel training approach that deliberately shifts models toward non-robust but generalizable features by reversing standard adversarial training. This method achieves superior reconstruction defense and higher accuracy than state-of-the-art defenses while making the privacy-robustness tradeoff a tunable parameter.

9 retrieved papers
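The report describes AT-AT only as "reversing standard adversarial training," so the paper's actual objective is unknown here. The toy NumPy loop below is a guess at the mechanics under that description: where FGSM-style adversarial training perturbs inputs to ascend the loss before each update, this loop descends it (an "anti-adversarial" perturbation). The model, data, and step sizes are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy, linearly separable binary classification data.
X = rng.normal(size=(400, 2)) + np.where(rng.random(400) < 0.5, 1.0, -1.0)[:, None]
y = (X.sum(axis=1) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, eps, lr = np.zeros(2), 0.0, 0.1, 0.5
for _ in range(200):
    # Input gradient of the logistic loss for the current model.
    p = sigmoid(X @ w + b)
    gx = (p - y)[:, None] * w[None, :]
    # Standard adversarial training would ASCEND the loss: X + eps * sign(gx).
    # The reversed regime sketched here descends it instead.
    Xa = X - eps * np.sign(gx)
    # Ordinary gradient step on the perturbed batch.
    pa = sigmoid(Xa @ w + b)
    w -= lr * Xa.T @ (pa - y) / len(y)
    b -= lr * np.mean(pa - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(acc > 0.9)
```

Training on loss-decreasing perturbations makes each batch "easier," which is one plausible way to bias a model toward the non-robust features the paper associates with privacy; on this separable toy problem clean accuracy stays high, echoing the claim that AT-AT preserves utility.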

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution
Evidence that information dependency does not cause training data privacy leakage

Contribution
Privacy-adversarial robustness tradeoff mechanism

Contribution
Anti-Adversarial Training (AT-AT) defense method