FERD: Fairness-Enhanced Data-Free Adversarial Robustness Distillation

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Data-Free Robustness Distillation; Robust Fairness
Abstract:

Data-Free Robustness Distillation (DFRD) aims to transfer robustness from a teacher to a student model without accessing the training data. While existing methods focus on overall robustness, they overlook robust fairness, leading to severe disparities in robustness across categories. In this paper, we identify two key problems: (1) a student model distilled with equal class proportions behaves significantly differently across distinct categories; and (2) the student model's robustness is unstable across different attack targets. To bridge these gaps, we present the first Fairness-Enhanced data-free Robustness Distillation (FERD) framework, which adjusts the proportion and distribution of adversarial examples. For the proportion, FERD adopts a robustness-guided class reweighting strategy that synthesizes more samples for the less robust categories, thereby improving their robustness. For the distribution, FERD generates complementary data samples for advanced robustness distillation. It generates Fairness-Aware Examples (FAEs) by enforcing a uniformity constraint on feature-level predictions, which suppresses the dominance of class-specific non-robust features and provides a more balanced representation across all categories. FERD then constructs Uniform-Target Adversarial Examples (UTAEs) from FAEs by applying a uniform target-class constraint to avoid biased attack directions, which distributes attack targets across all categories and prevents overfitting to specific vulnerable categories. Extensive experiments on three public datasets show that FERD achieves state-of-the-art worst-class robustness under all adversarial attacks (e.g., worst-class robustness under FGSM and AutoAttack improves by 15.1% and 6.4%, respectively, with MobileNetV2 on CIFAR-10), demonstrating superior performance in both robustness and fairness. Our code is available at: https://anonymous.4open.science/r/FERD-2A48/.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces FERD, a framework for data-free adversarial robustness distillation that explicitly addresses fairness across categories. It resides in the 'Adversarial Robustness Distillation with Fairness Enhancement' leaf, which contains five papers total including the original work. This leaf sits within the broader 'Data-Free Knowledge Distillation with Fairness Objectives' branch, indicating a moderately populated research direction. The taxonomy reveals that while data-free distillation and adversarial robustness are established areas, their intersection with fairness objectives represents a more specialized niche with limited prior exploration.

The taxonomy structure shows neighboring leaves addressing class-imbalanced teacher distillation and fairness-aware methods without demographic information, suggesting related but distinct research threads. A parallel branch focuses on robustness and diversity enhancement without explicit fairness goals, while specialized applications occupy a separate top-level category. FERD's position bridges adversarial robustness concerns with fairness constraints, distinguishing it from sibling works that may emphasize demographic parity or bias mitigation through different mechanisms. The scope notes indicate that methods requiring original training data or lacking adversarial robustness focus belong elsewhere, clarifying FERD's unique positioning at this intersection.

Among the three identified contributions, the first claim of investigating robust fairness in data-free settings examined ten candidates and found one potentially refutable prior work, suggesting some overlap in problem formulation within the limited search scope. The second contribution on robustness-guided class reweighting examined two candidates with no clear refutation, indicating relative novelty in this specific mechanism. The third contribution on fairness-aware example generation examined one candidate without refutation. These statistics reflect a targeted literature search of thirteen total candidates, not an exhaustive survey, meaning additional relevant work may exist beyond this analysis.

Based on the limited search scope of thirteen candidates, the framework appears to occupy a recognizable but not densely populated research space. The taxonomy reveals that while individual components like adversarial distillation and fairness-aware learning have established foundations, their integration in data-free settings remains relatively underexplored. The analysis cannot definitively assess novelty beyond the examined candidates, and a broader literature review would be needed to confirm the extent of prior work addressing this specific combination of constraints.

Taxonomy

- Core-task Taxonomy Papers: 14
- Claimed Contributions: 3
- Contribution Candidate Papers Compared: 13
- Refutable Papers: 1

Research Landscape Overview

Core task: Fairness-enhanced data-free adversarial robustness distillation. The field structure suggested by this taxonomy centers on knowledge distillation methods that operate without access to original training data, while simultaneously addressing fairness and robustness concerns. The top-level branches reveal three main directions: one branch focuses explicitly on integrating fairness objectives into data-free distillation frameworks, often targeting demographic parity or bias mitigation across subgroups; a second branch emphasizes robustness and diversity in the distilled models, exploring adversarial training and ensemble techniques; and a third branch covers specialized applications such as medical imaging or federated settings.

Representative works like Fairness without Demographics[3] and Anti-Bias Soft Label[6] illustrate how fairness constraints can be woven into distillation pipelines, while Robustness Diversity Distillation[13] exemplifies efforts to preserve model resilience under distribution shifts. Particularly active lines of work explore the tension between achieving strong adversarial robustness and maintaining fairness across sensitive attributes, especially when no original data is available for retraining. Methods such as Impartial Adversarial Distillation[4] and Fairness Logit Distillation[5] demonstrate contrasting strategies for balancing these dual objectives, with some emphasizing logit-level regularization and others leveraging synthetic data generation.

The original paper FERD[0] sits squarely within the branch on adversarial robustness distillation with fairness enhancement, closely aligning with works like Anti-Bias Soft Label[6] and Class-wise Fair Training[8] that also tackle bias mitigation in resource-constrained or data-scarce scenarios.
Compared to these neighbors, FERD[0] appears to integrate adversarial perturbations more tightly with fairness metrics, offering a unified framework that addresses both robustness and equity without relying on demographic labels or original datasets.

Claimed Contributions

First investigation of robust fairness in data-free robustness distillation

The authors identify and analyze robust fairness issues in data-free robustness distillation for the first time, discovering that students distilled with equal class proportions show class-wise robustness discrepancies and that attack success rates vary significantly by target class.

10 retrieved papers
Can Refute
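The class-wise robustness discrepancy this contribution reports can be made concrete with a small measurement sketch. The function below is illustrative only (not the paper's evaluation code): it computes per-class robust accuracy from ground-truth labels and the student's predictions on adversarial examples, from which the worst-class robustness follows as the minimum.

```python
import numpy as np

def class_wise_robust_accuracy(labels, adv_preds, num_classes):
    """Per-class robust accuracy: the fraction of each class's adversarial
    examples that the student model still classifies correctly."""
    labels = np.asarray(labels)
    adv_preds = np.asarray(adv_preds)
    return np.array([
        (adv_preds[labels == c] == c).mean() if (labels == c).any() else np.nan
        for c in range(num_classes)
    ])

# Toy illustration: class 2 is fully broken under attack.
labels    = [0, 0, 1, 1, 2, 2]
adv_preds = [0, 0, 1, 2, 0, 1]
accs = class_wise_robust_accuracy(labels, adv_preds, 3)
# Worst-class robustness is the minimum over the per-class accuracies.
worst = np.nanmin(accs)
```

A large gap between the mean of `accs` and `worst` is exactly the kind of disparity the paper observes even when classes are sampled in equal proportion.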
FERD framework with robustness-guided class reweighting strategy

The authors introduce a fairness-enhanced data-free adversarial robustness distillation framework that adjusts sample proportions using a robustness-guided class reweighting strategy to synthesize more samples from weakly robust categories, improving their robustness.

2 retrieved papers
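A minimal sketch of what a robustness-guided class reweighting rule might look like. The softmax-over-robustness-gap formula and the `temperature` parameter here are illustrative assumptions, not the paper's exact scheme; the point is only that classes with lower robust accuracy receive a larger share of the synthesized samples.

```python
import numpy as np

def class_sampling_weights(class_robust_acc, temperature=0.5):
    """Turn per-class robust accuracy into sampling proportions: the lower a
    class's robustness, the more synthetic samples it receives."""
    acc = np.asarray(class_robust_acc, dtype=float)
    gap = (1.0 - acc) / temperature   # robustness gap, sharpened by temperature
    w = np.exp(gap - gap.max())       # softmax, shifted for numerical stability
    return w / w.sum()

# The least robust class (index 2) gets the largest sampling share.
props = class_sampling_weights([0.62, 0.48, 0.21, 0.55])
```

Lowering `temperature` concentrates more of the sampling budget on the weakest classes; raising it moves the proportions back toward uniform.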
Fairness-Aware Examples and Uniform-Target Adversarial Examples generation methods

The authors design two complementary data generation methods: FAEs that suppress class-specific non-robust features through uniformity constraints on feature predictions, and UTAEs that distribute attack targets uniformly across categories to prevent biased attack directions.

1 retrieved paper
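The two generation constraints can be sketched as follows. Both functions are illustrative stand-ins under stated assumptions, not the paper's losses: the FAE-style uniformity constraint is rendered here as a KL divergence between the model's softmax prediction and the uniform distribution, and the UTAE-style target rule simply draws the attack target uniformly from the non-true classes.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def uniformity_penalty(logits):
    """KL(p || uniform), averaged over the batch: zero when predictions are
    perfectly uniform, i.e. no class-specific feature dominates."""
    p = softmax(np.asarray(logits, dtype=float))
    k = p.shape[-1]
    return np.sum(p * np.log(p * k + 1e-12), axis=-1).mean()

def uniform_attack_targets(labels, num_classes, rng):
    """Draw attack targets uniformly over the non-true classes, so no single
    vulnerable class dominates the attack direction."""
    return np.array([rng.choice([c for c in range(num_classes) if c != y])
                     for y in labels])

rng = np.random.default_rng(0)
flat_pen = uniformity_penalty(np.zeros((4, 10)))      # uniform predictions
peaked_pen = uniformity_penalty(np.eye(10)[:4] * 50)  # confident predictions
targets = uniform_attack_targets([0, 1, 2, 3], 10, rng)
```

Minimizing the penalty pushes a generated sample toward a balanced, class-agnostic representation (FAE), and the uniform target rule spreads the subsequent attacks across all categories (UTAE) rather than letting them collapse onto the most vulnerable class.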

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

