Mitigating Spurious Correlation via Distributionally Robust Learning with Hierarchical Ambiguity Sets

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Spurious Correlation, Subpopulation Shift, Group Distributionally Robust Optimization, Wasserstein Distributionally Robust Optimization
Abstract:

Conventional supervised learning methods are often vulnerable to spurious correlations, particularly under distribution shifts in test data. To address this issue, several approaches, most notably Group DRO, have been developed. While these methods are highly robust to subpopulation or group shifts, they remain vulnerable to intra-group distributional shifts, which frequently occur in minority groups with limited samples. We propose a hierarchical extension of Group DRO that addresses both inter-group and intra-group uncertainties, providing robustness to distribution shifts at multiple levels. We also introduce new benchmark settings that simulate realistic minority group distribution shifts—an important yet previously underexplored challenge in spurious correlation research. Our method demonstrates strong robustness under these conditions—where existing robust learning methods consistently fail—while also achieving superior performance on standard benchmarks. These results highlight the importance of broadening the ambiguity set to better capture both inter-group and intra-group distributional uncertainties.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a hierarchical extension of Group DRO that addresses both inter-group and intra-group distributional uncertainties, aiming to improve robustness under distribution shifts at multiple levels. It resides in the 'Hierarchical Ambiguity Set and Multi-Granular Decomposition' leaf, which contains only three papers total, including this one. This leaf sits within the broader 'Hierarchical and Multi-Level Robustness Frameworks' branch, indicating a relatively sparse research direction focused on explicit hierarchical uncertainty modeling. The small number of sibling papers suggests this is an emerging rather than crowded area.

The taxonomy reveals several neighboring directions: 'Hierarchical Feature Disentanglement and Representation Learning' (three papers) focuses on separating domain-related from invariant features, while 'Causal Inference and Invariance Learning' pursues stable predictors through causal mechanisms. The paper's hierarchical ambiguity set approach contrasts with causal disentanglement methods that learn invariances end-to-end, and differs from temporal dependency modeling that addresses evolving distributions over time. Its formal optimization framework distinguishes it from more detection-oriented approaches like zero-shot spurious correlation methods in adjacent branches.

Among 24 candidates examined across three contributions, the hierarchical ambiguity set contribution (4 candidates examined) shows no clear refutation, suggesting relative novelty in this specific formulation. However, the tractable minimax optimization algorithm (10 candidates, 2 refutable) and new benchmark settings for minority-group shifts (10 candidates, 1 refutable) encounter more substantial prior work. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. The core hierarchical framework appears more distinctive than its algorithmic implementation or evaluation protocols.
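The per-contribution counts in this paragraph can be tallied as a quick consistency check. The snippet below is only a sketch: the dictionary keys are shorthand labels, and reading "no clear refutation" as zero refutable candidates is an assumption.

```python
# (candidates examined, refutable candidates) per claimed contribution,
# as reported in the paragraph above. Treating "no clear refutation"
# as 0 refutable candidates is an assumption.
candidates = {
    "hierarchical ambiguity set": (4, 0),
    "tractable minimax optimization algorithm": (10, 2),
    "new benchmark settings": (10, 1),
}

total_examined = sum(examined for examined, _ in candidates.values())
total_refutable = sum(refutable for _, refutable in candidates.values())

print(total_examined, total_refutable)
```

Under that reading, the totals agree with the 24 candidates and 3 refutable papers reported elsewhere in this report.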

Based on the 24-candidate search, the work occupies a sparsely populated research direction with limited direct competition in hierarchical ambiguity sets. The analysis captures semantic neighbors and citation-expanded papers but cannot claim comprehensive field coverage. The hierarchical uncertainty modeling appears relatively novel, while the optimization techniques and benchmarking contributions face more overlap with existing literature within the examined scope.

Taxonomy

Core-task Taxonomy Papers: 17
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 3

Research Landscape Overview

Core task: Mitigating spurious correlations under distribution shifts with hierarchical robustness.

The field addresses the challenge of building models that remain reliable when test distributions differ from training data, particularly when spurious features mislead standard learning. The taxonomy organizes approaches into several main branches: Hierarchical and Multi-Level Robustness Frameworks decompose the problem across granularities or nested uncertainty sets; Causal Inference and Invariance Learning seeks stable predictors by identifying invariant causal mechanisms; Temporal Dependency and Dynamic Distribution Modeling handles shifts that evolve over time; Meta-Learning and Generalization Under Distribution Shifts trains models to adapt quickly across diverse scenarios; Domain-Adaptive Detection and Classification tailors representations to new domains; and Sparse and Flexible Model Design for Robustness emphasizes architectures that avoid overfitting to spurious cues. Together, these branches reflect a spectrum from explicit causal reasoning to adaptive learning strategies, each targeting robustness from a different angle.

Recent work highlights contrasts between methods that impose hierarchical structure versus those that learn invariances end-to-end. For instance, Hierarchical Ambiguity Sets[0] and HQD-EM[12] both leverage multi-granular decompositions to manage uncertainty at different levels, offering principled ways to balance worst-case robustness with empirical performance. Meanwhile, Zero-Shot Spurious Correlations[1] explores detecting and mitigating spurious features without retraining, and Graph Causal Ensembles[4] and Causal Disentanglement Generalization[8] pursue causal structures to isolate invariant predictors. The original paper, Hierarchical Ambiguity Sets[0], sits squarely within the hierarchical robustness branch, emphasizing nested ambiguity sets to systematically address shifts at multiple scales. Compared to neighbors like HQD-EM[12], which also decomposes distributions hierarchically, Hierarchical Ambiguity Sets[0] appears to focus more on formal optimization frameworks, while Zero-Shot Spurious Correlations[1] takes a more detection-oriented stance. This positioning underscores an ongoing tension between structured, theory-driven approaches and flexible, data-driven adaptation strategies.

Claimed Contributions

Hierarchical ambiguity set for distributionally robust optimization

The authors introduce a hierarchical extension of Group DRO that models distributional uncertainty at two levels: inter-group shifts (changes in group proportions) and intra-group shifts (within-group distributional variations). This framework uses a Wasserstein-distance-based formulation to provide robustness to distribution shifts at multiple levels, particularly for minority groups with limited samples.
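One way to write down the two-level ambiguity set described above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $\hat{P}_g$ is the empirical distribution of group $g$, $q$ is a weight vector on the simplex $\Delta_m$ over $m$ groups, $W$ is a Wasserstein distance, $\varepsilon_g$ is a per-group radius, and $\ell$ is the per-sample loss of a model with parameters $\theta$.

```latex
\min_{\theta} \; \max_{q \in \Delta_m} \; \sum_{g=1}^{m} q_g \,
  \sup_{Q_g \,:\, W(Q_g, \hat{P}_g) \le \varepsilon_g}
  \mathbb{E}_{(x,y) \sim Q_g}\big[\ell(\theta; x, y)\big]
```

In this sketch the outer maximization over $q$ covers inter-group shifts and the inner supremum over $Q_g$ covers intra-group shifts. Setting every $\varepsilon_g = 0$ collapses each inner ball to $\hat{P}_g$ and recovers the standard Group DRO objective.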

4 retrieved papers

Tractable minimax optimization algorithm

The authors reformulate the hierarchical DRO problem into a tractable surrogate objective and provide an iterative coordinate-wise training procedure that alternates between updating semantic variables, group weights, and model parameters. This makes the framework computationally feasible for practical applications.
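The alternating procedure described above might look roughly like the following NumPy sketch. It is not the authors' algorithm: the logistic-regression model, the penalized input perturbation standing in for the "semantic variables", the exponentiated-gradient update for group weights (borrowed from standard Group DRO practice), and all names and hyperparameters (`train_hierarchical_dro`, `gamma`, `eta_q`, and so on) are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_sample_loss(w, X, y):
    # Logistic loss for labels y in {0, 1}.
    z = X @ w
    return np.logaddexp(0.0, z) - y * z

def train_hierarchical_dro(X, y, groups, n_groups, steps=200, lr=0.1,
                           eta_q=0.5, gamma=1.0, inner_lr=0.1, inner_steps=3):
    """Hypothetical alternating scheme; every group must be non-empty."""
    w = np.zeros(X.shape[1])
    q = np.full(n_groups, 1.0 / n_groups)
    for _ in range(steps):
        # (1) Intra-group step: gradient ascent on perturbed points Z,
        #     penalized by gamma * ||Z - X||^2 (a Lagrangian-style surrogate
        #     for a Wasserstein ball, assumed here).
        Z = X.copy()
        for _ in range(inner_steps):
            grad_Z = (sigmoid(Z @ w) - y)[:, None] * w[None, :]
            grad_Z -= 2.0 * gamma * (Z - X)
            Z += inner_lr * grad_Z
        losses = per_sample_loss(w, Z, y)

        # (2) Inter-group step: exponentiated-gradient ascent on the
        #     group weights, then renormalize onto the simplex.
        group_loss = np.array([losses[groups == g].mean()
                               for g in range(n_groups)])
        q = q * np.exp(eta_q * group_loss)
        q /= q.sum()

        # (3) Model step: gradient descent on the q-weighted loss
        #     evaluated at the perturbed points.
        resid = sigmoid(Z @ w) - y
        grad_w = np.zeros_like(w)
        for g in range(n_groups):
            mask = groups == g
            grad_w += q[g] * (Z[mask].T @ resid[mask]) / mask.sum()
        w -= lr * grad_w
    return w, q
```

A call such as `w, q = train_hierarchical_dro(X, y, groups, n_groups=2)` returns the fitted weights and the final group weights; groups with persistently higher perturbed loss end up with larger entries in `q`.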

10 retrieved papers (Can Refute)

New benchmark settings for minority-group distribution shifts

The authors construct modified versions of standard benchmarks (CMNIST, Waterbirds, CelebA) that simulate realistic intra-group distributional shifts in minority groups by altering train-test splits. These settings expose a critical failure mode where existing robust learning methods perform poorly, while their proposed method maintains strong performance.
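The report does not say how the train-test splits are altered, so the following is only one hypothetical way to induce an intra-group shift in a minority group. It assumes each sample carries a group id and a scalar secondary attribute (both names are illustrative); majority groups are split at random, while the minority group is split by attribute value so that the test portion covers a range never seen in training.

```python
import numpy as np

def shifted_minority_split(groups, attribute, minority_group,
                           test_frac=0.3, seed=0):
    """Return (train_idx, test_idx) with a shifted minority group.

    Hypothetical construction: `attribute` is an assumed per-sample
    scalar used only to order the minority-group samples.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(len(groups))
    is_minority = groups == minority_group

    # Majority groups: ordinary random split.
    majority = rng.permutation(idx[~is_minority])
    n_test_maj = int(round(test_frac * len(majority)))
    test_maj, train_maj = majority[:n_test_maj], majority[n_test_maj:]

    # Minority group: sort by attribute and send the highest values to
    # the test set, so train and test cover different attribute ranges.
    order = np.argsort(attribute[is_minority], kind="stable")
    minority = idx[is_minority][order]
    n_test_min = int(round(test_frac * len(minority)))
    cut = len(minority) - n_test_min
    train_min, test_min = minority[:cut], minority[cut:]

    train_idx = np.sort(np.concatenate([train_maj, train_min]))
    test_idx = np.sort(np.concatenate([test_maj, test_min]))
    return train_idx, test_idx
```

A method that is robust only to changes in group proportions would see the same minority group at test time, but with attribute values outside its training range, which is the kind of failure mode the contribution describes.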

10 retrieved papers (Can Refute)
