Bandit Learning in Matching Markets Robust to Adversarial Corruptions

ICLR 2026 Conference SubmissionAnonymous Authors
BanditsMatching marketsRobust algorithmsAdversarial corruptions
Abstract:

This paper investigates the problem of bandit learning in two-sided decentralized matching markets with adversarial corruptions. In matching markets, players on one side aim to learn their unknown preferences over arms on the other side through iterative online learning, with the goal of identifying the optimal stable match. However, in real-world applications, stochastic rewards observed by players may be corrupted by malicious adversaries, potentially misleading the learning process and causing convergence to a sub-optimal match. We study this problem under two settings: one where the corruption level CC (defined as the sum of the largest adversarial alterations to the feedback across rounds) is known, and another where it is unknown. For the known corruption setting, we develop a robust variant of the classical Explore-Then-Gale-Shapley (ETGS) algorithm by incorporating widened confidence intervals. For the unknown corruption case, we propose a Multi-layer ETGS race method that adaptively mitigates adversarial effects without prior corruption knowledge. We provide theoretical guarantees for both algorithms by establishing upper bounds on their optimal stable regret, and further derive the lower bound to demonstrate their optimality.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper develops robust bandit learning algorithms for two-sided matching markets under adversarial corruptions, addressing scenarios where feedback may be maliciously altered. Within the taxonomy, it occupies the 'Known Corruption Level Settings' leaf under 'Adversarial Robustness in Matching Markets'. Notably, this leaf contains only the original paper itself—no sibling papers were identified in the taxonomy. This positioning suggests the paper targets a relatively sparse research direction, specifically combining matching market learning with known adversarial corruption budgets, a problem formulation that appears underexplored in the examined literature.

The taxonomy reveals neighboring research directions that contextualize this work. The sibling leaf 'General Adversarial Multi-Armed Bandit Frameworks' addresses foundational adversarial bandit models but without matching market structure. Meanwhile, 'Decentralized Learning in Stochastic Matching Markets' examines similar two-sided learning problems under benign stochastic feedback, and 'Non-Stationary and Dynamic Matching Environments' considers temporal preference changes rather than adversarial interference. The paper thus bridges adversarial robustness techniques from general bandit theory with the structural constraints of stable matching, occupying a niche between purely stochastic matching approaches and general adversarial bandit frameworks.

Among twenty-one candidates examined through semantic search and citation expansion, none were found to clearly refute the paper's three main contributions. The robust bandit learning framework examined ten candidates with zero refutations, the robust ETGS variant examined one candidate with no overlap, and the multi-layer ETGS race method examined ten candidates without finding prior work. This limited search scope suggests that within the top semantic matches, no directly overlapping prior work was identified. However, the analysis explicitly acknowledges examining only twenty-one candidates total, not an exhaustive literature review, leaving open the possibility of relevant work beyond this search radius.

Based on the limited search conducted, the paper appears to address a novel problem formulation at the intersection of matching markets and adversarial robustness. The absence of sibling papers in its taxonomy leaf and zero refutations across contributions suggest originality, though this assessment is constrained by the scope of twenty-one examined candidates. The work's positioning between established research branches—adversarial bandits and stochastic matching—indicates it may be opening a new research direction rather than incrementally extending a crowded subfield.

Taxonomy

Core-task Taxonomy Papers
5
3
Claimed Contributions
21
Contribution Candidate Papers Compared
0
Refutable Paper

Research Landscape Overview

Core task: Bandit learning in two-sided matching markets with adversarial corruptions. The field addresses how agents in matching markets—such as workers and firms, or students and schools—learn optimal strategies when preferences are initially unknown and must be discovered through repeated interactions. The taxonomy reveals four main branches that capture different challenges and modeling assumptions. Adversarial Robustness in Matching Markets focuses on settings where feedback or preferences may be corrupted by adversarial noise, distinguishing cases with known versus unknown corruption levels. Decentralized Learning in Stochastic Matching Markets examines scenarios where multiple agents learn simultaneously without central coordination, as seen in works like Decentralized two-sided bandit learning[2]. Non-Stationary and Dynamic Matching Environments, exemplified by Competing Bandits in Non-Stationary[3], considers markets where preferences or availability change over time. Finally, Specialized Market Structures and Preference Models explores particular matching frameworks, such as those with specific stability notions or preference restrictions. A central tension across these branches involves balancing exploration efficiency with robustness to various forms of uncertainty. While stochastic settings often assume benign randomness, adversarial frameworks must contend with worst-case corruptions that can mislead learning algorithms. Bandit Learning in Matching[0] sits squarely within the Adversarial Robustness branch, specifically addressing known corruption level settings where the extent of adversarial interference is given in advance. This contrasts with purely stochastic approaches like Matching in multi-arm bandit[1], which do not account for adversarial noise, and with non-stationary models such as Competing Bandits in Non-Stationary[3], where the challenge stems from temporal drift rather than deliberate corruption. By focusing on adversarial robustness with known corruption budgets, the original paper tackles a middle ground between overly optimistic stochastic assumptions and fully adaptive adversaries, offering algorithms that can provably handle bounded perturbations in matching feedback.

Claimed Contributions

Robust bandit learning framework for matching markets with adversarial corruptions

The authors introduce a new problem formulation that extends bandit learning in matching markets to handle adversarial corruptions of stochastic feedback. This framework captures realistic scenarios where feedback may be corrupted by malicious adversaries or external noise.

10 retrieved papers
Robust ETGS variant with widened confidence intervals for known corruption

The authors propose a modified ETGS algorithm that enlarges confidence intervals proportionally to the known corruption level C, enabling the algorithm to remain robust against adversarial perturbations while maintaining theoretical guarantees.

1 retrieved paper
Multi-layer ETGS race method with synchronization mechanism for unknown corruption

The authors develop a novel algorithm that maintains multiple ETGS instances at different learning rates and introduces a sub-phase level synchronization mechanism to coordinate players and avoid matching conflicts. This method handles any level of corruption without requiring prior knowledge of the corruption budget.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Robust bandit learning framework for matching markets with adversarial corruptions

The authors introduce a new problem formulation that extends bandit learning in matching markets to handle adversarial corruptions of stochastic feedback. This framework captures realistic scenarios where feedback may be corrupted by malicious adversaries or external noise.

Contribution

Robust ETGS variant with widened confidence intervals for known corruption

The authors propose a modified ETGS algorithm that enlarges confidence intervals proportionally to the known corruption level C, enabling the algorithm to remain robust against adversarial perturbations while maintaining theoretical guarantees.

Contribution

Multi-layer ETGS race method with synchronization mechanism for unknown corruption

The authors develop a novel algorithm that maintains multiple ETGS instances at different learning rates and introduces a sub-phase level synchronization mechanism to coordinate players and avoid matching conflicts. This method handles any level of corruption without requiring prior knowledge of the corruption budget.