Bandit Learning in Matching Markets Robust to Adversarial Corruptions

ICLR 2026 Conference SubmissionAnonymous Authors

OpenReview Score: 6.0 Download Report PDF

BanditsMatching marketsRobust algorithmsAdversarial corruptions

This paper investigates the problem of bandit learning in two-sided decentralized matching markets with adversarial corruptions. In matching markets, players on one side aim to learn their unknown preferences over arms on the other side through iterative online learning, with the goal of identifying the optimal stable match. However, in real-world applications, stochastic rewards observed by players may be corrupted by malicious adversaries, potentially misleading the learning process and causing convergence to a sub-optimal match. We study this problem under two settings: one where the corruption level $C$ (defined as the sum of the largest adversarial alterations to the feedback across rounds) is known, and another where it is unknown. For the known corruption setting, we develop a robust variant of the classical Explore-Then-Gale-Shapley (ETGS) algorithm by incorporating widened confidence intervals. For the unknown corruption case, we propose a Multi-layer ETGS race method that adaptively mitigates adversarial effects without prior corruption knowledge. We provide theoretical guarantees for both algorithms by establishing upper bounds on their optimal stable regret, and further derive the lower bound to demonstrate their optimality.

Abstract:

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper develops robust bandit learning algorithms for two-sided matching markets under adversarial corruptions, addressing scenarios where feedback may be maliciously altered. Within the taxonomy, it occupies the 'Known Corruption Level Settings' leaf under 'Adversarial Robustness in Matching Markets'. Notably, this leaf contains only the original paper itself—no sibling papers were identified in the taxonomy. This positioning suggests the paper targets a relatively sparse research direction, specifically combining matching market learning with known adversarial corruption budgets, a problem formulation that appears underexplored in the examined literature.

The taxonomy reveals neighboring research directions that contextualize this work. The sibling leaf 'General Adversarial Multi-Armed Bandit Frameworks' addresses foundational adversarial bandit models but without matching market structure. Meanwhile, 'Decentralized Learning in Stochastic Matching Markets' examines similar two-sided learning problems under benign stochastic feedback, and 'Non-Stationary and Dynamic Matching Environments' considers temporal preference changes rather than adversarial interference. The paper thus bridges adversarial robustness techniques from general bandit theory with the structural constraints of stable matching, occupying a niche between purely stochastic matching approaches and general adversarial bandit frameworks.

Among twenty-one candidates examined through semantic search and citation expansion, none were found to clearly refute the paper's three main contributions. The robust bandit learning framework examined ten candidates with zero refutations, the robust ETGS variant examined one candidate with no overlap, and the multi-layer ETGS race method examined ten candidates without finding prior work. This limited search scope suggests that within the top semantic matches, no directly overlapping prior work was identified. However, the analysis explicitly acknowledges examining only twenty-one candidates total, not an exhaustive literature review, leaving open the possibility of relevant work beyond this search radius.

Based on the limited search conducted, the paper appears to address a novel problem formulation at the intersection of matching markets and adversarial robustness. The absence of sibling papers in its taxonomy leaf and zero refutations across contributions suggest originality, though this assessment is constrained by the scope of twenty-one examined candidates. The work's positioning between established research branches—adversarial bandits and stochastic matching—indicates it may be opening a new research direction rather than incrementally extending a crowded subfield.

Taxonomy

Core-task Taxonomy Papers

Claimed Contributions

Contribution Candidate Papers Compared

Refutable Paper

Research Landscape Overview

Core task: Bandit learning in two-sided matching markets with adversarial corruptions. The field addresses how agents in matching markets—such as workers and firms, or students and schools—learn optimal strategies when preferences are initially unknown and must be discovered through repeated interactions. The taxonomy reveals four main branches that capture different challenges and modeling assumptions. Adversarial Robustness in Matching Markets focuses on settings where feedback or preferences may be corrupted by adversarial noise, distinguishing cases with known versus unknown corruption levels. Decentralized Learning in Stochastic Matching Markets examines scenarios where multiple agents learn simultaneously without central coordination, as seen in works like Decentralized two-sided bandit learning[2]. Non-Stationary and Dynamic Matching Environments, exemplified by Competing Bandits in Non-Stationary[3], considers markets where preferences or availability change over time. Finally, Specialized Market Structures and Preference Models explores particular matching frameworks, such as those with specific stability notions or preference restrictions. A central tension across these branches involves balancing exploration efficiency with robustness to various forms of uncertainty. While stochastic settings often assume benign randomness, adversarial frameworks must contend with worst-case corruptions that can mislead learning algorithms. Bandit Learning in Matching[0] sits squarely within the Adversarial Robustness branch, specifically addressing known corruption level settings where the extent of adversarial interference is given in advance. This contrasts with purely stochastic approaches like Matching in multi-arm bandit[1], which do not account for adversarial noise, and with non-stationary models such as Competing Bandits in Non-Stationary[3], where the challenge stems from temporal drift rather than deliberate corruption. By focusing on adversarial robustness with known corruption budgets, the original paper tackles a middle ground between overly optimistic stochastic assumptions and fully adaptive adversaries, offering algorithms that can provably handle bounded perturbations in matching feedback.

Claimed Contributions

Robust bandit learning framework for matching markets with adversarial corruptions

10 retrieved papers

The authors introduce a new problem formulation that extends bandit learning in matching markets to handle adversarial corruptions of stochastic feedback. This framework captures realistic scenarios where feedback may be corrupted by malicious adversaries or external noise.

10 retrieved papers

Robust ETGS variant with widened confidence intervals for known corruption

1 retrieved paper

The authors propose a modified ETGS algorithm that enlarges confidence intervals proportionally to the known corruption level C, enabling the algorithm to remain robust against adversarial perturbations while maintaining theoretical guarantees.

1 retrieved paper

Multi-layer ETGS race method with synchronization mechanism for unknown corruption

10 retrieved papers

The authors develop a novel algorithm that maintains multiple ETGS instances at different learning rates and introduces a sub-phase level synchronization mechanism to coordinate players and avoid matching conflicts. This method handles any level of corruption without requiring prior knowledge of the corruption budget.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Robust bandit learning framework for matching markets with adversarial corruptions

[1] Matching in multi-arm bandit with collision PDF

Cannot Refute

[2] Decentralized two-sided bandit learning in matching market PDF

Cannot Refute

[3] Competing Bandits in Non-Stationary Matching Markets PDF

Cannot Refute

[16] Regret analysis of bilateral trade with a smoothed adversary PDF

Cannot Refute

[17] Fair online bilateral trade PDF

Cannot Refute

[18] Privacy-preserving stable data trading for unknown market based on blockchain PDF

Cannot Refute

[19] Bilateral trade: A regret minimization perspective PDF

Cannot Refute

[20] Decentralized Competing Bandits in Non-Stationary Matching Markets PDF

Cannot Refute

[21] A regret analysis of bilateral trade PDF

Cannot Refute

[22] Dynamic Matching Bandit For Two-Sided Online Markets PDF

Cannot Refute

Contribution

Robust ETGS variant with widened confidence intervals for known corruption

[23] Bandit Learning in Matching Markets with Indifference PDF

Cannot Refute

Contribution

Multi-layer ETGS race method with synchronization mechanism for unknown corruption

[6] Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions PDF

Cannot Refute

[7] Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency PDF

Cannot Refute

[8] Linear Contextual Bandits with Adversarial Corruptions PDF

Cannot Refute

[9] Federated Multi-Armed Bandits PDF

Cannot Refute

[10] Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data PDF

Cannot Refute

[11] Towards best-of-all-worlds online learning with feedback graphs PDF

Cannot Refute

[12] Proactive Deployment of Chain-based VNF Backup at the Edge using Online Bandit Learning PDF

Cannot Refute

[13] A new bandit setting balancing information from state evolution and corrupted context PDF

Cannot Refute

[14] Problem-dependent Regret for Lexicographic Multi-Armed Bandits with Adversarial Corruptions PDF

Cannot Refute

[15] Corrupted Learning Dynamics in Games PDF

Cannot Refute

Bandit Learning in Matching Markets Robust to Adversarial Corruptions

Overview

Overall Novelty Assessment

Taxonomy

Research Landscape Overview

Claimed Contributions

Core Task Comparisons

Contribution Analysis

Robust bandit learning framework for matching markets with adversarial corruptions

[1] Matching in multi-arm bandit with collision PDF

[2] Decentralized two-sided bandit learning in matching market PDF

[3] Competing Bandits in Non-Stationary Matching Markets PDF

[16] Regret analysis of bilateral trade with a smoothed adversary PDF

[17] Fair online bilateral trade PDF

[18] Privacy-preserving stable data trading for unknown market based on blockchain PDF

[19] Bilateral trade: A regret minimization perspective PDF

[20] Decentralized Competing Bandits in Non-Stationary Matching Markets PDF

[21] A regret analysis of bilateral trade PDF

[22] Dynamic Matching Bandit For Two-Sided Online Markets PDF

Robust ETGS variant with widened confidence intervals for known corruption

[23] Bandit Learning in Matching Markets with Indifference PDF

Multi-layer ETGS race method with synchronization mechanism for unknown corruption

[6] Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions PDF

[7] Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency PDF

[8] Linear Contextual Bandits with Adversarial Corruptions PDF

[9] Federated Multi-Armed Bandits PDF

[10] Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data PDF

[11] Towards best-of-all-worlds online learning with feedback graphs PDF

[12] Proactive Deployment of Chain-based VNF Backup at the Edge using Online Bandit Learning PDF

[13] A new bandit setting balancing information from state evolution and corrupted context PDF

[14] Problem-dependent Regret for Lexicographic Multi-Armed Bandits with Adversarial Corruptions PDF

[15] Corrupted Learning Dynamics in Games PDF

Table of Contents