DUET: Optimizing Training Data Mixtures via Coarse, Noisy Feedback from Unseen Evaluation Tasks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Bayesian Optimization, Data Mixture Optimization, Optimization from Feedback
Abstract:

The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, we do not have fine-grained knowledge of the data in the evaluation task (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data is relevant for fine-tuning the LLM. Instead, we can only deploy the LLM on the unseen task to gather multiple rounds of coarse, noisy feedback on how well the model performs (e.g., user ratings). Our paper presents DUET, a novel global-to-local algorithm that optimizes training data mixtures by interleaving data selection with Bayesian optimization to exploit coarse and noisy feedback from a downstream evaluation task. DUET is flexible enough to incorporate different data selection methods, each with different performance-compute tradeoffs. By analyzing DUET's cumulative regret, we theoretically show that DUET converges to the optimal training data mixture even without any fine-grained data information from an unseen task. Finally, our experiments across a variety of language tasks demonstrate that DUET attains substantial performance improvement over existing data selection and mixing methods in the unseen-task setting. Our anonymized code can be found at https://github.com/pmsdapfmbf/DUET.
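To make the feedback loop described in the abstract concrete, here is a minimal, hypothetical sketch of optimizing mixture weights from coarse, noisy scalar feedback. Everything here is illustrative rather than the paper's actual method: `W_STAR` stands in for an unseen task's (unknown) optimal mixture, `coarse_feedback` for one deploy-and-rate round, and a k-nearest-neighbour surrogate replaces the Gaussian-process model (and the inner data-selection step) that a real Bayesian-optimization implementation like DUET's would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth optimum, unknown to the learner; the learner only
# ever observes a noisy scalar score, mimicking coarse user ratings.
W_STAR = np.array([0.6, 0.3, 0.1])

def coarse_feedback(w):
    """One 'deploy and rate' round: higher is better, corrupted by noise."""
    return -float(np.sum((w - W_STAR) ** 2)) + 0.01 * rng.normal()

def ucb_score(w, hist_w, hist_y, beta=0.5, k=5):
    """UCB-style acquisition: k-NN surrogate mean + distance exploration bonus."""
    d = np.linalg.norm(hist_w - w, axis=1)
    idx = np.argsort(d)[: min(k, len(d))]
    return hist_y[idx].mean() + beta * d.min()  # far from data => explore

def optimize_mixture(rounds=40, n_candidates=256):
    # Warm-start with a few random mixtures on the probability simplex.
    hist_w = rng.dirichlet(np.ones(3), size=5)
    hist_y = np.array([coarse_feedback(w) for w in hist_w])
    for _ in range(rounds):
        cands = rng.dirichlet(np.ones(3), size=n_candidates)
        scores = [ucb_score(c, hist_w, hist_y) for c in cands]
        w_next = cands[int(np.argmax(scores))]          # acquisition argmax
        hist_w = np.vstack([hist_w, w_next])
        hist_y = np.append(hist_y, coarse_feedback(w_next))
    return hist_w[int(np.argmax(hist_y))]               # best mixture seen

best_mixture = optimize_mixture()
```

The key structural point this sketch shares with the paper's setting is that the optimizer only ever touches `coarse_feedback`: no fine-grained information about the evaluation data enters the loop.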

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), so the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases; human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DUET, a global-to-local algorithm that optimizes training data mixtures using Bayesian optimization guided by coarse, noisy feedback from unseen evaluation tasks. It resides in the 'Bayesian Optimization with Task Feedback' leaf, which contains only two papers, including this one. This is a relatively sparse direction within the broader 'Feedback-Driven Data Mixture Optimization' branch, suggesting that the specific combination of Bayesian optimization with coarse task feedback for mixture optimization remains underexplored compared to reweighting-based or predictive-modeling approaches.

The taxonomy reveals several neighboring directions: 'Gradient-Based Feedback Alignment' uses online gradient signals rather than Bayesian search, while 'Adversarial and Agent-Based Feedback' employs self-learning agents. The sibling paper in the same leaf shares the Bayesian optimization framework but may differ in feedback granularity or data selection mechanisms. Adjacent branches like 'Reweighting-Based Data Mixture Optimization' (e.g., DoReMi) adjust domain weights without iterative feedback loops, and 'Predictive Modeling' approaches extrapolate performance from small-scale experiments rather than deploying models iteratively. DUET's interleaving of data selection with Bayesian optimization distinguishes it from these single-pass or purely predictive methods.

Among 24 candidates examined, no contribution was clearly refuted. The DUET algorithm itself was assessed against 4 candidates with no refutable overlap; the theoretical convergence analysis examined 10 candidates with no prior work providing equivalent regret bounds; and the problem formulation for unseen tasks reviewed 10 candidates without finding direct precedents. This suggests that within the limited search scope—top-K semantic matches plus citation expansion—the specific integration of coarse feedback, Bayesian optimization, and convergence guarantees appears novel. However, the modest candidate pool means the analysis cannot rule out relevant work outside this sample.

Given the sparse taxonomy leaf and absence of refutable candidates in the examined set, the work appears to occupy a relatively unexplored niche. The limited search scope (24 papers) and the narrow sibling set (one other paper) indicate that while the approach seems novel within the sampled literature, a more exhaustive review—especially of adjacent optimization and active learning communities—would be necessary to fully assess originality. The convergence analysis and coarse-feedback formulation stand out as potentially distinctive contributions based on the available evidence.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 0

Research Landscape Overview

Core task: optimizing training data mixtures for unseen evaluation tasks. The field addresses how to compose or weight diverse data sources so that models generalize well to new domains or tasks not seen during training. The taxonomy organizes research into several main branches:

- Predictive Modeling of Data Mixture Performance: forecasts which mixtures will yield strong downstream results, often via scaling laws or learned performance models.
- Feedback-Driven Data Mixture Optimization: iteratively refines mixtures using signals from validation performance or Bayesian search.
- Reweighting-Based Data Mixture Optimization: adjusts domain or example weights to balance contributions, exemplified by methods like DoReMi[8].
- Multi-Task and Multi-Domain Data Composition: studies how to blend tasks or domains for broad capability.
- Data Augmentation and Synthetic Data Generation: creates or modifies training examples to fill gaps.
- Out-of-Distribution Detection and Data Quality Assessment: identifies low-quality or off-distribution samples.
- Domain-Specific and Application-Oriented Optimization: tailors mixtures to particular use cases such as clinical text or code pretraining.

A particularly active line of work treats mixture optimization as a search problem, using Bayesian optimization or other iterative strategies to navigate the space of possible data compositions. DUET Coarse Feedback[0] sits within this Bayesian Optimization with Task Feedback cluster, building on DUET[1] by incorporating coarser feedback signals to guide mixture selection more efficiently. This contrasts with reweighting methods like DoReMi[8], which rely on reference-model perplexity to adjust domain weights in a single pass, and with predictive-modeling approaches such as Data Mixing Laws[2], which extrapolate performance from smaller-scale experiments.
The central tension across these branches is between sample efficiency—how quickly one can identify a good mixture—and the fidelity of the feedback signal used to steer optimization. DUET Coarse Feedback[0] addresses this trade-off by accepting noisier task-level feedback in exchange for reduced evaluation cost, positioning it as a practical middle ground between exhaustive search and purely heuristic reweighting.

Claimed Contributions

DUET algorithm for optimizing training data mixtures via coarse feedback

The authors introduce DUET, an algorithm that combines data selection methods with Bayesian optimization in an iterative manner to optimize training data mixtures using only coarse, noisy feedback from unseen evaluation tasks, without requiring fine-grained data information.

4 retrieved papers

Theoretical convergence analysis via cumulative regret

The authors provide a theoretical analysis demonstrating that DUET converges to the optimal training data mixture by analyzing the algorithm's attained cumulative regret under the Bayesian optimization framework, proving convergence without requiring detailed evaluation task data.

10 retrieved papers
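For context, cumulative-regret arguments of this kind are usually stated over mixture weights on the probability simplex. The following is a generic GP-bandit-style sketch in standard notation, not the paper's exact theorem:

```latex
% Cumulative regret after T feedback rounds, for unknown utility f
% and the (unknown) optimal mixture w^\star:
R_T \;=\; \sum_{t=1}^{T} \bigl( f(w^\star) - f(w_t) \bigr).

% Sublinear regret implies convergence of the best mixture found so far:
\frac{R_T}{T} \to 0
\quad\Longrightarrow\quad
\max_{t \le T} f(w_t) \;\to\; f(w^\star).

% GP-UCB-style analyses typically yield bounds of the form
R_T \;=\; \mathcal{O}\!\bigl( \sqrt{T \, \beta_T \, \gamma_T} \bigr),
% where \gamma_T is the maximal information gain of the surrogate's kernel
% and \beta_T a confidence-width parameter.
```

The claimed contribution is that a bound of this flavor holds even though each observation of $f$ is only a coarse, noisy scalar from the unseen task.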

Novel problem formulation for unseen evaluation tasks

The authors formalize a new problem setting where practitioners lack fine-grained information about evaluation task data but can iteratively gather coarse performance feedback, addressing a gap between traditional domain adaptation and domain generalization approaches.

10 retrieved papers
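One standard way to write down such a setting (our notation, not necessarily the paper's) is as bandit-feedback optimization over the simplex of mixture weights:

```latex
% Mixture weights over k data sources live on the probability simplex:
w \in \Delta^{k-1}
  = \Bigl\{ w \in \mathbb{R}^k \;:\; w_i \ge 0,\;
            \textstyle\sum_{i=1}^{k} w_i = 1 \Bigr\}.

% The learner never sees the evaluation data; at round t it picks w_t,
% trains on the induced mixture, and observes only a noisy scalar score:
y_t = f(w_t) + \varepsilon_t, \qquad \mathbb{E}[\varepsilon_t] = 0.

% Goal: approach the best mixture using this coarse feedback alone:
w^\star \in \arg\max_{w \in \Delta^{k-1}} f(w).
```

This sits between domain adaptation (which assumes access to target-task data) and domain generalization (which assumes no target signal at all): here the only target signal is the sequence of noisy scores $y_t$.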

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
