DUET: Optimizing Training Data Mixtures via Coarse, Noisy Feedback from Unseen Evaluation Tasks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Bayesian Optimization, Data Mixture Optimization, Optimization from Feedback
Abstract:

The performance of an LLM depends heavily on the relevance of its training data to the downstream evaluation task. However, in practice, we do not have fine-grained knowledge of the data in the evaluation task (e.g., conversations between an LLM and a user are end-to-end encrypted). Hence, it is unclear what data is relevant for fine-tuning the LLM. Instead, we can only deploy the LLM on the unseen task to gather multiple rounds of coarse, noisy feedback on how well the model performs (e.g., user ratings). Our paper presents DUET, a novel global-to-local algorithm that optimizes training data mixtures by interleaving data selection with Bayesian optimization to exploit coarse and noisy feedback from a downstream evaluation task. DUET is flexible enough to incorporate different data selection methods, each with different performance-compute tradeoffs. By analyzing DUET's cumulative regret, we theoretically show that DUET converges to the optimal training data mixture even without any fine-grained data information from an unseen task. Finally, our experiments across a variety of language tasks demonstrate that DUET attains substantial performance improvement over existing data selection and mixing methods in the unseen-task setting. Our anonymized code can be found at https://github.com/pmsdapfmbf/DUET.
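To make the feedback loop described in the abstract concrete, here is a minimal, hypothetical sketch of optimizing mixture weights from coarse, noisy scalar feedback. Everything here is illustrative rather than the paper's actual method: `W_STAR` stands in for an unseen task's (unknown) optimal mixture, `coarse_feedback` for one deploy-and-rate round, and a k-nearest-neighbour surrogate replaces the Gaussian-process model (and the inner data-selection step) that a real Bayesian-optimization implementation like DUET's would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth optimum, unknown to the learner; the learner only
# ever observes a noisy scalar score, mimicking coarse user ratings.
W_STAR = np.array([0.6, 0.3, 0.1])

def coarse_feedback(w):
    """One 'deploy and rate' round: higher is better, corrupted by noise."""
    return -float(np.sum((w - W_STAR) ** 2)) + 0.01 * rng.normal()

def ucb_score(w, hist_w, hist_y, beta=0.5, k=5):
    """UCB-style acquisition: k-NN surrogate mean + distance exploration bonus."""
    d = np.linalg.norm(hist_w - w, axis=1)
    idx = np.argsort(d)[: min(k, len(d))]
    return hist_y[idx].mean() + beta * d.min()  # far from data => explore

def optimize_mixture(rounds=40, n_candidates=256):
    # Warm-start with a few random mixtures on the probability simplex.
    hist_w = rng.dirichlet(np.ones(3), size=5)
    hist_y = np.array([coarse_feedback(w) for w in hist_w])
    for _ in range(rounds):
        cands = rng.dirichlet(np.ones(3), size=n_candidates)
        scores = [ucb_score(c, hist_w, hist_y) for c in cands]
        w_next = cands[int(np.argmax(scores))]          # acquisition argmax
        hist_w = np.vstack([hist_w, w_next])
        hist_y = np.append(hist_y, coarse_feedback(w_next))
    return hist_w[int(np.argmax(hist_y))]               # best mixture seen

best_mixture = optimize_mixture()
```

The key structural point this sketch shares with the paper's setting is that the optimizer only ever touches `coarse_feedback`: no fine-grained information about the evaluation data enters the loop.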

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), so the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases; human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DUET, a global-to-local algorithm that optimizes training data mixtures using Bayesian optimization guided by coarse, noisy feedback from unseen evaluation tasks. It resides in the 'Bayesian Optimization with Task Feedback' leaf, which contains only two papers, including this one. This is a relatively sparse direction within the broader 'Feedback-Driven Data Mixture Optimization' branch, suggesting that the specific combination of Bayesian optimization with coarse task feedback for mixture optimization remains underexplored compared to reweighting-based or predictive-modeling approaches.

The taxonomy reveals several neighboring directions: 'Gradient-Based Feedback Alignment' uses online gradient signals rather than Bayesian search, while 'Adversarial and Agent-Based Feedback' employs self-learning agents. The sibling paper in the same leaf shares the Bayesian optimization framework but may differ in feedback granularity or data selection mechanisms. Adjacent branches like 'Reweighting-Based Data Mixture Optimization' (e.g., DoReMi) adjust domain weights without iterative feedback loops, and 'Predictive Modeling' approaches extrapolate performance from small-scale experiments rather than deploying models iteratively. DUET's interleaving of data selection with Bayesian optimization distinguishes it from these single-pass or purely predictive methods.

Among 24 candidates examined, no contribution was clearly refuted. The DUET algorithm itself was assessed against 4 candidates with no refutable overlap; the theoretical convergence analysis examined 10 candidates with no prior work providing equivalent regret bounds; and the problem formulation for unseen tasks reviewed 10 candidates without finding direct precedents. This suggests that within the limited search scope—top-K semantic matches plus citation expansion—the specific integration of coarse feedback, Bayesian optimization, and convergence guarantees appears novel. However, the modest candidate pool means the analysis cannot rule out relevant work outside this sample.

Given the sparse taxonomy leaf and absence of refutable candidates in the examined set, the work appears to occupy a relatively unexplored niche. The limited search scope (24 papers) and the narrow sibling set (one other paper) indicate that while the approach seems novel within the sampled literature, a more exhaustive review—especially of adjacent optimization and active learning communities—would be necessary to fully assess originality. The convergence analysis and coarse-feedback formulation stand out as potentially distinctive contributions based on the available evidence.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 24
Refutable Papers: 0

Research Landscape Overview

Core task: optimizing training data mixtures for unseen evaluation tasks. The field addresses how to compose or weight diverse data sources so that models generalize well to new domains or tasks not seen during training. The taxonomy organizes research into several main branches:

- Predictive Modeling of Data Mixture Performance: forecasts which mixtures will yield strong downstream results, often via scaling laws or learned performance models.
- Feedback-Driven Data Mixture Optimization: iteratively refines mixtures using signals from validation performance or Bayesian search.
- Reweighting-Based Data Mixture Optimization: adjusts domain or example weights to balance contributions, exemplified by methods like DoReMi[8].
- Multi-Task and Multi-Domain Data Composition: studies how to blend tasks or domains for broad capability.
- Data Augmentation and Synthetic Data Generation: creates or modifies training examples to fill gaps.
- Out-of-Distribution Detection and Data Quality Assessment: identifies low-quality or off-distribution samples.
- Domain-Specific and Application-Oriented Optimization: tailors mixtures to particular use cases such as clinical text or code pretraining.

A particularly active line of work treats mixture optimization as a search problem, using Bayesian optimization or other iterative strategies to navigate the space of possible data compositions. DUET Coarse Feedback[0] sits within this Bayesian Optimization with Task Feedback cluster, building on DUET[1] by incorporating coarser feedback signals to guide mixture selection more efficiently. This contrasts with reweighting methods like DoReMi[8], which rely on reference-model perplexity to adjust domain weights in a single pass, and with predictive-modeling approaches such as Data Mixing Laws[2], which extrapolate performance from smaller-scale experiments.
The central tension across these branches is between sample efficiency—how quickly one can identify a good mixture—and the fidelity of the feedback signal used to steer optimization. DUET Coarse Feedback[0] addresses this trade-off by accepting noisier task-level feedback in exchange for reduced evaluation cost, positioning it as a practical middle ground between exhaustive search and purely heuristic reweighting.

Claimed Contributions

DUET algorithm for optimizing training data mixtures via coarse feedback

The authors introduce DUET, an algorithm that combines data selection methods with Bayesian optimization in an iterative manner to optimize training data mixtures using only coarse, noisy feedback from unseen evaluation tasks, without requiring fine-grained data information.

4 retrieved papers

Theoretical convergence analysis via cumulative regret

The authors provide a theoretical analysis demonstrating that DUET converges to the optimal training data mixture by analyzing the algorithm's attained cumulative regret under the Bayesian optimization framework, proving convergence without requiring detailed evaluation task data.

10 retrieved papers
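For context, cumulative-regret arguments of this kind are usually stated over mixture weights on the probability simplex. The following is a generic GP-bandit-style sketch in standard notation, not the paper's exact theorem:

```latex
% Cumulative regret after T feedback rounds, for unknown utility f
% and the (unknown) optimal mixture w^\star:
R_T \;=\; \sum_{t=1}^{T} \bigl( f(w^\star) - f(w_t) \bigr).

% Sublinear regret implies convergence of the best mixture found so far:
\frac{R_T}{T} \to 0
\quad\Longrightarrow\quad
\max_{t \le T} f(w_t) \;\to\; f(w^\star).

% GP-UCB-style analyses typically yield bounds of the form
R_T \;=\; \mathcal{O}\!\bigl( \sqrt{T \, \beta_T \, \gamma_T} \bigr),
% where \gamma_T is the maximal information gain of the surrogate's kernel
% and \beta_T a confidence-width parameter.
```

The claimed contribution is that a bound of this flavor holds even though each observation of $f$ is only a coarse, noisy scalar from the unseen task.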

Novel problem formulation for unseen evaluation tasks

The authors formalize a new problem setting where practitioners lack fine-grained information about evaluation task data but can iteratively gather coarse performance feedback, addressing a gap between traditional domain adaptation and domain generalization approaches.

10 retrieved papers
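One standard way to write down such a setting (our notation, not necessarily the paper's) is as bandit-feedback optimization over the simplex of mixture weights:

```latex
% Mixture weights over k data sources live on the probability simplex:
w \in \Delta^{k-1}
  = \Bigl\{ w \in \mathbb{R}^k \;:\; w_i \ge 0,\;
            \textstyle\sum_{i=1}^{k} w_i = 1 \Bigr\}.

% The learner never sees the evaluation data; at round t it picks w_t,
% trains on the induced mixture, and observes only a noisy scalar score:
y_t = f(w_t) + \varepsilon_t, \qquad \mathbb{E}[\varepsilon_t] = 0.

% Goal: approach the best mixture using this coarse feedback alone:
w^\star \in \arg\max_{w \in \Delta^{k-1}} f(w).
```

This sits between domain adaptation (which assumes access to target-task data) and domain generalization (which assumes no target signal at all): here the only target signal is the sequence of noisy scores $y_t$.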

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
