DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

ICLR 2026 Conference Submission · Anonymous Authors
Robot Learning · Data Curation · Imitation Learning
Abstract:

Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that improves the policy while dropping data that degrades it. To avoid performing expensive rollouts in the environment during selection, we introduce a surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on 60+ simulation and real-world manipulation tasks, notably showing successful data selection from the largest open collection of robot datasets (Open X-Embodiment, OXE), and demonstrating consistent gains in success rates over prior works. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DataMIL, a framework for selecting and mixing demonstration data to train specialized robot policies. It sits within the 'Data Mixing and Weighting Strategies' leaf of the taxonomy, which contains only one other sibling paper (Data Retrieval Weights). This leaf is notably sparse compared to neighboring branches such as 'Quality Metrics and Estimation' (four papers) or 'Retrieval-Based Few-Shot Learning' (four papers), suggesting that principled data mixing for robot imitation learning remains an underexplored research direction despite the growing availability of large-scale datasets.

The taxonomy reveals that DataMIL occupies a position between two related but distinct research threads. Upstream, the 'Data Quality Assessment and Curation' branch (eight papers across two leaves) focuses on filtering demonstrations before training, using metrics like mutual information or success likelihood. Downstream, the 'Retrieval-Based Few-Shot Learning' leaf (four papers) emphasizes selecting relevant subsets from prior datasets using distance metrics. DataMIL bridges these directions by reasoning about data selection in an end-to-end manner during policy training, rather than relying solely on upfront filtering or retrieval heuristics.

Among thirty candidates examined, none clearly refute any of the three core contributions. The DataMIL framework itself (ten candidates examined, zero refutable) appears novel in its application of the datamodels paradigm to robot imitation learning. The surrogate loss function (ten candidates, zero refutable) addresses a tractability challenge specific to robotic settings, where rollout costs make standard datamodels approaches prohibitive. The adaptations of datamodels for robotic contexts (ten candidates, zero refutable) also show no substantial prior overlap within the limited search scope, though the analysis acknowledges this reflects top-K semantic matches rather than exhaustive coverage.

Based on the limited literature search, the work appears to introduce a relatively fresh perspective on data mixing for robot learning. The sparse taxonomy leaf and absence of refutable candidates suggest novelty, though the search scope (thirty candidates) leaves open the possibility of relevant prior work in adjacent machine learning communities. The framework's distinctiveness lies in its end-to-end optimization approach, which contrasts with the filtering-then-training or retrieval-then-training paradigms prevalent in neighboring taxonomy branches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: data selection for robot imitation learning. The field addresses how to choose, curate, and combine demonstration data so that robots can learn manipulation skills efficiently and robustly. The taxonomy organizes research into several major branches.

Data Quality Assessment and Curation focuses on filtering or ranking demonstrations by metrics such as success likelihood or mutual information (e.g., Data Quality Imitation[1], Mutual Information Curation[8]). Data Retrieval and Augmentation from Prior Datasets explores how to mine and mix existing corpora, including strategies for weighting or blending heterogeneous sources (e.g., Re-mix[4], Data Retrieval Weights[2]). Active Data Collection and Human Interaction examines methods that query human teachers or adaptively gather new demonstrations (e.g., Learning Human Teachers[3], Batch Active Preference[18]). Data Scaling and Efficiency Analysis investigates how performance changes with dataset size and composition (e.g., Data Scaling Laws[5]). Representation and Feature Selection for Learning considers which state or action features matter most for generalization. Domain Adaptation and Transfer Learning tackles distribution shifts across tasks or embodiments. Specialized Learning Paradigms and Applications covers niche settings such as dexterous grasping or surgical robotics, while Foundational Methods and Frameworks provides core algorithmic building blocks.

A particularly active line of work centers on mixing and weighting strategies within the Data Retrieval and Augmentation branch, where researchers debate how to balance diverse demonstration sources, some high-quality, some suboptimal, to maximize policy performance. DataMIL[0] sits squarely in this cluster, proposing a principled approach to data mixing that accounts for varying demonstration quality and task relevance. It contrasts with Re-mix[4], which emphasizes replay-buffer blending for continual learning, and with Data Retrieval Weights[2], which focuses on retrieval-based weighting from large prior datasets.

Meanwhile, works in Data Quality Assessment (e.g., Data Quality Imitation[1], Mutual Information Curation[8]) offer complementary perspectives by first filtering demonstrations before any mixing occurs. Across these branches, a central open question remains: whether to curate aggressively upfront or to rely on adaptive weighting during training, and how scaling laws (Data Scaling Laws[5]) inform these trade-offs in practice.

Claimed Contributions

DataMIL framework for robot imitation learning data selection

The authors propose DataMIL, a framework that extends the datamodels paradigm to robotics by directly optimizing data selection for task success rather than using human-defined heuristics like semantic or visual similarity. The method uses the policy to evaluate which data points improve performance in an end-to-end manner.
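The datamodels paradigm underlying this contribution fits a simple predictive model from "which training points were included in a subset" to "how well the resulting policy performs", then keeps points with positive estimated influence. A minimal NumPy sketch, with the expensive train-and-evaluate step replaced by a simulated performance function (all names, sizes, and the hidden per-point effects are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_subsets = 200, 500

# Each row is a random inclusion mask over the prior dataset.
masks = rng.random((n_subsets, n_points)) < 0.5

# Placeholder for the expensive step: train a policy on each subset
# and measure task performance. Here we simulate it with hidden
# per-point effects (some data helps, some hurts) plus noise.
true_effect = rng.normal(0.0, 1.0, n_points)
perf = masks @ true_effect + rng.normal(0.0, 0.1, n_subsets)

# Fit a linear datamodel: estimated per-point influence on performance.
theta, *_ = np.linalg.lstsq(masks.astype(float), perf, rcond=None)

# Select points predicted to help, drop points predicted to hurt.
selected = np.where(theta > 0)[0]
```

In the actual framework, `perf` would come from training the policy on each subset and scoring it (via rollouts or the surrogate loss); the linear fit then attributes performance to individual data points.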

10 retrieved papers

Surrogate loss function for tractable data selection without rollouts

The authors introduce a proxy metric based on validation loss on held-out target demonstrations that replaces expensive real-world rollouts during datamodel estimation. This makes the approach tractable and fully differentiable while maintaining sufficient correlation with true policy performance.
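The proxy can be illustrated with a toy behavior-cloning setup: instead of rolling out the policy, score a candidate subset by the validation error of a policy fit on that subset, measured on held-out target demonstrations. A hedged sketch on fully synthetic data (`surrogate_score`, the linear policy, and all dimensions are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: state-action pairs from prior data, plus a small
# set of target-task demonstrations held out for validation.
W_true = rng.normal(size=(4, 2))                 # unknown expert mapping
prior_states = rng.normal(size=(300, 4))
prior_actions = prior_states @ W_true + rng.normal(0.0, 0.5, (300, 2))
target_states = rng.normal(size=(20, 4))
target_actions = target_states @ W_true          # clean target demos

def surrogate_score(subset_idx):
    """Proxy for rollout success: negative validation loss of a policy
    fit on the chosen subset, evaluated on held-out target demos."""
    X, Y = prior_states[subset_idx], prior_actions[subset_idx]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)    # behavior-cloned policy
    val_mse = np.mean((target_states @ W - target_actions) ** 2)
    return -val_mse                              # higher is better

big = surrogate_score(np.arange(300))    # fit on all prior data
small = surrogate_score(np.arange(10))   # fit on a tiny noisy subset
```

The score is cheap and differentiable in the sense that no environment interaction is needed; larger, more informative subsets should yield a better (less negative) score, mirroring the claimed correlation with true policy performance.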

10 retrieved papers

Adaptations of datamodels for robotic settings

The authors develop several modifications to make datamodels work in robotics, including clustering training examples at different temporal scales to reduce variance, using a proxy metric to avoid rollouts, and incorporating target data during estimation to reduce distribution shift in heterogeneous datasets.
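The temporal-clustering adaptation can be sketched as aggregating per-transition influence estimates over contiguous chunks of a trajectory, so that noisy per-step estimates are averaged before any keep/drop decision. A minimal illustration (the chunk length and influence values are synthetic, not taken from the paper):

```python
import numpy as np

# Hypothetical trajectory of 100 timesteps with a per-transition
# influence estimate for each step.
n_steps, chunk_len = 100, 10
per_step_influence = np.random.default_rng(2).normal(size=n_steps)

# Coarser temporal scale: average influence over contiguous chunks,
# so selection decisions are made per-chunk instead of per-transition,
# which reduces the variance of the underlying estimates.
chunks = per_step_influence.reshape(-1, chunk_len)
chunk_influence = chunks.mean(axis=1)

# Keep every timestep belonging to a chunk with positive influence.
kept_mask = np.repeat(chunk_influence > 0, chunk_len)
kept_steps = np.where(kept_mask)[0]
```

Selecting at the chunk (or trajectory) level also matches how robot data is consumed: policies are trained on temporally coherent segments, not isolated transitions.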

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

DataMIL framework for robot imitation learning data selection

The authors propose DataMIL, a framework that extends the datamodels paradigm to robotics by directly optimizing data selection for task success rather than using human-defined heuristics like semantic or visual similarity. The method uses the policy to evaluate which data points improve performance in an end-to-end manner.

Contribution

Surrogate loss function for tractable data selection without rollouts

The authors introduce a proxy metric based on validation loss on held-out target demonstrations that replaces expensive real-world rollouts during datamodel estimation. This makes the approach tractable and fully differentiable while maintaining sufficient correlation with true policy performance.

Contribution

Adaptations of datamodels for robotic settings

The authors develop several modifications to make datamodels work in robotics, including clustering training examples at different temporal scales to reduce variance, using a proxy metric to avoid rollouts, and incorporating target data during estimation to reduce distribution shift in heterogeneous datasets.