DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

ICLR 2026 Conference Submission · Anonymous Authors
Robot Learning · Data Curation · Imitation Learning
Abstract:

Recently, the robotics community has amassed ever larger and more diverse datasets to train generalist policies. However, while these policies achieve strong mean performance across a variety of tasks, they often underperform on individual, specialized tasks and require further tuning on newly acquired task-specific data. Combining task-specific data with carefully curated subsets of large prior datasets via co-training can produce better specialized policies, but selecting data naively may actually harm downstream performance. To address this, we introduce DataMIL, a data selection framework built on the datamodels paradigm that reasons about data selection in an end-to-end manner, using the policy itself to identify which data points will most improve performance. Unlike standard practices that filter data using human notions of quality (e.g., based on semantic or visual similarity), DataMIL directly optimizes data selection for task success, allowing us to select data that improves the policy while dropping data that degrades it. To avoid performing expensive rollouts in the environment during selection, we introduce a surrogate loss function on task-specific data, allowing us to use DataMIL in the real world without degrading performance. We validate our approach on 60+ simulation and real-world manipulation tasks, notably showing successful data selection from the largest open collection of robot datasets (Open X-Embodiment, OXE), and demonstrating consistent gains in success rates over prior works. Our results underscore the importance of end-to-end, performance-aware data selection for unlocking the potential of large prior datasets in robotics.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces DataMIL, a framework for selecting and mixing demonstration data to train specialized robot policies. It sits within the 'Data Mixing and Weighting Strategies' leaf of the taxonomy, which contains only one other sibling paper (Data Retrieval Weights). This leaf is notably sparse compared to neighboring branches such as 'Quality Metrics and Estimation' (four papers) or 'Retrieval-Based Few-Shot Learning' (four papers), suggesting that principled data mixing for robot imitation learning remains an underexplored research direction despite the growing availability of large-scale datasets.

The taxonomy reveals that DataMIL occupies a position between two related but distinct research threads. Upstream, the 'Data Quality Assessment and Curation' branch (eight papers across two leaves) focuses on filtering demonstrations before training, using metrics like mutual information or success likelihood. Downstream, the 'Retrieval-Based Few-Shot Learning' leaf (four papers) emphasizes selecting relevant subsets from prior datasets using distance metrics. DataMIL bridges these directions by reasoning about data selection in an end-to-end manner during policy training, rather than relying solely on upfront filtering or retrieval heuristics.

Among thirty candidates examined, none clearly refute any of the three core contributions. The DataMIL framework itself (ten candidates examined, zero refutable) appears novel in its application of the datamodels paradigm to robot imitation learning. The surrogate loss function (ten candidates, zero refutable) addresses a tractability challenge specific to robotic settings, where rollout costs make standard datamodels approaches prohibitive. The adaptations of datamodels for robotic contexts (ten candidates, zero refutable) also show no substantial prior overlap within the limited search scope, though the analysis acknowledges this reflects top-K semantic matches rather than exhaustive coverage.

Based on the limited literature search, the work appears to introduce a relatively fresh perspective on data mixing for robot learning. The sparse taxonomy leaf and absence of refutable candidates suggest novelty, though the search scope (thirty candidates) leaves open the possibility of relevant prior work in adjacent machine learning communities. The framework's distinctiveness lies in its end-to-end optimization approach, which contrasts with the filtering-then-training or retrieval-then-training paradigms prevalent in neighboring taxonomy branches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: data selection for robot imitation learning. The field addresses how to choose, curate, and combine demonstration data so that robots can learn manipulation skills efficiently and robustly. The taxonomy organizes research into several major branches.

Data Quality Assessment and Curation focuses on filtering or ranking demonstrations by metrics such as success likelihood or mutual information (e.g., Data Quality Imitation[1], Mutual Information Curation[8]). Data Retrieval and Augmentation from Prior Datasets explores how to mine and mix existing corpora, including strategies for weighting or blending heterogeneous sources (e.g., Re-mix[4], Data Retrieval Weights[2]). Active Data Collection and Human Interaction examines methods that query human teachers or adaptively gather new demonstrations (e.g., Learning Human Teachers[3], Batch Active Preference[18]). Data Scaling and Efficiency Analysis investigates how performance changes with dataset size and composition (e.g., Data Scaling Laws[5]). Representation and Feature Selection for Learning considers which state or action features matter most for generalization. Domain Adaptation and Transfer Learning tackles distribution shifts across tasks or embodiments. Specialized Learning Paradigms and Applications covers niche settings such as dexterous grasping or surgical robotics, while Foundational Methods and Frameworks provides core algorithmic building blocks.

A particularly active line of work centers on mixing and weighting strategies within the Data Retrieval and Augmentation branch, where researchers debate how to balance diverse demonstration sources, some high-quality, some suboptimal, to maximize policy performance. DataMIL[0] sits squarely in this cluster, proposing a principled approach to data mixing that accounts for varying demonstration quality and task relevance. It contrasts with Re-mix[4], which emphasizes replay-buffer blending for continual learning, and with Data Retrieval Weights[2], which focuses on retrieval-based weighting from large prior datasets.

Meanwhile, works in Data Quality Assessment (e.g., Data Quality Imitation[1], Mutual Information Curation[8]) offer complementary perspectives by first filtering demonstrations before any mixing occurs. Across these branches, a central open question remains: whether to curate aggressively upfront or to rely on adaptive weighting during training, and how scaling laws (Data Scaling Laws[5]) inform these trade-offs in practice.

Claimed Contributions

DataMIL framework for robot imitation learning data selection

The authors propose DataMIL, a framework that extends the datamodels paradigm to robotics by directly optimizing data selection for task success rather than using human-defined heuristics like semantic or visual similarity. The method uses the policy to evaluate which data points improve performance in an end-to-end manner.
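The datamodels paradigm underlying this contribution fits a simple predictive model from "which training points were included in a subset" to "how well the resulting policy performs", then keeps points with positive estimated influence. A minimal NumPy sketch, with the expensive train-and-evaluate step replaced by a simulated performance function (all names, sizes, and the hidden per-point effects are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_subsets = 200, 500

# Each row is a random inclusion mask over the prior dataset.
masks = rng.random((n_subsets, n_points)) < 0.5

# Placeholder for the expensive step: train a policy on each subset
# and measure task performance. Here we simulate it with hidden
# per-point effects (some data helps, some hurts) plus noise.
true_effect = rng.normal(0.0, 1.0, n_points)
perf = masks @ true_effect + rng.normal(0.0, 0.1, n_subsets)

# Fit a linear datamodel: estimated per-point influence on performance.
theta, *_ = np.linalg.lstsq(masks.astype(float), perf, rcond=None)

# Select points predicted to help, drop points predicted to hurt.
selected = np.where(theta > 0)[0]
```

In the actual framework, `perf` would come from training the policy on each subset and scoring it (via rollouts or the surrogate loss); the linear fit then attributes performance to individual data points.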

10 retrieved papers

Surrogate loss function for tractable data selection without rollouts

The authors introduce a proxy metric based on validation loss on held-out target demonstrations that replaces expensive real-world rollouts during datamodel estimation. This makes the approach tractable and fully differentiable while maintaining sufficient correlation with true policy performance.
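The proxy can be illustrated with a toy behavior-cloning setup: instead of rolling out the policy, score a candidate subset by the validation error of a policy fit on that subset, measured on held-out target demonstrations. A hedged sketch on fully synthetic data (`surrogate_score`, the linear policy, and all dimensions are assumptions for illustration, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: state-action pairs from prior data, plus a small
# set of target-task demonstrations held out for validation.
W_true = rng.normal(size=(4, 2))                 # unknown expert mapping
prior_states = rng.normal(size=(300, 4))
prior_actions = prior_states @ W_true + rng.normal(0.0, 0.5, (300, 2))
target_states = rng.normal(size=(20, 4))
target_actions = target_states @ W_true          # clean target demos

def surrogate_score(subset_idx):
    """Proxy for rollout success: negative validation loss of a policy
    fit on the chosen subset, evaluated on held-out target demos."""
    X, Y = prior_states[subset_idx], prior_actions[subset_idx]
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)    # behavior-cloned policy
    val_mse = np.mean((target_states @ W - target_actions) ** 2)
    return -val_mse                              # higher is better

big = surrogate_score(np.arange(300))    # fit on all prior data
small = surrogate_score(np.arange(10))   # fit on a tiny noisy subset
```

The score is cheap and differentiable in the sense that no environment interaction is needed; larger, more informative subsets should yield a better (less negative) score, mirroring the claimed correlation with true policy performance.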

10 retrieved papers

Adaptations of datamodels for robotic settings

The authors develop several modifications to make datamodels work in robotics, including clustering training examples at different temporal scales to reduce variance, using a proxy metric to avoid rollouts, and incorporating target data during estimation to reduce distribution shift in heterogeneous datasets.
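The temporal-clustering adaptation can be sketched as aggregating per-transition influence estimates over contiguous chunks of a trajectory, so that noisy per-step estimates are averaged before any keep/drop decision. A minimal illustration (the chunk length and influence values are synthetic, not taken from the paper):

```python
import numpy as np

# Hypothetical trajectory of 100 timesteps with a per-transition
# influence estimate for each step.
n_steps, chunk_len = 100, 10
per_step_influence = np.random.default_rng(2).normal(size=n_steps)

# Coarser temporal scale: average influence over contiguous chunks,
# so selection decisions are made per-chunk instead of per-transition,
# which reduces the variance of the underlying estimates.
chunks = per_step_influence.reshape(-1, chunk_len)
chunk_influence = chunks.mean(axis=1)

# Keep every timestep belonging to a chunk with positive influence.
kept_mask = np.repeat(chunk_influence > 0, chunk_len)
kept_steps = np.where(kept_mask)[0]
```

Selecting at the chunk (or trajectory) level also matches how robot data is consumed: policies are trained on temporally coherent segments, not isolated transitions.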

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

DataMIL framework for robot imitation learning data selection

The authors propose DataMIL, a framework that extends the datamodels paradigm to robotics by directly optimizing data selection for task success rather than using human-defined heuristics like semantic or visual similarity. The method uses the policy to evaluate which data points improve performance in an end-to-end manner.

Contribution

Surrogate loss function for tractable data selection without rollouts

The authors introduce a proxy metric based on validation loss on held-out target demonstrations that replaces expensive real-world rollouts during datamodel estimation. This makes the approach tractable and fully differentiable while maintaining sufficient correlation with true policy performance.

Contribution

Adaptations of datamodels for robotic settings

The authors develop several modifications to make datamodels work in robotics, including clustering training examples at different temporal scales to reduce variance, using a proxy metric to avoid rollouts, and incorporating target data during estimation to reduce distribution shift in heterogeneous datasets.