DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Overview
Overall Novelty Assessment
The paper introduces DataMIL, a framework for selecting and mixing demonstration data to train specialized robot policies. It sits within the 'Data Mixing and Weighting Strategies' leaf of the taxonomy, which contains only one other paper (Data Retrieval Weights). This leaf is notably sparse compared to neighboring branches such as 'Quality Metrics and Estimation' (four papers) or 'Retrieval-Based Few-Shot Learning' (four papers), suggesting that principled data mixing for robot imitation learning remains an underexplored research direction despite the growing availability of large-scale datasets.
The taxonomy reveals that DataMIL occupies a position between two related but distinct research threads. Upstream, the 'Data Quality Assessment and Curation' branch (eight papers across two leaves) focuses on filtering demonstrations before training, using metrics like mutual information or success likelihood. Downstream, the 'Retrieval-Based Few-Shot Learning' leaf (four papers) emphasizes selecting relevant subsets from prior datasets using distance metrics. DataMIL bridges these directions by reasoning about data selection in an end-to-end manner during policy training, rather than relying solely on upfront filtering or retrieval heuristics.
Among thirty candidates examined, none clearly refute any of the three core contributions. The DataMIL framework itself (ten candidates examined, zero refutable) appears novel in its application of the datamodels paradigm to robot imitation learning. The surrogate loss function (ten candidates, zero refutable) addresses a tractability challenge specific to robotic settings, where rollout costs make standard datamodels approaches prohibitive. The adaptations of datamodels for robotic contexts (ten candidates, zero refutable) also show no substantial prior overlap within the limited search scope, though the analysis acknowledges this reflects top-K semantic matches rather than exhaustive coverage.
Based on the limited literature search, the work appears to introduce a relatively fresh perspective on data mixing for robot learning. The sparse taxonomy leaf and absence of refutable candidates suggest novelty, though the search scope (thirty candidates) leaves open the possibility of relevant prior work in adjacent machine learning communities. The framework's distinctiveness lies in its end-to-end optimization approach, which contrasts with the filtering-then-training or retrieval-then-training paradigms prevalent in neighboring taxonomy branches.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose DataMIL, a framework that extends the datamodels paradigm to robotics by directly optimizing data selection for task success rather than using human-defined heuristics like semantic or visual similarity. The method uses the policy to evaluate which data points improve performance in an end-to-end manner.
The authors introduce a proxy metric based on validation loss on held-out target demonstrations that replaces expensive real-world rollouts during datamodel estimation. This makes the approach tractable and fully differentiable while maintaining sufficient correlation with true policy performance.
The authors develop several modifications to make datamodels work in robotics, including clustering training examples at different temporal scales to reduce variance, using a proxy metric to avoid rollouts, and incorporating target data during estimation to reduce distribution shift in heterogeneous datasets.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[4] Re-mix: Optimizing data mixtures for large scale imitation learning
Contribution Analysis
Detailed comparisons for each claimed contribution
DataMIL framework for robot imitation learning data selection
The authors propose DataMIL, a framework that extends the datamodels paradigm to robotics by directly optimizing data selection for task success rather than using human-defined heuristics like semantic or visual similarity. The method uses the policy to evaluate which data points improve performance in an end-to-end manner.
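For concreteness, the datamodels paradigm referenced here, fitting a linear model that maps data-subset membership to a downstream score and then selecting the most highly weighted examples, can be sketched as follows. All sizes, the toy `train_and_score` function, and variable names are illustrative assumptions, not the paper's actual pipeline (which trains real policies and uses the surrogate metric discussed later).

```python
import numpy as np

rng = np.random.default_rng(0)

n_train = 50     # candidate training examples (hypothetical size)
n_subsets = 200  # random subsets used to fit the datamodel

# Stand-in for "train a policy on subset S and score it": a toy additive
# ground truth plus noise, so the sketch runs end to end without training.
true_influence = rng.normal(size=n_train)

def train_and_score(mask):
    return float(mask.astype(float) @ true_influence) + rng.normal(scale=0.1)

# 1) Sample inclusion masks and record the outcome for each subset.
masks = rng.random((n_subsets, n_train)) < 0.5
scores = np.array([train_and_score(m) for m in masks])

# 2) Fit the linear datamodel: score(S) ~ theta @ mask(S).
theta, *_ = np.linalg.lstsq(masks.astype(float), scores, rcond=None)

# 3) Select the examples the datamodel credits most.
selected = np.argsort(theta)[::-1][:10]
```

In a real instantiation, `train_and_score` is the expensive step, which is exactly why the paper's surrogate loss matters: each evaluation would otherwise require training a policy and rolling it out.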
[6] Interventional data generation for robust and data-efficient robot imitation learning
[41] Learning and Retrieval from Prior Data for Skill-based Imitation Learning
[61] Goal-conditioned imitation learning using score-based diffusion policies
[62] Coherent soft imitation learning
[63] Towards imitation learning to branch for mip: A hybrid reinforcement learning based sample augmentation approach
[64] Map-based deep imitation learning for obstacle avoidance
[65] FAGR: Feature-Action Generative Replay for Robot Lifelong Imitation Learning
[66] Hierarchical Human Demonstration Toward Imitation Learning of Generalist Robot Planner
[67] Behavior imitation for manipulator control and grasping with deep reinforcement learning
[68] Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations
Surrogate loss function for tractable data selection without rollouts
The authors introduce a proxy metric based on validation loss on held-out target demonstrations that replaces expensive real-world rollouts during datamodel estimation. This makes the approach tractable and fully differentiable while maintaining sufficient correlation with true policy performance.
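The proxy-metric idea, replacing real-world rollouts with validation loss on held-out target demonstrations, can be illustrated with a minimal sketch. The linear "expert", the dimensions, and the `proxy_metric` helper are hypothetical stand-ins chosen so the example is self-contained; the paper's policies and losses are, of course, far richer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical held-out target demonstrations: (observation, action) pairs
# generated by a stand-in linear expert.
obs_dim, act_dim, n_val = 4, 2, 32
W_expert = rng.normal(size=(obs_dim, act_dim))
val_obs = rng.normal(size=(n_val, obs_dim))
val_act = val_obs @ W_expert

def proxy_metric(policy_W):
    """Surrogate for rollout success: negative behavior-cloning (MSE) loss
    of the policy's actions on held-out target demonstrations. Cheap and
    differentiable, unlike a physical rollout."""
    pred = val_obs @ policy_W
    return -float(np.mean((pred - val_act) ** 2))

# A policy closer to the expert scores higher under the proxy, which is
# the correlation-with-true-performance property the contribution relies on.
near = W_expert + 0.01 * rng.normal(size=W_expert.shape)
far = W_expert + 1.0 * rng.normal(size=W_expert.shape)
```

The key assumption being tested in the paper is that ranking policies (and hence data subsets) by this proxy agrees well enough with ranking them by true rollout success.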
[51] Predicting with proxies: Transfer learning in high dimension
[52] Choosing a proxy metric from past experiments
[53] Loss function considering dead zone for neural networks
[54] When Can Proxies Improve the Sample Complexity of Preference Learning?
[55] ASP: Automatic Selection of Proxy dataset for efficient AutoML
[56] Sample selecting method based on feature density for pest identification in smart agriculture
[57] Reinforcement neural fuzzy surrogate-assisted multiobjective evolutionary fuzzy systems with robot learning control application
[58] Learn to grasp with less supervision: A data-efficient maximum likelihood grasp sampling loss
[59] Accelerating neural architecture search via proxy data
[60] Approximate selection with guarantees using proxies
Adaptations of datamodels for robotic settings
The authors develop several modifications to make datamodels work in robotics, including clustering training examples at different temporal scales to reduce variance, using a proxy metric to avoid rollouts, and incorporating target data during estimation to reduce distribution shift in heterogeneous datasets.
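One of the described adaptations, clustering timesteps into temporal windows and attributing influence per cluster rather than per step, can be sketched as below. The fixed non-overlapping window scheme and all names are assumptions made for illustration, not the paper's exact clustering method.

```python
import numpy as np

def temporal_clusters(traj_len, window):
    # Assign each timestep of a trajectory to a non-overlapping window of
    # `window` consecutive steps, so the datamodel has one parameter per
    # window instead of one per timestep.
    return np.arange(traj_len) // window

def aggregate_influence(step_influence, labels):
    # One influence estimate per cluster: the mean over its timesteps,
    # which reduces the variance of noisy per-step estimates.
    n_clusters = labels.max() + 1
    return np.array([step_influence[labels == c].mean()
                     for c in range(n_clusters)])

rng = np.random.default_rng(2)
labels = temporal_clusters(traj_len=12, window=4)  # clusters 0, 1, 2
# Noisy per-step influence around cluster-level ground truth [1.0, -0.5, 0.2].
noisy_steps = np.repeat([1.0, -0.5, 0.2], 4) + rng.normal(scale=0.3, size=12)
cluster_scores = aggregate_influence(noisy_steps, labels)
```

Averaging within windows trades temporal resolution for estimation stability, which is the variance-reduction rationale the contribution describes.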