Real-Time Robot Execution with Masked Action Chunking
Research Landscape Overview
Claimed Contributions
The authors introduce REMAC, a training-time method that adapts pretrained vision-language-action policies for asynchronous inference by learning corrective adjustments through masked action chunking. This approach addresses intra-chunk inconsistency by masking arbitrary portions of action chunks during training, enabling the policy to handle misalignments between observations and executed actions without introducing additional inference latency.
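To make the idea concrete, the masking scheme described above can be sketched as a training objective: some steps of a ground-truth action chunk are treated as already-executed context, and the loss is computed only on the steps the policy must fill in. This is a minimal illustration assuming a Bernoulli mask and an MSE regression loss; the function names and masking distribution are assumptions, not the paper's exact recipe.

```python
import numpy as np

def masked_chunk_loss(pred_chunk, target_chunk, rng, mask_prob=0.5):
    """Illustrative masked action-chunking objective (assumed, not the
    paper's exact formulation).

    A random subset of chunk steps is marked as "given" context (e.g.
    actions already executed under an older observation); the regression
    loss is computed only on the remaining steps, so the policy learns
    to complete a chunk consistently with an arbitrary executed portion.
    """
    horizon = target_chunk.shape[0]
    given = rng.random(horizon) < mask_prob   # True = provided as context
    predict = ~given                          # steps the policy must fill in
    if not predict.any():                     # always leave something to predict
        predict[rng.integers(horizon)] = True
    per_step = ((pred_chunk - target_chunk) ** 2).mean(axis=-1)
    return float((per_step * predict).sum() / predict.sum())
```

Because the mask is resampled every training step, the policy sees all partition points of the chunk over the course of training, which is what lets it correct arbitrary executed prefixes at inference time without any extra latency.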
The authors identify and formalize intra-chunk inconsistency as a previously overlooked challenge in asynchronous inference with action chunking. This occurs when executed actions from a previous chunk are conditioned on outdated observations, creating a perception-action mismatch within a single chunk that degrades policy performance.
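The timing mismatch can be illustrated with a toy model of the asynchronous control loop: the policy captures an observation, inference takes some number of control steps, and every action executed in that window comes from the previous chunk and is invisible to the new chunk's conditioning. The function below is purely illustrative and not taken from the paper.

```python
def stale_action_steps(obs_step, exec_start_step):
    """Toy timing model (illustrative, not from the paper): an
    observation is captured at control step `obs_step`, and after
    inference latency the new chunk begins executing at
    `exec_start_step`. The returned steps are actions from the previous
    chunk executed after the observation was taken -- exactly the
    perception-action mismatch described above.
    """
    return list(range(obs_step, exec_start_step))
```

For example, with an observation at step 10 and a three-step inference latency, `stale_action_steps(10, 13)` returns `[10, 11, 12]`: three executed actions the new chunk was never conditioned on.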
The authors propose a prefix-preserved sampling procedure that initializes action generation using previously executed actions as priors and preserves the overlapping segment between consecutive chunks during sampling. This method enhances inter-chunk continuity by maintaining coherence across chunk boundaries during asynchronous execution.
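A minimal sketch of the sampling procedure, assuming an iterative sampler (e.g. diffusion or flow matching) over the whole chunk: the overlapping segment with the previous chunk both initializes the sample and is re-imposed after every refinement step. Here `refine` is a hypothetical callable standing in for one sampler iteration; all names are illustrative.

```python
import numpy as np

def prefix_preserved_sample(refine, prev_overlap, horizon, act_dim, n_iters, rng):
    """Sketch of prefix-preserved sampling (assumed iterative sampler).

    The overlap with the previous chunk serves as a prior for
    initialization and is clamped back after each iteration, forcing the
    newly generated suffix to stay continuous with actions already
    committed for execution.
    """
    k = prev_overlap.shape[0]
    x = rng.standard_normal((horizon, act_dim))
    x[:k] = prev_overlap               # executed actions as a prior
    for _ in range(n_iters):
        x = refine(x)                  # one denoising/refinement step
        x[:k] = prev_overlap           # preserve the overlapping segment
    return x
```

Clamping inside the sampling loop, rather than only overwriting the prefix afterwards, lets each refinement step of the suffix condition on the true committed actions, which is what maintains coherence across the chunk boundary.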
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
[7] Leave No Observation Behind: Real-Time Correction for VLA Action Chunks
Contribution Analysis
Detailed comparisons for each claimed contribution
REMAC: Real-time Execution with Masked Action Chunking
The authors introduce REMAC, a training-time method that adapts pretrained vision-language-action policies for asynchronous inference by learning corrective adjustments through masked action chunking. This approach addresses intra-chunk inconsistency by masking arbitrary portions of action chunks during training, enabling the policy to handle misalignments between observations and executed actions without introducing additional inference latency.
[7] Leave No Observation Behind: Real-Time Correction for VLA Action Chunks
[34] : a VLA That Learns From Experience
[35] AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models
[36] A Survey on Reinforcement Learning of Vision-Language-Action Models for Robotic Manipulation
[37] Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Identification of intra-chunk inconsistency as a critical failure mode
The authors identify and formalize intra-chunk inconsistency as a previously overlooked challenge in asynchronous inference with action chunking. This occurs when executed actions from a previous chunk are conditioned on outdated observations, creating a perception-action mismatch within a single chunk that degrades policy performance.
[1] VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
[38] SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
[39] ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning
[40] Mobile Robot Programming Using Natural Language
Prefix-preserved sampling procedure for inter-chunk continuity
The authors propose a prefix-preserved sampling procedure that initializes action generation using previously executed actions as priors and preserves the overlapping segment between consecutive chunks during sampling. This method enhances inter-chunk continuity by maintaining coherence across chunk boundaries during asynchronous execution.