Real-Time Robot Execution with Masked Action Chunking

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Robot Manipulation, Real-time Execution
Abstract:

Real-time execution is essential for cyber-physical systems such as robots. These systems operate in dynamic real-world environments where even small delays can undermine responsiveness and compromise performance. Asynchronous inference has recently emerged as a system-level paradigm for real-time robot manipulation, enabling the next action chunk to be predicted while the current one is being executed. While this approach achieves real-time responsiveness, naive integration often results in execution failure. Previous methods attributed this failure to inter-chunk discontinuity and developed test-time algorithms to smooth chunk boundaries. In contrast, we identify another critical yet overlooked factor: intra-chunk inconsistency, where the robot’s executed action chunk partially misaligns with its current perception. To address this, we propose REMAC, which learns corrective adjustments on the pretrained policy through masked action chunking, enabling the policy to remain resilient under mismatches between intended actions and actual execution during asynchronous inference. In addition, we introduce a prefix-preserved sampling procedure to reinforce inter-chunk continuity. Overall, our method delivers more reliable policies without incurring additional latency. Extensive experiments in both simulation and real-world settings demonstrate that our method enables faster task execution, maintains robustness across varying delays, and consistently achieves higher completion rates.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 33
Claimed Contributions: 3
Contribution Candidate Papers Compared: 12
Refutable Papers: 2

Research Landscape Overview

Core task: real-time robot manipulation under asynchronous inference. The field addresses the challenge of executing manipulation policies when perception and decision-making modules run at different rates or with variable latency. The taxonomy organizes work into several main branches: asynchronous inference architectures for vision-language-action models, which handle the mismatch between slow neural network inference and fast control loops; deep reinforcement learning training methods that account for asynchronous data collection; real-time motion planning and trajectory optimization that must react quickly despite delayed observations; system infrastructure and middleware designed to coordinate asynchronous components; perception and inference optimization techniques that reduce latency; and human-in-the-loop refinement approaches that improve policies post-deployment. Representative efforts include asynchronous off-policy RL methods (Asynchronous Off-Policy[2]) for training, middleware solutions (xbot2 Middleware[12], ROS2 Timed Rebeca[10]) for coordination, and model predictive control adaptations (ASAP-MPC[11], Incremental MPC Time-Delay[25]) for planning under delay.

A particularly active line of work focuses on future-state-aware and chunk correction approaches within the asynchronous inference architectures branch, where methods predict or compensate for the time lag between observation and action execution. Masked Action Chunking[0] sits squarely in this cluster, addressing how to generate and refine sequences of actions when inference cannot keep pace with control frequency. It shares thematic concerns with Real-Time Correction VLA[7], which also emphasizes correcting action sequences on the fly, and with approaches like Action Chunking Flow[3] that structure action generation to respect temporal dependencies. Nearby work such as Observe Then Act[5] and VLASH[1] similarly grapples with the trade-off between waiting for fresh observations and acting on potentially stale information. The central tension across these methods is balancing reactivity (how quickly the system can respond to new sensory input) against the computational cost of frequent re-inference, with different solutions offering varying degrees of look-ahead prediction, action buffering, and online correction.
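
The tension between reactivity and the cost of re-inference can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the `AsyncSchedule` fields, the timing model (one inference call in flight at a time, and the first `delay` actions of each chunk dropped because their time slots have already passed), and the example numbers are assumptions, not details taken from the submission or the cited papers.

```python
from dataclasses import dataclass


@dataclass
class AsyncSchedule:
    """Hypothetical timing model for asynchronous chunked execution."""
    chunk_len: int     # H: actions predicted per inference call
    exec_horizon: int  # s: control steps between two inference starts
    delay: int         # d: control steps one inference call takes


def executed_ages(sched):
    """Age, in control steps, of the observation behind each executed action.

    Action i of a chunk is meant for the step i ticks after the observation
    was taken, so the executed slice d .. d+s-1 has exactly those ages.
    """
    d, s, h = sched.delay, sched.exec_horizon, sched.chunk_len
    if s < 1 or not (0 <= d <= s) or d + s > h:
        raise ValueError("need exec_horizon >= 1, 0 <= delay <= exec_horizon, "
                         "and delay + exec_horizon <= chunk_len")
    return list(range(d, d + s))


def summarize(sched):
    """Compute cost (inference calls per step) and staleness (observation age)."""
    ages = executed_ages(sched)
    return {
        "inference_calls_per_step": 1 / sched.exec_horizon,
        "mean_age": sum(ages) / len(ages),
        "max_age": ages[-1],
    }


if __name__ == "__main__":
    for s in (10, 25):
        print(s, summarize(AsyncSchedule(chunk_len=50, exec_horizon=s, delay=4)))
```

Under these assumptions, re-inferring every 10 steps costs 0.1 calls per control step with a mean observation age of 8.5 steps, while re-inferring every 25 steps costs 0.04 calls per step but raises the mean age to 16 steps.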

Claimed Contributions

REMAC: Real-time Execution with Masked Action Chunking

The authors introduce REMAC, a training-time method that adapts pretrained vision-language-action policies for asynchronous inference by learning corrective adjustments through masked action chunking. This approach addresses intra-chunk inconsistency by masking arbitrary portions of action chunks during training, enabling the policy to handle misalignments between observations and executed actions without introducing additional inference latency.

5 retrieved papers

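
As a rough illustration of what "masking arbitrary portions of action chunks during training" could look like, the Python sketch below samples a random mask over chunk positions and computes a loss only on the unmasked ones. This is not the authors' objective: the interpretation of the mask (masked positions are treated as already decided and excluded from the loss), the Bernoulli sampling, and the names `sample_mask` and `masked_mse` are all assumptions made for the example.

```python
import random


def sample_mask(chunk_len, mask_prob, rng):
    """Return a list of booleans; True marks a masked chunk position.

    Assumption: masked positions stand for actions that are already fixed
    (for example, already executed) and therefore carry no training loss.
    """
    mask = [rng.random() < mask_prob for _ in range(chunk_len)]
    if all(mask):
        # Keep at least one position so the loss is well defined.
        mask[rng.randrange(chunk_len)] = False
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error over unmasked positions.

    pred and target are lists of actions; each action is a list of floats.
    """
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            continue  # masked position: excluded from the loss
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
        count += len(p)
    if count == 0:
        raise ValueError("all positions are masked")
    return total / count


if __name__ == "__main__":
    rng = random.Random(0)
    mask = sample_mask(chunk_len=8, mask_prob=0.3, rng=rng)
    pred = [[0.1 * i] for i in range(8)]
    target = [[0.1 * i + 0.05] for i in range(8)]
    print(mask, masked_mse(pred, target, mask))
```

Because the mask is resampled for every training example, a policy trained this way would see many different splits between fixed and free positions, which is one plausible route to robustness across varying delays.
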
Identification of intra-chunk inconsistency as a critical failure mode

The authors identify and formalize intra-chunk inconsistency as a previously overlooked challenge in asynchronous inference with action chunking. This occurs when executed actions from a previous chunk are conditioned on outdated observations, creating a perception-action mismatch within a single chunk that degrades policy performance.

4 retrieved papers
Can Refute

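
One way to make this mismatch tangible is to compare what a newly predicted chunk planned for the inference-delay window with what the robot actually executed from the previous chunk during that window. The Python sketch below computes that per-step gap. It is an illustrative proxy under stated assumptions, not the paper's formalization; the function name `prefix_mismatch` and the choice of Euclidean distance are hypothetical.

```python
import math


def prefix_mismatch(new_chunk, executed_during_delay):
    """Per-step L2 distance between planned and actually executed actions.

    new_chunk: actions predicted from the observation taken when inference
        began; each action is a list of floats.
    executed_during_delay: actions from the previous chunk that ran while
        inference was in flight (its length is the delay in control steps).
    """
    d = len(executed_during_delay)
    if d > len(new_chunk):
        raise ValueError("delay exceeds chunk length")
    return [math.dist(new_chunk[i], executed_during_delay[i]) for i in range(d)]


if __name__ == "__main__":
    new_chunk = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    executed = [[0.0, 0.0], [1.0, 0.5]]  # two steps elapsed during inference
    print(prefix_mismatch(new_chunk, executed))
```

A non-zero gap means the remaining actions of the new chunk assume a state the robot never reached, which is the kind of perception-action misalignment described above.
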
Prefix-preserved sampling procedure for inter-chunk continuity

The authors propose a prefix-preserved sampling procedure that initializes action generation using previously executed actions as priors and preserves the overlapping segment between consecutive chunks during sampling. This method enhances inter-chunk continuity by maintaining coherence across chunk boundaries during asynchronous execution.

3 retrieved papers
Can Refute
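
The description above resembles clamping a known prefix during iterative sampling. The Python sketch below shows that general idea, not the authors' procedure: `refine_step` is a hypothetical stand-in for one denoising or flow step of the policy, actions are scalars for brevity, and the initialization (shifting the previous chunk and perturbing only the tail) is an assumption.

```python
import random


def prefix_preserved_sample(prev_chunk, executed, overlap, refine_step,
                            num_steps, rng, noise=0.1):
    """Sample the next chunk while keeping its first `overlap` actions fixed.

    prev_chunk: previous chunk as a list of scalar actions.
    executed: number of prev_chunk actions that have already run.
    overlap: number of upcoming prev_chunk actions the new chunk must keep.
    refine_step: hypothetical callable (chunk, step_index) -> chunk.
    """
    h = len(prev_chunk)
    if executed < 0 or overlap < 0 or executed + overlap > h:
        raise ValueError("need executed + overlap <= len(prev_chunk)")
    prefix = prev_chunk[executed:executed + overlap]
    # Prior: shift the previous chunk forward and pad with its last action.
    shifted = prev_chunk[executed:] + [prev_chunk[-1]] * executed
    # Perturb only the tail; the preserved prefix starts out exact.
    chunk = [a if i < overlap else a + rng.gauss(0.0, noise)
             for i, a in enumerate(shifted)]
    for k in range(num_steps):
        chunk = list(refine_step(chunk, k))
        chunk[:overlap] = prefix  # re-impose the overlapping segment
    return chunk


if __name__ == "__main__":
    prev = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    smooth = lambda c, k: [0.5 * x for x in c]  # toy stand-in for the policy
    print(prefix_preserved_sample(prev, executed=2, overlap=2,
                                  refine_step=smooth, num_steps=3,
                                  rng=random.Random(0)))
```

Re-imposing the prefix after every refinement step guarantees that the new chunk agrees exactly with the previous one on the overlapping segment, whatever the refinement does to the tail.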
