ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Vision-language-action, embodied agent, large language models, Long-Horizon Planning
Abstract:

Vision–Language–Action (VLA) agents follow instructions to perform multi-step tasks in multimodal environments. To support planning and execution in such settings, many approaches adopt structured post-hoc or rely on fixed decomposition and rigid alignment to improve success rate. However, once an intermediate subgoal or action is mis-specified and no flexible correction mechanism is available, local errors propagate through subsequent steps and eventually accumulate into cascading failures in long-horizon reasoning. To mitigate this compounding effect, we propose Reflective Contrastive Alignment and Planning Architecture (ReCAPA), a framework that uses predictive correction to anticipate deviations and adjust representations across three levels: actions, subgoals, and trajectories. Semantic alignment is enforced at all levels using a Sinkhorn-based module and a Score-field module. The corrective signals, derived from the predictive correction and alignment mechanisms, jointly update the execution network during training, enabling it to flexibly adjust fine-grained steps so that they remain aligned with the overall intent. We further introduce two new metrics to quantify error propagation and recovery processes in tasks. Experiments show that ReCAPA achieves competitive results on embodied agent benchmarks such as VisualAgentBench, MineDojo, and MAP-THOR, outperforming strong proprietary and open-source Large Language Model (LLM) baselines.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ReCAPA, a framework for vision-language-action agents that uses hierarchical predictive correction to mitigate cascading failures in multi-step task execution. It resides in the 'Hierarchical Planning and Error Correction in VLA Agents' leaf, which contains only two papers including this one. This represents a relatively sparse research direction within the broader taxonomy of 50 papers across 33 leaf nodes, suggesting the specific focus on predictive correction mechanisms for error propagation in embodied agents is not yet densely populated.

The taxonomy reveals that ReCAPA sits within the 'Vision-Language-Action Agent Architectures and Planning' branch, which also includes work on embodied agent environments and benchmarks. Neighboring branches address language model reasoning capabilities and multi-objective optimization, but these lack the embodied grounding and hierarchical action correction focus. The sibling paper Video-of-Thought emphasizes visual reasoning traces for action guidance, whereas ReCAPA appears to differentiate itself through predictive correction mechanisms that operate across action, subgoal, and trajectory levels to prevent error accumulation before failures cascade.

Among 20 candidates examined across three contributions, none were found to clearly refute the proposed work. For the ReCAPA framework, 6 candidates were examined; for the error propagation metrics (EPR and PAC), 10; and for the Hierarchical Predictive Contrastive Correction module, 4. None showed refutable overlap. This suggests that, within the limited search scope, the specific combination of hierarchical predictive correction, multi-level alignment mechanisms, and error propagation quantification appears relatively novel, though a search scale of only 20 papers means substantial prior work may exist beyond these top semantic matches.

Based on the limited literature search of 20 candidates, the work appears to occupy a distinct position combining hierarchical planning with predictive correction for cascading failure mitigation. However, the sparse population of its taxonomy leaf and the modest search scope mean this assessment reflects only the most semantically similar work, not an exhaustive field survey. The absence of refutable candidates may indicate genuine novelty or simply that closely related work uses different terminology or framing.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 0

Research Landscape Overview

The core task of this survey centers on understanding how hierarchical predictive correction mechanisms can mitigate cascading failures across diverse problem domains. The taxonomy reveals a field organized around eight major branches, spanning from Vision-Language-Action Agent Architectures and Planning, which explores how embodied agents integrate perception, reasoning, and action, to Domain-Specific Research Studies that ground theoretical advances in concrete application areas. Language Model Capabilities and Applications examines the reasoning and generation power of modern LLMs, while Multi-Objective and Multi-Task Optimization addresses settings where agents must balance competing goals. Machine Learning Methods and Evaluation, Objective Weighting and Decision Criteria Methods, and Quality Assessment and Evaluation Metrics collectively provide the methodological toolkit for designing, tuning, and validating complex systems. Research Methodology and Design offers meta-level guidance on structuring empirical investigations. Together, these branches reflect a field that bridges foundational AI architectures with rigorous evaluation frameworks and real-world deployment challenges.

Particularly active lines of work emerge at the intersection of hierarchical planning and error recovery in agent systems. Several studies explore how agents can decompose complex tasks into manageable subtasks and adaptively correct errors when lower-level actions fail, a theme evident in works like Scaling Test-Time Compute[3] and Training Problem Solving[4], which investigate how additional computation at inference time or during training can improve robustness. ReCAPA[0] situates itself squarely within the Vision-Language-Action branch, specifically under Hierarchical Planning and Error Correction in VLA Agents, where it shares conceptual ground with Video-of-Thought[21]. While Video-of-Thought[21] emphasizes visual reasoning traces to guide action sequences, ReCAPA[0] appears to focus more directly on predictive correction mechanisms that anticipate and preempt cascading failures before they propagate through a hierarchical plan. This positions ReCAPA[0] as a contribution to the growing effort to make embodied agents not only capable of complex reasoning but also resilient to the compounding errors that arise in multi-step, real-world tasks.

Claimed Contributions

ReCAPA framework with hierarchical predictive correction

ReCAPA is a framework that mitigates cascading failures in long-horizon reasoning by using hierarchical predictive correction across action, subgoal, and trajectory levels. It combines cross-level prediction with prompt-trajectory alignment to anticipate and correct deviations early, preventing error propagation.

6 retrieved papers
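
The abstract attributes the alignment step to a Sinkhorn-based module, but this report does not describe how that module is configured. As a rough illustration only, the sketch below runs plain entropic-regularized Sinkhorn iterations between uniform marginals, for example between prompt tokens (rows) and trajectory steps (columns). The function name `sinkhorn_plan`, the toy cost matrix, and the `epsilon` and `iterations` values are assumptions made here, not details taken from the paper.

```python
import math


def sinkhorn_plan(cost, epsilon=0.1, iterations=200):
    """Entropic-regularized transport plan between uniform marginals.

    Illustrative only: a generic Sinkhorn iteration, not ReCAPA's module.
    Assumes every row and column of `cost` has at least one entry small
    enough that exp(-cost / epsilon) does not underflow to zero.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: low cost -> high affinity.
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Rescale rows, then columns, so both marginals approach uniform.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy cost: row 0 matches column 0, row 1 matches column 1.
toy_cost = [[0.0, 1.0], [1.0, 0.0]]
toy_plan = sinkhorn_plan(toy_cost)
```

In this toy case nearly all mass lands on the diagonal, which is the soft matching a cost-based alignment would be expected to produce. A smaller `epsilon` sharpens the plan at the price of numerical stability.
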
Error propagation metrics: EPR and PAC

Two diagnostic metrics are introduced: Error Propagation Rate (EPR) quantifies how mistakes compound across future steps, while Propagation Attenuation Coefficient (PAC) measures how quickly errors dissipate over time, providing tools to evaluate agent stability beyond success rate.

10 retrieved papers
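
This report describes EPR and PAC only in words and does not reproduce the paper's formulas. To make the two ideas concrete, the sketch below uses stand-in definitions chosen here: a conditional error rate at a given lag for "how mistakes compound", and a log-linear decay slope for "how quickly errors dissipate". Both function names and both definitions are hypothetical and should not be read as the authors' metrics.

```python
import math


def error_propagation_rate(errors, lag=1):
    """Share of erroneous steps that are followed by an error `lag` steps later.

    Hypothetical stand-in, NOT the paper's EPR definition.
    `errors` is a per-step list of 0/1 error indicators.
    """
    followers = [errors[t + lag]
                 for t in range(len(errors) - lag) if errors[t]]
    if not followers:
        return 0.0
    return sum(followers) / len(followers)


def attenuation_coefficient(rates):
    """Least-squares slope of -log(rate) against lag.

    Hypothetical stand-in, NOT the paper's PAC definition. Assumes
    rates[k] ~ rates[0] * exp(-c * k) and returns the estimate of c;
    zero rates are skipped because their logarithm is undefined.
    """
    points = [(k, -math.log(r)) for k, r in enumerate(rates) if r > 0]
    if len(points) < 2:
        return 0.0
    mean_k = sum(k for k, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((k - mean_k) * (y - mean_y) for k, y in points)
    den = sum((k - mean_k) ** 2 for k, _ in points)
    return num / den


# Toy episode: 1 marks a step judged erroneous.
episode = [0, 1, 1, 0, 1, 0, 0, 0]
lag_one_rate = error_propagation_rate(episode, lag=1)
```

Under these stand-in definitions, a larger attenuation coefficient means errors die out faster, which is the kind of stability signal that success rate alone does not expose.
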
Hierarchical Predictive Contrastive Correction module

HPCC is a module that predicts higher-level representations from lower-level steps and provides corrective signals through cross-level contrastive learning. It enforces consistency across action, subgoal, and trajectory levels using predictive losses and alignment mechanisms.

4 retrieved papers
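
The HPCC description above (predict a higher-level representation from lower-level steps, then contrast it against candidates) can be sketched with a standard InfoNCE loss. This is a minimal sketch under stated assumptions: the mean-pooling predictor, the cosine similarity, the temperature value, and all function names are illustrative choices made here, and the paper's actual predictor and loss terms are not given in this report.

```python
import math


def cosine(u, v):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict_higher_level(step_embeddings):
    """Hypothetical predictor: mean-pool the lower-level step embeddings."""
    n = len(step_embeddings)
    return [sum(column) / n for column in zip(*step_embeddings)]


def info_nce(prediction, candidates, positive_index, temperature=0.1):
    """InfoNCE loss: -log softmax over similarities, read at the positive."""
    logits = [cosine(prediction, c) / temperature for c in candidates]
    top = max(logits)  # subtract the max for a stable log-sum-exp
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[positive_index]


# Toy example: two action embeddings that both point toward subgoal 0.
actions = [[1.0, 0.0], [0.8, 0.2]]
subgoals = [[1.0, 0.0], [0.0, 1.0]]
predicted = predict_higher_level(actions)
loss_correct = info_nce(predicted, subgoals, positive_index=0)
loss_wrong = info_nce(predicted, subgoals, positive_index=1)
```

The loss is small when the predicted representation matches the intended subgoal and large otherwise, so its gradient can act as a corrective signal for the lower level. The same pattern could be repeated between subgoals and trajectories.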

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ReCAPA framework with hierarchical predictive correction

Contribution

Error propagation metrics: EPR and PAC

Contribution

Hierarchical Predictive Contrastive Correction module
