ReCAPA: Hierarchical Predictive Correction to Mitigate Cascading Failures

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.7

Keywords: Vision-language-action, embodied agent, large language models, Long-Horizon Planning

Abstract:

Vision–Language–Action (VLA) agents follow instructions to perform multi-step tasks in multimodal environments. To support planning and execution in such settings, many approaches rely on structured post-hoc methods or on fixed decomposition and rigid alignment to improve success rates. However, once an intermediate subgoal or action is mis-specified, the absence of a flexible correction mechanism lets local errors propagate through subsequent steps and eventually accumulate into cascading failures in long-horizon reasoning. To mitigate this compounding effect, we propose the Reflective Contrastive Alignment and Planning Architecture (ReCAPA), a framework that uses predictive correction to anticipate deviations and adjust representations at three levels: actions, subgoals, and trajectories. Semantic alignment is enforced at all levels using a Sinkhorn-based module and a Score-field module. The corrective signals derived from the predictive correction and alignment mechanisms jointly update the execution network during training, enabling it to flexibly adjust fine-grained steps so that they remain aligned with the overall intent. We further introduce two new metrics to quantify error propagation and recovery in tasks. Experiments show that ReCAPA achieves competitive results on embodied agent benchmarks such as VisualAgentBench, MineDojo, and MAP-THOR, outperforming strong proprietary and open-source Large Language Model (LLM) baselines.
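The abstract states that semantic alignment is enforced with a Sinkhorn-based module. As a minimal sketch of the generic technique only (entropy-regularized optimal transport solved by Sinkhorn-Knopp scaling), and not ReCAPA's actual module, the code below computes a soft alignment between prompt-segment embeddings and trajectory-step embeddings. The uniform marginals, the cosine-distance cost, and the names `sinkhorn_plan`, `eps`, and `n_iters` are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity with a tiny epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def sinkhorn_plan(prompt_emb, traj_emb, eps=0.1, n_iters=200):
    """Entropic OT plan between prompt segments (rows) and trajectory steps (cols).

    Hypothetical helper: uniform marginals and cost = 1 - cosine similarity.
    Returns (plan, transport_cost).
    """
    n, m = len(prompt_emb), len(traj_emb)
    cost = [[1.0 - cosine(p, t) for t in traj_emb] for p in prompt_emb]
    # Gibbs kernel K = exp(-C / eps); smaller eps gives a sharper matching.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # target row marginal
    b = [1.0 / m] * m  # target column marginal
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale so rows sum to a and columns sum to b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    transport_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, transport_cost


# Two prompt segments, two trajectory steps that roughly follow them in order.
plan, total = sinkhorn_plan([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
print(plan)   # nearly all mass on the diagonal
print(total)  # small transport cost, since the steps match the prompt
```

With a small `eps` the plan approaches a hard matching, but convergence can slow sharply when marginals are hard to satisfy; practical implementations usually run the same iterations in the log domain for numerical stability.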

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes ReCAPA, a framework for vision-language-action agents that uses hierarchical predictive correction to mitigate cascading failures in multi-step task execution. It resides in the 'Hierarchical Planning and Error Correction in VLA Agents' leaf, which contains only two papers including this one. This represents a relatively sparse research direction within the broader taxonomy of 50 papers across 33 leaf nodes, suggesting the specific focus on predictive correction mechanisms for error propagation in embodied agents is not yet densely populated.

The taxonomy reveals that ReCAPA sits within the 'Vision-Language-Action Agent Architectures and Planning' branch, which also includes work on embodied agent environments and benchmarks. Neighboring branches address language model reasoning capabilities and multi-objective optimization, but these lack the embodied grounding and hierarchical action correction focus. The sibling paper Video-of-Thought emphasizes visual reasoning traces for action guidance, whereas ReCAPA appears to differentiate itself through predictive correction mechanisms that operate across action, subgoal, and trajectory levels to prevent error accumulation before failures cascade.

Among 20 candidates examined across three contributions, none were found to clearly refute the proposed work: 6 candidates were examined for the ReCAPA framework, 10 for the error propagation metrics (EPR and PAC), and 4 for the Hierarchical Predictive Contrastive Correction (HPCC) module, with no refutable overlap in any group. This suggests that, within the limited search scope, the specific combination of hierarchical predictive correction, multi-level alignment mechanisms, and error propagation quantification appears relatively novel, although a search of only 20 papers means substantial prior work may exist beyond these top semantic matches.

Based on the limited literature search of 20 candidates, the work appears to occupy a distinct position combining hierarchical planning with predictive correction for cascading failure mitigation. However, the sparse population of its taxonomy leaf and the modest search scope mean this assessment reflects only the most semantically similar work, not an exhaustive field survey. The absence of refutable candidates may indicate genuine novelty or simply that closely related work uses different terminology or framing.

Taxonomy

[Figure: taxonomy diagram; legend: core-task taxonomy papers, claimed contributions, contribution candidate papers compared, refutable paper]

Research Landscape Overview

The core task examined here centers on how hierarchical predictive correction mechanisms can mitigate cascading failures across diverse problem domains. The taxonomy organizes the field into eight major branches, spanning from Vision-Language-Action Agent Architectures and Planning, which explores how embodied agents integrate perception, reasoning, and action, to Domain-Specific Research Studies, which ground theoretical advances in concrete application areas. Language Model Capabilities and Applications examines the reasoning and generation abilities of modern LLMs, while Multi-Objective and Multi-Task Optimization addresses settings where agents must balance competing goals. Machine Learning Methods and Evaluation, Objective Weighting and Decision Criteria Methods, and Quality Assessment and Evaluation Metrics collectively provide the methodological toolkit for designing, tuning, and validating complex systems. Research Methodology and Design offers meta-level guidance on structuring empirical investigations. Together, these branches bridge foundational AI architectures with rigorous evaluation frameworks and real-world deployment challenges.

Particularly active lines of work emerge at the intersection of hierarchical planning and error recovery in agent systems. Several studies explore how agents can decompose complex tasks into manageable subtasks and adaptively correct errors when lower-level actions fail; related works such as Scaling Test-Time Compute[3] and Training Problem Solving[4] investigate how additional computation at inference time or during training can improve robustness. ReCAPA[0] sits within the Vision-Language-Action branch, specifically under Hierarchical Planning and Error Correction in VLA Agents, where it shares conceptual ground with Video-of-Thought[21].

While Video-of-Thought[21] emphasizes visual reasoning traces to guide action sequences, ReCAPA[0] appears to focus more directly on predictive correction mechanisms that anticipate and preempt cascading failures before they propagate through a hierarchical plan. This positions ReCAPA[0] as a contribution to the growing effort to make embodied agents not only capable of complex reasoning but also resilient to the compounding errors that arise in multi-step, real-world tasks.

Claimed Contributions

ReCAPA framework with hierarchical predictive correction

6 retrieved papers

ReCAPA is a framework that mitigates cascading failures in long-horizon reasoning by using hierarchical predictive correction across action, subgoal, and trajectory levels. It combines cross-level prediction with prompt-trajectory alignment to anticipate and correct deviations early, preventing error propagation.
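To illustrate how cross-level prediction can flag a deviation before it compounds, here is a minimal sketch under stated assumptions: the "predicted" subgoal representation is simply the mean of the action embeddings observed so far (a hypothetical stand-in for a learned predictor), and a step is flagged when the cosine distance to the intended subgoal embedding exceeds a hypothetical `threshold`. The names `deviation_scores` and `first_deviation` are illustrative and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity with a tiny epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def deviation_scores(action_embs, subgoal_emb):
    """Score each prefix of the action sequence against its subgoal embedding.

    Hypothetical predictor: the predicted subgoal is the mean of the actions so far.
    Returns 1 - cosine similarity per step (0 means perfectly aligned).
    """
    scores = []
    for t in range(1, len(action_embs) + 1):
        predicted = mean_pool(action_embs[:t])
        scores.append(1.0 - cosine(predicted, subgoal_emb))
    return scores


def first_deviation(scores, threshold=0.3):
    """Index of the first step whose deviation exceeds the threshold, or None."""
    for t, score in enumerate(scores):
        if score > threshold:
            return t
    return None


# The agent follows subgoal [1, 0] for two steps, then drifts toward [0, 1].
scores = deviation_scores([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0]], [1.0, 0.0])
print(first_deviation(scores))  # 3
```

In this example the drift begins at step 2 but the prefix deviation only crosses 0.3 at step 3; mean pooling lags behind the drift, which is one reason a learned, anticipatory predictor would be preferred in practice.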


Error propagation metrics: EPR and PAC

10 retrieved papers

Two diagnostic metrics are introduced: Error Propagation Rate (EPR) quantifies how mistakes compound across future steps, while Propagation Attenuation Coefficient (PAC) measures how quickly errors dissipate over time, providing tools to evaluate agent stability beyond success rate.
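The report does not give formulas for EPR or PAC, so the following is only a hypothetical sketch of how metrics with these descriptions could be computed, not the paper's definitions. Here EPR is assumed to be the share of steps after the first error that are also erroneous, and PAC is assumed to be the decay rate λ in e_k ≈ e_0·exp(−λk), estimated by a least-squares fit of log error magnitude against step index. Both function names are illustrative.

```python
import math


def error_propagation_rate(errors):
    """Hypothetical EPR: share of steps after the first error that are also errors.

    `errors` is a list of 0/1 flags, one per step. Returns 0.0 if no error
    occurs or if the first error is the final step.
    """
    if 1 not in errors:
        return 0.0
    first = errors.index(1)
    tail = errors[first + 1:]
    return sum(tail) / len(tail) if tail else 0.0


def propagation_attenuation_coefficient(magnitudes):
    """Hypothetical PAC: decay rate lambda in e_k ~ e_0 * exp(-lambda * k).

    Estimated as the negated least-squares slope of log(e_k) against k.
    Positive values mean errors die out; negative values mean they grow.
    """
    pts = [(k, math.log(e)) for k, e in enumerate(magnitudes) if e > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive error magnitudes")
    n = len(pts)
    mean_k = sum(k for k, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    cov = sum((k - mean_k) * (y - mean_y) for k, y in pts)
    var = sum((k - mean_k) ** 2 for k, _ in pts)
    return -cov / var


print(error_propagation_rate([0, 0, 1, 1, 0, 1]))  # ~0.667: 2 of 3 post-error steps fail
print(propagation_attenuation_coefficient([1.0, 0.5, 0.25, 0.125]))  # ~0.693 (= ln 2)
```

Under these assumptions a lower EPR and a higher PAC both indicate a more stable agent, which matches the report's description of the metrics as complements to success rate.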


Hierarchical Predictive Contrastive Correction module

4 retrieved papers

HPCC is a module that predicts higher-level representations from lower-level steps and provides corrective signals through cross-level contrastive learning. It enforces consistency across action, subgoal, and trajectory levels using predictive losses and alignment mechanisms.
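A common way to realize a cross-level contrastive signal is an InfoNCE loss, in which each predicted higher-level embedding must match its own target against all other targets in the batch. The sketch below assumes that choice, uses mean pooling as a hypothetical stand-in for HPCC's learned predictor, and uses a hypothetical temperature `tau`; it illustrates the general technique rather than the module's actual losses.

```python
import math


def cosine(u, v):
    """Cosine similarity with a tiny epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def predict_higher(lower_embs):
    """Hypothetical predictor: mean-pool lower-level embeddings (e.g. actions)."""
    n = len(lower_embs)
    return [sum(col) / n for col in zip(*lower_embs)]


def info_nce(predicted, targets, tau=0.1):
    """InfoNCE: predicted[i] should score highest against targets[i].

    predicted: higher-level embeddings predicted from lower-level steps.
    targets: actual higher-level embeddings (same order, same length).
    """
    total = 0.0
    for i, p in enumerate(predicted):
        logits = [cosine(p, t) / tau for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]  # cross-entropy with the matching target
    return total / len(predicted)


# Two subgoals, each reached by two actions pointing roughly the same way.
groups = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
preds = [predict_higher(g) for g in groups]
print(info_nce(preds, [[1.0, 0.0], [0.0, 1.0]]))  # close to 0: levels agree
print(info_nce(preds, [[0.0, 1.0], [1.0, 0.0]]))  # large: levels are swapped
```

The same loss can be applied between subgoal and trajectory embeddings to cover the third level; a low temperature sharpens the penalty when a prediction lands closer to the wrong target.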


Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a sparsely populated leaf whose only other member is Video-of-Thought[21]. In this retrieved landscape, it appears relatively isolated, which is one partial signal of novelty, but this signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

ReCAPA framework with hierarchical predictive correction

[51] Modular multi-level replanning TAMP framework for dynamic environment

Cannot Refute

[52] Iteratively refined feasibility checks in robotic assembly sequence planning

Cannot Refute

[53] Long-horizon visual planning with goal-conditioned hierarchical predictors

Cannot Refute

[54] DT-HRL: Mastering Long-Sequence Manipulation with Reimagined Hierarchical Reinforcement Learning

Cannot Refute

[55] ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning

Cannot Refute

[56] Generative Proto-Sequence: Sequence-Level Decision Making for Long-Horizon Reinforcement Learning

Cannot Refute

Contribution

Error propagation metrics: EPR and PAC

[57] How language model hallucinations can snowball

Cannot Refute

[58] Scaling flaws of verifier-guided search in mathematical reasoning

Cannot Refute

[59] Measuring chain of thought faithfulness by unlearning reasoning steps

Cannot Refute

[60] Faithful and unfaithful error recovery in chain of thought

Cannot Refute

[61] ART: Automatic multi-step reasoning and tool-use for large language models

Cannot Refute

[62] Lost at the Beginning of Reasoning

Cannot Refute

[63] PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Cannot Refute

[64] Dissociation of faithful and unfaithful reasoning in LLMs

Cannot Refute

[65] Stochastic lexical dissonance injection for self-consistent reasoning in large language models: A quantitative investigation

Cannot Refute

[66] Recursive decomposition of logical thoughts: Framework for superior reasoning and knowledge propagation in large language models

Cannot Refute

Contribution

Hierarchical Predictive Contrastive Correction module

[67] HiCLRE: A hierarchical contrastive learning framework for distantly supervised relation extraction

Cannot Refute

[68] Cascaded State Space and Contrastive Learning for Cross-Domain Few-Shot Segmentation

Cannot Refute

[69] Salient object detection based on multi-scale contrast

Cannot Refute

[70] Multi-hierarchical error-aware contrastive learning for event argument extraction

Cannot Refute
