Compositional Diffusion with Guided Search for Long-Horizon Planning

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Diffusion Models, Compositional Diffusion, Goal-directed Planning
Abstract:

Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this \emph{mode averaging} problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Compositional Diffusion with Guided Search (CDGS), a method for composing short-horizon diffusion models into long-horizon robot manipulation plans. It resides in the 'Diffusion-Based Trajectory Composition' leaf, which contains only four papers total, including this work and three siblings. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific approach of embedding search within diffusion denoising for compositional planning is not yet heavily explored.

The taxonomy reveals that robotic manipulation encompasses multiple alternative paradigms: hierarchical skill chaining and subgoal decomposition (five papers), imitation learning (three papers), model-based RL (two papers), and vision-language-action systems (three papers). The diffusion-based trajectory composition leaf sits adjacent to these approaches, sharing the goal of long-horizon planning but diverging in its use of probabilistic generative models rather than hierarchical abstractions or reinforcement learning. The scope note explicitly excludes non-diffusion methods, positioning this work within a narrower methodological niche focused on generative composition.

Among the 18 candidates examined across the three claimed contributions, the iterative resampling mechanism has one refutable candidate out of the 10 examined, while the core CDGS framework and the likelihood-based pruning mechanism appear more novel (zero refutable candidates among the eight examined for the framework, and no candidates retrieved at all for the pruning mechanism). The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The iterative resampling mechanism's overlap with prior work suggests this component may have precedent, while the integration of search-based mode exploration within diffusion denoising appears less directly anticipated by the examined literature.

Given the sparse four-paper leaf and limited 18-candidate search, the work appears to occupy a relatively unexplored intersection of diffusion models and search-based planning for compositional generation. The taxonomy context suggests the field is fragmented across diverse methodological branches, with diffusion-based composition representing a minority approach. However, the analysis cannot rule out relevant work outside the top-K semantic neighborhood or in adjacent communities not captured by the search strategy.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 1

Research Landscape Overview

Core task: Compositional generation of long-horizon sequences from short-horizon models. This field addresses the challenge of extending predictions or plans far into the future by composing outputs from models trained on shorter temporal windows. The taxonomy reveals a diverse landscape spanning robotic manipulation and task planning, spatiotemporal forecasting (weather radar, precipitation), video generation and synthesis, motion and trajectory synthesis, sequential recommendation, domain-specific temporal modeling (medical imaging, materials science), reasoning and memory architectures, multi-step time series forecasting, long-horizon vision tasks, and bio-inspired neuromorphic memory systems.

Within robotic manipulation, diffusion-based trajectory composition has emerged as a particularly active direction, leveraging generative models to stitch together short-horizon skills or subgoals into coherent long-horizon behaviors. Meanwhile, spatiotemporal forecasting branches explore recurrent and attention-based architectures for extrapolating radar echoes or precipitation patterns, and video synthesis methods tackle the challenge of maintaining temporal consistency over extended sequences. A central tension across these branches involves balancing computational efficiency with the ability to capture long-range dependencies and avoid compounding errors.

In robotic planning, works like Trajectory Stitching Diffusion[5] and Diverse Trajectory Stitching[13] explore how to compose pre-trained diffusion models for different skills, while Compositional Diffusion Planning[0] sits within this cluster by emphasizing modular composition of trajectory segments to achieve extended task horizons. Compared to approaches that rely on hierarchical abstractions (e.g., Subgoal Manipulation[2]) or skill chaining (Skill Chaining Diffusion[7]), the diffusion-based composition methods offer flexible probabilistic blending of short-horizon priors.
In contrast, spatiotemporal forecasting branches such as LSTM Radar Extrapolation[3] and SepConv Radar Ensemble[4] focus on recurrent or convolutional architectures for weather prediction, highlighting a different set of trade-offs around spatial resolution and ensemble uncertainty. The original paper's emphasis on diffusion-based trajectory composition places it squarely within the robotic manipulation branch, where it contributes to ongoing efforts to scale planning horizons without retraining monolithic models.

Claimed Contributions

Compositional Diffusion with Guided Search (CDGS)

CDGS is a novel inference-time algorithm that integrates guided search into the diffusion denoising process to compose short-horizon local generative models into coherent long-horizon plans. The method addresses the mode-averaging problem in compositional generative models through population-based sampling, iterative resampling for global consistency, and likelihood-based pruning of infeasible candidates.

8 retrieved papers
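The population-based search loop described in this contribution can be sketched numerically. The toy below, which is an illustrative assumption and not the paper's implementation, builds candidate plans from two overlapping segments whose local models are bimodal, scores candidates by agreement on the shared state (a crude stand-in for likelihood-based filtering), prunes the worst, and resamples survivors with noise. The names `guided_search` and `overlap_gap` and all constants are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def overlap_gap(seg_a, seg_b):
    # Disagreement on the single state shared by consecutive segments;
    # a stand-in for likelihood-based feasibility filtering.
    return float(abs(seg_a[-1] - seg_b[0]))

def guided_search(n_pop=64, n_rounds=5, keep=16):
    # Each candidate pairs two length-4 segments, each drawn near one of two
    # local modes (+1 or -1), mimicking multimodal short-horizon models.
    modes = rng.choice([-1.0, 1.0], size=(n_pop, 2))
    pop = [(m[0] + 0.3 * rng.standard_normal(4),
            m[1] + 0.3 * rng.standard_normal(4)) for m in modes]
    for _ in range(n_rounds):
        # Prune: keep the candidates whose segments agree best on the overlap.
        elite = sorted(pop, key=lambda c: overlap_gap(*c))[:keep]
        # Resample: repopulate by perturbing the surviving candidates.
        pop = [(elite[i % keep][0] + 0.1 * rng.standard_normal(4),
                elite[i % keep][1] + 0.1 * rng.standard_normal(4))
               for i in range(n_pop)]
    return overlap_gap(*min(pop, key=lambda c: overlap_gap(*c)))

best_gap = guided_search()  # small residual disagreement on the overlap
```

The point of the sketch is the selection pressure: candidates mixing incompatible modes (one segment near +1, the other near -1) have a large overlap gap and are pruned, so the surviving population concentrates on mode combinations that are jointly consistent rather than averaging across modes.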
Iterative resampling mechanism for local-to-global message passing

The method introduces an iterative resampling procedure that alternates between forward noising and denoising steps to propagate information across distant segments through overlapping variables. This enables effective local-to-global message passing, ensuring that compositional sampling produces globally coherent candidate plans.

10 retrieved papers
Can Refute
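As a rough numerical analogy for the mechanism described above (not the paper's algorithm), the toy below composes two overlapping segments by alternating a forward-noising step, a toy "denoising" step that pulls each segment toward its own local target, and a reconciliation of the shared state. The shared variable settles between the two incompatible local targets instead of oscillating, illustrating how overlapping variables carry messages across segments. The denoiser, targets, and annealing schedule are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def denoise(x, target, rate=0.5):
    # Toy denoiser: move partway from x toward the local model's target.
    return x + rate * (target - x)

def resample_compose(n_iters=30, noise=0.2):
    seg_a = rng.standard_normal(4)        # states 0..3
    seg_b = rng.standard_normal(4)        # states 3..6 (state 3 is shared)
    t_a, t_b = np.zeros(4), np.ones(4)    # deliberately incompatible targets
    for _ in range(n_iters):
        # Forward noising, then local denoising of each segment.
        seg_a = denoise(seg_a + noise * rng.standard_normal(4), t_a)
        seg_b = denoise(seg_b + noise * rng.standard_normal(4), t_b)
        # Message passing: reconcile the overlapping variable.
        shared = 0.5 * (seg_a[-1] + seg_b[0])
        seg_a[-1] = seg_b[0] = shared
        noise *= 0.9                      # anneal the forward noising
    return float(shared)

shared_state = resample_compose()  # settles between the two local targets
```

The reconciliation step is what distinguishes this from independently sampling each segment: without it, the two local models would each commit to their own target and the joint plan would be discontinuous at the overlap.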
Likelihood-based pruning using DDIM inversion

The approach employs a novel pruning mechanism based on DDIM inversion to approximate local plan likelihoods and filter out incoherent global plans. The method defines a smoothness measure based on diffusion trajectory curvature to identify and eliminate plans with locally inconsistent segments that result from mode-averaging.

0 retrieved papers
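The curvature-based smoothness idea can be illustrated with a toy deterministic inversion. The sketch below does not implement DDIM inversion; a simple discrete heat-diffusion map stands in for it, chosen only because jagged inputs make its trajectory change quickly. Under that assumption, a plan that stays within one local mode barely moves across inversion steps, while a mode-averaged, locally inconsistent plan traces a sharply bending trajectory, and the mean squared second difference along the trajectory serves as the curvature score used for pruning.

```python
import numpy as np

def invert(plan, steps=8, rate=0.2):
    # Toy deterministic inversion: each step applies a discrete
    # heat-diffusion update (a stand-in for DDIM inversion).
    x = np.asarray(plan, dtype=float)
    traj = [x]
    for _ in range(steps):
        lap = np.zeros_like(x)
        lap[1:-1] = x[:-2] - 2.0 * x[1:-1] + x[2:]
        x = x + rate * lap
        traj.append(x)
    return np.stack(traj)

def curvature(traj):
    # Smoothness score: mean squared second difference along the trajectory.
    d2 = traj[2:] - 2.0 * traj[1:-1] + traj[:-2]
    return float((d2 ** 2).mean())

c_smooth = curvature(invert([0.0, 1.0, 2.0, 3.0]))    # single-mode ramp
c_jagged = curvature(invert([1.0, -1.0, 1.0, -1.0]))  # mode-averaged plan
# the locally inconsistent plan scores higher curvature and would be pruned
```

The linear ramp is a fixed point of the toy inversion, so its trajectory has zero curvature, while the alternating plan relaxes rapidly and bends; thresholding or ranking on this score is the pruning step.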

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
