Compositional Diffusion with Guided Search for Long-Horizon Planning
Overview
Overall Novelty Assessment
The paper proposes Compositional Diffusion with Guided Search (CDGS), a method for composing short-horizon diffusion models into long-horizon robot manipulation plans. It resides in the 'Diffusion-Based Trajectory Composition' leaf, which contains only four papers: this work and three siblings. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that embedding search within diffusion denoising for compositional planning is not yet heavily explored.
The taxonomy reveals that robotic manipulation encompasses multiple alternative paradigms: hierarchical skill chaining and subgoal decomposition (five papers), imitation learning (three papers), model-based RL (two papers), and vision-language-action systems (three papers). The diffusion-based trajectory composition leaf sits adjacent to these approaches, sharing the goal of long-horizon planning but diverging in its use of probabilistic generative models rather than hierarchical abstractions or reinforcement learning. The scope note explicitly excludes non-diffusion methods, positioning this work within a narrower methodological niche focused on generative composition.
Among the 18 candidates examined across the three contributions, the iterative resampling mechanism has one refutable candidate out of 10 examined. The core CDGS framework appears more novel, with zero refutable candidates among the eight examined; no candidates were examined for likelihood-based pruning, so its apparent novelty is untested by this search. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The iterative resampling mechanism's overlap with prior work suggests this component may have precedent, while the integration of search-based mode exploration within diffusion denoising appears less directly anticipated by the examined literature.
Given the sparse four-paper leaf and limited 18-candidate search, the work appears to occupy a relatively unexplored intersection of diffusion models and search-based planning for compositional generation. The taxonomy context suggests the field is fragmented across diverse methodological branches, with diffusion-based composition representing a minority approach. However, the analysis cannot rule out relevant work outside the top-K semantic neighborhood or in adjacent communities not captured by the search strategy.
Taxonomy
Research Landscape Overview
Claimed Contributions
CDGS is a novel inference-time algorithm that integrates guided search into the diffusion denoising process to compose short-horizon local generative models into coherent long-horizon plans. The method addresses the mode-averaging problem in compositional generative models through population-based sampling, iterative resampling for global consistency, and likelihood-based pruning of infeasible candidates.
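To make the mode-averaging problem and the population-plus-pruning remedy concrete, here is a toy sketch. It is not the CDGS algorithm: it assumes a hypothetical 1-D setting in which two overlapping local models each place a two-mode Gaussian mixture (modes at -1 and +1) on their shared variable, and it composes them by simple averaging. The names MODES, SIGMA, mixture_log_likelihood, sample_local, and population_search are illustrative only.

```python
import math
import random
from typing import List

# Toy setup (hypothetical): each local model's marginal over a shared,
# overlapping variable is an equal-weight Gaussian mixture with modes at -1 and +1.
MODES = (-1.0, 1.0)
SIGMA = 0.1


def mixture_log_likelihood(x: float) -> float:
    """Log-density of x under the two-mode local distribution."""
    norm = SIGMA * math.sqrt(2.0 * math.pi)
    density = sum(0.5 * math.exp(-0.5 * ((x - m) / SIGMA) ** 2) / norm for m in MODES)
    return math.log(density)


def sample_local(rng: random.Random) -> float:
    """One local model commits to a mode independently of its neighbour."""
    return rng.choice(MODES) + rng.gauss(0.0, SIGMA)


def population_search(num_candidates: int = 64, keep: int = 8, seed: int = 0) -> List[float]:
    """Sample a population of composed values and keep the most plausible ones."""
    rng = random.Random(seed)
    scored = []
    for _ in range(num_candidates):
        # Naive composition averages the two local predictions on the overlap;
        # when they pick different modes the result lands between modes (mode averaging).
        overlap = 0.5 * (sample_local(rng) + sample_local(rng))
        scored.append((mixture_log_likelihood(overlap), overlap))
    # Likelihood-based pruning: keep the candidates the local model finds most plausible.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [x for _, x in scored[:keep]]
```

Candidates whose local models picked different modes average to roughly 0, a value neither local model considers likely. Sampling a population and pruning by local likelihood discards those candidates instead of committing to a single averaged plan.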
The method introduces an iterative resampling procedure that alternates between forward noising and denoising steps to propagate information across distant segments through overlapping variables. This enables effective local-to-global message passing, ensuring that compositional sampling produces globally coherent candidate plans.
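The alternation between forward noising and denoising can be sketched on a toy chain. This assumes a hypothetical 1-D plan x_0..x_K in which segment i covers the pair (x_i, x_{i+1}), so neighbouring segments share one variable, and each learned local denoiser is replaced by an exact projection onto x_{i+1} - x_i = step. The noise schedule, the number of inner steps, and the averaging rule on overlaps are illustrative choices, not the paper's settings; composite_denoise and resample_plan are hypothetical names.

```python
import random
from typing import List, Sequence

# Toy chain (hypothetical): plan variables x_0..x_K; segment i covers (x_i, x_{i+1}).
# Each local "denoiser" is a stand-in projection onto x_{i+1} - x_i = step.


def composite_denoise(x: Sequence[float], start: float, step: float = 1.0) -> List[float]:
    """Apply every local projection, then average estimates on overlapping variables."""
    sums = [0.0] * len(x)
    counts = [0] * len(x)
    for i in range(len(x) - 1):
        mid = 0.5 * (x[i] + x[i + 1])
        for offset, value in enumerate((mid - 0.5 * step, mid + 0.5 * step)):
            sums[i + offset] += value
            counts[i + offset] += 1
    # Interior (overlapping) variables receive the mean of their two local estimates.
    out = [s / c for s, c in zip(sums, counts)]
    out[0] = start  # the start state is a fixed condition
    return out


def resample_plan(
    num_segments: int = 4,
    start: float = 0.0,
    noise_levels: Sequence[float] = (1.0, 0.5, 0.25, 0.1, 0.0),
    inner_steps: int = 200,
    seed: int = 0,
) -> List[float]:
    """Repeatedly re-noise and denoise at each noise level, from high to zero noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, noise_levels[0]) for _ in range(num_segments + 1)]
    for sigma in noise_levels:
        for _ in range(inner_steps):
            # Forward noising at the current level (the conditioned start stays fixed) ...
            x = [start] + [v + rng.gauss(0.0, sigma) for v in x[1:]]
            # ... then denoising, which passes information through the overlaps.
            x = composite_denoise(x, start)
    return x
```

Only x_0 is conditioned, yet the last variable converges to roughly start + K * step. Each denoising pass lets a variable see only its immediate neighbours, so it is the repeated noise/denoise rounds that carry the start condition across all K segments, which is the local-to-global message passing described above.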
The approach employs a novel pruning mechanism based on DDIM inversion to approximate local plan likelihoods and filter out incoherent global plans. The method defines a smoothness measure based on diffusion trajectory curvature to identify and eliminate plans with locally inconsistent segments that result from mode-averaging.
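The inversion-and-scoring step can be sketched as follows, under loudly stated assumptions. The noise predictor is the exact one for a toy 1-D data distribution concentrated at a single point (so deterministic DDIM inversion is exact), the likelihood proxy is the standard-normal log-density of the inverted latent with the change-of-variables term ignored, and curvature is taken as the sum of squared second differences along the inversion trajectory. The paper's actual likelihood approximation and smoothness measure may be defined differently. ALPHAS_BAR, point_mass_eps, segment_score, prune_plans, and curvature_weight are hypothetical.

```python
import math
from typing import Callable, List, Sequence

EpsModel = Callable[[float, float], float]  # (x_t, alpha_bar_t) -> predicted noise

# Hypothetical cumulative-alpha schedule, index 0 = least noisy level.
ALPHAS_BAR = [0.999 - (0.999 - 0.01) * i / 49 for i in range(50)]


def point_mass_eps(mu: float = 0.0) -> EpsModel:
    """Exact noise predictor for a toy 1-D data distribution concentrated at mu."""
    def eps(x_t: float, a_t: float) -> float:
        return (x_t - math.sqrt(a_t) * mu) / math.sqrt(1.0 - a_t)
    return eps


def ddim_invert(x0: float, eps_model: EpsModel, alphas_bar: Sequence[float]) -> List[float]:
    """Deterministic DDIM inversion from the least to the most noisy level."""
    traj = [x0]
    x = x0
    for a_t, a_next in zip(alphas_bar[:-1], alphas_bar[1:]):
        e = eps_model(x, a_t)
        x0_hat = (x - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
        x = math.sqrt(a_next) * x0_hat + math.sqrt(1.0 - a_next) * e
        traj.append(x)
    return traj


def curvature(traj: Sequence[float]) -> float:
    """Sum of squared second differences along the trajectory (assumed measure)."""
    return sum((traj[i + 1] - 2 * traj[i] + traj[i - 1]) ** 2 for i in range(1, len(traj) - 1))


def segment_score(x0: float, eps_model: EpsModel, alphas_bar: Sequence[float],
                  curvature_weight: float = 1.0) -> float:
    traj = ddim_invert(x0, eps_model, alphas_bar)
    x_T = traj[-1]
    # Likelihood proxy: standard-normal log-density of the inverted latent.
    log_density = -0.5 * x_T ** 2 - 0.5 * math.log(2.0 * math.pi)
    return log_density - curvature_weight * curvature(traj)


def prune_plans(plans: Sequence[Sequence[float]], eps_model: EpsModel,
                alphas_bar: Sequence[float], keep: int) -> List[Sequence[float]]:
    """Rank each plan by its worst segment and keep the `keep` best plans."""
    scored = [(min(segment_score(s, eps_model, alphas_bar) for s in plan), plan) for plan in plans]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [plan for _, plan in scored[:keep]]
```

In this toy both the likelihood proxy and the curvature worsen as a segment moves away from the data, and each plan is ranked by its worst segment, so a plan containing a single locally inconsistent segment is eliminated even if its other segments are fine.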
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[5] Generative trajectory stitching through diffusion composition
[13] What Do You Need for Diverse Trajectory Stitching in Diffusion Planning?
[50] Compositional Visual Planning via Inference-Time Diffusion Scaling
Contribution Analysis
Detailed comparisons for each claimed contribution
Compositional Diffusion with Guided Search (CDGS)
[50] Compositional Visual Planning via Inference-Time Diffusion Scaling
[51] Unifying Modern AI with Robotics: Survey on MDPs with Diffusion and Foundation Models
[52] Compositional Monte Carlo Tree Diffusion for Extendable Planning
[53] Hybrid Diffusion for Simultaneous Symbolic and Continuous Planning
[54] Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems
[55] Inference-time Scaling of Diffusion Models through Classical Search
[56] ComposableNav: Instruction-Following Navigation in Dynamic Environments via Composable Diffusion
[57] Controllable Graph Generation with Diffusion Models via Inference-Time Tree Search Guidance
Iterative resampling mechanism for local-to-global message passing
[62] Compositional foundation models for hierarchical planning
[58] Artificial intelligence for catalyst design and synthesis
[59] AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
[60] ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints
[61] Constructing a 3D Town from a Single Image
[63] Hydra: A hyper agent for dynamic compositional visual reasoning
[64] Roomtex: Texturing compositional indoor scenes via iterative inpainting
[65] Streamlining Robust Constrained Production Optimization: An Integrated Framework Utilizing Automatically Differentiated Gradient from Deep-Learning-Based …
[66] Multi-view people tracking via hierarchical trajectory composition
[67] Seismic Data Interpolation via Denoising Diffusion Implicit Models With Coherence-Corrected Resampling
Likelihood-based pruning using DDIM inversion
No candidate papers were examined for this contribution.