Contrastive Diffusion Guidance for Spatial Inverse Problems

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Diffusion Models, Inverse Problems, Contrastive Learning, Spatial Inference

Abstract:

We consider the inverse problem of reconstructing the spatial layout of a place, a home floorplan for example, from a user’s movements inside that layout. Direct inversion is ill-posed since many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. While active research is in progress on generative inverse solvers, we find that the forward operator in our problem poses new challenges. The path planning process inside a floorplan is a non-invertible, non-differentiable function, and causes instability when optimizing with the likelihood score. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss that brings compatible floorplans and trajectories close to each other, while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process towards the posterior. Across extensive experiments, our model CoGuide produces more consistent floorplans from trajectories, and is more robust than differentiable-planner baselines and guided-diffusion methods.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper addresses floorplan reconstruction from user movement trajectories using a diffusion-based posterior sampler with a novel contrastive embedding approach. It resides in the 'Floorplan and Indoor Layout Reconstruction' leaf of the taxonomy, which contains only two papers total. This is a notably sparse research direction within the broader 'Trajectory-Based Spatial Inference' branch, suggesting the problem space is relatively underexplored compared to neighboring areas like road geometry reconstruction or classical structure-from-motion pipelines that contain significantly more prior work.

The taxonomy reveals that most trajectory-based spatial inference work focuses on outdoor road networks or semantic scene modeling, with limited attention to indoor floorplan recovery. The paper's closest neighbors are Multi-Level Indoor Reconstruction and Automated Indoor Reconstruction, which employ hierarchical geometric reasoning rather than generative diffusion models. The broader 'Structure from Motion' branch contains extensive work on visual reconstruction methods, but these rely on photogrammetric principles rather than pure trajectory data, highlighting a clear methodological boundary between vision-based and movement-based spatial inference.

Among twenty candidates examined across three contributions, no clearly refuting prior work was identified. Nine candidates were examined for the contrastive embedding space for likelihood approximation, one for the CoGuide method, and ten for the Adam-DDIM integration; none of them refutes the corresponding contribution. This suggests that within the limited search scope of top-K semantic matches, the specific combination of diffusion-based sampling, contrastive embeddings, and trajectory-conditioned floorplan generation appears relatively unexplored, though the small candidate pool means substantial related work may exist beyond this search.

Based on the limited literature search of twenty candidates, the work appears to occupy a novel position combining generative diffusion models with trajectory-based indoor reconstruction. However, the sparse taxonomy leaf and small search scope mean this assessment reflects only the immediate semantic neighborhood, not an exhaustive field survey. The absence of refuting candidates may indicate genuine novelty or simply reflect the narrow search aperture and the nascent state of diffusion-based approaches in this specific application domain.

Taxonomy

Research Landscape Overview

Core task: Reconstructing spatial layouts from movement trajectories. The field encompasses a diverse set of approaches organized into five main branches. Structure from Motion and 3D Scene Reconstruction (e.g., Structure from Motion Survey[1], Pixel-Perfect Structure[9]) focuses on classical photogrammetric and vision-based methods that recover geometry from image sequences or camera motion. Trajectory-Based Spatial Inference emphasizes inferring indoor layouts, floorplans, and semantic structure directly from movement data, often leveraging WiFi signals (WiFi Structure from Motion[7]) or egocentric observations (Egocentric Omnidirectional Reconstruction[34]). Vehicle Trajectory Reconstruction and Processing targets road-level geometry and lane networks (Lane-Level Road Geometry[8], Stopbar Video Reconstruction[39]), while Human and Object Motion Analysis examines behavioral patterns, tracking, and motion understanding (Segmentation is Tracking[29], Environment-aware Motion Matching[33]). Finally, Geometric and Kinematic Modeling addresses the mathematical foundations of motion and shape recovery (Line Trajectory Kinematics[16], Piecewise Planar Motion[41]).

Recent work reveals a tension between classical geometric pipelines and learning-driven methods that exploit trajectory semantics or diffusion-based priors. Within Trajectory-Based Spatial Inference, a small cluster of studies tackles indoor reconstruction from sparse or noisy movement traces: Multi-Level Indoor Reconstruction[13] and Automated Indoor Reconstruction[10] demonstrate how hierarchical reasoning can handle multi-story buildings, while Contrastive Diffusion Guidance[0] introduces a generative framework that guides a diffusion sampler with a contrastively trained embedding space, steering generated layouts towards those compatible with the observed trajectories. This approach sits close to Multi-Level Indoor Reconstruction[13] in targeting floorplan recovery, yet diverges by leveraging diffusion models rather than purely geometric constraints. Meanwhile, works like Sparse Trajectory Reconstruction[15] and Multi-Source Trajectory Reconstruction[12] explore fusion of heterogeneous data streams, highlighting open questions about how best to integrate partial observations and maintain global consistency across large-scale environments.

Claimed Contributions

Contrastive embedding space for likelihood score approximation

9 retrieved papers

The authors propose reformulating the likelihood score in a learned embedding space trained with contrastive learning. This embedding space maps compatible floorplan-trajectory pairs close together while separating incompatible pairs, providing a smoother surrogate for the intractable likelihood score in diffusion-based posterior sampling.
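
As a rough illustration of the kind of objective described above, the following is a minimal sketch of an InfoNCE-style contrastive loss over paired floorplan and trajectory embeddings. The report does not state which contrastive loss the authors use, so the InfoNCE form, the cosine similarity, the temperature value, and the names `cosine` and `info_nce` are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(floor_emb, traj_emb, temperature=0.1):
    """Hypothetical InfoNCE loss: floor_emb[i] and traj_emb[i] form a
    compatible pair; every other pairing in the batch is a mismatch."""
    n = len(floor_emb)
    loss = 0.0
    for i in range(n):
        logits = [cosine(floor_emb[i], traj_emb[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate trajectories.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching trajectory as the correct class.
        loss += log_z - logits[i]
    return loss / n

# Toy embeddings (made up): aligned pairs versus deliberately swapped pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

On these toy inputs the loss is lower when matching pairs are aligned than when they are swapped, which is the behavior a contrastive embedding space is trained to produce.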

CoGuide method for spatial inverse problems

1 retrieved paper

The authors introduce CoGuide, a diffusion-based method that uses contrastive guidance to solve spatial inverse problems, specifically reconstructing floorplans from user movement trajectories. The method addresses challenges posed by non-differentiable path-planning operators by operating in a learned embedding space.
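
For orientation, the guidance idea can be written down schematically. The first line below is the standard Bayes decomposition of the posterior score used by diffusion-based inverse solvers. The second line is only a hypothetical form of an embedding-space surrogate: the encoders $f_\theta$ and $g_\phi$, the similarity function $\mathrm{sim}$, the weight $\lambda$, and the denoised estimate $\hat{x}_0(x_t)$ are notation assumed here, not taken from the paper.

```latex
% Posterior score for noisy layout x_t given trajectory measurements y (Bayes' rule):
\nabla_{x_t} \log p(x_t \mid y)
  = \nabla_{x_t} \log p(x_t) + \nabla_{x_t} \log p(y \mid x_t)

% Hypothetical surrogate for the likelihood score, computed in embedding space:
\nabla_{x_t} \log p(y \mid x_t)
  \approx \lambda \, \nabla_{x_t} \,
  \mathrm{sim}\!\big( f_\theta(\hat{x}_0(x_t)),\; g_\phi(y) \big)
```

The point of such a surrogate is that the gradient flows through learned, differentiable encoders rather than through the non-differentiable path planner.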

Adam optimizer integration with DDIM sampling

10 retrieved papers

The authors propose replacing standard gradient descent with the Adam optimizer during DDIM sampling steps, combined with cosine annealing of the learning rate. This modification improves convergence in the nonconvex posterior optimization by providing higher-order information about the optimization landscape.
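
To make the optimizer change concrete, here is a minimal sketch of Adam updates with a cosine-annealed learning rate applied to a guidance gradient. It is not the authors' implementation: the DDIM denoising step is omitted entirely, a toy quadratic stands in for the guidance objective, and the hyperparameters (the usual Adam defaults, `lr_max=0.1`, 200 steps) and the names `cosine_lr`, `adam_guided_updates`, `toy_grad`, and `toy_loss` are assumptions chosen for illustration.

```python
import math

def cosine_lr(step, total_steps, lr_max=0.1, lr_min=0.0):
    """Cosine-annealed learning rate, decaying from lr_max to lr_min."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

def adam_guided_updates(x, grad_fn, total_steps=200, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply Adam steps along grad_fn in place of plain gradient descent.
    In a real sampler each step would be interleaved with a DDIM update."""
    m = [0.0] * len(x)  # running first-moment (mean) estimate of the gradient
    v = [0.0] * len(x)  # running second-moment estimate of the gradient
    for t in range(1, total_steps + 1):
        g = grad_fn(x)
        lr = cosine_lr(t - 1, total_steps)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy stand-in for the guidance objective: squared distance to a target.
target = [1.0, 1.0]

def toy_grad(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

def toy_loss(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

x0 = [3.0, -2.0]
initial_loss = toy_loss(x0)
x_final = adam_guided_updates(list(x0), toy_grad)
final_loss = toy_loss(x_final)
```

Adam rescales each coordinate's step by running moment estimates of the gradient, and the cosine schedule shrinks the step size towards the end of the run, which is the combination the contribution describes.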

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[13] Automatic reconstruction of multi-level indoor spaces from point cloud and trajectory

Gahyeon Lim, Nakju Doh (2021)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Contrastive embedding space for likelihood score approximation

[61] Contrastive sampling chains in diffusion models

Cannot Refute

[62] Contrastive conditional latent diffusion for audio-visual segmentation

Cannot Refute

[63] A statistical theory of contrastive pre-training and multimodal generative AI

Cannot Refute

[64] Bridging Generative and Representation Learning with Diffusion Models

Cannot Refute

[65] Context Matters: Enhancing Sequential Recommendation with Context-aware Diffusion-based Contrastive Learning

Cannot Refute

[66] Your diffusion model is secretly a noise classifier and benefits from contrastive training

Cannot Refute

[67] Fusion of diffusion models and intent learning in sequential recommendation

Cannot Refute

[68] Guidance Conditions in Generative Modeling: Elevating Discriminative Capabilities and Controllability

Cannot Refute

[69] Neural distribution estimation as a two-part problem

Cannot Refute

Contribution

CoGuide method for spatial inverse problems

[70] Reconstructing visible and invisible maps of buildings

Cannot Refute

Contribution

Adam optimizer integration with DDIM sampling

[51] Simple and fast distillation of diffusion models

Cannot Refute

[52] Mitigating data imbalance in semantic segmentation using sequential unconditional and conditional diffusion models: a case study in digital rock physics

Cannot Refute

[53] Gradient-free Decoder Inversion in Latent Diffusion Models

Cannot Refute

[54] From noise to sound: Audio synthesis via diffusion models

Cannot Refute

[55] Motion consistency model: Accelerating video diffusion with disentangled motion-appearance distillation

Cannot Refute

[56] Sparse attention diffusion model for pathological micrograph deblurring

Cannot Refute

[57] Boosting Diffusion Models with an Adaptive Momentum Sampler

Cannot Refute

[58] Dreamsampler: Unifying diffusion sampling and score distillation for image manipulation

Cannot Refute

[59] Diffusion models for medical image computing: A survey

Cannot Refute

[60] DaptDiffusion: Enhancing pixel-level interactive editing with dense-UNet and Adam point update in diffusion models

Cannot Refute
