Contrastive Diffusion Guidance for Spatial Inverse Problems

ICLR 2026 Conference Submission
Anonymous Authors
Diffusion Models, Inverse Problems, Contrastive Learning, Spatial Inference
Abstract:

We consider the inverse problem of reconstructing the spatial layout of a place, such as a home floorplan, from a user's movements inside that layout. Direct inversion is ill-posed because many floorplans can explain the same movement trajectories. We adopt a diffusion-based posterior sampler to generate layouts consistent with the measurements. Although generative inverse solvers are an active research area, we find that the forward operator in our problem poses new challenges: path planning inside a floorplan is a non-invertible, non-differentiable function, which causes instability when optimizing with the likelihood score. We break away from existing approaches and reformulate the likelihood score in a smoother embedding space. The embedding space is trained with a contrastive loss that brings compatible floorplans and trajectories close to each other while pushing mismatched pairs far apart. We show that a surrogate form of the likelihood score in this embedding space is a valid approximation of the true likelihood score, making it possible to steer the denoising process toward the posterior. Across extensive experiments, our model, CoGuide, produces more consistent floorplans from trajectories and is more robust than differentiable-planner baselines and guided-diffusion methods.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper addresses floorplan reconstruction from user movement trajectories using a diffusion-based posterior sampler with a novel contrastive embedding approach. It resides in the 'Floorplan and Indoor Layout Reconstruction' leaf of the taxonomy, which contains only two papers in total. This is a notably sparse research direction within the broader 'Trajectory-Based Spatial Inference' branch, suggesting that the problem space is relatively underexplored compared with neighboring areas such as road geometry reconstruction or classical structure-from-motion pipelines, which contain significantly more prior work.

The taxonomy reveals that most trajectory-based spatial inference work focuses on outdoor road networks or semantic scene modeling, with limited attention to indoor floorplan recovery. The paper's closest neighbors are Multi-Level Indoor Reconstruction and Automated Indoor Reconstruction, which employ hierarchical geometric reasoning rather than generative diffusion models. The broader 'Structure from Motion' branch contains extensive work on visual reconstruction methods, but these rely on photogrammetric principles rather than pure trajectory data, highlighting a clear methodological boundary between vision-based and movement-based spatial inference.

Among twenty candidates examined across the three contributions, no clearly refuting prior work was identified. The search for the contrastive embedding space for likelihood approximation examined nine candidates, the search for the CoGuide method examined one, and the search for the Adam-DDIM integration examined ten; none yielded a refutation. This suggests that, within the limited scope of a top-K semantic search, the specific combination of diffusion-based sampling, contrastive embeddings, and trajectory-conditioned floorplan generation appears relatively unexplored, although the small candidate pool means substantial related work may exist beyond this search.

Based on the limited literature search of twenty candidates, the work appears to occupy a novel position combining generative diffusion models with trajectory-based indoor reconstruction. However, the sparse taxonomy leaf and small search scope mean this assessment reflects only the immediate semantic neighborhood, not an exhaustive field survey. The absence of refuting candidates may indicate genuine novelty or simply reflect the narrow search aperture and the nascent state of diffusion-based approaches in this specific application domain.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 20
Refutable papers: 0

Research Landscape Overview

Core task: Reconstructing spatial layouts from movement trajectories. The field encompasses a diverse set of approaches organized into five main branches. Structure from Motion and 3D Scene Reconstruction (e.g., Structure from Motion Survey[1], Pixel-Perfect Structure[9]) focuses on classical photogrammetric and vision-based methods that recover geometry from image sequences or camera motion. Trajectory-Based Spatial Inference emphasizes inferring indoor layouts, floorplans, and semantic structure directly from movement data, often leveraging WiFi signals (WiFi Structure from Motion[7]) or egocentric observations (Egocentric Omnidirectional Reconstruction[34]). Vehicle Trajectory Reconstruction and Processing targets road-level geometry and lane networks (Lane-Level Road Geometry[8], Stopbar Video Reconstruction[39]), while Human and Object Motion Analysis examines behavioral patterns, tracking, and motion understanding (Segmentation is Tracking[29], Environment-aware Motion Matching[33]). Finally, Geometric and Kinematic Modeling addresses the mathematical foundations of motion and shape recovery (Line Trajectory Kinematics[16], Piecewise Planar Motion[41]).

Recent work reveals a tension between classical geometric pipelines and learning-driven methods that exploit trajectory semantics or diffusion-based priors. Within Trajectory-Based Spatial Inference, a small cluster of studies tackles indoor reconstruction from sparse or noisy movement traces: Multi-Level Indoor Reconstruction[13] and Automated Indoor Reconstruction[10] demonstrate how hierarchical reasoning can handle multi-story buildings, while Contrastive Diffusion Guidance[0] introduces a generative framework that steers a diffusion model with a contrastively learned floorplan-trajectory embedding to sample layouts consistent with the observed trajectories. This approach is closest to Multi-Level Indoor Reconstruction[13] in targeting floorplan recovery, but diverges by relying on learned diffusion priors rather than purely geometric constraints.
Meanwhile, works like Sparse Trajectory Reconstruction[15] and Multi-Source Trajectory Reconstruction[12] explore the fusion of heterogeneous data streams, raising open questions about how best to integrate partial observations and maintain global consistency across large-scale environments.

Claimed Contributions

Contrastive embedding space for likelihood score approximation

The authors propose reformulating the likelihood score in a learned embedding space trained with contrastive learning. This embedding space maps compatible floorplan-trajectory pairs close together while separating incompatible pairs, providing a smoother surrogate for the intractable likelihood score in diffusion-based posterior sampling.
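The report does not state which contrastive objective is used. A common choice for paired data is a symmetric InfoNCE (CLIP-style) loss, in which each matched floorplan-trajectory pair is a positive and every other pairing in the batch is a negative. The minimal sketch below assumes that choice, assumes the embeddings have already been produced by some encoder, and uses hypothetical names (`symmetric_info_nce`, `tau`); it is dependency-free Python for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_softmax_at(row: List[float], idx: int) -> float:
    """Numerically stable log-softmax of `row`, evaluated at position `idx`."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(r - m) for r in row))
    return row[idx] - log_z


def symmetric_info_nce(floorplans: List[Vector], trajectories: List[Vector],
                       tau: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE: (floorplans[i], trajectories[i]) is the
    positive pair; all other pairings in the batch serve as negatives."""
    n = len(floorplans)
    logits = [[cosine(f, t) / tau for t in trajectories] for f in floorplans]
    # Floorplan -> trajectory direction: classify the correct column in each row.
    loss_f2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Trajectory -> floorplan direction: classify the correct row in each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2f = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_f2t + loss_t2f)


# Toy usage: correctly paired embeddings should score a lower loss than swapped ones.
fp_emb = [[1.0, 0.0], [0.0, 1.0]]
traj_matched = [[0.9, 0.1], [0.1, 0.9]]
traj_swapped = [traj_matched[1], traj_matched[0]]
loss_matched = symmetric_info_nce(fp_emb, traj_matched)
loss_swapped = symmetric_info_nce(fp_emb, traj_swapped)
print(loss_matched < loss_swapped)  # True
```

Minimizing this loss pulls compatible pairs together and pushes mismatched pairs apart, which is the geometric property the contribution relies on; the temperature `tau` controls how sharply the softmax separates them.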

9 retrieved papers
CoGuide method for spatial inverse problems

The authors introduce CoGuide, a diffusion-based method that uses contrastive guidance to solve spatial inverse problems, specifically reconstructing floorplans from user movement trajectories. The method addresses challenges posed by non-differentiable path-planning operators by operating in a learned embedding space.
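As an illustration of how guidance can avoid differentiating through the path planner, the sketch below computes a surrogate likelihood score as the gradient of an embedding similarity and takes one guidance step along it. It assumes, hypothetically, that the surrogate log-likelihood is the cosine similarity between a floorplan embedding and a trajectory embedding divided by a temperature `tau` (up to a constant that does not depend on the floorplan), and it uses a linear map `W` in place of a learned floorplan encoder so the gradient can be written by hand. A practical version would use automatic differentiation and, in diffusion posterior sampling, would typically differentiate through the denoiser's clean-sample estimate; none of these names come from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Compute W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matvec_T(W: Matrix, g: Vector) -> Vector:
    """Compute W^T g (chain rule through u = W x)."""
    return [sum(W[i][j] * g[i] for i in range(len(W))) for j in range(len(W[0]))]


def norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def cosine(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def surrogate_score(x: Vector, W: Matrix, e_traj: Vector, tau: float) -> Vector:
    """Gradient w.r.t. x of cos(W x, e_traj) / tau, used as a surrogate likelihood score."""
    u = matvec(W, x)
    nu, nv = norm(u), norm(e_traj)
    c = cosine(u, e_traj)
    # d cos(u, v) / du = v / (|u| |v|) - cos(u, v) * u / |u|^2
    grad_u = [vi / (nu * nv) - c * ui / nu ** 2 for ui, vi in zip(u, e_traj)]
    return [g / tau for g in matvec_T(W, grad_u)]


def guidance_step(x: Vector, W: Matrix, e_traj: Vector,
                  tau: float = 0.1, step: float = 0.05) -> Vector:
    """One gradient-ascent step on the surrogate log-likelihood."""
    score = surrogate_score(x, W, e_traj, tau)
    return [xi + step * si for xi, si in zip(x, score)]


W = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]]  # hypothetical linear "floorplan encoder"
x = [0.2, -0.4, 1.0]                      # toy floorplan estimate
e_traj = [1.0, 1.0]                       # fixed trajectory embedding
before = cosine(matvec(W, x), e_traj)
x_new = guidance_step(x, W, e_traj)
after = cosine(matvec(W, x_new), e_traj)
print(before, after)
```

Because the trajectory enters only through its fixed embedding `e_traj`, the path planner itself is never differentiated; gradients flow through the encoder alone.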

1 retrieved paper
Adam optimizer integration with DDIM sampling

The authors propose replacing standard gradient descent with the Adam optimizer during DDIM sampling steps, combined with cosine annealing of the learning rate. This modification is reported to improve convergence in the nonconvex posterior optimization, because Adam's running estimates of the gradient's first and second moments adapt the step size for each parameter.
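For reference, the sketch below shows a standard Adam update (bias-corrected first- and second-moment estimates, as in Kingma and Ba) driven by a cosine-annealed learning rate. It assumes the schedule runs over the inner optimization iterations and uses a hypothetical double-well loss `toy_loss` as a stand-in for the nonconvex guidance objective; where exactly this loop sits inside each DDIM step, and which hyperparameters are used, are not given in the report.

```python
import math
from typing import Callable, List

Vector = List[float]


def cosine_lr(k: int, total: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine-annealed learning rate for iteration k out of `total`."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * k / total))


def adam_cosine(grad: Callable[[Vector], Vector], x0: Vector, iters: int = 200,
                lr_max: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
                eps: float = 1e-8) -> Vector:
    """Minimize a function with Adam, annealing the learning rate by a cosine schedule."""
    x = list(x0)
    m = [0.0] * len(x)  # running mean of gradients (first moment)
    v = [0.0] * len(x)  # running mean of squared gradients (second moment)
    for k in range(iters):
        g = grad(x)
        lr = cosine_lr(k, iters, lr_max)
        t = k + 1  # bias correction uses a 1-based step count
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def toy_loss(x: Vector) -> float:
    """Hypothetical nonconvex (double-well) stand-in for a guidance objective."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.5 * x[1] ** 2


def toy_grad(x: Vector) -> Vector:
    """Analytic gradient of toy_loss."""
    return [4.0 * x[0] * (x[0] ** 2 - 1.0), x[1]]


x_init = [0.3, 2.0]
x_final = adam_cosine(toy_grad, x_init)
print(toy_loss(x_init), toy_loss(x_final))
```

The cosine schedule takes large steps early, when the iterate is far from a mode, and shrinks them toward zero at the end, which damps the oscillation that momentum would otherwise cause near a minimum.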

10 retrieved papers
