The Spacetime of Diffusion Models: An Information Geometry Perspective

ICLR 2026 Conference Submission
Anonymous Authors
diffusion models, information geometry
Abstract:

We present a novel geometric perspective on the latent space of diffusion models. We first show that the standard pullback approach, utilizing the deterministic probability flow ODE decoder, is fundamentally flawed. It provably forces geodesics to decode as straight segments in data space, effectively ignoring any intrinsic data geometry beyond the ambient Euclidean space. Complementing this view, diffusion also admits a stochastic decoder via the reverse SDE, which enables an information geometric treatment with the Fisher-Rao metric. However, a choice of $\mathbf{x}_T$ as the latent representation collapses this metric due to memorylessness. We address this by introducing a latent spacetime $\mathbf{z}=(\mathbf{x}_t,t)$ that indexes the family of denoising distributions $p(\mathbf{x}_0 \mid \mathbf{x}_t)$ across all noise scales, yielding a nontrivial geometric structure. We prove these distributions form an exponential family and derive simulation-free estimators for curve lengths, enabling efficient geodesic computation. The resulting structure induces a principled Diffusion Edit Distance, where geodesics trace minimal sequences of noise and denoise edits between data. We also demonstrate benefits for transition path sampling in molecular systems, including constrained variants such as low-variance transitions and region avoidance.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a latent spacetime representation (x_t, t) for diffusion models, deriving a Fisher-Rao metric structure and proving the denoising distributions form an exponential family. It resides in the 'Information Geometry and Metric Foundations' leaf, which contains only two papers total. This is a sparse research direction within the broader taxonomy of 50 papers, suggesting the theoretical foundations of diffusion latent geometry remain relatively underexplored compared to architectural or application-focused branches.

The taxonomy reveals neighboring leaves focused on empirical manifold analysis and semantic manipulation, while sibling branches address geometric latent architectures (molecular generation, 3D shapes) and non-Euclidean spaces (hyperbolic, graph-structured). The paper's theoretical metric derivation contrasts with these empirical or architectural approaches. The leaf's scope note explicitly excludes 'empirical manifold analysis or semantic direction discovery,' positioning the paper as foundational theory rather than applied geometry or editing methods.

Among 30 candidates examined, the latent spacetime representation and exponential family structure each show one refutable candidate (10 examined per contribution), while the Diffusion Edit Distance metric shows none (10 examined, zero refutable). The limited search scope means these statistics reflect top-K semantic matches, not exhaustive coverage. The edit distance contribution appears more novel within this sample, though the spacetime and exponential family ideas encounter at least one overlapping prior work each.

Given the sparse taxonomy leaf and limited literature search, the work appears to occupy relatively uncharted theoretical territory, though the presence of refutable candidates for two contributions suggests some prior exploration of information-geometric frameworks. The analysis covers top-30 semantic matches and does not claim exhaustive field coverage.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: geometric structure of diffusion model latent space. The field has organized itself around several complementary perspectives on how geometry shapes generative modeling. One major branch examines latent space geometry and metric structure, investigating the intrinsic mathematical properties (curvature, distance metrics, and information-geometric foundations) that govern how diffusion processes unfold in learned representations. A second branch focuses on geometric latent diffusion architectures, designing models that explicitly encode spatial or structural priors into their latent codes. Meanwhile, non-Euclidean and structured latent spaces explore hyperbolic, spherical, and graph-based geometries that better capture hierarchical or relational data, as seen in works like Hyperbolic Graph Diffusion[7] and Mixed Curvature Graph[10]. Parallel efforts in data-space geometric diffusion apply differential geometry directly to observed manifolds, such as molecular conformations in GeoDiff[5] or 3D shapes in Geometric Latent Diffusion[6], while spatial control and layout-conditioned generation (e.g., LayoutDiffusion[2]) emphasize user-driven geometric constraints. Domain-specific applications round out the taxonomy, targeting problems from cryo-EM reconstruction to video generation where spatial coherence is paramount.

Recent work reveals a tension between learning geometry implicitly and imposing it by design. Many studies adopt Riemannian or information-geometric tools to understand latent manifolds post hoc, seeking interpretable directions or optimal traversal paths as in Riemannian Traversal[21] and Interpretable Directions[12]. Others, like Geometric Trajectory Diffusion[1] and Autodecoding Latent[3], build geometric inductive biases directly into the architecture or training objective.

Spacetime Diffusion[0] sits within the information geometry and metric foundations cluster, emphasizing the mathematical underpinnings of how diffusion dynamics respect latent metric structure. This contrasts with nearby architectural approaches such as Geometric Latent Diffusion[6], which prioritizes equivariance and symmetry in the forward model, and with Learning Geometry[48], which focuses on discovering rather than assuming the latent manifold. Together, these lines of inquiry highlight an open question: whether the most effective geometric structure emerges from data-driven learning or from principled mathematical constraints.

Claimed Contributions

Latent spacetime representation with Fisher-Rao metric

The authors propose representing the latent space of diffusion models as a (D+1)-dimensional spacetime z = (x_t, t) that indexes denoising distributions across all noise levels. This spacetime is equipped with a Fisher-Rao metric that varies with both state and time, restoring nontrivial geometry and enabling navigation across noise levels within a unified structure.
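
For orientation, the Fisher-Rao metric mentioned above can be written out using its standard textbook definition, applied to the family of denoising distributions indexed by z = (x_t, t). This is a sketch of the general definition, not a formula quoted from the paper; the symbols G and gamma are notation introduced here.

```latex
% Fisher-Rao metric of the family p(x_0 | z), with z = (x_t, t) in R^{D+1}
G(\mathbf{z}) =
  \mathbb{E}_{\mathbf{x}_0 \sim p(\cdot \mid \mathbf{z})}
  \left[
    \nabla_{\mathbf{z}} \log p(\mathbf{x}_0 \mid \mathbf{z}) \,
    \nabla_{\mathbf{z}} \log p(\mathbf{x}_0 \mid \mathbf{z})^{\top}
  \right]
  \in \mathbb{R}^{(D+1) \times (D+1)}

% Length of a spacetime curve \gamma : [0, 1] \to \mathbb{R}^{D+1} under this metric
L(\gamma) = \int_0^1
  \sqrt{ \dot{\gamma}(s)^{\top} \, G(\gamma(s)) \, \dot{\gamma}(s) } \; \mathrm{d}s
```

Because G depends on both x_t and t, moving along the time axis has a cost of its own, which is what allows curves to trade off noising against motion in state space.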

10 retrieved papers, status: Can Refute

Exponential family structure and simulation-free geodesic computation

The authors prove that denoising distributions in diffusion models form an exponential family, which simplifies the geometry and yields a practical method for computing geodesics. This enables curve length evaluation without running the reverse SDE, significantly reducing computational cost through simulation-free estimation.
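
To illustrate why exponential-family structure can make curve lengths cheap to evaluate, here is a minimal sketch on a toy problem. It relies on a general identity: for an exponential family with natural parameters eta and expectation parameters mu, the quantity (eta1 - eta0) . (mu1 - mu0) equals the symmetrized KL divergence between the two members, which approximates the squared Fisher-Rao length of a short segment. Everything concrete below is an assumption made for illustration and not taken from the paper: the 1-D Gaussian data distribution, the Gaussian forward kernel x_t | x_0 ~ N(alpha_t x_0, sigma_t^2), the schedule, and all function names. Under that assumed kernel, Bayes' rule gives sufficient statistics (x_0, x_0^2) with natural parameters (alpha_t x_t / sigma_t^2, -alpha_t^2 / (2 sigma_t^2)).

```python
import math

# Hypothetical toy setup (not from the paper): 1-D data x0 ~ N(M, S2) and an
# assumed Gaussian forward kernel x_t | x0 ~ N(alpha_t * x0, sigma_t^2).
M, S2 = 1.0, 0.25


def schedule(t):
    """Assumed variance-preserving schedule (alpha_t, sigma_t), for 0 < t < 1."""
    return math.sqrt(1.0 - t), math.sqrt(t)


def natural_params(x_t, t):
    """Natural parameters eta(z) for sufficient statistics T(x0) = (x0, x0^2)."""
    alpha, sigma = schedule(t)
    return (alpha * x_t / sigma**2, -alpha**2 / (2.0 * sigma**2))


def posterior(x_t, t):
    """Mean and variance of the Gaussian posterior p(x0 | x_t) in this toy."""
    alpha, sigma = schedule(t)
    var = 1.0 / (1.0 / S2 + alpha**2 / sigma**2)
    mean = var * (M / S2 + alpha * x_t / sigma**2)
    return mean, var


def expectation_params(x_t, t):
    """Expectation parameters mu(z) = (E[x0 | x_t], E[x0^2 | x_t])."""
    mean, var = posterior(x_t, t)
    return (mean, var + mean**2)


def segment_energy(z0, z1):
    """(eta1 - eta0) . (mu1 - mu0) for two spacetime points z = (x_t, t)."""
    e0, e1 = natural_params(*z0), natural_params(*z1)
    m0, m1 = expectation_params(*z0), expectation_params(*z1)
    return sum((b - a) * (d - c) for a, b, c, d in zip(e0, e1, m0, m1))


def curve_length(points):
    """Approximate Fisher-Rao length of a polyline through spacetime points."""
    return sum(
        math.sqrt(max(segment_energy(p, q), 0.0))
        for p, q in zip(points, points[1:])
    )


def symmetric_kl(z0, z1):
    """Closed-form KL(p0 || p1) + KL(p1 || p0) between the two posteriors."""
    (m0, v0), (m1, v1) = posterior(*z0), posterior(*z1)
    d2 = (m0 - m1) ** 2
    return 0.5 * ((v0 + d2) / v1 + (v1 + d2) / v0 - 2.0)


if __name__ == "__main__":
    # Straight line in (x_t, t) coordinates, discretized into 50 segments.
    a, b = (0.5, 0.3), (1.0, 0.6)
    line = [
        (a[0] + (b[0] - a[0]) * i / 50, a[1] + (b[1] - a[1]) * i / 50)
        for i in range(51)
    ]
    print("approximate curve length:", curve_length(line))
    print("energy vs symmetric KL:", segment_energy(a, b), symmetric_kl(a, b))
```

No sampling or SDE simulation appears anywhere in this computation: only eta and mu are evaluated at points along the curve. In this toy the expectation parameters are available in closed form; for a real diffusion model they would have to be estimated, which the sketch sidesteps.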

10 retrieved papers, status: Can Refute

Diffusion Edit Distance metric

The Fisher-Rao geometry induces a principled distance metric called Diffusion Edit Distance on data, where geodesics between two data points trace the minimal sequence of edits: adding just enough noise to forget information specific to one endpoint and then denoising to introduce information specific to the other endpoint.
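
Read as a formula, the description above suggests a geodesic distance of roughly the following form. This is a hedged sketch rather than the paper's exact definition. In particular, embedding each data point at a small noise level t_min (instead of t = 0, where the denoising distribution would degenerate to a point mass) is an assumption introduced here, as are the symbols d_DiffED, x^a, x^b, gamma, and G.

```latex
% Shortest Fisher-Rao curve in spacetime between two (slightly noised) data points
d_{\mathrm{DiffED}}(\mathbf{x}^a, \mathbf{x}^b)
  = \inf_{\gamma} \int_0^1
    \sqrt{ \dot{\gamma}(s)^{\top} \, G(\gamma(s)) \, \dot{\gamma}(s) } \; \mathrm{d}s,
\qquad
\gamma(0) = (\mathbf{x}^a, t_{\min}), \quad
\gamma(1) = (\mathbf{x}^b, t_{\min})

% G is the Fisher-Rao metric of the denoising family p(x_0 | x_t) on spacetime
```

Under this reading, a minimizing curve is free to rise to higher noise levels in the middle of the path, which matches the "add noise, then denoise" interpretation of an edit.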

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
