A Unification of Discrete, Gaussian, and Simplicial Diffusion

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: discrete diffusion, simplicial diffusion, Gaussian diffusion, generative models, proteins, DNA
Abstract:

To model discrete sequences such as DNA, proteins, and language using diffusion, practitioners must choose between three major methods: diffusion in discrete space, Gaussian diffusion in Euclidean space, or diffusion on the simplex. Despite their shared goal, these models have disparate algorithms, theoretical structures, and tradeoffs: discrete diffusion has the most natural domain, Gaussian diffusion has more mature algorithms, and diffusion on the simplex in principle combines the strengths of the other two but in practice suffers from a numerically unstable stochastic process. Ideally, we could see each of these models as instances of the same underlying framework and enable practitioners to switch between models for downstream applications. However, previous theories have only considered connections in special cases. Here we build a theory unifying all three methods of discrete diffusion as different parameterizations of the same underlying process: the Wright-Fisher population genetics model. In particular, we find simplicial and Gaussian diffusion as two large-population limits. Our theory formally connects the likelihoods and hyperparameters of these models and leverages decades of mathematical genetics literature to unlock stable simplicial diffusion. Finally, we relieve the practitioner of balancing model tradeoffs by demonstrating that it is possible to train a single model that can perform diffusion in any of these three domains at test time. Our experiments show that Wright-Fisher simplicial diffusion is more stable and outperforms previous simplicial diffusion models on conditional DNA generation. We also show that we can train models on multiple domains at once that are competitive with models trained on any individual domain.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a theoretical unification of discrete, Gaussian, and simplicial diffusion models through the Wright-Fisher population genetics framework. According to the taxonomy, this work occupies the 'Population Genetics-Based Unification of Diffusion Parameterizations' leaf, which currently contains only this paper. This positioning suggests the paper opens a sparsely populated research direction within the broader field of unified diffusion theories, distinguishing itself from neighboring approaches that rely on general state-space abstractions or geometric smoothness arguments.

The taxonomy reveals two main branches: theoretical unification frameworks and application-specific architectures. The paper sits within the theoretical branch alongside 'General State Space Diffusion Theory,' which addresses similar unification goals but without population genetics grounding. The application branch, exemplified by hybrid embedding-space methods for text generation, represents a parallel but distinct research trajectory focused on domain-specific performance rather than cross-domain theoretical synthesis. The paper's population genetics lens thus carves out a methodological niche between abstract algebraic treatments and purely empirical architectural innovations.

Among the four candidates examined in the limited literature search, none clearly refutes any of the paper's three main contributions. The unification via the Wright-Fisher model was examined against zero candidates, while the stable simplicial diffusion and sufficient-statistic parameterization contributions were each compared against two candidates, with no refutations identified. This suggests that within the examined scope (admittedly narrow at four total candidates) the specific combination of population genetics theory, numerical stability improvements, and unified training mechanisms appears relatively unexplored. However, the small search scale means substantial prior work may exist beyond these top-ranked semantic matches.

Given the limited search scope of four candidates and the paper's position as the sole occupant of its taxonomy leaf, the work appears to introduce a novel theoretical perspective within the examined literature. The absence of sibling papers and the sparse population of the parent branch suggest this population genetics-based unification represents a fresh angle on discrete diffusion modeling. However, definitive novelty claims require broader literature coverage beyond the top-K semantic matches analyzed here.

Taxonomy

Core-task Taxonomy Papers: 2
Claimed Contributions: 3
Contribution Candidate Papers Compared: 4
Refutable Papers: 0

Research Landscape Overview

Core task: Unifying discrete, Gaussian, and simplicial diffusion models for discrete sequences.

The field of diffusion models for discrete data has evolved into two broad directions. The first branch, Unified Theoretical Frameworks for Diffusion Across State Spaces, seeks to establish common mathematical principles that connect seemingly disparate diffusion formulations (discrete categorical processes, continuous Gaussian approximations, and simplex-constrained variants) under a single conceptual umbrella. Works in this branch often draw on insights from population genetics, optimal transport, or abstract state-space theory to reveal shared structure. The second branch, Application-Specific Diffusion Architectures for Discrete Sequences, focuses on tailoring diffusion mechanisms to particular domains such as text generation, protein design, or graph synthesis, emphasizing practical performance and domain constraints over theoretical unification. Together, these branches reflect a tension between generality and specialization that characterizes much of the recent literature.

Within the theoretical branch, a particularly active line of inquiry explores how population genetics concepts, such as the Wright-Fisher model, can unify different parameterizations of the forward and reverse diffusion processes. Unified Diffusion[0] exemplifies this approach by demonstrating that discrete, Gaussian, and simplicial diffusion can be viewed as special cases of a single framework rooted in evolutionary dynamics. This contrasts with other recent efforts like Smoothie[1], which emphasizes smooth interpolations on the simplex, and Diffusion State Spaces[2], which abstracts diffusion to general state-space representations. By situating discrete sequence modeling within population genetics, Unified Diffusion[0] offers a principled way to transfer insights across state spaces, while neighboring works tend to prioritize either geometric smoothness or broad algebraic generality. The interplay among these perspectives highlights ongoing questions about which unifying lens best balances mathematical elegance with practical applicability.

Claimed Contributions

Unification of discrete, Gaussian, and simplicial diffusion via Wright-Fisher model

The authors formally prove that discrete, Gaussian, and simplicial diffusion are instances of the Wright-Fisher model from population genetics. Discrete diffusion corresponds to population size 1, while simplicial and Gaussian diffusion emerge as large-population limits with and without reproduction, respectively.
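To make the population-size claim concrete, here is a minimal sketch of a textbook Wright-Fisher model with mutation. It is an illustration under stated assumptions, not the authors' construction: the function names, the uniform mutation matrix, and the constants `K` and `EPS` are all hypothetical choices, and the sketch does not attempt to reproduce the paper's time scaling or its Gaussian limit. It only shows that with a population of one individual the state stays one-hot (an ordinary discrete Markov chain), while with a large population the allele frequencies move through the interior of the simplex.

```python
import numpy as np


def wright_fisher_step(counts, mutation, rng):
    """One generation of a Wright-Fisher model with mutation (textbook form).

    counts:   integer vector of length K holding allele counts that sum to N.
    mutation: K x K row-stochastic matrix; mutation[i, j] is the assumed
              per-generation probability that allele i mutates to allele j.
    """
    n = counts.sum()
    freqs = counts / n                # current point on the simplex
    probs = freqs @ mutation          # allele distribution after mutation
    probs = probs / probs.sum()       # guard against floating-point round-off
    return rng.multinomial(n, probs)  # reproduction: resample N offspring


def simulate(x0, n_pop, mutation, n_steps, seed=0):
    """Start with all n_pop individuals carrying allele x0; return frequencies."""
    rng = np.random.default_rng(seed)
    k = mutation.shape[0]
    counts = np.zeros(k, dtype=np.int64)
    counts[x0] = n_pop
    path = [counts / n_pop]
    for _ in range(n_steps):
        counts = wright_fisher_step(counts, mutation, rng)
        path.append(counts / n_pop)
    return np.array(path)


# Hypothetical setup: 4 alleles (e.g. DNA bases) with uniform mutation.
K = 4
EPS = 0.05
MUTATION = (1 - EPS) * np.eye(K) + EPS * np.full((K, K), 1.0 / K)

# Population size 1: every state is one-hot, i.e. a discrete Markov chain
# whose transition matrix is MUTATION.
path_single = simulate(0, 1, MUTATION, 50)

# Large population: frequencies take fractional values on the simplex.
path_large = simulate(0, 1000, MUTATION, 50)
```

With `n_pop=1`, the multinomial draw selects a single allele from one row of `MUTATION`, which is exactly one step of a discrete-state chain. Increasing `n_pop` shrinks the per-generation resampling noise, which is the regime in which the report says the simplicial limit arises.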

0 retrieved papers

Stable simplicial diffusion using mathematical genetics literature

The authors address numerical instability issues in simplicial diffusion by applying solutions from mathematical genetics literature. They demonstrate that this stable simplicial diffusion outperforms previous simplicial diffusion models on conditional DNA generation tasks.

2 retrieved papers

Sufficient-statistic parameterization for unified training across domains

The authors introduce a sufficient-statistic parameterization that enables training a single neural network capable of performing diffusion in discrete, Gaussian, or simplicial domains at test time. Experiments show these unified models are competitive with models trained on individual domains.

2 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated. This is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.
