Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Proteins, Molecular dynamics, Generative modeling, Diffusion models, Autoregressive modeling, SE(3)-equivariant diffusion, Spatiotemporal modeling
Abstract:

Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics, but their computational cost limits access to biologically relevant timescales. Recent generative models have shown promise in accelerating simulations, yet they struggle with long-horizon generation due to architectural constraints, error accumulation, and inadequate modeling of spatiotemporal dynamics. We present STAR-MD (Spatio-Temporal Autoregressive Rollout for Molecular Dynamics), a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales. Our key innovation is a causal diffusion transformer with joint spatiotemporal attention that efficiently captures complex space-time dependencies while avoiding the memory bottlenecks of existing methods. On the standard ATLAS benchmark, STAR-MD achieves state-of-the-art performance across all metrics--substantially improving conformational coverage, structural validity, and dynamic fidelity compared to previous methods. STAR-MD successfully extrapolates to generate stable microsecond-scale trajectories where baseline methods fail catastrophically, maintaining high structural quality throughout the extended rollout. Our comprehensive evaluation reveals severe limitations in current models for long-horizon generation, while demonstrating that STAR-MD's joint spatiotemporal modeling enables robust dynamics simulation at biologically relevant timescales, paving the way for accelerated exploration of protein function.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

STAR-MD proposes a spatio-temporal autoregressive diffusion model for generating microsecond-scale protein trajectories. The paper resides in the 'Autoregressive Trajectory Generation' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy of fifty papers. This leaf focuses specifically on sequential prediction of future protein states over extended timescales, distinguishing it from single-step ensemble sampling methods found in neighboring diffusion-based approaches.

The taxonomy reveals that STAR-MD sits within the 'Deep Generative Models for Protein Dynamics' branch, adjacent to leaves covering diffusion-based conformational sampling, foundation models, and variational methods. The scope notes clarify that autoregressive trajectory generation explicitly excludes non-sequential generative methods, while the diffusion sampling leaf excludes long-horizon rollout approaches. This positioning suggests STAR-MD bridges two methodological paradigms—diffusion modeling and autoregressive generation—in a research area where most prior work treats these as separate strategies rather than unified architectures.

Of the twenty-two candidates examined across the three claimed contributions, the core STAR-MD framework has one refutable candidate among its ten, while the causal diffusion transformer architecture (ten candidates) and the technical stability improvements (two candidates) show no clear refutations. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. Within this sample the architectural innovation appears more novel, though the overall framework concept has at least one overlapping prior work among the candidates reviewed.

Given the sparse three-paper leaf and limited twenty-two-candidate search, the analysis suggests moderate novelty in architectural design but acknowledges incomplete coverage of the broader literature. The combination of causal attention with diffusion-based autoregressive rollout may represent a meaningful synthesis, though the search scope prevents definitive claims about field-wide originality. The taxonomy structure indicates this work addresses an active but not yet crowded research direction.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 1

Research Landscape Overview

Core task: long-horizon protein dynamics generation. The field aims to predict or simulate how proteins evolve over extended timescales, capturing conformational transitions and functional motions that are often inaccessible to standard molecular dynamics.

The taxonomy reflects a multifaceted landscape organized into five main branches. Deep Generative Models for Protein Dynamics encompasses neural approaches—including diffusion models, autoregressive schemes, and latent-space methods—that learn to generate plausible trajectories from data. Physics-Based Simulation and Enhanced Sampling gathers classical and hybrid techniques such as Markov state models and advanced sampling strategies that leverage physical force fields. Dynamics Analysis and Mechanistic Interpretation focuses on extracting biological insight from simulated or experimental trajectories, while Methodological Foundations and Computational Infrastructure addresses algorithmic building blocks and software frameworks. Finally, Domain-Specific Applications and Case Studies illustrate how these methods apply to particular proteins or biological questions, such as enzyme catalysis or ligand binding kinetics.

Within the generative-modeling branch, a particularly active line of work explores autoregressive trajectory generation, where models predict successive conformational snapshots conditioned on prior states. Scalable Spatio-Temporal Diffusion[0] sits squarely in this cluster, emphasizing efficient handling of long sequences through diffusion-based architectures. Nearby efforts like DeepJump[19] and TEMPO[33] also tackle autoregressive or stepwise generation but may differ in their treatment of temporal correlations or the incorporation of physical priors. Another contrasting theme emerges in works such as AI2BMD[1] and 4D Diffusion Dynamics[2], which blend learned generative components with physics-inspired constraints to balance data-driven flexibility and thermodynamic consistency.
The central trade-off across these branches is between pure learning—where models capture complex distributions directly from simulation data—and hybrid strategies that enforce known physical laws, each offering different advantages in accuracy, generalizability, and computational cost. Scalable Spatio-Temporal Diffusion[0] represents a step toward scaling diffusion frameworks for spatiotemporal protein data, positioning itself among methods that prioritize end-to-end learning while remaining mindful of the long-horizon challenge.

Claimed Contributions

STAR-MD: Spatio-Temporal Autoregressive Rollout for Molecular Dynamics

The authors introduce STAR-MD, an SE(3)-equivariant autoregressive diffusion model designed to generate physically plausible protein trajectories at microsecond timescales. The model addresses scalability and long-horizon generation challenges in protein dynamics modeling.

10 retrieved papers (1 can refute)
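The claimed framework is an autoregressive rollout in which each new frame is produced by a conditional diffusion process over the preceding frames. The following is a minimal sketch of that pattern only; `denoise_fn`, the history window of four frames, and the step count are hypothetical stand-ins, not details from the paper, and the actual model's SE(3)-equivariant network and noise schedule are abstracted away.

```python
import numpy as np

def rollout(denoise_fn, history, n_future, n_denoise_steps=10, rng=None):
    """Autoregressive diffusion rollout sketch: each future frame is
    generated by iteratively denoising from pure noise, conditioned on a
    window of previously generated frames. `denoise_fn(x, t, context)`
    is a hypothetical stand-in for the learned denoiser."""
    rng = rng or np.random.default_rng(0)
    frames = list(history)  # each frame: (n_residues, 3) coordinates
    for _ in range(n_future):
        x = rng.standard_normal(frames[-1].shape)  # start from noise
        context = np.stack(frames[-4:])            # short history window (assumed size)
        for step in range(n_denoise_steps, 0, -1):
            t = step / n_denoise_steps             # continuous diffusion time in (0, 1]
            x = denoise_fn(x, t, context)
        frames.append(x)                           # commit frame, then roll forward
    return np.stack(frames)
```

The key property this sketch illustrates is that errors in a committed frame feed into the context of every later frame, which is why the paper's stability measures target the conditioning path.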
Causal diffusion transformer with joint spatiotemporal attention

The authors propose a novel causal diffusion transformer architecture that employs joint spatiotemporal attention to model complex space-time dependencies efficiently. This design avoids the memory bottlenecks associated with pairwise feature representations used in prior methods.

10 retrieved papers
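Joint spatiotemporal attention, as described, flattens frames and residues into one token sequence so that every residue can attend to every residue at every visible frame, with causality enforced only along the time axis. A single-head NumPy sketch under those assumptions (the model's learned projections, multi-head structure, and SE(3) features are omitted):

```python
import numpy as np

def joint_spacetime_attention(x, causal=True):
    """Joint spatiotemporal self-attention sketch (single head, no
    learned projections). x: (T, N, d) for T frames, N residues, d
    features. Tokens are flattened to length T*N; a causal mask blocks
    attention to tokens from future frames."""
    T, N, d = x.shape
    tok = x.reshape(T * N, d)
    scores = tok @ tok.T / np.sqrt(d)          # (T*N, T*N) attention logits
    if causal:
        t_idx = np.repeat(np.arange(T), N)     # frame index of each token
        mask = t_idx[None, :] > t_idx[:, None] # column frame later than row frame
        scores = np.where(mask, -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # row-wise softmax
    return (w @ tok).reshape(T, N, d)
```

Because attention is computed over single-token features rather than pairwise residue representations, memory grows with the token count rather than with an explicit N x N pair tensor per frame, which is consistent with the memory-bottleneck claim, though the paper's exact attention factorization may differ.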
Technical improvements for stable long-horizon generation

The authors introduce several technical innovations including historical context noise perturbation, block-diffusion-style causal training, and continuous-time conditioning. These improvements enable efficient training and stable generation of long protein trajectories while mitigating error accumulation.

2 retrieved papers
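Of the listed techniques, historical context noise perturbation is the simplest to illustrate: during training, the conditioning frames are corrupted with a small random amount of noise so the model learns to tolerate the imperfect frames it will condition on at rollout time. A hedged sketch; `sigma_max` and the uniform noise-scale draw are assumed hyperparameters, not values from the paper:

```python
import numpy as np

def perturb_history(context, sigma_max=0.1, rng=None):
    """History-context noise perturbation sketch: add Gaussian noise of a
    randomly drawn scale to the conditioning frames during training, so
    the model is robust to the accumulated error in frames it generated
    itself. context: array of stacked history frames."""
    rng = rng or np.random.default_rng(0)
    sigma = rng.uniform(0.0, sigma_max)  # random noise scale per sample
    return context + sigma * rng.standard_normal(context.shape)
```

The intuition is a train/rollout distribution match: at inference the history contains model outputs rather than ground-truth frames, and training on noised histories narrows that gap, mitigating error accumulation.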

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: STAR-MD: Spatio-Temporal Autoregressive Rollout for Molecular Dynamics
Contribution 2: Causal diffusion transformer with joint spatiotemporal attention
Contribution 3: Technical improvements for stable long-horizon generation

Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics | Novelty Validation