Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Proteins, Molecular dynamics, Generative modeling, Diffusion models, Autoregressive modeling, SE(3)-equivariant diffusion, Spatiotemporal modeling
Abstract:

Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics, but their computational cost limits access to biologically relevant timescales. Recent generative models have shown promise in accelerating simulations, yet they struggle with long-horizon generation due to architectural constraints, error accumulation, and inadequate modeling of spatiotemporal dynamics. We present STAR-MD (Spatio-Temporal Autoregressive Rollout for Molecular Dynamics), a scalable SE(3)-equivariant diffusion model that generates physically plausible protein trajectories over microsecond timescales. Our key innovation is a causal diffusion transformer with joint spatiotemporal attention that efficiently captures complex space-time dependencies while avoiding the memory bottlenecks of existing methods. On the standard ATLAS benchmark, STAR-MD achieves state-of-the-art performance across all metrics--substantially improving conformational coverage, structural validity, and dynamic fidelity compared to previous methods. STAR-MD successfully extrapolates to generate stable microsecond-scale trajectories where baseline methods fail catastrophically, maintaining high structural quality throughout the extended rollout. Our comprehensive evaluation reveals severe limitations in current models for long-horizon generation, while demonstrating that STAR-MD's joint spatiotemporal modeling enables robust dynamics simulation at biologically relevant timescales, paving the way for accelerated exploration of protein function.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

STAR-MD proposes a spatio-temporal autoregressive diffusion model for generating microsecond-scale protein trajectories. The paper resides in the 'Autoregressive Trajectory Generation' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy of fifty papers. This leaf focuses specifically on sequential prediction of future protein states over extended timescales, distinguishing it from single-step ensemble sampling methods found in neighboring diffusion-based approaches.

The taxonomy reveals that STAR-MD sits within the 'Deep Generative Models for Protein Dynamics' branch, adjacent to leaves covering diffusion-based conformational sampling, foundation models, and variational methods. The scope notes clarify that autoregressive trajectory generation explicitly excludes non-sequential generative methods, while the diffusion sampling leaf excludes long-horizon rollout approaches. This positioning suggests STAR-MD bridges two methodological paradigms—diffusion modeling and autoregressive generation—in a research area where most prior work treats these as separate strategies rather than unified architectures.

Of the twenty-two candidates examined across the three claimed contributions, the core STAR-MD framework has one refutable candidate among its ten, while the causal diffusion transformer architecture (ten candidates) and the technical stability improvements (two candidates) show no clear refutations. The limited search scope means these statistics reflect top-K semantic matches rather than exhaustive coverage. Within this sample the architectural innovation appears more novel, though the overall framework concept has at least one overlapping prior work among the candidates reviewed.

Given the sparse three-paper leaf and limited twenty-two-candidate search, the analysis suggests moderate novelty in architectural design but acknowledges incomplete coverage of the broader literature. The combination of causal attention with diffusion-based autoregressive rollout may represent a meaningful synthesis, though the search scope prevents definitive claims about field-wide originality. The taxonomy structure indicates this work addresses an active but not yet crowded research direction.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 22
Refutable Papers: 1

Research Landscape Overview

Core task: long-horizon protein dynamics generation. The field aims to predict or simulate how proteins evolve over extended timescales, capturing conformational transitions and functional motions that are often inaccessible to standard molecular dynamics.

The taxonomy reflects a multifaceted landscape organized into five main branches. Deep Generative Models for Protein Dynamics encompasses neural approaches—including diffusion models, autoregressive schemes, and latent-space methods—that learn to generate plausible trajectories from data. Physics-Based Simulation and Enhanced Sampling gathers classical and hybrid techniques such as Markov state models and advanced sampling strategies that leverage physical force fields. Dynamics Analysis and Mechanistic Interpretation focuses on extracting biological insight from simulated or experimental trajectories, while Methodological Foundations and Computational Infrastructure addresses algorithmic building blocks and software frameworks. Finally, Domain-Specific Applications and Case Studies illustrate how these methods apply to particular proteins or biological questions, such as enzyme catalysis or ligand binding kinetics.

Within the generative-modeling branch, a particularly active line of work explores autoregressive trajectory generation, where models predict successive conformational snapshots conditioned on prior states. Scalable Spatio-Temporal Diffusion[0] sits squarely in this cluster, emphasizing efficient handling of long sequences through diffusion-based architectures. Nearby efforts like DeepJump[19] and TEMPO[33] also tackle autoregressive or stepwise generation but may differ in their treatment of temporal correlations or the incorporation of physical priors. Another contrasting theme emerges in works such as AI2BMD[1] and 4D Diffusion Dynamics[2], which blend learned generative components with physics-inspired constraints to balance data-driven flexibility and thermodynamic consistency.
The central trade-off across these branches is between pure learning—where models capture complex distributions directly from simulation data—and hybrid strategies that enforce known physical laws, each offering different advantages in accuracy, generalizability, and computational cost. Scalable Spatio-Temporal Diffusion[0] represents a step toward scaling diffusion frameworks for spatiotemporal protein data, positioning itself among methods that prioritize end-to-end learning while remaining mindful of the long-horizon challenge.

Claimed Contributions

STAR-MD: Spatio-Temporal Autoregressive Rollout for Molecular Dynamics

The authors introduce STAR-MD, an SE(3)-equivariant autoregressive diffusion model designed to generate physically plausible protein trajectories at microsecond timescales. The model addresses scalability and long-horizon generation challenges in protein dynamics modeling.

10 retrieved papers (1 can refute)
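The claimed framework is an autoregressive rollout in which each new frame is produced by a conditional diffusion process over the preceding frames. The following is a minimal sketch of that pattern only; `denoise_fn`, the history window of four frames, and the step count are hypothetical stand-ins, not details from the paper, and the actual model's SE(3)-equivariant network and noise schedule are abstracted away.

```python
import numpy as np

def rollout(denoise_fn, history, n_future, n_denoise_steps=10, rng=None):
    """Autoregressive diffusion rollout sketch: each future frame is
    generated by iteratively denoising from pure noise, conditioned on a
    window of previously generated frames. `denoise_fn(x, t, context)`
    is a hypothetical stand-in for the learned denoiser."""
    rng = rng or np.random.default_rng(0)
    frames = list(history)  # each frame: (n_residues, 3) coordinates
    for _ in range(n_future):
        x = rng.standard_normal(frames[-1].shape)  # start from noise
        context = np.stack(frames[-4:])            # short history window (assumed size)
        for step in range(n_denoise_steps, 0, -1):
            t = step / n_denoise_steps             # continuous diffusion time in (0, 1]
            x = denoise_fn(x, t, context)
        frames.append(x)                           # commit frame, then roll forward
    return np.stack(frames)
```

The key property this sketch illustrates is that errors in a committed frame feed into the context of every later frame, which is why the paper's stability measures target the conditioning path.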
Causal diffusion transformer with joint spatiotemporal attention

The authors propose a novel causal diffusion transformer architecture that employs joint spatiotemporal attention to model complex space-time dependencies efficiently. This design avoids the memory bottlenecks associated with pairwise feature representations used in prior methods.

10 retrieved papers
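Joint spatiotemporal attention, as described, flattens frames and residues into one token sequence so that every residue can attend to every residue at every visible frame, with causality enforced only along the time axis. A single-head NumPy sketch under those assumptions (the model's learned projections, multi-head structure, and SE(3) features are omitted):

```python
import numpy as np

def joint_spacetime_attention(x, causal=True):
    """Joint spatiotemporal self-attention sketch (single head, no
    learned projections). x: (T, N, d) for T frames, N residues, d
    features. Tokens are flattened to length T*N; a causal mask blocks
    attention to tokens from future frames."""
    T, N, d = x.shape
    tok = x.reshape(T * N, d)
    scores = tok @ tok.T / np.sqrt(d)          # (T*N, T*N) attention logits
    if causal:
        t_idx = np.repeat(np.arange(T), N)     # frame index of each token
        mask = t_idx[None, :] > t_idx[:, None] # column frame later than row frame
        scores = np.where(mask, -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)         # row-wise softmax
    return (w @ tok).reshape(T, N, d)
```

Because attention is computed over single-token features rather than pairwise residue representations, memory grows with the token count rather than with an explicit N x N pair tensor per frame, which is consistent with the memory-bottleneck claim, though the paper's exact attention factorization may differ.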
Technical improvements for stable long-horizon generation

The authors introduce several technical innovations including historical context noise perturbation, block-diffusion-style causal training, and continuous-time conditioning. These improvements enable efficient training and stable generation of long protein trajectories while mitigating error accumulation.

2 retrieved papers
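Of the listed techniques, historical context noise perturbation is the simplest to illustrate: during training, the conditioning frames are corrupted with a small random amount of noise so the model learns to tolerate the imperfect frames it will condition on at rollout time. A hedged sketch; `sigma_max` and the uniform noise-scale draw are assumed hyperparameters, not values from the paper:

```python
import numpy as np

def perturb_history(context, sigma_max=0.1, rng=None):
    """History-context noise perturbation sketch: add Gaussian noise of a
    randomly drawn scale to the conditioning frames during training, so
    the model is robust to the accumulated error in frames it generated
    itself. context: array of stacked history frames."""
    rng = rng or np.random.default_rng(0)
    sigma = rng.uniform(0.0, sigma_max)  # random noise scale per sample
    return context + sigma * rng.standard_normal(context.shape)
```

The intuition is a train/rollout distribution match: at inference the history contains model outputs rather than ground-truth frames, and training on noised histories narrows that gap, mitigating error accumulation.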

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: STAR-MD: Spatio-Temporal Autoregressive Rollout for Molecular Dynamics
Contribution 2: Causal diffusion transformer with joint spatiotemporal attention
Contribution 3: Technical improvements for stable long-horizon generation

Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics | Novelty Validation