P3D: Highly Scalable 3D Neural Surrogates for Physics Simulations with Global Context

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: neural surrogates, physics simulations, transformers, 3D
Abstract:

We present a scalable framework for learning deterministic and probabilistic neural surrogates for high-resolution 3D physics simulations. We introduce P3D, a hybrid CNN-Transformer backbone architecture targeted at 3D physics simulations, which significantly outperforms existing architectures in terms of speed and accuracy. Our proposed network can be pretrained on small patches of the simulation domain, which can be fused to obtain a global solution, optionally guided via a scalable sequence-to-sequence model to include long-range dependencies. This setup allows for training large-scale models with reduced memory and compute requirements for high-resolution datasets. We evaluate our backbone architecture against a large set of baseline methods on the objective of simultaneously learning 14 different types of PDE dynamics in 3D. We demonstrate how to scale our model to high-resolution isotropic turbulence with spatial resolutions of up to 512^3. Finally, we show the versatility of our architecture by training it as a diffusion model to produce probabilistic samples of highly turbulent 3D channel flows across varying Reynolds numbers, accurately capturing the underlying flow statistics.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces P3D, a hybrid CNN-Transformer architecture for learning neural surrogates of high-resolution 3D physics simulations, with a focus on scalability and accuracy. It resides in the 'Hybrid and Multi-Scale Architectures' leaf, which contains four papers total, including the original work. This leaf sits within the broader 'Neural Architecture Design for 3D Physics' branch, indicating a moderately populated research direction. The taxonomy reveals that hybrid architectures combining multiple network types are an active but not overcrowded area, with sibling categories addressing graph-based, transformer-only, and domain-specific designs.

The taxonomy tree shows that neighboring leaves include 'Graph and Geometric Neural Networks' (three papers), 'Transformer-Based Architectures' (two papers), and 'Domain-Specific Network Designs' (three papers). The paper's hybrid approach bridges convolutional and transformer paradigms, distinguishing it from pure transformer methods in the sibling leaf. The 'Computational Efficiency and Scalability' branch (two papers) addresses related concerns about high-resolution simulation, while 'Physics-Informed Learning Frameworks' (seventeen papers across four leaves) represents a more densely populated alternative strategy. The taxonomy's scope and exclude notes clarify that this work focuses on architecture rather than physics integration or training methodologies.

Among twenty candidates examined, three refutable pairs were identified, all associated with the third contribution on flexible finetuning setups with memory-efficient gradient control. The first contribution (P3D hybrid architecture) examined seven candidates with zero refutations, suggesting relative novelty in this specific architectural combination. The second contribution (crop-based pretraining with global context) examined three candidates, also with zero refutations. The third contribution's three refutable candidates indicate that memory-efficient training strategies have more substantial prior work within the limited search scope. The analysis explicitly covers top-K semantic matches plus citation expansion, not an exhaustive literature review.

Based on the limited search scope of twenty candidates, the architectural contributions appear more distinctive than the training methodology. The taxonomy context reveals a moderately active research area with clear boundaries separating hybrid architectures from graph-based, transformer-only, and physics-informed approaches. The contribution-level statistics suggest that while the core P3D design shows novelty signals, the memory-efficient training aspects overlap with existing work. This assessment reflects the examined candidate set and does not claim comprehensive coverage of all relevant prior art.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 3

Research Landscape Overview

Core task: Learning neural surrogates for high-resolution 3D physics simulations. The field has evolved into several major branches that reflect different strategic emphases. Neural Architecture Design for 3D Physics explores specialized network structures—ranging from graph-based methods like Boundary Graph Networks[9] to hybrid and multi-scale designs that blend convolutional, transformer, and implicit representations. Physics-Informed Learning Frameworks, exemplified by works such as Modified Loss PINN[2] and PINNs Fluid Review[7], embed governing equations directly into training objectives. Surrogate Modeling and Operator Learning focuses on data-driven approximations of solution operators, often leveraging Graph Neural Operators[15] or frequency-domain techniques like Frequency Domain Wind[5]. Meanwhile, Computational Efficiency and Scalability addresses the practical challenge of deploying these surrogates at scale, as seen in P3D Scalable Surrogates[1]. Additional branches cover Benchmarking and Evaluation Frameworks, Specialized Physics Applications (from turbulence to astrophysics), Inverse Problems and Reconstruction, and Scene Generation and Dynamics Modeling, collectively spanning a wide spectrum of simulation tasks and domain requirements.

Within the Neural Architecture Design branch, a particularly active line of work investigates hybrid and multi-scale architectures that combine local feature extraction with global context modeling. P3D Global Context[0] sits squarely in this cluster, emphasizing mechanisms that capture long-range dependencies in high-resolution 3D fields—a recurring challenge when simulating complex fluid or structural dynamics. Nearby efforts such as LRQ Solver[30] and Multi Resolution Hash[33] similarly explore multi-resolution representations, though they differ in how they balance computational cost against fidelity. Flow3DNet[3] offers another perspective by integrating flow-specific inductive biases into the architecture.
The central trade-off across these works revolves around expressiveness versus efficiency: richer global context can improve accuracy on intricate phenomena, but often at the expense of memory and compute. P3D Global Context[0] addresses this by proposing tailored attention or hierarchical encoding strategies, positioning itself as a step toward scalable yet expressive surrogates for demanding 3D physics scenarios.

Claimed Contributions

P3D hybrid CNN-Transformer architecture for 3D physics simulations

The authors propose P3D, a novel backbone architecture that combines convolutional neural networks for efficient local feature extraction with windowed transformer blocks for learning generalizable token representations. This hybrid design is specifically optimized for scaling to very high-resolution 3D physics simulations.

7 retrieved papers
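The hybrid design described above can be illustrated with a minimal NumPy sketch: a fixed depthwise filter stands in for the convolutional stage, followed by self-attention restricted to non-overlapping 3D windows. This is not the authors' implementation; the window size `w`, the projection matrices, and all function names are illustrative.

```python
import numpy as np

def local_mix(x):
    """Stand-in for the CNN stage: per-channel average over each voxel
    and its six axis-aligned neighbours (periodic boundary)."""
    out = x.copy()
    for axis in range(3):
        out += np.roll(x, 1, axis=axis) + np.roll(x, -1, axis=axis)
    return out / 7.0

def window_partition(x, w):
    """Split a (D, H, W, C) field into non-overlapping w*w*w windows,
    returning (num_windows, w**3, C) token groups."""
    D, H, W, C = x.shape
    x = x.reshape(D // w, w, H // w, w, W // w, w, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, w ** 3, C)

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def hybrid_block(x, w, Wq, Wk, Wv):
    """Local mixing followed by windowed self-attention: cost grows with
    num_windows * w**6 rather than (D*H*W)**2 for global attention."""
    tokens = window_partition(local_mix(x), w)      # (nW, w**3, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
    return attn @ v                                  # (nW, w**3, C)
```

The key scaling property is in the attention step: because attention is confined to w**3-voxel windows, the quadratic cost applies per window rather than over the full voxel grid.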
Crop-based pretraining with global context model for scalability

The authors introduce a scalable training framework where P3D can be pretrained on small spatial patches and then scaled to full domains. A sequence-to-sequence context model processes global dependencies by linking bottleneck representations, and region tokens inject global information back into decoder layers via adaptive normalization.

3 retrieved papers
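The two mechanisms described above, a context model that mixes information across patch bottlenecks and region tokens injected via adaptive normalization, can be sketched in NumPy. This is a simplified illustration under assumed shapes, not the authors' implementation; a single self-attention pass stands in for the sequence-to-sequence context model, and `adaptive_norm` follows the common AdaLN pattern.

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def context_model(bottlenecks):
    """Stand-in for the sequence-to-sequence context model: one
    self-attention pass mixes information across all P patch
    bottlenecks, so each region token can carry global context."""
    scale = np.sqrt(bottlenecks.shape[-1])
    attn = softmax(bottlenecks @ bottlenecks.T / scale)
    return attn @ bottlenecks                      # (P, C) region tokens

def layer_norm(h, eps=1e-5):
    mu = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def adaptive_norm(h, region_token, W_scale, W_shift):
    """AdaLN-style injection: the region token predicts a per-channel
    scale and shift that modulate normalized decoder features."""
    gamma = region_token @ W_scale
    beta = region_token @ W_shift
    return layer_norm(h) * (1.0 + gamma) + beta
```

Note the design choice this pattern reflects: with `W_scale` and `W_shift` initialized to zero, `adaptive_norm` reduces to a plain layer norm, so a decoder pretrained on isolated patches behaves identically until the global context pathway is trained.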
Flexible finetuning setups with memory-efficient gradient control

The authors develop multiple training and inference configurations that allow selective gradient backpropagation through network components. These setups enable memory-efficient finetuning by freezing encoders or randomly disabling gradient flow, reducing computational costs while maintaining model performance.

10 retrieved papers
Can Refute
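The selective-gradient idea described above can be sketched with a tiny manual-backprop model: two linear stages ("encoder" then "decoder") where the encoder gradient is either always skipped (frozen encoder) or skipped at random. This is a minimal NumPy illustration, not the paper's implementation; the model, the loss, and the parameter `p_enc` are assumptions for the sketch.

```python
import numpy as np

def forward_backward(x, W_enc, W_dec, y, rng=None, p_enc=1.0):
    """Two-stage linear surrogate trained with the squared loss
    0.5 * ||x @ W_enc @ W_dec - y||**2. The encoder gradient is
    computed only with probability p_enc: p_enc = 0.0 mimics a frozen
    encoder, 0 < p_enc < 1 mimics randomly disabled gradient flow.
    When it is skipped, the backward pass through the encoder (and the
    memory it would require) is avoided entirely."""
    h = x @ W_enc
    pred = h @ W_dec
    err = pred - y                          # dL/dpred
    g_dec = h.T @ err                       # dL/dW_dec, always computed
    take_enc = (rng.random() < p_enc) if rng is not None else (p_enc >= 1.0)
    g_enc = x.T @ (err @ W_dec.T) if take_enc else None
    return pred, g_enc, g_dec
```

Usage mirrors the finetuning setups described above: call with `p_enc=0.0` for encoder-frozen finetuning, or pass an `rng` and `0 < p_enc < 1` to drop the encoder gradient stochastically while still updating the decoder every step.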

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: P3D hybrid CNN-Transformer architecture for 3D physics simulations

Contribution 2: Crop-based pretraining with global context model for scalability

Contribution 3: Flexible finetuning setups with memory-efficient gradient control