StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: 3D Gaussian Splatting; Monocular Video Reconstruction
Abstract:

Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams demands robust online methods that recover scene dynamics from sparse observations under strict latency and memory constraints. Yet most dynamic reconstruction methods rely on hours of per-scene optimization with full-sequence access, limiting practical deployment. In this work, we introduce StreamSplat, a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner. This is achieved via three key technical innovations: 1) a probabilistic sampling mechanism that robustly predicts 3D Gaussians from uncalibrated inputs; 2) a bidirectional deformation field that yields reliable associations across frames and mitigates long-term error accumulation; 3) an adaptive Gaussian fusion operation that propagates persistent Gaussians while handling emerging and vanishing ones. Extensive experiments on standard dynamic and static benchmarks demonstrate that StreamSplat achieves state-of-the-art reconstruction quality and dynamic scene modeling. Uniquely, our method supports online reconstruction of arbitrarily long video streams with a 1200× speedup over optimization-based methods. Our code and models will be made publicly available.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

StreamSplat introduces a feed-forward framework for online dynamic 3D reconstruction using Gaussian Splatting representations, processing uncalibrated video streams without per-scene optimization. The paper resides in the 'Real-Time Dynamic Gaussian Splatting' leaf, which contains five papers total, indicating a moderately populated but emerging research direction. This leaf sits within the broader 'Feed-Forward Dynamic Scene Reconstruction' branch, distinguishing itself from optimization-based methods that require iterative refinement. The focus on streaming input and online adaptation positions StreamSplat at the intersection of real-time performance and dynamic scene modeling.

The taxonomy reveals neighboring research directions that contextualize StreamSplat's contributions. Adjacent leaves include 'Generative Model-Based 3D Reconstruction' (leveraging diffusion priors) and 'Multi-Human 4D Reconstruction' (specialized for human subjects), both under the same feed-forward parent branch. The 'Incremental and Online Reconstruction' branch contains methods like dense volumetric reconstruction and online human-scene reconstruction, which share the streaming constraint but differ in representation choice (volumetric vs. Gaussian). StreamSplat's uncalibrated input handling also connects to the 'Uncalibrated Reconstruction Techniques' branch, though that category emphasizes augmented reality applications rather than dynamic scene modeling.

Among fifteen candidates examined across three contributions, none were identified as clearly refuting StreamSplat's novelty. The core framework (Contribution 1) examined nine candidates with zero refutable overlaps, suggesting limited prior work on fully feed-forward, online Gaussian splatting for dynamic scenes. The probabilistic sampling mechanism (Contribution 2) and bidirectional deformation field with adaptive fusion (Contribution 3) examined four and two candidates respectively, also without refutation. This limited search scope—fifteen papers from semantic retrieval—indicates that while no direct precedents emerged, the analysis does not exhaustively cover all related work in real-time reconstruction or deformation modeling.

Given the constrained literature search and the moderately populated taxonomy leaf, StreamSplat appears to occupy a distinct niche within real-time dynamic Gaussian splatting. The absence of refutable candidates among fifteen examined papers suggests technical differentiation from sibling works, though the small sample size precludes definitive claims about field-wide novelty. The combination of online processing, uncalibrated input, and adaptive Gaussian fusion distinguishes StreamSplat from optimization-heavy or batch-processing alternatives, but broader validation against the full corpus of dynamic reconstruction methods remains necessary.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Papers: 0

Research Landscape Overview

Core task: online dynamic 3D reconstruction from uncalibrated video streams. The field addresses the challenge of recovering time-varying geometry from video without prior camera calibration, spanning a diverse set of methodological branches. Feed-forward dynamic scene reconstruction methods emphasize rapid, single-pass inference using learned priors, often leveraging neural representations or Gaussian splatting for real-time performance. Optimization-based approaches iteratively refine geometry and motion through energy minimization, trading speed for accuracy. Incremental and online reconstruction techniques process frames sequentially, maintaining consistency across time, while static human reconstruction from monocular video focuses on capturing detailed body shape and pose from single-camera setups. Sparse feature-based methods rely on keypoint tracking and structure-from-motion pipelines, whereas depth-guided neural reconstruction integrates depth sensors or learned depth cues to regularize geometry. Specialized application branches target domains such as medical imaging, architectural scenes, or aerial footage, and motion and dynamics estimation explicitly models temporal deformations. Uncalibrated reconstruction techniques handle unknown or varying intrinsics, a critical capability for casual video capture.

Recent work has concentrated on real-time dynamic Gaussian splatting, where methods like StreamSplat[0], DGS-LRM[4], and SplineGS[22] push the frontier of feed-forward reconstruction by representing scenes as collections of evolving 3D Gaussians. StreamSplat[0] emphasizes streaming video input and online adaptation, positioning itself within the real-time dynamic Gaussian splatting cluster alongside neighbors such as Bullet-Time Reconstruction[5] and QUEEN[24], which explore multi-view synchronization and quality-efficiency trade-offs.
In contrast, optimization-based pipelines like Lyra[1] and incremental methods such as MegaSaM[6] prioritize geometric fidelity over speed, iteratively refining reconstructions as new frames arrive. The tension between feed-forward speed and optimization-based accuracy remains a central theme, with StreamSplat[0] leaning toward the former by exploiting learned scene priors and efficient splatting, while closely related works like SplineGS[22] explore temporal smoothness through spline-based motion modeling.

Claimed Contributions

StreamSplat framework for online dynamic 3D reconstruction

The authors present StreamSplat, a fully feed-forward system that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner, achieving real-time performance with a 1200× speedup over optimization-based methods.
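The online feed-forward loop claimed here can be sketched abstractly: each frame is lifted to Gaussians in one pass, associated with the running scene, and fused, so memory stays bounded for arbitrarily long streams. The callables below (`predict_gaussians`, `deform`, `fuse`) are hypothetical placeholders standing in for the paper's modules, not its actual implementation:

```python
def stream_reconstruct(frames, predict_gaussians, deform, fuse):
    """Minimal sketch of an online feed-forward reconstruction loop.

    Each incoming frame is mapped to 3D Gaussians in a single forward
    pass (no per-scene optimization), the running scene is carried
    forward by a deformation step, and the two sets are fused. State
    is O(scene size), independent of stream length.
    """
    scene = None
    for frame in frames:
        gaussians = predict_gaussians(frame)   # single feed-forward pass
        if scene is None:
            scene = gaussians                  # bootstrap from first frame
        else:
            scene = fuse(deform(scene, frame), gaussians)
        yield scene                            # renderable 3DGS state per frame
```

A usage example with toy callables: `list(stream_reconstruct([1, 2, 3], lambda f: [f], lambda s, f: s, lambda s, g: s + g))` accumulates one "Gaussian" per frame.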

9 retrieved papers
Probabilistic sampling mechanism for 3D Gaussian position prediction

The authors propose a probabilistic position sampling strategy that predicts a truncated normal distribution for each 3D offset rather than direct regression. This approach captures geometric uncertainty and avoids local minima common in feed-forward models.
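Sampling per-offset truncated normals instead of regressing a point estimate can be illustrated with SciPy. The function name, the symmetric bound, and the shapes below are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np
from scipy.stats import truncnorm


def sample_offsets(mu, sigma, bound=1.0, rng=None):
    """Sample 3D position offsets from per-Gaussian truncated normals.

    mu, sigma: (N, 3) predicted mean and scale of each offset
    (hypothetical network outputs). Samples are confined to
    [-bound, bound], keeping Gaussians near their anchor points
    while still expressing geometric uncertainty.
    """
    rng = np.random.default_rng(rng)
    # truncnorm expects bounds in standardized units: (x - mu) / sigma
    a = (-bound - mu) / sigma
    b = (bound - mu) / sigma
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)


# Toy usage: 4 Gaussians with small predicted uncertainty
mu = np.zeros((4, 3))
sigma = np.full((4, 3), 0.1)
offsets = sample_offsets(mu, sigma, bound=0.5, rng=0)
```

Because the support is bounded, no sampled offset can escape the trust region, which is one plausible way such a scheme avoids the degenerate solutions that direct regression can fall into.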

4 retrieved papers
Bidirectional deformation field with adaptive Gaussian fusion

The authors introduce a bidirectional deformation field that models both forward and backward motion between consecutive frames, combined with an adaptive fusion mechanism based on time-dependent opacity. This enables robust cross-frame associations and maintains temporal coherence while naturally handling emerging and vanishing scene content.
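One plausible reading of the bidirectional association is a forward-backward cycle-consistency check: a Gaussian is treated as persistent if warping it forward and back returns near its start, and its opacity decays otherwise. The brute-force NumPy sketch below uses invented names and thresholds (`tau`, `decay`) purely for illustration; it is not the authors' method:

```python
import numpy as np


def fuse_gaussians(pos_prev, pos_curr, fwd_flow, bwd_flow,
                   opacity_prev, tau=0.05, decay=0.9):
    """Sketch of bidirectional association with opacity-based fusion.

    pos_prev: (N, 3) Gaussian centers at t-1; pos_curr: (M, 3) at t.
    fwd_flow: (N, 3) predicted t-1 -> t displacements; bwd_flow: (M, 3)
    predicted t -> t-1 displacements (hypothetical deformation outputs).
    Persistent Gaussians (cycle-consistent round trip) are propagated;
    non-persistent ones keep their old position and fade via decayed
    opacity, modeling vanishing content.
    """
    warped = pos_prev + fwd_flow                      # t-1 centers moved to t
    # nearest current Gaussian for each warped one (brute force for clarity)
    d2 = ((warped[:, None, :] - pos_curr[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    # cycle consistency: warp the match back and compare with the start
    round_trip = pos_curr[nn] + bwd_flow[nn]
    persistent = np.linalg.norm(round_trip - pos_prev, axis=1) < tau
    fused_pos = np.where(persistent[:, None], warped, pos_prev)
    fused_opacity = np.where(persistent, opacity_prev, opacity_prev * decay)
    return fused_pos, fused_opacity, persistent
```

Emerging content would be handled symmetrically by keeping current-frame Gaussians that no previous Gaussian maps onto; that branch is omitted here to keep the sketch short.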

2 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

StreamSplat framework for online dynamic 3D reconstruction

The authors present StreamSplat, a fully feed-forward system that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner, achieving real-time performance with a 1200× speedup over optimization-based methods.

Contribution

Probabilistic sampling mechanism for 3D Gaussian position prediction

The authors propose a probabilistic position sampling strategy that predicts a truncated normal distribution for each 3D offset rather than direct regression. This approach captures geometric uncertainty and avoids local minima common in feed-forward models.

Contribution

Bidirectional deformation field with adaptive Gaussian fusion

The authors introduce a bidirectional deformation field that models both forward and backward motion between consecutive frames, combined with an adaptive fusion mechanism based on time-dependent opacity. This enables robust cross-frame associations and maintains temporal coherence while naturally handling emerging and vanishing scene content.