StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: 3D Gaussian Splatting; Monocular Video Reconstruction
Abstract:

Real-time reconstruction of dynamic 3D scenes from uncalibrated video streams demands robust online methods that recover scene dynamics from sparse observations under strict latency and memory constraints. Yet most dynamic reconstruction methods rely on hours of per-scene optimization with full-sequence access, limiting practical deployment. In this work, we introduce StreamSplat, a fully feed-forward framework that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting (3DGS) representations in an online manner. This is achieved via three key technical innovations: 1) a probabilistic sampling mechanism that robustly predicts 3D Gaussians from uncalibrated inputs; 2) a bidirectional deformation field that yields reliable associations across frames and mitigates long-term error accumulation; 3) an adaptive Gaussian fusion operation that propagates persistent Gaussians while handling emerging and vanishing ones. Extensive experiments on standard dynamic and static benchmarks demonstrate that StreamSplat achieves state-of-the-art reconstruction quality and dynamic scene modeling. Uniquely, our method supports online reconstruction of arbitrarily long video streams with a 1200× speedup over optimization-based methods. Our code and models will be made publicly available.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

StreamSplat introduces a feed-forward framework for online dynamic 3D reconstruction using Gaussian Splatting representations, processing uncalibrated video streams without per-scene optimization. The paper resides in the 'Real-Time Dynamic Gaussian Splatting' leaf, which contains five papers total, indicating a moderately populated but emerging research direction. This leaf sits within the broader 'Feed-Forward Dynamic Scene Reconstruction' branch, distinguishing itself from optimization-based methods that require iterative refinement. The focus on streaming input and online adaptation positions StreamSplat at the intersection of real-time performance and dynamic scene modeling.

The taxonomy reveals neighboring research directions that contextualize StreamSplat's contributions. Adjacent leaves include 'Generative Model-Based 3D Reconstruction' (leveraging diffusion priors) and 'Multi-Human 4D Reconstruction' (specialized for human subjects), both under the same feed-forward parent branch. The 'Incremental and Online Reconstruction' branch contains methods like dense volumetric reconstruction and online human-scene reconstruction, which share the streaming constraint but differ in representation choice (volumetric vs. Gaussian). StreamSplat's uncalibrated input handling also connects to the 'Uncalibrated Reconstruction Techniques' branch, though that category emphasizes augmented reality applications rather than dynamic scene modeling.

Among fifteen candidates examined across three contributions, none were identified as clearly refuting StreamSplat's novelty. The core framework (Contribution 1) examined nine candidates with zero refutable overlaps, suggesting limited prior work on fully feed-forward, online Gaussian splatting for dynamic scenes. The probabilistic sampling mechanism (Contribution 2) and bidirectional deformation field with adaptive fusion (Contribution 3) examined four and two candidates respectively, also without refutation. This limited search scope—fifteen papers from semantic retrieval—indicates that while no direct precedents emerged, the analysis does not exhaustively cover all related work in real-time reconstruction or deformation modeling.

Given the constrained literature search and the moderately populated taxonomy leaf, StreamSplat appears to occupy a distinct niche within real-time dynamic Gaussian splatting. The absence of refutable candidates among fifteen examined papers suggests technical differentiation from sibling works, though the small sample size precludes definitive claims about field-wide novelty. The combination of online processing, uncalibrated input, and adaptive Gaussian fusion distinguishes StreamSplat from optimization-heavy or batch-processing alternatives, but broader validation against the full corpus of dynamic reconstruction methods remains necessary.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 15
Refutable Papers: 0

Research Landscape Overview

Core task: online dynamic 3D reconstruction from uncalibrated video streams. The field addresses the challenge of recovering time-varying geometry from video without prior camera calibration, spanning a diverse set of methodological branches. Feed-forward dynamic scene reconstruction methods emphasize rapid, single-pass inference using learned priors, often leveraging neural representations or Gaussian splatting for real-time performance. Optimization-based approaches iteratively refine geometry and motion through energy minimization, trading speed for accuracy. Incremental and online reconstruction techniques process frames sequentially, maintaining consistency across time, while static human reconstruction from monocular video focuses on capturing detailed body shape and pose from single-camera setups. Sparse feature-based methods rely on keypoint tracking and structure-from-motion pipelines, whereas depth-guided neural reconstruction integrates depth sensors or learned depth cues to regularize geometry. Specialized application branches target domains such as medical imaging, architectural scenes, or aerial footage, and motion and dynamics estimation explicitly models temporal deformations. Uncalibrated reconstruction techniques handle unknown or varying intrinsics, a critical capability for casual video capture.

Recent work has concentrated on real-time dynamic Gaussian splatting, where methods like StreamSplat[0], DGS-LRM[4], and SplineGS[22] push the frontier of feed-forward reconstruction by representing scenes as collections of evolving 3D Gaussians. StreamSplat[0] emphasizes streaming video input and online adaptation, positioning itself within the real-time dynamic Gaussian splatting cluster alongside neighbors such as Bullet-Time Reconstruction[5] and QUEEN[24], which explore multi-view synchronization and quality-efficiency trade-offs.
In contrast, optimization-based pipelines like Lyra[1] and incremental methods such as MegaSaM[6] prioritize geometric fidelity over speed, iteratively refining reconstructions as new frames arrive. The tension between feed-forward speed and optimization-based accuracy remains a central theme, with StreamSplat[0] leaning toward the former by exploiting learned scene priors and efficient splatting, while closely related works like SplineGS[22] explore temporal smoothness through spline-based motion modeling.

Claimed Contributions

StreamSplat framework for online dynamic 3D reconstruction

The authors present StreamSplat, a fully feed-forward system that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner, achieving real-time performance with a 1200× speedup over optimization-based methods.
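The online feed-forward loop claimed here can be sketched abstractly: each frame is lifted to Gaussians in one pass, associated with the running scene, and fused, so memory stays bounded for arbitrarily long streams. The callables below (`predict_gaussians`, `deform`, `fuse`) are hypothetical placeholders standing in for the paper's modules, not its actual implementation:

```python
def stream_reconstruct(frames, predict_gaussians, deform, fuse):
    """Minimal sketch of an online feed-forward reconstruction loop.

    Each incoming frame is mapped to 3D Gaussians in a single forward
    pass (no per-scene optimization), the running scene is carried
    forward by a deformation step, and the two sets are fused. State
    is O(scene size), independent of stream length.
    """
    scene = None
    for frame in frames:
        gaussians = predict_gaussians(frame)   # single feed-forward pass
        if scene is None:
            scene = gaussians                  # bootstrap from first frame
        else:
            scene = fuse(deform(scene, frame), gaussians)
        yield scene                            # renderable 3DGS state per frame
```

A usage example with toy callables: `list(stream_reconstruct([1, 2, 3], lambda f: [f], lambda s, f: s, lambda s, g: s + g))` accumulates one "Gaussian" per frame.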

9 retrieved papers
Probabilistic sampling mechanism for 3D Gaussian position prediction

The authors propose a probabilistic position sampling strategy that predicts a truncated normal distribution for each 3D offset rather than direct regression. This approach captures geometric uncertainty and avoids local minima common in feed-forward models.
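Sampling per-offset truncated normals instead of regressing a point estimate can be illustrated with SciPy. The function name, the symmetric bound, and the shapes below are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np
from scipy.stats import truncnorm


def sample_offsets(mu, sigma, bound=1.0, rng=None):
    """Sample 3D position offsets from per-Gaussian truncated normals.

    mu, sigma: (N, 3) predicted mean and scale of each offset
    (hypothetical network outputs). Samples are confined to
    [-bound, bound], keeping Gaussians near their anchor points
    while still expressing geometric uncertainty.
    """
    rng = np.random.default_rng(rng)
    # truncnorm expects bounds in standardized units: (x - mu) / sigma
    a = (-bound - mu) / sigma
    b = (bound - mu) / sigma
    return truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)


# Toy usage: 4 Gaussians with small predicted uncertainty
mu = np.zeros((4, 3))
sigma = np.full((4, 3), 0.1)
offsets = sample_offsets(mu, sigma, bound=0.5, rng=0)
```

Because the support is bounded, no sampled offset can escape the trust region, which is one plausible way such a scheme avoids the degenerate solutions that direct regression can fall into.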

4 retrieved papers
Bidirectional deformation field with adaptive Gaussian fusion

The authors introduce a bidirectional deformation field that models both forward and backward motion between consecutive frames, combined with an adaptive fusion mechanism based on time-dependent opacity. This enables robust cross-frame associations and maintains temporal coherence while naturally handling emerging and vanishing scene content.
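One plausible reading of the bidirectional association is a forward-backward cycle-consistency check: a Gaussian is treated as persistent if warping it forward and back returns near its start, and its opacity decays otherwise. The brute-force NumPy sketch below uses invented names and thresholds (`tau`, `decay`) purely for illustration; it is not the authors' method:

```python
import numpy as np


def fuse_gaussians(pos_prev, pos_curr, fwd_flow, bwd_flow,
                   opacity_prev, tau=0.05, decay=0.9):
    """Sketch of bidirectional association with opacity-based fusion.

    pos_prev: (N, 3) Gaussian centers at t-1; pos_curr: (M, 3) at t.
    fwd_flow: (N, 3) predicted t-1 -> t displacements; bwd_flow: (M, 3)
    predicted t -> t-1 displacements (hypothetical deformation outputs).
    Persistent Gaussians (cycle-consistent round trip) are propagated;
    non-persistent ones keep their old position and fade via decayed
    opacity, modeling vanishing content.
    """
    warped = pos_prev + fwd_flow                      # t-1 centers moved to t
    # nearest current Gaussian for each warped one (brute force for clarity)
    d2 = ((warped[:, None, :] - pos_curr[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)
    # cycle consistency: warp the match back and compare with the start
    round_trip = pos_curr[nn] + bwd_flow[nn]
    persistent = np.linalg.norm(round_trip - pos_prev, axis=1) < tau
    fused_pos = np.where(persistent[:, None], warped, pos_prev)
    fused_opacity = np.where(persistent, opacity_prev, opacity_prev * decay)
    return fused_pos, fused_opacity, persistent
```

Emerging content would be handled symmetrically by keeping current-frame Gaussians that no previous Gaussian maps onto; that branch is omitted here to keep the sketch short.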

2 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

StreamSplat framework for online dynamic 3D reconstruction

The authors present StreamSplat, a fully feed-forward system that instantly transforms uncalibrated video streams of arbitrary length into dynamic 3D Gaussian Splatting representations in an online manner, achieving real-time performance with a 1200× speedup over optimization-based methods.

Contribution

Probabilistic sampling mechanism for 3D Gaussian position prediction

The authors propose a probabilistic position sampling strategy that predicts a truncated normal distribution for each 3D offset rather than direct regression. This approach captures geometric uncertainty and avoids local minima common in feed-forward models.

Contribution

Bidirectional deformation field with adaptive Gaussian fusion

The authors introduce a bidirectional deformation field that models both forward and backward motion between consecutive frames, combined with an adaptive fusion mechanism based on time-dependent opacity. This enables robust cross-frame associations and maintains temporal coherence while naturally handling emerging and vanishing scene content.