YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting
Overview
Overall Novelty Assessment
YoNoSplat contributes a feedforward model that reconstructs 3D Gaussian Splatting representations from arbitrary numbers of images, handling both posed and unposed, calibrated and uncalibrated inputs. It resides in the 'Uncalibrated and Pose-Free Gaussian Reconstruction' leaf, which contains only three papers total (including YoNoSplat itself). This is a relatively sparse research direction within the broader Gaussian Splatting-Based Feedforward Reconstruction branch, indicating that joint learning of Gaussians and camera parameters without calibration remains an emerging and challenging area.
The taxonomy tree shows that YoNoSplat's leaf is nested under Multi-View Gaussian Reconstruction, which also includes sibling leaves for Sparse-View Gaussian Splatting (three papers assuming known poses) and Surround-View and Driving Scene Gaussian Reconstruction (one paper for vehicle-mounted scenarios). Neighboring branches address Enhanced Gaussian Reconstruction Techniques (three papers on voxel alignment and super-resolution) and Gaussian-Based Generative and Latent Modeling (three papers using Gaussians for generation). YoNoSplat diverges from these by tackling the uncalibrated setting, whereas sparse-view methods require known camera parameters and generative frameworks focus on synthesis rather than reconstruction from unstructured collections.
Across the 29 candidates examined in total, the first contribution, 'YoNoSplat: versatile feedforward model for 3D Gaussian Splatting,' had one of its nine candidates judged refutable, suggesting some overlap with prior work in the uncalibrated Gaussian reconstruction space. The second contribution, 'Mix-forcing training strategy,' had zero refutations among its ten candidates, indicating this training approach appears more novel within the limited search scope. The third contribution, 'Scale ambiguity resolution through normalization and intrinsic prediction,' likewise drew no refutations from its ten candidates, suggesting these technical solutions are less directly addressed in the candidate pool.
Based on the top-29 semantic matches, YoNoSplat's core architecture shows some prior overlap, while its training strategy and scale-ambiguity solutions appear more distinctive. The sparse population of its taxonomy leaf (three papers) and the limited search scope mean this assessment captures only a snapshot of the most semantically similar work, not an exhaustive survey of all uncalibrated Gaussian reconstruction methods.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors introduce YoNoSplat, a feedforward model that reconstructs 3D Gaussian Splatting representations from an arbitrary number of unposed and uncalibrated images. The model operates effectively in both pose-free and pose-dependent settings, as well as with calibrated and uncalibrated inputs, achieving state-of-the-art performance across multiple benchmarks.
The authors propose a novel mix-forcing training strategy that addresses the entanglement between learning 3D Gaussians and camera parameters. The approach begins with teacher-forcing on ground-truth poses and gradually transitions to a mixture of predicted and ground-truth poses, avoiding both the training instability of conditioning on noisy predictions too early and the exposure bias of teacher-forcing throughout.
The authors resolve scale ambiguity through two mechanisms: a pairwise camera-distance normalization scheme that normalizes scenes by maximum pairwise distance between camera centers, and an Intrinsic Condition Embedding module that predicts and conditions on camera intrinsics, enabling reconstruction from uncalibrated images.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[8] UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images PDF
[15] Pref3r: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-Length Image Sequence PDF
Contribution Analysis
Detailed comparisons for each claimed contribution
YoNoSplat: versatile feedforward model for 3D Gaussian Splatting
The authors introduce YoNoSplat, a feedforward model that reconstructs 3D Gaussian Splatting representations from an arbitrary number of unposed and uncalibrated images. The model operates effectively in both pose-free and pose-dependent settings, as well as with calibrated and uncalibrated inputs, achieving state-of-the-art performance across multiple benchmarks.
[71] AnySplat: Feed-Forward 3D Gaussian Splatting from Unconstrained Views PDF
[5] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images PDF
[8] UniForward: Unified 3D Scene and Semantic Field Reconstruction via Feed-Forward Gaussian Splatting from Only Sparse-View Images PDF
[11] Wonderland: Navigating 3D Scenes from a Single Image PDF
[70] Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images PDF
[72] Advances in Feed-Forward 3D Reconstruction and View Synthesis: A Survey PDF
[73] PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting PDF
[74] ProSplat: Improved Feed-Forward 3D Gaussian Splatting for Wide-Baseline Sparse Views PDF
[75] Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-View Images PDF
Mix-forcing training strategy
The authors propose a novel mix-forcing training strategy that addresses the entanglement between learning 3D Gaussians and camera parameters. The approach begins with teacher-forcing on ground-truth poses and gradually transitions to a mixture of predicted and ground-truth poses, avoiding both the training instability of conditioning on noisy predictions too early and the exposure bias of teacher-forcing throughout.
[27] MapAnything: Universal Feed-Forward Metric 3D Reconstruction PDF
[51] VGGSfM: Visual Geometry Grounded Deep Structure from Motion PDF
[52] RemixFusion: Residual-Based Mixed Representation for Large-Scale Online RGB-D Reconstruction PDF
[53] TRICKY 2025 Challenge on Monocular Depth from Images of Specular and Transparent Surfaces PDF
[54] Pose Adaptive Dual Mixup for Few-Shot Single-View 3D Reconstruction PDF
[55] Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation PDF
[56] Fusing the Old with the New: Learning Relative Camera Pose with Geometry-Guided Uncertainty PDF
[57] A Hybrid Approach for Cross-Modality Pose Estimation Between Image and Point Cloud PDF
[58] Guiding Local Feature Matching with Surface Curvature PDF
[59] Joint Estimation of Depth and Motion from a Monocular Endoscopy Image Sequence Using a Multi-Loss Rebalancing Network PDF
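The mix-forcing schedule described above (teacher-forcing on ground-truth poses early, then gradually mixing in predicted poses) can be sketched as a per-step sampling rule. This is a minimal illustration, not the paper's implementation: the linear warmup schedule, the function name, and its parameters are all assumptions for the sake of the example.

```python
import random

def mix_forcing_pose(gt_pose, pred_pose, step, warmup_steps, total_steps):
    """Choose the pose to condition on at a given training step.

    During warmup, always teacher-force with the ground-truth pose.
    Afterwards, use the predicted pose with a probability that grows
    linearly toward 1.0 (hypothetical linear schedule).
    """
    if step < warmup_steps:
        return gt_pose  # pure teacher-forcing phase
    # fraction of post-warmup training completed, clipped to [0, 1]
    frac = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    use_pred_prob = min(1.0, frac)
    return pred_pose if random.random() < use_pred_prob else gt_pose
```

The key design point is that the model is never forced to rely on its own noisy pose estimates at the start of training, yet by the end it trains under the same conditions it faces at inference, which is how scheduled-sampling-style strategies mitigate exposure bias.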
Scale ambiguity resolution through normalization and intrinsic prediction
The authors resolve scale ambiguity through two mechanisms: a pairwise camera-distance normalization scheme that normalizes scenes by maximum pairwise distance between camera centers, and an Intrinsic Condition Embedding module that predicts and conditions on camera intrinsics, enabling reconstruction from uncalibrated images.
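The pairwise camera-distance normalization described above can be sketched as follows. This is a hedged illustration of the stated idea (dividing the scene by the maximum pairwise distance between camera centers); the function and argument names are hypothetical and not taken from the paper.

```python
import numpy as np

def normalize_by_max_camera_distance(camera_centers, gaussian_means):
    """Normalize a scene by the maximum pairwise distance between
    camera centers, as a sketch of the described scheme.

    camera_centers: (N, 3) array-like of camera positions.
    gaussian_means: (M, 3) array-like of Gaussian center positions.
    Returns the rescaled centers, rescaled means, and the scale factor.
    """
    c = np.asarray(camera_centers, dtype=float)
    g = np.asarray(gaussian_means, dtype=float)
    # all pairwise distances between camera centers
    diffs = c[:, None, :] - c[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    scale = dists.max()
    # divide every metric quantity by the same factor so the scene
    # has a canonical scale regardless of the input's arbitrary units
    return c / scale, g / scale, scale
```

Normalizing by a quantity derived from the cameras themselves makes the canonical scale recoverable at inference time from predicted poses, which is what makes such a scheme compatible with uncalibrated inputs.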