YoNoSplat: You Only Need One Model for Feedforward 3D Gaussian Splatting

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: 3D Gaussian splatting, feedforward model, novel view synthesis, pose-free
Abstract:

Fast and flexible 3D scene reconstruction from unstructured image collections remains a significant challenge. We present YoNoSplat, a feedforward model that reconstructs high-quality 3D Gaussian Splatting representations from an arbitrary number of images. Our model is highly versatile, operating effectively with both posed and unposed, calibrated and uncalibrated inputs. YoNoSplat predicts local Gaussians and camera poses for each view, which are aggregated into a global representation using either predicted or provided poses. To overcome the inherent difficulty of jointly learning 3D Gaussians and camera parameters, we introduce a novel mixing training strategy. This approach mitigates the entanglement between the two tasks by initially using ground-truth poses to aggregate local Gaussians and gradually transitioning to a mix of predicted and ground-truth poses, which prevents both training instability and exposure bias. We further resolve the scale ambiguity problem with a novel pairwise camera-distance normalization scheme and by embedding camera intrinsics into the network. Moreover, YoNoSplat also predicts intrinsic parameters, making it applicable to uncalibrated inputs. YoNoSplat demonstrates exceptional efficiency, reconstructing a scene from 100 views (at 280×518 resolution) in just 2.69 seconds on an NVIDIA GH200 GPU. It achieves state-of-the-art performance on standard benchmarks in both pose-free and pose-dependent settings. The code and pretrained models will be made public.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

YoNoSplat contributes a feedforward model that reconstructs 3D Gaussian Splatting representations from arbitrary numbers of images, handling both posed and unposed, calibrated and uncalibrated inputs. It resides in the 'Uncalibrated and Pose-Free Gaussian Reconstruction' leaf, which contains only three papers total (including YoNoSplat itself). This is a relatively sparse research direction within the broader Gaussian Splatting-Based Feedforward Reconstruction branch, indicating that joint learning of Gaussians and camera parameters without calibration remains an emerging and challenging area.

The taxonomy tree shows that YoNoSplat's leaf is nested under Multi-View Gaussian Reconstruction, which also includes sibling leaves for Sparse-View Gaussian Splatting (three papers assuming known poses) and Surround-View and Driving Scene Gaussian Reconstruction (one paper for vehicle-mounted scenarios). Neighboring branches address Enhanced Gaussian Reconstruction Techniques (three papers on voxel alignment and super-resolution) and Gaussian-Based Generative and Latent Modeling (three papers using Gaussians for generation). YoNoSplat diverges from these by tackling the uncalibrated setting, whereas sparse-view methods require known camera parameters and generative frameworks focus on synthesis rather than reconstruction from unstructured collections.

Among the 29 candidates examined, the first contribution, 'YoNoSplat: versatile feedforward model for 3D Gaussian Splatting,' had one refutable candidate out of nine examined, suggesting some overlap with prior work in the uncalibrated Gaussian reconstruction space. For the second contribution, the 'mix-forcing training strategy,' ten candidates were examined with zero refutations, indicating that this training approach appears more novel within the limited search scope. For the third contribution, 'scale ambiguity resolution through normalization and intrinsic prediction,' ten candidates were likewise examined with no refutations, suggesting these technical solutions may be less directly addressed in the candidate pool.

Based on the top-29 semantic matches, YoNoSplat's core architecture shows some prior overlap, while its training strategy and scale-ambiguity solutions appear more distinctive. The sparse population of its taxonomy leaf (three papers) and the limited search scope mean this assessment captures only a snapshot of the most semantically similar work, not an exhaustive survey of all uncalibrated Gaussian reconstruction methods.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 1

Research Landscape Overview

Core task: feedforward 3D scene reconstruction from unstructured images. The field has evolved from traditional optimization-based pipelines toward modern feedforward architectures that predict geometry in a single pass.

The taxonomy reveals several major branches: Gaussian Splatting-Based Feedforward Reconstruction leverages explicit point primitives for efficient rendering and reconstruction, while Volumetric and Implicit Feedforward Reconstruction employs neural fields or voxel grids for dense scene representation. Unified and Multi-Task Feedforward Frameworks integrate multiple objectives such as depth, pose, and appearance into joint models, whereas Traditional and Optimization-Based Reconstruction captures classical structure-from-motion and bundle adjustment methods. Surveys, Reviews, and Methodological Overviews (e.g., Feedforward Review[7], Deep Learning Survey[12]) synthesize progress across these paradigms, and Non-Visual and Alternative Sensing Modalities explore radar, WiFi, and other unconventional inputs (WiFi Scene Reconstruction[18], Radar Face Reconstruction[20]).

Within Gaussian splatting approaches, a particularly active line addresses multi-view reconstruction with varying degrees of camera calibration. Works like MVSplat[5] and Flash3d[1] assume known or partially known poses, enabling robust feedforward prediction of splat parameters from sparse views. In contrast, YoNoSplat[0] tackles the harder uncalibrated and pose-free setting, jointly inferring camera geometry and 3D Gaussians without prior calibration, a direction also explored by Pref3r[15] and UniForward[8]. This uncalibrated branch confronts fundamental ambiguities in scale and alignment that calibrated methods sidestep, yet it promises greater flexibility for in-the-wild image collections. YoNoSplat[0] sits squarely in this niche, emphasizing end-to-end learning where both scene structure and camera parameters emerge from unstructured input, distinguishing it from neighbors that rely on at least partial pose supervision.

Claimed Contributions

YoNoSplat: versatile feedforward model for 3D Gaussian Splatting

The authors introduce YoNoSplat, a feedforward model that reconstructs 3D Gaussian Splatting representations from an arbitrary number of unposed and uncalibrated images. The model operates effectively in both pose-free and pose-dependent settings, as well as with calibrated and uncalibrated inputs, achieving state-of-the-art performance across multiple benchmarks.

9 retrieved papers (1 can refute)
Mix-forcing training strategy

The authors propose a novel mix-forcing training strategy that addresses the entanglement between learning 3D Gaussians and camera parameters. The approach begins with teacher-forcing using ground-truth poses and gradually transitions to using a mixture of predicted and ground-truth poses, preventing training instability and exposure bias.
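The schedule described above can be sketched in code. This is a minimal, hypothetical illustration (the function name, linear ramp, and warmup fraction are assumptions for exposition, not taken from the paper): the model is teacher-forced with ground-truth poses during an initial warmup, after which the probability of substituting its own predicted pose ramps up toward 1.

```python
import random

def mixed_pose(gt_pose, pred_pose, step, total_steps, warmup_frac=0.2):
    """Scheduled mixing of ground-truth and predicted poses (illustrative).

    During the first `warmup_frac` of training, always use the ground-truth
    pose (pure teacher forcing). Afterwards, linearly ramp the probability
    of using the model's own prediction, so the model is gradually exposed
    to its own pose errors and exposure bias is reduced.
    """
    warmup_steps = warmup_frac * total_steps
    if step < warmup_steps:
        p_pred = 0.0  # pure teacher forcing
    else:
        progress = (step - warmup_steps) / (total_steps - warmup_steps)
        p_pred = min(1.0, progress)  # reaches 1.0 at the end of training
    return pred_pose if random.random() < p_pred else gt_pose
```

In an actual training loop the returned pose would be used to transform each view's local Gaussians into the global frame before rendering the loss; the exact schedule shape (linear, cosine, etc.) is a design choice the report does not specify.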

10 retrieved papers
Scale ambiguity resolution through normalization and intrinsic prediction

The authors resolve scale ambiguity through two mechanisms: a pairwise camera-distance normalization scheme that normalizes scenes by maximum pairwise distance between camera centers, and an Intrinsic Condition Embedding module that predicts and conditions on camera intrinsics, enabling reconstruction from uncalibrated images.
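The normalization half of this contribution can be sketched directly from its description: scale the scene so that the maximum pairwise distance between camera centers equals 1. The sketch below is an assumption-laden illustration (function name and array layout are ours, and the paper may apply the scale to more quantities than shown here).

```python
import numpy as np

def normalize_by_camera_distance(camera_centers, gaussian_means):
    """Pairwise camera-distance normalization (illustrative sketch).

    Divides all positions by the maximum pairwise distance between camera
    centers, so that distance becomes 1 and the global scale ambiguity of
    unposed inputs is fixed. `camera_centers` is (N, 3), `gaussian_means`
    is (M, 3); both are returned rescaled, along with the scale factor.
    """
    # All N x N pairwise differences between camera centers, via broadcasting.
    diffs = camera_centers[:, None, :] - camera_centers[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    scale = dists.max()  # assumes at least two distinct camera centers
    return camera_centers / scale, gaussian_means / scale, scale
```

Camera translations (and any predicted depths) would need the same scale applied so the whole reconstruction stays consistent; the Intrinsic Condition Embedding half of the contribution is a learned module and is not sketched here.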

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

YoNoSplat: versatile feedforward model for 3D Gaussian Splatting


Contribution

Mix-forcing training strategy


Contribution

Scale ambiguity resolution through normalization and intrinsic prediction
