Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
Overview
Overall Novelty Assessment
Stylos proposes a single-forward 3D Gaussian framework for style transfer that operates without per-scene optimization or precomputed poses, targeting geometry-aware, view-consistent stylization. The paper resides in the 'Feed-Forward Gaussian Stylization' leaf, which contains only two papers in total (including Stylos itself). This represents a sparse research direction within the broader Gaussian Splatting-Based Stylization branch, suggesting that the feed-forward paradigm for Gaussian-based stylization remains relatively underexplored compared to optimization-heavy or NeRF-based alternatives.
The taxonomy reveals that Stylos sits within a larger ecosystem of Gaussian Splatting methods, including neighboring leaves like 'Reference-Based Gaussian Stylization' (three papers), 'Multi-Modal Style Conditioning' (three papers), and 'Text-Driven Gaussian Stylization' (two papers). These adjacent directions emphasize controllable editing, multi-encoder frameworks, or text guidance, whereas Feed-Forward Gaussian Stylization focuses on instant inference without iterative refinement. The broader Gaussian Splatting-Based Stylization branch excludes implicit volumetric methods like NeRF, positioning Stylos within explicit point-primitive representations that prioritize real-time rendering efficiency.
Among 24 candidates examined across the three contributions, the 'single-forward framework' contribution has two potentially refuting candidates out of ten examined, indicating that some prior feed-forward stylization work exists within the search scope. The 'dual-pathway design' (zero refutations among ten candidates) and the 'voxel-based 3D style loss' (zero refutations among four candidates) appear more distinctive within the examined literature. These counts come from a top-K semantic search plus citation expansion, not an exhaustive survey; still, the overlap found for the feed-forward contribution suggests that aspect may be less novel, while the architectural and loss-design choices appear more differentiated.
Based on the limited search scope of 24 candidates, Stylos appears to occupy a relatively sparse research niche (feed-forward Gaussian stylization), with some architectural novelty in its dual-pathway design and 3D style loss. The single-forward framework contribution encounters more prior overlap, suggesting this high-level paradigm may be less distinctive. The analysis does not extend beyond the top-K semantic matches and their citation expansions, so conclusions remain provisional pending a broader literature review.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose a Transformer architecture with two distinct pathways: one pathway uses self-attention for geometry predictions to maintain structural accuracy, while the other injects style information through global cross-attention to ensure consistent visual appearance across multiple views.
The authors introduce a novel style loss function that operates in voxel space by aligning multi-view aggregated 3D scene features with 2D style statistics, enforcing view-consistent stylization while preserving geometric structure.
The authors develop Stylos, a feed-forward framework that performs 3D style transfer without per-scene optimization or precomputed camera poses, capable of processing variable numbers of input views and achieving zero-shot generalization to unseen categories, scenes, and styles.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[46] GaussianBlender: Instant Stylization of 3D Gaussians with Disentangled Latent Spaces
Contribution Analysis
Detailed comparisons for each claimed contribution
Shared-backbone design with dual pathways for geometry and style
The authors propose a Transformer architecture with two distinct pathways: one pathway uses self-attention for geometry predictions to maintain structural accuracy, while the other injects style information through global cross-attention to ensure consistent visual appearance across multiple views.
[51] Twins transformer: rolling bearing fault diagnosis based on cross-attention fusion of time and frequency domain features
[52] A U-shaped convolution-aided transformer with double attention for hyperspectral image classification
[53] DLNet: A dual-level network with self- and cross-attention for high-resolution remote sensing segmentation
[54] Enhancing Facial Beauty Prediction via a Dual-Pathway Hybrid Architecture Integrating Vmamba and ViT
[55] Dual-modal 3D human pose estimation using insole foot pressure sensors
[56] Dual-Stream Siamese Vision Transformer With Mutual Attention For Radar Gait Verification
[57] TSFormer: Tracking structure transformer for image inpainting
[58] PanFormer: A Transformer Based Model for Pan-Sharpening
[59] HierVL: Semi-Supervised Segmentation leveraging Hierarchical Vision-Language Synergy with Dynamic Text-Spatial Query Alignment
[60] Joint Classification of Hyperspectral Images and LiDAR Data Based on Dual-Branch Transformer
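The dual-pathway design claimed above can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the paper's implementation: the names (`dual_pathway_block`, `attention`) and token shapes are assumptions, and a real model would add learned projections, multiple heads, normalization, and feed-forward layers.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention (single head, no learned projections).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dual_pathway_block(scene_tokens, style_tokens):
    # Geometry pathway: self-attention over the multi-view scene tokens,
    # so structure is predicted from the content views alone.
    geometry = attention(scene_tokens, scene_tokens, scene_tokens)
    # Appearance pathway: global cross-attention from scene tokens
    # (queries) to style tokens (keys/values), injecting one shared
    # style signal into every view for cross-view consistency.
    appearance = attention(scene_tokens, style_tokens, style_tokens)
    return geometry, appearance

scene = np.random.randn(8, 16)   # 8 scene tokens, dim 16 (toy shapes)
style = np.random.randn(4, 16)   # 4 style tokens
geo, app = dual_pathway_block(scene, style)
```

The key point the sketch makes concrete is the asymmetry: the geometry pathway never sees the style tokens, while the appearance pathway attends to one global style context shared across all views.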
Voxel-based 3D style loss
The authors introduce a novel style loss function that operates in voxel space by aligning multi-view aggregated 3D scene features with 2D style statistics, enforcing view-consistent stylization while preserving geometric structure.
[5] ArtNVG: Content-Style Separated Artistic Neighboring-View Gaussian Stylization
[69] Standardization of Gram matrix for improved 3D neural style transfer
[70] STTCNeRF: Style Transfer of Neural Radiance Fields for 3D Scene Based on Texture Consistency Constraint
[71] Three-Dimensional Voxel-Based Neural Style Transfer and Quantification
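The voxel-space style loss described above can be sketched under stated assumptions: per-point 3D features are pooled into a coarse voxel grid, and the channel-wise mean/std of the occupied voxels are matched against the statistics of 2D style features (AdaIN-style moment matching). The function names, the aggregation by simple averaging, and the grid resolution are illustrative, not taken from the paper.

```python
import numpy as np

def channel_stats(feats):
    # feats: (N, C); per-channel mean and standard deviation.
    return feats.mean(axis=0), feats.std(axis=0)

def voxel_style_loss(points, point_feats, style_feats, grid=4):
    # Assign each 3D point to a cell of a coarse grid**3 voxel grid.
    extent = np.ptp(points, axis=0) + 1e-6
    idx = np.floor((points - points.min(axis=0)) / extent * grid)
    idx = idx.clip(0, grid - 1).astype(int)
    keys = idx[:, 0] * grid * grid + idx[:, 1] * grid + idx[:, 2]
    # Aggregate multi-view point features within each occupied voxel.
    voxel_feats = np.stack([point_feats[keys == k].mean(axis=0)
                            for k in np.unique(keys)])
    # Align aggregated 3D statistics with 2D style statistics.
    mu3, sig3 = channel_stats(voxel_feats)
    mus, sigs = channel_stats(style_feats)
    return np.mean((mu3 - mus) ** 2) + np.mean((sig3 - sigs) ** 2)

points = np.random.rand(64, 3)        # toy 3D point cloud
point_feats = np.random.rand(64, 8)   # per-point aggregated scene features
style_feats = np.random.rand(128, 8)  # flattened 2D style features
loss = voxel_style_loss(points, point_feats, style_feats)
```

Because the statistics are computed once over the shared 3D representation rather than per rendered view, every view inherits the same style target, which is how such a loss can enforce view consistency.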
Single-forward 3D stylization framework (Stylos)
The authors develop Stylos, a feed-forward framework that performs 3D style transfer without per-scene optimization or precomputed camera poses, capable of processing variable numbers of input views and achieving zero-shot generalization to unseen categories, scenes, and styles.
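The feed-forward contract claimed here can be summarized with a toy sketch. `predict_gaussians` and `apply_style` are hypothetical stand-ins for a trained network, and the Gaussian parameterization is deliberately minimal (means and colors only); the point is the interface: a variable number of unposed views in, stylized Gaussians out, in one pass.

```python
import numpy as np

def predict_gaussians(views):
    # Stand-in for a learned geometry pathway: one Gaussian per pixel,
    # predicted from the unposed views alone (no camera poses required).
    n, h, w, _ = views.shape
    return {"means": np.random.randn(n * h * w, 3),
            "colors": views.reshape(-1, 3)}

def apply_style(gaussians, style_image):
    # Stand-in for style injection: pull colors toward the style mean.
    target = style_image.reshape(-1, 3).mean(axis=0)
    gaussians["colors"] = 0.5 * gaussians["colors"] + 0.5 * target
    return gaussians

def stylize_feedforward(views, style_image):
    # Single forward pass: no per-scene optimization loop, and the
    # number of input views n is not fixed at model-definition time.
    return apply_style(predict_gaussians(views), style_image)

views = np.random.rand(3, 4, 4, 3)  # three unposed 4x4 RGB views
style = np.random.rand(8, 8, 3)     # a style image
stylized = stylize_feedforward(views, style)
```

This interface contrasts with optimization-based stylization, where the equivalent of `stylize_feedforward` would be an iterative loop fitting a fresh representation per scene, typically requiring known camera poses.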