StylOS: Multi-View 3D Stylization with Single-Forward Gaussian Splatting

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: 3D Style Transfer · 3D Gaussian Splatting · Single-Forward Stylization · 3D Reconstruction
Abstract:

We present Stylos, a single-forward 3D Gaussian framework for 3D style transfer that operates on unposed content, from a single image to a multi-view collection, conditioned on a separate reference style image. Stylos synthesizes a stylized 3D Gaussian scene without per-scene optimization or precomputed poses, achieving geometry-aware, view-consistent stylization that generalizes to unseen categories, scenes, and styles. At its core, Stylos adopts a Transformer backbone with two pathways: geometry predictions retain self-attention to preserve geometric fidelity, while style is injected via global cross-attention to enforce visual consistency across views. With the addition of a voxel-based 3D style loss that aligns aggregated scene features to style statistics, Stylos enforces view-consistent stylization while preserving geometry. Experiments across multiple datasets demonstrate that Stylos delivers high-quality zero-shot stylization, highlighting the effectiveness of global style–content coupling, the proposed 3D style loss, and the scalability of our framework from single-view to large-scale multi-view settings. Our code will be open-sourced soon.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

Stylos proposes a single-forward 3D Gaussian framework for style transfer that operates without per-scene optimization or precomputed poses, targeting geometry-aware view-consistent stylization. The paper resides in the 'Feed-Forward Gaussian Stylization' leaf, which contains only two papers total (including Stylos itself). This represents a sparse research direction within the broader Gaussian Splatting-Based Stylization branch, suggesting the feed-forward paradigm for Gaussian-based stylization remains relatively underexplored compared to optimization-heavy or NeRF-based alternatives.

The taxonomy reveals that Stylos sits within a larger ecosystem of Gaussian Splatting methods, including neighboring leaves like 'Reference-Based Gaussian Stylization' (three papers), 'Multi-Modal Style Conditioning' (three papers), and 'Text-Driven Gaussian Stylization' (two papers). These adjacent directions emphasize controllable editing, multi-encoder frameworks, or text guidance, whereas Feed-Forward Gaussian Stylization focuses on instant inference without iterative refinement. The broader Gaussian Splatting-Based Stylization branch excludes implicit volumetric methods like NeRF, positioning Stylos within explicit point-primitive representations that prioritize real-time rendering efficiency.

Among the 24 candidates examined across three contributions, two of the ten retrieved for the 'single-forward framework' contribution are refutable, indicating that some prior work on feed-forward stylization exists within the limited search scope. The 'dual-pathway design' (zero refutations among ten candidates) and the 'voxel-based 3D style loss' (zero refutations among four candidates) appear more distinctive within the examined literature. These statistics reflect a top-K semantic search plus citation expansion rather than an exhaustive survey; the overlap found for the feed-forward contribution suggests that aspect may be less novel, while the architectural and loss-design choices appear more differentiated.

Based on the limited search scope of 24 candidates, Stylos appears to occupy a relatively sparse research niche (feed-forward Gaussian stylization) with some architectural novelty in its dual-pathway design and 3D style loss. The single-forward framework contribution encounters more prior overlap, suggesting this high-level paradigm may be less distinctive. The analysis does not cover exhaustive literature beyond top-K semantic matches, so conclusions remain provisional pending broader review.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 24
Refutable papers: 2

Research Landscape Overview

Core task: 3D stylization with view-consistent geometry preservation. The field has evolved around several complementary representation paradigms. Neural Radiance Field-based stylization methods leverage volumetric rendering to achieve smooth view synthesis while applying artistic transformations, with works like StyleSDF[1] and ArtNVG[5] exploring implicit scene encoding. Gaussian Splatting-based stylization has emerged as a faster alternative, using explicit point primitives that enable real-time rendering and efficient style transfer, exemplified by StylizedGS[47] and GaussianBlender[46]. Meanwhile, the 3D-Aware Generative Model Stylization and Diffusion-Based 3D Stylization branches harness powerful generative priors to guide stylization, with methods like AgileGAN3D[12] and DiffStyle360[50] demonstrating how pretrained models can inform geometry-aware transformations. Geometry-Aware Style Transfer focuses on preserving structural integrity during artistic manipulation, while Specialized Application Domains address targeted use cases such as avatars and urban scenes, and Architectural Components provide reusable mechanisms like attention modules and consistency losses.

A central tension across these branches involves balancing stylization fidelity against geometric preservation and computational efficiency. Feed-forward methods within Gaussian Splatting aim for rapid inference without per-scene optimization, contrasting with iterative NeRF-based approaches that may achieve finer control at higher cost.

StylOS[0] situates itself in the Feed-Forward Gaussian Stylization cluster, emphasizing efficient one-shot style application to Gaussian representations while maintaining multi-view consistency. This positions it alongside GaussianBlender[46], which similarly targets fast stylization of splatting scenes, but StylOS[0] distinguishes itself through a feed-forward architecture that avoids costly test-time refinement. Compared to optimization-heavy methods like View Consistent Editing[3] or geometry-encoding approaches such as Geometry Aware Encoder[4], StylOS[0] trades exhaustive per-scene tuning for generalization and speed, reflecting broader debates about whether stylization pipelines should prioritize adaptability or immediacy.

Claimed Contributions

Shared-backbone design with dual pathways for geometry and style

The authors propose a Transformer architecture with two distinct pathways: one pathway uses self-attention for geometry predictions to maintain structural accuracy, while the other injects style information through global cross-attention to ensure consistent visual appearance across multiple views.
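The mechanism described above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the function names (`dual_pathway_block`, `attention`) are hypothetical, projection weights are omitted, and only the attention routing is shown, namely per-view self-attention for the geometry pathway followed by cross-attention into a globally shared set of style tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over 2D token matrices (tokens x dim).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def dual_pathway_block(view_tokens, style_tokens):
    """Toy dual-pathway block (illustrative only).

    Geometry pathway: view tokens attend to themselves, preserving
    content/geometry structure. Style pathway: the result then
    cross-attends to one global set of style tokens, so every view is
    conditioned on the same style context, which is the intuition
    behind view-consistent style injection.
    """
    x = view_tokens + attention(view_tokens, view_tokens, view_tokens)
    x = x + attention(x, style_tokens, style_tokens)
    return x
```

In a real Transformer each attention call would have learned query/key/value projections, multiple heads, and layer normalization; the sketch keeps only the routing that separates the two pathways.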

Retrieved papers: 10
Voxel-based 3D style loss

The authors introduce a novel style loss function that operates in voxel space by aligning multi-view aggregated 3D scene features with 2D style statistics, enforcing view-consistent stylization while preserving geometric structure.
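A minimal NumPy sketch of the general idea follows. It assumes a simplified formulation: per-point 3D features are averaged into a coarse voxel grid, and the channel-wise mean/std of occupied voxels is matched to the style image's feature statistics (AdaIN-style moment matching). The function name `voxel_style_loss` and all details are hypothetical; the paper's actual loss may use different aggregation and statistics.

```python
import numpy as np

def voxel_style_loss(points, feats, style_feats, grid=8):
    """Toy voxel-based 3D style loss (illustrative sketch).

    points:      (N, 3) 3D positions of scene primitives.
    feats:       (N, C) per-point features.
    style_feats: (M, C) features extracted from the 2D style image.
    """
    # Map point coordinates into voxel indices in [0, grid).
    mins, maxs = points.min(0), points.max(0)
    idx = ((points - mins) / (maxs - mins + 1e-8) * (grid - 1e-4)).astype(int)
    flat = idx[:, 0] * grid * grid + idx[:, 1] * grid + idx[:, 2]

    # Scatter-average features into voxels (unbuffered add via np.add.at).
    C = feats.shape[1]
    sums = np.zeros((grid ** 3, C))
    counts = np.zeros(grid ** 3)
    np.add.at(sums, flat, feats)
    np.add.at(counts, flat, 1)
    occupied = counts > 0
    voxel_feats = sums[occupied] / counts[occupied, None]

    # Match first/second-order statistics of occupied voxels to the style.
    mu_c, sd_c = voxel_feats.mean(0), voxel_feats.std(0)
    mu_s, sd_s = style_feats.mean(0), style_feats.std(0)
    return float(np.mean((mu_c - mu_s) ** 2) + np.mean((sd_c - sd_s) ** 2))
```

Aggregating features in voxel space before computing statistics is what makes the objective 3D-aware: the same loss is shared by all views of a voxel, rather than being computed independently per rendered image.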

Retrieved papers: 4
Single-forward 3D stylization framework (Stylos)

The authors develop Stylos, a feed-forward framework that performs 3D style transfer without per-scene optimization or precomputed camera poses, capable of processing variable numbers of input views and achieving zero-shot generalization to unseen categories, scenes, and styles.

Retrieved papers: 10
Verdict: can refute (2 of 10 candidates)

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Shared-backbone design with dual pathways for geometry and style

Contribution

Voxel-based 3D style loss

Contribution

Single-forward 3D stylization framework (Stylos)
