Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: City generation, View generation, 3DGS, Satellite imagery, Diffusion models
Abstract:

Synthesizing large-scale, explorable, and geometrically accurate 3D urban scenes is a challenging yet valuable task for immersive and embodied applications. The challenge lies in the scarcity of large-scale, high-quality real-world 3D scans for training generalizable generative models. In this paper, we take an alternative route to creating large-scale 3D scenes by synergizing readily available satellite imagery, which supplies realistic coarse geometry, with open-domain diffusion models, which create high-quality close-up appearances. We propose Skyfall-GS, a novel hybrid framework that synthesizes immersive, city-block-scale 3D urban scenes by combining satellite reconstruction with diffusion-based refinement, eliminating the need for costly 3D annotations while enabling real-time, immersive 3D exploration. We tailor a curriculum-driven iterative refinement strategy to progressively enhance geometric completeness and photorealistic textures. Extensive experiments demonstrate that Skyfall-GS provides improved cross-view-consistent geometry and more realistic textures compared to state-of-the-art approaches.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Skyfall-GS, a hybrid framework that synthesizes city-block scale 3D urban scenes by combining satellite reconstruction with diffusion-based refinement. It resides in the 'Diffusion and GAN-Based 3D Urban Scene Generation' leaf, which contains eight papers including the original work. This leaf represents a moderately active research direction within the broader Neural Rendering and Generative 3D Synthesis branch, focusing on generative models that produce photorealistic or controllable 3D content from satellite imagery rather than classical photogrammetric pipelines.

The taxonomy tree reveals that the paper's leaf sits alongside two other neural rendering approaches: NeRF-based methods and cross-view synthesis techniques that generate street-level views from overhead imagery. Neighboring branches include geometric reconstruction methods (stereo and multi-view) and learning-based monocular depth estimation. Skyfall-GS bridges generative synthesis with geometric reconstruction by leveraging satellite-derived coarse geometry, distinguishing it from purely generative approaches like Sat2Scene or procedural methods like MagicCity that prioritize controllability over geometric fidelity.

Of the thirty candidates examined in total, ten were compared against the core Skyfall-GS framework contribution, and one was judged refutable, suggesting some overlap with prior generative synthesis work. The open-domain diffusion refinement and the curriculum-learning strategy were each compared against ten candidates with zero refutations, indicating that these methodological choices appear less directly anticipated within the limited search scope. These statistics reflect a focused rather than exhaustive literature review, leaving open the possibility of additional relevant work beyond the top thirty semantic matches.

Based on the limited search scope, the work appears to occupy a moderately explored niche within generative 3D urban synthesis, with the framework-level contribution showing some prior overlap while the refinement and curriculum strategies exhibit less direct precedent among examined candidates. The analysis covers top-thirty semantic matches and does not claim exhaustive coverage of the broader generative modeling or satellite reconstruction literature.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: synthesizing large-scale 3D urban scenes from satellite imagery. The field is organized around five main branches that reflect distinct methodological emphases and problem settings:

- Geometric 3D Reconstruction from Satellite Stereo and Multi-View Imagery: classical photogrammetric pipelines that exploit multi-date or multi-angle observations to recover depth and surface models, often relying on stereo matching and structure-from-motion techniques.
- Learning-Based Monocular and Semantic 3D Reconstruction: deep networks that infer height, building footprints, or semantic labels from single overhead images, addressing scenarios where stereo pairs are unavailable.
- Neural Rendering and Generative 3D Synthesis from Satellite Imagery: neural radiance fields, diffusion models, and GANs that produce photorealistic or controllable 3D content directly from satellite data.
- 3D City Modeling Pipelines and Applications: end-to-end workflows that integrate reconstruction, vectorization, and level-of-detail generation for urban planning and simulation.
- Supporting Technologies and Datasets for Satellite-Based 3D Reconstruction: the foundational benchmarks, open-source tools, and sensor-fusion strategies that enable progress across the other branches.

Recent work has seen a surge in generative and neural rendering approaches that move beyond traditional geometry-first pipelines. Within the Neural Rendering and Generative 3D Synthesis branch, diffusion- and GAN-based methods such as Sat2Scene[11], Sat2city[10], and Sat2RealCity[22] explore how to hallucinate plausible street-level views or volumetric representations from overhead imagery, trading geometric precision for visual realism and controllability. Skyfall-GS[0] sits squarely in this cluster, emphasizing generative synthesis of urban scenes via Gaussian splatting conditioned on satellite inputs.

Compared to Sat2Scene[11], which focuses on diffusion-driven view synthesis, Skyfall-GS[0] adopts an explicit 3D representation that may offer faster rendering and more direct geometric control. Meanwhile, works like MagicCity[36] and Citycraft[21] push generative modeling toward interactive urban design and procedural content creation, highlighting an ongoing tension between fidelity to real-world geometry and the flexibility needed for creative or planning applications.

Claimed Contributions

Skyfall-GS framework for synthesizing immersive 3D urban scenes from satellite imagery

The authors propose Skyfall-GS, a novel hybrid framework that combines satellite-based 3D Gaussian Splatting reconstruction with diffusion model refinement to generate city-block scale 3D urban scenes. This approach eliminates the need for costly 3D annotations or street-level training data while enabling real-time interactive rendering.

10 retrieved papers compared; verdict: Can Refute
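The report describes the framework only at a high level; the authors' actual implementation is not shown here. As a rough illustration of the data flow claimed above, the sketch below stubs out the two stages with hypothetical placeholder functions (`reconstruct_3dgs`, `render_views`, and `diffusion_refine` are illustrative names, not the paper's API):

```python
# Hypothetical sketch of the hybrid reconstruct-then-refine loop.
# All function bodies are stubs standing in for the real stages.

def reconstruct_3dgs(satellite_images):
    """Stage 1: fit a 3D Gaussian Splatting scene to satellite views (stub)."""
    return {"gaussians": list(satellite_images)}

def render_views(scene, cameras):
    """Render the current scene from a set of virtual cameras (stub)."""
    return [f"render@{c}" for c in cameras]

def diffusion_refine(images):
    """Stage 2: refine renders with a pre-trained diffusion model (stub)."""
    return [img + "+refined" for img in images]

def skyfall_gs(satellite_images, cameras, episodes=3):
    scene = reconstruct_3dgs(satellite_images)
    for _ in range(episodes):
        renders = render_views(scene, cameras)
        refined = diffusion_refine(renders)
        # In the real method the refined images form an updated training
        # set used to re-optimize the Gaussians; here we just record them.
        scene["training_set"] = refined
    return scene

scene = skyfall_gs(["sat_0", "sat_1"], ["cam_a", "cam_b"])
```

The point of the sketch is only the loop structure: reconstruction provides geometry once, while rendering and diffusion refinement alternate across episodes to update the training set, with no 3D annotations or street-level data involved.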
Open-domain refinement approach using pre-trained text-to-image diffusion models

The method leverages pre-trained text-to-image diffusion models to hallucinate realistic appearances and complete occluded regions (such as building facades) without requiring training on domain-specific 3D datasets, providing better generalization compared to existing city generation methods.

10 retrieved papers compared; verdict: no refutations
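To make the "complete occluded regions" idea concrete, here is a deliberately simplified sketch in which a trivial mean-fill stands in for the diffusion prior; the real method would instead hallucinate plausible texture with a pre-trained text-to-image diffusion model. The function name and data are hypothetical:

```python
def refine_occluded(render, mask):
    """Toy stand-in for diffusion-based completion: fill pixels marked
    occluded (mask True) with the mean of the visible pixels. A real
    refinement step would synthesize plausible appearance instead."""
    visible = [v for row, mrow in zip(render, mask)
               for v, m in zip(row, mrow) if not m]
    fill = sum(visible) / len(visible)
    return [[fill if m else v for v, m in zip(row, mrow)]
            for row, mrow in zip(render, mask)]

# A 2x2 "render" whose bottom-right pixel was never seen from above,
# e.g. a building facade occluded in all satellite views.
render = [[0.2, 0.8], [0.4, 0.0]]
mask = [[False, False], [False, True]]
refined = refine_occluded(render, mask)
```

Visible pixels pass through unchanged; only the occluded pixel is rewritten, mirroring how the diffusion model is used to fill in what the satellite never observed.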
Curriculum-learning-based iterative refinement strategy

The authors introduce a curriculum-driven iterative dataset update technique that progressively lowers camera viewpoints from sky to ground across optimization episodes. This strategy gradually reveals and refines previously occluded regions, improving geometric completeness and photorealistic textures.

10 retrieved papers compared; verdict: no refutations
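The curriculum described above, progressively lowering camera viewpoints from sky to ground across optimization episodes, can be sketched as a simple pitch schedule. Linear interpolation and the endpoint angles here are illustrative assumptions, not the paper's actual values:

```python
def curriculum_pitch(episode, total_episodes, start_deg=90.0, end_deg=10.0):
    """Camera pitch in degrees below horizontal for a given episode.
    90 degrees is a straight-down satellite view; small angles approach
    street level. Linearly interpolates from start_deg to end_deg."""
    t = episode / max(total_episodes - 1, 1)
    return start_deg + t * (end_deg - start_deg)

# Five-episode curriculum: top-down first, near ground level last.
schedule = [curriculum_pitch(e, 5) for e in range(5)]
```

Each episode's renders come from lower cameras than the last, so regions occluded in earlier episodes (facades, street canyons) are gradually exposed to the refinement stage.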

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
