DA2: Depth Anything in Any Direction
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
A pipeline that converts perspective RGB-depth pairs into full panoramic data through Perspective-to-Equirectangular projection and panoramic out-painting using FLUX-I2P. This engine scales up panoramic training data by approximately 10 times, significantly improving zero-shot generalization.
A Vision Transformer backbone that uses cross-attention with spherical embeddings derived from azimuth and polar angles. Image features attend to fixed spherical embeddings to produce distortion-aware representations, mitigating spherical distortions without requiring auxiliary modules or cubemap fusion.
A thorough evaluation framework comparing both zero-shot and in-domain methods, as well as panoramic and perspective approaches, across multiple recognized datasets. The benchmark demonstrates that DA2 achieves state-of-the-art zero-shot performance and even surpasses prior in-domain methods.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[12] Omnidirectional stereo depth estimation based on spherical deep network
[33] EGformer: Equirectangular Geometry-biased Transformer for 360 Depth Estimation
[37] SPDET: Edge-Aware Self-Supervised Panoramic Depth Estimation Transformer With Spherical Geometry
Contribution Analysis
Detailed comparisons for each claimed contribution
Panoramic data curation engine
A pipeline that converts perspective RGB-depth pairs into full panoramic data through Perspective-to-Equirectangular projection and panoramic out-painting using FLUX-I2P. This engine scales up panoramic training data by approximately 10 times, significantly improving zero-shot generalization.
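As an illustration of the Perspective-to-Equirectangular step only, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera looking along +z with a known horizontal field of view, a z-depth input that is converted to radial distance, and nearest-neighbour sampling. The function name `perspective_to_erp` and its parameters are hypothetical. Out-painting of the uncovered region (done with FLUX-I2P in the pipeline described above) is not modelled; the sketch only returns a mask of the pixels the perspective view covers.

```python
import numpy as np

def perspective_to_erp(depth_z, hfov_deg, erp_h=256, erp_w=512):
    """Project a pinhole z-depth map onto an equirectangular (ERP) canvas.

    Assumptions (not from the source): the camera looks along +z with x to the
    right and y down, the principal point is the image centre, and pixels are square.
    Returns (erp_distance, valid_mask); uncovered pixels stay 0 / False.
    """
    h, w = depth_z.shape
    f = 0.5 * w / np.tan(np.radians(hfov_deg) / 2.0)  # focal length in pixels
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0

    # Longitude (azimuth) in [-pi, pi) and latitude in [-pi/2, pi/2] at pixel centres.
    lon = ((np.arange(erp_w) + 0.5) / erp_w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(erp_h) + 0.5) / erp_h) * np.pi
    lon, lat = np.meshgrid(lon, lat)

    # Unit viewing ray for every ERP pixel.
    dx = np.cos(lat) * np.sin(lon)
    dy = -np.sin(lat)
    dz = np.cos(lat) * np.cos(lon)

    # Project rays in front of the camera onto the perspective image plane.
    in_front = dz > 1e-6
    safe_dz = np.where(in_front, dz, 1.0)
    ui = np.rint(f * dx / safe_dz + cx).astype(int)
    vi = np.rint(f * dy / safe_dz + cy).astype(int)
    valid = in_front & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)

    erp = np.zeros((erp_h, erp_w), dtype=np.float64)
    # Convert z-depth to radial distance along the unit ray: r = z / dz.
    erp[valid] = depth_z[vi[valid], ui[valid]] / dz[valid]
    return erp, valid

# Example: a flat wall 2 units in front of a camera with a 90 degree field of view.
erp, valid = perspective_to_erp(np.full((64, 64), 2.0), hfov_deg=90.0)
```

The mask makes the coverage gap explicit: a single 90 degree view fills well under a quarter of the panorama, which is the region an out-painting model would then have to complete.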
[35] Depth Anywhere: Enhancing 360 monocular depth estimation via perspective distillation and unlabeled data augmentation
[4] UniFuse: Unidirectional fusion for 360 panorama depth estimation
[24] High-resolution depth estimation for 360° panoramas through perspective and panoramic depth images registration
[51] Geometry-Aware Self-Supervised Indoor 360° Depth Estimation via Asymmetric Dual-Domain Collaborative Learning
[52] DreamCube: RGB-D Panorama Generation via Multi-plane Synchronization
[53] Revisiting 360 Depth Estimation with PanoGabor: A New Fusion Perspective
[54] Deep synthesis and exploration of omnidirectional stereoscopic environments from a single surround-view panoramic image
[55] 360 degree fish eye optical construction for equirectangular projection of panoramic images
[56] EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control
SphereViT architecture
A Vision Transformer backbone that uses cross-attention with spherical embeddings derived from azimuth and polar angles. Image features attend to fixed spherical embeddings to produce distortion-aware representations, mitigating spherical distortions without requiring auxiliary modules or cubemap fusion.
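The mechanism described above, in which image features attend to fixed embeddings built from azimuth and polar angles, can be sketched as follows. This is a single-head NumPy illustration under stated assumptions, not the SphereViT implementation: the sinusoidal form of the embedding, the integer frequencies, the residual connection, and the names `spherical_embeddings` and `cross_attend` are all hypothetical choices made for the example.

```python
import numpy as np

def spherical_embeddings(grid_h, grid_w, dim):
    """Fixed (non-learned) embedding of each ERP patch's azimuth and polar angle.

    Assumed form: sin/cos of both angles at integer frequencies 1, 2, 4, ...
    Integer frequencies keep the azimuth part continuous across the left/right seam.
    """
    assert dim % 4 == 0, "dim is split evenly across sin/cos of the two angles"
    azimuth = (np.arange(grid_w) + 0.5) / grid_w * 2.0 * np.pi  # in (0, 2*pi)
    polar = (np.arange(grid_h) + 0.5) / grid_h * np.pi          # in (0, pi)
    az, po = np.meshgrid(azimuth, polar)                         # (grid_h, grid_w)
    freqs = 2.0 ** np.arange(dim // 4)
    az = az.reshape(-1, 1) * freqs
    po = po.reshape(-1, 1) * freqs
    return np.concatenate([np.sin(az), np.cos(az), np.sin(po), np.cos(po)], axis=1)

def cross_attend(image_tokens, sphere_emb, w_q, w_k, w_v):
    """Queries come from image tokens; keys and values come from the embeddings."""
    q = image_tokens @ w_q
    k = sphere_emb @ w_k
    v = sphere_emb @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    # Residual connection (an assumption) adds the position-dependent signal.
    return image_tokens + attn @ v, attn

# Example with a 4 x 8 patch grid and random stand-ins for features and weights.
rng = np.random.default_rng(0)
grid_h, grid_w, dim = 4, 8, 16
emb = spherical_embeddings(grid_h, grid_w, dim)
tokens = rng.standard_normal((grid_h * grid_w, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
out, attn = cross_attend(tokens, emb, w_q, w_k, w_v)
```

Because the embeddings depend only on the patch grid, they can be computed once and reused for every image, which is consistent with the claim that no auxiliary module or cubemap branch is needed.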
[15] PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation
[22] Distortion-aware outdoor panoramic depth estimation via local–global fusion
[63] Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360 Videos
[64] SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
[65] SGFormer: Spherical Geometry Transformer for 360 Depth Estimation
[66] Spherical Vision Transformers for Audio-Visual Saliency Prediction in 360-Degree Videos
[67] Mamba4PASS: Vision Mamba for PAnoramic Semantic Segmentation
[68] HumanoidPano: Hybrid spherical panoramic-lidar cross-modal perception for humanoid robots
[69] SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation
[70] A Comparison of Spherical Neural Networks for Surround-View Fisheye Image Semantic Segmentation
Comprehensive benchmark for panoramic depth estimation
A thorough evaluation framework comparing both zero-shot and in-domain methods, as well as panoramic and perspective approaches, across multiple recognized datasets. The benchmark demonstrates that DA2 achieves state-of-the-art zero-shot performance and even surpasses prior in-domain methods.