DA²: Depth Anything in Any Direction

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Panoramas, Depth (Distance) Estimation
Abstract:

Panoramas have a full 360°×180° FoV, offering a more complete visual description than perspective images. Thanks to this characteristic, panoramic depth estimation is gaining increasing traction in 3D vision. However, due to the scarcity of panoramic data, previous methods are often restricted to in-domain settings, leading to poor zero-shot generalization. Furthermore, due to the spherical distortions inherent in panoramas, many approaches rely on perspective splitting (e.g., cubemaps), which leads to suboptimal efficiency. To address these challenges, we propose DA²: Depth Anything in Any Direction, an accurate, zero-shot generalizable, and fully end-to-end panoramic depth estimator. Specifically, to scale up panoramic data, we introduce a data curation engine that generates high-quality panoramic depth data from perspective data, creating ~543K panoramic RGB-depth pairs and bringing the total to ~607K. To further mitigate spherical distortions, we present SphereViT, which explicitly leverages spherical coordinates to enforce spherical geometric consistency in panoramic image features, yielding improved performance. A comprehensive benchmark on multiple datasets clearly demonstrates DA²'s SoTA performance, with an average 38% improvement in AbsRel over the strongest zero-shot baseline. Surprisingly, DA² even outperforms prior in-domain methods, highlighting its superior zero-shot generalization. Moreover, as an end-to-end solution, DA² exhibits much higher efficiency than fusion-based approaches. Both the code and the curated panoramic data will be released.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 3

Research Landscape Overview

Core task: panoramic depth estimation. The field has evolved into several major branches that reflect different input modalities and architectural strategies. Monocular panoramic depth estimation forms the largest branch, encompassing distortion-aware architectures that handle spherical geometry through specialized convolutions, coordinate-based methods, and transformer-based designs. Stereo and multi-view approaches leverage multiple panoramic images to improve geometric consistency, while sensor fusion methods combine panoramic cameras with LiDAR or other modalities to extend depth range and accuracy. High-resolution and perspective-panoramic registration techniques address the challenge of aligning standard pinhole images with 360-degree views, and application-specific branches tailor depth estimation to domains such as autonomous driving or indoor scene understanding. Datasets, benchmarks, and foundation models provide the infrastructure for training and evaluation, with recent works like Depth Anywhere[35] and Depth Any Panoramas[47] exploring generalization across diverse panoramic scenarios. Within monocular methods, a central tension exists between approaches that explicitly model spherical distortion and those that adapt standard perspective architectures. Distortion-aware filters and spherical convolutions, exemplified by early work like Distortion Aware Filters[11] and Omnidepth[18], directly address equirectangular projection artifacts, while coordinate-based methods such as Spherical Deep Network[12] and EGformer[33] encode geometric priors through positional embeddings or spherical harmonics. The original paper, Depth Anything in Any Direction[0], sits within this coordinate-based cluster, emphasizing directional cues in spherical space similarly to SPDET[37] and EGformer[33].
Compared to these neighbors, Depth Anything in Any Direction[0] appears to push toward more flexible geometric representations that can generalize across viewing conditions, contrasting with SPDET[37]'s focus on explicit tangent-plane decompositions. This line of work reflects ongoing exploration of how best to encode 360-degree geometry without sacrificing the representational power of modern deep networks.

Claimed Contributions

Panoramic data curation engine

A pipeline that converts perspective RGB-depth pairs into full panoramic data through Perspective-to-Equirectangular projection and panoramic out-painting using FLUX-I2P. This engine scales up panoramic training data by approximately 10 times, significantly improving zero-shot generalization.
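The Perspective-to-Equirectangular (P2E) step described above can be sketched as follows. This is a minimal nearest-neighbor illustration only, not the paper's engine: the function name, the y-up/z-forward axis convention, and the yaw/pitch parameterization are assumptions, and the FLUX-I2P out-painting stage is omitted entirely.

```python
import numpy as np

def p2e(img, fov_deg, yaw_deg, pitch_deg, out_h, out_w):
    """Paste a pinhole image onto an equirectangular canvas (nearest-neighbor).

    Hypothetical sketch: for each ERP pixel, build its spherical ray,
    rotate it into the camera frame, and sample the pinhole image where
    the ray projects inside the frustum.
    """
    h, w = img.shape[:2]
    f = 0.5 * w / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels

    # Longitude/latitude at every equirectangular pixel center.
    lon = (np.arange(out_w) + 0.5) / out_w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(out_h) + 0.5) / out_h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both (out_h, out_w)

    # World-frame unit rays (y up, z forward -- assumed convention).
    d = np.stack([np.cos(lat) * np.sin(lon),
                  np.sin(lat),
                  np.cos(lat) * np.cos(lon)], axis=-1)

    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    R = Ry @ Rx          # camera-to-world rotation
    dc = d @ R           # world rays expressed in the camera frame (R^T d)

    # Pinhole projection of rays with positive depth.
    valid = dc[..., 2] > 1e-6
    z = np.where(valid, dc[..., 2], 1.0)
    u = f * dc[..., 0] / z + w / 2
    v = -f * dc[..., 1] / z + h / 2  # image v grows downward
    inside = valid & (u >= 0) & (u < w) & (v >= 0) & (v < h)

    canvas = np.zeros((out_h, out_w) + img.shape[2:], dtype=img.dtype)
    ui, vi = u.astype(int), v.astype(int)
    canvas[inside] = img[vi[inside], ui[inside]]
    return canvas, inside
```

A real engine would warp the paired depth map the same way (converting pinhole depth to spherical distance) and then out-paint the uncovered region of the canvas.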

9 retrieved papers
Can Refute
SphereViT architecture

A Vision Transformer backbone that uses cross-attention with spherical embeddings derived from azimuth and polar angles. Image features attend to fixed spherical embeddings to produce distortion-aware representations, mitigating spherical distortions without requiring auxiliary modules or cubemap fusion.
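The described data flow can be sketched as below. This is a bare-bones illustration under stated assumptions: sinusoidal encodings of azimuth and polar angles, and a single attention head with no learned projections. The actual SphereViT presumably uses learned query/key/value layers inside a full ViT; both function names here are hypothetical.

```python
import numpy as np

def spherical_embedding(h, w, dim):
    """Sinusoidal embedding of (azimuth, polar) angles per ERP patch.

    Assumption: half the channels encode azimuth phi, half the polar
    angle theta, each at dim/4 dyadic frequencies (dim divisible by 4).
    """
    phi = (np.arange(w) + 0.5) / w * 2 * np.pi    # azimuth in [0, 2pi)
    theta = (np.arange(h) + 0.5) / h * np.pi      # polar in (0, pi)
    phi, theta = np.meshgrid(phi, theta)          # (h, w)
    freqs = 2.0 ** np.arange(dim // 4)
    ang = np.stack([phi, theta], -1)[..., None] * freqs   # (h, w, 2, dim/4)
    emb = np.concatenate([np.sin(ang), np.cos(ang)], -1)  # (h, w, 2, dim/2)
    return emb.reshape(h, w, -1)                          # (h, w, dim)

def cross_attend(feat, emb):
    """Image features (queries) attend to the fixed spherical embeddings
    (keys; values also taken from the embeddings for illustration)."""
    q = feat.reshape(-1, feat.shape[-1])
    k = v = emb.reshape(-1, emb.shape[-1])
    attn = q @ k.T / np.sqrt(k.shape[-1])
    attn = np.exp(attn - attn.max(-1, keepdims=True))     # stable softmax
    attn /= attn.sum(-1, keepdims=True)
    return (attn @ v).reshape(feat.shape)
```

The key design point the contribution claims is that the embeddings are fixed functions of viewing direction, so the geometric prior is injected without cubemap splitting or auxiliary distortion modules.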

10 retrieved papers
Can Refute
Comprehensive benchmark for panoramic depth estimation

A thorough evaluation framework comparing both zero-shot and in-domain methods, as well as panoramic and perspective approaches, across multiple recognized datasets. The benchmark demonstrates that DA2 achieves state-of-the-art zero-shot performance and even surpasses prior in-domain methods.
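For reference, AbsRel (the metric quoted in the abstract) is the mean absolute relative depth error. A minimal sketch follows; the median scale alignment shown is an assumption, since alignment protocols vary across benchmarks and papers.

```python
import numpy as np

def abs_rel(pred, gt, mask=None):
    """Mean absolute relative error between predicted and ground-truth depth.

    Pixels with non-positive ground truth are excluded; prediction is
    median-scaled to the ground truth first (common in zero-shot evaluation,
    but protocol-dependent).
    """
    if mask is None:
        mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    pred = pred * np.median(gt) / np.median(pred)  # align global scale
    return float(np.mean(np.abs(pred - gt) / gt))
```

With this metric, "38% improvement on AbsRel" means the averaged error is 38% lower than the strongest zero-shot baseline's.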

9 retrieved papers
Can Refute

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Panoramic data curation engine

A pipeline that converts perspective RGB-depth pairs into full panoramic data through Perspective-to-Equirectangular projection and panoramic out-painting using FLUX-I2P. This engine scales up panoramic training data by approximately 10 times, significantly improving zero-shot generalization.

Contribution

SphereViT architecture

A Vision Transformer backbone that uses cross-attention with spherical embeddings derived from azimuth and polar angles. Image features attend to fixed spherical embeddings to produce distortion-aware representations, mitigating spherical distortions without requiring auxiliary modules or cubemap fusion.

Contribution

Comprehensive benchmark for panoramic depth estimation

A thorough evaluation framework comparing both zero-shot and in-domain methods, as well as panoramic and perspective approaches, across multiple recognized datasets. The benchmark demonstrates that DA2 achieves state-of-the-art zero-shot performance and even surpasses prior in-domain methods.