Large Depth Completion Model from Sparse Observations

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.0

Depth Completion

Abstract:

This work presents the Large Depth Completion Model (LDCM), a simple, effective, and robust framework for single-view metric depth estimation with sparse observations. Without relying on complex architectural designs, LDCM generates metric-accurate dense depth maps in one large transformer. It outperforms existing approaches across diverse datasets and sparse observations. We achieve this from two key perspectives: (1) maximizing the potential of existing monocular foundation models to improve the preprocessing of sparse observations, and (2) reformulating training objectives to better capture geometric structure and metric consistency. Specifically, a Poisson-based depth initialization module is first introduced to generate a uniform coarse dense depth map from diverse sparse observations, which serves as a strong structural prior for the network. Regarding the training objective, we replace the conventional depth head with a point map head that regresses per-pixel 3D coordinates in camera space, enabling the model to directly learn the underlying 3D scene structure instead of performing pixel-wise depth map restoration. Moreover, this design eliminates the need for camera intrinsic parameters, allowing LDCM to naturally produce metric-scaled 3D point maps. Extensive experiments demonstrate that LDCM consistently outperforms state-of-the-art methods across multiple benchmarks and varying sparsity priors in both depth completion and point map estimation, showcasing its effectiveness and strong generalization to unseen data distributions.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LDCM, a transformer-based framework for metric depth estimation from sparse observations, emphasizing point map regression over conventional depth map restoration. It resides in the Transformer-Based Completion leaf under LiDAR-Based Depth Completion, which contains only three papers including this work. This is a relatively sparse research direction within the broader taxonomy of fifty papers, suggesting that transformer architectures for sparse depth completion remain an emerging area compared to the more populated convolutional propagation methods (six papers) or the diverse zero-shot foundation model approaches (seven papers across three subcategories).

The taxonomy tree reveals that LDCM sits within the Sparse-to-Dense Depth Completion branch, which focuses on densifying extremely sparse measurements like LiDAR. Neighboring leaves include Convolutional Propagation Methods, Diffusion-Based Completion, and Stereo-LiDAR Fusion, all addressing similar sensor-based completion tasks but with different architectural paradigms. The broader taxonomy also contains Monocular Depth with Sparse Guidance and Zero-Shot Foundation Model Approaches, which handle sparse depth differently—through visual SLAM integration or test-time adaptation rather than direct sensor fusion. LDCM's emphasis on transformer-based global context modeling distinguishes it from local convolutional propagation while remaining within the sensor-driven completion paradigm.

Among twenty-seven candidates examined, none clearly refute the three core contributions. The point map regression approach (ten candidates examined, zero refutable) and Poisson-based initialization module (seven candidates, zero refutable) appear to lack direct prior work in the limited search scope. The zero-shot framework claim (ten candidates, zero refutable) similarly shows no overlapping prior art among the examined papers. However, this analysis reflects a top-K semantic search plus citation expansion, not an exhaustive literature review. The absence of refutable candidates suggests these contributions may be novel within the examined subset, though the limited scope leaves open the possibility of relevant work outside the search radius.

Given the sparse population of the Transformer-Based Completion leaf and the absence of refutable prior work among twenty-seven candidates, the contributions appear relatively novel within the examined literature. The combination of transformer architecture, point map regression, and Poisson initialization distinguishes LDCM from its two sibling papers and the broader convolutional or diffusion-based alternatives. However, the limited search scope and the small size of the transformer-based completion cluster mean this assessment is provisional, contingent on the specific papers retrieved rather than a comprehensive field survey.

Taxonomy

Core-task taxonomy papers: 50

Claimed contributions: 3

Contribution candidate papers compared: 27

Refutable papers: 0

Research Landscape Overview

Core task: metric depth estimation from sparse observations. The field addresses how to recover dense, metrically accurate depth maps when only limited depth cues are available, such as sparse LiDAR points, a handful of depth samples, or monocular images paired with minimal guidance. The taxonomy reveals several complementary branches: Sparse-to-Dense Depth Completion focuses on propagating sparse LiDAR or structured measurements into full depth maps, often using convolutional or transformer architectures; Monocular Depth with Sparse Guidance explores how a few depth anchors can scale or refine otherwise relative monocular predictions; Zero-Shot and Foundation Model Approaches leverage large pretrained models to generalize across domains without task-specific training; Domain-Specific Depth Estimation tailors methods to specialized settings like medical imaging, agriculture, or fisheye cameras; Multi-View and Temporal Depth Estimation fuses information across frames or viewpoints; and Training Strategies and Architectures examines loss functions, network designs, and data augmentation techniques that underpin these methods.

Representative works such as Deep depth completion from [1] and Fast and accurate depth [3] illustrate early convolutional approaches, while newer efforts like UniDepth [15] and Depth Any Camera [18] exemplify foundation model strategies.

Recent activity highlights a tension between specialized architectures and general-purpose foundation models. Within LiDAR-based completion, transformer-based methods have gained traction for their ability to model long-range dependencies, as seen in Large Depth Completion Model [0], which sits in the same leaf as the earlier Attention Unet for lightweight [30] and Depth Estimation Using Sparse [47]. These transformer approaches aim to better propagate sparse LiDAR signals across large spatial gaps, in contrast to the local receptive fields of traditional CNNs. Meanwhile, the zero-shot and foundation model branches explore whether large-scale pretraining can bypass domain-specific tuning, raising questions about generalization versus task-specific performance. The original paper, Large Depth Completion Model [0], aligns closely with the transformer-based completion cluster, emphasizing scalable architectures that handle very sparse inputs, and differs from lighter convolutional alternatives by prioritizing expressive capacity over computational efficiency.

Claimed Contributions

Large Depth Completion Model (LDCM) with point map regression

10 retrieved papers

The authors introduce LDCM, a framework that reformulates depth completion as point map regression rather than depth map restoration. By predicting per-pixel 3D coordinates in camera space, the model learns underlying 3D scene structure directly and produces metric-scaled outputs without requiring camera intrinsics.
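To make the term "point map" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera, and the helper names `depth_to_point_map` and `point_map_to_depth`, as well as the intrinsics `fx`, `fy`, `cx`, `cy`, are hypothetical. It shows how a point map target could be built from a depth map, and that depth is simply the Z channel of a point map, so reading depth from a predicted point map needs no intrinsics.

```python
# Illustrative sketch only: a pinhole camera is assumed, and all names here
# are hypothetical; nothing below is taken from the LDCM paper's code.

def depth_to_point_map(depth, fx, fy, cx, cy):
    """Unproject an H x W depth map into per-pixel (X, Y, Z) camera-space points."""
    return [
        [((u - cx) * z / fx, (v - cy) * z / fy, z) for u, z in enumerate(row)]
        for v, row in enumerate(depth)
    ]


def point_map_to_depth(points):
    """Recover the depth map from a point map by taking the Z coordinate."""
    return [[p[2] for p in row] for row in points]


# Tiny 2 x 2 example with made-up intrinsics.
depth = [[1.0, 2.0], [3.0, 4.0]]
points = depth_to_point_map(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
recovered = point_map_to_depth(points)
```

In this sketch the intrinsics are used only to construct an illustrative target; a network that regresses the three channels directly would output X, Y, and Z without being given them at inference time.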

Poisson-based coarse depth initialization module

7 retrieved papers

The authors propose a Poisson-based depth initialization module that combines sparse depth observations with relative depth cues from a foundation model to generate a uniform coarse dense depth map. This module serves as a strong structural prior by solving a gradient-domain optimization problem that preserves fine geometric structures and metric consistency.
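The gradient-domain idea can be sketched as a small least-squares problem: match the neighbor differences of a guidance map while softly pinning the solution to the sparse metric samples. The code below is one possible formulation under stated assumptions, not the paper's module. It assumes the guidance map is already scale-aligned so that only its offset is unknown, and the function name `poisson_init`, the weight `lam`, and the Gauss-Seidel solver are illustrative choices.

```python
# Illustrative sketch only: minimizes
#   sum over neighbor pairs ((D_p - D_q) - (R_p - R_q))^2
#   + lam * sum over sparse pixels (D_p - d_p)^2
# with Gauss-Seidel sweeps. Names and parameters are hypothetical.

def poisson_init(relative, sparse, lam=10.0, iters=5000, tol=1e-10):
    """relative: H x W nested list (guidance); sparse: {(row, col): metric depth}."""
    if not sparse:
        raise ValueError("at least one sparse observation is needed to fix the offset")
    h, w = len(relative), len(relative[0])
    depth = [row[:] for row in relative]  # start from the guidance map
    for _ in range(iters):
        max_change = 0.0
        for i in range(h):
            for j in range(w):
                num, den = 0.0, 0.0
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        # Neighbor depth plus the guidance difference to this pixel.
                        num += depth[ni][nj] + relative[i][j] - relative[ni][nj]
                        den += 1.0
                if (i, j) in sparse:
                    # Soft data term pulling toward the metric observation.
                    num += lam * sparse[(i, j)]
                    den += lam
                new = num / den
                max_change = max(max_change, abs(new - depth[i][j]))
                depth[i][j] = new
        if max_change < tol:
            break
    return depth


# Synthetic check: guidance equals the true depth minus an unknown offset.
true = [[2.0 + 0.1 * i + 0.05 * j * j for j in range(6)] for i in range(6)]
relative = [[v - 1.5 for v in row] for row in true]
sparse = {(0, 0): true[0][0], (2, 3): true[2][3], (5, 5): true[5][5]}
coarse = poisson_init(relative, sparse)
max_err = max(abs(coarse[i][j] - true[i][j]) for i in range(6) for j in range(6))
```

In this synthetic case the true depth satisfies both terms exactly, so three anchors are enough to recover the offset. With real relative depth, the scale would also need to be estimated, and a sparse direct solver would usually replace the Gauss-Seidel loop.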

State-of-the-art zero-shot depth completion framework

10 retrieved papers

The authors present a simple yet effective framework that achieves superior zero-shot performance across multiple benchmarks and varying sparsity patterns in both depth completion and point map estimation, demonstrating strong generalization and robustness without relying on complex architectural designs.

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[30] Attention Unet++ for lightweight depth estimation from sparse depth samples and a single RGB image

Tao Zhao, S. Pan, Wang Gao, Chao Sheng, Yingchun Sun, Jiansheng Wei, Shuguo Pan (2021)

[47] Depth Estimation Using Sparse Depth and Transformer

Roopak Malik, Praful Hambarde, Subrahmanyam Murala (2022)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Large Depth Completion Model (LDCM) with point map regression

[65] 3D human pose estimation from a single image via distance matrix regression

Cannot Refute

[66] Joint 3D face reconstruction and dense alignment with position map regression network

Cannot Refute

[67] Dens3R: A Foundation Model for 3D Geometry Prediction

Cannot Refute

[68] Point-to-point regression PointNet for 3D hand pose estimation

Cannot Refute

[69] DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction

Cannot Refute

[70] DewarpNet: Single-image document unwarping with stacked 3D and 2D regression networks

Cannot Refute

[71] 2D/3D image registration using regression learning

Cannot Refute

[72] ACE-SLAM: Scene Coordinate Regression for Neural Implicit Real-Time SLAM

Cannot Refute

[73] MVPNet: Multi-view point regression networks for 3D object reconstruction from a single image

Cannot Refute

[74] Improved 3D Point-Line Mapping Regression for Camera Relocalization

Cannot Refute

Contribution: Poisson-based coarse depth initialization module

[58] High-quality surface reconstruction using Gaussian surfels

Cannot Refute

[59] Surface reconstruction from 3D Gaussian splatting via local structural hints

Cannot Refute

[60] SLAM-based dense surface reconstruction in monocular minimally invasive surgery and its application to augmented reality

Cannot Refute

[61] Sparse-to-dense depth completion revisited: Sampling strategy and graph construction

Cannot Refute

[62] Monocular Depth Estimation in the Foundation Model Era: A Survey

Cannot Refute

[63] Depth map recovery of targets based on monocular polarized images and adaptive regularization techniques

Cannot Refute

[64] Augmented Reality for Depth Cues in Monocular Minimally Invasive Surgery

Cannot Refute

Contribution: State-of-the-art zero-shot depth completion framework

[37] Depth Anything with Any Prior

Cannot Refute

[38] A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation

Cannot Refute

[46] Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation

Cannot Refute

[51] Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

Cannot Refute

[52] Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

Cannot Refute

[53] Repurposing Marigold for Zero-Shot Metric Depth Estimation via Defocus Blur Cues

Cannot Refute

[54] Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

Cannot Refute

[55] SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments

Cannot Refute

[56] Zero-Shot Metric Depth Estimation via Monocular Visual-Inertial Rescaling for Autonomous Aerial Navigation

Cannot Refute

[57] Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Cannot Refute
