Large Depth Completion Model from Sparse Observations

ICLR 2026 Conference Submission
Anonymous Authors
Depth Completion
Abstract:

This work presents the Large Depth Completion Model (LDCM), a simple, effective, and robust framework for single-view metric depth estimation with sparse observations. Without relying on complex architectural designs, LDCM generates metric-accurate dense depth maps with a single large transformer. It outperforms existing approaches across diverse datasets and sparse observations. We achieve this from two key perspectives: (1) maximizing the potential of existing monocular foundation models to improve the preprocessing of sparse observations, and (2) reformulating training objectives to better capture geometric structure and metric consistency. Specifically, a Poisson-based depth initialization module is first introduced to generate a uniform coarse dense depth map from diverse sparse observations, which serves as a strong structural prior for the network. Regarding the training objective, we replace the conventional depth head with a point map head that regresses per-pixel 3D coordinates in camera space, enabling the model to directly learn the underlying 3D scene structure instead of performing pixel-wise depth map restoration. Moreover, this design eliminates the need for camera intrinsic parameters, allowing LDCM to naturally produce metric-scaled 3D point maps. Extensive experiments demonstrate that LDCM consistently outperforms state-of-the-art methods across multiple benchmarks and varying sparsity priors in both depth completion and point map estimation, showcasing its effectiveness and strong generalization to unseen data distributions.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces LDCM, a transformer-based framework for metric depth estimation from sparse observations, emphasizing point map regression over conventional depth map restoration. It resides in the Transformer-Based Completion leaf under LiDAR-Based Depth Completion, which contains only three papers including this work. This is a relatively sparse research direction within the broader taxonomy of fifty papers, suggesting that transformer architectures for sparse depth completion remain an emerging area compared to the more populated convolutional propagation methods (six papers) or the diverse zero-shot foundation model approaches (seven papers across three subcategories).

The taxonomy tree reveals that LDCM sits within the Sparse-to-Dense Depth Completion branch, which focuses on densifying extremely sparse measurements like LiDAR. Neighboring leaves include Convolutional Propagation Methods, Diffusion-Based Completion, and Stereo-LiDAR Fusion, all addressing similar sensor-based completion tasks but with different architectural paradigms. The broader taxonomy also contains Monocular Depth with Sparse Guidance and Zero-Shot Foundation Model Approaches, which handle sparse depth differently—through visual SLAM integration or test-time adaptation rather than direct sensor fusion. LDCM's emphasis on transformer-based global context modeling distinguishes it from local convolutional propagation while remaining within the sensor-driven completion paradigm.

Among twenty-seven candidates examined, none clearly refute the three core contributions. The point map regression approach (ten candidates examined, zero refutable) and Poisson-based initialization module (seven candidates, zero refutable) appear to lack direct prior work in the limited search scope. The zero-shot framework claim (ten candidates, zero refutable) similarly shows no overlapping prior art among the examined papers. However, this analysis reflects a top-K semantic search plus citation expansion, not an exhaustive literature review. The absence of refutable candidates suggests these contributions may be novel within the examined subset, though the limited scope leaves open the possibility of relevant work outside the search radius.

Given the sparse population of the Transformer-Based Completion leaf and the absence of refutable prior work among twenty-seven candidates, the contributions appear relatively novel within the examined literature. The combination of transformer architecture, point map regression, and Poisson initialization distinguishes LDCM from its two sibling papers and the broader convolutional or diffusion-based alternatives. However, the limited search scope and the small size of the transformer-based completion cluster mean this assessment is provisional, contingent on the specific papers retrieved rather than a comprehensive field survey.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 27
Refutable Papers: 0

Research Landscape Overview

Core task: metric depth estimation from sparse observations. The field addresses how to recover dense, metrically accurate depth maps when only limited depth cues are available, such as sparse LiDAR points, a handful of depth samples, or monocular images paired with minimal guidance. The taxonomy reveals several complementary branches: Sparse-to-Dense Depth Completion focuses on propagating sparse LiDAR or structured measurements into full depth maps, often using convolutional or transformer architectures; Monocular Depth with Sparse Guidance explores how a few depth anchors can scale or refine otherwise relative monocular predictions; Zero-Shot and Foundation Model Approaches leverage large pretrained models to generalize across domains without task-specific training; Domain-Specific Depth Estimation tailors methods to specialized settings like medical imaging, agriculture, or fisheye cameras; Multi-View and Temporal Depth Estimation fuses information across frames or viewpoints; and Training Strategies and Architectures examines loss functions, network designs, and data augmentation techniques that underpin these methods. Representative works such as Deep depth completion from[1] and Fast and accurate depth[3] illustrate early convolutional approaches, while newer efforts like UniDepth[15] and Depth Any Camera[18] exemplify foundation model strategies.

Recent activity highlights a tension between specialized architectures and general-purpose foundation models. Within LiDAR-based completion, transformer-based methods have gained traction for their ability to model long-range dependencies, as seen in Large Depth Completion Model[0], which sits alongside earlier convolutional designs like Attention Unet for lightweight[30] and Depth Estimation Using Sparse[47]. These transformer approaches aim to better propagate sparse LiDAR signals across large spatial gaps, contrasting with the local receptive fields of traditional CNNs. Meanwhile, zero-shot and foundation model branches explore whether large-scale pretraining can bypass domain-specific tuning, raising questions about generalization versus task-specific performance. The original paper, Large Depth Completion Model[0], aligns closely with the transformer-based completion cluster, emphasizing scalable architectures that handle very sparse inputs, and differs from lighter convolutional alternatives by prioritizing expressive capacity over computational efficiency.

Claimed Contributions

Large Depth Completion Model (LDCM) with point map regression

The authors introduce LDCM, a framework that reformulates depth completion as point map regression rather than depth map restoration. By predicting per-pixel 3D coordinates in camera space, the model learns underlying 3D scene structure directly and produces metric-scaled outputs without requiring camera intrinsics.
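To make the distinction between a depth map and a point map concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a standard pinhole camera model (an assumption not stated in the report), and the function names `depth_to_point_map`, `point_map_to_depth`, and `estimate_focal` are hypothetical. It shows that the depth map is simply the z channel of a camera-space point map, and that under the pinhole assumption a point map already carries enough information to recover focal lengths, which is one plausible reading of why a point map head would not need intrinsics as input.

```python
import numpy as np

def depth_to_point_map(depth, fx, fy, cx, cy):
    """Unproject an (H, W) depth map to an (H, W, 3) camera-space point map.

    Assumes a pinhole camera: x = (u - cx) / fx * z, y = (v - cy) / fy * z.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def point_map_to_depth(points):
    """The depth map is the z coordinate of each camera-space point."""
    return points[..., 2]

def estimate_focal(points, cx, cy):
    """Least-squares focal lengths from a point map (hypothetical helper).

    Solves (u - cx) * z = fx * x and (v - cy) * z = fy * y over all pixels,
    assuming the principal point (cx, cy) is known, e.g. the image center.
    """
    h, w, _ = points.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    fx = np.sum((u - cx) * z * x) / np.sum(x * x)
    fy = np.sum((v - cy) * z * y) / np.sum(y * y)
    return fx, fy
```

In this sketch a network that regresses `points` directly would yield metric depth via `point_map_to_depth` with no intrinsics supplied at inference time; whether LDCM recovers intrinsics explicitly is not stated in the report.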

10 retrieved papers
Poisson-based coarse depth initialization module

The authors propose a Poisson-based depth initialization module that combines sparse depth observations with relative depth cues from a foundation model to generate a uniform coarse dense depth map. This module serves as a strong structural prior by solving a gradient-domain optimization problem that preserves fine geometric structures and metric consistency.
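The gradient-domain idea described above can be illustrated with a small least-squares sketch. This is an assumption-laden toy, not the paper's module: it assumes the relative depth differs from metric depth by a global scale and shift, fits the scale on the sparse points, then solves for a dense map whose finite-difference gradients match the scaled relative depth while being softly pinned to the sparse metric values. The names `fit_scale_shift` and `poisson_init` and the weight `lam` are hypothetical, and the dense solver is only suitable for tiny grids.

```python
import numpy as np

def fit_scale_shift(rel, sparse, mask):
    """Least-squares s, t such that s * rel + t matches sparse on observed pixels."""
    A = np.stack([rel[mask], np.ones(mask.sum())], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, sparse[mask], rcond=None)
    return s, t

def poisson_init(rel, sparse, mask, lam=10.0):
    """Toy gradient-domain fill on an (H, W) grid.

    Minimizes sum |grad D - grad (s * rel)|^2 + lam * sum_mask (D - sparse)^2.
    """
    h, w = rel.shape
    n = h * w
    s, _ = fit_scale_shift(rel, sparse, mask)
    guide = (s * rel).ravel()          # guidance field whose gradients we copy
    idx = np.arange(n).reshape(h, w)   # flat index of every pixel
    rows, rhs = [], []
    # One equation per horizontal and vertical neighbour pair: D[j] - D[i] = guide[j] - guide[i]
    pairs = [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]
    for a, b in pairs:
        for i, j in zip(a.ravel(), b.ravel()):
            row = np.zeros(n)
            row[i], row[j] = -1.0, 1.0
            rows.append(row)
            rhs.append(guide[j] - guide[i])
    # Soft data term pinning the solution to the sparse metric observations
    wgt = np.sqrt(lam)
    for i in idx[mask]:
        row = np.zeros(n)
        row[i] = wgt
        rows.append(row)
        rhs.append(wgt * sparse.ravel()[i])
    d, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return d.reshape(h, w)
```

For realistic image sizes the same system would normally be assembled as a sparse matrix and handed to a sparse or multigrid solver; the dense formulation here is chosen only to keep the example short.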

7 retrieved papers
State-of-the-art zero-shot depth completion framework

The authors present a simple yet effective framework that achieves superior zero-shot performance across multiple benchmarks and varying sparsity patterns in both depth completion and point map estimation, demonstrating strong generalization and robustness without relying on complex architectural designs.

10 retrieved papers

