Light of Normals: Unified Feature Representation for Universal Photometric Stereo

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: photometric stereo; normal estimation; register; ViT
Abstract:

Universal photometric stereo (PS) is defined by two requirements: it must (i) operate under arbitrary, unknown lighting conditions and (ii) avoid reliance on specific illumination models. Despite progress (e.g., SDM-UniPS), two challenges remain. First, current encoders cannot guarantee that illumination and normal information are decoupled. To enforce decoupling, we introduce LINO UniPS with two key components: (i) Light Register Tokens with light alignment supervision that aggregate point, directional, and environment lighting; (ii) an Interleaved Attention Block featuring global cross-image attention that processes all lighting conditions jointly, so the encoder can factor out lighting while retaining normal-related evidence. Second, high-frequency geometric details are easily lost. We address this with (i) a Wavelet-based Dual-branch Architecture and (ii) a Normal-gradient Perception Loss. These techniques yield a unified feature space in which lighting is explicitly represented by register tokens, while normal details are preserved via the wavelet branch. We further introduce PS-Verse, a large-scale synthetic dataset graded by geometric complexity and lighting diversity, and adopt curriculum training from simple to complex scenes. Extensive experiments show new state-of-the-art results on public benchmarks (e.g., DiLiGenT, LUCES), stronger generalization to real materials, and improved efficiency; ablations confirm that Light Register Tokens and the Interleaved Attention Block improve feature decoupling, while the Wavelet-based Dual-branch Architecture and Normal-gradient Perception Loss recover finer details.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes LINO UniPS, a universal photometric stereo method that decouples lighting from normal information using Light Register Tokens and Interleaved Attention Blocks, while preserving high-frequency geometric details via a wavelet-based dual-branch architecture. It is assigned to the 'Universal Photometric Stereo with Unified Feature Representations' leaf, which contains no other papers in the taxonomy. This places the work in a relatively sparse research direction within the broader surface-based photometric stereo landscape and suggests that few retrieved works combine unified feature learning with explicit lighting decoupling under arbitrary unknown illumination.

The taxonomy reveals several neighboring directions that address related but distinct challenges. The sibling leaf 'Diffusion-Based Multi-Light Synthesis' explores auxiliary lighting generation, while 'Shadow-Aware and Reflectance-Agnostic Normal Estimation' focuses on explicit shadow modeling or avoiding reflectance disentanglement. Nearby branches include volumetric neural rendering approaches and single-view inverse rendering methods. The paper's emphasis on unified feature spaces that factor out lighting while retaining normal evidence distinguishes it from shadow-centric methods and positions it between classical surface-based photometric stereo and modern neural inverse rendering paradigms.

Thirteen candidate papers were examined in total. For the wavelet-based architecture contribution, one of ten candidates was judged refutable, indicating some prior work on frequency-domain processing for preserving geometric detail. For the Light Register Tokens and Interleaved Attention Block contribution, no candidates were retrieved, which suggests either limited semantic overlap in the search or a genuinely novel architectural design. For the PS-Verse dataset contribution, none of the three candidates examined was refutable, although the small search scope leaves dataset novelty uncertain. The analysis covers only the top-K semantic matches and does not claim exhaustive coverage of relevant prior work.
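
The per-contribution figures reported above can be checked against the headline totals with a few lines of Python. The dictionary keys are shorthand labels introduced here for illustration; the counts are copied from this report.

```python
# (candidates examined, refutable candidates) per claimed contribution, as reported here.
counts = {
    "Light Register Tokens + Interleaved Attention Block": (0, 0),
    "Wavelet Dual-branch + Normal-gradient Perception Loss": (10, 1),
    "PS-Verse + curriculum training": (3, 0),
}

total_candidates = sum(examined for examined, _ in counts.values())
total_refutable = sum(refutable for _, refutable in counts.values())
print(total_candidates, total_refutable)  # 13 1
```

Both totals match the summary figures of 13 compared candidates and 1 refutable paper.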

Given the sparse taxonomy leaf and limited search scope of thirteen candidates, the work appears to occupy a relatively underexplored niche combining explicit lighting decoupling with wavelet-based detail preservation. However, the single refutable candidate for the wavelet contribution and the constrained search scale suggest caution in drawing definitive novelty conclusions. A broader literature review covering more candidates and adjacent research areas would strengthen confidence in the originality assessment.

Taxonomy

Core-task Taxonomy Papers: 25
Claimed Contributions: 3
Contribution Candidate Papers Compared: 13
Refutable Paper: 1

Research Landscape Overview

Core task: surface normal estimation from multi-light images under unknown illumination. The field divides into several major branches that reflect different modeling philosophies and application contexts. Neural inverse rendering with volumetric representations (e.g., TensoIR[1], PS-NeRF[5]) uses implicit scene encodings to jointly recover geometry and material properties, often targeting full 3D reconstruction. Surface-based photometric stereo under unknown lighting focuses more directly on normal estimation by modeling reflectance and light transport without requiring calibrated sources; it spans classical formulations (General Unknown Lighting[9]) and modern learning-based approaches (Deep Photometric Stereo[14], SPLINE-Net[7]). Single-view inverse rendering and scene-level reconstruction methods (NeRFactor[13], Outdoor Inverse Rendering[11]) tackle broader decomposition problems, while multi-view stereo with illumination fusion (Fusing Multiview Photometric[22]) integrates geometric cues across viewpoints. Calibration and system-design branches address hardware setups (Automating RTI[8], Virtual Multiillumination Dome[17]), and application-specific analyses explore domains such as fingerprint imaging (Multilight Fingerprint[12]) and relightable human capture (Relightable Human Performances[15], Relightable Neural Human[24]).

A particularly active line of work centers on universal or unified feature representations that handle diverse materials and lighting conditions without per-scene calibration. Light of Normals[0] sits within this branch, emphasizing a unified framework that generalizes across surface types under unknown illumination. This contrasts with methods such as DeepShaRM[3], which explicitly models shadow and interreflection effects, and RANA[6], which focuses on robust aggregation of multi-light observations. Meanwhile, approaches such as Shadow-aware Photometric[10] and Attached Shadow Coding[4] tackle specific challenges such as cast shadows, and recent works (GS-I3[16], Neural LightRig[2]) explore hybrid representations that blend volumetric and surface-based reasoning. The central tension across these directions is balancing model expressiveness (capturing complex light transport and material variation) against practical generalization when lighting is uncalibrated, a challenge that Light of Normals[0] addresses through its unified feature design.

Claimed Contributions

Light Register Tokens with Light Alignment supervision and Interleaved Attention Block

The authors propose specialized learnable tokens (Point, Direction, Env) supervised by an explicit Light Alignment loss to aggregate global illumination information, combined with an Interleaved Attention Block featuring global cross-image attention. This design enables the encoder to decouple lighting from surface normals and produce a unified feature representation.
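
To make the mechanism concrete, here is a minimal NumPy sketch, assuming single-head attention, random (untrained) weights, and one hypothetical register token per light type standing in for the Point, Direction, and Env tokens. The helper names (`add_register_tokens`, `interleaved_block`) are illustrative rather than the authors' code, and the Light Alignment loss that supervises the register tokens is not modeled. The block alternates frame-wise attention, where tokens attend only within their own image, with global attention over the tokens of all images, which is what lets lighting evidence be compared across inputs.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the token axis (-2)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def add_register_tokens(patches, registers):
    """Prepend the same register tokens to every image.
    patches: (num_images, num_patches, dim); registers: (num_registers, dim)."""
    regs = np.broadcast_to(registers, (patches.shape[0],) + registers.shape)
    return np.concatenate([regs, patches], axis=1)

def interleaved_block(tokens, frame_w, global_w):
    """Hypothetical interleaved block: frame-wise attention, then global
    cross-image attention, each wrapped in a residual connection."""
    n, t, d = tokens.shape
    tokens = tokens + attention(tokens, *frame_w)  # batched: one sequence per image
    flat = tokens.reshape(1, n * t, d)             # one sequence spanning all images
    flat = flat + attention(flat, *global_w)
    return flat.reshape(n, t, d)

rng = np.random.default_rng(0)
num_images, num_patches, dim, num_registers = 4, 16, 32, 3

def make_weights():
    return tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

patches = rng.normal(size=(num_images, num_patches, dim))
registers = rng.normal(size=(num_registers, dim))  # stand-ins for Point/Direction/Env
frame_w, global_w = make_weights(), make_weights()

tokens = add_register_tokens(patches, registers)
out = interleaved_block(tokens, frame_w, global_w)
light_tokens = out[:, :num_registers]   # per-image lighting summary
normal_tokens = out[:, num_registers:]  # patch features passed on to a normal decoder
```

Because the global step attends over `num_images * (num_registers + num_patches)` tokens at once, its cost grows quadratically with the number of input images, which is a practical consideration for any cross-image attention design.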

0 retrieved papers

Wavelet-based Dual-branch Architecture and Normal-gradient Perception Loss

The authors introduce a dual-branch architecture using discrete wavelet transform to preserve high-frequency information during feature extraction, paired with a confidence-weighted loss that emphasizes errors in high-frequency regions. These components work together to recover fine-scale geometric details that are typically lost in conventional up/downsampling operations.
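
The frequency argument can be illustrated with a one-level 2D Haar transform. This is a minimal sketch assuming the Haar wavelet (the report does not say which wavelet is used). Unlike strided downsampling, the transform splits a map into a low-pass band (LL) and three detail bands that could be routed to separate branches and then inverted exactly. The second half sketches a gradient-weighted angular loss; the report describes the authors' loss as confidence-weighted, so the weighting used here (normalized gradient magnitude of the ground-truth normals) is an assumption, and `haar_dwt2`, `haar_idwt2`, `normal_gradient_weights`, and `weighted_angular_loss` are hypothetical names.

```python
import numpy as np

def haar_dwt2(x):
    """One-level orthonormal 2D Haar DWT of an (H, W) array with even H and W.
    Returns the low-pass band LL and the detail bands (LH, HL, HH)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2
    lh = (a - b + c - d) / 2  # left-right differences (responds to vertical edges)
    hl = (a + b - c - d) / 2  # top-bottom differences (responds to horizontal edges)
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def normal_gradient_weights(normals, alpha=1.0):
    """Per-pixel weights 1 + alpha * |grad n| / max|grad n| for (H, W, 3) normals."""
    gy = np.diff(normals, axis=0, append=normals[-1:])  # forward differences, last row 0
    gx = np.diff(normals, axis=1, append=normals[:, -1:])
    mag = np.sqrt((gx ** 2).sum(-1) + (gy ** 2).sum(-1))
    return 1.0 + alpha * mag / (mag.max() + 1e-8)

def weighted_angular_loss(pred, gt, weights):
    """Weighted mean of (1 - cosine similarity) between unit normal maps."""
    cos = np.clip((pred * gt).sum(-1), -1.0, 1.0)
    return float((weights * (1.0 - cos)).sum() / weights.sum())

# Exact reconstruction versus lossy strided downsampling on a random feature map.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8))
ll, details = haar_dwt2(feat)
recon = haar_idwt2(ll, details)
naive = np.repeat(np.repeat(feat[::2, ::2], 2, axis=0), 2, axis=1)

# A normal map with a crease between column 3 and column 4.
gt = np.zeros((8, 8, 3))
gt[..., 2] = 1.0
gt[:, 4:] = [0.6, 0.0, 0.8]
w = normal_gradient_weights(gt)
tilted = np.array([0.6, 0.0, 0.8])
pred_edge, pred_flat = gt.copy(), gt.copy()
pred_edge[0, 3] = tilted  # same angular error, placed on the crease
pred_flat[0, 0] = tilted  # same angular error, placed in a flat region
loss_edge = weighted_angular_loss(pred_edge, gt, w)
loss_flat = weighted_angular_loss(pred_flat, gt, w)
```

Routing `ll` to a standard branch and the detail bands to a high-frequency branch is one way to read "dual-branch"; the sketch only shows that the split itself discards nothing, whereas the strided version cannot be undone, and that the same angular error costs more on a crease than on a flat surface.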

10 retrieved papers
Can Refute

PS-Verse dataset with curriculum training strategy

The authors construct a large-scale synthetic dataset containing 100,000 scenes with 17,805 textured 3D models, graded by geometric complexity (four levels plus normal mapping) and lighting diversity. They employ curriculum learning that progresses from simple to complex scenes, enhancing model robustness and generalization under challenging real-world lighting conditions.
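
A curriculum of this kind can be sketched as a pacing rule that gradually unlocks harder scenes. This is a minimal sketch under stated assumptions: the `Scene` record, the five-tier grading (four geometry levels plus a normal-mapped tier), and the linear, cumulative schedule are all hypothetical; the report does not specify the authors' pacing.

```python
from dataclasses import dataclass

NUM_LEVELS = 5  # assumption: four geometric-complexity levels plus a normal-mapped tier

@dataclass(frozen=True)
class Scene:
    scene_id: int
    level: int  # hypothetical complexity grade: 1 (simplest) to NUM_LEVELS

def max_level_for_epoch(epoch, total_epochs, num_levels=NUM_LEVELS):
    """Linear pacing: unlock one more level every total_epochs / num_levels epochs."""
    return min(epoch * num_levels // total_epochs + 1, num_levels)

def curriculum_pool(scenes, epoch, total_epochs):
    """Cumulative curriculum: easier scenes stay in the pool as harder ones unlock."""
    limit = max_level_for_epoch(epoch, total_epochs)
    return [s for s in scenes if s.level <= limit]

scenes = [Scene(i, i % NUM_LEVELS + 1) for i in range(20)]
pool_sizes = [len(curriculum_pool(scenes, e, total_epochs=10)) for e in range(10)]
# pool_sizes → [4, 4, 8, 8, 12, 12, 16, 16, 20, 20]
```

Keeping easier scenes in the pool (rather than replacing them) is one common design choice that guards against forgetting simple cases; a replacing schedule would be an equally valid reading of "simple to complex".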

3 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though that signal is constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Light Register Tokens with Light Alignment supervision and Interleaved Attention Block

Contribution

Wavelet-based Dual-branch Architecture and Normal-gradient Perception Loss

Contribution

PS-Verse dataset with curriculum training strategy