What Lies Beyond the View? Actively Constructing Spatial Beliefs in Foundation Models

ICLR 2026 Conference Submission, Anonymous Authors

OpenReview Score: 6.0

Keywords: Large Language Model, Vision-Language Model, Spatial Reasoning, Spatial Agent, Active Exploration

Abstract:

Current foundation models can answer spatial reasoning questions about a given image or text, yet they lack the fundamental ability to build a genuine spatial understanding of an environment through active exploration. This reflects a critical blind spot in prevailing evaluation protocols, which predominantly test passive reasoning on curated data rather than the active construction of knowledge under uncertainty. To address this, we introduce Theory of Space (ToS), a new framework analogous to the Theory of Mind. While Theory of Mind concerns an agent's ability to model the hidden mental states of others, ToS concerns its ability to construct, update, and utilize an internal belief about the unobserved structure of its spatial environment from local, incomplete observations. We implement ToS with a comprehensive benchmark featuring both text-based and visual environments. Instead of performing specific tasks in such environments, the primary objective is to build a complete and accurate spatial belief through curiosity-driven exploration. A core innovation of our framework is the direct probing of this internal belief: we prompt models to explicitly present their cognitive map at each step, allowing us to measure not only task performance but also the quality, consistency, and evolution of the underlying spatial model itself. By evaluating state-of-the-art models as both active explorers and passive reasoners (using logs from scripted proxy agents), we disentangle exploration strategy from reasoning ability. Our analysis reveals common failure modes in spatial belief management, such as egomotion update errors and the inability to maintain a globally consistent map. The ToS framework provides the concepts and tools necessary to evaluate and build agents with more robust, human-like spatial intelligence.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Theory of Space (ToS), a framework for evaluating how foundation models actively construct spatial beliefs through exploration, analogous to Theory of Mind for mental state modeling. It resides in the 'Theory-Driven Spatial Belief Frameworks' leaf, which contains only three papers total, including this work and two siblings. This represents a notably sparse research direction within the broader taxonomy of 48 papers across the field, suggesting the paper targets a relatively underexplored conceptual niche focused on principled frameworks for spatial belief construction rather than task-specific navigation or perception methods.

The taxonomy shows that neighboring research directions are similarly small or only slightly larger: 'Memory-Augmented Spatial Reasoning' (3 papers), 'Foundation Model-Guided Exploration' (4 papers), and 'Zero-Shot Object Navigation' (4 papers) all address related but distinct aspects of spatial intelligence. The sibling papers in the same leaf, Spatial Schema Intuitions and Adaptive World Models, examine cognitive primitives and dynamic model adaptation respectively, whereas ToS emphasizes the active exploration process itself. The taxonomy's scope and exclusion notes clarify that ToS belongs here because it proposes a theoretical evaluation framework rather than applying existing methods to specific tasks, which distinguishes it from application-oriented categories.

Among 30 candidates examined through semantic search, the contribution-level analysis shows varied novelty profiles. The ToS framework itself (10 candidates examined, 0 refutable) and the comprehensive benchmark (10 candidates examined, 0 refutable) appear to have limited direct prior work within the search scope. However, the direct probing mechanism for internal spatial beliefs (10 candidates examined, 1 refutable) shows at least one candidate providing overlapping prior work. This suggests that while the overarching framework may be relatively novel, specific technical components like belief probing have some precedent in the examined literature, though the limited search scale means substantial related work could exist beyond these 30 candidates.
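
The per-contribution counts in the paragraph above can be tallied in a few lines. This is an illustrative sketch only: the dictionary keys and the `summarize` helper are names introduced here and are not part of the report's pipeline; the numbers are copied from the paragraph above.

```python
# Counts copied from the paragraph above; all names here are hypothetical.
candidates = {
    "ToS framework": {"examined": 10, "refutable": 0},
    "Benchmark": {"examined": 10, "refutable": 0},
    "Belief probing": {"examined": 10, "refutable": 1},
}


def summarize(counts):
    """Return (total examined, total refutable, contributions with a refutable candidate)."""
    total = sum(c["examined"] for c in counts.values())
    refutable = sum(c["refutable"] for c in counts.values())
    flagged = [name for name, c in counts.items() if c["refutable"] > 0]
    return total, refutable, flagged


total, refutable, flagged = summarize(candidates)
print(f"{refutable} of {total} candidates refutable; flagged: {flagged}")
```

The tally makes the scale of the evidence explicit: a single overlapping candidate among 30 is the entire basis for the "some precedent" reading of the probing contribution.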

Given the sparse taxonomy leaf and limited search scope, the work appears to occupy a conceptual space with relatively few direct competitors among the examined papers. The framework's emphasis on curiosity-driven exploration and explicit cognitive map probing distinguishes it from task-oriented navigation benchmarks, though the analysis acknowledges it covers only top-30 semantic matches and does not claim exhaustive coverage of all potentially relevant spatial reasoning literature.

Taxonomy

Research Landscape Overview

Core task: Actively constructing spatial beliefs through exploration in foundation models. The field encompasses a diverse set of approaches that address how agents build, maintain, and reason about spatial knowledge in embodied settings. At the highest level, the taxonomy distinguishes between works focused on spatial belief construction and cognitive mapping (which develop explicit or implicit representations of environments), exploration strategies and active perception (which determine how agents gather spatial information), embodied navigation and object search (which apply spatial reasoning to goal-directed tasks), reinforcement learning and interactive decision-making (which learn policies through environmental interaction), benchmarks and evaluation frameworks (which standardize assessment), foundation models for robotics and embodied AI (which leverage large pretrained models for spatial tasks), domain-specific and applied spatial systems (which target particular application areas), and cognitive and learning sciences perspectives (which draw on human spatial cognition research).

Representative works such as SpatialVLA[1] and Embodied-r[3] illustrate how foundation models are being adapted to spatial reasoning, while Voronav[2] and SSR-ZSON[8] exemplify navigation-centric approaches. A particularly active line of work centers on how agents should balance exploration with exploitation when spatial knowledge is incomplete or uncertain, as seen in Explore Until Confident[6] and Adaptive World Models[13]. Another contrasting theme involves whether to rely on end-to-end learned representations versus structured symbolic or schema-based spatial models, a tension visible across Foundation Models Hypothesis Testing[7] and Spatial Schema Intuitions[4].

Beyond the View[0] sits within the theory-driven spatial belief frameworks cluster, emphasizing principled mechanisms for belief updating during exploration. Compared to Spatial Schema Intuitions[4], which examines cognitive primitives for spatial understanding, and Adaptive World Models[13], which focuses on dynamic model adaptation, Beyond the View[0] appears to prioritize the active construction process itself: how agents iteratively refine spatial hypotheses by strategically choosing where to look next. This positioning highlights ongoing questions about the interplay between model architecture, exploration policy, and the granularity of spatial representations needed for robust embodied intelligence.

Claimed Contributions

Theory of Space (ToS) framework

10 retrieved papers

The authors propose ToS as a conceptual framework for evaluating an agent's ability to actively construct, update, and utilize an internal spatial belief from partial observations. Unlike Theory of Mind, which models hidden mental states of others, ToS models the uncertain, unobserved structure of physical space through curiosity-driven exploration.


Comprehensive benchmark for active spatial belief construction

10 retrieved papers

The authors develop a benchmark that evaluates agents through active exploration in procedurally generated multi-room environments. The benchmark includes both text-based and vision-based modalities, scripted proxy agents for disentangling exploration from reasoning, and a suite of spatial cognition tasks covering route and survey knowledge.


Direct probing mechanism for internal spatial beliefs

Can Refute

10 retrieved papers

The authors introduce a method to explicitly probe the agent's internal spatial representation by requiring it to externalize its cognitive map at any exploration step. This allows measurement of not only task performance but also the quality, consistency, and evolution of the underlying spatial model itself, moving beyond black-box evaluation.
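
To make "quality" and "evolution" of a probed map concrete, one option is to score each externalized map against the ground-truth layout. The sketch below is an assumption-laden illustration, not the authors' metric: the report does not say how maps are represented or scored, so the 2-D coordinate dictionaries, the `map_quality` function, the three scores, and the `tol` threshold are all hypothetical.

```python
import math


def map_quality(believed, truth, tol=1.0):
    """Score a probed cognitive map against ground truth (hypothetical metric).

    believed, truth: dict mapping object name -> (x, y) position.
    Returns a tuple of:
      coverage  - fraction of true objects that appear in the belief,
      mean_err  - mean Euclidean error over objects present in both maps,
      accurate  - fraction of true objects placed within `tol` of the truth.
    """
    shared = [name for name in truth if name in believed]
    coverage = len(shared) / len(truth) if truth else 0.0
    errors = [math.dist(believed[name], truth[name]) for name in shared]
    mean_err = sum(errors) / len(errors) if errors else float("nan")
    accurate = sum(e <= tol for e in errors) / len(truth) if truth else 0.0
    return coverage, mean_err, accurate


# Toy layout and two probes taken at different exploration steps (made-up data).
truth = {"sofa": (0, 0), "lamp": (3, 4), "desk": (6, 0)}
step_1 = {"sofa": (0, 0)}
step_5 = {"sofa": (0, 0), "lamp": (3, 1), "desk": (6, 0)}

for step, belief in [(1, step_1), (5, step_5)]:
    print(step, map_quality(belief, truth))
```

Scoring every probe in sequence gives a trajectory rather than a single end-of-episode number, which is what lets an evaluator separate a map that is incomplete (low coverage) from one that is complete but misplaced (high coverage, high error).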


Core Task Comparisons

Comparisons with papers in the same taxonomy category

[4] Exploring Spatial Schema Intuitions in Large Language and Vision Models

Wicke, Philipp (2024)

[13] Assessing adaptive world models in machines with novel games

Ying, Lance; Collins, Katherine M.; Sharma, Prafull; Colas, Cédric; Zhao, Kaiya Ivy; Weller, Adrian; Tavares, Zenna; Isola, Phillip; Gershman, Samuel J.; Andreas, Jacob D.; Griffiths, Thomas L.; Chollet, François; Allen, Kelsey R.; Tenenbaum, Joshua B. (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Theory of Space (ToS) framework

[10] Seeing is Believing: Belief-Space Planning with Foundation Models as Uncertainty Estimators

Cannot Refute

[69] CogniPlan: Uncertainty-Guided Path Planning with Conditional Generative Layout Prediction

Cannot Refute

[70] An exploration of embodied visual exploration

Cannot Refute

[71] Cognitive mapping and planning for visual navigation

Cannot Refute

[72] Exploration patterns shape cognitive map learning

Cannot Refute

[73] The Spread of Beliefs in Partially Modularized Communities

Cannot Refute

[74] COVID-19 in Toronto: A Spatial Exploratory Analysis

Cannot Refute

[75] Exploratory spatial analysis

Cannot Refute

[76] Qualitative spatial representations from task-oriented perception and exploratory behaviors

Cannot Refute

[77] Behavior determines the hippocampal spatial mapping of a multisensory environment

Cannot Refute

Contribution: Comprehensive benchmark for active spatial belief construction

[49] Openeqa: Embodied question answering in the era of foundation models

Cannot Refute

[50] Soundspaces: Audio-visual navigation in 3d environments

Cannot Refute

[51] SELM: From Efficient Autonomous Exploration to Long-term Monitoring in Semantic Level

Cannot Refute

[52] Robocas: A benchmark for robotic manipulation in complex object arrangement scenarios

Cannot Refute

[53] Noveld: A simple yet effective exploration criterion

Cannot Refute

[54] CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space

Cannot Refute

[55] SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning

Cannot Refute

[56] EscapeCraft: A 3D Room Escape Environment for Benchmarking Complex Multimodal Reasoning Ability

Cannot Refute

[57] 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model

Cannot Refute

[58] VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms

Cannot Refute

Contribution: Direct probing mechanism for internal spatial beliefs

[63] Spatial mental modeling from limited views

Can Refute

[59] Thinking in space: How multimodal large language models see, remember, and recall spaces

Cannot Refute

[60] Learning place cells and remapping by decoding the cognitive map

Cannot Refute

[61] From reactive to cognitive: brain-inspired spatial intelligence for embodied agents

Cannot Refute

[62] VResin: Externalizing spatial memory into 3D sketch maps

Cannot Refute

[64] Probing for consciousness in machines

Cannot Refute

[65] GeoAI: spatially explicit artificial intelligence techniques for geographic knowledge discovery and beyond

Cannot Refute

[66] From geometry to behavior: An introduction to spatial cognition

Cannot Refute

[67] Mind meets space: Rethinking agentic spatial intelligence from a neuroscience-inspired perspective

Cannot Refute

[68] Probing mental representations of space through sketch mapping: a scoping review

Cannot Refute
