Building spatial world models from sparse transitional episodic memories

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Spatial Representations, World Models, Episodic Memory Models, Transformers, Navigation
Abstract:

Many animals possess a remarkable capacity to rapidly construct flexible cognitive maps of their environments. These maps are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. Existing computational models typically require long sequential trajectories to build accurate maps, but neuroscience evidence suggests maps can also arise from integrating disjoint experiences governed by consistent spatial rules. We introduce the Episodic Spatial World Model (ESWM), a novel framework that constructs spatial maps from sparse, disjoint episodic memories. Across environments of varying complexity, ESWM predicts unobserved transitions from minimal experience, and the geometry of its latent space aligns with that of the environment. Because it operates on episodic memories that can be independently stored and updated, ESWM is inherently adaptive, enabling rapid adjustment to environmental changes. Furthermore, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training. Our work demonstrates how neuroscience-inspired principles of episodic memory can advance the development of more flexible and generalizable world models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Episodic Spatial World Model (ESWM), a framework that constructs spatial maps from sparse, disjoint episodic memories rather than long sequential trajectories. It resides in the 'Unified Spatial-Episodic Memory Architectures' leaf, which contains five papers total (including the original). This leaf sits within the broader 'Computational Models of Spatial and Episodic Memory' branch, indicating a moderately populated research direction focused on integrating spatial navigation and episodic memory within single computational frameworks. The taxonomy shows this is an active but not overcrowded area, with sibling papers exploring similar integration challenges using attractor networks and factorized representations.

The taxonomy reveals neighboring research directions that contextualize ESWM's position. Adjacent leaves include 'Episodic Memory Encoding and Retrieval Models' (three papers on temporal indexing and mental models) and 'Spatial Navigation and Cognitive Mapping Models' (four papers on allocentric/egocentric representations). The exclude_note for the original leaf clarifies that models focusing exclusively on spatial or episodic aspects belong elsewhere, positioning ESWM as explicitly bridging both domains. Nearby AI Architectures branches explore reinforcement learning with episodic memory (four papers) and world models for sequential decision-making (three papers), suggesting ESWM connects computational modeling with practical AI implementation concerns.

Across three core contributions, the literature search examined thirty candidate papers total, finding zero refutable pairs. For the ESWM framework itself, ten candidates were examined with none providing clear refutation. Similarly, the geometric latent space contribution and zero-shot navigation capabilities each had ten candidates examined, again with no refutations identified. This suggests that among the limited top-thirty semantic matches explored, no prior work directly overlaps with ESWM's specific combination of sparse episodic integration, geometric alignment, and zero-shot task transfer. However, the modest search scope means more comprehensive surveys might reveal closer precedents in the broader literature.

Given the limited thirty-candidate search, ESWM appears to occupy a distinctive position within its moderately populated research area. The absence of refutations across all contributions, combined with the taxonomy showing only four sibling papers in the same leaf, suggests the work explores a relatively underexplored combination of episodic sparsity and spatial geometry. However, the analysis explicitly does not cover exhaustive literature review, and the taxonomy's fifty total papers indicate substantial related work exists across neighboring branches that may inform assessments of incremental versus transformative novelty.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: constructing spatial world models from sparse episodic memories. This field bridges neuroscience, cognitive science, and artificial intelligence to understand how agents—biological or artificial—build coherent representations of space from fragmentary experiences. The taxonomy reflects a multifaceted landscape: Neuroscience Foundations examine the neural substrates of place cells, grid cells, and hippocampal remapping that underpin spatial coding; Computational Models translate these insights into algorithmic frameworks for memory consolidation and retrieval; AI Architectures explore how modern machine learning systems can leverage episodic buffers and world models for navigation and planning; Human Behavioral Studies probe the cognitive strategies people use to integrate spatial context with event memory; and Theoretical Perspectives seek unifying principles that link these domains. Representative works such as Unifying Spatial Episodic[5] and Prestructured Spatial Representations[7] illustrate efforts to merge spatial and episodic streams into coherent architectures, while studies like Dynamic Neural Navigation[3] and Planning from Imagination[2] demonstrate how agents can exploit sparse memories for flexible decision-making.

Several active lines of work highlight key trade-offs and open questions. One strand focuses on how episodic retrieval mechanisms—ranging from non-Hebbian codes (Non-Hebbian Episodic Code[1]) to agentic control strategies (Agentic Episodic Control[4])—can scaffold spatial reasoning when observations are incomplete. Another explores the role of prestructured representations versus learned world models, debating whether spatial scaffolds (Spatial Scaffolds[21]) or latent generative models (Latent World Models[11]) better capture the flexibility of human-like memory.
Spatial World Models[0] sits within the Unified Spatial-Episodic Memory Architectures branch, closely aligned with Unifying Spatial Episodic[5] in its emphasis on integrating sparse episodic snapshots into a coherent spatial framework. Compared to Prestructured Spatial Representations[7], which assumes innate geometric priors, Spatial World Models[0] appears to prioritize learning from episodic experience, reflecting a more constructivist stance on how world models emerge from memory.

Claimed Contributions

Episodic Spatial World Model (ESWM) framework

The authors propose ESWM, a neural network framework that builds coherent spatial world models by integrating sparse, disjoint one-step transitions (episodic memories) rather than requiring long sequential trajectories. The model meta-learns to predict missing components of unseen transitions given a memory bank of disjoint experiences.
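The core idea—completing an unseen transition by integrating disjoint memories that obey consistent spatial rules—can be illustrated with a deliberately simple sketch. This is a hypothetical toy, not the authors' meta-learned neural architecture: it estimates each action's displacement rule from a sparse bank of one-step transitions and uses it to predict the missing next-state of a query transition.

```python
from collections import defaultdict

# Toy illustration (hypothetical, not ESWM's transformer-based model):
# a memory bank of disjoint (state, action, next_state) triples on a grid.
def learn_action_rules(memory_bank):
    """Average the (dx, dy) displacement observed for each action."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # action -> [sum_dx, sum_dy, n]
    for s, a, s2 in memory_bank:
        sums[a][0] += s2[0] - s[0]
        sums[a][1] += s2[1] - s[1]
        sums[a][2] += 1
    return {a: (dx / n, dy / n) for a, (dx, dy, n) in sums.items()}

def predict_next_state(memory_bank, state, action):
    """Predict s' for a (state, action) pair never observed together."""
    dx, dy = learn_action_rules(memory_bank)[action]
    return (state[0] + round(dx), state[1] + round(dy))

# Sparse, disjoint memories from a 4x4 grid; no trajectory connects them.
bank = [((0, 0), "right", (1, 0)),
        ((2, 3), "right", (3, 3)),
        ((1, 1), "up",    (1, 2))]

print(predict_next_state(bank, (2, 0), "right"))  # -> (3, 0)
```

The point of the sketch is only that a consistent transition rule recovered from disjoint episodes suffices to fill in transitions never experienced; ESWM's stated contribution is learning such integration end to end over a memory bank.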

10 retrieved papers
Geometric latent space reflecting environment topology

ESWM's internal representations form a geometric map that mirrors the spatial layout of environments, including obstacles and boundaries. This structured latent space emerges without explicit supervision for spatial mapping and dynamically adapts when new memories are added or environmental structures change.
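A claim of this kind is typically quantified by comparing pairwise distances in latent space against true spatial distances. The sketch below is ours, not the paper's analysis pipeline: it simulates latents as a rotated, scaled, noisy copy of 2D positions and measures rank-correlation alignment between the two geometries.

```python
import numpy as np

def pairwise_dists(x):
    """Euclidean distance matrix for a (n, d) array of points."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def spearman(a, b):
    """Spearman rank correlation between two flat vectors (no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean(); rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

rng = np.random.default_rng(0)
positions = rng.uniform(0, 10, size=(40, 2))       # ground-truth layout
theta = 0.7                                        # arbitrary rotation angle
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
# Stand-in "latents": geometry-preserving transform plus small noise.
latents = 2.5 * positions @ rot.T + rng.normal(0, 0.1, size=(40, 2))

iu = np.triu_indices(40, k=1)                      # unique point pairs only
alignment = spearman(pairwise_dists(positions)[iu], pairwise_dists(latents)[iu])
print(round(alignment, 3))                         # close to 1.0
```

With real model latents in place of the simulated ones, a high value of this statistic is one concrete way to substantiate "the geometry of its latent space aligns with that of the environment."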

10 retrieved papers
Zero-shot exploration and navigation capabilities

The learned world model supports near-optimal exploration and navigation in novel environments without task-specific training. ESWM can autonomously explore unfamiliar spaces and plan paths between arbitrary locations using only its learned ability to integrate episodic memories.
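One way such zero-shot planning could be realized—again a sketch under our own assumptions, not the paper's planner—is to treat the transitions the model can recall or predict as a graph and run ordinary breadth-first search over it. Here the transition set comes from a 4x4 grid with one blocked cell.

```python
from collections import deque, defaultdict

def bfs_plan(transitions, start, goal):
    """Return a shortest action sequence from start to goal, or None."""
    adj = defaultdict(list)
    for s, a, s2 in transitions:
        adj[s].append((a, s2))
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:                      # walk parent links back to start
            path = []
            while parent[s] is not None:
                prev, a = parent[s]
                path.append(a)
                s = prev
            return path[::-1]
        for a, s2 in adj[s]:
            if s2 not in parent:
                parent[s2] = (s, a)
                frontier.append(s2)
    return None

# Hypothetical environment: 4x4 grid, cell (1, 1) is an obstacle.
moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
blocked = {(1, 1)}
transitions = [((x, y), a, (x + dx, y + dy))
               for x in range(4) for y in range(4)
               for a, (dx, dy) in moves.items()
               if (x, y) not in blocked
               and 0 <= x + dx < 4 and 0 <= y + dy < 4
               and (x + dx, y + dy) not in blocked]

plan = bfs_plan(transitions, (0, 0), (2, 2))
print(len(plan))  # -> 4 (shortest route around the obstacle)
```

The search itself needs no task-specific training; everything task-relevant lives in the transition set, which in ESWM's case would be supplied by the learned model's ability to complete transitions from episodic memories.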

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Episodic Spatial World Model (ESWM) framework

The authors propose ESWM, a neural network framework that builds coherent spatial world models by integrating sparse, disjoint one-step transitions (episodic memories) rather than requiring long sequential trajectories. The model meta-learns to predict missing components of unseen transitions given a memory bank of disjoint experiences.

Contribution

Geometric latent space reflecting environment topology

ESWM's internal representations form a geometric map that mirrors the spatial layout of environments, including obstacles and boundaries. This structured latent space emerges without explicit supervision for spatial mapping and dynamically adapts when new memories are added or environmental structures change.

Contribution

Zero-shot exploration and navigation capabilities

The learned world model supports near-optimal exploration and navigation in novel environments without task-specific training. ESWM can autonomously explore unfamiliar spaces and plan paths between arbitrary locations using only its learned ability to integrate episodic memories.