Building spatial world models from sparse transitional episodic memories

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Spatial Representations, World Models, Episodic Memory Models, Transformers, Navigation
Abstract:

Many animals possess a remarkable capacity to rapidly construct flexible cognitive maps of their environments. These maps are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. Existing computational models typically require long sequential trajectories to build accurate maps, but neuroscience evidence suggests maps can also arise from integrating disjoint experiences governed by consistent spatial rules. We introduce the Episodic Spatial World Model (ESWM), a novel framework that constructs spatial maps from sparse, disjoint episodic memories. Across environments of varying complexity, ESWM predicts unobserved transitions from minimal experience, and the geometry of its latent space aligns with that of the environment. Because it operates on episodic memories that can be independently stored and updated, ESWM is inherently adaptive, enabling rapid adjustment to environmental changes. Furthermore, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training. Our work demonstrates how neuroscience-inspired principles of episodic memory can advance the development of more flexible and generalizable world models.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces the Episodic Spatial World Model (ESWM), a framework that constructs spatial maps from sparse, disjoint episodic memories rather than long sequential trajectories. It resides in the 'Unified Spatial-Episodic Memory Architectures' leaf, which contains five papers total (including the original). This leaf sits within the broader 'Computational Models of Spatial and Episodic Memory' branch, indicating a moderately populated research direction focused on integrating spatial navigation and episodic memory within single computational frameworks. The taxonomy shows this is an active but not overcrowded area, with sibling papers exploring similar integration challenges using attractor networks and factorized representations.

The taxonomy reveals neighboring research directions that contextualize ESWM's position. Adjacent leaves include 'Episodic Memory Encoding and Retrieval Models' (three papers on temporal indexing and mental models) and 'Spatial Navigation and Cognitive Mapping Models' (four papers on allocentric/egocentric representations). The exclude_note for the original leaf clarifies that models focusing exclusively on spatial or episodic aspects belong elsewhere, positioning ESWM as explicitly bridging both domains. Nearby AI Architectures branches explore reinforcement learning with episodic memory (four papers) and world models for sequential decision-making (three papers), suggesting ESWM connects computational modeling with practical AI implementation concerns.

Across three core contributions, the literature search examined thirty candidate papers total, finding zero refutable pairs. For the ESWM framework itself, ten candidates were examined with none providing clear refutation. Similarly, the geometric latent space contribution and zero-shot navigation capabilities each had ten candidates examined, again with no refutations identified. This suggests that among the limited top-thirty semantic matches explored, no prior work directly overlaps with ESWM's specific combination of sparse episodic integration, geometric alignment, and zero-shot task transfer. However, the modest search scope means more comprehensive surveys might reveal closer precedents in the broader literature.

Given the limited thirty-candidate search, ESWM appears to occupy a distinctive position within its moderately populated research area. The absence of refutations across all contributions, combined with the taxonomy showing only four sibling papers in the same leaf, suggests the work explores a relatively underexplored combination of episodic sparsity and spatial geometry. However, the analysis explicitly does not cover exhaustive literature review, and the taxonomy's fifty total papers indicate substantial related work exists across neighboring branches that may inform assessments of incremental versus transformative novelty.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: constructing spatial world models from sparse episodic memories. This field bridges neuroscience, cognitive science, and artificial intelligence to understand how agents—biological or artificial—build coherent representations of space from fragmentary experiences. The taxonomy reflects a multifaceted landscape: Neuroscience Foundations examine the neural substrates of place cells, grid cells, and hippocampal remapping that underpin spatial coding; Computational Models translate these insights into algorithmic frameworks for memory consolidation and retrieval; AI Architectures explore how modern machine learning systems can leverage episodic buffers and world models for navigation and planning; Human Behavioral Studies probe the cognitive strategies people use to integrate spatial context with event memory; and Theoretical Perspectives seek unifying principles that link these domains. Representative works such as Unifying Spatial Episodic[5] and Prestructured Spatial Representations[7] illustrate efforts to merge spatial and episodic streams into coherent architectures, while studies like Dynamic Neural Navigation[3] and Planning from Imagination[2] demonstrate how agents can exploit sparse memories for flexible decision-making.

Several active lines of work highlight key trade-offs and open questions. One strand focuses on how episodic retrieval mechanisms—ranging from non-Hebbian codes (Non-Hebbian Episodic Code[1]) to agentic control strategies (Agentic Episodic Control[4])—can scaffold spatial reasoning when observations are incomplete. Another explores the role of prestructured representations versus learned world models, debating whether spatial scaffolds (Spatial Scaffolds[21]) or latent generative models (Latent World Models[11]) better capture the flexibility of human-like memory.
Spatial World Models[0] sits within the Unified Spatial-Episodic Memory Architectures branch, closely aligned with Unifying Spatial Episodic[5] in its emphasis on integrating sparse episodic snapshots into a coherent spatial framework. Compared to Prestructured Spatial Representations[7], which assumes innate geometric priors, Spatial World Models[0] appears to prioritize learning from episodic experience, reflecting a more constructivist stance on how world models emerge from memory.

Claimed Contributions

Episodic Spatial World Model (ESWM) framework

The authors propose ESWM, a neural network framework that builds coherent spatial world models by integrating sparse, disjoint one-step transitions (episodic memories) rather than requiring long sequential trajectories. The model meta-learns to predict missing components of unseen transitions given a memory bank of disjoint experiences.
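The core idea—completing an unseen transition by integrating disjoint memories that obey consistent spatial rules—can be illustrated with a deliberately simple sketch. This is a hypothetical toy, not the authors' meta-learned neural architecture: it estimates each action's displacement rule from a sparse bank of one-step transitions and uses it to predict the missing next-state of a query transition.

```python
from collections import defaultdict

# Toy illustration (hypothetical, not ESWM's transformer-based model):
# a memory bank of disjoint (state, action, next_state) triples on a grid.
def learn_action_rules(memory_bank):
    """Average the (dx, dy) displacement observed for each action."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # action -> [sum_dx, sum_dy, n]
    for s, a, s2 in memory_bank:
        sums[a][0] += s2[0] - s[0]
        sums[a][1] += s2[1] - s[1]
        sums[a][2] += 1
    return {a: (dx / n, dy / n) for a, (dx, dy, n) in sums.items()}

def predict_next_state(memory_bank, state, action):
    """Predict s' for a (state, action) pair never observed together."""
    dx, dy = learn_action_rules(memory_bank)[action]
    return (state[0] + round(dx), state[1] + round(dy))

# Sparse, disjoint memories from a 4x4 grid; no trajectory connects them.
bank = [((0, 0), "right", (1, 0)),
        ((2, 3), "right", (3, 3)),
        ((1, 1), "up",    (1, 2))]

print(predict_next_state(bank, (2, 0), "right"))  # -> (3, 0)
```

The point of the sketch is only that a consistent transition rule recovered from disjoint episodes suffices to fill in transitions never experienced; ESWM's stated contribution is learning such integration end to end over a memory bank.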

10 retrieved papers
Geometric latent space reflecting environment topology

ESWM's internal representations form a geometric map that mirrors the spatial layout of environments, including obstacles and boundaries. This structured latent space emerges without explicit supervision for spatial mapping and dynamically adapts when new memories are added or environmental structures change.
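A claim of this kind is typically quantified by comparing pairwise distances in latent space against true spatial distances. The sketch below is ours, not the paper's analysis pipeline: it simulates latents as a rotated, scaled, noisy copy of 2D positions and measures rank-correlation alignment between the two geometries.

```python
import numpy as np

def pairwise_dists(x):
    """Euclidean distance matrix for a (n, d) array of points."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def spearman(a, b):
    """Spearman rank correlation between two flat vectors (no ties)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean(); rb -= rb.mean()
    return float((ra * rb).sum() / np.sqrt((ra ** 2).sum() * (rb ** 2).sum()))

rng = np.random.default_rng(0)
positions = rng.uniform(0, 10, size=(40, 2))       # ground-truth layout
theta = 0.7                                        # arbitrary rotation angle
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
# Stand-in "latents": geometry-preserving transform plus small noise.
latents = 2.5 * positions @ rot.T + rng.normal(0, 0.1, size=(40, 2))

iu = np.triu_indices(40, k=1)                      # unique point pairs only
alignment = spearman(pairwise_dists(positions)[iu], pairwise_dists(latents)[iu])
print(round(alignment, 3))                         # close to 1.0
```

With real model latents in place of the simulated ones, a high value of this statistic is one concrete way to substantiate "the geometry of its latent space aligns with that of the environment."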

10 retrieved papers
Zero-shot exploration and navigation capabilities

The learned world model supports near-optimal exploration and navigation in novel environments without task-specific training. ESWM can autonomously explore unfamiliar spaces and plan paths between arbitrary locations using only its learned ability to integrate episodic memories.
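One way such zero-shot planning could be realized—again a sketch under our own assumptions, not the paper's planner—is to treat the transitions the model can recall or predict as a graph and run ordinary breadth-first search over it. Here the transition set comes from a 4x4 grid with one blocked cell.

```python
from collections import deque, defaultdict

def bfs_plan(transitions, start, goal):
    """Return a shortest action sequence from start to goal, or None."""
    adj = defaultdict(list)
    for s, a, s2 in transitions:
        adj[s].append((a, s2))
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:                      # walk parent links back to start
            path = []
            while parent[s] is not None:
                prev, a = parent[s]
                path.append(a)
                s = prev
            return path[::-1]
        for a, s2 in adj[s]:
            if s2 not in parent:
                parent[s2] = (s, a)
                frontier.append(s2)
    return None

# Hypothetical environment: 4x4 grid, cell (1, 1) is an obstacle.
moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
blocked = {(1, 1)}
transitions = [((x, y), a, (x + dx, y + dy))
               for x in range(4) for y in range(4)
               for a, (dx, dy) in moves.items()
               if (x, y) not in blocked
               and 0 <= x + dx < 4 and 0 <= y + dy < 4
               and (x + dx, y + dy) not in blocked]

plan = bfs_plan(transitions, (0, 0), (2, 2))
print(len(plan))  # -> 4 (shortest route around the obstacle)
```

The search itself needs no task-specific training; everything task-relevant lives in the transition set, which in ESWM's case would be supplied by the learned model's ability to complete transitions from episodic memories.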

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Episodic Spatial World Model (ESWM) framework

The authors propose ESWM, a neural network framework that builds coherent spatial world models by integrating sparse, disjoint one-step transitions (episodic memories) rather than requiring long sequential trajectories. The model meta-learns to predict missing components of unseen transitions given a memory bank of disjoint experiences.

Contribution

Geometric latent space reflecting environment topology

ESWM's internal representations form a geometric map that mirrors the spatial layout of environments, including obstacles and boundaries. This structured latent space emerges without explicit supervision for spatial mapping and dynamically adapts when new memories are added or environmental structures change.

Contribution

Zero-shot exploration and navigation capabilities

The learned world model supports near-optimal exploration and navigation in novel environments without task-specific training. ESWM can autonomously explore unfamiliar spaces and plan paths between arbitrary locations using only its learned ability to integrate episodic memories.