Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: mechanistic interpretability, reinforcement learning, Sokoban
Abstract:

We partially reverse-engineer a convolutional recurrent neural network (RNN) trained with model-free reinforcement learning to play the box-pushing game Sokoban. We find that the RNN stores future moves (plans) as activations in particular channels of its hidden state, which we call path channels. A high activation at a particular location means that, when a box is in that location, it will be pushed in the channel's assigned direction. We examine the convolutional kernels between path channels and find that they encode the change in position resulting from each possible action, thus representing part of a learned transition model. The RNN constructs plans starting at the boxes and goals: these kernels extend activations in path channels forward from boxes and backward from the goals. Negative values are placed in path channels at obstacles, and the extension kernels propagate these negative values in reverse, pruning the last few steps of a doomed plan and letting an alternative emerge, a form of backtracking. Our work shows that a precise understanding of the plan representation allows us to describe, in familiar terms, the bidirectional planning-like algorithm learned by model-free training.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's task and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper reverse-engineers a Sokoban-playing RNN to uncover how it internally represents plans as 'path channels' in hidden states. It resides in the 'Plan Representation Discovery in Game-Playing RNNs' leaf, which contains only three papers total, including this work and two siblings. This is a notably sparse research direction within the broader mechanistic interpretability branch, suggesting the paper addresses a relatively underexplored niche focused specifically on interpretable plan representations in puzzle-solving agents rather than general RNN debugging or application-oriented planning.

The taxonomy reveals that mechanistic interpretability of planning sits alongside much larger branches devoted to RNN-based planning applications in robotics and logistics. The closest neighboring leaf, 'Neural Circuit Mechanisms for Sequential Planning,' examines circuit-level mechanisms but excludes high-level planning models without mechanistic analysis. The paper's focus on convolutional kernels encoding transition models and bidirectional plan construction distinguishes it from application-driven work in robotic motion planning or cognitive neuroscience-inspired models, which do not prioritize reverse-engineering learned algorithms in trained networks.

Among thirty candidates examined across three contributions, none were found to clearly refute the paper's claims. The first contribution, discovering path channels as direct plan representation, examined ten candidates with zero refutable matches. Similarly, the mechanistic explanation via plan extension kernels and the bidirectional planning algorithm with backtracking each examined ten candidates without identifying overlapping prior work. This suggests that within the limited search scope, the specific combination of path channel discovery, kernel-based transition models, and backtracking mechanisms appears relatively novel, though the small candidate pool limits definitive conclusions.

Based on the limited literature search of thirty semantically similar papers, the work appears to occupy a sparsely populated research direction with minimal direct overlap among examined candidates. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of relevant prior work outside the top-K semantic matches. The taxonomy structure confirms this is an emerging area with few directly comparable studies.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Reverse-engineering planning mechanisms in a recurrent neural network. The taxonomy suggests a diverse landscape organized around five main branches. The first branch, Mechanistic Interpretability of Planning in Trained RNNs, focuses on understanding how trained networks internally represent and execute planning, often through analysis of game-playing agents such as those trained on Sokoban puzzles. The second branch, RNN-Based Planning Applications, encompasses practical deployments in robotics and navigation, including works like Robotic Path Planning[3] and Multi-AGV Routing[4]. The third branch, Prediction and Forecasting with RNNs, addresses temporal prediction tasks ranging from land use forecasting to vehicle trajectory prediction. The fourth branch, Design and Decision Prediction, targets sequential decision-making in engineering and design contexts, while the fifth branch covers Domain-Specific Applications and Methodological Studies, spanning areas from medical imaging to logistics optimization.

Within the mechanistic interpretability branch, a particularly active line of work examines how RNNs trained on puzzle-solving tasks develop internal planning representations. Path Channels Sokoban[0] sits squarely in this cluster, focusing on discovering interpretable plan representations in game-playing RNNs. It shares close thematic ties with Planning Sokoban RNN[1] and Interpreting Sokoban Search[40], both of which also probe planning mechanisms in Sokoban-trained networks. While Planning Sokoban RNN[1] emphasizes the emergence of planning behavior during training, Path Channels Sokoban[0] appears to concentrate on identifying specific computational structures, such as path channels, that encode planned trajectories. This contrasts with broader interpretability efforts like DeepSeer RNN Debugging[36], which targets general-purpose debugging rather than domain-specific planning analysis.
The main open question across these works remains how to bridge low-level mechanistic findings with higher-level cognitive theories of planning.

Claimed Contributions

Discovery of path channels as direct plan representation

The authors identify that specific hidden state channels in the DRC(3,3) network directly encode the agent's and boxes' future movement directions without requiring linear probes. High activation in a path channel at a location indicates the propensity to move in that channel's assigned direction.

10 retrieved papers
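The claim above can be made concrete with a small sketch. All names, shapes, and channel indices below are illustrative assumptions, not the paper's actual code: we assume a DRC-style hidden state of shape (channels, H, W) in which four path channels each encode one push direction, and read off the plan by taking the most active channel at a box's square.

```python
import numpy as np

# Assumed mapping from path-channel index to push direction (hypothetical).
PATH_CHANNELS = {0: "up", 1: "down", 2: "left", 3: "right"}

H, W = 5, 5
hidden = np.zeros((4, H, W))
hidden[3, 2, 1] = 0.9   # strong "right" activation at square (2, 1)
hidden[0, 2, 3] = 0.7   # weaker "up" activation at square (2, 3)

def predicted_push(hidden, pos, threshold=0.5):
    """Read the planned push direction for a box at `pos`: per the paper's
    claim, a high path-channel activation at that square means a box there
    will be pushed in the channel's assigned direction."""
    acts = hidden[:, pos[0], pos[1]]
    best = int(np.argmax(acts))
    return PATH_CHANNELS[best] if acts[best] > threshold else None

print(predicted_push(hidden, (2, 1)))  # "right"
print(predicted_push(hidden, (2, 3)))  # "up"
```

Note that this is a direct readout of the hidden state, with no trained linear probe in between, which is the sense in which the representation is "direct".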
Mechanistic explanation via plan extension kernels

The authors reverse-engineer the convolutional kernels that operate on path channels, showing these kernels implement forward and backward plan extension by propagating activations along movement directions and enabling backtracking through negative value propagation.

10 retrieved papers
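A minimal sketch of the claimed mechanism, under assumptions of our own (a single channel and a hand-written kernel, not weights from the network): a 3x3 kernel whose only nonzero weight sits left of center, applied to the "right" path channel, copies each activation one square to the right. In other words, the kernel encodes the position change caused by the action "push right", which is how a convolution can implement one step of plan extension.

```python
import numpy as np

# Hand-written "extend rightward plan" kernel (illustrative, not learned).
extend_right = np.zeros((3, 3))
extend_right[1, 0] = 1.0  # read from the square to the left of center

def conv2d_same(x, k):
    """Minimal 'same'-padded 2D cross-correlation, written out explicitly
    to keep the sketch dependency-free."""
    H, W = x.shape
    kh, kw = k.shape
    pad = np.pad(x, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * k)
    return out

right = np.zeros((5, 5))
right[2, 1] = 1.0                    # existing plan step: push right at (2, 1)
extended = conv2d_same(right, extend_right)
print(np.argwhere(extended > 0))     # the activation has moved to (2, 2)
```

A kernel with the mirrored offset would shift activations backward instead, which is the shape a backward-extension (goal-to-box) kernel would take in this picture.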
Bidirectional planning algorithm with backtracking

The authors describe how the network implements bidirectional search by initializing path segments at boxes and targets, extending them via specialized kernels, and pruning unpromising paths by propagating negative activations backward along path segments as a form of backtracking.

10 retrieved papers
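The pruning step described above can be illustrated with a toy one-dimensional sketch (a corridor, not the network's actual dynamics; the update rule here is our own simplification): a rightward plan segment runs into a wall, the wall's square holds a negative value, and repeatedly shifting negative values one square backward erases the tail of the doomed segment, which is the backtracking behavior.

```python
import numpy as np

n = 8
plan = np.zeros(n)
plan[1:5] = 1.0       # a plan segment pushing right from square 1 through 4
wall = 5              # the segment runs into a wall at square 5
plan[wall] = -1.0     # obstacle writes a negative value into the path channel

# Each iteration propagates negative values one square backward along the
# plan, pruning the steps that lead into the obstacle.
for _ in range(3):
    ahead = np.roll(plan, -1)    # value of the square one step ahead
    ahead[-1] = 0.0              # no wrap-around at the corridor's end
    plan = np.where(ahead < 0, np.minimum(plan, ahead), plan)

print(plan)  # squares 2..5 are now negative; square 1 survives
```

In the full picture, this runs concurrently with forward extension from boxes and backward extension from goals, so once the pruned squares are cleared an alternative segment can grow into them.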

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Discovery of path channels as direct plan representation

Contribution

Mechanistic explanation via plan extension kernels

Contribution

Bidirectional planning algorithm with backtracking
