Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: mechanistic interpretability, reinforcement learning, Sokoban
Abstract:

We partially reverse-engineer a convolutional recurrent neural network (RNN) trained with model-free reinforcement learning to play the box-pushing game Sokoban. We find that the RNN stores future moves (plans) as activations in particular channels of its hidden state, which we call path channels. A high activation at a particular location means that, when a box is in that location, it will be pushed in the channel's assigned direction. We examine the convolutional kernels between path channels and find that they encode the change in position resulting from each possible action, thus representing part of a learned transition model. The RNN constructs plans starting at the boxes and goals: these kernels extend activations in path channels forward from boxes and backward from the goals. Negative values are placed in path channels at obstacles, and the extension kernels propagate these negative values in reverse, pruning the last few steps of a doomed plan and letting an alternative emerge, a form of backtracking. Our work shows that a precise understanding of the plan representation allows us to describe, in familiar terms, the bidirectional planning-like algorithm learned by model-free training.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's task and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper reverse-engineers a Sokoban-playing RNN to uncover how it internally represents plans as 'path channels' in hidden states. It resides in the 'Plan Representation Discovery in Game-Playing RNNs' leaf, which contains only three papers total, including this work and two siblings. This is a notably sparse research direction within the broader mechanistic interpretability branch, suggesting the paper addresses a relatively underexplored niche focused specifically on interpretable plan representations in puzzle-solving agents rather than general RNN debugging or application-oriented planning.

The taxonomy reveals that mechanistic interpretability of planning sits alongside much larger branches devoted to RNN-based planning applications in robotics and logistics. The closest neighboring leaf, 'Neural Circuit Mechanisms for Sequential Planning,' examines circuit-level mechanisms but excludes high-level planning models without mechanistic analysis. The paper's focus on convolutional kernels encoding transition models and bidirectional plan construction distinguishes it from application-driven work in robotic motion planning or cognitive neuroscience-inspired models, which do not prioritize reverse-engineering learned algorithms in trained networks.

Among thirty candidates examined across three contributions, none were found to clearly refute the paper's claims. The first contribution, discovering path channels as direct plan representation, examined ten candidates with zero refutable matches. Similarly, the mechanistic explanation via plan extension kernels and the bidirectional planning algorithm with backtracking each examined ten candidates without identifying overlapping prior work. This suggests that within the limited search scope, the specific combination of path channel discovery, kernel-based transition models, and backtracking mechanisms appears relatively novel, though the small candidate pool limits definitive conclusions.

Based on the limited literature search of thirty semantically similar papers, the work appears to occupy a sparsely populated research direction with minimal direct overlap among examined candidates. However, the analysis does not cover exhaustive citation networks or domain-specific venues, leaving open the possibility of relevant prior work outside the top-K semantic matches. The taxonomy structure confirms this is an emerging area with few directly comparable studies.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Reverse-engineering planning mechanisms in a recurrent neural network. The taxonomy suggests a diverse landscape organized around five main branches. The first branch, Mechanistic Interpretability of Planning in Trained RNNs, focuses on understanding how trained networks internally represent and execute planning, often through analysis of game-playing agents such as those trained on Sokoban puzzles. The second branch, RNN-Based Planning Applications, encompasses practical deployments in robotics and navigation, including works like Robotic Path Planning[3] and Multi-AGV Routing[4]. The third branch, Prediction and Forecasting with RNNs, addresses temporal prediction tasks ranging from land use forecasting to vehicle trajectory prediction. The fourth branch, Design and Decision Prediction, targets sequential decision-making in engineering and design contexts, while the fifth branch covers Domain-Specific Applications and Methodological Studies, spanning areas from medical imaging to logistics optimization.

Within the mechanistic interpretability branch, a particularly active line of work examines how RNNs trained on puzzle-solving tasks develop internal planning representations. Path Channels Sokoban[0] sits squarely in this cluster, focusing on discovering interpretable plan representations in game-playing RNNs. It shares close thematic ties with Planning Sokoban RNN[1] and Interpreting Sokoban Search[40], both of which also probe planning mechanisms in Sokoban-trained networks. While Planning Sokoban RNN[1] emphasizes the emergence of planning behavior during training, Path Channels Sokoban[0] appears to concentrate on identifying specific computational structures, such as path channels, that encode planned trajectories. This contrasts with broader interpretability efforts like DeepSeer RNN Debugging[36], which targets general-purpose debugging rather than domain-specific planning analysis.
The main open question across these works remains how to bridge low-level mechanistic findings with higher-level cognitive theories of planning.

Claimed Contributions

Discovery of path channels as direct plan representation

The authors identify that specific hidden state channels in the DRC(3,3) network directly encode the agent's and boxes' future movement directions without requiring linear probes. High activation in a path channel at a location indicates the propensity to move in that channel's assigned direction.

10 retrieved papers
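The claim above can be made concrete with a small sketch. All names, shapes, and channel indices below are illustrative assumptions, not the paper's actual code: we assume a DRC-style hidden state of shape (channels, H, W) in which four path channels each encode one push direction, and read off the plan by taking the most active channel at a box's square.

```python
import numpy as np

# Assumed mapping from path-channel index to push direction (hypothetical).
PATH_CHANNELS = {0: "up", 1: "down", 2: "left", 3: "right"}

H, W = 5, 5
hidden = np.zeros((4, H, W))
hidden[3, 2, 1] = 0.9   # strong "right" activation at square (2, 1)
hidden[0, 2, 3] = 0.7   # weaker "up" activation at square (2, 3)

def predicted_push(hidden, pos, threshold=0.5):
    """Read the planned push direction for a box at `pos`: per the paper's
    claim, a high path-channel activation at that square means a box there
    will be pushed in the channel's assigned direction."""
    acts = hidden[:, pos[0], pos[1]]
    best = int(np.argmax(acts))
    return PATH_CHANNELS[best] if acts[best] > threshold else None

print(predicted_push(hidden, (2, 1)))  # "right"
print(predicted_push(hidden, (2, 3)))  # "up"
```

Note that this is a direct readout of the hidden state, with no trained linear probe in between, which is the sense in which the representation is "direct".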
Mechanistic explanation via plan extension kernels

The authors reverse-engineer the convolutional kernels that operate on path channels, showing these kernels implement forward and backward plan extension by propagating activations along movement directions and enabling backtracking through negative value propagation.

10 retrieved papers
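A minimal sketch of the claimed mechanism, under assumptions of our own (a single channel and a hand-written kernel, not weights from the network): a 3x3 kernel whose only nonzero weight sits left of center, applied to the "right" path channel, copies each activation one square to the right. In other words, the kernel encodes the position change caused by the action "push right", which is how a convolution can implement one step of plan extension.

```python
import numpy as np

# Hand-written "extend rightward plan" kernel (illustrative, not learned).
extend_right = np.zeros((3, 3))
extend_right[1, 0] = 1.0  # read from the square to the left of center

def conv2d_same(x, k):
    """Minimal 'same'-padded 2D cross-correlation, written out explicitly
    to keep the sketch dependency-free."""
    H, W = x.shape
    kh, kw = k.shape
    pad = np.pad(x, ((kh // 2,), (kw // 2,)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + kh, j:j + kw] * k)
    return out

right = np.zeros((5, 5))
right[2, 1] = 1.0                    # existing plan step: push right at (2, 1)
extended = conv2d_same(right, extend_right)
print(np.argwhere(extended > 0))     # the activation has moved to (2, 2)
```

A kernel with the mirrored offset would shift activations backward instead, which is the shape a backward-extension (goal-to-box) kernel would take in this picture.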
Bidirectional planning algorithm with backtracking

The authors describe how the network implements bidirectional search by initializing path segments at boxes and targets, extending them via specialized kernels, and pruning unpromising paths by propagating negative activations backward along path segments as a form of backtracking.

10 retrieved papers
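The pruning step described above can be illustrated with a toy one-dimensional sketch (a corridor, not the network's actual dynamics; the update rule here is our own simplification): a rightward plan segment runs into a wall, the wall's square holds a negative value, and repeatedly shifting negative values one square backward erases the tail of the doomed segment, which is the backtracking behavior.

```python
import numpy as np

n = 8
plan = np.zeros(n)
plan[1:5] = 1.0       # a plan segment pushing right from square 1 through 4
wall = 5              # the segment runs into a wall at square 5
plan[wall] = -1.0     # obstacle writes a negative value into the path channel

# Each iteration propagates negative values one square backward along the
# plan, pruning the steps that lead into the obstacle.
for _ in range(3):
    ahead = np.roll(plan, -1)    # value of the square one step ahead
    ahead[-1] = 0.0              # no wrap-around at the corridor's end
    plan = np.where(ahead < 0, np.minimum(plan, ahead), plan)

print(plan)  # squares 2..5 are now negative; square 1 survives
```

In the full picture, this runs concurrently with forward extension from boxes and backward extension from goals, so once the pruned squares are cleared an alternative segment can grow into them.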

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Discovery of path channels as direct plan representation

Contribution

Mechanistic explanation via plan extension kernels

Contribution

Bidirectional planning algorithm with backtracking
