Code World Models for General Game Playing

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: large language models, code world models, code generation, information set MCTS, planning, partial observability, two-player games, imperfect information games
Abstract:

Large Language Models' (LLMs') reasoning abilities are increasingly being applied to classical board and card games, but the dominant approach, prompting for direct move generation, has significant drawbacks. It relies on the model's implicit and fragile pattern matching, leading to frequent illegal moves and strategically shallow play. Here we introduce an alternative: we use the LLM to translate natural-language rules and game trajectories into a formal, executable world model represented as Python code. This generated code world model (CWM), comprising functions for state transition, legal-move enumeration, and termination checks, serves as a verifiable simulation engine for high-performance planning algorithms such as Monte Carlo tree search (MCTS). In addition, we prompt the LLM to generate heuristic value functions (to make MCTS more efficient) and inference functions (to estimate hidden states in imperfect-information games). Our method offers three distinct advantages over using the LLM directly as a policy: (1) Verifiability: the generated CWM serves as a formal specification of the game's rules, allowing planners to algorithmically enumerate valid actions and avoid illegal moves, contingent on the correctness of the synthesized model; (2) Strategic depth: we combine the LLM's semantic understanding with the deep search power of classical planners; and (3) Generalization: we direct the LLM to focus on the meta-task of data-to-code translation, enabling it to adapt to new games more easily. We evaluate our agent on 10 games, 4 of which are novel and created for this paper; 5 are fully observed (perfect information) and 5 are partially observed (imperfect information). Our method outperforms or matches Gemini 2.5 Pro in 9 of the 10 games.
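To make the abstract's interface concrete, here is a minimal, hand-written sketch of what a synthesized CWM might look like: functions for state transition, legal-move enumeration, and termination checks, shown for a toy one-pile Nim variant. The game, the function names (`legal_actions`, `step`, `is_terminal`, `winner`), and the `State` layout are illustrative assumptions, not the paper's actual generated code.

```python
# Hypothetical sketch of a CWM: an LLM-generated Python module exposing
# state transition, legal-move enumeration, and termination checks.
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Game state for a toy Nim variant: a single pile of stones."""
    pile: int
    player: int  # 0 or 1, whose turn it is


def legal_actions(state: State) -> list[int]:
    """Enumerate valid moves: take 1-3 stones, never more than remain."""
    return [n for n in (1, 2, 3) if n <= state.pile]


def step(state: State, action: int) -> State:
    """State transition: remove `action` stones and pass the turn."""
    assert action in legal_actions(state), "a planner never plays an illegal move"
    return State(pile=state.pile - action, player=1 - state.player)


def is_terminal(state: State) -> bool:
    return state.pile == 0


def winner(state: State) -> int:
    """Last stone wins, so the player *not* to move at termination won."""
    assert is_terminal(state)
    return 1 - state.player
```

Because every move a planner considers comes from `legal_actions`, illegal moves are ruled out by construction, provided the synthesized model itself is correct.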

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes the paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes generating executable Python code from natural-language game rules to serve as a formal world model for planning algorithms like MCTS. It resides in the 'General Game Playing via Code Generation' leaf, which contains only three papers in total: this work and two siblings (Code to Play, Code World MCTS). This is a notably sparse research direction within the broader taxonomy of 39 papers across 36 topics, suggesting that the specific combination of LLM-driven code synthesis for general game playing with verifiable planning is relatively underexplored compared to adjacent areas such as neural world models or direct LLM game generation.

The taxonomy reveals several neighboring research directions. The sibling leaf 'Domain-Specific Code Generation' focuses on specialized domains (3D environments, traffic scenarios) rather than general game playing, while 'Formal Specification Languages for Games' emphasizes declarative DSLs like VGDL rather than LLM-driven Python synthesis. The parallel branch 'Neural World Models' trades code interpretability for learned dynamics, and 'Direct LLM Game Generation' bypasses explicit world model construction entirely. The paper's approach sits at the intersection of symbolic verifiability (via code) and LLM flexibility, distinguishing it from purely neural methods while maintaining broader applicability than domain-specific code generators.

Among the 29 candidates examined across the three contributions, no clearly refuting prior work was identified: 9 candidates for the core contribution (Code World Models for verifiable planning), 10 for inference function synthesis in imperfect information games, and 10 for closed-deck learning under partial observability, with zero refutable matches in each case. This suggests that, within the limited search scope, the specific combination of LLM-generated executable code, MCTS integration, and imperfect-information handling appears relatively novel. However, the small candidate pool and sparse taxonomy leaf mean this assessment reflects top-30 semantic matches rather than exhaustive field coverage.

Based on the limited literature search, the work appears to occupy a sparsely populated niche combining code-based world model synthesis with general game playing. The absence of refuting candidates across all three contributions, coupled with the small taxonomy leaf (3 papers), suggests potential novelty within the examined scope. However, the analysis covers only 29 candidates from semantic search, leaving open the possibility of relevant work outside this retrieval window, particularly in adjacent areas like hybrid symbolic-neural methods or domain-specific planning frameworks.

Taxonomy

Core-task Taxonomy Papers: 39
Claimed Contributions: 3
Contribution Candidate Papers Compared: 29
Refutable Papers: 0

Research Landscape Overview

Core task: Generating executable world models from natural language game descriptions.

The field divides into several complementary branches. Code-Based World Model Synthesis focuses on translating language into executable programs or domain-specific languages, often leveraging general game playing frameworks and symbolic representations. Neural World Models learn latent dynamics directly from data, trading interpretability for flexibility in complex environments. Direct LLM Game Generation exploits large language models to produce game content or mechanics end-to-end, sometimes bypassing explicit intermediate representations. Benchmarks and Evaluation Frameworks provide standardized testbeds and metrics for comparing these diverse approaches, while Supporting Techniques and Applications encompass auxiliary methods such as procedural generation, constraint-based design, and real-world deployment scenarios. Together, these branches reflect a spectrum from symbolic, verifiable code synthesis to learned, black-box neural dynamics.

Within Code-Based World Model Synthesis, a particularly active line of work explores general game playing via code generation, where systems produce executable game logic from textual descriptions. Code World Models[0] sits squarely in this cluster, emphasizing the generation of interpretable, modular code that can be executed and debugged. Nearby efforts such as Code to Play[15] and Code World MCTS[29] similarly prioritize code as the primary representation, but differ in their search or planning strategies for refining generated programs. In contrast, works like Word to World Models[1] and Gavel[2] blend symbolic and neural components, using language models to guide code synthesis while maintaining some degree of learned flexibility. The main trade-off across these approaches is between the transparency and verifiability of pure code generation and the adaptability of hybrid or fully neural methods. Code World Models[0] leans toward the former, offering a clear executable artifact that domain experts can inspect and modify, distinguishing it from more opaque neural alternatives while sharing the code-centric philosophy of its immediate neighbors.

Claimed Contributions

Code World Models for game playing with verifiable planning

The authors propose using LLMs to synthesize executable Python code representing game rules and dynamics (Code World Models) from textual descriptions and example trajectories. This CWM serves as a verifiable simulation engine for classical planning algorithms like MCTS, enabling algorithmic enumeration of valid actions and avoiding illegal moves.

9 retrieved papers
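A rough sketch of how a synthesized CWM can drive planning: the planner below is a flat Monte Carlo search (a simpler stand-in for the MCTS the paper uses) that evaluates each legal action by random rollouts through the generated transition function. The toy game (one-pile Nim, take 1-3 stones, last stone wins) and all function names are illustrative assumptions, not the authors' implementation.

```python
# Flat Monte Carlo planning over a toy CWM: one-pile Nim, last stone wins.
import random


def legal_actions(pile: int) -> list[int]:
    return [n for n in (1, 2, 3) if n <= pile]


def step(pile: int, action: int) -> int:
    return pile - action


def rollout(pile: int, to_move: int) -> int:
    """Play random legal moves to termination; return the winning player."""
    while pile > 0:
        pile = step(pile, random.choice(legal_actions(pile)))
        to_move = 1 - to_move
    return 1 - to_move  # the player who just moved took the last stone


def plan(pile: int, player: int, sims: int = 200) -> int:
    """Pick the action with the highest Monte Carlo win rate for `player`."""
    best, best_rate = None, -1.0
    for a in legal_actions(pile):
        wins = sum(rollout(step(pile, a), 1 - player) == player
                   for _ in range(sims))
        if wins / sims > best_rate:
            best, best_rate = a, wins / sims
    return best
```

The key point is that search strength comes from the classical planner; the CWM only has to be a correct simulator, which is exactly what the verifiability claim rests on.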
Inference function synthesis for imperfect information games

The authors introduce a novel paradigm where the LLM synthesizes inference functions that act as encoders mapping observations to plausible latent histories, while the CWM acts as a decoder. This enables ISMCTS planning in partially observable games by estimating hidden states from observations.

10 retrieved papers
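The encoder/decoder split described above can be illustrated with a hand-written stand-in for an LLM-synthesized inference function: it maps a public observation to the set of hidden states consistent with it, from which a determinizing planner such as ISMCTS samples one full state per iteration and simulates it with the CWM. The toy 8-card game, the observation dictionary keys, and the function names are all assumptions made for this sketch.

```python
# Inference-function sketch for a toy imperfect-information card game.
import itertools
import random

DECK = list(range(1, 9))  # toy 8-card deck


def infer_hidden_hands(observation: dict) -> list[list[int]]:
    """Enumerate opponent hands consistent with the observation:
    cards we hold or have seen played cannot be in the opponent's hand."""
    visible = set(observation["my_hand"]) | set(observation["played"])
    unseen = [c for c in DECK if c not in visible]
    k = observation["opponent_hand_size"]
    return [list(h) for h in itertools.combinations(unseen, k)]


def sample_determinization(observation: dict, rng=random) -> list[int]:
    """Draw one plausible hidden state; ISMCTS repeats this each iteration."""
    return rng.choice(infer_hidden_hands(observation))
```

Here the inference function plays the encoder role (observation to plausible latent states) and the CWM the decoder role (latent state forward to simulated outcomes).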
Closed deck learning for strictly partial observability

The authors develop a method for learning CWMs in a closed deck scenario where hidden states are never observed, even post-hoc. They construct a regularized autoencoder where the inference function encodes observations to hidden action sequences and the CWM decodes them back, with game rules serving as structural regularizers.

10 retrieved papers
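The closed-deck consistency check described above can be sketched as follows: a candidate hidden action sequence is "decoded" by replaying it through the CWM, and it is accepted only if every step is legal under the rules (the structural regularizer) and the observations it emits match what was actually seen (the reconstruction term). The toy counter game, its parity-only observation, and all names are illustrative assumptions, not the paper's code.

```python
# Closed-deck reconstruction check for a toy partially observed game:
# each hidden action adds 1-3 to a counter, but the public observation
# after each step is only the counter's parity.
def legal_actions(total: int) -> list[int]:
    return [1, 2, 3]


def step(total: int, action: int) -> int:
    return total + action


def observe(total: int) -> int:
    return total % 2  # partial observation: parity only


def reconstruction_ok(hidden_actions: list[int], observations: list[int]) -> bool:
    """Decode: replay hidden actions; require legality and matching obs."""
    total = 0
    for action, obs in zip(hidden_actions, observations):
        if action not in legal_actions(total):  # rules as structural regularizer
            return False
        total = step(total, action)
        if observe(total) != obs:               # reconstruction term
            return False
    return True
```

Even though the true hidden sequence is never revealed, the rules sharply constrain which latent explanations of the observation stream are admissible, which is what makes learning feasible in the closed-deck setting.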

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Code World Models for game playing with verifiable planning

The authors propose using LLMs to synthesize executable Python code representing game rules and dynamics (Code World Models) from textual descriptions and example trajectories. This CWM serves as a verifiable simulation engine for classical planning algorithms like MCTS, enabling algorithmic enumeration of valid actions and avoiding illegal moves.

Contribution

Inference function synthesis for imperfect information games

The authors introduce a novel paradigm where the LLM synthesizes inference functions that act as encoders mapping observations to plausible latent histories, while the CWM acts as a decoder. This enables ISMCTS planning in partially observable games by estimating hidden states from observations.

Contribution

Closed deck learning for strictly partial observability

The authors develop a method for learning CWMs in a closed deck scenario where hidden states are never observed, even post-hoc. They construct a regularized autoencoder where the inference function encodes observations to hidden action sequences and the CWM decodes them back, with game rules serving as structural regularizers.