Chessformer: A Unified Architecture for Chess Modeling

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Transformers, Interpretability, Human-Aligned AI, Chess, Action Prediction
Abstract:

Chess has played a uniquely important historical role as a testbed domain for artificial intelligence. Applying new architectures to improve absolute chess performance, and more recently to predict human moves at specified skill levels, has therefore garnered attention in the machine learning literature. Current approaches to these problems employ transformer models with widely varying architectural designs, and use unintuitive tokenization schemes that are not amenable to interpretability techniques, which hinders their applicability for teaching and human-AI interaction. We introduce Chessformer, a novel chess transformer design that combines an encoder-only model that processes chessboard squares as input tokens (rather than moves or the entire position), a dynamic positional encoding scheme that allows the model to flexibly adapt to the unique geometries present in chess, and an attention-based policy output design. We show that Chessformer advances the state of the art in all three major chess modeling goals: it significantly improves the chess-playing performance of a state-of-the-art chess engine, it surpasses the previous best human move-matching prediction performance with a much smaller model, and it enables substantial interpretability benefits. Our unified approach constitutes a broad advance across several important tasks in chess AI, and also demonstrates the benefits of carefully adapting transformers' tokenization systems, output systems, and positional encodings to reflect the structure of a domain of interest.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Chessformer, a unified encoder-only transformer architecture designed to address multiple chess modeling objectives simultaneously: superhuman play, human move prediction, and interpretability. According to the taxonomy, this work resides in the 'Unified and Multi-Task Chess Architectures' leaf, which contains no papers other than this one. This positioning indicates a relatively sparse research direction within the broader field of chess transformers, where most prior efforts have specialized in single tasks such as move prediction, position generation, or interpretability analysis rather than pursuing a unified framework.

The taxonomy reveals that neighboring research directions are densely populated. The 'Chess Move and Policy Prediction' branch contains multiple subcategories with numerous papers addressing superhuman play, human-like prediction, and notation optimization. The 'Interpretability and Mechanistic Analysis' branch explores learned algorithms and representation probing, while 'Planning and Strategic Reasoning' investigates longer-horizon decision-making. Chessformer's unified approach diverges from these specialized efforts by attempting to consolidate multiple objectives within a single architectural design, rather than optimizing for one narrow task as seen in works like Attention Chess Scoresheet or Contrastive Chess Planning.

Among the 21 candidates examined through limited semantic search, none clearly refuted any of Chessformer's three core contributions. For the unified-architecture contribution, 10 candidates were examined with no refutable overlaps, suggesting that the multi-task design approach is relatively unexplored in the examined literature. For the Geometric Attention Bias dynamic positional encoding, only 1 candidate was examined, also without refutation, indicating limited prior work on geometry-aware positional schemes. For the attention-based policy output design, 10 candidates were likewise examined with no refutations, though this may reflect the limited search scope rather than absolute novelty.

Based on the examined subset of 21 papers, Chessformer appears to occupy a distinctive position by combining unified multi-task learning with interpretability-focused design choices. However, the analysis explicitly acknowledges its limited scope: top-K semantic matches rather than exhaustive field coverage. The sparse population of the unified-architecture leaf and the absence of refutable overlaps suggest potential novelty, though a broader literature search might reveal additional relevant work in adjacent domains or recent publications not captured in this taxonomy.

Taxonomy

Core-task Taxonomy Papers: 46
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: transformer architecture for chess modeling. The field has grown into a rich landscape organized around several complementary directions. At the highest level, one finds branches dedicated to move and policy prediction—where models learn to forecast legal or strong moves from positions—alongside position generation and synthesis efforts that create novel board states. A separate cluster addresses puzzle difficulty and evaluation prediction, aiming to estimate how challenging a tactic will be for human solvers. State tracking and world model learning explore whether transformers can maintain an internal board representation from move sequences, while interpretability and mechanistic analysis (e.g., Chess Mastery Mechanistic[10]) probe what circuits and features these models actually learn.

Unified and multi-task architectures attempt to handle multiple objectives simultaneously, and planning branches investigate longer-horizon strategic reasoning. Additional directions cover alternative representations (e.g., Vision Transformer Chess[22], Linear Board Representations[24]), language model evaluation, knowledge integration from auxiliary data such as textbooks (LEAP Chess Textbooks[15]), cross-domain transfer, non-transformer baselines, and performance analysis.

Within this taxonomy, particularly active lines of work contrast single-task specialists against more general frameworks. For instance, some studies focus narrowly on move prediction using attention over scoresheets (Attention Chess Scoresheet[3]) or hybrid architectures (AttLSTM Chess Moves[4]), while others pursue end-to-end planning with contrastive or amortized methods (Contrastive Chess Planning[7], Amortized Chess Planning[8]). The original paper, Chessformer[0], sits squarely in the unified and multi-task branch, emphasizing a single architecture that can address diverse chess objectives rather than specializing in one narrow task. This contrasts with works like Pretraining Chess Puzzles[1] or Decoder Chess Positions[2], which target specific sub-problems, and aligns more closely with efforts such as Maia Unified[11] that also seek broader applicability. A key open question across these branches is whether general-purpose transformers can match or exceed task-specific designs without sacrificing interpretability or sample efficiency.

Claimed Contributions

Chessformer unified architecture for chess modeling

The authors propose Chessformer, an encoder-only transformer architecture that treats the 64 chessboard squares as tokens rather than moves or entire positions. This design choice aligns the model representation with the natural visual structure of chess and enables more effective position encodings and interpretability.

10 retrieved papers
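The square-as-token input scheme described above can be illustrated with a minimal NumPy sketch. All names, the piece-id vocabulary, and the embedding dimension here are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

# Hypothetical piece vocabulary: 0 = empty, 1-6 = white pieces, 7-12 = black pieces.
VOCAB_SIZE, D_MODEL = 13, 32

rng = np.random.default_rng(0)
piece_embedding = rng.normal(size=(VOCAB_SIZE, D_MODEL))

def board_to_tokens(board):
    """Map an 8x8 array of piece ids to a (64, d_model) token sequence.

    Each of the 64 squares becomes one token, so the encoder attends over
    squares rather than over moves or a single whole-position token.
    """
    ids = np.asarray(board).reshape(64)   # one id per square
    return piece_embedding[ids]           # (64, d_model)

# Illustrative starting-position fragment: white back rank and pawns.
start = np.zeros((8, 8), dtype=int)
start[0] = [4, 2, 3, 5, 6, 3, 2, 4]
start[1] = 1
tokens = board_to_tokens(start)
print(tokens.shape)   # (64, 32)
```

The key property is that the sequence length is fixed at 64 regardless of the position, which is what makes per-square attention maps directly interpretable as board heatmaps.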
Geometric Attention Bias (GAB) dynamic positional encoding

The authors introduce GAB, a novel adaptive position encoding that uses a compressed representation of the board state to dynamically generate attention biases from templates. This allows the model to capture the variable geometry of chess, where positional relationships depend on piece types and board state, unlike static encodings used in language or vision.

1 retrieved paper
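A rough sketch of the template-mixing idea behind GAB, under stated assumptions: the templates encode standard chess geometries (rank, file, diagonal, knight reach), and a simple occupancy vector stands in for the paper's compressed board-state representation. The gating weights and mixing scheme are illustrative, not the authors' actual parameterization:

```python
import numpy as np

N_SQ = 64

def make_templates():
    """Fixed geometric relation templates over (from, to) square pairs."""
    sq = np.arange(N_SQ)
    r, f = sq // 8, sq % 8
    dr = np.abs(r[:, None] - r[None, :])
    df = np.abs(f[:, None] - f[None, :])
    same_rank = (dr == 0).astype(float)
    same_file = (df == 0).astype(float)
    same_diag = (dr == df).astype(float)
    knight    = (dr * df == 2).astype(float)   # (1,2) and (2,1) jumps
    return np.stack([same_rank, same_file, same_diag, knight])  # (4, 64, 64)

templates = make_templates()
rng = np.random.default_rng(0)
w_gate = rng.normal(size=(N_SQ, templates.shape[0])) * 0.1  # hypothetical gating weights

def geometric_attention_bias(occupancy):
    """Mix bias templates with weights derived from a compressed board summary.

    `occupancy` is a length-64 0/1 vector (one entry per square), standing in
    for the compressed board-state representation described in the text.
    """
    gates = np.asarray(occupancy, dtype=float) @ w_gate   # (n_templates,)
    gates = np.exp(gates) / np.exp(gates).sum()           # softmax over templates
    return np.tensordot(gates, templates, axes=1)         # (64, 64) additive bias

occ = np.zeros(N_SQ)
occ[:16] = occ[48:] = 1.0   # starting position: back two ranks on each side occupied
bias = geometric_attention_bias(occ)
print(bias.shape)   # (64, 64)
```

The resulting (64, 64) matrix would be added to the attention logits, so which geometric relations are emphasized changes with the position rather than being fixed as in static positional encodings.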
Attention-based policy output design

The authors propose a policy head based on self-attention that reflects the from-to structure of chess moves by encoding them through starting and destination squares. This design aligns with the underlying action space and improves interpretability of MLP activations compared to prior approaches.

10 retrieved papers
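The from-to policy head can be sketched as a single attention-style bilinear scoring step: origin squares act as queries, destination squares as keys, and their dot products give logits over all 64x64 candidate moves. The projection matrices and dimensions here are hypothetical:

```python
import numpy as np

N_SQ, D = 64, 32
rng = np.random.default_rng(0)
W_from = rng.normal(size=(D, D))   # hypothetical query projection (origin squares)
W_to   = rng.normal(size=(D, D))   # hypothetical key projection (destination squares)

def move_logits(square_states):
    """Score every (from, to) pair with one scaled dot-product attention map.

    `square_states` is the (64, d) encoder output, one vector per square.
    Row i, column j of the result is the logit for the move from square i
    to square j; illegal moves would be masked out downstream.
    """
    q = square_states @ W_from
    k = square_states @ W_to
    return (q @ k.T) / np.sqrt(D)   # (64, 64) from-to logits

states = rng.normal(size=(N_SQ, D))
logits = move_logits(states)
print(logits.shape)   # (64, 64)
```

Because each logit is tied to a concrete (origin, destination) pair of square tokens, the policy output can be read directly as an attention map over the board, which is the interpretability benefit the contribution claims.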

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though that signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Chessformer unified architecture for chess modeling

The authors propose Chessformer, an encoder-only transformer architecture that treats the 64 chessboard squares as tokens rather than moves or entire positions. This design choice aligns the model representation with the natural visual structure of chess and enables more effective position encodings and interpretability.

Contribution

Geometric Attention Bias (GAB) dynamic positional encoding

The authors introduce GAB, a novel adaptive position encoding that uses a compressed representation of the board state to dynamically generate attention biases from templates. This allows the model to capture the variable geometry of chess, where positional relationships depend on piece types and board state, unlike static encodings used in language or vision.

Contribution

Attention-based policy output design

The authors propose a policy head based on self-attention that reflects the from-to structure of chess moves by encoding them through starting and destination squares. This design aligns with the underlying action space and improves interpretability of MLP activations compared to prior approaches.