Chessformer: A Unified Architecture for Chess Modeling

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Transformers, Interpretability, Human-Aligned AI, Chess, Action Prediction
Abstract:

Chess has played a uniquely important historical role as a testbed domain for artificial intelligence. Applying new architectures to improve absolute chess performance, and more recently to predict human moves at specified skill levels, has therefore garnered attention in the machine learning literature. Current approaches to these problems employ transformer models with widely varying architectural designs, and use unintuitive tokenization schemes that are not amenable to interpretability techniques, which hinders their applicability for teaching and human-AI interaction. We introduce Chessformer, a novel chess transformer design that combines an encoder-only model that processes chessboard squares as input tokens (rather than moves or the entire position), a dynamic positional encoding scheme that allows the model to flexibly adapt to the unique geometries present in chess, and an attention-based policy output design. We show that Chessformer advances the state of the art in all three major chess modeling goals: it significantly improves the chess-playing performance of a state-of-the-art chess engine, it surpasses the previous best human move-matching prediction performance with a much smaller model, and it enables substantial interpretability benefits. Our unified approach constitutes a broad advance across several important tasks in chess AI, and also demonstrates the benefits of carefully adapting transformers' tokenization systems, output systems, and positional encodings to reflect the structure of a domain of interest.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Chessformer, a unified encoder-only transformer architecture designed to address multiple chess modeling objectives simultaneously: superhuman play, human move prediction, and interpretability. According to the taxonomy, this work resides in the 'Unified and Multi-Task Chess Architectures' leaf, which contains no papers other than this one. This positioning indicates a relatively sparse research direction within the broader field of chess transformers, where most prior efforts have specialized in single tasks such as move prediction, position generation, or interpretability analysis rather than pursuing a unified framework.

The taxonomy reveals that neighboring research directions are densely populated. The 'Chess Move and Policy Prediction' branch contains multiple subcategories with numerous papers addressing superhuman play, human-like prediction, and notation optimization. The 'Interpretability and Mechanistic Analysis' branch explores learned algorithms and representation probing, while 'Planning and Strategic Reasoning' investigates longer-horizon decision-making. Chessformer's unified approach diverges from these specialized efforts by attempting to consolidate multiple objectives within a single architectural design, rather than optimizing for one narrow task as seen in works like Attention Chess Scoresheet or Contrastive Chess Planning.

Among the 21 candidates examined through limited semantic search, none clearly refuted any of Chessformer's three core contributions. For the unified-architecture contribution, 10 candidates were examined with no refutable overlaps, suggesting that the multi-task design approach is relatively unexplored in the examined literature. For the Geometric Attention Bias dynamic positional encoding, only 1 candidate was examined, also without refutation, indicating limited prior work on geometry-aware positional schemes. For the attention-based policy output design, 10 candidates were likewise examined with no refutations, though this may reflect the limited search scope rather than absolute novelty.

Based on the examined subset of 21 papers, Chessformer appears to occupy a distinctive position by combining unified multi-task learning with interpretability-focused design choices. However, the analysis explicitly acknowledges its limited scope: top-K semantic matches rather than exhaustive field coverage. The sparse population of the unified-architecture leaf and the absence of refutable overlaps suggest potential novelty, though a broader literature search might reveal additional relevant work in adjacent domains or recent publications not captured in this taxonomy.

Taxonomy

Core-task Taxonomy Papers: 46
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: transformer architecture for chess modeling. The field has grown into a rich landscape organized around several complementary directions. At the highest level, one finds branches dedicated to move and policy prediction—where models learn to forecast legal or strong moves from positions—alongside position generation and synthesis efforts that create novel board states. A separate cluster addresses puzzle difficulty and evaluation prediction, aiming to estimate how challenging a tactic will be for human solvers. State tracking and world model learning explore whether transformers can maintain an internal board representation from move sequences, while interpretability and mechanistic analysis (e.g., Chess Mastery Mechanistic[10]) probe what circuits and features these models actually learn.

Unified and multi-task architectures attempt to handle multiple objectives simultaneously, and planning branches investigate longer-horizon strategic reasoning. Additional directions cover alternative representations (e.g., Vision Transformer Chess[22], Linear Board Representations[24]), language model evaluation, knowledge integration from auxiliary data such as textbooks (LEAP Chess Textbooks[15]), cross-domain transfer, non-transformer baselines, and performance analysis.

Within this taxonomy, particularly active lines of work contrast single-task specialists against more general frameworks. For instance, some studies focus narrowly on move prediction using attention over scoresheets (Attention Chess Scoresheet[3]) or hybrid architectures (AttLSTM Chess Moves[4]), while others pursue end-to-end planning with contrastive or amortized methods (Contrastive Chess Planning[7], Amortized Chess Planning[8]). The original paper, Chessformer[0], sits squarely in the unified and multi-task branch, emphasizing a single architecture that can address diverse chess objectives rather than specializing in one narrow task. This contrasts with works like Pretraining Chess Puzzles[1] or Decoder Chess Positions[2], which target specific sub-problems, and aligns more closely with efforts such as Maia Unified[11] that also seek broader applicability. A key open question across these branches is whether general-purpose transformers can match or exceed task-specific designs without sacrificing interpretability or sample efficiency.

Claimed Contributions

Chessformer unified architecture for chess modeling

The authors propose Chessformer, an encoder-only transformer architecture that treats the 64 chessboard squares as tokens rather than moves or entire positions. This design choice aligns the model representation with the natural visual structure of chess and enables more effective position encodings and interpretability.

10 retrieved papers
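The square-as-token input scheme described above can be illustrated with a minimal NumPy sketch. All names, the piece-id vocabulary, and the embedding dimension here are hypothetical stand-ins, not the paper's actual implementation:

```python
import numpy as np

# Hypothetical piece vocabulary: 0 = empty, 1-6 = white pieces, 7-12 = black pieces.
VOCAB_SIZE, D_MODEL = 13, 32

rng = np.random.default_rng(0)
piece_embedding = rng.normal(size=(VOCAB_SIZE, D_MODEL))

def board_to_tokens(board):
    """Map an 8x8 array of piece ids to a (64, d_model) token sequence.

    Each of the 64 squares becomes one token, so the encoder attends over
    squares rather than over moves or a single whole-position token.
    """
    ids = np.asarray(board).reshape(64)   # one id per square
    return piece_embedding[ids]           # (64, d_model)

# Illustrative starting-position fragment: white back rank and pawns.
start = np.zeros((8, 8), dtype=int)
start[0] = [4, 2, 3, 5, 6, 3, 2, 4]
start[1] = 1
tokens = board_to_tokens(start)
print(tokens.shape)   # (64, 32)
```

The key property is that the sequence length is fixed at 64 regardless of the position, which is what makes per-square attention maps directly interpretable as board heatmaps.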
Geometric Attention Bias (GAB) dynamic positional encoding

The authors introduce GAB, a novel adaptive position encoding that uses a compressed representation of the board state to dynamically generate attention biases from templates. This allows the model to capture the variable geometry of chess, where positional relationships depend on piece types and board state, unlike static encodings used in language or vision.

1 retrieved paper
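A rough sketch of the template-mixing idea behind GAB, under stated assumptions: the templates encode standard chess geometries (rank, file, diagonal, knight reach), and a simple occupancy vector stands in for the paper's compressed board-state representation. The gating weights and mixing scheme are illustrative, not the authors' actual parameterization:

```python
import numpy as np

N_SQ = 64

def make_templates():
    """Fixed geometric relation templates over (from, to) square pairs."""
    sq = np.arange(N_SQ)
    r, f = sq // 8, sq % 8
    dr = np.abs(r[:, None] - r[None, :])
    df = np.abs(f[:, None] - f[None, :])
    same_rank = (dr == 0).astype(float)
    same_file = (df == 0).astype(float)
    same_diag = (dr == df).astype(float)
    knight    = (dr * df == 2).astype(float)   # (1,2) and (2,1) jumps
    return np.stack([same_rank, same_file, same_diag, knight])  # (4, 64, 64)

templates = make_templates()
rng = np.random.default_rng(0)
w_gate = rng.normal(size=(N_SQ, templates.shape[0])) * 0.1  # hypothetical gating weights

def geometric_attention_bias(occupancy):
    """Mix bias templates with weights derived from a compressed board summary.

    `occupancy` is a length-64 0/1 vector (one entry per square), standing in
    for the compressed board-state representation described in the text.
    """
    gates = np.asarray(occupancy, dtype=float) @ w_gate   # (n_templates,)
    gates = np.exp(gates) / np.exp(gates).sum()           # softmax over templates
    return np.tensordot(gates, templates, axes=1)         # (64, 64) additive bias

occ = np.zeros(N_SQ)
occ[:16] = occ[48:] = 1.0   # starting position: back two ranks on each side occupied
bias = geometric_attention_bias(occ)
print(bias.shape)   # (64, 64)
```

The resulting (64, 64) matrix would be added to the attention logits, so which geometric relations are emphasized changes with the position rather than being fixed as in static positional encodings.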
Attention-based policy output design

The authors propose a policy head based on self-attention that reflects the from-to structure of chess moves by encoding them through starting and destination squares. This design aligns with the underlying action space and improves interpretability of MLP activations compared to prior approaches.

10 retrieved papers
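The from-to policy head can be sketched as a single attention-style bilinear scoring step: origin squares act as queries, destination squares as keys, and their dot products give logits over all 64x64 candidate moves. The projection matrices and dimensions here are hypothetical:

```python
import numpy as np

N_SQ, D = 64, 32
rng = np.random.default_rng(0)
W_from = rng.normal(size=(D, D))   # hypothetical query projection (origin squares)
W_to   = rng.normal(size=(D, D))   # hypothetical key projection (destination squares)

def move_logits(square_states):
    """Score every (from, to) pair with one scaled dot-product attention map.

    `square_states` is the (64, d) encoder output, one vector per square.
    Row i, column j of the result is the logit for the move from square i
    to square j; illegal moves would be masked out downstream.
    """
    q = square_states @ W_from
    k = square_states @ W_to
    return (q @ k.T) / np.sqrt(D)   # (64, 64) from-to logits

states = rng.normal(size=(N_SQ, D))
logits = move_logits(states)
print(logits.shape)   # (64, 64)
```

Because each logit is tied to a concrete (origin, destination) pair of square tokens, the policy output can be read directly as an attention map over the board, which is the interpretability benefit the contribution claims.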

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, though that signal remains constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Chessformer unified architecture for chess modeling

The authors propose Chessformer, an encoder-only transformer architecture that treats the 64 chessboard squares as tokens rather than moves or entire positions. This design choice aligns the model representation with the natural visual structure of chess and enables more effective position encodings and interpretability.

Contribution

Geometric Attention Bias (GAB) dynamic positional encoding

The authors introduce GAB, a novel adaptive position encoding that uses a compressed representation of the board state to dynamically generate attention biases from templates. This allows the model to capture the variable geometry of chess, where positional relationships depend on piece types and board state, unlike static encodings used in language or vision.

Contribution

Attention-based policy output design

The authors propose a policy head based on self-attention that reflects the from-to structure of chess moves by encoding them through starting and destination squares. This design aligns with the underlying action space and improves interpretability of MLP activations compared to prior approaches.