The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: language model, neuroscience, brain alignment, fMRI
Abstract:

The alignment of Large Language Models (LLMs) and brain activity provides a powerful framework to advance our understanding of cognitive neuroscience and artificial intelligence. In this work, we zoom into one of the fundamental units of LLMs—the transformer block—to provide the first systematic computational neuroanatomy relating its internal operations to human brain activity during language processing. Analyzing 21 state-of-the-art LLMs across five model families, we extract and evaluate 13 distinct intermediate states per transformer block—from initial layer normalization through attention mechanisms to feed-forward networks (FFNs). Our analysis reveals three key findings: (1) the commonly used hidden states in LLMs are surprisingly suboptimal, with over 90% of brain voxels in sensory and language regions better explained by previously unexplored intermediate computations; (2) different computational stages within a single transformer block map to anatomically distinct brain systems, revealing an intra-block hierarchy in which early attention states align with sensory cortices while later FFN states correspond to association areas—mirroring the cortical processing hierarchy; (3) Rotary Positional Embeddings (RoPE) specifically enhance alignment along the brain's auditory processing streams: per-head queries with RoPE best explain 74% of auditory cortex activity, compared to 8% without RoPE, providing the first neurobiological validation of this architectural component in LLMs. Building on these insights, we propose MindTransformer, a feature selection framework that learns brain-aligned representations from all intermediate states. MindTransformer achieves significant brain alignment gains, with correlation improvements in primary auditory cortex exceeding those from 456× model scaling. Our computational neuroanatomy approach opens new directions for understanding both biological intelligence through the lens of transformer computations and artificial intelligence through principles of brain organization.
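For readers unfamiliar with the mechanism behind finding (3), the sketch below shows the standard RoPE operation applied to a per-head query matrix: each pair of dimensions is rotated by a position-dependent angle. This is a minimal NumPy illustration of the textbook formulation, not the paper's code; the interleaved dimension pairing follows the original RoPE paper, while some implementations rotate the two halves of the head dimension instead.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Positional Embeddings to per-head queries (or keys).
    x: (seq_len, head_dim) with even head_dim. Dimension pair (2i, 2i+1)
    at position p is rotated by angle p * base**(-2i / head_dim)."""
    seq_len, head_dim = x.shape
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)  # (head_dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)             # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # rotate each dimension pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(8, 64)   # per-head queries for an 8-token sequence
q_rope = rope(q)             # the "per-head queries with RoPE" state
```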

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a systematic analysis of transformer block internals—examining 13 intermediate computational states from layer normalization through attention to feed-forward networks—and their alignment with brain activity during language processing. It occupies the 'Transformer Component Analysis' leaf within the 'Computational Mechanisms of Alignment' branch, where it is currently the sole paper. This positioning reflects a sparse but emerging research direction: while the broader taxonomy contains 50 papers across diverse alignment topics, fine-grained component-level analyses remain underexplored compared to layer-wise or whole-model comparisons.

The taxonomy reveals neighboring work in 'Layer-Wise and Temporal Dynamics' (3 papers) and 'Functional Specialization and Brain-Like Organization' (2 papers), both examining hierarchical processing but at coarser granularities. The parent branch 'Computational Mechanisms of Alignment' contrasts with measurement-focused branches like 'Alignment Across Model Architectures' (7 papers) and application-driven branches like 'Language Decoding from fMRI' (6 papers). The paper's focus on intra-block operations diverges from these by dissecting sub-layer computations rather than comparing models or predicting neural responses, situating it at the intersection of mechanistic understanding and neural alignment.

Among 30 candidates examined, the first contribution—systematic computational neuroanatomy of transformer internals—shows one refutable candidate among 10 examined, suggesting some prior work on component-level analysis exists within this limited search scope. The second contribution—discovering intra-block hierarchy mirroring cortical organization—found no refutations among 10 candidates, indicating potential novelty in mapping attention-to-FFN stages onto sensory-to-association cortical hierarchies. The third contribution—MindTransformer framework—also encountered no refutations among 10 candidates, though the limited search scale means unexplored literature may contain relevant alignment methods or architectural innovations.

Based on top-30 semantic matches, the work appears to occupy a relatively novel niche within transformer-brain alignment research, particularly in its granular dissection of sub-layer computations. However, the sparse population of its taxonomy leaf and the presence of at least one overlapping candidate for the core contribution suggest the field is beginning to explore this direction. The analysis does not cover exhaustive citation networks or domain-specific venues, leaving open whether related component-level studies exist beyond the examined scope.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: alignment of large language models with brain activity during language processing. The field has grown into a rich taxonomy spanning measurement and validation methods, computational mechanisms underlying alignment, brain decoding and generation applications, semantic representation studies, foundation models, linguistic competence analyses, naturalistic processing paradigms, clinical applications, and theoretical perspectives. Works such as LLMs Mirror Cognition[2] and Brain Activity Alignment[3] exemplify measurement-focused branches, while others like Generative Language Reconstruction[11] and NeuroLM[12] push toward decoding applications.

The taxonomy reflects a tension between using LLMs as cognitive models versus as practical tools for neuroscience, with some branches dedicated to understanding how transformer architectures relate to neural substrates and others exploring whether alignment metrics genuinely capture shared computational principles. Particularly active lines of work examine whether scaling and architectural choices drive alignment (Scale Matters[16] and Increasing LLM Alignment[1] suggest model size and training regimes matter), while critical perspectives like Against Brain Scores[5] question whether high correlations reflect meaningful cognitive similarity or methodological artifacts.

Mind's Transformer[0] sits within the Computational Mechanisms branch, specifically analyzing transformer components to understand which architectural elements contribute to brain-like representations. This places it alongside studies probing internal model structure, contrasting with purely correlational approaches in measurement branches and with application-driven decoding work. Compared to Human-like Representations[4], which examines emergent properties broadly, Mind's Transformer[0] offers a more granular dissection of attention and feed-forward mechanisms, while differing from LLM Explanations[6] by focusing on neural alignment rather than interpretability per se. The central open question remains whether observed alignment arises from shared computational principles or superficial statistical regularities.

Claimed Contributions

Systematic computational neuroanatomy of transformer block internals

The authors systematically decompose each transformer block into 13 intermediate computational states and evaluate their correspondence with brain activity. This granular approach reveals that commonly used hidden states are suboptimal, with over 90% of brain voxels in sensory and language regions better explained by previously unexplored intermediate computations.

10 retrieved papers · Can Refute: 1
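As a concrete illustration of how such intermediate states can be captured, here is a minimal sketch that registers PyTorch forward hooks on one GPT-2 block to record a few of the computations between the block's input and its output hidden state. The module names (h, ln_1, attn, mlp) are GPT-2-specific, and only four of the thirteen states are shown; this is an assumed extraction recipe, not the authors' released code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# GPT-2 used purely for illustration; attribute paths differ across model families.
model = AutoModel.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

captured = {}

def save(name):
    def hook(module, inputs, output):
        # Attention and block modules return tuples; keep the main tensor.
        tensor = output[0] if isinstance(output, tuple) else output
        captured[name] = tensor.detach()
    return hook

block = model.h[6]  # a single transformer block
handles = [
    block.ln_1.register_forward_hook(save("post_ln1")),   # after first layer norm
    block.attn.register_forward_hook(save("attn_out")),   # attention output
    block.mlp.register_forward_hook(save("ffn_out")),     # feed-forward output
    block.register_forward_hook(save("hidden_state")),    # block output (residual stream)
]

with torch.no_grad():
    model(**tok("The mind is a transformer.", return_tensors="pt"))

for h in handles:
    h.remove()

print({name: t.shape for name, t in captured.items()})
```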
Discovery of intra-block processing hierarchy mirroring cortical organization

The work uncovers a fine-grained computational hierarchy within each transformer block that parallels the brain's anatomical processing hierarchy. Early attention-related states align with low-level sensory cortices, while later feed-forward network states correspond to high-level association areas, extending beyond the known layer-wise progression in LLMs.

10 retrieved papers · Can Refute: 0
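A hedged sketch of the kind of analysis that could produce such a mapping: fit a separate ridge encoding model for each intermediate state, compute per-voxel test correlations, and report which state explains each anatomical ROI best on average. The data interface (states, bold, roi_labels) is hypothetical, and a real pipeline would add HRF convolution, temporally contiguous splits, and per-voxel regularization.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def voxelwise_corr(X, Y, alpha=1.0):
    """Fit a ridge encoding model; return per-voxel Pearson r on held-out data."""
    Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.2, random_state=0)
    pred = Ridge(alpha=alpha).fit(Xtr, Ytr).predict(Xte)
    p = pred - pred.mean(axis=0)
    t = Yte - Yte.mean(axis=0)
    return (p * t).sum(axis=0) / (
        np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0) + 1e-8)

def best_state_per_roi(states, bold, roi_labels):
    """states: {name: (n_timepoints, n_features)}; bold: (n_timepoints, n_voxels);
    roi_labels: (n_voxels,) anatomical label per voxel."""
    corr = {name: voxelwise_corr(X, bold) for name, X in states.items()}
    for roi in np.unique(roi_labels):
        mask = roi_labels == roi
        means = {name: c[mask].mean() for name, c in corr.items()}
        print(f"{roi}: best state = {max(means, key=means.get)}")
```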
MindTransformer framework for brain-aligned representation learning

The authors introduce MindTransformer, a principled framework that learns brain-aligned representations by discovering neurally-relevant features through ridge regression on concatenated intermediate states and selecting the most informative subset. This framework achieves significant brain alignment performance, with correlation improvements in primary auditory cortex exceeding gains from 456× model scaling.

10 retrieved papers · Can Refute: 0
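The report does not spell out the selection criterion, so the sketch below illustrates only the stated idea: ridge regression on the concatenated intermediate states, followed by keeping the features that carry the most weight across voxels. Ranking by aggregate coefficient magnitude is an assumption for illustration; MindTransformer itself may select features differently.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def select_brain_aligned_features(states, bold, k=512):
    """states: {name: (n_timepoints, n_features)} intermediate-state features;
    bold: (n_timepoints, n_voxels) fMRI responses. Returns the reduced
    feature matrix and the indices of the k retained columns."""
    X = np.concatenate(list(states.values()), axis=1)        # (T, total_features)
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X, bold)
    # coef_ has shape (n_voxels, total_features); rank features by how much
    # weight they receive across all voxels (an assumed criterion).
    importance = np.abs(model.coef_).sum(axis=0)
    keep = np.argsort(importance)[-k:]
    return X[:, keep], keep
```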

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K retrieved core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Systematic computational neuroanatomy of transformer block internals

Contribution 2: Discovery of intra-block processing hierarchy mirroring cortical organization

Contribution 3: MindTransformer framework for brain-aligned representation learning