The Mind's Transformer: Computational Neuroanatomy of LLM-Brain Alignment

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: language model, neuroscience, brain alignment, fMRI
Abstract:

The alignment of Large Language Models (LLMs) and brain activity provides a powerful framework to advance our understanding of cognitive neuroscience and artificial intelligence. In this work, we zoom into one of the fundamental units of LLMs—the transformer block—to provide the first systematic computational neuroanatomy relating its internal operations to human brain activity during language processing. Analyzing 21 state-of-the-art LLMs across five model families, we extract and evaluate 13 distinct intermediate states per transformer block—from initial layer normalization through attention mechanisms to feed-forward networks (FFNs). Our analysis reveals three key findings: (1) the commonly used hidden states in LLMs are surprisingly suboptimal, with over 90% of brain voxels in sensory and language regions better explained by previously unexplored intermediate computations; (2) different computational stages within a single transformer block map to anatomically distinct brain systems, revealing an intra-block hierarchy in which early attention states align with sensory cortices while later FFN states correspond to association areas—mirroring the cortical processing hierarchy; (3) Rotary Positional Embeddings (RoPE) specifically enhance alignment along the brain's auditory processing streams: per-head queries with RoPE best explain 74% of auditory cortex activity, compared to 8% without RoPE, providing the first neurobiological validation of this architectural component in LLMs. Building on these insights, we propose MindTransformer, a feature selection framework that learns brain-aligned representations from all intermediate states. MindTransformer achieves significant brain alignment gains, with correlation improvements in primary auditory cortex exceeding those from 456× model scaling. Our computational neuroanatomy approach opens new directions for understanding both biological intelligence through the lens of transformer computations and artificial intelligence through principles of brain organization.
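For readers unfamiliar with the mechanism behind finding (3), the sketch below shows the standard RoPE operation applied to a per-head query matrix: each pair of dimensions is rotated by a position-dependent angle. This is a minimal NumPy illustration of the textbook formulation, not the paper's code; the interleaved dimension pairing follows the original RoPE paper, while some implementations rotate the two halves of the head dimension instead.

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Positional Embeddings to per-head queries (or keys).
    x: (seq_len, head_dim) with even head_dim. Dimension pair (2i, 2i+1)
    at position p is rotated by angle p * base**(-2i / head_dim)."""
    seq_len, head_dim = x.shape
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)  # (head_dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)             # (seq_len, head_dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin   # rotate each dimension pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(8, 64)   # per-head queries for an 8-token sequence
q_rope = rope(q)             # the "per-head queries with RoPE" state
```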

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper contributes a systematic analysis of transformer block internals—examining 13 intermediate computational states from layer normalization through attention to feed-forward networks—and their alignment with brain activity during language processing. It occupies the 'Transformer Component Analysis' leaf within the 'Computational Mechanisms of Alignment' branch, where it is currently the sole paper. This positioning reflects a sparse but emerging research direction: while the broader taxonomy contains 50 papers across diverse alignment topics, fine-grained component-level analyses remain underexplored compared to layer-wise or whole-model comparisons.

The taxonomy reveals neighboring work in 'Layer-Wise and Temporal Dynamics' (3 papers) and 'Functional Specialization and Brain-Like Organization' (2 papers), both examining hierarchical processing but at coarser granularities. The parent branch 'Computational Mechanisms of Alignment' contrasts with measurement-focused branches like 'Alignment Across Model Architectures' (7 papers) and application-driven branches like 'Language Decoding from fMRI' (6 papers). The paper's focus on intra-block operations diverges from these by dissecting sub-layer computations rather than comparing models or predicting neural responses, situating it at the intersection of mechanistic understanding and neural alignment.

Among 30 candidates examined, the first contribution—systematic computational neuroanatomy of transformer internals—shows one refutable candidate among 10 examined, suggesting some prior work on component-level analysis exists within this limited search scope. The second contribution—discovering intra-block hierarchy mirroring cortical organization—found no refutations among 10 candidates, indicating potential novelty in mapping attention-to-FFN stages onto sensory-to-association cortical hierarchies. The third contribution—MindTransformer framework—also encountered no refutations among 10 candidates, though the limited search scale means unexplored literature may contain relevant alignment methods or architectural innovations.

Based on top-30 semantic matches, the work appears to occupy a relatively novel niche within transformer-brain alignment research, particularly in its granular dissection of sub-layer computations. However, the sparse population of its taxonomy leaf and the presence of at least one overlapping candidate for the core contribution suggest the field is beginning to explore this direction. The analysis does not cover exhaustive citation networks or domain-specific venues, leaving open whether related component-level studies exist beyond the examined scope.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: alignment of large language models with brain activity during language processing. The field has grown into a rich taxonomy spanning measurement and validation methods, computational mechanisms underlying alignment, brain decoding and generation applications, semantic representation studies, foundation models, linguistic competence analyses, naturalistic processing paradigms, clinical applications, and theoretical perspectives. Works such as LLMs Mirror Cognition[2] and Brain Activity Alignment[3] exemplify measurement-focused branches, while others like Generative Language Reconstruction[11] and NeuroLM[12] push toward decoding applications.

The taxonomy reflects a tension between using LLMs as cognitive models versus as practical tools for neuroscience, with some branches dedicated to understanding how transformer architectures relate to neural substrates and others exploring whether alignment metrics genuinely capture shared computational principles. Particularly active lines of work examine whether scaling and architectural choices drive alignment (Scale Matters[16] and Increasing LLM Alignment[1] suggest model size and training regimes matter), while critical perspectives like Against Brain Scores[5] question whether high correlations reflect meaningful cognitive similarity or methodological artifacts.

Mind's Transformer[0] sits within the Computational Mechanisms branch, specifically analyzing transformer components to understand which architectural elements contribute to brain-like representations. This places it alongside studies probing internal model structure, contrasting with purely correlational approaches in measurement branches and with application-driven decoding work. Compared to Human-like Representations[4], which examines emergent properties broadly, Mind's Transformer[0] offers a more granular dissection of attention and feed-forward mechanisms, while differing from LLM Explanations[6] by focusing on neural alignment rather than interpretability per se. The central open question remains whether observed alignment arises from shared computational principles or superficial statistical regularities.

Claimed Contributions

Systematic computational neuroanatomy of transformer block internals

The authors systematically decompose each transformer block into 13 intermediate computational states and evaluate their correspondence with brain activity. This granular approach reveals that commonly used hidden states are suboptimal, with over 90% of brain voxels in sensory and language regions better explained by previously unexplored intermediate computations.

10 retrieved papers · Can Refute: 1
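As a concrete illustration of how such intermediate states can be captured, here is a minimal sketch that registers PyTorch forward hooks on one GPT-2 block to record a few of the computations between the block's input and its output hidden state. The module names (h, ln_1, attn, mlp) are GPT-2-specific, and only four of the thirteen states are shown; this is an assumed extraction recipe, not the authors' released code.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# GPT-2 used purely for illustration; attribute paths differ across model families.
model = AutoModel.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

captured = {}

def save(name):
    def hook(module, inputs, output):
        # Attention and block modules return tuples; keep the main tensor.
        tensor = output[0] if isinstance(output, tuple) else output
        captured[name] = tensor.detach()
    return hook

block = model.h[6]  # a single transformer block
handles = [
    block.ln_1.register_forward_hook(save("post_ln1")),   # after first layer norm
    block.attn.register_forward_hook(save("attn_out")),   # attention output
    block.mlp.register_forward_hook(save("ffn_out")),     # feed-forward output
    block.register_forward_hook(save("hidden_state")),    # block output (residual stream)
]

with torch.no_grad():
    model(**tok("The mind is a transformer.", return_tensors="pt"))

for h in handles:
    h.remove()

print({name: t.shape for name, t in captured.items()})
```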
Discovery of intra-block processing hierarchy mirroring cortical organization

The work uncovers a fine-grained computational hierarchy within each transformer block that parallels the brain's anatomical processing hierarchy. Early attention-related states align with low-level sensory cortices, while later feed-forward network states correspond to high-level association areas, extending beyond the known layer-wise progression in LLMs.

10 retrieved papers · Can Refute: 0
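A hedged sketch of the kind of analysis that could produce such a mapping: fit a separate ridge encoding model for each intermediate state, compute per-voxel test correlations, and report which state explains each anatomical ROI best on average. The data interface (states, bold, roi_labels) is hypothetical, and a real pipeline would add HRF convolution, temporally contiguous splits, and per-voxel regularization.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def voxelwise_corr(X, Y, alpha=1.0):
    """Fit a ridge encoding model; return per-voxel Pearson r on held-out data."""
    Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.2, random_state=0)
    pred = Ridge(alpha=alpha).fit(Xtr, Ytr).predict(Xte)
    p = pred - pred.mean(axis=0)
    t = Yte - Yte.mean(axis=0)
    return (p * t).sum(axis=0) / (
        np.linalg.norm(p, axis=0) * np.linalg.norm(t, axis=0) + 1e-8)

def best_state_per_roi(states, bold, roi_labels):
    """states: {name: (n_timepoints, n_features)}; bold: (n_timepoints, n_voxels);
    roi_labels: (n_voxels,) anatomical label per voxel."""
    corr = {name: voxelwise_corr(X, bold) for name, X in states.items()}
    for roi in np.unique(roi_labels):
        mask = roi_labels == roi
        means = {name: c[mask].mean() for name, c in corr.items()}
        print(f"{roi}: best state = {max(means, key=means.get)}")
```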
MindTransformer framework for brain-aligned representation learning

The authors introduce MindTransformer, a principled framework that learns brain-aligned representations by discovering neurally-relevant features through ridge regression on concatenated intermediate states and selecting the most informative subset. This framework achieves significant brain alignment performance, with correlation improvements in primary auditory cortex exceeding gains from 456× model scaling.

10 retrieved papers · Can Refute: 0
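The report does not spell out the selection criterion, so the sketch below illustrates only the stated idea: ridge regression on the concatenated intermediate states, followed by keeping the features that carry the most weight across voxels. Ranking by aggregate coefficient magnitude is an assumption for illustration; MindTransformer itself may select features differently.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def select_brain_aligned_features(states, bold, k=512):
    """states: {name: (n_timepoints, n_features)} intermediate-state features;
    bold: (n_timepoints, n_voxels) fMRI responses. Returns the reduced
    feature matrix and the indices of the k retained columns."""
    X = np.concatenate(list(states.values()), axis=1)        # (T, total_features)
    model = RidgeCV(alphas=np.logspace(-2, 4, 7)).fit(X, bold)
    # coef_ has shape (n_voxels, total_features); rank features by how much
    # weight they receive across all voxels (an assumed criterion).
    importance = np.abs(model.coef_).sum(axis=0)
    keep = np.argsort(importance)[-k:]
    return X[:, keep], keep
```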

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K retrieved core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Systematic computational neuroanatomy of transformer block internals

Contribution 2: Discovery of intra-block processing hierarchy mirroring cortical organization

Contribution 3: MindTransformer framework for brain-aligned representation learning