Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: neuroscience, neuroimaging, SSL, self-supervised learning, representation learning, foundation model
Abstract:

The development of foundation models for functional magnetic resonance imaging (fMRI) time series holds significant promise for predicting phenotypes related to disease and cognition. Current models, however, are often trained with a mask-and-reconstruct objective over small brain regions. This focus on low-level information yields representations that are sensitive to noise and temporal fluctuations, necessitating extensive fine-tuning for downstream tasks. We introduce Brain-Semantoks, a self-supervised framework designed specifically to learn abstract representations of brain dynamics. Its architecture is built on two core innovations: a semantic tokenizer that aggregates noisy regional signals into robust tokens representing functional networks, and a self-distillation objective that enforces representational stability across time. We show that this objective is stabilized by a novel training curriculum, ensuring the model robustly learns meaningful features from low signal-to-noise time series. We demonstrate that the learned representations enable strong performance on a variety of downstream tasks even when using only a linear probe. Furthermore, we provide comprehensive scaling analyses indicating that more unlabeled data reliably yields out-of-distribution performance gains without domain adaptation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Brain-Semantoks, a self-supervised framework combining a semantic tokenizer that aggregates regional fMRI signals into functional network tokens with a self-distillation objective for temporal stability. It resides in the 'Masked Autoencoding and Predictive Architectures' leaf under 'Foundation Models and Self-Supervised Pretraining', alongside two sibling papers. This leaf represents a moderately populated research direction within the broader foundation model branch, which itself contains six papers across two sub-categories. The taxonomy shows this is an active but not overcrowded area, with the paper positioned among methods that learn general-purpose brain representations through reconstruction or prediction objectives.

The taxonomy reveals several neighboring research directions that contextualize this work. The sibling 'Graph Contrastive and Modular Pretraining' category explores alternative self-supervised objectives using graph structure and contrastive learning rather than masked reconstruction. Adjacent branches include 'Temporal Dynamics and State-Space Modeling', which emphasizes explicit temporal evolution through recurrent or state-space formulations, and 'Metastability and Discrete State Representations', which also quantizes brain dynamics but focuses on metastable configurations rather than functional network tokens. The paper bridges foundation model pretraining with discrete tokenization approaches, connecting masked autoencoding traditions with state-based representations while maintaining focus on abstract, noise-robust features.

Among the 21 candidates examined across the three contributions, none clearly refuted the proposed methods. For the self-distillation framework, 10 candidates were examined with no refutable overlap; the semantic tokenizer's 10 candidates likewise showed none; and the single candidate examined for the training curriculum did not refute it either. This suggests that, within the limited search scope, the specific combination of semantic tokenization, self-distillation, and curriculum learning is relatively unexplored. However, the modest search scale means substantial prior work may exist beyond the top-K semantic matches examined, and in particular the absence of a retrieved precedent for the semantic tokenizer should not be read as proof that none exists.

Based on the limited literature search of 21 candidates, the work appears to occupy a distinctive position combining discrete tokenization with self-supervised learning for fMRI. The taxonomy structure indicates this sits in an active but not saturated research area, with clear differentiation from continuous embedding methods in sibling papers. The absence of refuting candidates across all contributions suggests novelty within the examined scope, though the search scale leaves open the possibility of relevant work in adjacent communities or earlier literature not captured by semantic similarity.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: learning abstract representations of brain dynamics from fMRI time series. The field has organized itself around several complementary perspectives on how to extract meaningful structure from high-dimensional neuroimaging data. Graph-based spatiotemporal methods treat brain regions as nodes and model evolving connectivity patterns, often using dynamic graph neural networks or temporal graph learning frameworks such as BrainTGL[6] and Dynamic Graph Neuroimaging[2]. Foundation models and self-supervised pretraining approaches borrow ideas from large-scale vision and language modeling, applying masked autoencoding or contrastive objectives to learn general-purpose brain representations that transfer across tasks and subjects, as seen in works like BrainMAE[39] and Foundational fMRI Model[26]. Generative models focus on reconstructing external stimuli or internal cognitive states from brain activity, while temporal dynamics and state-space modeling emphasizes capturing the sequential evolution of neural states. Multi-subject and cross-modal branches address alignment and transfer across individuals or modalities, and task-based methods optimize representations for specific predictive goals. Dimensionality reduction and semantic representation models round out the taxonomy by exploring low-dimensional manifolds and cognitive content.

Within the foundation model branch, a particularly active line of work centers on masked autoencoding and predictive architectures that learn to reconstruct or predict missing or future brain activity patterns. Brain Semantoks[0] sits squarely in this cluster, proposing a discrete tokenization scheme that quantizes spatiotemporal brain dynamics into a compact vocabulary before applying masked prediction.
This approach contrasts with continuous embedding methods like Brain JEPA[20], which uses joint-embedding predictive architectures without explicit discretization, and with Foundational fMRI Model[26], which scales up pretraining across diverse datasets but retains continuous latent codes. The trade-off revolves around whether discrete tokens offer better interpretability and compositionality or whether continuous representations preserve richer temporal detail. Across these branches, open questions persist about how to balance model scale, inductive biases for spatiotemporal structure, and the degree of supervision needed to capture clinically or cognitively relevant brain dynamics.
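The discrete side of this trade-off can be made concrete with a minimal vector-quantization sketch: continuous per-timepoint features are snapped to their nearest codebook entry, and masked prediction then becomes classification over the resulting vocabulary. The codebook size, feature dimension, and mask ratio below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discrete tokenization: map each timepoint's feature vector to
# the nearest entry of a small codebook (vector-quantization style).
codebook = rng.standard_normal((32, 8))   # 32-entry token vocabulary, dim 8
features = rng.standard_normal((100, 8))  # continuous per-timepoint features

# Squared distance from every feature vector to every codebook entry.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
token_ids = dists.argmin(axis=1)          # discrete token sequence, shape (100,)

# Masked prediction then reduces to classification over the vocabulary:
mask = rng.random(100) < 0.3              # hide roughly 30% of the positions
visible_ids = np.where(mask, -1, token_ids)  # -1 marks masked-out tokens
# A model would be trained to predict token_ids[mask] from the visible context.
```

A continuous-embedding method like Brain JEPA would instead regress the latent vectors directly, skipping the `argmin` quantization step entirely.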

Claimed Contributions

Self-distillation framework for learning abstract brain dynamics representations

The authors propose a novel pretraining approach that shifts from reconstruction-based objectives to learning high-level, stable phenotypic signatures through self-distillation across temporal views. This framework explicitly trains models to capture abstract representations suitable for transfer learning rather than modeling low-level signal details.

10 retrieved papers
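The self-distillation objective described above can be sketched as a DINO-style student/teacher loop over two overlapping temporal crops, with the teacher updated as an exponential moving average of the student. The toy linear encoder, crop lengths, learning rate, and EMA decay here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    # Toy encoder: mean-pool the token series over time, then project linearly.
    return x.mean(axis=0) @ w

T, n_tokens, d = 200, 16, 8                  # hypothetical sizes
series = rng.standard_normal((T, n_tokens))  # one subject's network-token series
w_student = 0.1 * rng.standard_normal((n_tokens, d))
w_teacher = w_student.copy()                 # teacher starts as a student copy
w_init = w_student.copy()                    # kept to show the student moves
ema, lr = 0.99, 0.1

losses = []
for step in range(200):
    t0 = int(rng.integers(0, T - 100))
    view_a = series[t0 : t0 + 80]            # student's temporal crop
    view_b = series[t0 + 20 : t0 + 100]      # overlapping teacher crop

    target = encode(view_b, w_teacher)       # teacher output: treated as fixed
    pred = encode(view_a, w_student)
    err = pred - target
    losses.append(float(err @ err))

    # Closed-form gradient of the squared error w.r.t. the student weights.
    w_student -= lr * np.outer(view_a.mean(axis=0), 2.0 * err)

    # The teacher tracks the student via an exponential moving average.
    w_teacher = ema * w_teacher + (1.0 - ema) * w_student
```

The key property is that the loss is computed between representations of two different time windows, so the student is rewarded for features that are stable across time rather than for reconstructing the raw signal.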
Semantic tokenizer for functional brain networks

The authors introduce a neuroscientifically grounded tokenizer that aggregates information from brain regions within functional networks into single robust tokens. This creates a more compact, semantically meaningful input sequence than treating individual noisy ROI signals as tokens.

10 retrieved papers
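One plausible reading of such a tokenizer is a parcellation-based aggregation: the ROIs belonging to each functional network are pooled into a single token per timepoint. The random network assignment, network count, and plain averaging below are simplified stand-ins (the paper's aggregation may well be learned); in practice the assignment would come from an atlas such as a Yeo-style network parcellation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parcellation: 100 ROIs assigned to 7 functional networks.
T, n_rois, n_networks = 120, 100, 7
assignment = rng.integers(0, n_networks, size=n_rois)  # ROI -> network id
roi_series = rng.standard_normal((T, n_rois))          # noisy regional signals

def semantic_tokenize(x, assignment, n_networks):
    """Aggregate ROI signals into one token per functional network by
    averaging the member ROIs (a simple stand-in for a learned aggregator)."""
    tokens = np.zeros((x.shape[0], n_networks))
    for net in range(n_networks):
        members = assignment == net
        if members.any():                    # guard against empty networks
            tokens[:, net] = x[:, members].mean(axis=1)
    return tokens

tokens = semantic_tokenize(roi_series, assignment, n_networks)
print(tokens.shape)  # (120, 7): 7 network tokens instead of 100 ROI tokens
```

Averaging many noisy member ROIs also shrinks per-token variance, which is one way to read the claim that network-level tokens are more robust than raw ROI signals.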
Teacher-guided Temporal Regularizer training curriculum

The authors develop a principled training curriculum that stabilizes self-distillation on low signal-to-noise fMRI data by initially guiding the model to learn time-averaged network representations before modeling complex temporal variations. This regularizer prevents convergence to poor solutions during early training.

1 retrieved paper
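The described curriculum can be sketched as an annealed distillation target: early steps pull the student toward the teacher's time-averaged representation, and later steps toward the fully time-resolved teacher tokens. The linear warmup schedule and tensor shapes below are illustrative assumptions, not the paper's exact regularizer.

```python
import numpy as np

def curriculum_target(teacher_tokens, step, warmup_steps=1000):
    """Blend from a time-averaged target (early training) toward the fully
    time-resolved teacher tokens (late training).

    teacher_tokens: (T, d) per-timepoint teacher representations.
    The linear warmup schedule is an illustrative choice."""
    alpha = min(1.0, step / warmup_steps)                # 0 = static, 1 = dynamic
    static = teacher_tokens.mean(axis=0, keepdims=True)  # (1, d), broadcasts
    return (1.0 - alpha) * static + alpha * teacher_tokens

rng = np.random.default_rng(0)
teacher_tokens = rng.standard_normal((50, 8))

early = curriculum_target(teacher_tokens, step=0)
late = curriculum_target(teacher_tokens, step=1000)

# Early targets are constant over time; late targets recover the full dynamics.
print(np.allclose(early, early[0]))       # True
print(np.allclose(late, teacher_tokens))  # True
```

The intuition matches the contribution's claim: a temporally flat target is an easy, stable objective that keeps early self-distillation from collapsing, and temporal detail is reintroduced only once the representations have settled.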

