Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: neuroscience, neuroimaging, SSL, self-supervised learning, representation learning, foundation model
Abstract:

The development of foundation models for functional magnetic resonance imaging (fMRI) time series holds significant promise for predicting phenotypes related to disease and cognition. Current models, however, are often trained with a mask-and-reconstruct objective over small brain regions. This focus on low-level information yields representations that are sensitive to noise and temporal fluctuations, necessitating extensive fine-tuning for downstream tasks. We introduce Brain-Semantoks, a self-supervised framework designed specifically to learn abstract representations of brain dynamics. Its architecture is built on two core innovations: a semantic tokenizer that aggregates noisy regional signals into robust tokens representing functional networks, and a self-distillation objective that enforces representational stability across time. We show that this objective is stabilized by a novel training curriculum, ensuring the model robustly learns meaningful features from low signal-to-noise time series. We demonstrate that the learned representations enable strong performance on a variety of downstream tasks even when using only a linear probe. Furthermore, we provide comprehensive scaling analyses indicating that more unlabeled data reliably yields out-of-distribution performance gains without domain adaptation.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Brain-Semantoks, a self-supervised framework combining a semantic tokenizer that aggregates regional fMRI signals into functional network tokens with a self-distillation objective for temporal stability. It resides in the 'Masked Autoencoding and Predictive Architectures' leaf under 'Foundation Models and Self-Supervised Pretraining', alongside two sibling papers. This leaf represents a moderately populated research direction within the broader foundation model branch, which itself contains six papers across two sub-categories. The taxonomy shows this is an active but not overcrowded area, with the paper positioned among methods that learn general-purpose brain representations through reconstruction or prediction objectives.

The taxonomy reveals several neighboring research directions that contextualize this work. The sibling 'Graph Contrastive and Modular Pretraining' category explores alternative self-supervised objectives using graph structure and contrastive learning rather than masked reconstruction. Adjacent branches include 'Temporal Dynamics and State-Space Modeling', which emphasizes explicit temporal evolution through recurrent or state-space formulations, and 'Metastability and Discrete State Representations', which also quantizes brain dynamics but focuses on metastable configurations rather than functional network tokens. The paper bridges foundation model pretraining with discrete tokenization approaches, connecting masked autoencoding traditions with state-based representations while maintaining focus on abstract, noise-robust features.

Among the 21 candidates examined across the three contributions, none clearly refuted the proposed methods. For the self-distillation framework, 10 candidates were examined with no refutable overlap; the semantic tokenizer's 10 candidates likewise showed none; and the single candidate examined for the training curriculum did not refute it either. This suggests that, within the limited search scope, the specific combination of semantic tokenization, self-distillation, and curriculum learning is relatively unexplored. However, the modest search scale means substantial prior work may exist beyond the top-K semantic matches examined, and in particular the absence of a retrieved precedent for the semantic tokenizer should not be read as proof that none exists.

Based on the limited literature search of 21 candidates, the work appears to occupy a distinctive position combining discrete tokenization with self-supervised learning for fMRI. The taxonomy structure indicates this sits in an active but not saturated research area, with clear differentiation from continuous embedding methods in sibling papers. The absence of refuting candidates across all contributions suggests novelty within the examined scope, though the search scale leaves open the possibility of relevant work in adjacent communities or earlier literature not captured by semantic similarity.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: learning abstract representations of brain dynamics from fMRI time series. The field has organized itself around several complementary perspectives on how to extract meaningful structure from high-dimensional neuroimaging data. Graph-based spatiotemporal methods treat brain regions as nodes and model evolving connectivity patterns, often using dynamic graph neural networks or temporal graph learning frameworks such as BrainTGL[6] and Dynamic Graph Neuroimaging[2]. Foundation models and self-supervised pretraining approaches borrow ideas from large-scale vision and language modeling, applying masked autoencoding or contrastive objectives to learn general-purpose brain representations that transfer across tasks and subjects, as seen in works like BrainMAE[39] and Foundational fMRI Model[26]. Generative models focus on reconstructing external stimuli or internal cognitive states from brain activity, while temporal dynamics and state-space modeling emphasizes capturing the sequential evolution of neural states. Multi-subject and cross-modal branches address alignment and transfer across individuals or modalities, and task-based methods optimize representations for specific predictive goals. Dimensionality reduction and semantic representation models round out the taxonomy by exploring low-dimensional manifolds and cognitive content.

Within the foundation model branch, a particularly active line of work centers on masked autoencoding and predictive architectures that learn to reconstruct or predict missing or future brain activity patterns. Brain Semantoks[0] sits squarely in this cluster, proposing a discrete tokenization scheme that quantizes spatiotemporal brain dynamics into a compact vocabulary before applying masked prediction.
This approach contrasts with continuous embedding methods like Brain JEPA[20], which uses joint-embedding predictive architectures without explicit discretization, and with Foundational fMRI Model[26], which scales up pretraining across diverse datasets but retains continuous latent codes. The trade-off revolves around whether discrete tokens offer better interpretability and compositionality or whether continuous representations preserve richer temporal detail. Across these branches, open questions persist about how to balance model scale, inductive biases for spatiotemporal structure, and the degree of supervision needed to capture clinically or cognitively relevant brain dynamics.
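The discrete side of this trade-off can be made concrete with a minimal vector-quantization sketch: continuous per-timepoint features are snapped to their nearest codebook entry, and masked prediction then becomes classification over the resulting vocabulary. The codebook size, feature dimension, and mask ratio below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discrete tokenization: map each timepoint's feature vector to
# the nearest entry of a small codebook (vector-quantization style).
codebook = rng.standard_normal((32, 8))   # 32-entry token vocabulary, dim 8
features = rng.standard_normal((100, 8))  # continuous per-timepoint features

# Squared distance from every feature vector to every codebook entry.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
token_ids = dists.argmin(axis=1)          # discrete token sequence, shape (100,)

# Masked prediction then reduces to classification over the vocabulary:
mask = rng.random(100) < 0.3              # hide roughly 30% of the positions
visible_ids = np.where(mask, -1, token_ids)  # -1 marks masked-out tokens
# A model would be trained to predict token_ids[mask] from the visible context.
```

A continuous-embedding method like Brain JEPA would instead regress the latent vectors directly, skipping the `argmin` quantization step entirely.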

Claimed Contributions

Self-distillation framework for learning abstract brain dynamics representations

The authors propose a novel pretraining approach that shifts from reconstruction-based objectives to learning high-level, stable phenotypic signatures through self-distillation across temporal views. This framework explicitly trains models to capture abstract representations suitable for transfer learning rather than modeling low-level signal details.

10 retrieved papers
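The self-distillation objective described above can be sketched as a DINO-style student/teacher loop over two overlapping temporal crops, with the teacher updated as an exponential moving average of the student. The toy linear encoder, crop lengths, learning rate, and EMA decay here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    # Toy encoder: mean-pool the token series over time, then project linearly.
    return x.mean(axis=0) @ w

T, n_tokens, d = 200, 16, 8                  # hypothetical sizes
series = rng.standard_normal((T, n_tokens))  # one subject's network-token series
w_student = 0.1 * rng.standard_normal((n_tokens, d))
w_teacher = w_student.copy()                 # teacher starts as a student copy
w_init = w_student.copy()                    # kept to show the student moves
ema, lr = 0.99, 0.1

losses = []
for step in range(200):
    t0 = int(rng.integers(0, T - 100))
    view_a = series[t0 : t0 + 80]            # student's temporal crop
    view_b = series[t0 + 20 : t0 + 100]      # overlapping teacher crop

    target = encode(view_b, w_teacher)       # teacher output: treated as fixed
    pred = encode(view_a, w_student)
    err = pred - target
    losses.append(float(err @ err))

    # Closed-form gradient of the squared error w.r.t. the student weights.
    w_student -= lr * np.outer(view_a.mean(axis=0), 2.0 * err)

    # The teacher tracks the student via an exponential moving average.
    w_teacher = ema * w_teacher + (1.0 - ema) * w_student
```

The key property is that the loss is computed between representations of two different time windows, so the student is rewarded for features that are stable across time rather than for reconstructing the raw signal.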
Semantic tokenizer for functional brain networks

The authors introduce a neuroscientifically grounded tokenizer that aggregates information from brain regions within functional networks into single robust tokens. This creates a more compact, semantically meaningful input sequence than treating individual noisy ROI signals as tokens.

10 retrieved papers
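One plausible reading of such a tokenizer is a parcellation-based aggregation: the ROIs belonging to each functional network are pooled into a single token per timepoint. The random network assignment, network count, and plain averaging below are simplified stand-ins (the paper's aggregation may well be learned); in practice the assignment would come from an atlas such as a Yeo-style network parcellation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parcellation: 100 ROIs assigned to 7 functional networks.
T, n_rois, n_networks = 120, 100, 7
assignment = rng.integers(0, n_networks, size=n_rois)  # ROI -> network id
roi_series = rng.standard_normal((T, n_rois))          # noisy regional signals

def semantic_tokenize(x, assignment, n_networks):
    """Aggregate ROI signals into one token per functional network by
    averaging the member ROIs (a simple stand-in for a learned aggregator)."""
    tokens = np.zeros((x.shape[0], n_networks))
    for net in range(n_networks):
        members = assignment == net
        if members.any():                    # guard against empty networks
            tokens[:, net] = x[:, members].mean(axis=1)
    return tokens

tokens = semantic_tokenize(roi_series, assignment, n_networks)
print(tokens.shape)  # (120, 7): 7 network tokens instead of 100 ROI tokens
```

Averaging many noisy member ROIs also shrinks per-token variance, which is one way to read the claim that network-level tokens are more robust than raw ROI signals.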
Teacher-guided Temporal Regularizer training curriculum

The authors develop a principled training curriculum that stabilizes self-distillation on low signal-to-noise fMRI data by initially guiding the model to learn time-averaged network representations before modeling complex temporal variations. This regularizer prevents convergence to poor solutions during early training.

1 retrieved paper
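The described curriculum can be sketched as an annealed distillation target: early steps pull the student toward the teacher's time-averaged representation, and later steps toward the fully time-resolved teacher tokens. The linear warmup schedule and tensor shapes below are illustrative assumptions, not the paper's exact regularizer.

```python
import numpy as np

def curriculum_target(teacher_tokens, step, warmup_steps=1000):
    """Blend from a time-averaged target (early training) toward the fully
    time-resolved teacher tokens (late training).

    teacher_tokens: (T, d) per-timepoint teacher representations.
    The linear warmup schedule is an illustrative choice."""
    alpha = min(1.0, step / warmup_steps)                # 0 = static, 1 = dynamic
    static = teacher_tokens.mean(axis=0, keepdims=True)  # (1, d), broadcasts
    return (1.0 - alpha) * static + alpha * teacher_tokens

rng = np.random.default_rng(0)
teacher_tokens = rng.standard_normal((50, 8))

early = curriculum_target(teacher_tokens, step=0)
late = curriculum_target(teacher_tokens, step=1000)

# Early targets are constant over time; late targets recover the full dynamics.
print(np.allclose(early, early[0]))       # True
print(np.allclose(late, teacher_tokens))  # True
```

The intuition matches the contribution's claim: a temporally flat target is an easy, stable objective that keeps early self-distillation from collapsing, and temporal detail is reintroduced only once the representations have settled.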

