A foundation model with multi-variate parallel attention to generate neuronal activity

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: time-series, iEEG, neurology, foundation model, attention, transformer
Abstract:

Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks, particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the Long-term iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. MVPFormer leverages MVPA to achieve strong generalization across subjects, demonstrating expert-level performance in several iEEG tasks. MVPFormer surpasses state-of-the-art (SOTA) Transformer baselines in seizure detection across the Long-term, the MAYO, and the FNUSA datasets, while also achieving SOTA performance on four Brain TreeBank iEEG decoding tasks (volume, pitch, onset, and speech). We further validate MVPA on standard time-series forecasting and classification tasks, where it matches or exceeds the performance of existing attention-based models. Together, our contributions establish MVPA as a general-purpose attention mechanism for heterogeneous time-series and MVPFormer as the first open-source, open-weights, and open-data iEEG foundation model with SOTA clinical performance.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces multi-variate parallel attention (MVPA), a self-attention mechanism that disentangles content, temporal, and spatial attention for heterogeneous channel configurations in time-series data. It resides in the 'Attention Mechanisms for Channel and Temporal Modeling' leaf, which contains five papers total, including the original work. This leaf sits within the broader 'Channel Handling Strategies and Architectures' branch, indicating a moderately populated research direction focused on attention-based solutions for channel heterogeneity. The sibling papers explore related attention patterns, suggesting this is an active but not overcrowded subfield.

The taxonomy tree reveals neighboring leaves addressing channel independence (six papers using mixing approaches), channel dependence (three papers on cross-channel interaction), and variable channel handling (five papers on missing/partial channels). The paper's attention-based approach diverges from channel-independent mixers like Tsmixer and aligns more closely with structured attention methods such as Triformer. The 'Foundation Models and Pre-Training' leaf (five papers) represents a related direction for cross-domain generalization, while the 'Biomedical Signal Processing' leaf (two papers) captures domain-specific applications. MVPA bridges architectural innovation with domain needs by targeting iEEG data.

Among thirty candidates examined, none clearly refute the three main contributions: MVPA mechanism (ten candidates, zero refutable), MVPFormer foundation model (ten candidates, zero refutable), and the Long-term iEEG dataset (ten candidates, zero refutable). The MVPA mechanism appears most architecturally novel, as the examined candidates do not present identical disentangled attention designs for heterogeneous channels. The foundation model and dataset contributions show no overlapping prior work within the limited search scope, though the analysis acknowledges this reflects top-K semantic matches rather than exhaustive coverage. The statistics suggest the work occupies a relatively distinct position among examined papers.

Based on the limited search of thirty semantically similar candidates, the work appears to introduce distinct architectural and empirical contributions. The taxonomy context shows the paper sits in a moderately active attention-based subfield, with clear differentiation from channel-independent and purely cross-channel methods. However, the analysis does not cover the full landscape of biomedical foundation models or all attention variants in time-series literature, leaving open questions about broader novelty beyond the examined scope.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 0

Research Landscape Overview

Core task: modeling multi-variate time-series with heterogeneous channel configurations. The field addresses scenarios where different channels may have varying sampling rates, missing modalities, or distinct semantic meanings, requiring specialized architectures beyond standard uniform-channel models.

The taxonomy organizes research into four main branches. Channel Handling Strategies and Architectures explores how models explicitly manage channel dependence and independence, often through attention mechanisms or mixer designs such as Tsmixer Lightweight[11] and MTS-Mixers Factorized[8]. Irregular Sampling and Temporal Heterogeneity focuses on methods that accommodate non-uniform time grids or asynchronous observations, exemplified by Warpformer Irregular Clinical[44]. Cross-Domain Generalization and Heterogeneity Management examines techniques for transferring knowledge across datasets with different channel sets, including foundation models like UniTS Unified Model[24] and federated approaches such as Federated Heterogeneous Models[5]. Task-Specific Methods and Application Domains captures specialized solutions for forecasting, classification, and domain-specific challenges in healthcare, sensor networks, and beyond.

Recent work has intensified around balancing channel-independent versus channel-mixing strategies, with some studies advocating factorized designs to reduce complexity while others leverage cross-channel attention to capture dependencies. Multivariate Parallel Attention[0] sits within the attention-based channel modeling cluster, emphasizing parallel mechanisms that jointly process temporal and channel dimensions. This approach contrasts with purely channel-independent mixers like Tsmixer Lightweight[11] and aligns more closely with Multi-Channel Attention[16] and Triformer Triangular Attentions[34], which also exploit structured attention patterns. Compared to Catch Frequency Patching[3], which focuses on frequency-domain representations, Multivariate Parallel Attention[0] maintains a direct temporal-channel coupling strategy. The broader challenge remains how to scale these attention mechanisms efficiently while preserving the ability to handle variable channel counts and missing modalities, a tension evident across many branches of this taxonomy.

Claimed Contributions

Multi-variate parallel attention (MVPA)

MVPA is a novel self-attention mechanism that decomposes attention into three separate components: content-based, time-based, and channel-based attention. This decomposition enables the model to handle multi-variate time-series with heterogeneous channel configurations while maintaining computational efficiency through relative positional encoding and local attention windows.

10 retrieved papers
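To make the claimed decomposition concrete, the sketch below illustrates one plausible reading of disentangled attention: a content score from a scaled dot product, plus separate additive biases for relative time and channel identity. The additive combination, the function name, and all shapes are assumptions for illustration; this is not the authors' implementation, and it omits the local attention windows the paper mentions.

```python
import numpy as np

def mvpa_scores(q, k, time_bias, chan_bias):
    """Hypothetical disentangled attention: content + temporal + channel terms.

    q, k:      (n_tokens, d) content query/key projections
    time_bias: (n_tokens, n_tokens) relative-time attention bias
    chan_bias: (n_tokens, n_tokens) channel-identity attention bias
    Returns a row-stochastic attention matrix of shape (n_tokens, n_tokens).
    """
    d = q.shape[-1]
    content = q @ k.T / np.sqrt(d)             # standard scaled dot-product term
    scores = content + time_bias + chan_bias   # three parallel, disentangled terms
    # numerically stable softmax over keys
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Because the temporal and channel terms enter as biases rather than learned absolute positions, a model of this shape can in principle accept any number of channels at inference time, which matches the heterogeneity claim.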
MVPFormer foundation model for human electrophysiology

MVPFormer is a Transformer-based foundation model powered by MVPA that processes heterogeneous iEEG data through generative pre-training in continuous embedding space. The model predicts future neuronal activity and demonstrates superior generalization across subjects and clinical tasks compared to vanilla attention-based models.

10 retrieved papers
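The generative objective described above (predicting future activity in a continuous embedding space) reduces, at its simplest, to a next-step regression loss. The toy function below sketches that idea under stated assumptions: a mean-squared error between predicted and actual next embeddings. The function name and interface are hypothetical, not taken from the paper.

```python
import numpy as np

def next_step_mse(embeddings, predictor):
    """Sketch of a generative pre-training objective in embedding space.

    embeddings: (T, d) array of consecutive signal embeddings
    predictor:  callable mapping one (d,) embedding to a predicted next (d,) embedding
    Returns the mean-squared error of one-step-ahead predictions.
    """
    preds = np.stack([predictor(e) for e in embeddings[:-1]])  # predict t+1 from t
    targets = embeddings[1:]                                   # ground-truth next steps
    return float(np.mean((preds - targets) ** 2))
```

A perfect predictor drives this loss to zero; in practice the predictor would be the full Transformer conditioned on the entire history rather than a single step.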
Long-term iEEG dataset

The Long-term iEEG dataset is the largest publicly available iEEG corpus, containing nearly 10,000 hours of multi-channel recordings (540,000 channel-hours) from 68 subjects with 704 ictal events, fully curated and labeled by experienced clinicians to support foundation model development in the iEEG domain.

10 retrieved papers
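As a quick sanity check on the dataset figures quoted above, the derived averages below follow from the reported totals; the averages themselves are illustrative arithmetic, not values stated in the report, and "nearly 10,000 hours" is treated as exactly 10,000 for the calculation.

```python
# Figures as reported: ~10,000 recording hours, 540,000 channel-hours, 68 subjects.
total_hours = 10_000
channel_hours = 540_000
subjects = 68

avg_channels = channel_hours / total_hours    # implied average simultaneous channels
hours_per_subject = total_hours / subjects    # implied average recording per subject
print(avg_channels, round(hours_per_subject, 1))
```

This implies roughly 54 channels recorded in parallel on average, which is a plausible electrode count for clinical iEEG implantations.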

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Multi-variate parallel attention (MVPA)


Contribution

MVPFormer foundation model for human electrophysiology


Contribution

Long-term iEEG dataset
