A foundation model with multi-variate parallel attention to generate neuronal activity

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: time-series, iEEG, neurology, foundation model, attention, transformer
Abstract:

Learning from multi-variate time-series with heterogeneous channel configurations remains a fundamental challenge for deep neural networks, particularly in clinical domains such as intracranial electroencephalography (iEEG), where channel setups vary widely across subjects. In this work, we introduce multi-variate parallel attention (MVPA), a novel self-attention mechanism that disentangles content, temporal, and spatial attention, enabling flexible, generalizable, and efficient modeling of time-series data with varying channel counts and configurations. We use MVPA to build MVPFormer, a generative foundation model for human electrophysiology, trained to predict the evolution of iEEG signals across diverse subjects. To support this and future efforts by the community, we release the Long-term iEEG dataset, the largest publicly available iEEG dataset to date, comprising nearly 10,000 hours of recordings from heterogeneous clinical sources. MVPFormer leverages MVPA to achieve strong generalization across subjects, demonstrating expert-level performance in several iEEG tasks. MVPFormer surpasses state-of-the-art (SOTA) Transformer baselines in seizure detection across the Long-term, the MAYO, and the FNUSA datasets, while also achieving SOTA performance on four Brain TreeBank iEEG decoding tasks (volume, pitch, onset, and speech). We further validate MVPA on standard time-series forecasting and classification tasks, where it matches or exceeds the performance of existing attention-based models. Together, our contributions establish MVPA as a general-purpose attention mechanism for heterogeneous time-series and MVPFormer as the first open-source, open-weights, and open-data iEEG foundation model with SOTA clinical performance.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces multi-variate parallel attention (MVPA), a self-attention mechanism that disentangles content, temporal, and spatial attention for heterogeneous channel configurations in time-series data. It resides in the 'Attention Mechanisms for Channel and Temporal Modeling' leaf, which contains five papers total, including the original work. This leaf sits within the broader 'Channel Handling Strategies and Architectures' branch, indicating a moderately populated research direction focused on attention-based solutions for channel heterogeneity. The sibling papers explore related attention patterns, suggesting this is an active but not overcrowded subfield.

The taxonomy tree reveals neighboring leaves addressing channel independence (six papers using mixing approaches), channel dependence (three papers on cross-channel interaction), and variable channel handling (five papers on missing/partial channels). The paper's attention-based approach diverges from channel-independent mixers like Tsmixer and aligns more closely with structured attention methods such as Triformer. The 'Foundation Models and Pre-Training' leaf (five papers) represents a related direction for cross-domain generalization, while the 'Biomedical Signal Processing' leaf (two papers) captures domain-specific applications. MVPA bridges architectural innovation with domain needs by targeting iEEG data.

Among thirty candidates examined, none clearly refute the three main contributions: MVPA mechanism (ten candidates, zero refutable), MVPFormer foundation model (ten candidates, zero refutable), and the Long-term iEEG dataset (ten candidates, zero refutable). The MVPA mechanism appears most architecturally novel, as the examined candidates do not present identical disentangled attention designs for heterogeneous channels. The foundation model and dataset contributions show no overlapping prior work within the limited search scope, though the analysis acknowledges this reflects top-K semantic matches rather than exhaustive coverage. The statistics suggest the work occupies a relatively distinct position among examined papers.

Based on the limited search of thirty semantically similar candidates, the work appears to introduce distinct architectural and empirical contributions. The taxonomy context shows the paper sits in a moderately active attention-based subfield, with clear differentiation from channel-independent and purely cross-channel methods. However, the analysis does not cover the full landscape of biomedical foundation models or all attention variants in time-series literature, leaving open questions about broader novelty beyond the examined scope.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 0

Research Landscape Overview

Core task: modeling multi-variate time-series with heterogeneous channel configurations. The field addresses scenarios where different channels may have varying sampling rates, missing modalities, or distinct semantic meanings, requiring specialized architectures beyond standard uniform-channel models.

The taxonomy organizes research into four main branches. Channel Handling Strategies and Architectures explores how models explicitly manage channel dependence and independence, often through attention mechanisms or mixer designs such as Tsmixer Lightweight[11] and MTS-Mixers Factorized[8]. Irregular Sampling and Temporal Heterogeneity focuses on methods that accommodate non-uniform time grids or asynchronous observations, exemplified by Warpformer Irregular Clinical[44]. Cross-Domain Generalization and Heterogeneity Management examines techniques for transferring knowledge across datasets with different channel sets, including foundation models like UniTS Unified Model[24] and federated approaches such as Federated Heterogeneous Models[5]. Task-Specific Methods and Application Domains captures specialized solutions for forecasting, classification, and domain-specific challenges in healthcare, sensor networks, and beyond.

Recent work has intensified around balancing channel-independent versus channel-mixing strategies, with some studies advocating factorized designs to reduce complexity while others leverage cross-channel attention to capture dependencies. Multivariate Parallel Attention[0] sits within the attention-based channel modeling cluster, emphasizing parallel mechanisms that jointly process temporal and channel dimensions. This approach contrasts with purely channel-independent mixers like Tsmixer Lightweight[11] and aligns more closely with Multi-Channel Attention[16] and Triformer Triangular Attentions[34], which also exploit structured attention patterns. Compared to Catch Frequency Patching[3], which focuses on frequency-domain representations, Multivariate Parallel Attention[0] maintains a direct temporal-channel coupling strategy. The broader challenge remains how to scale these attention mechanisms efficiently while preserving the ability to handle variable channel counts and missing modalities, a tension evident across many branches of this taxonomy.

Claimed Contributions

Multi-variate parallel attention (MVPA)

MVPA is a novel self-attention mechanism that decomposes attention into three separate components: content-based, time-based, and channel-based attention. This decomposition enables the model to handle multi-variate time-series with heterogeneous channel configurations while maintaining computational efficiency through relative positional encoding and local attention windows.

10 retrieved papers
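To make the claimed decomposition concrete, the sketch below illustrates one plausible reading of disentangled attention: a content score from a scaled dot product, plus separate additive biases for relative time and channel identity. The additive combination, the function name, and all shapes are assumptions for illustration; this is not the authors' implementation, and it omits the local attention windows the paper mentions.

```python
import numpy as np

def mvpa_scores(q, k, time_bias, chan_bias):
    """Hypothetical disentangled attention: content + temporal + channel terms.

    q, k:      (n_tokens, d) content query/key projections
    time_bias: (n_tokens, n_tokens) relative-time attention bias
    chan_bias: (n_tokens, n_tokens) channel-identity attention bias
    Returns a row-stochastic attention matrix of shape (n_tokens, n_tokens).
    """
    d = q.shape[-1]
    content = q @ k.T / np.sqrt(d)             # standard scaled dot-product term
    scores = content + time_bias + chan_bias   # three parallel, disentangled terms
    # numerically stable softmax over keys
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Because the temporal and channel terms enter as biases rather than learned absolute positions, a model of this shape can in principle accept any number of channels at inference time, which matches the heterogeneity claim.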
MVPFormer foundation model for human electrophysiology

MVPFormer is a Transformer-based foundation model powered by MVPA that processes heterogeneous iEEG data through generative pre-training in continuous embedding space. The model predicts future neuronal activity and demonstrates superior generalization across subjects and clinical tasks compared to vanilla attention-based models.

10 retrieved papers
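The generative objective described above (predicting future activity in a continuous embedding space) reduces, at its simplest, to a next-step regression loss. The toy function below sketches that idea under stated assumptions: a mean-squared error between predicted and actual next embeddings. The function name and interface are hypothetical, not taken from the paper.

```python
import numpy as np

def next_step_mse(embeddings, predictor):
    """Sketch of a generative pre-training objective in embedding space.

    embeddings: (T, d) array of consecutive signal embeddings
    predictor:  callable mapping one (d,) embedding to a predicted next (d,) embedding
    Returns the mean-squared error of one-step-ahead predictions.
    """
    preds = np.stack([predictor(e) for e in embeddings[:-1]])  # predict t+1 from t
    targets = embeddings[1:]                                   # ground-truth next steps
    return float(np.mean((preds - targets) ** 2))
```

A perfect predictor drives this loss to zero; in practice the predictor would be the full Transformer conditioned on the entire history rather than a single step.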
Long-term iEEG dataset

The Long-term iEEG dataset is the largest publicly available iEEG corpus, containing nearly 10,000 hours of multi-channel recordings (540,000 channel-hours) from 68 subjects with 704 ictal events, fully curated and labeled by experienced clinicians to support foundation model development in the iEEG domain.

10 retrieved papers
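As a quick sanity check on the dataset figures quoted above, the derived averages below follow from the reported totals; the averages themselves are illustrative arithmetic, not values stated in the report, and "nearly 10,000 hours" is treated as exactly 10,000 for the calculation.

```python
# Figures as reported: ~10,000 recording hours, 540,000 channel-hours, 68 subjects.
total_hours = 10_000
channel_hours = 540_000
subjects = 68

avg_channels = channel_hours / total_hours    # implied average simultaneous channels
hours_per_subject = total_hours / subjects    # implied average recording per subject
print(avg_channels, round(hours_per_subject, 1))
```

This implies roughly 54 channels recorded in parallel on average, which is a plausible electrode count for clinical iEEG implantations.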

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Multi-variate parallel attention (MVPA)


Contribution

MVPFormer foundation model for human electrophysiology


Contribution

Long-term iEEG dataset
