Decentralized Attention Fails Centralized Signals: Rethinking Transformers for Medical Time Series
Overview
Overall Novelty Assessment
The paper proposes CoTAR, a centralized MLP-based module replacing decentralized attention in transformers for medical time series analysis. It sits within the Transformer-Based Joint Modeling leaf, which contains only three papers including this work. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting the specific approach of replacing attention with centralized token aggregation for joint temporal-channel modeling is not yet heavily explored in the medical time series literature.
The taxonomy reveals that joint spatiotemporal modeling represents one of several major branches, alongside temporal-only architectures (RNNs, attention-based temporal models) and channel-only methods (graph-based and MLP-based channel mixing). The paper's sibling works, Medformer and Dispformer, employ standard transformer blocks for joint modeling, while neighboring leaves include hybrid convolutional-recurrent architectures and multi-scale approaches. The scope notes indicate that transformers focusing solely on temporal attention belong elsewhere, clarifying that this leaf specifically addresses integrated temporal-channel mechanisms. The paper diverges from both graph-based channel modeling and attention-based channel mixing by introducing a centralized proxy token rather than direct pairwise interactions.
Of the 30 candidates examined in total (10 per claimed contribution), the CoTAR module and the TeCh framework with Adaptive Dual Tokenization each yielded one refutable candidate, while the identification of a structural mismatch between attention and medical time series yielded none. This suggests that while the specific architectural components may have some precedent within the limited search scope, the conceptual framing of centralized versus decentralized modeling for medical signals represents a less-explored perspective. The analysis indicates moderate prior-work overlap for the core technical contributions but stronger novelty in the problem formulation.
Based on the top-30 semantic matches examined, the work appears to occupy a relatively underexplored niche within transformer-based joint modeling, though the limited search scope means potentially relevant work outside this candidate set remains unexamined. The sparse population of the taxonomy leaf and the conceptual novelty of the centralization argument suggest meaningful differentiation from existing approaches, while the refutable candidates for specific modules indicate some architectural overlap within the examined literature.
Taxonomy
Research Landscape Overview
Claimed Contributions
CoTAR is a centralized MLP-based module that replaces the standard attention mechanism in Transformers. Instead of direct pairwise token interactions, it introduces a global core token that aggregates information from all tokens and redistributes it back, reducing computational complexity from quadratic to linear while better aligning with the centralized nature of medical time series signals.
TeCh is a unified framework built on CoTAR that can adaptively model temporal dependencies, channel dependencies, or both by adjusting the tokenization strategy (Temporal, Channel, or Dual). This flexibility allows the framework to better match the unique characteristics of different medical time series datasets.
The authors identify and formalize a fundamental mismatch: medical time series signals like EEG and ECG originate from centralized biological sources (brain, heart), while Transformer attention operates as a decentralized graph where all tokens interact equally. This mismatch causes attention to fail at capturing the global synchronization and unified patterns essential for modeling channel dependencies in medical signals.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[1] Medformer: A multi-granularity patching transformer for medical time-series classification
[39] Dispformer: A Dual Attention Transformer with Denoising for Irregular Clinical Time Series Classification
Contribution Analysis
Detailed comparisons for each claimed contribution
Core Token Aggregation-Redistribution (CoTAR) module
CoTAR is a centralized MLP-based module that replaces the standard attention mechanism in Transformers. Instead of direct pairwise token interactions, it introduces a global core token that aggregates information from all tokens and redistributes it back, reducing computational complexity from quadratic to linear while better aligning with the centralized nature of medical time series signals.
[63] SOFTS: Efficient multivariate time series forecasting with series-core fusion
[1] Medformer: A multi-granularity patching transformer for medical time-series classification
[61] CPAT: Cross-patch aggregated transformer for time series forecasting
[62] PeT-KeyStAtion: Parameter-efficient Transformer with Keypoint-guided Spatial-temporal Aggregation for Video-based Person Re-identification
[64] Transformer models for land cover classification with satellite image time series
[65] SCAT: A Time Series Forecasting with Spectral Central Alternating Transformers
[66] Knowledge Aggregation Transformer Network for Multivariate Time Series Classification
[67] Many minds, one goal: Time series forecasting via sub-task specialization and inter-agent cooperation
[68] Split Federated Learning for Real-Time Aerial Video Event Recognition in UAV-Based Geospatial Monitoring
[69] Multi-stage Aggregated Transformer Network for Temporal Language Localization in Videos
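To make the aggregation-redistribution idea concrete, here is a minimal sketch of a core-token update step. The function name, mean pooling, single-layer weight matrices, and tanh nonlinearity are all assumptions for illustration; the paper's exact MLP design is not specified in this summary. The key property shown is that cost grows linearly in the number of tokens, since every token interacts only with one shared core rather than with every other token.

```python
import numpy as np

def cotar_block(tokens, w_agg, w_dist):
    """Hypothetical sketch of core-token aggregation-redistribution.

    tokens: (n, d) array of token embeddings.
    w_agg, w_dist: (d, d) weight matrices (assumed single-layer MLPs).
    """
    # Aggregate: pool all n tokens into one global core token -- O(n) in tokens
    core = np.tanh(tokens.mean(axis=0) @ w_agg)   # (d,)
    # Redistribute: every token receives the same core-derived update -- O(n)
    update = np.tanh(core @ w_dist)               # (d,)
    # Residual connection broadcasts the shared update back to all tokens
    return tokens + update                        # (n, d)
```

Because the core is a single vector, no n-by-n interaction matrix is ever formed, which is the claimed source of the quadratic-to-linear complexity reduction.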
TeCh framework with Adaptive Dual Tokenization
TeCh is a unified framework built on CoTAR that can adaptively model temporal dependencies, channel dependencies, or both by adjusting the tokenization strategy (Temporal, Channel, or Dual). This flexibility allows the framework to better match the unique characteristics of different medical time series datasets.
[72] MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection
[33] MTS-Mixers: Multivariate Time Series Forecasting via Factorized Temporal and Channel Mixing
[70] A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
[71] Empowering time series analysis with large language models: A survey
[73] ODTrack: Online Dense Temporal Token Learning for Visual Tracking
[74] Multimodal temporal context network for tracking dynamic changes in emotion
[75] Adaptive Tokenization Transformer: Enhancing Irregularly Sampled Multivariate Time-Series Analysis
[76] Multiple-resolution tokenization for time series forecasting with an application to pricing
[77] OccSora: 4D occupancy generation models as world simulators for autonomous driving
[78] Learning Adaptive and Temporally Causal Video Tokenization in a 1D Latent Space
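The three tokenization strategies can be illustrated with a small sketch of the axis choice each one makes on a multivariate series. The function name and the raw-slice representation are assumptions for illustration; the actual framework presumably applies patching and learned embeddings on top of these views, which are not detailed in this summary.

```python
import numpy as np

def tokenize(x, mode):
    """Hypothetical sketch of Temporal, Channel, and Dual tokenization.

    x: (T, C) series with T time steps and C channels.
    """
    if mode == "temporal":   # one token per time step, spanning all channels
        return x             # (T, C): T tokens of dimension C
    if mode == "channel":    # one token per channel, spanning all time steps
        return x.T           # (C, T): C tokens of dimension T
    if mode == "dual":       # both views, for joint temporal-channel modeling
        return x, x.T
    raise ValueError(f"unknown mode: {mode}")
```

Choosing the token axis decides which dependency the downstream CoTAR blocks model, which is how the framework adapts to datasets dominated by temporal structure, channel structure, or both.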
Identification of structural mismatch between attention and medical time series
The authors identify and formalize a fundamental mismatch: medical time series signals like EEG and ECG originate from centralized biological sources (brain, heart), while Transformer attention operates as a decentralized graph where all tokens interact equally. This mismatch causes attention to fail at capturing the global synchronization and unified patterns essential for modeling channel dependencies in medical signals.
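The scaling consequence of this framing can be made concrete with a back-of-envelope count, an illustration I am adding here rather than something taken from the paper: decentralized attention realizes every pairwise link among n tokens, while a centralized star topology routes everything through one core.

```python
def interaction_edges(n, topology):
    """Count the links one update must realize for n tokens."""
    if topology == "decentralized":   # attention: every token pair interacts
        return n * (n - 1) // 2
    if topology == "centralized":     # star graph through a single core token
        return n
    raise ValueError(f"unknown topology: {topology}")

# For 64 tokens: 2016 pairwise links versus 64 core links.
```

The gap widens quadratically with token count, which is the structural argument for why a centralized proxy token both cuts cost and mirrors signals driven by a single biological source.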