SlotFM: A Motion Foundation Model with Slot Attention for Diverse Downstream Tasks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: accelerometer, IMU, foundation models, self-supervised learning, time-series, slot attention
Abstract:

Wearable accelerometers are used for a wide range of applications, such as gesture recognition, gait analysis, and sports monitoring. Yet most existing foundation models focus primarily on classifying common daily activities such as locomotion and exercise, limiting their applicability to the broader range of tasks that rely on other signal characteristics. We present SlotFM, an accelerometer foundation model that generalizes across diverse downstream tasks. SlotFM uses Time-Frequency Slot Attention, an extension of Slot Attention that processes both time and frequency representations of the raw signals. It generates multiple small embeddings (slots), each capturing different signal components, enabling task-specific heads to focus on the most relevant parts of the data. We also introduce two loss regularizers that capture local structure and frequency patterns, which improve reconstruction of fine-grained details and help the embeddings preserve task-relevant information. We evaluate SlotFM on 16 classification and regression downstream tasks that extend beyond standard human activity recognition. It outperforms existing self-supervised approaches on 13 of these tasks and achieves results comparable to the best-performing approaches on the remaining tasks. On average, our method yields a 4.5% performance gain, demonstrating strong generalization for sensing foundation models.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

SlotFM introduces a foundation model for accelerometer data using Time-Frequency Slot Attention to generate multiple small embeddings capturing different signal components. The paper sits in the General Self-Supervised Pretraining leaf, which contains only three papers total including SlotFM itself. This is a relatively sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that self-supervised foundation models for accelerometer data remain an emerging area compared to more established branches like supervised classification or transfer learning.

The taxonomy reveals that SlotFM's closest neighbors are other self-supervised approaches within the same parent branch, while adjacent leaves focus on physiological signal coupling or unsupervised clustering. The broader Transfer Learning and Domain Adaptation branch (containing 10 papers across four leaves) addresses generalization through explicit domain alignment strategies, whereas SlotFM pursues generalization through task-agnostic pretraining. The Supervised Feature Learning branch (11 papers) represents the traditional paradigm of end-to-end architectures for labeled data, from which SlotFM diverges by learning representations without activity labels.

Among 20 candidates examined across three contributions, no clearly refuting prior work was identified. The Time-Frequency Slot Attention mechanism was assessed against 8 candidates, the two loss regularizers against 2, and the foundation model benchmark against 10, with no refutations found in any case. Within this limited scope, none of the top-20 semantically similar papers appears to overlap substantially with SlotFM's specific technical approach; however, the small candidate pool means the analysis cannot rule out relevant prior work beyond these top matches.

Based on the limited literature search of 20 candidates, SlotFM appears to occupy a relatively novel position combining slot-based attention with time-frequency processing for accelerometer foundation models. The sparse population of its taxonomy leaf and absence of refuting candidates within the examined scope suggest distinctiveness, though the analysis does not cover the full breadth of self-supervised learning or attention mechanism literature beyond accelerometer-specific applications.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 0

Research Landscape Overview

Core task: Learning generalizable accelerometer representations across diverse motion tasks. The field organizes around six main branches that reflect different methodological emphases and application contexts. Self-Supervised and Unsupervised Representation Learning explores pretraining strategies that do not require labeled data, enabling models to capture general motion patterns before fine-tuning on specific tasks. Transfer Learning and Domain Adaptation focuses on adapting models trained in one setting to new domains, devices, or user populations, addressing the challenge of distribution shift across datasets. Supervised Feature Learning and Classification develops end-to-end architectures and handcrafted features for labeled activity data, while Application-Specific Activity Recognition targets specialized domains such as clinical monitoring, animal behavior, or industrial settings. Signal Processing and Sensor Methodology examines foundational issues like sensor placement, orientation estimation, and signal preprocessing. Finally, Datasets, Benchmarks, and Methodological Reviews provide the empirical infrastructure and comparative analyses that guide the field's progress.

Recent work highlights a tension between domain-specific tuning and broadly generalizable representations. Many studies in transfer learning, such as CrossHAR[3] and Cross-domain HAR[2], tackle cross-dataset or cross-device generalization by aligning feature distributions or leveraging domain adaptation techniques. Meanwhile, self-supervised pretraining approaches like Self-supervised IMU[47] and Accelerometer Foundation Models[11] aim to learn universal motion embeddings that transfer widely without extensive labeled data. SlotFM[0] sits within the General Self-Supervised Pretraining cluster, emphasizing the discovery of reusable motion primitives through unsupervised methods.
Compared to CrossHAR[3], which explicitly addresses domain shift via adversarial or alignment strategies, SlotFM[0] focuses on learning compositional representations that generalize by capturing fundamental motion structures. This contrast underscores an open question: whether generalization is best achieved through explicit domain adaptation or through richer, task-agnostic pretraining that naturally transfers across contexts.

Claimed Contributions

Time-Frequency Slot Attention for accelerometer signals

The authors introduce an extension of Slot Attention that processes accelerometer data in both time and frequency domains. It generates multiple slot vectors that each capture different signal components, enabling task-specific heads to focus on relevant features across diverse downstream tasks.

8 retrieved papers
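The paper's exact architecture is not reproduced in this report. As a rough illustration of the idea being claimed, a minimal Slot Attention pass over concatenated time-domain and frequency-domain tokens might look like the sketch below; the tokenization scheme, shapes, slot count, and random stand-in projections are all assumptions, not SlotFM's actual design.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention(tokens, num_slots=4, iters=3, seed=0):
    """Minimal Slot Attention pass: slots compete for input tokens
    (softmax over the slot axis), then update as attention-weighted means.
    Random matrices stand in for the learned q/k/v projections."""
    rng = np.random.default_rng(seed)
    n, d = tokens.shape
    slots = rng.normal(size=(num_slots, d))
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    k, v = tokens @ Wk, tokens @ Wv
    for _ in range(iters):
        q = slots @ Wq
        attn = softmax(q @ k.T / np.sqrt(d), axis=0)   # compete over slots
        attn = attn / attn.sum(axis=1, keepdims=True)  # normalize per slot
        slots = attn @ v                               # weighted-mean update
    return slots

# Hypothetical tokenization of one accelerometer axis in both domains.
t = np.linspace(0, 2, 256, endpoint=False)
sig = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.default_rng(1).normal(size=256)
time_tokens = sig.reshape(16, 16)                            # 16 time patches
freq_tokens = np.abs(np.fft.rfft(sig))[:128].reshape(8, 16)  # 8 spectrum patches
slots = slot_attention(np.concatenate([time_tokens, freq_tokens]), num_slots=4)
print(slots.shape)  # (4, 16): one small embedding per slot
```

Because the softmax normalizes over slots rather than over tokens, the slots divide the input among themselves, which is the property that lets each slot specialize in a different signal component.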
Two loss regularizers for improved signal reconstruction

The authors introduce SSIM (Structural Similarity Index Measure) and MS-STFT (Multi-Scale Short-Term Fourier Transform) as loss regularizers. These losses encourage the model to preserve structural patterns and high-frequency details in the signal reconstruction, improving downstream task performance.

2 retrieved papers
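The exact loss formulations are likewise not included in this report. As an illustrative sketch only, a multi-scale STFT magnitude loss and a global 1-D SSIM term could be written as follows; the window, hop sizes, FFT scales, and stability constants are assumptions chosen for the example, not the paper's settings.

```python
import numpy as np

def stft_mag(x, n_fft, hop):
    """Magnitudes of a Hann-windowed short-time Fourier transform."""
    frames = np.stack([x[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(x) - n_fft + 1, hop)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def ms_stft_loss(x, y, fft_sizes=(64, 128, 256)):
    """L1 distance between STFT magnitudes at several resolutions,
    penalizing reconstructions y that lose high-frequency detail."""
    return sum(np.mean(np.abs(stft_mag(x, n, n // 4) - stft_mag(y, n, n // 4)))
               for n in fft_sizes)

def ssim_1d(x, y, c1=1e-4, c2=9e-4):
    """Global SSIM between two 1-D signals (single-window variant),
    rewarding preserved local structure in the reconstruction."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (x.var() + y.var() + c2))

x = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 512, endpoint=False))
assert ms_stft_loss(x, x) == 0.0         # perfect reconstruction -> zero loss
assert abs(ssim_1d(x, x) - 1.0) < 1e-9   # identical signals -> SSIM of 1
```

In a reconstruction objective, terms like these would be added as regularizers to a primary loss (e.g. mean squared error), trading off global fit against structural and spectral fidelity.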
SlotFM foundation model and diverse downstream benchmark

The authors train and release SlotFM, an accelerometer foundation model, and evaluate it on 16 classification and regression tasks spanning gestures, sports, cooking, and transportation. They also release code for model training and benchmark setup to support reproducibility.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Time-Frequency Slot Attention for accelerometer signals

The authors introduce an extension of Slot Attention that processes accelerometer data in both time and frequency domains. It generates multiple slot vectors that each capture different signal components, enabling task-specific heads to focus on relevant features across diverse downstream tasks.

Contribution

Two loss regularizers for improved signal reconstruction

The authors introduce SSIM (Structural Similarity Index Measure) and MS-STFT (Multi-Scale Short-Term Fourier Transform) as loss regularizers. These losses encourage the model to preserve structural patterns and high-frequency details in the signal reconstruction, improving downstream task performance.

Contribution

SlotFM foundation model and diverse downstream benchmark

The authors train and release SlotFM, an accelerometer foundation model, and evaluate it on 16 classification and regression tasks spanning gestures, sports, cooking, and transportation. They also release code for model training and benchmark setup to support reproducibility.