Can we generate portable representations for clinical time series data using LLMs?
Overview
Overall Novelty Assessment
The paper proposes a summarize-then-embed pipeline (Record2Vec) using frozen large language models to create portable patient representations from irregular ICU time series. It sits within the 'Contrastive and Self-Supervised Representation Learning for Clinical Time Series' leaf, which contains only three papers total. This is a sparse direction within the broader 41-paper taxonomy, suggesting that using frozen LLMs for clinical time series portability is not yet heavily explored in the literature examined.
The taxonomy reveals neighboring work in clinical vocabulary embeddings, multi-modal foundation models, and federated phenotyping from unstructured text. The paper diverges from these by focusing on time series rather than static codes or clinical notes, and by using frozen pretrained models rather than training embeddings from scratch. It connects to the broader domain adaptation branch through its emphasis on cross-hospital transferability, but differs by creating portable input representations rather than adapting trained models post-hoc.
Across the three claimed contributions, 19 candidate papers were examined. Only one candidate was flagged as potentially refuting the summarize-then-embed pipeline itself; the deployment-first framing was checked against 10 candidates with none clearly refuting the contribution, and the multi-site evaluation against 8 candidates with no refutations found. This limited search scope suggests the specific combination of natural language summarization and frozen embeddings for clinical time series portability has minimal direct overlap in the examined literature, though the search was not exhaustive.
Based on top-19 semantic matches and citation expansion, the work appears to occupy a relatively novel position combining frozen LLM summarization with cross-hospital portability objectives. The sparse taxonomy leaf and low refutation rate suggest limited prior work on this specific approach, though the analysis cannot rule out relevant work outside the examined candidate set.
Taxonomy
Research Landscape Overview
Claimed Contributions
The authors propose a new perspective that treats portable input representations, rather than models themselves, as the primary transferable object across hospitals. This framing aims to reduce site-specific engineering overhead and calibration cycles when deploying clinical ML systems.
The authors introduce Record2Vec, a method that uses a frozen LLM to generate clinical summaries from irregular ICU time series data, then embeds these summaries with a frozen text encoder to produce fixed-length vectors. These vectors serve as portable inputs for downstream predictors without requiring model architecture modifications.
The authors perform comprehensive experiments across three ICU cohorts (MIMIC-IV, HiRID, PPICU) and multiple prediction tasks, demonstrating that their approach achieves competitive in-distribution performance while exhibiting better cross-site transfer, improved few-shot learning, and comparable or reduced demographic leakage compared to baseline methods.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[2] Contrastive Representation Learning Helps Cross-institutional Knowledge Transfer: A Study in Pediatric Ventilation Management
[25] Forecasting adverse surgical events using self-supervised transfer learning for physiological signals
Contribution Analysis
Detailed comparisons for each claimed contribution
Deployment-first framing focusing on portable input representations for healthcare
The authors propose a new perspective that treats portable input representations, rather than models themselves, as the primary transferable object across hospitals. This framing aims to reduce site-specific engineering overhead and calibration cycles when deploying clinical ML systems.
[22] HTPS: Heterogeneous Transferring Prediction System for Healthcare Datasets
[42] Illusory generalizability of clinical prediction models
[43] Generalizability of clinical prediction models in mental health
[44] Https: Heterogeneous transfer learning for split prediction system evaluated on healthcare data
[45] A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications
[46] Generalizable Seizure Prediction with LLMs: Converting EEG to Textual Representations
[47] A novel transfer learning based approach for pneumonia detection in chest X-ray images
[48] TrajGPT: Irregular Time-Series Representation Learning of Health Trajectory
[49] Fostering reproducibility and generalizability in machine learning for clinical prediction modeling in spine surgery
[50] Development and transfer learning of self-attention model for major adverse cardiovascular events prediction across hospitals
Record2Vec: summarize-then-embed pipeline using frozen language models
The authors introduce Record2Vec, a method that uses a frozen LLM to generate clinical summaries from irregular ICU time series data, then embeds these summaries with a frozen text encoder to produce fixed-length vectors. These vectors serve as portable inputs for downstream predictors without requiring model architecture modifications.
[51] TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series
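The summarize-then-embed structure described above can be sketched in a few lines. Everything in this sketch is an illustrative assumption, not the paper's implementation: a template function stands in for the frozen summarization LLM, a hashing-trick bag-of-words encoder stands in for the frozen text embedder, and the variable names and output dimension are invented for the example. What it preserves is the pipeline's shape: irregular observations become natural-language text, and text becomes a fixed-length vector usable as a portable input at any site.

```python
import hashlib
import math

EMBED_DIM = 64  # fixed output length; a real pipeline uses the text encoder's dimension


def summarize_record(events):
    """Stand-in for the frozen LLM: turn irregular (time, variable, value)
    observations into a short natural-language summary."""
    parts = []
    for var in sorted({name for _, name, _ in events}):
        vals = [x for _, name, x in events if name == var]
        parts.append(f"{var}: {len(vals)} readings, mean {sum(vals) / len(vals):.1f}")
    return "ICU record summary. " + "; ".join(parts) + "."


def embed_text(summary, dim=EMBED_DIM):
    """Stand-in for the frozen text encoder: hashing-trick bag-of-words,
    L2-normalized so every record maps to a fixed-length unit vector."""
    vec = [0.0] * dim
    for token in summary.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


# Irregularly sampled observations: (hours since admission, variable, value)
record = [(0.5, "heart_rate", 92), (1.2, "heart_rate", 110),
          (0.8, "lactate", 2.1), (6.0, "lactate", 3.4)]
vector = embed_text(summarize_record(record))  # portable fixed-length input
```

Because both stages are frozen, only the lightweight downstream predictor that consumes `vector` is trained per task, which is what makes the representation, rather than the model, the transferable object.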
Multi-site evaluation demonstrating portability, data-efficiency, and privacy preservation
The authors perform comprehensive experiments across three ICU cohorts (MIMIC-IV, HiRID, PPICU) and multiple prediction tasks, demonstrating that their approach achieves competitive in-distribution performance while exhibiting better cross-site transfer, improved few-shot learning, and comparable or reduced demographic leakage compared to baseline methods.