Can we generate portable representations for clinical time series data using LLMs?

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Machine Learning for Healthcare, ICU Time Series, LLMs, Representation Learning
Abstract:

Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shift at the next. In this work, we study a simple question: can large language models (LLMs) create portable patient embeddings, i.e., representations of patients that enable a downstream predictor built at one hospital to be reused elsewhere with minimal-to-no retraining or fine-tuning? To do so, we map irregular ICU time series onto concise natural-language summaries using a frozen LLM, then embed each summary with a frozen text-embedding model to obtain a fixed-length vector that can serve as input to a variety of downstream predictors. Across three cohorts (MIMIC-IV, HiRID, PPICU) and multiple clinically grounded forecasting and classification tasks, we find that our approach is simple, easy to use, and surprisingly competitive in distribution with grid imputation, self-supervised representation learning, and time-series foundation models, while exhibiting smaller relative performance drops when transferring to new hospitals. We study how performance varies with prompt design and find that structured prompts are crucial for reducing the variance of the predictive models without altering mean accuracy. We also find that these portable representations improve few-shot learning and do not increase demographic recoverability of age or sex relative to baselines, suggesting little additional privacy risk. Our work points to the potential of LLMs as tools for the scalable deployment of production-grade predictive models by reducing engineering overhead.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes a summarize-then-embed pipeline (Record2Vec) using frozen large language models to create portable patient representations from irregular ICU time series. It sits within the 'Contrastive and Self-Supervised Representation Learning for Clinical Time Series' leaf, which contains only three papers total. This is a relatively sparse research direction within the broader taxonomy of 41 papers across the field, suggesting the specific approach of leveraging frozen LLMs for clinical time series portability is not yet heavily explored in the literature examined.

The taxonomy reveals neighboring work in clinical vocabulary embeddings, multi-modal foundation models, and federated phenotyping from unstructured text. The paper diverges from these by focusing on time series rather than static codes or clinical notes, and by using frozen pretrained models rather than training embeddings from scratch. It connects to the broader domain adaptation branch through its emphasis on cross-hospital transferability, but differs by creating portable input representations rather than adapting trained models post-hoc.

Among 19 candidates examined across three contributions, only one refutable pair was identified for the summarize-then-embed pipeline itself. The deployment-first framing examined 10 candidates with none clearly refuting the contribution, while the multi-site evaluation examined 8 candidates with no refutations found. This limited search scope suggests the specific combination of natural language summarization and frozen embeddings for clinical time series portability has minimal direct overlap in the examined literature, though the search was not exhaustive.

Based on top-19 semantic matches and citation expansion, the work appears to occupy a relatively novel position combining frozen LLM summarization with cross-hospital portability objectives. The sparse taxonomy leaf and low refutation rate suggest limited prior work on this specific approach, though the analysis cannot rule out relevant work outside the examined candidate set.

Taxonomy

Core-task taxonomy papers: 41
Claimed contributions: 3
Contribution candidate papers compared: 19
Refutable papers: 1

Research Landscape Overview

Core task: portable patient representation learning for cross-hospital clinical prediction. The field addresses the challenge of building clinical models that generalize across institutions despite heterogeneous data sources, privacy constraints, and distributional shifts. The taxonomy reveals five major branches:

- Federated Learning Architectures for Privacy-Preserving Cross-Institutional Collaboration encompasses methods such as Federated Semi-supervised Healthcare[1] and the Unitrans Federated Framework[3], which enable collaborative model training without sharing raw patient data.
- Representation Learning and Embedding Methods for Cross-Institutional Portability focuses on learning transferable patient embeddings through techniques including contrastive learning, as in Contrastive Pediatric Ventilation[2], and unified embedding spaces such as Unified Clinical Embeddings[7].
- Domain Adaptation and Transfer Learning for Cross-Hospital Generalization tackles distributional shifts using approaches such as AdaDiag Domain Adaptation[23] and Adversarial Glucose Transfer[39].
- Task-Specific Cross-Hospital Prediction Models with Validation Studies demonstrates practical applications across diverse clinical scenarios, from GRU Ileus Surveillance[11] to Sarcopenic Obesity Prediction[13].
- Data Harmonization and Preprocessing Pipelines for Multi-Institutional Research, exemplified by the EHR Harmonization Pipeline[37], addresses the foundational challenge of standardizing heterogeneous clinical data.

A central tension emerges between privacy-preserving federated approaches and representation-learning methods that require richer data sharing for effective embedding construction. Within the representation-learning branch, contrastive and self-supervised techniques have gained traction for learning robust features from clinical time series without extensive labels.
Portable Clinical LLMs[0] sits within this contrastive learning cluster, emphasizing self-supervised pretraining to capture temporal patterns that transfer across hospitals. This approach contrasts with Self-supervised Surgical Events[25], which focuses on surgical workflow rather than longitudinal patient trajectories, and aligns closely with Contrastive Pediatric Ventilation[2] in leveraging temporal contrasts for cross-institutional robustness. The interplay between learning portable representations and maintaining patient privacy remains an active research frontier, with works exploring whether foundation models like Multi-modal Foundation Models[4] can bridge these competing demands through large-scale pretraining.

Claimed Contributions

Deployment-first framing focusing on portable input representations for healthcare

The authors propose a new perspective that treats portable input representations, rather than models themselves, as the primary transferable object across hospitals. This framing aims to reduce site-specific engineering overhead and calibration cycles when deploying clinical ML systems.

10 retrieved papers
Record2Vec: summarize-then-embed pipeline using frozen language models

The authors introduce Record2Vec, a method that uses a frozen LLM to generate clinical summaries from irregular ICU time series data, then embeds these summaries with a frozen text encoder to produce fixed-length vectors. These vectors serve as portable inputs for downstream predictors without requiring model architecture modifications.

1 retrieved paper
Can Refute
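
The summarize-then-embed pipeline described above can be sketched in a few lines. This is a minimal illustration of the structure only: the paper uses a frozen LLM for summarization and a frozen text-embedding model, both of which are replaced here by simple stand-ins (a template-based summarizer and a hashed bag-of-words encoder) so the sketch runs end to end. All function names and the toy data are hypothetical, not taken from the paper.

```python
import hashlib
import numpy as np

def summarize(record):
    # Stand-in for frozen-LLM summarization: render an irregular time
    # series (list of (hour, variable, value) tuples) as plain text.
    lines = [f"At hour {t}, {var} was {val}." for t, var, val in sorted(record)]
    return " ".join(lines)

def embed(text, dim=64):
    # Stand-in for a frozen text encoder: hashed bag-of-words into a
    # fixed-length, L2-normalized vector, so every summary maps to the
    # same dimensionality regardless of its length.
    vec = np.zeros(dim)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Two toy ICU stays with irregular sampling. A downstream predictor
# trained on such vectors at one hospital can consume vectors produced
# at another, since the representation, not the model, is portable.
stay_a = [(1, "heart_rate", 88), (4, "lactate", 2.1)]
stay_b = [(2, "heart_rate", 121), (3, "lactate", 4.8), (7, "map", 55)]
X = np.stack([embed(summarize(s)) for s in (stay_a, stay_b)])
print(X.shape)  # fixed-length vectors despite differing series lengths
```

The key property being illustrated is that variable-length, irregularly sampled inputs all map to vectors of the same dimension, which is what lets a single downstream predictor be shared across sites.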
Multi-site evaluation demonstrating portability, data-efficiency, and privacy preservation

The authors perform comprehensive experiments across three ICU cohorts (MIMIC-IV, HiRID, PPICU) and multiple prediction tasks, demonstrating that their approach achieves competitive in-distribution performance while exhibiting better cross-site transfer, improved few-shot learning, and comparable or reduced demographic leakage compared to baseline methods.

8 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Deployment-first framing focusing on portable input representations for healthcare

The authors propose a new perspective that treats portable input representations, rather than models themselves, as the primary transferable object across hospitals. This framing aims to reduce site-specific engineering overhead and calibration cycles when deploying clinical ML systems.

Contribution

Record2Vec: summarize-then-embed pipeline using frozen language models

The authors introduce Record2Vec, a method that uses a frozen LLM to generate clinical summaries from irregular ICU time series data, then embeds these summaries with a frozen text encoder to produce fixed-length vectors. These vectors serve as portable inputs for downstream predictors without requiring model architecture modifications.

Contribution

Multi-site evaluation demonstrating portability, data-efficiency, and privacy preservation

The authors perform comprehensive experiments across three ICU cohorts (MIMIC-IV, HiRID, PPICU) and multiple prediction tasks, demonstrating that their approach achieves competitive in-distribution performance while exhibiting better cross-site transfer, improved few-shot learning, and comparable or reduced demographic leakage compared to baseline methods.