Context parroting: A simple but tough-to-beat baseline for foundation models in scientific machine learning

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: time series, foundation models, dynamical systems, forecasting, chaos, physics, scientific machine learning
Abstract:

Recent time-series foundation models exhibit strong abilities to predict physical systems. These abilities include zero-shot forecasting, in which a model forecasts future states of a system given only a short trajectory as context, without knowledge of the underlying physics. Here, we show that foundation models often forecast through a simple parroting strategy, and when they are not parroting they exhibit some shared failure modes such as converging to the mean. As a result, a naive context parroting model that copies directly from the context scores higher than leading time-series foundation models on predicting a diverse range of dynamical systems, including low-dimensional chaos, turbulence, coupled oscillators, and electrocardiograms, at a tiny fraction of the computational cost. We draw a parallel between context parroting and induction heads, which explains recent works showing that large language models can often be repurposed for time series forecasting. Our dynamical systems perspective also ties the scaling between forecast accuracy and context length to the fractal dimension of the underlying chaotic attractor, providing insight into previously observed in-context neural scaling laws. By revealing the performance gaps and failure modes of current time-series foundation models, context parroting can guide the design of future foundation models and help identify in-context learning strategies beyond parroting.
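The parroting strategy described above can be illustrated with a minimal sketch: find the window in the context most similar to the most recent observations, then copy what followed it. The function `parrot_forecast`, the window length `k`, and the sine-wave example below are our illustrative choices for exposition, not the paper's implementation.

```python
import numpy as np

def parrot_forecast(context, horizon, k=8):
    """Naive context-parroting forecast (illustrative sketch).

    Find the length-k window in `context` most similar to the final
    k points, then copy the `horizon` values that followed it.
    """
    context = np.asarray(context, dtype=float)
    query = context[-k:]
    best_start, best_dist = 0, np.inf
    # Scan all earlier windows whose continuation lies fully inside the context.
    for start in range(len(context) - k - horizon + 1):
        dist = np.linalg.norm(context[start:start + k] - query)
        if dist < best_dist:
            best_start, best_dist = start, dist
    # Forecast = the values that followed the best-matching motif.
    return context[best_start + k : best_start + k + horizon]

# Example: a (near-)periodic signal, where parroting is close to perfect.
t = np.linspace(0, 20 * np.pi, 2000)
series = np.sin(t)
context, future = series[:1950], series[1950:]
pred = parrot_forecast(context, horizon=50)
```

On periodic or recurrent signals the copied continuation tracks the true future closely, which is precisely why such a trivial baseline can be hard to beat on attractor-bound dynamics.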

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper investigates whether time-series foundation models genuinely learn temporal dynamics or rely on simple context parroting strategies for zero-shot forecasting. It resides in the 'Capability and Limitation Analysis' leaf under 'Evaluation, Benchmarking, and Analysis', alongside four sibling papers examining reasoning abilities, memorization, and generalization gaps. This leaf is moderately populated within a taxonomy of 50 papers, indicating that capability analysis is an active but not overcrowded research direction. The work's focus on failure modes and parroting mechanisms positions it within ongoing debates about what foundation models truly learn versus what they memorize.

The taxonomy reveals several neighboring research directions. The sibling leaf 'Comprehensive Benchmarking Frameworks' proposes unified evaluation protocols, while 'Domain-Specific Evaluation Studies' examines performance on chaotic systems, industrial data, and healthcare applications. Nearby branches include 'Enhancement Mechanisms and Augmentation Strategies', which explore retrieval-augmented and reasoning-based improvements, and 'Foundation Model Architectures', which propose transformer, diffusion, and state-space designs. The paper's analytical stance contrasts with these architecture-focused and application-driven directions, instead probing the fundamental mechanisms underlying zero-shot forecasting success and failure across diverse dynamical systems.

Among the 20 candidates examined across the three contributions, the analysis found limited overlap with prior work. For the 'context parroting baseline' contribution, zero candidates were retrieved, suggesting this specific framing may be novel. For the 'failure modes' contribution, 10 candidates were examined and one potentially refutable paper was identified, indicating some existing work on model limitations but not comprehensive coverage. For the 'fractal dimension scaling laws' contribution, 10 candidates were examined with zero refutations, suggesting this theoretical connection may be relatively unexplored. The modest search scope (20 candidates in total) means these findings reflect top semantic matches rather than exhaustive coverage of the field.

Based on the limited search of 20 semantically similar papers, the work appears to offer fresh perspectives on foundation model mechanisms, particularly the parroting baseline and fractal dimension analysis. However, the single refutable candidate for failure mode analysis suggests some overlap with existing capability studies. The taxonomy context shows this sits in an active evaluation cluster, so broader literature may contain additional relevant work not captured in the top-20 semantic matches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 1

Research Landscape Overview

Core task: zero-shot forecasting of dynamical systems using time-series foundation models. The field has rapidly organized around five main branches. Foundation Model Architectures and Pre-training Strategies explore diverse backbone designs, ranging from decoder-only transformers such as A decoder-only foundation model[1] and Lag-llama[4] to state-space models such as A Mamba Foundation Model[17], together with the large-scale pre-training regimes that enable generalization across unseen domains. Enhancement Mechanisms and Augmentation Strategies investigate techniques such as retrieval-augmented generation (Ts-rag[6]) and diffusion-based paradigms (Generative pre-trained diffusion paradigm[19]) to improve robustness and adaptability.

Evaluation, Benchmarking, and Analysis provides systematic assessments through benchmarks such as Tsfm-bench[7] and studies examining capability boundaries (Uncovering Zero-Shot Generalization Gaps[29], Are Time Series Foundation[21]). Domain-Specific Applications and Adaptations tailor foundation models to specialized contexts such as healthcare (Foundation Models for Clinical[47]), traffic (Zero-Shot Traffic Flow Prediction[46]), and climate (CarbonX[22]). Finally, Efficiency and Optimization Techniques address computational constraints and deployment challenges to ensure scalability in real-world use.

A particularly active line of work focuses on understanding what foundation models truly learn versus what they memorize, with studies such as Measuring Memorization and Generalization[42] and Implicit Reasoning in Deep[45] probing the mechanisms behind zero-shot success. A contrasting direction examines whether models genuinely capture temporal dynamics or rely on simpler heuristics, as explored in Only the curve shape[16] and Are Time-Series Foundation Models[48].
Context parroting[0] sits squarely within the Capability and Limitation Analysis cluster, investigating whether models parrot contextual patterns rather than learning robust forecasting principles. Its emphasis on dissecting failure modes aligns closely with Uncovering Zero-Shot Generalization Gaps[29], which systematically identifies distribution shifts that break generalization, and complements Measuring Memorization and Generalization[42], which quantifies the memorization-generalization trade-off. Together, these works highlight ongoing debates about the true reasoning capabilities of time-series foundation models and the conditions under which zero-shot forecasting remains reliable.

Claimed Contributions

Contribution 1: Context parroting as a simple baseline for zero-shot forecasting

The authors introduce context parroting, a naive nearest-neighbor algorithm that copies matching motifs from the context to make forecasts. This baseline outperforms leading time-series foundation models on dynamical systems while requiring only minimal computation, revealing performance gaps in current models.

Retrieved papers: 0

Contribution 2: Revealing failure modes of time-series foundation models

The authors demonstrate that context parroting surpasses state-of-the-art foundation models (Chronos, TimesFM, Time-MoE, Moirai, DynaMix) in forecasting diverse dynamical systems, exposing shared failure modes such as converging to the mean and an inability to fully utilize context data.

Retrieved papers: 10 (can refute)

Contribution 3: Linking in-context neural scaling laws to fractal dimension

The authors provide a theoretical explanation for the power-law relationship between forecast accuracy and context length observed in foundation models. They connect the scaling exponent to the fractal dimension of chaotic attractors, offering geometric insight into in-context learning.

Retrieved papers: 10
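The claimed link between context length and forecast error can be sketched with a standard nearest-neighbor scaling argument. This is our reconstruction for orientation, not necessarily the paper's derivation; the symbols $\epsilon$ (error), $T$ (context length), and $d$ (fractal dimension) are our notation.

```latex
% For T points sampled from an attractor of fractal dimension d,
% the typical distance from a query point to its nearest neighbor
% scales as
\epsilon(T) \sim T^{-1/d}.
% If parroting error is dominated by the distance to the nearest
% matching motif in the context, the forecast error inherits this
% power law in the context length T:
\log \epsilon(T) \approx -\frac{1}{d}\,\log T + \mathrm{const},
% i.e., the in-context scaling exponent is -1/d, so systems with
% higher-dimensional attractors improve more slowly as context grows.
```

Under this reading, the attractor's fractal dimension sets how quickly a longer context pays off, which is consistent with the geometric framing of the contribution.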

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Context parroting as a simple baseline for zero-shot forecasting

Contribution: Revealing failure modes of time-series foundation models

Contribution: Linking in-context neural scaling laws to fractal dimension