CTBench: Cryptocurrency Time Series Generation Benchmark

ICLR 2026 Conference Submission, Anonymous Authors

OpenReview Score: 6.0

Time Series Generation, Crypto-centric Benchmark, Cryptocurrency Markets, Financial Evaluation Measure Suite

Abstract:

Synthetic time series are vital for data augmentation, stress testing, and prototyping in quantitative finance. Yet in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work targets non-financial or traditional financial domains, focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and lacks critical financial evaluations, particularly for trading applications. To bridge these gaps, we introduce CTBench, the first Cryptocurrency Time series generation Benchmark. It curates an open-source dataset of 452 tokens and evaluates models across 13 metrics spanning forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency. A key innovation is a dual-task evaluation framework: the Predictive Utility measures how well synthetic data preserves temporal and cross-sectional patterns for forecasting, while the Statistical Arbitrage assesses whether reconstructed series support mean-reverting signals for trading. We systematically benchmark eight state-of-the-art models from five TSG families across four market regimes, revealing trade-offs between statistical quality and real-world profitability. Notably, CTBench provides ranking analysis and practical guidance for deploying TSG models in crypto analytics and trading applications. The source code is available at https://anonymous.4open.science/r/CTBench-F5A3/.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces CTBench, a comprehensive benchmark for cryptocurrency time series generation, positioned within the Benchmarking and Evaluation Frameworks leaf of the taxonomy. This leaf currently contains only this single paper, indicating a sparse research direction with no direct sibling works. The contribution addresses a recognized gap in systematic evaluation protocols for crypto-specific synthetic data generation, distinguishing itself from the more populated Forecasting Methods and Synthetic Data Generation branches that contain 8-15 papers each focused on model development rather than standardized assessment.

The taxonomy places CTBench at the intersection of multiple active research directions. The Synthetic Data Generation branch (4 GAN-focused papers) and Forecasting Methods branch (20+ papers across RNNs, classical models, and hybrid approaches) represent neighboring work that CTBench aims to evaluate. The scope note for the Benchmarking leaf explicitly excludes 'individual models without systematic multi-metric benchmarking,' positioning this work as complementary infrastructure rather than competing methodology. The dual-task framework bridges evaluation gaps between generation quality and downstream forecasting utility, connecting to both the generation and forecasting branches.

Across the three contributions, 20 candidates were examined and no clearly refuting prior work emerged. For the CTBench benchmark itself, 6 candidates were examined with 0 refutable matches, suggesting novelty in providing crypto-specific evaluation infrastructure. For the dual-task framework, 6 candidates were likewise examined with 0 refutable matches, indicating that the pairing of predictive utility with statistical arbitrage appears distinctive. For the comprehensive financial metric suite, 8 candidates were examined with no refutations, though this may reflect the limited search scope rather than absolute novelty of the individual metrics. The absence of sibling papers in the same taxonomy leaf reinforces that systematic crypto generation benchmarking remains underexplored.

Based on top-20 semantic matches, the work appears to occupy genuinely sparse territory within cryptocurrency time series research. The taxonomy structure shows active development in forecasting architectures and GAN-based generation, but minimal prior effort toward standardized evaluation frameworks combining financial metrics with crypto-specific characteristics. The analysis covers methodological positioning but cannot assess whether individual evaluation metrics or dataset curation practices overlap with broader financial benchmarking literature outside the examined candidate set.

Taxonomy

[Taxonomy figure omitted. Legend: core-task taxonomy papers; claimed contributions; contribution candidate papers compared; refutable paper.]

Research Landscape Overview

Core task: cryptocurrency time series generation and evaluation. The field divides into several complementary branches that together address the challenge of modeling and predicting highly volatile digital asset markets. Forecasting Methods and Architectures encompasses a diverse range of techniques from classical statistical models like ARIMA (Crypto Fear Greed ARIMA[3], Cryptocurrency ARIMA[13]) to modern deep learning approaches including LSTMs (LSTM Cryptocurrency Prices[14], Bitcoin LSTM Python[24]) and attention-based architectures (Temporal Fusion Transformer[1]). Synthetic Data Generation explores generative models such as GANs (GANs Synthetic Tabular[8], Wasserstein GAN Bitcoin[15]) and VAEs (VAE Quantile Modeling[10]) to create realistic cryptocurrency data for training and testing. Market Dynamics and Volatility Modeling focuses on capturing the unique behavioral patterns of crypto markets, while Application-Oriented Studies apply these methods to trading strategies and portfolio construction (Synthetic Portfolio Construction[16], OAC Quantitative Trading[17]). Finally, Benchmarking and Evaluation Frameworks provide systematic ways to assess model quality and generalization.

Recent work reveals tension between model complexity and interpretability, with hybrid approaches (Bitcoin Hybrid Forecasting[5]) attempting to balance statistical rigor and predictive power. The synthetic generation branch has grown substantially, driven by privacy concerns and data scarcity, with studies like GANs Financial Data[28] and TSTR Financial Fraud[26] exploring train-on-synthetic-test-on-real paradigms.

CTBench[0] sits squarely within the Benchmarking and Evaluation Frameworks branch, addressing a critical gap by providing standardized metrics and protocols for comparing both forecasting models and synthetic generators. Unlike application-focused works such as Genetic Algorithm Features[2] or sentiment-driven approaches (Crypto Fear Greed[4]), CTBench[0] emphasizes rigorous evaluation methodology, offering the community a common ground for assessing whether generated time series preserve essential statistical properties and whether forecasting models generalize across different cryptocurrency datasets.

Claimed Contributions

CTBench: Cryptocurrency Time Series Generation Benchmark

6 retrieved papers

The authors introduce CTBench, the first benchmark specifically designed for evaluating time series generation methods in cryptocurrency markets. It provides an open-source dataset of 452 tokens and evaluates models across 13 metrics spanning forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency.

Dual-task evaluation framework

6 retrieved papers

The authors propose a dual-task evaluation framework that assesses both predictive realism and tradable structure. The Predictive Utility task measures whether synthetic data preserves forecasting signals, while the Statistical Arbitrage task evaluates whether reconstructed series enable market-neutral trading strategies.
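To make the Statistical Arbitrage idea more concrete, here is a minimal sketch of one way a mean-reverting signal could be derived from a reconstructed series. It is not the authors' procedure: the function name `zscore_signal`, the choice of the residual (original minus reconstructed) as the spread, the rolling window, and the entry threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_signal(original, reconstructed, window=5, entry=1.0):
    """Return one position per time step: +1 long, -1 short, 0 flat.

    Hypothetical rule (an assumption, not taken from the paper): treat the
    residual between the original and reconstructed series as a
    mean-reverting spread and trade against large deviations from its
    rolling mean.
    """
    residual = [o - r for o, r in zip(original, reconstructed)]
    positions = []
    for t in range(len(residual)):
        if t + 1 < window:
            positions.append(0)  # not enough history for a rolling z-score
            continue
        hist = residual[t + 1 - window : t + 1]
        sd = pstdev(hist)
        if sd == 0:
            positions.append(0)  # flat spread carries no signal
            continue
        z = (residual[t] - mean(hist)) / sd
        if z > entry:
            positions.append(-1)  # spread unusually high: bet on a fall
        elif z < -entry:
            positions.append(1)  # spread unusually low: bet on a rise
        else:
            positions.append(0)
    return positions


if __name__ == "__main__":
    # Toy data: the original jumps above its reconstruction at the last step.
    print(zscore_signal([10, 10, 10, 10, 13], [10, 10, 10, 10, 10]))
```

Under this reading, a generator whose reconstructions track the slow-moving component of prices leaves a residual that oscillates around zero, which is what a mean-reversion rule needs. A real evaluation would also have to account for transaction costs and for hedging across tokens to stay market-neutral, both of which this sketch ignores.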

Comprehensive financial metric suite for crypto-specific evaluation

8 retrieved papers

The authors develop a comprehensive suite of 13 financial metrics grouped into six categories (error-based, rank-based, trading performance, risk assessment, efficiency, and visualization) specifically designed to evaluate TSG models in cryptocurrency contexts, addressing limitations of existing benchmarks that focus on traditional markets.
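The report does not list the 13 individual metrics, so the following is only a sketch of one textbook metric from each of four of the named categories: mean squared error (error-based), Spearman rank correlation (rank-based), a per-period Sharpe ratio (trading performance), and maximum drawdown (risk assessment). The function names and the specific metric choices are assumptions for illustration, and the rank helper assumes no tied values.

```python
from math import sqrt
from statistics import mean, pstdev


def mse(pred, true):
    """Error-based: mean squared error between forecasts and targets."""
    return mean([(p - t) ** 2 for p, t in zip(pred, true)])


def ranks(values):
    """Rank positions starting at 0 (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank
    return out


def spearman(pred, true):
    """Rank-based: Pearson correlation computed on the ranks."""
    rp, rt = ranks(pred), ranks(true)
    mp, mt = mean(rp), mean(rt)
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_t = sum((b - mt) ** 2 for b in rt)
    return cov / sqrt(var_p * var_t)


def sharpe(returns):
    """Trading performance: mean return over its standard deviation
    (per period, no annualization, zero risk-free rate assumed)."""
    sd = pstdev(returns)
    return mean(returns) / sd if sd > 0 else 0.0


def max_drawdown(returns):
    """Risk: largest peak-to-trough loss of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


if __name__ == "__main__":
    print(mse([1, 2, 3], [1, 2, 5]))
    print(spearman([0.1, 0.3, 0.2], [1, 3, 2]))
    print(max_drawdown([0.1, -0.5, 0.2]))
```

Grouping metrics this way illustrates the trade-off the abstract points to: a model can score well on error-based measures while producing returns with a poor Sharpe ratio or a deep drawdown, so no single category is sufficient on its own.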

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current TopK core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is one partial signal of novelty, but still constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution