InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression
Overview
Claimed Contributions
The authors provide rigorous theoretical proofs that both fixed-compression tokenizers and existing adaptive tokenizers with data-agnostic routers (such as uniform sampling) are suboptimal: their expected token length exceeds the information-theoretic optimum, so neither achieves a near-optimal compression rate.
The authors introduce INFOTOK, a framework for adaptive video tokenization with two components: an Evidence Lower Bound (ELBO)-based router that sets each video's token sequence length according to its information complexity, and a transformer-based adaptive compressor that compresses embeddings into token sequences of that variable length.
The authors conduct comprehensive experiments showing that INFOTOK achieves state-of-the-art compression performance: it saves approximately 20% of tokens without performance loss and achieves a 2.3× better compression rate than prior adaptive approaches while maintaining or improving reconstruction quality.
Contribution Analysis
Detailed comparisons for each claimed contribution
Theoretical proof of suboptimality in existing tokenizers
[19] Language modeling is compression
[20] Single-pass adaptive image tokenization for minimum program search
[21] Unpacking tokenization: Evaluating text compression and its correlation with model performance
[22] Tokenization and the noiseless channel
[23] Emergent architectural dynamics of neural token compression in large language models
[24] Training LLMs over neurally compressed text
[25] MOAT: Revealing the task-optimality gap in adaptive tokenization
[26] Leveraging information theoretic tools for foundation model analysis
[27] WSDL term tokenization methods for IR-style Web services discovery
[28] HutterX – Omniscientrix Hybrid Compressor (vΩ Unified Informational-Awareness Framework Build) by Cornelius Aurelius
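The suboptimality argument rests on a standard source-coding intuition: the expected length of any lossless code is bounded below by the entropy, and only codes whose lengths track each input's probability can approach that bound. The Python sketch below is a toy analogue, not the authors' proof. Discrete outcomes stand in for videos, bits stand in for tokens, and all function names are hypothetical. It compares a fixed-length code, a Shannon code whose lengths adapt to each outcome, and a data-agnostic assignment that hands out the same lengths without looking at which outcome is likely.

```python
import math

# Toy analogue (assumption): outcomes stand in for videos, bits stand in for tokens.

def entropy_bits(probs):
    """Shannon entropy H(p) in bits: the lower bound on expected code length."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fixed_length_bits(probs):
    """A fixed-length code spends ceil(log2 N) bits on every outcome, whatever p is."""
    n = sum(1 for p in probs if p > 0)
    return math.ceil(math.log2(n))

def adaptive_bits(probs):
    """Shannon code: outcome i gets ceil(-log2 p_i) bits, so length tracks probability.
    Its expected length lies within 1 bit of the entropy."""
    return sum(p * math.ceil(-math.log2(p)) for p in probs if p > 0)

def data_agnostic_bits(probs):
    """Same set of lengths as the Shannon code, but assigned to outcomes by a
    uniformly random permutation, i.e. without looking at the data. Averaged over
    permutations, every outcome receives the mean length, so that is the expectation."""
    lengths = [math.ceil(-math.log2(p)) for p in probs if p > 0]
    return sum(lengths) / len(lengths)

# Toy, skewed distribution over four outcomes.
p = [0.5, 0.25, 0.125, 0.125]
for name, fn in [("entropy", entropy_bits), ("fixed", fixed_length_bits),
                 ("adaptive", adaptive_bits), ("data-agnostic", data_agnostic_bits)]:
    print(f"{name:>13}: {fn(p):.2f} bits")
```

For this dyadic distribution the adaptive code matches the entropy at 1.75 bits, the fixed code spends 2 bits, and the data-agnostic assignment averages 2.25 bits. Both non-adaptive schemes therefore fall short of the entropy bound, in line with the suboptimality claim.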
INFOTOK framework with ELBO-based router and adaptive compressor
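An ELBO is a lower bound on a video's log-likelihood, so comparing ELBOs across candidate lengths indicates how much modeling quality a shorter token sequence gives up. The source does not spell out the routing rule or the compressor architecture, so this Python/NumPy sketch makes two labeled assumptions. First, a hypothetical `route_length` picks the shortest candidate length whose precomputed ELBO is within `tol` nats of the best. Second, a hypothetical `compress` stands in for the transformer-based compressor with a single cross-attention step in which the first k of K learned queries attend over the input embeddings. The ELBO values are made-up toy numbers, not results.

```python
import numpy as np

def route_length(elbo_by_length, tol=1.0):
    """Hypothetical router: return the shortest candidate token length whose ELBO
    (in nats) is within `tol` of the best ELBO across all candidate lengths."""
    best = max(elbo_by_length.values())
    return min(k for k, elbo in elbo_by_length.items() if elbo >= best - tol)

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def compress(embeddings, queries, k):
    """Hypothetical compressor: the first k of K learned query vectors cross-attend
    over N input embeddings (single head, no learned projections), producing a
    (k, d) token sequence whose length is chosen by the router."""
    if not 1 <= k <= queries.shape[0]:
        raise ValueError("k must be between 1 and the number of learned queries")
    q = queries[:k]                                             # (k, d)
    scores = q @ embeddings.T / np.sqrt(embeddings.shape[-1])   # (k, N)
    return softmax(scores, axis=-1) @ embeddings                # (k, d)

# Toy, made-up ELBO estimates for one video at four candidate lengths.
elbo = {16: -120.0, 32: -101.5, 64: -100.8, 128: -100.6}
k = route_length(elbo, tol=1.0)
rng = np.random.default_rng(0)
patches = rng.standard_normal((256, 8))          # N=256 input embeddings, d=8
learned_queries = rng.standard_normal((128, 8))  # K=128 maximum tokens
tokens = compress(patches, learned_queries, k)
print(k, tokens.shape)
```

With these toy values the router returns 32, since 32 tokens come within 1 nat of the best ELBO while 16 do not, and the compressor emits a (32, 8) array. Truncating a shared bank of learned queries is one simple way to get variable-length outputs from a fixed set of parameters; a full compressor would stack transformer layers with learned projections.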
Empirical validation of superior token efficiency
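The two headline figures for this contribution use different units, so converting between them helps when reading them side by side. Assuming compression rate means input size divided by token count (the source does not define it here), a token saving s and a compression-rate gain g on the same input are related by g = 1 / (1 - s). The hypothetical helpers in this Python sketch do that arithmetic.

```python
def token_saving(baseline_tokens, new_tokens):
    """Fraction of tokens saved relative to a baseline on the same input."""
    return 1.0 - new_tokens / baseline_tokens

def compression_gain(baseline_tokens, new_tokens):
    """Ratio of compression rates, assuming rate = input size / token count,
    so that for a fixed input the gain is baseline_tokens / new_tokens."""
    return baseline_tokens / new_tokens

def saving_from_gain(gain):
    """Token saving implied by a compression-rate gain on the same input."""
    return 1.0 - 1.0 / gain

print(f"20% fewer tokens -> {compression_gain(1000, 800):.2f}x compression rate")
print(f"2.3x compression rate -> {saving_from_gain(2.3):.1%} fewer tokens")
```

Under this definition a 20% token saving corresponds to a 1.25× compression-rate gain, while a 2.3× gain corresponds to about 56.5% fewer tokens. The two figures are not interchangeable, and the source ties the 2.3× figure specifically to the comparison with prior adaptive approaches.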