BWCache: Accelerating Video Diffusion Transformers through Block-Wise Caching

ICLR 2026 Conference Submission. Anonymous Authors.
Keywords: Diffusion Model, Video Generation, Cache
Abstract:

Recent advancements in Diffusion Transformers (DiTs) have established them as the state-of-the-art method for video generation. However, their inherently sequential denoising process results in inevitable latency, limiting real-world applicability. Existing acceleration methods either compromise visual quality due to architectural modifications or fail to reuse intermediate features at proper granularity. Our analysis reveals that DiT blocks are the primary contributors to inference latency. Across diffusion timesteps, the feature variations of DiT blocks exhibit a U-shaped pattern with high similarity during intermediate timesteps, which suggests substantial computational redundancy. In this paper, we propose Block-Wise Caching (BWCache), a training-free method to accelerate DiT-based video generation. BWCache dynamically caches and reuses features from DiT blocks across diffusion timesteps. Furthermore, we introduce a similarity indicator that triggers feature reuse only when the differences between block features at adjacent timesteps fall below a threshold, thereby minimizing redundant computations while maintaining visual fidelity. Extensive experiments on several video diffusion models demonstrate that BWCache achieves up to 2.24× speedup with comparable visual quality.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes BWCache, a training-free method that caches and reuses features from entire DiT blocks across diffusion timesteps to accelerate video generation. It resides in the Block and Layer-Level Caching leaf, which contains four papers including the original work. This leaf sits within the broader Feature Caching Granularity and Reuse Strategies branch, indicating a moderately populated research direction focused on structural unit caching. The taxonomy shows this is one of four granularity approaches, suggesting the field has diversified into multiple caching strategies rather than converging on a single dominant paradigm.

The taxonomy reveals neighboring leaves exploring alternative granularities: Token-Level Caching (three papers) focuses on selective token reuse, while Hybrid and Multi-Granularity Caching (three papers) combines multiple levels. The Temporal Scheduling and Adaptive Caching branch (eleven papers across four leaves) addresses complementary questions of when to cache, with Similarity-Driven Adaptive Caching being particularly relevant. BWCache's block-level approach contrasts with token-wise methods that offer finer control but higher overhead, and differs from hybrid frameworks that blend multiple granularities. The taxonomy's scope and exclude notes clarify that BWCache's structural unit focus distinguishes it from temporal scheduling or memory optimization directions.

Among the thirty candidates examined, the analysis found nine refutable pairs across the three claimed contributions. For the core BWCache method, ten candidates were examined and two appear to refute it; for the similarity indicator, ten were examined with three refutable matches; and for the U-shaped variation analysis, ten were examined with four refutable candidates. These statistics suggest that, within the limited search scope, each contribution faces some degree of prior overlap, with the feature-dynamics analysis encountering the most substantial prior work. The block-wise caching concept and similarity-based triggering both show moderate overlap among the examined candidates, though the search scale limits definitive conclusions about field-wide novelty.

Based on the top-thirty semantic matches examined, the work appears to build on established block-level caching concepts with incremental refinements in similarity-based triggering and feature dynamics analysis. The taxonomy structure indicates this is an active but not overcrowded research area, with the original paper positioned among three siblings in its leaf. The analysis does not cover exhaustive citation networks or recent preprints, so additional related work may exist beyond the examined candidates.

Taxonomy

Core-task Taxonomy Papers: 33
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 9

Research Landscape Overview

Core task: accelerating video diffusion transformers through block-wise caching.

The field of accelerating video diffusion transformers has rapidly diversified into several complementary directions. Feature Caching Granularity and Reuse Strategies explores how to cache and reuse intermediate computations at different levels, ranging from token-wise approaches like Token-Wise Feature Caching[1] and Token Caching[3] to block- and layer-level methods such as BWCache[0] and Blockdance[11]. Temporal Scheduling and Adaptive Caching focuses on dynamically deciding when and what to cache across diffusion timesteps, with works like Adaptive Caching[6] and Runtime-Adaptive Caching[7] learning or heuristically adjusting cache policies. Memory and Storage Optimization tackles the overhead of storing cached features through quantization (Quantcache[15]) and compression techniques (MagCache[16], Ca2-VDM[17]). Specialized Architectures and Conditioning investigates architectural modifications and conditioning mechanisms that inherently reduce computation, while Distributed and Parallel Inference addresses multi-device scenarios. Finally, Training-Free Acceleration Frameworks encompasses holistic systems that combine multiple strategies without requiring model retraining, exemplified by approaches like FORA[23] and Unicp[24].

Within Feature Caching Granularity and Reuse Strategies, a central tension emerges between fine-grained token-level caching, which offers flexibility but may incur higher bookkeeping costs, and coarser block- or layer-level caching that simplifies implementation at the potential expense of adaptability. BWCache[0] sits squarely in the Block and Layer-Level Caching cluster alongside Learning-to-Cache[8] and CorGi[29], emphasizing structured reuse of entire transformer blocks across timesteps. This contrasts with token-centric methods like Token Caching[3] and Dual Feature Caching[2], which selectively cache individual tokens based on redundancy metrics. Meanwhile, hybrid strategies such as Blockdance[11] blend block-level decisions with finer control, illustrating ongoing exploration of the granularity sweet spot. The interplay between caching granularity, memory footprint, and quality preservation remains an active research question, with BWCache[0] contributing a block-wise perspective that balances efficiency gains against the need for temporal coherence in video generation.

Claimed Contributions

Block-Wise Caching (BWCache) method for accelerating DiT-based video generation

The authors introduce BWCache, a training-free acceleration method that dynamically caches and reuses features from DiT blocks across diffusion timesteps. This method can be seamlessly integrated into most DiT-based models as a plug-and-play component during inference.

Retrieved papers: 10 · Verdict: Can Refute
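The caching mechanism this contribution describes can be sketched as a thin wrapper around each block: store the block's output at one timestep and return it unchanged at a later timestep when reuse is triggered. The sketch below is illustrative only; the class name, the `block_fn` interface, and the `reuse` flag are assumptions, not the paper's implementation.

```python
import numpy as np

class CachedBlock:
    """Wraps a single transformer-block function with a one-step feature cache."""

    def __init__(self, block_fn):
        self.block_fn = block_fn  # the underlying DiT block (here: any array -> array function)
        self.cached = None        # output stored from the previous timestep

    def __call__(self, x, reuse=False):
        if reuse and self.cached is not None:
            return self.cached            # skip the block; reuse last timestep's features
        self.cached = self.block_fn(x)    # otherwise recompute and refresh the cache
        return self.cached
```

At inference time, a per-timestep decision (driven by feature similarity, per the paper) tells each wrapped block whether to recompute or reuse, which is what makes the method plug-and-play.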
Similarity indicator for triggering feature reuse

The authors propose a similarity indicator based on the relative L1 distance between block features at adjacent timesteps. This indicator determines when to reuse cached features versus recomputing them, balancing computational efficiency with visual quality.

Retrieved papers: 10 · Verdict: Can Refute
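The indicator admits a compact sketch: compute the relative L1 distance between a block's features at adjacent timesteps and reuse only when it falls below a threshold. The exact normalization, the 0.05 default, and the function name below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def should_reuse(feat_prev, feat_curr, threshold=0.05):
    """Trigger feature reuse when the relative L1 distance between
    block features at adjacent timesteps falls below `threshold`."""
    rel_l1 = np.abs(feat_curr - feat_prev).mean() / (np.abs(feat_prev).mean() + 1e-8)
    return bool(rel_l1 < threshold)
```

A larger threshold trades visual fidelity for speed, since more timesteps qualify for reuse.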
Analysis of DiT block feature dynamics revealing U-shaped variation pattern

The authors analyze DiT block feature variations across diffusion timesteps, discovering a U-shaped pattern where intermediate timesteps exhibit high similarity and substantial computational redundancy. This analysis motivates the block-wise caching approach.

Retrieved papers: 10 · Verdict: Can Refute
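The analysis behind this contribution can be reproduced in miniature: collect one feature tensor per timestep and measure the relative change between consecutive steps. The helper below is a hypothetical sketch under that assumption; the paper's exact metric may differ.

```python
import numpy as np

def variation_curve(features):
    """Relative L1 change between consecutive per-timestep block outputs.

    A U-shaped curve (large at the first and last steps, small in between)
    indicates that intermediate timesteps are redundant and safe to cache.
    """
    return [
        np.abs(curr - prev).mean() / (np.abs(prev).mean() + 1e-8)
        for prev, curr in zip(features[:-1], features[1:])
    ]
```

The low middle of such a curve is exactly the window where block-wise reuse saves computation with little loss of fidelity.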

