Abstract:

Diffusion models have achieved remarkable success in content generation but suffer from prohibitive computational costs due to iterative sampling. While recent feature caching methods accelerate inference through temporal extrapolation, they still suffer severe quality loss because they fail to model the complex dynamics of feature evolution. To address this problem, this paper presents HiCache (Hermite Polynomial-based Feature Cache), a training-free acceleration framework that fundamentally improves feature prediction by aligning the mathematical tools with the empirical properties of the features. Our key insight is that feature-derivative approximations in Diffusion Transformers exhibit multivariate Gaussian characteristics, motivating the use of Hermite polynomials, a potentially theoretically optimal basis for Gaussian-correlated processes. In addition, we introduce a dual-scaling mechanism that ensures numerical stability while preserving predictive accuracy; the mechanism is also effective when applied standalone to TaylorSeer. Extensive experiments demonstrate HiCache's superiority: it achieves a $5.55\times$ speedup on FLUX.1-dev while exceeding baseline quality, and it maintains strong performance across text-to-image, video generation, and super-resolution tasks. Moreover, HiCache can be naturally combined with previous caching methods to enhance their performance, e.g., improving ClusCa from $0.9480$ to $0.9840$ in image reward. Our code is included in the supplementary material and will be released on GitHub.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces HiCache, a training-free acceleration framework that uses Hermite polynomials to predict feature evolution in diffusion models, alongside a dual-scaling mechanism for numerical stability. It resides in the Taylor Expansion-Based Prediction leaf, which contains four papers including the original work. This leaf sits within the Feature Prediction and Forecasting branch, representing a moderately populated research direction focused on extrapolating future features rather than directly reusing cached ones. The taxonomy shows this is one of two main forecasting approaches, with the sibling leaf covering speculative and alternative forecasting methods.

The Taylor Expansion-Based Prediction leaf neighbors the Speculative and Alternative Forecasting leaf within the same parent branch, indicating two distinct mathematical approaches to feature prediction. Beyond this branch, the taxonomy reveals seven other major directions including Feature Caching Mechanisms (with temporal, spatial, and hierarchical variants), Hybrid and Optimization-Based Caching, and Domain-Specific Applications. HiCache's focus on Hermite polynomials for Gaussian-correlated processes positions it as a refinement within the Taylor expansion paradigm, diverging from pure caching methods like DeepCache and from optimization-driven approaches that dynamically adjust caching intervals.

Among the sixteen candidates examined across the three contributions, none was found to clearly refute the work. Four candidates were examined for the Hermite polynomial framework, two for the dual-scaling mechanism, and ten for the plug-and-play upgrade, with zero refutations in each case. Within this limited scope, the top sixteen semantically similar papers, no prior work appears to provide an overlapping contribution. The dual-scaling mechanism had the smallest examination pool and the plug-and-play upgrade received the broadest scrutiny, yet all three contributions appear novel relative to the examined candidates.

Based on the limited search of sixteen candidates, the work appears to introduce distinct technical elements within an established research direction. The taxonomy context shows Taylor expansion methods represent a recognized but not overcrowded approach, with four papers total in this leaf. However, the analysis does not cover the full literature landscape, and the absence of refutations reflects only the top-sixteen semantic matches rather than an exhaustive review of all related work in diffusion model acceleration.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 16
Refutable Papers: 0

Research Landscape Overview

Core task: accelerating diffusion model inference through feature caching and prediction. The field has evolved into a rich taxonomy with eight major branches, each addressing a distinct aspect of this acceleration challenge. Feature Caching Mechanisms explore direct reuse strategies, from simple interval-based schemes like DeepCache[3] to more sophisticated adaptive approaches such as FreqCA[5] and Block Caching[6]. Feature Prediction and Forecasting methods, including Taylor expansion techniques exemplified by TaylorSeers[28] and Confidence-Gated Taylor[18], attempt to extrapolate future features rather than merely storing past ones. Hybrid and Optimization-Based Caching combines multiple strategies, balancing caching intervals against error minimization as seen in Error-Optimized Cache[4] and Gradient-Optimized Cache[9]. Domain-Specific Caching Applications tailor these ideas to particular tasks like video generation or image editing, while Architectural and Structural Enhancements modify network designs to enable better reuse. Deployment and System-Level Optimization addresses practical concerns such as quantization and memory efficiency, and a small cluster of survey works provides comparative analysis across methods.

A particularly active tension exists between pure caching approaches, which store and reuse intermediate features, and predictive methods, which forecast them using mathematical models. HiCache[0] sits squarely within the Taylor Expansion-Based Prediction branch, sharing conceptual ground with TaylorSeers[28] and Confidence-Gated Taylor[18] by leveraging derivative information to anticipate feature evolution across timesteps. Compared to simpler caching schemes like DeepCache[3], which periodically reuses stored features, HiCache[0] emphasizes analytical prediction to reduce computational overhead while maintaining generation quality.

This predictive stance contrasts with hybrid methods that blend caching and optimization, such as Error-Optimized Cache[4], which dynamically adjusts caching intervals based on accumulated error. The broader landscape reveals ongoing exploration of the trade-offs between prediction accuracy, computational savings, and implementation complexity, with many studies seeking the sweet spot where minimal computation yields maximal speedup without sacrificing output fidelity.

Claimed Contributions

Hermite polynomial-based feature caching framework (HiCache)

The authors propose HiCache, a training-free acceleration method for diffusion models that replaces the standard Taylor polynomial basis with scaled Hermite polynomials for feature prediction. This choice is motivated by the empirical observation that feature derivatives in Diffusion Transformers exhibit multivariate Gaussian characteristics, making Hermite polynomials theoretically optimal for modeling such processes.
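The paper's exact formulation is not reproduced in this report. The sketch below is a minimal illustration of the general idea, under our own assumptions: function and parameter names (`predict_feature`, `scale`) are hypothetical, and the particular combination of finite differences with a probabilists' Hermite basis is an assumption about the shape of the method, not the authors' implementation.

```python
import numpy as np
from math import factorial

def hermite_basis(x, order):
    """Probabilists' Hermite polynomials He_0..He_order via the
    recurrence He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)."""
    H = [np.ones_like(x), x]
    for k in range(1, order):
        H.append(x * H[k] - k * H[k - 1])
    return H[: order + 1]

def predict_feature(cached, step_offset, scale=0.5, order=2):
    """Extrapolate a feature tensor from its cached trajectory.

    cached      : list of feature arrays at the last (order+1) fully
                  computed timesteps, most recent last.
    step_offset : how many steps ahead to predict.
    scale       : hypothetical contraction factor keeping the basis
                  argument in a well-behaved range.
    """
    assert len(cached) >= order + 1, "need order+1 cached states"
    # k-th forward finite differences approximate k-th derivatives
    # of the feature with respect to the timestep.
    diffs = [cached[-1]]
    level = list(cached)
    for _ in range(order):
        level = [b - a for a, b in zip(level, level[1:])]
        diffs.append(level[-1])
    # Evaluate the scaled Hermite basis at the prediction offset and
    # combine it with Taylor-style (1/k!) coefficients.
    t = np.asarray(float(step_offset) * scale)
    basis = hermite_basis(t, order)
    return sum((d / factorial(k)) * basis[k] for k, d in enumerate(diffs))
```

With `scale=1.0` and `order=1` this reduces to ordinary linear extrapolation; higher orders bring in the Hermite terms that replace the plain monomials of a Taylor predictor.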

4 retrieved papers
Dual-scaling mechanism for numerical stability

The authors introduce a dual-scaling mechanism, governed by a single hyperparameter, that provides both input contraction and coefficient suppression. It stabilizes Hermite polynomial predictions by constraining inputs to the stable oscillatory regime and suppressing the exponential growth of high-order terms, and it can also enhance existing Taylor-based methods on its own.
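To make the stated idea concrete, the sketch below damps the k-th term twice with one hyperparameter: the evaluation point is contracted by `alpha` (input contraction) and the k-th coefficient is multiplied by `alpha**k` (coefficient suppression). The name `alpha` and this exact scaling law are our assumptions for illustration, not the paper's formulation.

```python
from math import factorial

def dual_scaled_terms(derivs, step_offset, alpha=0.5):
    """Return the per-order terms of a polynomial predictor after
    dual scaling: one hyperparameter alpha both contracts the input
    and suppresses high-order coefficients."""
    x = alpha * step_offset                        # input contraction
    terms = []
    for k, d in enumerate(derivs):
        coeff = (alpha ** k) * d / factorial(k)    # coefficient suppression
        terms.append(coeff * x ** k)
    return terms
```

For `alpha < 1` the term magnitudes shrink geometrically with the order, which is the stabilizing effect the contribution describes: high-order terms can no longer blow up numerically.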

2 retrieved papers
Plug-and-play upgrade for cache-then-forecast methods

HiCache serves as a drop-in replacement for Taylor-based predictors in existing cache-then-forecast frameworks: it substitutes only the polynomial basis while preserving the same predictor form and computational structure. This allows it to enhance existing methods such as ClusCa with negligible overhead.
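The drop-in claim can be pictured as a predictor in which the basis function is the only swappable part, while the cached finite differences and the combination rule stay fixed. This is a hypothetical interface written for illustration, not the authors' code:

```python
from math import factorial

def taylor_basis(x, order):
    """Plain Taylor (monomial) basis 1, x, x^2, ..."""
    return [x ** k for k in range(order + 1)]

def hermite_basis(x, order):
    """Probabilists' Hermite basis He_0, He_1, ... via the
    recurrence He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)."""
    H = [1.0, x]
    for k in range(1, order):
        H.append(x * H[k] - k * H[k - 1])
    return H[: order + 1]

def forecast(diffs, step_offset, basis=taylor_basis):
    """Cache-then-forecast predictor: `diffs` holds the cached k-th
    finite differences; only the `basis` argument changes between a
    Taylor-style and a Hermite-style predictor."""
    B = basis(float(step_offset), len(diffs) - 1)
    return sum(d / factorial(k) * b for k, (d, b) in enumerate(zip(diffs, B)))
```

Upgrading an existing Taylor-based pipeline then amounts to passing `basis=hermite_basis` instead of the default, which is consistent with the claim that the overhead of the upgrade is negligible.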

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Hermite polynomial-based feature caching framework (HiCache)

The authors propose HiCache, a training-free acceleration method for diffusion models that replaces the standard Taylor polynomial basis with scaled Hermite polynomials for feature prediction. This choice is motivated by the empirical observation that feature derivatives in Diffusion Transformers exhibit multivariate Gaussian characteristics, making Hermite polynomials theoretically optimal for modeling such processes.

Contribution

Dual-scaling mechanism for numerical stability

The authors introduce a dual-scaling mechanism, governed by a single hyperparameter, that provides both input contraction and coefficient suppression. It stabilizes Hermite polynomial predictions by constraining inputs to the stable oscillatory regime and suppressing the exponential growth of high-order terms, and it can also enhance existing Taylor-based methods on its own.

Contribution

Plug-and-play upgrade for cache-then-forecast methods

HiCache serves as a drop-in replacement for Taylor-based predictors in existing cache-then-forecast frameworks: it substitutes only the polynomial basis while preserving the same predictor form and computational structure. This allows it to enhance existing methods such as ClusCa with negligible overhead.