Abstract:

Deepfake detection remains a formidable challenge due to the evolving nature of fake content in real-world scenarios. Existing benchmarks, however, diverge sharply from industrial practice: they typically feature homogeneous training sources and low-quality testing images, which hinders the practical deployment of current detectors. To close this gap, we introduce HydraFake, a dataset that contains diversified deepfake techniques and in-the-wild forgeries, along with a rigorous training and evaluation protocol covering unseen model architectures, emerging forgery techniques, and novel data domains. Building on this resource, we propose Veritas, a deepfake detector based on a multi-modal large language model (MLLM). Unlike vanilla chain-of-thought (CoT) prompting, we introduce pattern-aware reasoning that incorporates critical patterns such as "planning" and "self-reflection" to emulate the human forensic process. We further propose a two-stage training pipeline to seamlessly internalize these deepfake-reasoning capabilities into current MLLMs. Experiments on the HydraFake dataset reveal that although previous detectors generalize well in cross-model scenarios, they fall short on unseen forgeries and data domains. Our Veritas achieves significant gains across different out-of-domain (OOD) scenarios and delivers transparent, faithful detection outputs.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND ITS JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 43
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 0

Research Landscape Overview

Core task: generalizable deepfake detection via pattern-aware reasoning. The field has evolved into a rich landscape of complementary strategies, each addressing a different facet of the generalization challenge. At the highest level, the taxonomy reveals several major branches:

- Frequency and spectral domain analysis (e.g., Frequency-aware Detection[1], Synthetic Frequency Patterns[3]) exploits artifacts in the frequency spectrum that persist across generation methods.
- Temporal and spatiotemporal reasoning captures inconsistencies over time.
- Feature disentanglement and decomposition (e.g., Texture Artifact Decomposition[8]) separates content from manipulation traces.
- Large pre-trained model adaptation leverages foundation models such as CLIP or large language models.
- Data augmentation and synthetic training strategies create diverse training signals.
- Domain adaptation and generalization strategies (e.g., Invariant Risk Minimization[25]) explicitly optimize for cross-domain robustness.
- Meta-learning and few-shot detection (e.g., Meta-Learning Relation Embedding[22]) enable rapid adaptation to novel forgeries.
- Local and patch-level analysis (e.g., Patch-Discontinuity Mining[34]) focuses on fine-grained spatial cues.
- Identity and semantic consistency analysis checks for logical coherence.
- Noise pattern and forensic trace analysis mines low-level statistical signatures.
- Attention mechanisms and architectural innovations introduce novel inductive biases.
- Baseline and specialized architectures provide reference points.
- Surveys (e.g., Robust Detection Survey[6]) synthesize the state of the art.

A particularly active line of work centers on adapting large pre-trained models to deepfake detection, where methods such as C2P-CLIP[5] and DeepFake-Adapter[23] fine-tune vision-language or vision-only backbones to capture generalizable forgery patterns.
Within this branch, a small but growing cluster explores multimodal large language model reasoning, combining visual and textual modalities to perform more interpretable, context-aware detection. Veritas[0] sits squarely in this cluster, alongside Skyra[15] and EDVD-LLaMA[32], all of which harness the reasoning capabilities of large language models to identify subtle inconsistencies that simpler architectures might miss. Compared to Skyra[15], which emphasizes cross-modal alignment, and EDVD-LLaMA[32], which integrates video-level temporal cues, Veritas[0] focuses on pattern-aware reasoning that bridges low-level forensic traces with high-level semantic understanding. This direction reflects a broader trend toward interpretable, reasoning-driven detection, contrasting with purely data-driven approaches in frequency analysis or meta-learning branches, and highlights ongoing questions about how best to combine domain-specific inductive biases with the flexibility of foundation models.

Claimed Contributions

HydraFake dataset with hierarchical evaluation protocol

The authors construct a new deepfake detection dataset featuring diverse forgery techniques and in-the-wild samples. They establish a hierarchical evaluation protocol with four testing levels (in-domain, cross-model, cross-forgery, cross-domain) to simulate real-world challenges and comprehensively measure detector generalization.

7 retrieved papers
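The four-level protocol above can be pictured as a small evaluation harness that scores a detector separately per level. This is a minimal illustrative sketch, not HydraFake's actual tooling: the detector interface, the data layout, and the choice of a 0.5-threshold accuracy metric (real protocols often report AUC) are all assumptions.

```python
from statistics import mean

# Hypothetical four-level harness mirroring the protocol described above:
# each level groups test sets by how far they drift from training data.
LEVELS = ["in-domain", "cross-model", "cross-forgery", "cross-domain"]

def evaluate_by_level(detector, test_sets):
    """detector: callable mapping an image to a fake-probability in [0, 1].
    test_sets: list of dicts with 'level', 'image', and 'is_fake' keys.
    Returns per-level accuracy for the levels that have samples."""
    results = {}
    for level in LEVELS:
        subset = [s for s in test_sets if s["level"] == level]
        if not subset:
            continue
        # Accuracy at a 0.5 threshold; thresholds/metrics are assumptions.
        correct = [(detector(s["image"]) >= 0.5) == s["is_fake"]
                   for s in subset]
        results[level] = mean(correct)
    return results
```

The point of the per-level split is that a single aggregate score would hide exactly the failure mode the report highlights: strong cross-model accuracy masking weak cross-forgery and cross-domain accuracy.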
Pattern-aware reasoning framework for deepfake detection

The authors propose a reasoning framework that incorporates five thinking patterns (fast judgement, planning, reasoning, self-reflection, conclusion) inspired by human forensic analysis. This pattern-aware approach enables logical and holistic reasoning for deepfake detection, outperforming vanilla chain-of-thought methods.

1 retrieved paper
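One way to make the five-pattern structure concrete is to validate that a model's response actually contains all five patterns in order. The sketch below is a guess at such a checker: the tag-based serialization and tag names are hypothetical, since the report does not specify how Veritas marks up its reasoning traces.

```python
import re

# The five thinking patterns named above, in the order a response is
# expected to traverse them. Tag names are illustrative only.
PATTERNS = ["fast_judgement", "planning", "reasoning",
            "self_reflection", "conclusion"]

def check_pattern_order(response: str) -> bool:
    """Return True iff every pattern tag appears and in the given order."""
    positions = []
    for tag in PATTERNS:
        m = re.search(rf"<{tag}>.*?</{tag}>", response, re.DOTALL)
        if m is None:
            return False          # a pattern is missing entirely
        positions.append(m.start())
    # All patterns present; verify they occur in the prescribed order.
    return positions == sorted(positions)
```

A check like this is useful both for filtering cold-start training data and as a structural reward signal during reinforcement learning.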
Two-stage training pipeline with MiPO and P-GRPO

The authors develop a two-stage training pipeline: a pattern-guided cold start (with SFT and Mixed Preference Optimization, MiPO) followed by Pattern-aware Group Relative Policy Optimization (P-GRPO). This pipeline internalizes reasoning abilities into MLLMs, enabling adaptive planning and self-reflection while delivering transparent and faithful detection outputs.

10 retrieved papers
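The second stage builds on GRPO, whose core idea is to normalize each sampled response's reward against its own group rather than a learned value baseline. The sketch below shows that group-relative advantage plus an illustrative "pattern-aware" composite reward; the reward weights and decomposition are assumptions, not the paper's actual P-GRPO reward.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """GRPO-style advantage: standardize each reward within its group
    of responses sampled for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # no signal if all rewards tie
    return [(r - mu) / sigma for r in rewards]

def pattern_aware_reward(correct, has_all_patterns,
                         w_acc=1.0, w_fmt=0.5):
    """Illustrative composite reward: task correctness plus a bonus for
    emitting the full reasoning-pattern structure. Weights are made up."""
    return w_acc * float(correct) + w_fmt * float(has_all_patterns)
```

Under this scheme, a response that is both correct and well-structured outscores a correct but unstructured one within the same group, pushing the policy toward the planning and self-reflection behavior the cold-start stage seeded.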

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

HydraFake dataset with hierarchical evaluation protocol

The authors construct a new deepfake detection dataset featuring diverse forgery techniques and in-the-wild samples. They establish a hierarchical evaluation protocol with four testing levels (in-domain, cross-model, cross-forgery, cross-domain) to simulate real-world challenges and comprehensively measure detector generalization.

Contribution

Pattern-aware reasoning framework for deepfake detection

The authors propose a reasoning framework that incorporates five thinking patterns (fast judgement, planning, reasoning, self-reflection, conclusion) inspired by human forensic analysis. This pattern-aware approach enables logical and holistic reasoning for deepfake detection, outperforming vanilla chain-of-thought methods.

Contribution

Two-stage training pipeline with MiPO and P-GRPO

The authors develop a two-stage training pipeline: a pattern-guided cold start (with SFT and Mixed Preference Optimization, MiPO) followed by Pattern-aware Group Relative Policy Optimization (P-GRPO). This pipeline internalizes reasoning abilities into MLLMs, enabling adaptive planning and self-reflection while delivering transparent and faithful detection outputs.

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning | Novelty Validation