FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning
Overview
Taxonomy
Research Landscape Overview
Claimed Contributions
FakeXplained, a curated dataset of 8,772 AI-generated images from diverse state-of-the-art generative models, annotated with bounding boxes and concise captions that highlight visual anomalies and illogical details. This dataset provides fine-grained, human-grounded annotations to support both visual grounding and textual reasoning for interpretable detection.
FakeXplainer, an end-to-end system that fine-tunes multimodal large language models on FakeXplained using a progressive training pipeline that integrates supervised fine-tuning and reinforcement learning. The system detects and localizes AI-generated content and provides spatially grounded, human-aligned explanations for its decisions.
FakeXplainer achieves state-of-the-art detection and localization accuracy while demonstrating strong robustness and out-of-distribution generalization. It uniquely delivers spatially grounded, human-aligned rationales that explain both where and why images appear AI-generated.
Core Task Comparisons
Comparisons with papers in the same taxonomy category
[3] ForenX: Towards Explainable AI-Generated Image Detection with Multimodal Large Language Models
[10] AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
[11] Spot the fake: Large multimodal model-based synthetic image detection with artifact explanation
[29] Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs
Contribution Analysis
Detailed comparisons for each claimed contribution
FakeXplained dataset with human-aligned grounded annotations
A curated dataset of 8,772 AI-generated images from diverse state-of-the-art generative models, annotated with bounding boxes and concise captions that highlight visual anomalies and illogical details. This dataset provides fine-grained, human-grounded annotations to support both visual grounding and textual reasoning for interpretable detection.
[7] CIFAKE: Image classification and explainable identification of AI-generated synthetic images
[27] RADAR: Reasoning AI-Generated Image Detection for Semantic Fakes
[51] WildFake: A large-scale challenging dataset for AI-generated images detection
[52] M3DSYNTH: A dataset of medical 3D images with AI-generated local manipulations
[53] Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach
[54] Exploring the naturalness of AI-generated images
[55] ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
[56] Efficient end-to-end learning for cell segmentation with machine generated weak annotations
[57] AI Art Neural Constellation: Revealing the Collective and Contrastive State of AI-Generated and Human Art
[58] GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts
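As a rough illustration of the kind of record the FakeXplained description above implies (an image paired with bounding boxes and short captions for each anomaly), the following is a minimal sketch. The class names, field names, pixel-coordinate convention, and text serialization are all assumptions made for illustration; the source does not specify the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AnomalyRegion:
    """One annotated anomaly: a box plus a short caption (hypothetical schema)."""
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
    box: Tuple[int, int, int, int]
    caption: str

    def area(self) -> int:
        # Clamp to zero so a degenerate box never yields a negative area.
        x0, y0, x1, y1 = self.box
        return max(0, x1 - x0) * max(0, y1 - y0)


@dataclass
class AnnotatedImage:
    """An AI-generated image with its grounded annotations (hypothetical schema)."""
    image_path: str
    generator: str  # name of the generative model; placeholder value below
    regions: List[AnomalyRegion] = field(default_factory=list)

    def to_target_text(self) -> str:
        # Serialize each region as "[x0, y0, x1, y1] caption", one per line.
        # This text format is an assumption, not the paper's training target.
        lines = []
        for region in self.regions:
            x0, y0, x1, y1 = region.box
            lines.append(f"[{x0}, {y0}, {x1}, {y1}] {region.caption}")
        return "\n".join(lines)


example = AnnotatedImage(
    image_path="img_0001.png",            # placeholder path
    generator="hypothetical-generator",   # placeholder model name
    regions=[AnomalyRegion((10, 20, 110, 220), "Hand has six fingers")],
)
print(example.to_target_text())
```

Pairing each box with its own caption keeps the "where" and the "why" of an anomaly in a single unit, which is the property the dataset description emphasizes.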
FakeXplainer detector with progressive training pipeline
An end-to-end system that fine-tunes multimodal large language models on FakeXplained using a progressive training pipeline that integrates supervised fine-tuning and reinforcement learning. The system detects and localizes AI-generated content and provides spatially grounded, human-aligned explanations for its decisions.
[68] A collaborative Fusion and Registration Framework for Multi-Modal Image Fusion
[69] ForgeryGPT: Multimodal large language model for explainable image forgery detection and localization
[70] DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention
[71] Progressive feedback-enhanced transformer for image forgery localization
[72] Towards dimension-enriched underwater image quality assessment
[73] Multi-Modal Prompt Learning on Blind Image Quality Assessment
[74] Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
[75] HAMLET-FFD: Hierarchical Adaptive Multi-modal Learning Embeddings Transformation for Face Forgery Detection
State-of-the-art performance with robust explainability
FakeXplainer achieves state-of-the-art detection and localization accuracy while demonstrating strong robustness and out-of-distribution generalization. It uniquely delivers spatially grounded, human-aligned rationales that explain both where and why images appear AI-generated.