FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Vision Language Models, Image Forensics, AIGC Detection
Abstract:

The rapid rise of image generation calls for detection methods that are both interpretable and reliable. Existing approaches, though accurate, act as black boxes and fail to generalize to out-of-distribution data, while multi-modal large language models (MLLMs) provide reasoning ability but often hallucinate. To address these issues, we construct FakeXplained, a dataset of AI-generated images annotated with bounding boxes and descriptive captions that highlight synthesis artifacts, forming the basis for human-aligned, visually grounded reasoning. Leveraging FakeXplained, we develop FakeXplainer, which fine-tunes MLLMs with a progressive training pipeline, enabling accurate detection, artifact localization, and coherent textual explanations. Extensive experiments show that FakeXplainer not only sets a new state of the art in detection and localization (98.2% accuracy, 36.0% IoU) but also demonstrates strong robustness and out-of-distribution generalization, uniquely delivering spatially grounded, human-aligned rationales.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 1

Research Landscape Overview

Core task: AI-generated image detection with interpretable explanations. The field has evolved from purely classification-driven approaches toward systems that not only identify synthetic content but also provide human-understandable rationales for their decisions. The taxonomy reflects this dual emphasis through eight major branches. Deep Learning Classification Approaches and Forensic and Signal-Based Detection Methods focus on discriminative accuracy using CNNs, vision transformers, and frequency-domain analysis. Explainable AI Techniques for Detection Transparency leverage saliency maps, attention mechanisms, and post-hoc interpretation tools to reveal which image regions or features drive predictions. Multimodal Large Language Model-Based Detection and Explanation represents a newer direction that integrates vision-language models to generate natural-language justifications and localize artifacts. Meanwhile, Datasets, Benchmarks, and Evaluation Frameworks establish standardized testbeds; User-Centric and Interactive Detection Systems explore human-in-the-loop workflows; Domain-Specific and Specialized Applications address niche contexts such as medical imaging or sign language; and Generalization, Robustness, and Cross-Generator Detection tackles the challenge of maintaining performance across diverse generative models.

Recent work has increasingly emphasized grounded reasoning that pinpoints suspicious artifacts rather than offering only global verdicts. FakeXplain[0] exemplifies this trend by combining multimodal large language models with explicit artifact localization, situating itself within the Grounded Reasoning with Artifact Localization cluster. This approach contrasts with earlier explainability efforts such as Deepfake Detection Explainable[1] and CNN Explainable Detection[5], which primarily relied on gradient-based saliency or attention overlays without structured linguistic explanations. Closely related works like ForenX[3] and AIGI Holmes[10] similarly pursue fine-grained localization and interpretable outputs, yet FakeXplain[0] distinguishes itself by leveraging the reasoning capabilities of large language models to articulate why specific regions appear synthetic.

A key open question across these branches is how to balance detection accuracy with explanation fidelity, especially when models must generalize to unseen generators or adversarially perturbed images.

Claimed Contributions

FakeXplained dataset with human-aligned grounded annotations

A curated dataset of 8,772 AI-generated images from diverse state-of-the-art generative models, annotated with bounding boxes and concise captions that highlight visual anomalies and illogical details. The dataset provides fine-grained, human-aligned annotations that support both visual grounding and textual reasoning for interpretable detection (an illustrative annotation sketch follows below).

10 retrieved papers
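To make the annotation format concrete, here is a minimal, hypothetical sketch of what a single FakeXplained record might look like. The field names (`image_id`, `bbox`, `caption`, etc.) are illustrative assumptions, not the dataset's published schema.

```python
# Hypothetical FakeXplained annotation record. Field names and values are
# assumptions for illustration, not the dataset's actual schema.
record = {
    "image_id": "fx_000001",
    "generator": "stable-diffusion-xl",  # source generative model (assumed field)
    "label": "ai_generated",             # binary detection label
    "artifacts": [
        {
            # Bounding box in (x_min, y_min, x_max, y_max) pixel coordinates.
            "bbox": [412, 96, 538, 210],
            # Concise caption describing the anomaly in that region.
            "caption": "left hand has six fingers with inconsistent lighting",
        },
        {
            "bbox": [120, 300, 260, 380],
            "caption": "text on the sign is illegible, glyph-like strokes",
        },
    ],
}
```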
FakeXplainer detector with progressive training pipeline

An end-to-end system that fine-tunes multi-modal large language models on FakeXplained using a progressive training pipeline integrating supervised fine-tuning and reinforcement learning. The system performs detection and localization and provides spatially grounded, human-aligned explanations for AI-generated images (a schematic two-stage training sketch follows below).

8 retrieved papers
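The paper's exact pipeline is not reproduced here; the following is a minimal, self-contained sketch of the general idea of progressive training, i.e., a supervised warm-up stage followed by a REINFORCE-style reward stage, using a toy classifier and random data rather than an MLLM.

```python
# Toy sketch of a progressive two-stage pipeline: supervised fine-tuning,
# then a REINFORCE-style reward stage. This is NOT the authors'
# implementation -- only an illustration of staging SFT before RL.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy data: 256 feature vectors with binary real/fake labels.
x = torch.randn(256, 16)
y = torch.randint(0, 2, (256,))

# Stage 1: supervised fine-tuning (cross-entropy on labeled examples).
ce = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss = ce(model(x), y)
    loss.backward()
    opt.step()

# Stage 2: reinforcement stage. Sample predictions, reward correct ones,
# and apply a policy-gradient update with a mean-reward baseline.
for _ in range(100):
    opt.zero_grad()
    dist = torch.distributions.Categorical(logits=model(x))
    actions = dist.sample()
    reward = (actions == y).float()     # 1 if the sampled prediction is correct
    advantage = reward - reward.mean()  # simple baseline
    loss = -(dist.log_prob(actions) * advantage).mean()
    loss.backward()
    opt.step()

print("final accuracy:", (model(x).argmax(-1) == y).float().mean().item())
```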
State-of-the-art performance with robust explainability

FakeXplainer achieves state-of-the-art detection and localization accuracy while demonstrating strong robustness and out-of-distribution generalization. It uniquely delivers spatially grounded, human-aligned rationales that explain both where and why images appear AI-generated (a sketch of IoU-based localization scoring follows below).

10 retrieved papers · Can Refute
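Since localization quality is reported as IoU (36.0% above), here is a short sketch of how box-level IoU is typically scored. This is the standard intersection-over-union computation, not code from the paper, and the example boxes are made up.

```python
# Standard intersection-over-union (IoU) between two boxes given as
# (x_min, y_min, x_max, y_max). Generic evaluation code, illustrating how
# localization scores like the 36.0% IoU above are usually computed.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0

# Mean IoU between predicted and ground-truth artifact boxes (toy values).
preds = [(412, 96, 538, 210), (120, 300, 260, 380)]
gts   = [(400, 100, 530, 205), (118, 310, 250, 375)]
print(sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds))
```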

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: FakeXplained dataset with human-aligned grounded annotations

Contribution: FakeXplainer detector with progressive training pipeline

Contribution: State-of-the-art performance with robust explainability