FakeXplain: AI-Generated Images Detection via Human-Aligned Grounded Reasoning

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Vision Language Models, Image Forensics, AIGC Detection
Abstract:

The rapid rise of image generation calls for detection methods that are both interpretable and reliable. Existing approaches, though accurate, act as black boxes and fail to generalize to out-of-distribution data, while multi-modal large language models (MLLMs) provide reasoning ability but often hallucinate. To address these issues, we construct FakeXplained, a dataset of AI-generated images annotated with bounding boxes and descriptive captions that highlight synthesis artifacts, forming the basis for human-aligned, visually grounded reasoning. Leveraging FakeXplained, we develop FakeXplainer, which fine-tunes MLLMs with a progressive training pipeline, enabling accurate detection, artifact localization, and coherent textual explanations. Extensive experiments show that FakeXplainer not only sets a new state of the art in detection and localization (98.2% accuracy, 36.0% IoU) but also demonstrates strong robustness and out-of-distribution generalization, uniquely delivering spatially grounded, human-aligned rationales.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. The results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs), and the system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 28
Refutable Papers: 1

Research Landscape Overview

Core task: AI-generated image detection with interpretable explanations. The field has evolved from purely classification-driven approaches toward systems that not only identify synthetic content but also provide human-understandable rationales for their decisions. The taxonomy reflects this dual emphasis through eight major branches. Deep Learning Classification Approaches and Forensic and Signal-Based Detection Methods focus on discriminative accuracy using CNNs, vision transformers, and frequency-domain analysis. Explainable AI Techniques for Detection Transparency leverage saliency maps, attention mechanisms, and post-hoc interpretation tools to reveal which image regions or features drive predictions. Multimodal Large Language Model-Based Detection and Explanation represents a newer direction that integrates vision-language models to generate natural-language justifications and localize artifacts. Meanwhile, Datasets, Benchmarks, and Evaluation Frameworks establish standardized testbeds; User-Centric and Interactive Detection Systems explore human-in-the-loop workflows; Domain-Specific and Specialized Applications address niche contexts such as medical imaging or sign language; and Generalization, Robustness, and Cross-Generator Detection tackles the challenge of maintaining performance across diverse generative models.

Recent work has increasingly emphasized grounded reasoning that pinpoints suspicious artifacts rather than offering only global verdicts. FakeXplain[0] exemplifies this trend by combining multimodal large language models with explicit artifact localization, situating itself within the Grounded Reasoning with Artifact Localization cluster. This approach contrasts with earlier explainability efforts such as Deepfake Detection Explainable[1] and CNN Explainable Detection[5], which primarily relied on gradient-based saliency or attention overlays without structured linguistic explanations. Closely related works like ForenX[3] and AIGI Holmes[10] similarly pursue fine-grained localization and interpretable outputs, yet FakeXplain[0] distinguishes itself by leveraging the reasoning capabilities of large language models to articulate why specific regions appear synthetic.

A key open question across these branches is how to balance detection accuracy with explanation fidelity, especially when models must generalize to unseen generators or adversarially perturbed images.

Claimed Contributions

FakeXplained dataset with human-aligned grounded annotations

A curated dataset of 8,772 AI-generated images from diverse state-of-the-art generative models, annotated with bounding boxes and concise captions that highlight visual anomalies and illogical details. The dataset provides fine-grained, human-aligned annotations that support both visual grounding and textual reasoning for interpretable detection (an illustrative annotation sketch follows below).

10 retrieved papers
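To make the annotation format concrete, here is a minimal, hypothetical sketch of what a single FakeXplained record might look like. The field names (`image_id`, `bbox`, `caption`, etc.) are illustrative assumptions, not the dataset's published schema.

```python
# Hypothetical FakeXplained annotation record. Field names and values are
# assumptions for illustration, not the dataset's actual schema.
record = {
    "image_id": "fx_000001",
    "generator": "stable-diffusion-xl",  # source generative model (assumed field)
    "label": "ai_generated",             # binary detection label
    "artifacts": [
        {
            # Bounding box in (x_min, y_min, x_max, y_max) pixel coordinates.
            "bbox": [412, 96, 538, 210],
            # Concise caption describing the anomaly in that region.
            "caption": "left hand has six fingers with inconsistent lighting",
        },
        {
            "bbox": [120, 300, 260, 380],
            "caption": "text on the sign is illegible, glyph-like strokes",
        },
    ],
}
```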
FakeXplainer detector with progressive training pipeline

An end-to-end system that fine-tunes multi-modal large language models on FakeXplained using a progressive training pipeline integrating supervised fine-tuning and reinforcement learning. The system performs detection and localization and provides spatially grounded, human-aligned explanations for AI-generated images (a schematic two-stage training sketch follows below).

8 retrieved papers
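The paper's exact pipeline is not reproduced here; the following is a minimal, self-contained sketch of the general idea of progressive training, i.e., a supervised warm-up stage followed by a REINFORCE-style reward stage, using a toy classifier and random data rather than an MLLM.

```python
# Toy sketch of a progressive two-stage pipeline: supervised fine-tuning,
# then a REINFORCE-style reward stage. This is NOT the authors'
# implementation -- only an illustration of staging SFT before RL.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy data: 256 feature vectors with binary real/fake labels.
x = torch.randn(256, 16)
y = torch.randint(0, 2, (256,))

# Stage 1: supervised fine-tuning (cross-entropy on labeled examples).
ce = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss = ce(model(x), y)
    loss.backward()
    opt.step()

# Stage 2: reinforcement stage. Sample predictions, reward correct ones,
# and apply a policy-gradient update with a mean-reward baseline.
for _ in range(100):
    opt.zero_grad()
    dist = torch.distributions.Categorical(logits=model(x))
    actions = dist.sample()
    reward = (actions == y).float()     # 1 if the sampled prediction is correct
    advantage = reward - reward.mean()  # simple baseline
    loss = -(dist.log_prob(actions) * advantage).mean()
    loss.backward()
    opt.step()

print("final accuracy:", (model(x).argmax(-1) == y).float().mean().item())
```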
State-of-the-art performance with robust explainability

FakeXplainer achieves state-of-the-art detection and localization accuracy while demonstrating strong robustness and out-of-distribution generalization. It uniquely delivers spatially grounded, human-aligned rationales that explain both where and why images appear AI-generated (a sketch of IoU-based localization scoring follows below).

10 retrieved papers · Can Refute
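Since localization quality is reported as IoU (36.0% above), here is a short sketch of how box-level IoU is typically scored. This is the standard intersection-over-union computation, not code from the paper, and the example boxes are made up.

```python
# Standard intersection-over-union (IoU) between two boxes given as
# (x_min, y_min, x_max, y_max). Generic evaluation code, illustrating how
# localization scores like the 36.0% IoU above are usually computed.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0

# Mean IoU between predicted and ground-truth artifact boxes (toy values).
preds = [(412, 96, 538, 210), (120, 300, 260, 380)]
gts   = [(400, 100, 530, 205), (118, 310, 250, 375)]
print(sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds))
```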

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: FakeXplained dataset with human-aligned grounded annotations

Contribution: FakeXplainer detector with progressive training pipeline

Contribution: State-of-the-art performance with robust explainability