Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Anomaly Detection, AI-Generated Images
Abstract:

The rapid advancement of AI-generated content (AIGC) has enabled the synthesis of visually convincing images; however, many such outputs exhibit subtle semantic anomalies, including unrealistic object configurations, violations of physical laws, and commonsense inconsistencies, which compromise the overall plausibility of the generated scenes. Detecting these semantic-level anomalies is essential for assessing the trustworthiness of AIGC media, especially in AIGC image analysis, explainable deepfake detection, and semantic authenticity assessment. In this paper, we formalize semantic anomaly detection and reasoning for AIGC images and introduce AnomReason, a large-scale benchmark with structured annotations as quadruples (Name, Phenomenon, Reasoning, Severity). Annotations are produced by a modular multi-agent pipeline (AnomAgent) with lightweight human-in-the-loop verification, enabling scale while preserving quality. At construction time, AnomAgent processed approximately 4.17B GPT-4o tokens, evidence of the scale behind the resulting structured annotations. We further show that models fine-tuned on AnomReason achieve consistent gains over strong vision-language baselines under our proposed semantic matching metrics (SemAP and SemF1). Applications to explainable deepfake detection and semantic reasonableness assessment of image generators demonstrate practical utility. In summary, AnomReason and AnomAgent serve as a foundation for measuring and improving the semantic plausibility of AI-generated images. We will release code, metrics, data, and task-aligned models to support reproducible research on semantic authenticity and interpretable AIGC forensics.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes semantic anomaly detection and reasoning for AIGC images and introduces AnomReason, a benchmark with structured quadruple annotations, alongside AnomAgent, a multi-agent annotation pipeline. It resides in the Multimodal Large Language Model-Based Reasoning leaf, which contains seven papers including the original work. This leaf sits within the broader Semantic Anomaly Detection and Reasoning branch, indicating a moderately populated research direction focused on leveraging MLLMs for explainable semantic inconsistency detection, as opposed to low-level artifact-based methods.

The taxonomy reveals neighboring leaves such as Non-MLLM Semantic Detection (five papers) and branches like Artifact-Based Detection (multiple sub-leaves) and Explainability and Interpretability. The paper's MLLM-based approach diverges from traditional deep learning methods in the sibling Non-MLLM leaf and complements explainability work by providing structured reasoning outputs. The taxonomy's scope notes clarify that this work emphasizes vision-language reasoning and commonsense error detection, distinguishing it from purely visual or frequency-domain methods in adjacent branches.

Of the thirty candidates examined (ten per contribution), the formalization contribution has two refutable candidates, suggesting some prior work on the task definition exists within the limited search scope. The ten candidates examined for each of the AnomReason benchmark and AnomAgent pipeline contributions yielded zero refutable matches, indicating that these specific structured-annotation and multi-agent pipeline designs are less directly overlapped in the sampled literature. These statistics reflect a focused semantic search rather than exhaustive coverage, so unexamined work may exist beyond the top thirty matches.

Based on the limited search scope of thirty semantically similar papers, the benchmark and pipeline contributions appear more distinctive than the task formalization, which has identifiable prior work among examined candidates. The taxonomy context shows the paper occupies a moderately active MLLM-based reasoning niche within a broader field spanning artifact detection, explainability, and safety. This analysis covers top-ranked semantic matches and does not claim exhaustive field coverage.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 2

Research Landscape Overview

Core task: semantic anomaly detection and reasoning in AI-generated images. The field has evolved beyond simple artifact-based detection to encompass a rich taxonomy of approaches. At the top level, the taxonomy divides into Semantic Anomaly Detection and Reasoning, which focuses on logical inconsistencies and content-level errors; Artifact-Based Detection, which targets low-level forensic traces; Multimodal and Hybrid Detection Frameworks, which combine visual and textual cues; Explainability and Interpretability, which aims to make detection decisions transparent; Safety and Content Moderation, which addresses harmful content; Domain Adaptation and Generalization, which tackles cross-dataset robustness; and Related Image Analysis Tasks, which situates this work within broader computer vision challenges.

Representative works such as Explainable Fake Detection[4] and Grounded Reasoning Detection[6] illustrate how semantic reasoning and explainability have become central themes, while methods like SemID Inpainting[3] and Synthetic Photography Detection[5] highlight the diversity of technical strategies. A particularly active line of work leverages multimodal large language models to perform reasoning about semantic inconsistencies, moving beyond pixel-level cues to higher-level understanding.

Semantic Visual Anomaly[0] sits squarely within this branch, emphasizing the use of advanced reasoning capabilities to identify logical flaws in generated images. This approach contrasts with nearby works such as Seeing Before Reasoning[8], which explores the interplay between visual perception and reasoning stages, and FakeReasoning[24], which also employs reasoning but may differ in architectural choices or dataset focus.
The trade-offs here revolve around balancing computational cost, interpretability, and generalization: while reasoning-based methods offer richer explanations and can capture subtle semantic errors, they may require more resources and careful prompt engineering compared to purely visual or hybrid approaches like GPT Forensics[17] or ForgerySleuth[30]. Open questions include how to scale these reasoning frameworks across diverse generative models and how to ensure robustness when adversaries adapt to semantic detection strategies.

Claimed Contributions

Formalization of semantic anomaly detection and reasoning task for AIGC images

The authors formally define a new task that requires detecting and explaining semantic-level anomalies in AI-generated images through structured outputs comprising Name, Phenomenon, Reasoning, and Severity Score. This formulation goes beyond surface-level artifact detection to capture commonsense violations, physical implausibilities, and logical inconsistencies.
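As a concrete illustration, the structured output described above can be modeled as a simple record type. This is a minimal sketch: the field names mirror the (Name, Phenomenon, Reasoning, Severity) quadruple, but the exact schema, including the 1-5 severity scale used here, is an assumption rather than the authors' specification.

```python
from dataclasses import dataclass

@dataclass
class SemanticAnomaly:
    """One structured anomaly annotation for an AI-generated image.

    Field names follow the (Name, Phenomenon, Reasoning, Severity)
    quadruple; the 1-5 severity scale is a hypothetical choice.
    """
    name: str        # short label, e.g. "extra finger"
    phenomenon: str  # what is visibly wrong in the image
    reasoning: str   # why it violates physics or commonsense
    severity: int    # hypothetical scale: 1 (minor) .. 5 (scene-breaking)

# Example annotation for a common generation failure mode.
anomaly = SemanticAnomaly(
    name="extra finger",
    phenomenon="the left hand shows six fingers",
    reasoning="human hands have five fingers; a sixth violates anatomy",
    severity=4,
)
print(anomaly.name, anomaly.severity)
```

A structured record like this is what allows a semantic matching metric to compare predicted and gold anomalies field by field, rather than scoring free-form text.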

10 retrieved papers (2 refutable)
AnomReason benchmark with structured quadruple annotations

The authors construct a large-scale benchmark dataset containing 21,539 AI-generated images annotated with structured semantic anomalies. Each anomaly is represented as a quadruple capturing what is wrong, why it is wrong, and how severe it is, enabling interpretable semantic analysis.

10 retrieved papers
AnomAgent multi-agent annotation pipeline with human-in-the-loop verification

The authors develop a modular multi-agent framework that decomposes anomaly reasoning into specialized stages (entity parsing, attribute analysis, relational reasoning, and anomaly consolidation). This pipeline is combined with lightweight human verification to balance annotation scale and quality.
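The staged decomposition described above can be sketched as a sequential pipeline in which each agent reads and extends a shared analysis state. This is a hypothetical illustration: the stage signatures, the shared state dictionary, and the placeholder logic are assumptions; the actual AnomAgent stages are MLLM-backed rather than stubs.

```python
# Hypothetical sketch of a staged annotation pipeline mirroring the
# decomposition into entity parsing, attribute analysis, relational
# reasoning, and anomaly consolidation. Real stages would call an
# MLLM; here each is a stub that records a placeholder result.

def entity_parsing(state):
    state["entities"] = ["hand", "shadow"]  # placeholder detections
    return state

def attribute_analysis(state):
    state["attributes"] = {"hand": "six fingers"}  # placeholder
    return state

def relational_reasoning(state):
    state["relations"] = ["shadow detached from hand"]  # placeholder
    return state

def anomaly_consolidation(state):
    # Merge stage outputs into candidate anomalies for human review.
    state["anomalies"] = [
        {"name": "extra finger", "severity": 4},
        {"name": "floating shadow", "severity": 2},
    ]
    return state

PIPELINE = [entity_parsing, attribute_analysis,
            relational_reasoning, anomaly_consolidation]

def run_pipeline(image_id):
    """Run every stage in order over a fresh state for one image."""
    state = {"image": image_id}
    for stage in PIPELINE:
        state = stage(state)
    return state

result = run_pipeline("img_0001")
print(len(result["anomalies"]))  # candidates passed to human verification
```

The design choice sketched here, a linear chain over a shared state, keeps each stage independently replaceable and makes the final consolidation step a natural insertion point for lightweight human verification.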

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Formalization of semantic anomaly detection and reasoning task for AIGC images


Contribution

AnomReason benchmark with structured quadruple annotations


Contribution

AnomAgent multi-agent annotation pipeline with human-in-the-loop verification
