Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.5

Keywords: AIGC copyright protection; Image watermark; Diffusion model

Abstract:

Protecting the copyright of user-generated AI images is an emerging challenge as AIGC becomes pervasive in creative workflows. Existing watermarking methods (1) remain vulnerable to real-world adversarial threats and are often forced to trade off defenses against spoofing attacks for defenses against removal attacks; and (2) cannot support semantic-level tamper localization. We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection that is plug-and-play with diffusion-based AIGC services. PAI simultaneously provides three key functionalities: robust ownership verification, attack detection, and semantic-level tampering localization. Unlike existing inherent watermark methods, which embed watermarks only at the noise initialization of diffusion models, we design a novel key-conditioned deflection mechanism that subtly steers the denoising trajectory according to the user key. Such trajectory-level coupling strengthens the semantic entanglement of identity and content, thereby enhancing robustness against real-world threats. We also provide a theoretical analysis proving that only the valid key can pass verification. Experiments across 12 attack methods show that PAI achieves 98.43% verification accuracy, improving over SOTA methods by 37.25% on average, and retains strong tampering localization performance even against advanced AIGC edits. Our code is available at https://anonymous.4open.science/r/PAI-423D.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PAI, a training-free inherent watermarking framework that embeds watermarks by steering diffusion model denoising trajectories via a key-conditioned deflection mechanism. It resides in the 'Trajectory-Level and Noise-Conditioned Embedding' leaf, which contains only three papers total including this one. This leaf represents a relatively sparse but active research direction within diffusion model watermarking, focusing specifically on methods that manipulate the iterative denoising process rather than post-processing or latent-only approaches. The small sibling set suggests this trajectory-steering paradigm is still emerging compared to broader watermarking categories.

The taxonomy tree reveals that PAI's leaf sits within 'Diffusion Model In-Generation Watermarking,' which branches into four distinct approaches: trajectory-level methods, latent space integration, text-prompt conditioning, and provenance tracing. Neighboring leaves address complementary challenges—latent space methods embed without trajectory manipulation, while provenance tracing focuses on tamper localization. The broader 'Watermark Embedding Mechanisms' branch also includes GAN-based and autoregressive techniques, indicating that trajectory-level diffusion watermarking occupies a specialized niche within a diverse landscape of generative model protection strategies.

Among 19 candidates examined across three contributions, none were flagged as clearly refuting PAI's claims. The dual-stage injection mechanism was assessed against one candidate with no overlap found. The theoretical guarantee on key exclusivity examined eight candidates without identifying prior work establishing similar formal proofs. The unified forensic framework—supporting verification, attack detection, and semantic tampering localization—reviewed ten candidates, none providing equivalent multi-functional integration. These statistics reflect a limited semantic search scope rather than exhaustive coverage, suggesting the contributions appear novel within the examined subset but do not rule out relevant work beyond the top-19 matches.

Given the sparse taxonomy leaf and absence of refuting candidates in the limited search, PAI's trajectory-deflection approach and unified forensic capabilities appear to extend existing trajectory-level methods in meaningful ways. However, the analysis covers only 19 semantically similar papers from a 50-paper taxonomy, leaving open the possibility that related work in adjacent leaves or outside the search scope could provide additional context. The framework's novelty is most evident in its combination of trajectory steering with multi-functional forensic analysis, a pairing not explicitly represented in sibling papers.

Taxonomy

Core-task Taxonomy Papers: 50

Claimed Contributions: 3

Contribution Candidate Papers Compared: 19

Refutable Papers: 0

Research Landscape Overview

Core task: watermarking for AI-generated image copyright protection and forensic analysis. The field has evolved into a taxonomy whose branches address distinct but interconnected challenges. Watermark Embedding Mechanisms in Generative Models focuses on integrating watermarks directly into the generation process (in GANs, diffusion models, or autoregressive architectures) so that every synthesized image carries an intrinsic signature. Detection, Verification, and Attribution develops methods to reliably extract and validate these embedded signals, often under real-world distortions. Robustness and Attack Resistance explores how watermarks withstand adversarial manipulations, while Forensic Analysis and Tamper Detection examines post-hoc identification of alterations or provenance. Attack Analysis and Watermark Forgery investigates adversarial strategies that attempt to remove or forge watermarks, and Traditional and Classical Watermarking Techniques provides foundational methods adapted from the pre-generative-AI era. Finally, Surveys, Reviews, and Theoretical Frameworks (e.g., AI Image Watermarking Survey[7], Latent Diffusion Watermarking Review[17]) synthesize emerging trends and regulatory considerations such as EU AI Act Watermarking[3].

Within the diffusion model embedding branch, a particularly active line of work targets trajectory-level and noise-conditioned strategies that modulate the iterative denoising process itself. Semantic Deflection Watermarking[0] exemplifies this approach by steering intermediate latent states to encode ownership information without compromising visual fidelity. Its closest neighbors are Gaussian Shading[9] and Gaussian Shading++[15], which embed the watermark in the initial latent noise. These techniques contrast with post-generation or frequency-domain methods (e.g., Frequency Spectrum Copyright[5]) that apply watermarks after synthesis, trading off imperceptibility for robustness.

A central open question across these branches is how to balance stealth, capacity, and resilience: trajectory-level embedding offers tighter integration but may be more vulnerable to adversarial purification attacks, whereas classical frequency techniques provide established robustness guarantees at the cost of potential perceptual artifacts. Semantic Deflection Watermarking[0] sits squarely in the trajectory-conditioned cluster, sharing design principles with Gaussian Shading[9] and Gaussian Shading++[15] while emphasizing semantic-level deflection to enhance both security and imperceptibility.

Claimed Contributions

PAI: Training-free inherent watermarking framework with dual-stage injection

1 retrieved paper

The authors introduce PAI, a plug-and-play watermarking framework that embeds watermarks during both the initialization stage (via Box-Muller transformation) and the denoising stage (via key-conditioned deflection). This dual-stage design semantically couples user identity with content, enhancing robustness without requiring additional training or encoder-decoder networks.
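
The initialization stage mentioned above relies on the Box-Muller transformation, which turns pairs of uniform random numbers into standard-normal samples. The following is a minimal sketch of that transform only, not the authors' implementation: the function name `key_to_gaussian_latent`, the SHA-256 seeding of the random generator, and the latent shape `(4, 64, 64)` are all assumptions made for illustration.

```python
import hashlib
import numpy as np

def key_to_gaussian_latent(user_key: str, shape=(4, 64, 64)) -> np.ndarray:
    """Map a user key to a standard-normal latent via the Box-Muller transform."""
    # Assumption: the key is hashed to seed a PRNG; the paper may derive
    # its uniform values differently.
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))

    n = int(np.prod(shape))
    half = (n + 1) // 2
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1]
    # and log(u1) stays finite.
    u1 = 1.0 - rng.random(half)
    u2 = rng.random(half)

    # Box-Muller: each (u1, u2) pair yields two independent N(0, 1) samples.
    radius = np.sqrt(-2.0 * np.log(u1))
    angle = 2.0 * np.pi * u2
    z = np.concatenate([radius * np.cos(angle), radius * np.sin(angle)])
    return z[:n].reshape(shape)
```

Because the output is fully determined by the key, the same key always reproduces the same latent, while the latent itself remains statistically indistinguishable from ordinary Gaussian noise.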

Theoretical guarantee on key exclusivity for verification

8 retrieved papers

The authors prove that only the valid user key can pass verification by showing that invalid keys produce consistently higher initialization bias than valid keys, even when the forged key approaches the valid key. This analysis is presented as the basis for the framework's resistance to key forgery.
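
The gap between valid and invalid keys can be illustrated with a toy numerical model. This is a sketch under strong assumptions, not the paper's proof or verification procedure: "initialization bias" is modeled here as the mean squared difference between a recovered latent and the latent a candidate key would produce, the recovered latent is simulated as the valid latent plus Gaussian inversion noise, and the helper names `latent_for_key` and `initialization_bias` are hypothetical.

```python
import numpy as np

def latent_for_key(key: int, size: int = 4096) -> np.ndarray:
    # Assumption: the key seeds the Gaussian initial latent.
    return np.random.default_rng(key).standard_normal(size)

def initialization_bias(recovered: np.ndarray, key: int) -> float:
    # Toy bias: mean squared gap between the recovered latent and the
    # latent expected for the candidate key.
    expected = latent_for_key(key, recovered.size)
    return float(np.mean((recovered - expected) ** 2))

valid_key = 1234
# Simulated inversion error (standard deviation 0.3 is an arbitrary choice).
inversion_noise = 0.3 * np.random.default_rng(0).standard_normal(4096)
recovered = latent_for_key(valid_key) + inversion_noise

valid_bias = initialization_bias(recovered, valid_key)
forged_biases = [initialization_bias(recovered, k) for k in range(5000, 5050)]

print(f"valid key bias:        {valid_bias:.3f}")
print(f"smallest forged bias:  {min(forged_biases):.3f}")
```

In this toy setting the valid key's bias is set by the inversion noise alone, while an unrelated key compares two independent Gaussian latents and therefore lands far higher. The sketch does not model forged keys that approach the valid key.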

Unified forensic framework supporting verification, attack detection, and semantic tampering localization

10 retrieved papers

The authors design a unified verification framework that uses initialization bias in a low-dimensional latent space to simultaneously support ownership verification, distinguish between removal and spoofing attacks, and localize semantic-level tampering. This overcomes the limitation of existing methods that rely on one-dimensional verification signals and cannot handle advanced AIGC-based editing.
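
One way to picture bias-based tamper localization is a patch-wise bias map over the latent. The sketch below is a toy illustration under stated assumptions, not the authors' method: the latent is a single 64x64 array, the edit is simulated by overwriting one region with fresh noise, and the patch size, the threshold of 0.75, and the function name `patch_bias_map` are hypothetical choices. It does not model the distinction between removal and spoofing attacks.

```python
import numpy as np

def patch_bias_map(recovered: np.ndarray, expected: np.ndarray, patch: int = 8) -> np.ndarray:
    """Average squared gap inside each non-overlapping patch x patch block."""
    h, w = recovered.shape
    sq = (recovered - expected) ** 2
    return sq.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))

rng = np.random.default_rng(7)
expected = rng.standard_normal((64, 64))                    # latent implied by the valid key
recovered = expected + 0.3 * rng.standard_normal((64, 64))  # untouched image, small inversion error
recovered[16:32, 32:48] = rng.standard_normal((16, 16))     # simulated semantic edit

bias = patch_bias_map(recovered, expected)
tampered = bias > 0.75  # hypothetical threshold separating the two regimes

# Patch-grid coordinates (row, column) of the blocks flagged as tampered.
print(np.argwhere(tampered))
```

Untouched patches keep a low bias driven by inversion error, while the overwritten region behaves like an unrelated latent, so a two-dimensional bias map can point to where the content changed rather than only whether it changed.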

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[9] Gaussian shading: Provable performance-lossless image watermarking for diffusion models

Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu (2024)

[15] Gaussian Shading++: Rethinking the Realistic Deployment Challenge of Performance-Lossless Image Watermark for Diffusion Models

Zijin Yang, Xin Zhang, Kejiang Chen, Kai Zeng, Qiyi Yao, Han Fang, Weiming Zhang, Nenghai Yu (2025)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PAI: Training-free inherent watermarking framework with dual-stage injection

[51] A Comprehensive Evaluation of Watermarking for Time Series Diffusion Models

Cannot Refute

Contribution

Theoretical guarantee on key exclusivity for verification

[59] An Ensemble Framework for Unbiased Language Model Watermarking

Cannot Refute

[60] Multi-designated detector watermarking for language models

Cannot Refute

[61] ATG-CHFMs: Accurate Ternary Generalized Chebyshev–Fourier Moments for Stereo Image Zero-Watermarking

Cannot Refute

[62] Wavelet packets-based digital watermarking for image verification and authentication

Cannot Refute

[63] Publicly verifiable software watermarking

Cannot Refute

[64] Mitigating Watermark Forgery in Generative Models via Randomized Key Selection

Cannot Refute

[65] The marriage of cryptography and watermarking - beneficial and challenging for secure watermarking and detection

Cannot Refute

[66] A blind image watermarking algorithm based on dual tree complex wavelet transform

Cannot Refute

Contribution

Unified forensic framework supporting verification, attack detection, and semantic tampering localization

[14] AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection

Cannot Refute

[45] StableGuard: Towards Unified Copyright Protection and Tamper Localization in Latent Diffusion Models

Cannot Refute

[50] Securing Digital Media Integrity: A Survey of Watermarking and Manipulation Detection for Image Authentication

Cannot Refute

[52] Secure and reversible fragile watermarking for accurate authentication and tamper localization in medical images

Cannot Refute

[53] Seal: Semantic aware image watermarking

Cannot Refute

[54] Watermarking language models through language models

Cannot Refute

[55] Hierarchical watermarking for secure image authentication with localization

Cannot Refute

[56] Security of fragile authentication watermarks with localization

Cannot Refute

[57] Research Progress on Artificial Intelligence Model Watermarking (translated from the Chinese title 人工智能模型水印研究进展)

Cannot Refute

[58] Proactive Deepfake Detection via Self-Verifiable Semantic Watermarking

Cannot Refute
