Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: AIGC copyright protection; Image watermark; Diffusion model
Abstract:

Protecting the copyright of user-generated AI images is an emerging challenge as AIGC becomes pervasive in creative workflows. Existing watermarking methods (1) remain vulnerable to real-world adversarial threats and are often forced to trade off defenses against spoofing attacks for defenses against removal attacks; and (2) cannot support semantic-level tamper localization. We introduce PAI, a training-free inherent watermarking framework for AIGC copyright protection that is plug-and-play with diffusion-based AIGC services. PAI simultaneously provides three key functionalities: robust ownership verification, attack detection, and semantic-level tampering localization. Unlike existing inherent watermark methods that only embed watermarks at noise initialization of diffusion models, we design a novel key-conditioned deflection mechanism that subtly steers the denoising trajectory according to the user key. Such trajectory-level coupling strengthens the semantic entanglement of identity and content, thereby enhancing robustness against real-world threats. We also provide a theoretical analysis proving that only the valid key can pass verification. Experiments across 12 attack methods show that PAI achieves 98.43% verification accuracy, improving over SOTA methods by 37.25% on average, and retains strong tampering localization performance even against advanced AIGC edits. Our code is available at https://anonymous.4open.science/r/PAI-423D.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces PAI, a training-free inherent watermarking framework that embeds watermarks by steering diffusion model denoising trajectories via a key-conditioned deflection mechanism. It resides in the 'Trajectory-Level and Noise-Conditioned Embedding' leaf, which contains only three papers total including this one. This leaf represents a relatively sparse but active research direction within diffusion model watermarking, focusing specifically on methods that manipulate the iterative denoising process rather than post-processing or latent-only approaches. The small sibling set suggests this trajectory-steering paradigm is still emerging compared to broader watermarking categories.

The taxonomy tree reveals that PAI's leaf sits within 'Diffusion Model In-Generation Watermarking,' which branches into four distinct approaches: trajectory-level methods, latent space integration, text-prompt conditioning, and provenance tracing. Neighboring leaves address complementary challenges—latent space methods embed without trajectory manipulation, while provenance tracing focuses on tamper localization. The broader 'Watermark Embedding Mechanisms' branch also includes GAN-based and autoregressive techniques, indicating that trajectory-level diffusion watermarking occupies a specialized niche within a diverse landscape of generative model protection strategies.

Among 19 candidates examined across three contributions, none were flagged as clearly refuting PAI's claims. The dual-stage injection mechanism was assessed against one candidate with no overlap found. The theoretical guarantee on key exclusivity examined eight candidates without identifying prior work establishing similar formal proofs. The unified forensic framework—supporting verification, attack detection, and semantic tampering localization—reviewed ten candidates, none providing equivalent multi-functional integration. These statistics reflect a limited semantic search scope rather than exhaustive coverage, suggesting the contributions appear novel within the examined subset but do not rule out relevant work beyond the top-19 matches.

Given the sparse taxonomy leaf and absence of refuting candidates in the limited search, PAI's trajectory-deflection approach and unified forensic capabilities appear to extend existing trajectory-level methods in meaningful ways. However, the analysis covers only 19 semantically similar papers from a 50-paper taxonomy, leaving open the possibility that related work in adjacent leaves or outside the search scope could provide additional context. The framework's novelty is most evident in its combination of trajectory steering with multi-functional forensic analysis, a pairing not explicitly represented in sibling papers.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 19
Refutable Papers: 0

Research Landscape Overview

Core task: watermarking for AI-generated image copyright protection and forensic analysis. The field has evolved into a rich taxonomy spanning multiple branches that address distinct but interconnected challenges. Watermark Embedding Mechanisms in Generative Models focuses on integrating watermarks directly into the generation process (whether in GANs, diffusion models, or auto-regressive architectures) so that every synthesized image carries an intrinsic signature. Detection, Verification, and Attribution branches develop methods to reliably extract and validate these embedded signals, often under real-world distortions. Robustness and Attack Resistance explores how watermarks withstand adversarial manipulations, while Forensic Analysis and Tamper Detection examines post-hoc identification of alterations or provenance. Attack Analysis and Watermark Forgery investigates adversarial strategies that attempt to remove or forge watermarks, and Traditional and Classical Watermarking Techniques provides foundational methods adapted from pre-generative-AI eras. Finally, Surveys, Reviews, and Theoretical Frameworks (e.g., AI Image Watermarking Survey[7], Latent Diffusion Watermarking Review[17]) synthesize emerging trends and regulatory considerations such as EU AI Act Watermarking[3].

Within the diffusion model embedding branch, a particularly active line of work targets trajectory-level and noise-conditioned strategies that modulate the iterative denoising process itself. Semantic Deflection Watermarking[0] exemplifies this approach by steering intermediate latent states to encode ownership information without compromising visual fidelity, closely aligning with methods like Gaussian Shading[9] and Gaussian Shading Plus[15] that also manipulate noise schedules or latent perturbations. These techniques contrast with post-generation or frequency-domain methods (e.g., Frequency Spectrum Copyright[5]) that apply watermarks after synthesis, trading off imperceptibility for robustness.

A central open question across these branches is balancing stealth, capacity, and resilience: trajectory-level embedding offers tighter integration but may be more vulnerable to adversarial purification attacks, whereas classical frequency techniques provide established robustness guarantees at the cost of potential perceptual artifacts. Semantic Deflection Watermarking[0] sits squarely in the trajectory-conditioned cluster, sharing design principles with Gaussian Shading[9] and Gaussian Shading Plus[15] while emphasizing semantic-level deflection to enhance both security and imperceptibility.

Claimed Contributions

PAI: Training-free inherent watermarking framework with dual-stage injection

The authors introduce PAI, a plug-and-play watermarking framework that embeds watermarks during both the initialization stage (via Box-Muller transformation) and the denoising stage (via key-conditioned deflection). This dual-stage design semantically couples user identity with content, enhancing robustness without requiring additional training or encoder-decoder networks.
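
For readers unfamiliar with the initialization stage, the following is a minimal sketch of how a key could be turned into Gaussian initial noise with the Box-Muller transform. It is an illustration only, not the authors' implementation: the function name `key_to_gaussian_noise`, the SHA-256 seeding step, and the flat list output are all assumptions made for this example, and the key-conditioned deflection stage is not modeled.

```python
import hashlib
import math
import random


def key_to_gaussian_noise(user_key: str, n: int) -> list[float]:
    """Map a key to n standard-normal samples via the Box-Muller transform.

    Hypothetical helper: how PAI derives its noise from a key is not
    described in this report.
    """
    # Assumption: hash the key to obtain a deterministic PRNG seed.
    seed = int.from_bytes(hashlib.sha256(user_key.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)

    samples: list[float] = []
    while len(samples) < n:
        u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is finite
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        # Each pair of uniforms yields two independent N(0, 1) samples.
        samples.append(radius * math.cos(angle))
        samples.append(radius * math.sin(angle))
    return samples[:n]


noise_a = key_to_gaussian_noise("alice-key", 10_000)
noise_b = key_to_gaussian_noise("bob-key", 10_000)
```

Under these assumptions the same key always reproduces the same noise, while the samples remain statistically indistinguishable from ordinary Gaussian initialization, which is the property an inherent watermark of this kind relies on.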

1 retrieved paper
Theoretical guarantee on key exclusivity for verification

The authors prove that only the valid user key can pass verification by showing that invalid keys produce consistently higher initialization bias than valid keys, even when the forged key approaches the valid key. This theoretical analysis ensures that watermark verification is cryptographically sound and resistant to key forgery.
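
The intuition behind this claim can be illustrated with a toy experiment. The sketch below is not the paper's proof or its bias definition: it assumes a hypothetical `initialization_bias` defined as the mean squared deviation between a recovered latent and the noise regenerated from a candidate key, a hash-seeded noise generator, and an arbitrary threshold of 0.5.

```python
import hashlib
import random


def key_noise(user_key: str, n: int) -> list[float]:
    """Deterministic N(0, 1) noise derived from a key (hypothetical scheme)."""
    seed = int.from_bytes(hashlib.sha256(user_key.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def initialization_bias(recovered: list[float], user_key: str) -> float:
    """Mean squared deviation between the recovered latent and the key's noise.

    Stand-in metric for illustration; the paper's definition may differ.
    """
    expected = key_noise(user_key, len(recovered))
    return sum((r - e) ** 2 for r, e in zip(recovered, expected)) / len(recovered)


def verify(recovered: list[float], user_key: str, threshold: float = 0.5) -> bool:
    """Accept the key only if its bias falls below the (assumed) threshold."""
    return initialization_bias(recovered, user_key) < threshold


valid_key = "alice-key-0001"
near_key = "alice-key-0002"  # forged key that differs in a single character

# Simulate an imperfect inversion: the valid key's noise plus small error.
inversion_error = random.Random(0)
recovered = [x + inversion_error.gauss(0.0, 0.2) for x in key_noise(valid_key, 4096)]

bias_valid = initialization_bias(recovered, valid_key)
bias_near = initialization_bias(recovered, near_key)
```

In this toy setting the valid key yields a bias close to the inversion-error variance, while the near-miss key yields a bias close to the variance of the difference of two independent Gaussians, so only the valid key passes `verify`. Here the separation comes from the hash, which is a design choice of the sketch and not a description of the authors' argument.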

8 retrieved papers
Unified forensic framework supporting verification, attack detection, and semantic tampering localization

The authors design a unified verification framework that uses initialization bias in a low-dimensional latent space to simultaneously support ownership verification, distinguish between removal and spoofing attacks, and localize semantic-level tampering. This overcomes the limitation of existing methods that rely on one-dimensional verification signals and cannot handle advanced AIGC-based editing.
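
To make the idea of a spatial (rather than one-dimensional) verification signal concrete, the sketch below computes a per-tile bias map over a 2D latent and flags tiles whose bias exceeds a threshold. Everything here is assumed for illustration: the helper names `block_bias_map` and `localize`, the tile size, the threshold, and the simulated tampering are hypothetical and do not come from the paper.

```python
import random


def block_bias_map(recovered, expected, block):
    """Mean squared deviation per block x block tile of two equal-sized 2D grids."""
    rows, cols = len(recovered), len(recovered[0])
    bias_map = []
    for r0 in range(0, rows, block):
        row = []
        for c0 in range(0, cols, block):
            cells = [
                (recovered[r][c] - expected[r][c]) ** 2
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            row.append(sum(cells) / len(cells))
        bias_map.append(row)
    return bias_map


def localize(bias_map, threshold=0.5):
    """Return (tile_row, tile_col) indices whose bias exceeds the threshold."""
    return [
        (i, j)
        for i, row in enumerate(bias_map)
        for j, bias in enumerate(row)
        if bias > threshold
    ]


rng = random.Random(7)
size, block = 32, 8

# Expected key-derived latent and a recovered latent with small inversion error.
expected = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
recovered = [[v + rng.gauss(0.0, 0.1) for v in row] for row in expected]

# Simulated edit: overwrite rows 8-15, columns 16-23 with unrelated noise.
for r in range(8, 16):
    for c in range(16, 24):
        recovered[r][c] = rng.gauss(0.0, 1.0)

flagged = localize(block_bias_map(recovered, expected, block))
```

A single scalar score would only say that the image deviates from the key; a map of this kind additionally indicates where, which is the capability the contribution describes at the semantic level.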

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

PAI: Training-free inherent watermarking framework with dual-stage injection

Contribution

Theoretical guarantee on key exclusivity for verification

Contribution

Unified forensic framework supporting verification, attack detection, and semantic tampering localization
