THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: watermarking, deep learning, AI security, re-watermarking, attack
Abstract:

Watermarking has been widely used for copyright protection of digital images. Deep learning-based watermarking systems have recently emerged as more robust and effective than traditional methods, offering improved fidelity and resilience against attacks. Among the various threats to such systems, self-re-watermarking attacks represent a critical and underexplored challenge: the same encoder is maliciously reused to embed a new message into an already watermarked image, which effectively prevents the original decoder from retrieving the original watermark without introducing perceptual artifacts. In this work, we make two key contributions. First, we introduce the self-re-watermarking threat model as a novel attack vector and demonstrate that existing state-of-the-art watermarking methods consistently fail under such attacks. Second, we develop a self-aware deep watermarking framework to defend against this threat. Our key insight is to limit the sensitivity of the watermarking models to their inputs, thereby resisting the re-embedding of new watermarks. To achieve this, the framework extends Lipschitz constraints to the watermarking process, regulating encoder–decoder sensitivity in a principled manner, and additionally incorporates re-watermarking adversarial training, which further constrains sensitivity to distortions arising from re-embedding. The proposed method provides theoretical bounds on message recoverability under malicious encoder-based re-watermarking and demonstrates strong empirical robustness across diverse re-watermarking scenarios, while maintaining high visual fidelity and competitive robustness against common image-processing distortions compared to state-of-the-art watermarking methods. 
This work establishes a robust defense against both standard distortions and self-re-watermarking attacks. The implementation will be made publicly available on GitHub.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a self-aware watermarking framework to defend against self-re-watermarking attacks, where adversaries reuse the same encoder to overwrite original watermarks. It resides in the 'Sensitivity-Constrained Watermarking Frameworks' leaf, which contains only two papers total. This leaf sits within the broader 'Deep Learning Watermarking Defense Mechanisms' branch, indicating a relatively sparse research direction focused on architectural and training-level defenses. The small sibling count suggests this specific approach to sensitivity regulation is not yet crowded, though the parent branch encompasses diverse defense strategies across parameter-level protection and integrated authentication frameworks.

The taxonomy reveals neighboring work in 'Parameter-Level Watermark Protection' (six papers across three sub-leaves) and 'Generative Model and Output Watermarking' (five papers across four sub-leaves), indicating the field has concentrated more on protecting model weights and generative outputs than on sensitivity-based defenses. The 'Parametric Vulnerability Reduction' and 'Integrated Watermarking and Authentication Frameworks' leaves each contain single papers, suggesting emerging but underdeveloped directions. The paper's focus on encoder-decoder sensitivity constraints diverges from frequency-domain methods and backdoor penetration approaches, carving a distinct niche within the defense landscape.

Among nineteen candidates examined, the self-re-watermarking threat model (Contribution 1) shows one refutable candidate from three examined, suggesting some prior recognition of iterative embedding risks. The self-aware framework with Lipschitz constraints (Contribution 2) found no refutations across six candidates, indicating potential novelty in this specific defense mechanism. The theoretical bit-error rate analysis (Contribution 3) examined ten candidates without refutation, though the limited search scope means unexplored literature may exist. The statistics suggest the framework and theoretical contributions face less direct prior work than the threat model itself.

Based on top-nineteen semantic matches and citation expansion, the work appears to occupy a sparsely populated intersection of sensitivity constraints and re-watermarking defenses. The analysis covers a focused subset of the watermarking literature, leaving open the possibility of relevant work in adjacent domains like adversarial robustness or iterative image processing that may not surface through watermarking-centric search strategies.

Taxonomy

Core-task Taxonomy Papers: 18
Claimed Contributions: 3
Contribution Candidate Papers Compared: 19
Refutable Papers: 1

Research Landscape Overview

Core task: defending deep learning-based image watermarking against self-re-watermarking attacks. The field of deep learning watermarking has evolved into several distinct branches addressing different modalities and threat models. Deep Learning Watermarking Defense Mechanisms focuses on protecting embedded watermarks from adversarial removal or overwriting, with works exploring sensitivity constraints, high-frequency domain defenses, and parametric robustness. Generative Model and Output Watermarking targets the protection of AI-generated content and model outputs, while Audio Watermarking with Deep Learning extends these techniques to the audio domain, as seen in works like Robust Audio Watermarking[1] and DeAR Audio Resilient[6]. Comprehensive Surveys and Comparative Studies provide broader perspectives on the landscape, synthesizing trends across modalities and attack scenarios. These branches collectively address the tension between watermark imperceptibility, robustness against attacks, and computational efficiency.

Recent work has intensified around adversarial scenarios where attackers exploit the watermarking mechanism itself. High-Frequency Attack Defense[3] and High-Frequency Overwriting Attack[16] illustrate the arms race in frequency-domain manipulations, while Reducing Parametric Vulnerability[4] and White-Box Watermarking Robustness[5] tackle white-box threats where attackers have full model access. Self-Re-Watermarking Trap[0] sits within the Sensitivity-Constrained Watermarking Frameworks branch, addressing a particularly insidious attack where adversaries re-watermark already protected images to confuse ownership verification. This work shares thematic ground with White-Box Watermarking Robustness[5] in confronting sophisticated adversaries, yet emphasizes sensitivity constraints to prevent cascading degradation from repeated watermarking. Compared to High-Frequency Attack Defense[3], which focuses on spectral manipulations, Self-Re-Watermarking Trap[0] targets the logical vulnerability of iterative embedding, highlighting an emerging concern about recursive attacks in watermarking ecosystems.

Claimed Contributions

Self-re-watermarking threat model

The authors formalize a new adversarial scenario in which an attacker reuses the same encoder to embed a new watermark into an already watermarked image, effectively overwriting the original message. They show empirically that current deep watermarking systems are vulnerable to this attack.

3 retrieved papers (verdict: Can Refute)
Self-aware deep watermarking framework with Lipschitz constraints

The authors propose a watermarking framework that extends Lipschitz constraints to the encoder–decoder architecture and incorporates re-watermarking adversarial training. This design regulates model sensitivity to resist re-embedding of new watermarks while maintaining fidelity and robustness.

6 retrieved papers
Theoretical analysis of bit-error rate under self-re-watermarking

The authors formally analyze the system's bit-error rate when subjected to self-re-watermarking attacks, deriving an upper bound that relates decoder Lipschitz constant, distortion magnitude, and clean margin to message recovery performance.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

Self-re-watermarking threat model

The authors formalize a new adversarial scenario in which an attacker reuses the same encoder to embed a new watermark into an already watermarked image, effectively overwriting the original message. They show empirically that current deep watermarking systems are vulnerable to this attack.
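The failure mode can be illustrated with a deliberately simplified linear stand-in for a learned encoder/decoder pair. The patterns `G`, the strength `alpha`, and the correlation decoder below are illustrative assumptions, not the paper's architecture; the point is only that re-using the same embedder on an already watermarked image superimposes a second message and, in the worst case, cancels the first one outright.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 256, 8                         # toy "image" size and message length

# Hypothetical shared embedding patterns (orthonormal columns), standing in
# for the single encoder that both the owner and the attacker can query.
G, _ = np.linalg.qr(rng.standard_normal((n, k)))
alpha = 0.5                           # embedding strength

def encode(x, m):
    """Toy linear encoder: add +alpha*G_i for bit 1, -alpha*G_i for bit 0."""
    return x + alpha * (G @ (2 * m - 1))

def decode(x):
    """Toy correlation decoder: sign of the projection onto each pattern."""
    return (G.T @ x > 0).astype(int)

x0 = 0.01 * rng.standard_normal(n)    # low-energy cover image
m1 = np.array([1, 1, 0, 0, 1, 0, 1, 0])

xw = encode(x0, m1)
assert np.array_equal(decode(xw), m1)      # a single watermark decodes fine

# Self-re-watermarking: the attacker re-embeds the complementary message
# with the *same* encoder; the two linear watermarks cancel exactly,
# so the original message is unrecoverable.
x_attacked = encode(xw, 1 - m1)
print(np.allclose(x_attacked, x0))    # prints True
```

In a learned, nonlinear system the cancellation is not exact, but the paper's empirical claim is analogous: re-embedding drives the decoder's output toward the new message rather than the original one.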

Contribution 2

Self-aware deep watermarking framework with Lipschitz constraints

The authors propose a watermarking framework that extends Lipschitz constraints to the encoder–decoder architecture and incorporates re-watermarking adversarial training. This design regulates model sensitivity to resist re-embedding of new watermarks while maintaining fidelity and robustness.
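One standard way to impose a Lipschitz constraint of the kind described is spectral normalization: dividing each linear layer's weights by an estimate of their largest singular value bounds that layer's Lipschitz constant by 1, and composing such layers with 1-Lipschitz activations (e.g. ReLU) bounds the whole network. The sketch below shows the generic technique via power iteration; it is not necessarily the paper's exact mechanism.

```python
import numpy as np

def spectral_normalize(W, n_iter=1000, seed=0):
    """Rescale W so its largest singular value (its Lipschitz constant
    as a linear map) is at most ~1, using power iteration to estimate it."""
    u = np.random.default_rng(seed).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                 # estimated top singular value
    return W / max(sigma, 1.0)        # only shrink layers that exceed 1

# A layer whose spectral norm exceeds 1 gets contracted to (roughly) 1.
rng = np.random.default_rng(1)
W = 3.0 * rng.standard_normal((16, 16))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))        # ~1.0
```

Applied to every encoder and decoder layer during training, per-layer bounds of this form multiply into an end-to-end sensitivity bound, which is what makes the theoretical analysis of re-embedding distortions tractable.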

Contribution 3

Theoretical analysis of bit-error rate under self-re-watermarking

The authors formally analyze the system's bit-error rate when subjected to self-re-watermarking attacks, deriving an upper bound that relates decoder Lipschitz constant, distortion magnitude, and clean margin to message recovery performance.
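The report does not reproduce the bound itself, but a margin-based bound relating these three quantities typically takes the following illustrative form; the notation below is an expository assumption, not the paper's statement.

```latex
% Decoder D with per-bit scores D_i, Lipschitz constant L_D, watermarked
% image x_w, and re-watermarking perturbation \delta with \|\delta\|_2 \le \epsilon.
\[
  |D_i(x_w + \delta) - D_i(x_w)| \;\le\; L_D\,\epsilon ,
\]
% so a bit decoded as sign(D_i(x_w)) with clean margin m_i = |D_i(x_w)|
% can flip only if L_D \epsilon \ge m_i, giving
\[
  \mathrm{BER}(x_w + \delta) \;\le\; \frac{1}{k}\,
      \bigl|\{\, i : m_i \le L_D\,\epsilon \,\}\bigr| ,
  \qquad
  \mathrm{BER} = 0 \ \text{whenever}\ L_D\,\epsilon < \min_i m_i .
\]
```

A bound of this shape explains why the framework constrains both factors at once: Lipschitz regularization shrinks \(L_D\), while re-watermarking adversarial training enlarges the clean margins \(m_i\) against exactly the distortions an attacker's re-embedding produces.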