THE SELF-RE-WATERMARKING TRAP: FROM EXPLOIT TO RESILIENCE

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: watermarking, deep learning, AI security, re-watermarking, attack
Abstract:

Watermarking has been widely used for copyright protection of digital images. Deep learning-based watermarking systems have recently emerged as more robust and effective than traditional methods, offering improved fidelity and resilience against attacks. Among the various threats to such systems, self-re-watermarking attacks represent a critical and underexplored challenge: the same encoder is maliciously reused to embed a new message into an already watermarked image, which effectively prevents the original decoder from retrieving the original watermark without introducing perceptual artifacts. In this work, we make two key contributions. First, we introduce the self-re-watermarking threat model as a novel attack vector and demonstrate that existing state-of-the-art watermarking methods consistently fail under such attacks. Second, we develop a self-aware deep watermarking framework to defend against this threat. Our key insight is to limit the sensitivity of the watermarking models to their inputs, thereby resisting the re-embedding of new watermarks. To achieve this, the framework extends Lipschitz constraints to the watermarking process, regulating encoder–decoder sensitivity in a principled manner, and additionally incorporates re-watermarking adversarial training, which further constrains sensitivity to distortions arising from re-embedding. The proposed method provides theoretical bounds on message recoverability under malicious encoder-based re-watermarking and demonstrates strong empirical robustness across diverse re-watermarking scenarios, while maintaining high visual fidelity and competitive robustness against common image-processing distortions compared to state-of-the-art watermarking methods. 
This work establishes a robust defense against both standard distortions and self-re-watermarking attacks. The implementation will be made publicly available on GitHub.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a self-aware watermarking framework to defend against self-re-watermarking attacks, where adversaries reuse the same encoder to overwrite original watermarks. It resides in the 'Sensitivity-Constrained Watermarking Frameworks' leaf, which contains only two papers total. This leaf sits within the broader 'Deep Learning Watermarking Defense Mechanisms' branch, indicating a relatively sparse research direction focused on architectural and training-level defenses. The small sibling count suggests this specific approach to sensitivity regulation is not yet crowded, though the parent branch encompasses diverse defense strategies across parameter-level protection and integrated authentication frameworks.

The taxonomy reveals neighboring work in 'Parameter-Level Watermark Protection' (six papers across three sub-leaves) and 'Generative Model and Output Watermarking' (five papers across four sub-leaves), indicating the field has concentrated more on protecting model weights and generative outputs than on sensitivity-based defenses. The 'Parametric Vulnerability Reduction' and 'Integrated Watermarking and Authentication Frameworks' leaves each contain single papers, suggesting emerging but underdeveloped directions. The paper's focus on encoder-decoder sensitivity constraints diverges from frequency-domain methods and backdoor penetration approaches, carving a distinct niche within the defense landscape.

Among nineteen candidates examined, the self-re-watermarking threat model (Contribution 1) shows one refutable candidate from three examined, suggesting some prior recognition of iterative embedding risks. The self-aware framework with Lipschitz constraints (Contribution 2) found no refutations across six candidates, indicating potential novelty in this specific defense mechanism. The theoretical bit-error rate analysis (Contribution 3) examined ten candidates without refutation, though the limited search scope means unexplored literature may exist. The statistics suggest the framework and theoretical contributions face less direct prior work than the threat model itself.

Based on top-nineteen semantic matches and citation expansion, the work appears to occupy a sparsely populated intersection of sensitivity constraints and re-watermarking defenses. The analysis covers a focused subset of the watermarking literature, leaving open the possibility of relevant work in adjacent domains like adversarial robustness or iterative image processing that may not surface through watermarking-centric search strategies.

Taxonomy

Core-task Taxonomy Papers: 18
Claimed Contributions: 3
Contribution Candidate Papers Compared: 19
Refutable Papers: 1

Research Landscape Overview

Core task: defending deep learning-based image watermarking against self-re-watermarking attacks. The field of deep learning watermarking has evolved into several distinct branches addressing different modalities and threat models. Deep Learning Watermarking Defense Mechanisms focuses on protecting embedded watermarks from adversarial removal or overwriting, with works exploring sensitivity constraints, high-frequency domain defenses, and parametric robustness. Generative Model and Output Watermarking targets the protection of AI-generated content and model outputs, while Audio Watermarking with Deep Learning extends these techniques to the audio domain, as seen in works like Robust Audio Watermarking[1] and DeAR Audio Resilient[6]. Comprehensive Surveys and Comparative Studies provide broader perspectives on the landscape, synthesizing trends across modalities and attack scenarios. These branches collectively address the tension between watermark imperceptibility, robustness against attacks, and computational efficiency.

Recent work has intensified around adversarial scenarios where attackers exploit the watermarking mechanism itself. High-Frequency Attack Defense[3] and High-Frequency Overwriting Attack[16] illustrate the arms race in frequency-domain manipulations, while Reducing Parametric Vulnerability[4] and White-Box Watermarking Robustness[5] tackle white-box threats where attackers have full model access. Self-Re-Watermarking Trap[0] sits within the Sensitivity-Constrained Watermarking Frameworks branch, addressing a particularly insidious attack where adversaries re-watermark already protected images to confuse ownership verification. This work shares thematic ground with White-Box Watermarking Robustness[5] in confronting sophisticated adversaries, yet emphasizes sensitivity constraints to prevent cascading degradation from repeated watermarking. Compared to High-Frequency Attack Defense[3], which focuses on spectral manipulations, Self-Re-Watermarking Trap[0] targets the logical vulnerability of iterative embedding, highlighting an emerging concern about recursive attacks in watermarking ecosystems.

Claimed Contributions

Self-re-watermarking threat model

The authors formalize a new adversarial scenario in which an attacker reuses the same encoder to embed a new watermark into an already watermarked image, effectively overwriting the original message. They show empirically that current deep watermarking systems are vulnerable to this attack.

3 retrieved papers (verdict: Can Refute)
Self-aware deep watermarking framework with Lipschitz constraints

The authors propose a watermarking framework that extends Lipschitz constraints to the encoder–decoder architecture and incorporates re-watermarking adversarial training. This design regulates model sensitivity to resist re-embedding of new watermarks while maintaining fidelity and robustness.

6 retrieved papers
Theoretical analysis of bit-error rate under self-re-watermarking

The authors formally analyze the system's bit-error rate when subjected to self-re-watermarking attacks, deriving an upper bound that relates decoder Lipschitz constant, distortion magnitude, and clean margin to message recovery performance.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

Self-re-watermarking threat model

The authors formalize a new adversarial scenario in which an attacker reuses the same encoder to embed a new watermark into an already watermarked image, effectively overwriting the original message. They show empirically that current deep watermarking systems are vulnerable to this attack.
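The failure mode can be illustrated with a deliberately simplified linear stand-in for a learned encoder/decoder pair. The patterns `G`, the strength `alpha`, and the correlation decoder below are illustrative assumptions, not the paper's architecture; the point is only that re-using the same embedder on an already watermarked image superimposes a second message and, in the worst case, cancels the first one outright.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 256, 8                         # toy "image" size and message length

# Hypothetical shared embedding patterns (orthonormal columns), standing in
# for the single encoder that both the owner and the attacker can query.
G, _ = np.linalg.qr(rng.standard_normal((n, k)))
alpha = 0.5                           # embedding strength

def encode(x, m):
    """Toy linear encoder: add +alpha*G_i for bit 1, -alpha*G_i for bit 0."""
    return x + alpha * (G @ (2 * m - 1))

def decode(x):
    """Toy correlation decoder: sign of the projection onto each pattern."""
    return (G.T @ x > 0).astype(int)

x0 = 0.01 * rng.standard_normal(n)    # low-energy cover image
m1 = np.array([1, 1, 0, 0, 1, 0, 1, 0])

xw = encode(x0, m1)
assert np.array_equal(decode(xw), m1)      # a single watermark decodes fine

# Self-re-watermarking: the attacker re-embeds the complementary message
# with the *same* encoder; the two linear watermarks cancel exactly,
# so the original message is unrecoverable.
x_attacked = encode(xw, 1 - m1)
print(np.allclose(x_attacked, x0))    # prints True
```

In a learned, nonlinear system the cancellation is not exact, but the paper's empirical claim is analogous: re-embedding drives the decoder's output toward the new message rather than the original one.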

Contribution 2

Self-aware deep watermarking framework with Lipschitz constraints

The authors propose a watermarking framework that extends Lipschitz constraints to the encoder–decoder architecture and incorporates re-watermarking adversarial training. This design regulates model sensitivity to resist re-embedding of new watermarks while maintaining fidelity and robustness.
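One standard way to impose a Lipschitz constraint of the kind described is spectral normalization: dividing each linear layer's weights by an estimate of their largest singular value bounds that layer's Lipschitz constant by 1, and composing such layers with 1-Lipschitz activations (e.g. ReLU) bounds the whole network. The sketch below shows the generic technique via power iteration; it is not necessarily the paper's exact mechanism.

```python
import numpy as np

def spectral_normalize(W, n_iter=1000, seed=0):
    """Rescale W so its largest singular value (its Lipschitz constant
    as a linear map) is at most ~1, using power iteration to estimate it."""
    u = np.random.default_rng(seed).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v                 # estimated top singular value
    return W / max(sigma, 1.0)        # only shrink layers that exceed 1

# A layer whose spectral norm exceeds 1 gets contracted to (roughly) 1.
rng = np.random.default_rng(1)
W = 3.0 * rng.standard_normal((16, 16))
W_sn = spectral_normalize(W)
print(np.linalg.norm(W_sn, 2))        # ~1.0
```

Applied to every encoder and decoder layer during training, per-layer bounds of this form multiply into an end-to-end sensitivity bound, which is what makes the theoretical analysis of re-embedding distortions tractable.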

Contribution 3

Theoretical analysis of bit-error rate under self-re-watermarking

The authors formally analyze the system's bit-error rate when subjected to self-re-watermarking attacks, deriving an upper bound that relates decoder Lipschitz constant, distortion magnitude, and clean margin to message recovery performance.
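The report does not reproduce the bound itself, but a margin-based bound relating these three quantities typically takes the following illustrative form; the notation below is an expository assumption, not the paper's statement.

```latex
% Decoder D with per-bit scores D_i, Lipschitz constant L_D, watermarked
% image x_w, and re-watermarking perturbation \delta with \|\delta\|_2 \le \epsilon.
\[
  |D_i(x_w + \delta) - D_i(x_w)| \;\le\; L_D\,\epsilon ,
\]
% so a bit decoded as sign(D_i(x_w)) with clean margin m_i = |D_i(x_w)|
% can flip only if L_D \epsilon \ge m_i, giving
\[
  \mathrm{BER}(x_w + \delta) \;\le\; \frac{1}{k}\,
      \bigl|\{\, i : m_i \le L_D\,\epsilon \,\}\bigr| ,
  \qquad
  \mathrm{BER} = 0 \ \text{whenever}\ L_D\,\epsilon < \min_i m_i .
\]
```

A bound of this shape explains why the framework constrains both factors at once: Lipschitz regularization shrinks \(L_D\), while re-watermarking adversarial training enlarges the clean margins \(m_i\) against exactly the distortions an attacker's re-embedding produces.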