ON GOOGLE’S LLM WATERMARKING SYSTEM: THEORETICAL ANALYSIS AND EMPIRICAL VALIDATION

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: Watermarking, Large Language Model (LLM), Google, SynthID-Text
Abstract:

Google’s SynthID-Text, the first production-ready generative watermarking system for large language models, introduces a novel Tournament-based sampling method that achieves state-of-the-art detectability for identifying AI-generated text. The system’s innovation lies in three key components: 1) a new Tournament sampling algorithm for watermark embedding, 2) a detection strategy based on an introduced score function (e.g., the Bayesian or mean score), and 3) a unified design that supports both distortionary and non-distortionary watermarking.

This paper presents the first theoretical analysis of SynthID-Text, with a focus on its detection performance and watermark robustness, complemented by empirical validation. For example, we prove that the mean score is inherently vulnerable to an increased number of tournament layers, and we design a layer inflation attack that breaks SynthID-Text. We also prove that the Bayesian score offers improved watermark robustness with respect to the number of layers, and we further establish that the optimal Bernoulli distribution for watermark detection is achieved when its parameter is set to 0.5. Together, these theoretical and empirical insights not only deepen our understanding of SynthID-Text but also open new avenues for analyzing effective watermark removal strategies and designing robust watermarking techniques. Source code is available at https://anonymous.4open.science/r/Break-Synth-ID-text-EE5D/

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes a paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

This paper provides the first theoretical analysis of Google's SynthID-Text, focusing on detection performance and robustness of its tournament-based watermarking approach. It resides in the 'Attack Strategies and Vulnerabilities' leaf within the 'Robustness Analysis and Attack Resistance' branch, alongside two sibling papers examining adversarial attacks and watermark removal strategies. This leaf represents a relatively sparse research direction with only three papers total, suggesting that theoretical vulnerability analysis of production watermarking systems remains an emerging area within the broader field of fifty surveyed works.

The taxonomy reveals that this work sits at the intersection of multiple research threads. Neighboring leaves include 'Robustness Evaluation and Benchmarking' (five papers on empirical robustness assessment) and 'Certified and Provable Robustness' (three papers on formal guarantees). The paper's theoretical approach to analyzing SynthID-Text's detection mechanisms connects to the 'Detection Theory and Statistical Frameworks' branch, particularly 'Statistical Detection Frameworks and Hypothesis Testing', while its focus on vulnerabilities distinguishes it from defensive works in adjacent categories. The taxonomy's scope and exclude notes clarify that this work belongs in attack analysis rather than detection optimization or embedding design.

Among twenty-one candidates examined across three contributions, none were found to clearly refute the paper's claims. The first contribution (theoretical analysis of SynthID-Text) examined ten candidates with zero refutable overlaps, suggesting novelty in applying formal analysis to this specific production system. The layer inflation attack contribution examined one candidate without refutation, indicating limited prior work on exploiting tournament-layer vulnerabilities. The Bernoulli distribution optimality proof examined ten candidates, again with no clear refutations, though this may reflect the limited search scope rather than absolute novelty. The statistics indicate that within the examined candidate pool, the theoretical characterization of SynthID-Text appears distinctive.

Based on the limited literature search of twenty-one semantically-related candidates, the work appears to occupy a relatively unexplored niche: formal theoretical analysis of a deployed watermarking system's vulnerabilities. The sparse population of its taxonomy leaf and absence of refuting candidates among those examined suggest potential novelty, though the analysis does not cover the entire field exhaustively. The contribution's distinctiveness may stem from targeting a specific production system rather than generic watermarking frameworks.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 21
Refutable papers: 0

Research Landscape Overview

Core task: Theoretical analysis of LLM watermarking detection performance and robustness. The field of LLM watermarking has evolved into a structured landscape with several major branches. Watermarking Scheme Design and Theoretical Foundations encompasses foundational methods such as Watermark for LLMs[1] and Distortion-free Watermarks[16], which establish core embedding techniques and quality-preservation principles. Detection Theory and Statistical Frameworks includes works like Statistical Framework[25] and Universally Optimal[41] that formalize detection guarantees and optimality conditions. Robustness Analysis and Attack Resistance examines adversarial challenges, with studies such as Attacking Watermarks[32] and Watermark Under Fire[35] exploring vulnerabilities and defense mechanisms. Domain-Specific and Application-Oriented Watermarking addresses specialized contexts like code generation (Secure Code Watermarking[28]) and retrieval-augmented generation (RAG Watermark[14]), while Practical Deployment and System Considerations focuses on real-world implementation issues covered in surveys like SoK Deployment Ready[36] and LLM Watermarking Survey[12].

A particularly active tension exists between robustness guarantees and practical attack resistance. Works like Provable Robust Watermarking[9] and Certified Robust Watermark[10] pursue formal robustness certificates, while empirical studies such as Waterpark Robustness[31] and Adaptive Robust Watermark[40] investigate performance under diverse perturbations.

Google LLM Watermarking[0] sits within the Attack Strategies and Vulnerabilities cluster, focusing on theoretical characterization of detection limits when adversaries attempt to evade or spoof watermarks. This contrasts with nearby defensive works like Defending Spoofing Attacks[48], which develops countermeasures against impersonation, and Attacking Watermarks[32], which systematically explores removal strategies.
The original paper's emphasis on theoretical detection performance under adversarial conditions bridges the gap between formal robustness analysis and practical vulnerability assessment, addressing fundamental questions about what guarantees remain achievable when attackers actively manipulate watermarked text.

Claimed Contributions

First theoretical analysis of SynthID-Text detection performance and robustness

The authors provide the first formal theoretical analysis of Google's SynthID-Text watermarking system, examining how detection performance (TPR at fixed FPR) varies with the number of tournament layers and choice of score function, using tools such as the Central Limit Theorem to derive closed-form expressions.

10 retrieved papers
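As a rough illustration of the kind of closed-form analysis involved, the sketch below computes the TPR at a fixed FPR for a one-sided mean-score test under a CLT normal approximation. The function name, the token/layer counts, and the watermarked mean of 0.52 are our own illustrative assumptions, not values from the paper.

```python
from statistics import NormalDist

def tpr_at_fpr(n_scores, mu0, sigma0, mu1, sigma1, fpr):
    """TPR of a one-sided mean-score test at a fixed FPR, using a
    normal (CLT) approximation for the average of n_scores g-values."""
    std_norm = NormalDist()
    # Detection threshold: reject H0 (unwatermarked) when the mean score
    # exceeds tau, calibrated so that the false-positive rate equals fpr.
    tau = mu0 + std_norm.inv_cdf(1 - fpr) * sigma0 / n_scores ** 0.5
    # Power of the test under the watermarked score distribution.
    z = (tau - mu1) / (sigma1 / n_scores ** 0.5)
    return 1 - std_norm.cdf(z)

# Unwatermarked g-values ~ Bernoulli(0.5): mean 0.5, std 0.5.
# The watermarked mean of 0.52 is hypothetical; the true shift depends
# on the model's token distribution and the number of tournament layers.
print(round(tpr_at_fpr(400 * 30, 0.5, 0.5, 0.52, 0.5, 1e-2), 3))
```

The same normal approximation underlies why more scored g-values (longer texts or more layers) sharpen detection, all else being equal.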
Layer inflation attack exploiting mean score vulnerability

The authors prove that the mean score exhibits unimodal behavior with respect to tournament layers and design a black-box layer inflation attack that artificially increases the number of layers to reduce detection effectiveness, demonstrating a fundamental vulnerability in SynthID-Text when using the mean score.

1 retrieved paper
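To make concrete the quantity this attack targets, here is a toy Monte Carlo sketch of tournament sampling and of the detector's mean score as a function of the number of layers m. The g-function construction, the token distribution, and the key are hypothetical stand-ins; SynthID-Text's actual keyed g-function and candidate sampling are more involved.

```python
import random

def g_value(token, layer, key, p=0.5):
    # Hypothetical pseudorandom g-function: a keyed hash of (token, layer)
    # mapped to {0, 1} with P(g = 1) = p. Stand-in for SynthID's keyed g.
    return int(random.Random(hash((token, layer, key))).random() < p)

def tournament_sample(probs, m, key, rng):
    """One decoding step of tournament sampling: draw 2**m candidates from
    the LM distribution `probs`, then run m knockout layers in which the
    layer-l winner is the candidate with the larger g-value g_l."""
    tokens = rng.choices(range(len(probs)), weights=probs, k=2 ** m)
    for layer in range(m):
        nxt = []
        for a, b in zip(tokens[::2], tokens[1::2]):
            ga, gb = g_value(a, layer, key), g_value(b, layer, key)
            if ga != gb:
                nxt.append(a if ga > gb else b)
            else:
                nxt.append(rng.choice((a, b)))  # tie: pick either
        tokens = nxt
    return tokens[0]

def mean_score(probs, m, key, n_tokens, rng):
    # Detector's mean score: average g-value over tokens and layers.
    total = 0.0
    for _ in range(n_tokens):
        tok = tournament_sample(probs, m, key, rng)
        total += sum(g_value(tok, l, key) for l in range(m)) / m
    return total / n_tokens

# A mildly peaked toy token distribution (weights need not sum to 1).
probs = [0.9 ** i for i in range(30)]
rng = random.Random(0)
for m in (1, 2, 4, 6):
    print(m, round(mean_score(probs, m, key=42, n_tokens=500, rng=rng), 3))
```

Because repeated candidates share g-values, deeper layers add progressively less watermark signal on peaked distributions, which is the behavior the mean score averages over when the layer count is inflated.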
Proof that Bernoulli(0.5) is optimal for watermark detection

The authors theoretically prove that among all Bernoulli g-value distributions, Bernoulli(0.5) achieves the highest TPR at a given FPR for the mean score function, validating the default choice used in SynthID-Text and providing theoretical justification for this design decision.

10 retrieved papers
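The optimality claim can be phrased in standard hypothesis-testing terms. The notation below (n tokens, m layers, statistic S) is ours, stated only to fix ideas; it is a hedged sketch of the test, not the paper's derivation.

```latex
% Mean-score detection as a one-sided test on i.i.d. Bernoulli(p) g-values.
\[
S \;=\; \frac{1}{nm}\sum_{t=1}^{n}\sum_{\ell=1}^{m} g_{t,\ell},
\qquad
S \;\overset{H_0}{\approx}\; \mathcal{N}\!\Big(p,\ \frac{p(1-p)}{nm}\Big)
\quad \text{by the CLT.}
\]
% A detector calibrated to false-positive rate alpha rejects H_0 when
\[
S \;>\; p + z_{1-\alpha}\sqrt{\frac{p(1-p)}{nm}},
\]
% and the claimed result is that, among Bernoulli g-value distributions,
% p = 1/2 maximizes the true-positive rate of this test at any fixed alpha.
```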

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: First theoretical analysis of SynthID-Text detection performance and robustness

The authors provide the first formal theoretical analysis of Google's SynthID-Text watermarking system, examining how detection performance (TPR at fixed FPR) varies with the number of tournament layers and choice of score function, using tools such as the Central Limit Theorem to derive closed-form expressions.

Contribution 2: Layer inflation attack exploiting mean score vulnerability

The authors prove that the mean score exhibits unimodal behavior with respect to tournament layers and design a black-box layer inflation attack that artificially increases the number of layers to reduce detection effectiveness, demonstrating a fundamental vulnerability in SynthID-Text when using the mean score.

Contribution 3: Proof that Bernoulli(0.5) is optimal for watermark detection

The authors theoretically prove that among all Bernoulli g-value distributions, Bernoulli(0.5) achieves the highest TPR at a given FPR for the mean score function, validating the default choice used in SynthID-Text and providing theoretical justification for this design decision.