Every Language Model Has a Forgery-Resistant Signature

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: fingerprint, watermark, language model, signature, accountability, cryptography, forgery, security
Abstract:

The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model, which can be used to identify which model an output came from. This ellipse signature has unique properties that distinguish it from existing model-output association methods such as language model watermarks. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce logprobs on the ellipse. Second, the signature is naturally occurring, since all language models have these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model input or full weights. Finally, the signature is exceptionally redundant, as it is independently detectable in every single logprob output from the model. We evaluate a novel technique for extracting the ellipse on small models, and discuss the practical hurdles that make extraction infeasible for production-size models, which is what makes the signature hard to forge. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes using elliptical geometric constraints in language model output distributions as a naturally occurring signature for model identification. It resides in the 'Geometric and Probabilistic Constraints' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. The sibling papers explore rhythmic statistical patterns and model inversion from outputs, suggesting this leaf focuses on mathematical structures inherent to generation processes rather than externally embedded signals or lexical features.

The taxonomy reveals that this work sits within 'Intrinsic Output Signature Detection,' which contrasts sharply with the more populated 'Embedded Signature Methods' branch containing watermarking frameworks and fingerprinting techniques. Neighboring leaves include 'Linguistic Feature Analysis' and 'Perplexity-Based Detection,' which analyze lexical patterns and perplexity metrics respectively. The scope notes clarify that this leaf excludes externally embedded watermarks and lexical analysis, positioning the ellipse signature approach as exploiting inherent mathematical constraints rather than engineered or linguistic features.

Among twenty candidates examined across three contributions, none were found to clearly refute the proposed work. For the core ellipse signature contribution, ten candidates were examined; for the forgery-resistance property, seven; and for the authentication protocol, three, with no refutations in any group. This limited search scope of twenty papers from semantic search and citation expansion suggests the specific combination of elliptical constraints, forgery resistance, and self-contained detection may not have direct precedents in the examined literature, though the analysis does not claim exhaustive coverage.

Based on the limited search scope, the work appears to occupy a distinct position combining geometric constraints with cryptographic-style robustness guarantees. The sparse population of its taxonomy leaf and absence of refuting candidates among twenty examined papers suggest novelty within the analyzed sample, though the small search scale and narrow leaf membership leave open questions about broader field coverage and potential overlap with work outside the top-twenty semantic matches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 0

Research Landscape Overview

Core task: identifying language models by their output signatures. The field has organized itself into several major branches that reflect different strategies for recognizing or controlling model-generated text. Intrinsic Output Signature Detection focuses on naturally occurring statistical and geometric patterns in model outputs, such as perplexity distributions or probabilistic constraints that emerge without explicit modification. Embedded Signature Methods, by contrast, actively inject watermarks or fingerprints into generated text to enable later attribution. Machine-Generated Text Detection develops classifiers and feature-based approaches to distinguish human from synthetic content, while Model Behavior Analysis examines broader patterns in how models respond to prompts and tasks. Additional branches address output quality, alignment with human values, and systematic evaluation frameworks, reflecting the community's concern with both technical identification and responsible deployment.

Recent work has explored increasingly subtle trade-offs between detectability and text quality. Embedded approaches like Watermark[19] and Watermarking Through Models[8] enable robust attribution but may alter output distributions, whereas intrinsic methods seek signatures that arise organically from the generation process. Forgery Resistant Signature[0] sits within the geometric and probabilistic constraints cluster, emphasizing provenance guarantees that resist adversarial manipulation, a direction closely related to LLMs Have Rhythm[40], which uncovers rhythmic statistical patterns, and Model Inversion[41], which infers model properties from outputs. Compared to these neighbors, Forgery Resistant Signature[0] appears to prioritize cryptographic-style robustness over purely observational detection, addressing scenarios where attackers might attempt to forge or strip signatures. This positions it at the intersection of intrinsic detection and security-oriented design, contributing to ongoing debates about whether identification should rely on emergent properties or engineered guarantees.

Claimed Contributions

Ellipse signature for language model identification

The authors demonstrate that the geometric constraint forcing language model logits onto a high-dimensional ellipse can serve as a signature to identify which model generated a given output. This ellipse signature arises naturally from the normalization and linear layers in standard language model architectures.

10 retrieved papers

Forgery-resistant property of ellipse signatures

The authors show that ellipse signatures are forgery-resistant because extracting the ellipse from API-protected models is computationally expensive (requiring O(d^3 log d) queries and O(d^6) time complexity for fitting), making it practically infeasible to generate conforming logprobs without direct parameter access.

7 retrieved papers

Message authentication protocol using ellipse signatures

The authors propose a verification protocol where the model ellipse functions as a secret key analogous to cryptographic message authentication codes. Parties with access to the secret ellipse parameters can generate and verify logprobs, enabling output authentication without revealing model parameters.

3 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Ellipse signature for language model identification

The authors demonstrate that the geometric constraint forcing language model logits onto a high-dimensional ellipse can serve as a signature to identify which model generated a given output. This ellipse signature arises naturally from the normalization and linear layers in standard language model architectures.
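The geometric mechanism behind this claim can be illustrated with a small NumPy sketch. This is a toy analogue, not the paper's implementation: the dimensions are hypothetical, and a plain center-and-scale normalization stands in for the final LayerNorm before the unembedding matrix (real models add gains and biases, which reshape the sphere into a general ellipse). Because normalized hidden states lie on a sphere of radius sqrt(d), the logits they produce lie on that sphere's image under the unembedding, and decoding any logit vector back through the pseudo-inverse recovers the constant radius:

```python
import numpy as np

rng = np.random.default_rng(0)
d, v = 16, 100                  # hidden size and vocab size (toy, hypothetical)

W = rng.normal(size=(v, d))     # unembedding matrix: the model "secret"

def normalize(x):
    # Center and rescale to norm sqrt(d): a gain/bias-free stand-in for LayerNorm.
    x = x - x.mean()
    return x / np.linalg.norm(x) * np.sqrt(len(x))

# Sample many "final hidden states" and map them to logits.
logits = np.stack([W @ normalize(rng.normal(size=d)) for _ in range(50)])

# Decode each logit vector back to hidden space; every one lands on the
# radius-sqrt(d) sphere, i.e. the logits all lie on the ellipse W * (sphere).
h_rec = logits @ np.linalg.pinv(W).T
radii = np.linalg.norm(h_rec, axis=1)
print(np.allclose(radii, np.sqrt(d)))   # → True
```

The constant recovered radius is the detectable invariant: it holds for every output, which is why the report describes the signature as independently present in each logprob vector.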

Contribution 2: Forgery-resistant property of ellipse signatures

The authors show that ellipse signatures are forgery-resistant because extracting the ellipse from API-protected models is computationally expensive (requiring O(d^3 log d) queries and O(d^6) time complexity for fitting), making it practically infeasible to generate conforming logprobs without direct parameter access.
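The source of the O(d^6) fitting cost can be made concrete: a quadric surface in d dimensions has on the order of d^2 unknown coefficients, and least-squares over a system of that size costs roughly the cube of the unknown count. The following 2-D analogue (all values hypothetical, chosen only for illustration) recovers a hidden ellipse's six conic coefficients from sampled points via the null space of a monomial design matrix; in d dimensions the same design matrix has O(d^2) columns, which is where the complexity blow-up comes from:

```python
import numpy as np

rng = np.random.default_rng(1)

# Points on a hidden 2-D ellipse: p = ctr + A @ u with ||u|| = 1.
# A and ctr play the role of the secret model parameters.
A = np.array([[3.0, 1.0], [0.5, 2.0]])
ctr = np.array([1.0, -2.0])
angles = rng.uniform(0, 2 * np.pi, size=40)
pts = ctr + (A @ np.stack([np.cos(angles), np.sin(angles)])).T

# A conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 has 6 unknowns;
# a quadric in d dimensions has O(d^2), hence the O(d^6) least-squares cost.
x, y = pts[:, 0], pts[:, 1]
D = np.stack([x**2, x * y, y**2, x, y, np.ones_like(x)], axis=1)
coef = np.linalg.svd(D)[2][-1]      # null-space direction = conic coefficients

# The fitted conic vanishes on a fresh point from the same ellipse.
u = np.array([np.cos(0.123), np.sin(0.123)])
p = ctr + A @ u
val = coef @ np.array([p[0]**2, p[0] * p[1], p[1]**2, p[0], p[1], 1.0])
print(abs(val) < 1e-6)              # → True
```

At production vocabulary and hidden sizes, both collecting the O(d^3 log d) logprob queries and solving the resulting system become impractical, which is the basis of the forgery-resistance claim.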

Contribution 3: Message authentication protocol using ellipse signatures

The authors propose a verification protocol where the model ellipse functions as a secret key analogous to cryptographic message authentication codes. Parties with access to the secret ellipse parameters can generate and verify logprobs, enabling output authentication without revealing model parameters.
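The symmetric-key analogy can be sketched in code. This is a schematic illustration under toy assumptions (hypothetical dimensions, a gain/bias-free normalization standing in for the model's final LayerNorm), not the paper's protocol verbatim: signer and verifier share a secret unembedding matrix, the logprob vector itself serves as the tag, and verification simply checks membership on the secret ellipse. A party without the secret has no practical way to produce a vector that lands on it:

```python
import numpy as np

rng = np.random.default_rng(2)
d, v = 16, 100                        # toy hidden and vocab sizes (hypothetical)

W_secret = rng.normal(size=(v, d))    # shared secret, analogous to a MAC key

def sign(h):
    """Produce a logit vector from a hidden state; the vector is the tag."""
    h = h - h.mean()
    h = h / np.linalg.norm(h) * np.sqrt(len(h))
    return W_secret @ h

def verify(logits, tol=1e-6):
    """Accept iff the logits lie on the secret ellipse."""
    h = np.linalg.pinv(W_secret) @ logits
    on_subspace = np.allclose(W_secret @ h, logits, atol=tol)  # in image of W
    on_sphere = abs(np.linalg.norm(h) - np.sqrt(d)) < tol      # correct radius
    return on_subspace and on_sphere

genuine = sign(rng.normal(size=d))
forged = rng.normal(size=v)           # forger without W_secret misses the ellipse
print(verify(genuine), verify(forged))   # → True False
```

As in a MAC, anyone holding the secret can both generate and verify tags, so the scheme authenticates outputs without revealing the model parameters to third parties.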