Every Language Model Has a Forgery-Resistant Signature

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: fingerprint, watermark, language model, signature, accountability, cryptography, forgery, security
Abstract:

The ubiquity of closed-weight language models with public-facing APIs has generated interest in forensic methods, both for extracting hidden model details (e.g., parameters) and for identifying models by their outputs. One successful approach to these goals has been to exploit the geometric constraints imposed by the language model architecture and parameters. In this work, we show that a lesser-known geometric constraint, namely that language model outputs lie on the surface of a high-dimensional ellipse, functions as a signature for the model, which can be used to identify which model an output came from. This ellipse signature has unique properties that distinguish it from existing model-output association methods such as language model watermarks. First, the signature is hard to forge: without direct access to model parameters, it is practically infeasible to produce logprobs on the ellipse. Second, the signature is naturally occurring, since all language models have these elliptical constraints. Third, the signature is self-contained, in that it is detectable without access to the model input or full weights. Finally, the signature is exceptionally redundant, as it is independently detectable in every single logprob output from the model. We evaluate a novel technique for extracting the ellipse on small models, and discuss the practical hurdles that make extraction infeasible for production-size models, which is what makes the signature hard to forge. Finally, we use ellipse signatures to propose a protocol for language model output verification, analogous to cryptographic symmetric-key message authentication systems.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes using elliptical geometric constraints in language model output distributions as a naturally occurring signature for model identification. It resides in the 'Geometric and Probabilistic Constraints' leaf, which contains only three papers total, indicating a relatively sparse research direction within the broader taxonomy. The sibling papers explore rhythmic statistical patterns and model inversion from outputs, suggesting this leaf focuses on mathematical structures inherent to generation processes rather than externally embedded signals or lexical features.

The taxonomy reveals that this work sits within 'Intrinsic Output Signature Detection,' which contrasts sharply with the more populated 'Embedded Signature Methods' branch containing watermarking frameworks and fingerprinting techniques. Neighboring leaves include 'Linguistic Feature Analysis' and 'Perplexity-Based Detection,' which analyze lexical patterns and perplexity metrics respectively. The scope notes clarify that this leaf excludes externally embedded watermarks and lexical analysis, positioning the ellipse signature approach as exploiting inherent mathematical constraints rather than engineered or linguistic features.

Among twenty candidates examined across three contributions, none were found to clearly refute the proposed work. For the core ellipse signature contribution, ten candidates were examined; for the forgery-resistance property, seven; and for the authentication protocol, three, with no refutations in any group. This limited search scope of twenty papers from semantic search and citation expansion suggests the specific combination of elliptical constraints, forgery resistance, and self-contained detection may not have direct precedents in the examined literature, though the analysis does not claim exhaustive coverage.

Based on the limited search scope, the work appears to occupy a distinct position combining geometric constraints with cryptographic-style robustness guarantees. The sparse population of its taxonomy leaf and absence of refuting candidates among twenty examined papers suggest novelty within the analyzed sample, though the small search scale and narrow leaf membership leave open questions about broader field coverage and potential overlap with work outside the top-twenty semantic matches.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 20
Refutable Papers: 0

Research Landscape Overview

Core task: identifying language models by their output signatures. The field has organized itself into several major branches that reflect different strategies for recognizing or controlling model-generated text. Intrinsic Output Signature Detection focuses on naturally occurring statistical and geometric patterns in model outputs, such as perplexity distributions or probabilistic constraints that emerge without explicit modification. Embedded Signature Methods, by contrast, actively inject watermarks or fingerprints into generated text to enable later attribution. Machine-Generated Text Detection develops classifiers and feature-based approaches to distinguish human from synthetic content, while Model Behavior Analysis examines broader patterns in how models respond to prompts and tasks. Additional branches address output quality, alignment with human values, and systematic evaluation frameworks, reflecting the community's concern with both technical identification and responsible deployment.

Recent work has explored increasingly subtle trade-offs between detectability and text quality. Embedded approaches like Watermark[19] and Watermarking Through Models[8] enable robust attribution but may alter output distributions, whereas intrinsic methods seek signatures that arise organically from the generation process. Forgery Resistant Signature[0] sits within the geometric and probabilistic constraints cluster, emphasizing provenance guarantees that resist adversarial manipulation, a direction closely related to LLMs Have Rhythm[40], which uncovers rhythmic statistical patterns, and Model Inversion[41], which infers model properties from outputs. Compared to these neighbors, Forgery Resistant Signature[0] appears to prioritize cryptographic-style robustness over purely observational detection, addressing scenarios where attackers might attempt to forge or strip signatures. This positions it at the intersection of intrinsic detection and security-oriented design, contributing to ongoing debates about whether identification should rely on emergent properties or engineered guarantees.

Claimed Contributions

Ellipse signature for language model identification

The authors demonstrate that the geometric constraint forcing language model logits onto a high-dimensional ellipse can serve as a signature to identify which model generated a given output. This ellipse signature arises naturally from the normalization and linear layers in standard language model architectures.

10 retrieved papers

Forgery-resistant property of ellipse signatures

The authors show that ellipse signatures are forgery-resistant because extracting the ellipse from API-protected models is computationally expensive (requiring O(d^3 log d) queries and O(d^6) time complexity for fitting), making it practically infeasible to generate conforming logprobs without direct parameter access.

7 retrieved papers

Message authentication protocol using ellipse signatures

The authors propose a verification protocol where the model ellipse functions as a secret key analogous to cryptographic message authentication codes. Parties with access to the secret ellipse parameters can generate and verify logprobs, enabling output authentication without revealing model parameters.

3 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: Ellipse signature for language model identification

The authors demonstrate that the geometric constraint forcing language model logits onto a high-dimensional ellipse can serve as a signature to identify which model generated a given output. This ellipse signature arises naturally from the normalization and linear layers in standard language model architectures.
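The geometric mechanism behind this claim can be illustrated with a small NumPy sketch. This is a toy analogue, not the paper's implementation: the dimensions are hypothetical, and a plain center-and-scale normalization stands in for the final LayerNorm before the unembedding matrix (real models add gains and biases, which reshape the sphere into a general ellipse). Because normalized hidden states lie on a sphere of radius sqrt(d), the logits they produce lie on that sphere's image under the unembedding, and decoding any logit vector back through the pseudo-inverse recovers the constant radius:

```python
import numpy as np

rng = np.random.default_rng(0)
d, v = 16, 100                  # hidden size and vocab size (toy, hypothetical)

W = rng.normal(size=(v, d))     # unembedding matrix: the model "secret"

def normalize(x):
    # Center and rescale to norm sqrt(d): a gain/bias-free stand-in for LayerNorm.
    x = x - x.mean()
    return x / np.linalg.norm(x) * np.sqrt(len(x))

# Sample many "final hidden states" and map them to logits.
logits = np.stack([W @ normalize(rng.normal(size=d)) for _ in range(50)])

# Decode each logit vector back to hidden space; every one lands on the
# radius-sqrt(d) sphere, i.e. the logits all lie on the ellipse W * (sphere).
h_rec = logits @ np.linalg.pinv(W).T
radii = np.linalg.norm(h_rec, axis=1)
print(np.allclose(radii, np.sqrt(d)))   # → True
```

The constant recovered radius is the detectable invariant: it holds for every output, which is why the report describes the signature as independently present in each logprob vector.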

Contribution 2: Forgery-resistant property of ellipse signatures

The authors show that ellipse signatures are forgery-resistant because extracting the ellipse from API-protected models is computationally expensive (requiring O(d^3 log d) queries and O(d^6) time complexity for fitting), making it practically infeasible to generate conforming logprobs without direct parameter access.
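The source of the O(d^6) fitting cost can be made concrete: a quadric surface in d dimensions has on the order of d^2 unknown coefficients, and least-squares over a system of that size costs roughly the cube of the unknown count. The following 2-D analogue (all values hypothetical, chosen only for illustration) recovers a hidden ellipse's six conic coefficients from sampled points via the null space of a monomial design matrix; in d dimensions the same design matrix has O(d^2) columns, which is where the complexity blow-up comes from:

```python
import numpy as np

rng = np.random.default_rng(1)

# Points on a hidden 2-D ellipse: p = ctr + A @ u with ||u|| = 1.
# A and ctr play the role of the secret model parameters.
A = np.array([[3.0, 1.0], [0.5, 2.0]])
ctr = np.array([1.0, -2.0])
angles = rng.uniform(0, 2 * np.pi, size=40)
pts = ctr + (A @ np.stack([np.cos(angles), np.sin(angles)])).T

# A conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 has 6 unknowns;
# a quadric in d dimensions has O(d^2), hence the O(d^6) least-squares cost.
x, y = pts[:, 0], pts[:, 1]
D = np.stack([x**2, x * y, y**2, x, y, np.ones_like(x)], axis=1)
coef = np.linalg.svd(D)[2][-1]      # null-space direction = conic coefficients

# The fitted conic vanishes on a fresh point from the same ellipse.
u = np.array([np.cos(0.123), np.sin(0.123)])
p = ctr + A @ u
val = coef @ np.array([p[0]**2, p[0] * p[1], p[1]**2, p[0], p[1], 1.0])
print(abs(val) < 1e-6)              # → True
```

At production vocabulary and hidden sizes, both collecting the O(d^3 log d) logprob queries and solving the resulting system become impractical, which is the basis of the forgery-resistance claim.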

Contribution 3: Message authentication protocol using ellipse signatures

The authors propose a verification protocol where the model ellipse functions as a secret key analogous to cryptographic message authentication codes. Parties with access to the secret ellipse parameters can generate and verify logprobs, enabling output authentication without revealing model parameters.
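The symmetric-key analogy can be sketched in code. This is a schematic illustration under toy assumptions (hypothetical dimensions, a gain/bias-free normalization standing in for the model's final LayerNorm), not the paper's protocol verbatim: signer and verifier share a secret unembedding matrix, the logprob vector itself serves as the tag, and verification simply checks membership on the secret ellipse. A party without the secret has no practical way to produce a vector that lands on it:

```python
import numpy as np

rng = np.random.default_rng(2)
d, v = 16, 100                        # toy hidden and vocab sizes (hypothetical)

W_secret = rng.normal(size=(v, d))    # shared secret, analogous to a MAC key

def sign(h):
    """Produce a logit vector from a hidden state; the vector is the tag."""
    h = h - h.mean()
    h = h / np.linalg.norm(h) * np.sqrt(len(h))
    return W_secret @ h

def verify(logits, tol=1e-6):
    """Accept iff the logits lie on the secret ellipse."""
    h = np.linalg.pinv(W_secret) @ logits
    on_subspace = np.allclose(W_secret @ h, logits, atol=tol)  # in image of W
    on_sphere = abs(np.linalg.norm(h) - np.sqrt(d)) < tol      # correct radius
    return on_subspace and on_sphere

genuine = sign(rng.normal(size=d))
forged = rng.normal(size=v)           # forger without W_secret misses the ellipse
print(verify(genuine), verify(forged))   # → True False
```

As in a MAC, anyone holding the secret can both generate and verify tags, so the scheme authenticates outputs without revealing the model parameters to third parties.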