Towards Prompt-Robust Machine-Generated Text Detection

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LLM detection; rewrite-based detection; distance learning; prompt robustness
Abstract:

Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms that detect LLM-generated content. In this paper, we first present a geometric perspective that demystifies rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their ability to generalize. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we show that an adaptively learned distance function is more effective for detection than a fixed one. Empirically, we conduct extensive experiments spanning more than 100 settings and find that our approach outperforms baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements of 57.8% to 80.6% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini).
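The rewrite-based recipe the abstract alludes to can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's actual components: the stub rewriters stand in for an LLM rewriting call, and the word-level Jaccard distance is a placeholder for the paper's learned distance. The core decision rule is simply: rewrite the candidate text and flag it as machine-generated when the original and its rewrite are unusually close.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Toy fixed distance over word sets; the paper instead learns
    the distance adaptively."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def detect(text: str, rewrite, threshold: float = 0.5) -> bool:
    """Flag `text` as machine-generated when its rewrite stays close."""
    return jaccard_distance(text, rewrite(text)) < threshold

# Stub rewriters standing in for an LLM rewriting call.
close_rewrite = lambda t: t.replace("quick", "fast")      # near-verbatim
far_rewrite = lambda t: "a completely different paraphrase entirely"

sample = "the quick brown fox jumps over the lazy dog"
print(detect(sample, close_rewrite))  # small distance: flagged as machine
print(detect(sample, far_rewrite))    # large distance: looks human-written
```

The intuition is that an LLM asked to rewrite its own output tends to reproduce it with few changes, while human text drifts further under rewriting; the paper's contribution is to replace the fixed distance above with an adaptively learned one.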

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a geometric framework for understanding rewrite-based detection and proposes an adaptive distance learning algorithm for identifying LLM-generated text. It resides in the 'Rewrite-Based and Paraphrase Detection' leaf, which contains only two papers total (including this one). This is a notably sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that rewrite-based approaches remain relatively underexplored compared to statistical or supervised methods, which collectively contain nine papers across two neighboring leaves.

The taxonomy reveals that most detection work clusters around statistical zero-shot methods (five papers) and supervised feature-based approaches (four papers), with additional focus on robustness and adversarial scenarios (three papers). The paper's rewrite-based approach sits adjacent to these mainstream directions but diverges by leveraging paraphrasing transformations rather than direct statistical properties or trained classifiers. The taxonomy's scope notes clarify that rewrite-based methods explicitly compare original versus rewritten versions, distinguishing them from zero-shot methods that analyze text in isolation or supervised approaches that rely on labeled corpora.

Among the 17 candidates examined across three contributions, no clearly refuting prior work was identified. The geometric framework contribution examined one candidate with no refutation; the adaptive distance learning algorithm examined six candidates with none refuting; and the theoretical characterization examined ten candidates with none refuting. This suggests that within the limited search scope—top-K semantic matches plus citation expansion—the specific combination of geometric interpretation, adaptive distance learning, and theoretical guarantees appears not to have direct precedent, though the small candidate pool (17 total) means unexplored literature may exist.

Based on the limited search of 17 candidates, the work appears to occupy a relatively novel position within rewrite-based detection, particularly in its geometric framing and adaptive distance approach. However, the sparse population of its taxonomy leaf (only one sibling paper) and the modest search scope mean this assessment reflects top-ranked semantic matches rather than exhaustive coverage. The absence of refuting candidates across all contributions may indicate genuine novelty or simply that closely related work was not captured in the search.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Papers: 0

Research Landscape Overview

Core task: detecting machine-generated text under diverse prompts. The field has organized itself around four main branches that reflect both technical and societal dimensions. Detection Methodologies and Algorithms encompasses the core techniques, ranging from statistical and neural classifiers to rewrite-based and paraphrase detection methods, that aim to distinguish human from machine text. Detector Evaluation and Benchmarking focuses on systematic assessment frameworks, including cross-domain robustness studies like Cross-Domain Detection[2] and comprehensive benchmarks such as MGTBench[12] and M4GT-Bench[20]. Human Perception and Interaction Studies examines how people recognize, or fail to recognize, synthetic content, while Misuse and Security Implications addresses adversarial scenarios and the potential for evasion, as explored in works like Evading Detection[49] and Malicious Prompt Engineering[42]. Together, these branches capture the interplay between algorithmic innovation, rigorous evaluation, human factors, and emerging threats.

Within Detection Methodologies, a particularly active line of work targets robustness to input variation and paraphrasing. Methods like MAGE[1] and Latent-Space Detection[3] explore feature representations that remain stable across rewrites, while DeTinyLLM[4] investigates lightweight architectures for efficient detection. Prompt-Robust Detection[0] sits squarely in this rewrite-based cluster, emphasizing resilience when adversaries or benign users alter prompts or rephrase outputs. Compared to DeTinyLLM[4], which prioritizes model compactness, Prompt-Robust Detection[0] focuses more directly on handling prompt diversity as a central challenge.
Meanwhile, broader evaluation efforts like MULTITuDE[5] and Science of Detection[6] stress the need for detectors that generalize across domains and generation strategies, highlighting an ongoing tension between specialized, high-accuracy methods and versatile, robust solutions that maintain performance under real-world variability.

Claimed Contributions

Geometric framework for understanding rewrite-based detection methods

The authors develop a geometric framework using Hilbert space projections to explain why rewrite-based methods work for detecting LLM-generated text. They prove that human-written text has larger reconstruction error than LLM-generated text (Proposition 1) and that these methods generalize to unseen prompts (Proposition 2).

1 retrieved paper
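The projection view behind Propositions 1 and 2 can be made concrete with a small numerical sketch. The toy embeddings below are assumptions, not the paper's representation: rewrites are treated as spanning a subspace of a Hilbert space, and a text's "reconstruction error" is the residual of projecting its embedding onto that subspace. Proposition 1 says this residual is larger for human-written text.

```python
import numpy as np

def reconstruction_error(x: np.ndarray, rewrites: np.ndarray) -> float:
    """Residual norm after least-squares projection of x onto the
    span of the rewrite embeddings (rows of `rewrites`)."""
    A = rewrites.T                          # columns span the subspace
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    return float(np.linalg.norm(x - A @ coef))

rng = np.random.default_rng(0)
rewrites = rng.normal(size=(2, 8))                  # two rewrite embeddings
llm_text = 0.7 * rewrites[0] + 0.3 * rewrites[1]    # lies in their span
human_text = llm_text + rng.normal(size=8)          # off-subspace component

print(reconstruction_error(llm_text, rewrites))     # near zero
print(reconstruction_error(human_text, rewrites))   # noticeably larger
```

Because LLM-generated text is well captured by the rewrite subspace, its residual is essentially zero, while the human text's off-subspace component survives the projection, which is the gap the detector thresholds on.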

Adaptive distance learning algorithm for LLM-generated text detection

The authors propose a new rewrite-based detection method that learns a distance function parameterized by a language model, rather than using fixed distances like existing approaches. They theoretically justify this approach by showing that adaptively learned distances are more effective than fixed distances (Proposition 3).

6 retrieved papers
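The adaptive-distance idea can be sketched with a toy weighted distance; the per-feature setup below is an assumption for illustration, not the paper's LM-parameterized distance. The point is the training signal: the weights are tuned to widen the gap between human and LLM original-vs-rewrite distances, something a fixed distance cannot do.

```python
import numpy as np

rng = np.random.default_rng(1)
# Per-feature |original - rewrite| gaps for 50 LLM and 50 human texts.
# Only feature 0 actually separates the classes in this toy setup.
llm_gaps = np.abs(rng.normal(0.0, 0.1, size=(50, 3)))
human_gaps = np.abs(rng.normal(0.0, 0.1, size=(50, 3)))
human_gaps[:, 0] += 1.0

w = np.ones(3) / 3.0               # a fixed, uniform distance as the start
for _ in range(200):
    # Ascend the gap between mean human and mean LLM distances.
    grad = human_gaps.mean(axis=0) - llm_gaps.mean(axis=0)
    w = np.clip(w + 0.1 * grad, 0.0, None)
    w /= w.sum()                   # keep the weights on the simplex

print(w)  # the learned distance concentrates on the informative feature
```

The learned weights shift almost entirely onto the discriminative feature, illustrating why an adaptively learned distance can dominate any fixed choice, which is the content of Proposition 3's justification.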

Theoretical characterization of optimal distance function for detection

The authors provide a theoretical result (Proposition 3) characterizing the form of the optimal distance function for maximizing the gap in reconstruction error between human-written and LLM-generated text, showing it should assign zero distance when both texts are LLM-generated and maximum distance when one is human-written.

10 retrieved papers
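The shape of the optimal distance described above can be written down directly. The function below is an idealized, oracle-label formulation assumed for illustration (real detectors have no access to such labels and must approximate it with a learned distance): zero when both texts are LLM-generated, maximal when either is human-written, which maximizes the reconstruction-error gap between the two classes.

```python
def optimal_distance(x_is_llm: bool, y_is_llm: bool, d_max: float = 1.0) -> float:
    """Idealized distance matching the Proposition-3 characterization:
    0 iff both texts are LLM-generated, otherwise the maximum value."""
    return 0.0 if (x_is_llm and y_is_llm) else d_max

print(optimal_distance(True, True))    # both LLM-generated: 0.0
print(optimal_distance(False, True))   # one human-written: 1.0
```

Under this distance, LLM text paired with its LLM rewrite scores exactly zero reconstruction error, while any human-written original is pushed to the maximum, so the two classes are maximally separated.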

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Geometric framework for understanding rewrite-based detection methods


Contribution

Adaptive distance learning algorithm for LLM-generated text detection


Contribution

Theoretical characterization of optimal distance function for detection


Towards Prompt-Robust Machine-Generated Text Detection | Novelty Validation