Towards Prompt-Robust Machine-Generated Text Detection

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: LLM detection; rewrite-based detection; distance learning; prompt robustness
Abstract:

Modern large language models (LLMs) such as GPT, Claude, and Gemini have transformed the way we learn, work, and communicate. Yet their ability to produce highly human-like text raises serious concerns about misinformation and academic integrity, creating an urgent need for reliable algorithms that detect LLM-generated content. In this paper, we first present a geometric perspective that demystifies rewrite-based detection algorithms, revealing their underlying rationale and demonstrating their ability to generalize. Building on this insight, we introduce a novel rewrite-based detection algorithm that adaptively learns the distance between the original and rewritten text. Theoretically, we show that an adaptively learned distance function is more effective for detection than a fixed one. Empirically, we conduct extensive experiments spanning more than 100 settings and find that our approach outperforms baseline algorithms in the majority of scenarios. In particular, it achieves relative improvements of 57.8% to 80.6% over the strongest baseline across different target LLMs (e.g., GPT, Claude, and Gemini).
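The rewrite-based recipe the abstract alludes to can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's actual components: the stub rewriters stand in for an LLM rewriting call, and the word-level Jaccard distance is a placeholder for the paper's learned distance. The core decision rule is simply: rewrite the candidate text and flag it as machine-generated when the original and its rewrite are unusually close.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Toy fixed distance over word sets; the paper instead learns
    the distance adaptively."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)

def detect(text: str, rewrite, threshold: float = 0.5) -> bool:
    """Flag `text` as machine-generated when its rewrite stays close."""
    return jaccard_distance(text, rewrite(text)) < threshold

# Stub rewriters standing in for an LLM rewriting call.
close_rewrite = lambda t: t.replace("quick", "fast")      # near-verbatim
far_rewrite = lambda t: "a completely different paraphrase entirely"

sample = "the quick brown fox jumps over the lazy dog"
print(detect(sample, close_rewrite))  # small distance: flagged as machine
print(detect(sample, far_rewrite))    # large distance: looks human-written
```

The intuition is that an LLM asked to rewrite its own output tends to reproduce it with few changes, while human text drifts further under rewriting; the paper's contribution is to replace the fixed distance above with an adaptively learned one.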

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a geometric framework for understanding rewrite-based detection and proposes an adaptive distance learning algorithm for identifying LLM-generated text. It resides in the 'Rewrite-Based and Paraphrase Detection' leaf, which contains only two papers total (including this one). This is a notably sparse research direction within the broader taxonomy of 50 papers across 36 topics, suggesting that rewrite-based approaches remain relatively underexplored compared to statistical or supervised methods, which collectively contain nine papers across two neighboring leaves.

The taxonomy reveals that most detection work clusters around statistical zero-shot methods (five papers) and supervised feature-based approaches (four papers), with additional focus on robustness and adversarial scenarios (three papers). The paper's rewrite-based approach sits adjacent to these mainstream directions but diverges by leveraging paraphrasing transformations rather than direct statistical properties or trained classifiers. The taxonomy's scope notes clarify that rewrite-based methods explicitly compare original versus rewritten versions, distinguishing them from zero-shot methods that analyze text in isolation or supervised approaches that rely on labeled corpora.

Among the 17 candidates examined across three contributions, no clearly refuting prior work was identified. The geometric framework contribution examined one candidate with no refutation; the adaptive distance learning algorithm examined six candidates with none refuting; and the theoretical characterization examined ten candidates with none refuting. This suggests that within the limited search scope—top-K semantic matches plus citation expansion—the specific combination of geometric interpretation, adaptive distance learning, and theoretical guarantees appears not to have direct precedent, though the small candidate pool (17 total) means unexplored literature may exist.

Based on the limited search of 17 candidates, the work appears to occupy a relatively novel position within rewrite-based detection, particularly in its geometric framing and adaptive distance approach. However, the sparse population of its taxonomy leaf (only one sibling paper) and the modest search scope mean this assessment reflects top-ranked semantic matches rather than exhaustive coverage. The absence of refuting candidates across all contributions may indicate genuine novelty or simply that closely related work was not captured in the search.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 17
Refutable Papers: 0

Research Landscape Overview

Core task: detecting machine-generated text under diverse prompts. The field has organized itself around four main branches that reflect both technical and societal dimensions. Detection Methodologies and Algorithms encompasses the core techniques, ranging from statistical and neural classifiers to rewrite-based and paraphrase detection methods, that aim to distinguish human from machine text. Detector Evaluation and Benchmarking focuses on systematic assessment frameworks, including cross-domain robustness studies like Cross-Domain Detection[2] and comprehensive benchmarks such as MGTBench[12] and M4GT-Bench[20]. Human Perception and Interaction Studies examines how people recognize, or fail to recognize, synthetic content, while Misuse and Security Implications addresses adversarial scenarios and the potential for evasion, as explored in works like Evading Detection[49] and Malicious Prompt Engineering[42]. Together, these branches capture the interplay between algorithmic innovation, rigorous evaluation, human factors, and emerging threats.

Within Detection Methodologies, a particularly active line of work targets robustness to input variation and paraphrasing. Methods like MAGE[1] and Latent-Space Detection[3] explore feature representations that remain stable across rewrites, while DeTinyLLM[4] investigates lightweight architectures for efficient detection. Prompt-Robust Detection[0] sits squarely in this rewrite-based cluster, emphasizing resilience when adversaries or benign users alter prompts or rephrase outputs. Compared to DeTinyLLM[4], which prioritizes model compactness, Prompt-Robust Detection[0] focuses more directly on handling prompt diversity as a central challenge.
Meanwhile, broader evaluation efforts like MULTITuDE[5] and Science of Detection[6] stress the need for detectors that generalize across domains and generation strategies, highlighting an ongoing tension between specialized, high-accuracy methods and versatile, robust solutions that maintain performance under real-world variability.

Claimed Contributions

Geometric framework for understanding rewrite-based detection methods

The authors develop a geometric framework using Hilbert space projections to explain why rewrite-based methods work for detecting LLM-generated text. They prove that human-written text has larger reconstruction error than LLM-generated text (Proposition 1) and that these methods generalize to unseen prompts (Proposition 2).

1 retrieved paper
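The projection view behind Propositions 1 and 2 can be made concrete with a small numerical sketch. The toy embeddings below are assumptions, not the paper's representation: rewrites are treated as spanning a subspace of a Hilbert space, and a text's "reconstruction error" is the residual of projecting its embedding onto that subspace. Proposition 1 says this residual is larger for human-written text.

```python
import numpy as np

def reconstruction_error(x: np.ndarray, rewrites: np.ndarray) -> float:
    """Residual norm after least-squares projection of x onto the
    span of the rewrite embeddings (rows of `rewrites`)."""
    A = rewrites.T                          # columns span the subspace
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    return float(np.linalg.norm(x - A @ coef))

rng = np.random.default_rng(0)
rewrites = rng.normal(size=(2, 8))                  # two rewrite embeddings
llm_text = 0.7 * rewrites[0] + 0.3 * rewrites[1]    # lies in their span
human_text = llm_text + rng.normal(size=8)          # off-subspace component

print(reconstruction_error(llm_text, rewrites))     # near zero
print(reconstruction_error(human_text, rewrites))   # noticeably larger
```

Because LLM-generated text is well captured by the rewrite subspace, its residual is essentially zero, while the human text's off-subspace component survives the projection, which is the gap the detector thresholds on.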

Adaptive distance learning algorithm for LLM-generated text detection

The authors propose a new rewrite-based detection method that learns a distance function parameterized by a language model, rather than using fixed distances like existing approaches. They theoretically justify this approach by showing that adaptively learned distances are more effective than fixed distances (Proposition 3).

6 retrieved papers
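The adaptive-distance idea can be sketched with a toy weighted distance; the per-feature setup below is an assumption for illustration, not the paper's LM-parameterized distance. The point is the training signal: the weights are tuned to widen the gap between human and LLM original-vs-rewrite distances, something a fixed distance cannot do.

```python
import numpy as np

rng = np.random.default_rng(1)
# Per-feature |original - rewrite| gaps for 50 LLM and 50 human texts.
# Only feature 0 actually separates the classes in this toy setup.
llm_gaps = np.abs(rng.normal(0.0, 0.1, size=(50, 3)))
human_gaps = np.abs(rng.normal(0.0, 0.1, size=(50, 3)))
human_gaps[:, 0] += 1.0

w = np.ones(3) / 3.0               # a fixed, uniform distance as the start
for _ in range(200):
    # Ascend the gap between mean human and mean LLM distances.
    grad = human_gaps.mean(axis=0) - llm_gaps.mean(axis=0)
    w = np.clip(w + 0.1 * grad, 0.0, None)
    w /= w.sum()                   # keep the weights on the simplex

print(w)  # the learned distance concentrates on the informative feature
```

The learned weights shift almost entirely onto the discriminative feature, illustrating why an adaptively learned distance can dominate any fixed choice, which is the content of Proposition 3's justification.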

Theoretical characterization of optimal distance function for detection

The authors provide a theoretical result (Proposition 3) characterizing the form of the optimal distance function for maximizing the gap in reconstruction error between human-written and LLM-generated text, showing it should assign zero distance when both texts are LLM-generated and maximum distance when one is human-written.

10 retrieved papers
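The shape of the optimal distance described above can be written down directly. The function below is an idealized, oracle-label formulation assumed for illustration (real detectors have no access to such labels and must approximate it with a learned distance): zero when both texts are LLM-generated, maximal when either is human-written, which maximizes the reconstruction-error gap between the two classes.

```python
def optimal_distance(x_is_llm: bool, y_is_llm: bool, d_max: float = 1.0) -> float:
    """Idealized distance matching the Proposition-3 characterization:
    0 iff both texts are LLM-generated, otherwise the maximum value."""
    return 0.0 if (x_is_llm and y_is_llm) else d_max

print(optimal_distance(True, True))    # both LLM-generated: 0.0
print(optimal_distance(False, True))   # one human-written: 1.0
```

Under this distance, LLM text paired with its LLM rewrite scores exactly zero reconstruction error, while any human-written original is pushed to the maximum, so the two classes are maximally separated.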

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Geometric framework for understanding rewrite-based detection methods


Contribution

Adaptive distance learning algorithm for LLM-generated text detection


Contribution

Theoretical characterization of optimal distance function for detection


Towards Prompt-Robust Machine-Generated Text Detection | Novelty Validation