All Code, No Thought: Language Models Struggle to Reason in Ciphered Language

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: AI safety, chain of thought, LLM, CoT monitoring
Abstract:

Detecting harmful AI actions is important as AI agents gain adoption. Chain-of-thought (CoT) monitoring is one method widely used to detect adversarial attacks and AI misalignment. However, attackers and misaligned models might evade CoT monitoring through ciphered reasoning: reasoning hidden in encrypted, translated, or compressed text. To assess this risk, we test whether models can perform ciphered reasoning. For each of 28 different ciphers, we fine-tune and prompt up to 10 models to reason in that cipher. We measure model accuracy on math problems as a proxy for reasoning ability. Across the models we test, we find an asymmetry: model accuracy can drop significantly when reasoning in ciphered text, even though models demonstrate comprehension of ciphered text by being able to translate it accurately to English. Even frontier models struggle with lesser-known ciphers, although they can reason accurately in well-known ciphers like rot13. We show that ciphered reasoning capability correlates with cipher prevalence in pretraining data. We also identify scaling laws showing that ciphered reasoning capability improves slowly with additional fine-tuning data. Our work suggests that evading CoT monitoring using ciphered reasoning may be an ineffective tactic for current models and offers guidance on constraining the development of this capability in future frontier models.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper evaluates whether language models can perform mathematical reasoning when inputs and intermediate steps are expressed in various ciphers, testing 28 cipher types across multiple models. It resides in the 'Specialized Applications of Encrypted Reasoning' leaf, which contains only two papers total. This leaf sits at the periphery of the taxonomy, distinct from the more populated branches on privacy-preserving inference (18 papers across homomorphic encryption and secure computation) and adversarial exploitation (5 papers on jailbreak attacks). The sparse population suggests this particular angle—assessing reasoning capability rather than cryptographic security or adversarial robustness—is relatively underexplored in the literature.

The taxonomy reveals several neighboring research directions. 'Cryptanalysis and Cipher Decryption Using AI' (6 papers) focuses on breaking encryption, while 'Adversarial Exploitation of Encrypted Reasoning' (5 papers) examines jailbreak attacks via cipher encoding. 'Privacy-Preserving Inference on Encrypted Data' (18 papers) emphasizes computational security guarantees through homomorphic encryption or secure multi-party computation. The original paper diverges by treating ciphers as a lens for understanding model reasoning limitations rather than as cryptographic primitives to attack or defend. Its scope excludes both adversarial safety bypasses and formal privacy protocols, instead probing the cognitive boundaries of language models when linguistic structure is obscured.

Among 30 candidates examined, the contribution on cipher prevalence correlating with reasoning performance encountered one potentially refutable prior work, while the other two contributions—evaluating ciphered reasoning capability and identifying scaling laws—showed no clear refutation across 10 candidates each. The limited search scope means these statistics reflect top-K semantic matches and citation expansion, not exhaustive coverage. The evaluation framework and scaling law analyses appear less contested in the examined literature, whereas the data-prevalence hypothesis may overlap with existing observations about pretraining distribution effects. The sibling paper in the same leaf addresses network traffic analysis, offering minimal direct overlap with mathematical reasoning in ciphers.

Given the sparse taxonomy leaf and the modest search scale, the work appears to occupy a relatively novel position within the examined literature. The absence of extensive prior work on reasoning-in-cipher evaluation suggests the framing is distinctive, though the single refutable candidate for the prevalence-correlation claim indicates some conceptual overlap exists. The analysis does not cover the full breadth of cipher-related AI research, so conclusions remain provisional pending broader literature review.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 1

Research Landscape Overview

Core task: reasoning in encrypted or encoded text. This field spans a diverse set of challenges, from breaking classical ciphers with modern AI techniques (Cryptanalysis and Cipher Decryption Using AI) to adversarially exploiting encrypted reasoning systems, ensuring privacy-preserving inference on encrypted data, and protecting models or representations through obfuscation. Privacy-Preserving Inference on Encrypted Data includes work on homomorphic encryption for neural networks (Encrypted Learning Models[4], Homomorphic Encrypted Databases[5]) and secure transformer inference (Secure Transformer Inference[11]), while Data Obfuscation and Representation Protection addresses methods that scramble inputs or latent features to prevent unauthorized access (Instance Obfuscation[25], Shielding Latent Faces[30]). Model and Algorithm Protection focuses on safeguarding intellectual property and preventing extraction attacks (Model Extraction Defense[15]), and Privacy-Preserving Data Publishing and Querying explores secure search and linked data (Searching Encrypted Data[8], Privacy Preserving Linked Data[1]).

Specialized Applications of Encrypted Reasoning captures domain-specific uses, such as network traffic analysis (PacketCLIP[16]) and historical manuscript decryption (Encrypted Thomas More[32]). A particularly active line of work examines whether large language models can perform reasoning over ciphered inputs, balancing utility with confidentiality. Ciphered Language Reasoning[0] sits squarely in this specialized application space, exploring how models handle encrypted natural language tasks. This contrasts with cryptanalysis-focused efforts like Attention Transformer Cryptanalysis[42] and Zero Shot Cryptanalysis[47], which aim to break encryption rather than reason within it, and with privacy-preserving inference methods like Secure Prompt Ensembling[14] or Private Language Model[20], which emphasize computational security guarantees over linguistic obfuscation.

Meanwhile, adversarial studies such as Complex Ciphers Jailbreak[13] probe whether encryption can be exploited to bypass safety mechanisms. Ciphered Language Reasoning[0] thus occupies a niche where the goal is not cryptographic robustness per se, but rather understanding and enabling model comprehension of encoded text—a theme that also appears in Cipherbank[2] and Arabic Encrypted Texts[34], which provide benchmarks for evaluating such capabilities across languages and cipher types.

Claimed Contributions

Evaluation of ciphered reasoning capability across models and ciphers

The authors systematically test whether language models can perform mathematical reasoning when their chain-of-thought is expressed in various encrypted, translated, or compressed forms. They find models can translate ciphered text accurately but experience significant accuracy drops when reasoning in ciphered language, even for frontier models on lesser-known ciphers.

10 retrieved papers
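To make the evaluated setup concrete: the snippet below (a minimal sketch, not the authors' evaluation harness) encodes a plain-English reasoning step with rot13, the well-known cipher the abstract cites as one models handle accurately, using Python's standard `codecs` module.

```python
import codecs

# A short chain-of-thought step for a math problem, in plain English.
plain_cot = "Add 17 and 25 to get 42, then divide by 6 to get 7."

# rot13 shifts each letter 13 places; digits and punctuation pass through.
ciphered_cot = codecs.encode(plain_cot, "rot13")
print(ciphered_cot)  # Nqq 17 naq 25 gb trg 42, gura qvivqr ol 6 gb trg 7.

# rot13 is its own inverse, so decoding recovers the original text exactly.
assert codecs.decode(ciphered_cot, "rot13") == plain_cot
```

In the paper's framing, a model reasoning "in cipher" must emit chain-of-thought like the encoded line above, and the reported asymmetry is that models can translate such text faithfully yet lose accuracy when required to reason in it.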
Correlation between cipher prevalence in pretraining data and reasoning performance

The authors demonstrate that ciphered reasoning capability correlates with how frequently a cipher appears in pretraining corpora. They estimate cipher prevalence using token n-gram frequencies and show strong correlation (R² = 0.906 for structure-preserving ciphers) between pretraining prevalence and math accuracy.

10 retrieved papers
Can Refute
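The prevalence estimate described above can be sketched in miniature. The toy function below is an illustrative proxy only, not the paper's token-level n-gram pipeline, and the corpus and sample strings are invented: it scores what fraction of a ciphered string's character trigrams appear in a small reference corpus.

```python
# Illustrative proxy for cipher prevalence (not the paper's method):
# the fraction of a string's character trigrams found in a reference corpus.
def trigram_prevalence(text: str, corpus: str) -> float:
    corpus_trigrams = {corpus[i:i + 3] for i in range(len(corpus) - 2)}
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    hits = sum(t in corpus_trigrams for t in trigrams)
    return hits / max(len(trigrams), 1)

corpus = "the answer is forty two add the numbers then divide"
samples = {
    "plain": "add the numbers then divide",
    "rot13": "nqq gur ahzoref gura qvivqr",  # rot13 of the plain sample
}
for name, text in samples.items():
    print(name, round(trigram_prevalence(text, corpus), 2))
# plain 1.0
# rot13 0.0
```

On this toy corpus the plain-text sample scores 1.0 while its rot13 encoding scores 0.0, mirroring the direction of the claimed relationship; the paper itself estimates prevalence from token n-gram frequencies at pretraining scale and reports R² = 0.906 against math accuracy for structure-preserving ciphers.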
Data- and parameter-scaling laws for ciphered reasoning

The authors characterize how ciphered reasoning capability scales with fine-tuning data and model parameters. They show that even a simple cipher requires more than 3.7 billion tokens of ciphered fine-tuning data to approach plain-text reasoning accuracy, suggesting substantial data requirements for developing this capability.

10 retrieved papers
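A data-scaling trend of this kind is often summarized by fitting a power law to the accuracy gap between plain-text and ciphered reasoning. The sketch below uses hypothetical data points, not the paper's measurements, and the functional form gap(N) = b·N^(-c) is an assumption; it fits b and c by ordinary least squares in log-log space.

```python
import math

# Hypothetical illustration (not the paper's data): accuracy gap
# (plain-text minus ciphered accuracy) vs. ciphered fine-tuning tokens N.
tokens = [1e7, 1e8, 1e9, 1e10]     # fine-tuning tokens (hypothetical)
gaps = [0.60, 0.42, 0.30, 0.21]    # accuracy gap (hypothetical)

# Assume gap(N) = b * N**(-c); then log(gap) is linear in log(N).
xs = [math.log(n) for n in tokens]
ys = [math.log(g) for g in gaps]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
b, c = math.exp(my - slope * mx), -slope
print(f"fitted gap(N) ~ {b:.2f} * N^(-{c:.3f})")
```

A shallow exponent c means the gap closes slowly with more data, which is the shape consistent with the authors' finding that even a simple cipher needs billions of fine-tuning tokens to approach plain-text accuracy.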

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: Evaluation of ciphered reasoning capability across models and ciphers

Contribution: Correlation between cipher prevalence in pretraining data and reasoning performance

Contribution: Data- and parameter-scaling laws for ciphered reasoning