Cyber-Zero: Training Cybersecurity Agents without Runtime

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: capture the flag, language model agents, security, vulnerability
Abstract:

Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available CTF writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, and demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Cyber-Zero, a runtime-free framework that synthesizes agent trajectories from CTF writeups to train cybersecurity LLMs. According to the taxonomy, this work occupies the 'Runtime-Free Trajectory Synthesis' leaf under 'Offensive Security Agent Training', where it is currently the sole paper. This positioning suggests the paper addresses a relatively sparse research direction within the broader offensive security landscape, which includes more populated areas like runtime-based penetration testing with multiple sibling approaches.

The taxonomy reveals that Cyber-Zero's nearest neighbors are runtime-based methods in sibling leaves: 'Sequence Modeling for Penetration Testing' (Pentraformer) and 'Reasoning-Optimized Penetration Testing' (Pentest-R1). These approaches require executable environments or simulators, whereas Cyber-Zero explicitly avoids runtime interaction. The broader 'Offensive Security Agent Training' branch also includes 'AI System Red-Teaming', which targets AI safety vulnerabilities rather than traditional network penetration. The defensive counterpart branch ('Offline RL for Cybersecurity Defense') addresses policy learning from historical logs but focuses on protection rather than offensive trajectory synthesis.

Among the 21 candidates examined, none clearly refutes any of the three core contributions. The CYBER-ZERO framework itself was compared against 1 candidate, with no refutation found. The synthesized trajectory dataset and the ENIGMA+ agent scaffold were each compared against 10 candidates, all classified as non-refutable or unclear. Given this limited search scope (top-K semantic matches plus citation expansion), the result suggests only that, within the examined literature, no prior work directly overlaps with the combination of runtime-free synthesis, persona-driven simulation, and use of CTF writeups for cybersecurity agent training.

Based on the 21-candidate search, the work appears to occupy a novel position at the intersection of trajectory synthesis and offensive security training. However, the analysis does not exhaustively cover domain-specific venues or gray literature from cybersecurity competitions. The taxonomy structure indicates an emerging direction with sparse prior work, but given the limited search scope, additional related efforts may exist in specialized CTF or security venues beyond the examined set.

Taxonomy

Core-task Taxonomy Papers: 11
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 0

Research Landscape Overview

Core task: runtime-free trajectory synthesis for training cybersecurity agents. The field addresses the challenge of training autonomous security agents without requiring expensive real-time interaction with live systems or simulators. The taxonomy reveals three main branches that capture distinct facets of this problem. Offline Reinforcement Learning for Cybersecurity Defense focuses on learning defensive policies from pre-collected datasets, enabling agents to protect systems against intrusions without online exploration; works like Offline Cyber Defense[9] and Healthcare IoT RL[1] exemplify this direction. Offensive Security Agent Training emphasizes the generation and use of synthetic attack trajectories to train penetration-testing agents, as seen in Pentraformer[2], ASTRA[3], and Pentest-R1[7]. The Reinforcement Learning Methodology branch encompasses foundational techniques, such as inverse learning approaches and policy optimization methods like TD3R[10], that underpin both offensive and defensive settings.

A particularly active line of work centers on synthesizing realistic offensive trajectories without executing attacks in production environments. Cyber-Zero[0] sits squarely within the Offensive Security Agent Training branch and shares this runtime-free philosophy with Pentraformer[2] and ASTRA[3], yet it distinguishes itself by leveraging large-scale trajectory generation to bootstrap agent learning from scratch. In contrast, Pentest-R1[7] and related methods often rely on iterative refinement or hybrid simulation strategies. Meanwhile, the defensive side grapples with distribution shift and the scarcity of labeled attack data, prompting interest in offline methods like Offline Cyber Defense[9] that learn from historical logs.

Across both branches, a central tension persists: how to ensure that synthetically trained agents generalize to real-world adversarial dynamics and novel attack vectors, without incurring the cost and risk of live deployment during training.

Claimed Contributions

CYBER-ZERO runtime-free trajectory synthesis framework

The authors present CYBER-ZERO, a novel framework that synthesizes high-quality agent trajectories for training cybersecurity LLMs without requiring access to executable runtime environments. It uses persona-driven LLM simulation with dual models (CTF Player and Bash Terminal) to reverse-engineer behaviors from public CTF writeups and generate realistic multi-turn interaction sequences.

1 retrieved paper
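
The dual-model simulation described for this contribution can be pictured as a loop in which a player persona proposes a command and a terminal persona answers it. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the names `synthesize`, `scripted_player`, and `lookup_terminal`, the `submit <flag>` stopping convention, and the choice to show the writeup only to the terminal persona are all hypothetical. Both personas are replaced by deterministic stubs so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (command issued by the player, output produced by the terminal)

# Hypothetical interfaces: in a real pipeline each callable would wrap an LLM call.
Player = Callable[[str, List[Turn]], str]         # (challenge, history) -> next command
Terminal = Callable[[str, List[Turn], str], str]  # (writeup, history, command) -> output


@dataclass
class Trajectory:
    challenge: str
    turns: List[Turn] = field(default_factory=list)
    solved: bool = False


def synthesize(challenge: str, writeup: str, flag: str,
               player: Player, terminal: Terminal,
               max_turns: int = 10) -> Trajectory:
    """Alternate player and terminal turns; no command is ever executed."""
    traj = Trajectory(challenge)
    for _ in range(max_turns):
        command = player(challenge, traj.turns)
        # Assumption: only the terminal persona sees the writeup.
        output = terminal(writeup, traj.turns, command)
        traj.turns.append((command, output))
        if command == f"submit {flag}":  # assumed stopping convention
            traj.solved = True
            break
    return traj


def scripted_player(commands: List[str]) -> Player:
    """Stub player that replays a fixed list of commands."""
    queue = iter(commands)
    return lambda challenge, history: next(queue)


def lookup_terminal(outputs: Dict[str, str]) -> Terminal:
    """Stub terminal that answers from a fixed command-to-output table."""
    return lambda writeup, history, command: outputs.get(
        command, "bash: command not found")


demo = synthesize(
    challenge="Find the flag hidden in notes.txt",
    writeup="The flag is stored in plain text in notes.txt.",
    flag="flag{demo}",
    player=scripted_player(["ls", "cat notes.txt", "submit flag{demo}"]),
    terminal=lookup_terminal({
        "ls": "notes.txt",
        "cat notes.txt": "flag{demo}",
        "submit flag{demo}": "Correct!",
    }),
)
for command, output in demo.turns:
    print(f"$ {command}\n{output}")
```

Keeping both personas behind plain callables means the stubs could be swapped for model-backed functions without changing the loop; how the actual framework prompts each persona is not specified in this report.
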
Large-scale synthesized cybersecurity trajectory dataset

The authors build a dataset of 6,188 high-quality CTF writeups spanning 4,610 unique challenges from 543 competitions across six task categories. Trajectories synthesized from these writeups enable training of LLM agents for vulnerability discovery and exploitation tasks without requiring runtime environments.

10 retrieved papers
ENIGMA+ agent scaffold with improved efficiency

The authors develop ENIGMA+, an enhanced version of the ENIGMA scaffold that executes evaluation tasks in parallel rather than sequentially. This change reduces evaluation time from 1-3 days to under 5 hours for 300+ CTF challenges while maintaining evaluation quality.

10 retrieved papers
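
The move from sequential to parallel evaluation described for ENIGMA+ can be illustrated with a small Python sketch. This report does not say how ENIGMA+ schedules its tasks, so the thread pool, the worker count, and the names `run_challenge`, `evaluate_sequential`, and `evaluate_parallel` are assumptions made for illustration. The per-challenge rollout is a stub that sleeps briefly and returns an arbitrary deterministic verdict.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_challenge(name: str) -> bool:
    """Hypothetical stand-in for one agent rollout on one CTF challenge."""
    time.sleep(0.01)            # placeholder for environment start-up and agent turns
    return len(name) % 2 == 0   # arbitrary deterministic pass/fail, not real data


def evaluate_sequential(challenges: List[str]) -> Dict[str, bool]:
    """Run challenges one after another."""
    return {name: run_challenge(name) for name in challenges}


def evaluate_parallel(challenges: List[str], workers: int = 8) -> Dict[str, bool]:
    """Run challenges concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(challenges, pool.map(run_challenge, challenges)))


challenges = [f"chal-{i}" for i in range(32)]
sequential = evaluate_sequential(challenges)
parallel = evaluate_parallel(challenges)
print("same verdicts:", sequential == parallel)
```

Because each stub rollout is independent, the parallel run returns the same verdicts as the sequential one, which is the property the "maintaining evaluation quality" claim depends on. For scale, the reported drop from 1-3 days to under 5 hours corresponds to a speedup of more than 4.8x (24 h / 5 h) at the low end and more than 14.4x (72 h / 5 h) at the high end.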

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape, it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

