Cyber-Zero: Training Cybersecurity Agents without Runtime

ICLR 2026 Conference Submission (Anonymous Authors)

OpenReview Score: 6.0

Keywords: capture the flag, language model agents, security, vulnerability

Abstract:

Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, particularly in resolving GitHub issues. However, such runtime environments are often unavailable in other domains, especially cybersecurity, where challenge configurations and execution contexts are ephemeral or restricted. We present Cyber-Zero, the first runtime-free framework for synthesizing high-quality agent trajectories to train cybersecurity LLMs. Cyber-Zero leverages publicly available Capture The Flag (CTF) writeups and employs persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual environments. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench. Our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. These results demonstrate that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces Cyber-Zero, a runtime-free framework that synthesizes agent trajectories from CTF writeups to train cybersecurity LLMs. According to the taxonomy, this work occupies the 'Runtime-Free Trajectory Synthesis' leaf under 'Offensive Security Agent Training', where it is currently the sole paper. This positioning suggests the paper addresses a relatively sparse research direction within the broader offensive security landscape, which includes more populated areas like runtime-based penetration testing with multiple sibling approaches.

The taxonomy reveals that Cyber-Zero's nearest neighbors are runtime-based methods in sibling leaves: 'Sequence Modeling for Penetration Testing' (Pentraformer) and 'Reasoning-Optimized Penetration Testing' (Pentest-R1). These approaches require executable environments or simulators, whereas Cyber-Zero explicitly avoids runtime interaction. The broader 'Offensive Security Agent Training' branch also includes 'AI System Red-Teaming', which targets AI safety vulnerabilities rather than traditional network penetration. The defensive counterpart branch ('Offline RL for Cybersecurity Defense') addresses policy learning from historical logs but focuses on protection rather than offensive trajectory synthesis.

Among the 21 candidates examined, none clearly refutes any of the three core contributions. The CYBER-ZERO framework itself was compared against 1 candidate, with no refutation found. The synthesized trajectory dataset and the ENIGMA+ agent scaffold were each compared against 10 candidates, all classified as non-refutable or unclear. Within this limited search scope (top-K semantic matches plus citation expansion), no examined prior work directly overlaps with the combination of runtime-free synthesis, persona-driven simulation, and reuse of CTF writeups for cybersecurity agent training.

Based on the 21-candidate search, the work appears to occupy a novel position at the intersection of trajectory synthesis and offensive security training. However, the analysis does not cover exhaustive domain-specific venues or gray literature in cybersecurity competitions. The taxonomy structure indicates this is an emerging direction with sparse prior work, though the limited search scope means additional related efforts in specialized CTF or security conferences may exist beyond the examined set.

Taxonomy

Figure: core-task taxonomy (legend: core-task taxonomy papers, claimed contributions, contribution candidate papers compared, refutable paper).

Research Landscape Overview

Core task: runtime-free trajectory synthesis for training cybersecurity agents. The field addresses the challenge of training autonomous security agents without expensive real-time interaction with live systems or simulators. The taxonomy has three main branches, each capturing a distinct facet of this problem. Offline Reinforcement Learning for Cybersecurity Defense learns defensive policies from pre-collected datasets, so that agents can protect systems against intrusions without online exploration; Offline Cyber Defense[9] and Healthcare IoT RL[1] exemplify this direction. Offensive Security Agent Training covers the generation and use of synthetic attack trajectories to train penetration-testing agents, as in Pentraformer[2], ASTRA[3], and Pentest-R1[7]. The Reinforcement Learning Methodology branch collects foundational techniques, such as inverse learning approaches and policy optimization methods like TD3R[10], that underpin both offensive and defensive settings.

A particularly active line of work synthesizes realistic offensive trajectories without executing attacks in production environments. Cyber-Zero[0] sits within the Offensive Security Agent Training branch alongside Pentraformer[2] and ASTRA[3], but it removes runtime interaction entirely: it reconstructs large-scale trajectories from public CTF writeups and uses them to bootstrap agent learning without any executable environment. In contrast, the nearest penetration-testing neighbors, Pentraformer[2] and Pentest-R1[7], depend on executable environments or simulators, often combined with iterative refinement or hybrid simulation strategies. Meanwhile, the defensive side grapples with distribution shift and the scarcity of labeled attack data, prompting interest in offline methods such as Offline Cyber Defense[9] that learn from historical logs.
Across both branches, a central tension persists: how to ensure that synthetically trained agents generalize to real-world adversarial dynamics and novel attack vectors, without incurring the cost and risk of live deployment during training.

Claimed Contributions

CYBER-ZERO runtime-free trajectory synthesis framework

1 retrieved paper

The authors present CYBER-ZERO, a novel framework that synthesizes high-quality agent trajectories for training cybersecurity LLMs without requiring access to executable runtime environments. It uses persona-driven LLM simulation with two models, a CTF Player and a Bash Terminal, to reverse-engineer runtime behaviors from public CTF writeups and generate realistic multi-turn interaction sequences.
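The dual-model loop described above can be pictured with the minimal Python sketch below. It is not the authors' implementation: `synthesize_trajectory`, the prompts, the `submit` convention, and the flag-match filter are all hypothetical, and each persona is abstracted as a plain function from (system prompt, history) to text so that any LLM client could be plugged in. It also assumes, for illustration only, that the Bash Terminal persona is conditioned on the writeup so it can emulate plausible command output, while the CTF Player persona sees only the interaction history.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical abstraction: a persona model maps (system_prompt, history) -> reply text.
History = List[Tuple[str, str]]  # (role, content) pairs
PersonaModel = Callable[[str, History], str]


@dataclass
class Trajectory:
    writeup_id: str
    turns: History = field(default_factory=list)
    solved: bool = False


def synthesize_trajectory(writeup_id: str, writeup: str, flag: str,
                          player: PersonaModel, terminal: PersonaModel,
                          max_turns: int = 40) -> Trajectory:
    """Alternate CTF Player and simulated Bash Terminal turns (illustrative only)."""
    # Assumption: the player persona does not see the writeup.
    player_prompt = "You are a CTF player. Reply with one bash command per turn."
    # Assumption: the terminal persona is grounded in the writeup to emulate outputs.
    terminal_prompt = "You are a Linux bash terminal. Reference writeup:\n" + writeup
    traj = Trajectory(writeup_id)
    for _ in range(max_turns):
        command = player(player_prompt, traj.turns)
        traj.turns.append(("player", command))
        if command.startswith("submit "):
            # Hypothetical quality filter: mark whether the submitted flag is correct.
            traj.solved = command[len("submit "):].strip() == flag
            break
        output = terminal(terminal_prompt, traj.turns)
        traj.turns.append(("terminal", output))
    return traj


# Scripted stand-ins for the two LLM personas, used only for this demo.
def scripted_player(prompt: str, history: History) -> str:
    steps = ["ls", "cat flag.txt", "submit flag{demo}"]
    return steps[sum(1 for role, _ in history if role == "player")]


def scripted_terminal(prompt: str, history: History) -> str:
    return "flag.txt" if history[-1][1] == "ls" else "flag{demo}"


traj = synthesize_trajectory("demo", "example writeup", "flag{demo}",
                             scripted_player, scripted_terminal)
print(traj.solved, len(traj.turns))  # True 5
```

Keeping each persona behind a plain function boundary makes it easy to swap in different models, or to add further filtering before trajectories are used for fine-tuning.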


Large-scale synthesized cybersecurity trajectory dataset

10 retrieved papers

The authors collect 6,188 high-quality CTF writeups spanning 4,610 unique challenges from 543 competitions across six task categories, and synthesize agent trajectories from them. These synthesized trajectories enable training LLM agents for vulnerability discovery and exploitation tasks without requiring runtime environments.
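Because several writeups can describe the same challenge, the dataset counts differ by level: 6,188 writeups over 4,610 challenges is roughly 1.34 writeups per challenge, and 4,610 challenges over 543 competitions is roughly 8.5 challenges per competition. The hypothetical sketch below shows how such counts could be computed from writeup metadata; the `Writeup` record, its field names, and the toy competition, challenge, and category names are illustrative assumptions, not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Writeup:
    # Hypothetical metadata fields; the real dataset schema may differ.
    competition: str
    challenge: str
    category: str


def dataset_stats(writeups: Iterable[Writeup]) -> Dict[str, object]:
    items = list(writeups)
    # A challenge is keyed by (competition, name), since names can recur across events.
    challenges = {(w.competition, w.challenge) for w in items}
    competitions = {w.competition for w in items}
    return {
        "writeups": len(items),
        "unique_challenges": len(challenges),
        "competitions": len(competitions),
        "writeups_per_challenge": len(items) / len(challenges) if challenges else 0.0,
        "writeups_per_category": dict(Counter(w.category for w in items)),
    }


# Toy example (made-up names): two writeups for one challenge, plus one other challenge.
toy = [
    Writeup("ExampleCTF 2023", "baby-rsa", "crypto"),
    Writeup("ExampleCTF 2023", "baby-rsa", "crypto"),
    Writeup("OtherCTF 2024", "heap-1", "pwn"),
]
stats = dataset_stats(toy)
print(stats["unique_challenges"], stats["writeups_per_challenge"])  # 2 1.5
```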


ENIGMA+ agent scaffold with improved efficiency

10 retrieved papers

The authors develop ENIGMA+, an enhanced version of the ENIGMA scaffold that executes evaluation tasks in parallel rather than sequentially. This improvement dramatically reduces evaluation time from 1-3 days to under 5 hours for 300+ CTF challenges while maintaining evaluation quality.
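The speedup from parallel execution comes from overlapping independent challenge episodes: run sequentially, wall-clock time is the sum of all episode times, whereas with N workers it can drop to roughly that sum divided by N when episodes are similar in length, and never below the longest single episode. The sketch below illustrates the pattern with Python's standard `concurrent.futures`; it is not ENIGMA+'s code, and `run_challenge` is a hypothetical callback standing in for launching one isolated challenge environment (for example, one container per task) and running the agent to completion.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def evaluate_parallel(challenges: List[str],
                      run_challenge: Callable[[str], bool],
                      max_workers: int = 8) -> Dict[str, bool]:
    """Run independent challenge episodes concurrently; map each name to solved/unsolved."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_challenge, name): name for name in challenges}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = bool(future.result())
            except Exception:
                # A crashed episode is recorded as unsolved instead of aborting the run.
                results[name] = False
    return results


# Demo with a stand-in episode runner: even-numbered challenges "solve", one crashes.
def fake_run(name: str) -> bool:
    index = int(name.split("-")[1])
    if index == 7:
        raise RuntimeError("environment failed to start")
    return index % 2 == 0


names = [f"chal-{i}" for i in range(10)]
results = evaluate_parallel(names, fake_run, max_workers=4)
print(sum(results.values()), "of", len(results), "solved")  # 5 of 10 solved
```

Threads are a reasonable choice here under the assumption that each episode mostly waits on external processes such as containers or model APIs; CPU-bound work would call for `ProcessPoolExecutor` instead.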


Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf that contains no other papers. In this retrieved landscape it appears structurally isolated, which is one partial signal of novelty, although this signal is constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

CYBER-ZERO runtime-free trajectory synthesis framework

[32] Generative AI for Simulating Real-World Dynamics: Applications and Challenges

Cannot Refute

Contribution

Large-scale synthesized cybersecurity trajectory dataset

[22] Generative AI-Enhanced Cybersecurity Framework for Enterprise Data Privacy Management

Cannot Refute

[23] Approach to Forming Vulnerability Datasets for Fine-Tuning AI Agents

Cannot Refute

[24] AI-enabled Cybersecurity using Synthetic Data

Cannot Refute

[25] An Ensemble Transformer Approach with Cross-Attention for Automated Code Security Vulnerability Detection and Documentation

Cannot Refute

[26] Leveraging GANs for synthetic data generation to improve intrusion detection systems

Cannot Refute

[27] A novel deep synthesis-based insider intrusion detection (DS-IID) model for malicious insiders and AI-generated threats

Cannot Refute

[28] Evaluating Biased Synthetic Data Effects on Large Language Model-Based Software Vulnerability Detection

Cannot Refute

[29] DeepBalance: Deep-Learning and Fuzzy Oversampling for Vulnerability Detection

Cannot Refute

[30] Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Cannot Refute

[31] A Multimodal Framework for Advanced Cybersecurity Threat Detection Using GAN-Driven Data Synthesis

Cannot Refute

Contribution

ENIGMA+ agent scaffold with improved efficiency

[12] Parallel WaveNet: Fast High-Fidelity Speech Synthesis

Cannot Refute

[13] Adaptive Job Scheduling in Quantum Clouds Using Reinforcement Learning

Cannot Refute

[14] Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

Cannot Refute

[15] RepoForge: Training a SOTA fast-thinking SWE agent with an end-to-end data curation pipeline synergizing SFT and RL at scale

Cannot Refute

[16] Resource-Aware Multi-Fidelity Multi-Objective Multidisciplinary Design Optimization

Cannot Refute

[17] Parallel and High-Fidelity Text-to-Lip Generation

Cannot Refute

[18] Parallel Smell Agent Optimization (SAO): Collaborative Subpopulations for Accelerated Convergence

Cannot Refute

[19] Physics-Aware Compilation for Parallel Quantum Circuit Execution on Neutral Atom Arrays

Cannot Refute

[20] A Parallel GEM5-Based Simulation Infrastructure for Multicluster SoC Performance Evaluation

Cannot Refute

[21] A Survey on Benchmarks of LLM-based GUI Agents

Cannot Refute
