Systematic Biosafety Evaluation of DNA Language Models under Jailbreak Attacks

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Jailbreak Attacks; DNA language models
Abstract:

DNA, encoding the genetic instructions for almost all living organisms, fuels groundbreaking advances in genomics and synthetic biology. Recently, DNA language models have succeeded in designing synthetic functional DNA sequences, and even whole genomes of novel bacteriophages, verified with wet-lab experiments. Such remarkable generative power also raises severe biosafety concerns about whether DNA language models can design human viruses. With the goal of exposing vulnerabilities and informing the development of robust safeguarding techniques, we perform a systematic biosafety evaluation of DNA language models through the lens of jailbreak attacks. Specifically, we introduce JailbreakDNABench, a benchmark centered on high-priority human viruses, together with an end-to-end jailbreak framework, GeneBreaker. GeneBreaker integrates three key components: (1) an LLM agent equipped with customized bioinformatics tools to design high-homology yet non-pathogenic jailbreak prompts, (2) beam search guided by PathoLM and log-probability heuristics to steer sequence generation toward pathogen-like outputs, and (3) a BLAST- and function-annotation-based evaluation pipeline to identify successful jailbreaks. On JailbreakDNABench, GeneBreaker consistently jailbreaks the latest Evo-series models across all six viral categories (up to a 60% attack success rate for Evo2-40B). Further case studies on the SARS-CoV-2 spike protein and the HIV-1 envelope protein demonstrate the sequence and structural fidelity of the jailbreak outputs, while evolutionary modeling of SARS-CoV-2 underscores the biosecurity risks. Our findings also reveal that scaling DNA language models amplifies dual-use risks, motivating enhanced safety alignment and tracing mechanisms.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholar search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces JailbreakDNABench and GeneBreaker, a benchmark and attack framework for evaluating biosafety vulnerabilities in DNA language models. It resides in the 'Pathogenicity-Guided Jailbreak Systems' leaf, which contains only two papers total. This sparse population suggests the work addresses an emerging and relatively unexplored research direction within the broader biosafety evaluation landscape. The taxonomy shows the field is still nascent, with only ten papers across all branches, indicating that systematic jailbreak evaluation of DNA models is a frontier area rather than a crowded subfield.

The taxonomy reveals five major branches addressing AI biosafety: jailbreak attacks, adversarial robustness, safety benchmarks, defensive mechanisms, and policy perspectives. The paper's leaf sits within 'Jailbreak Attack Frameworks and Methodologies', adjacent to 'Adversarial Robustness Assessment' which examines embedding-space attacks and toxicity analysis. Neighboring branches include defensive techniques like watermarking and concept erasure, plus broader evaluation frameworks such as Scisafeeval. The scope notes clarify that pathogenicity-guided methods are distinct from general adversarial robustness work, positioning this contribution at the intersection of domain-specific biological knowledge and adversarial prompting.

Among three candidates examined across three contributions, none were clearly refuted by prior work. The JailbreakDNABench benchmark examined one candidate with no refutations, as did the GeneBreaker framework and the methodological insight combining prompt design with guided beam search. This limited search scope—only three candidates total—means the analysis captures a narrow slice of potentially relevant literature. The absence of refutations within this small sample suggests the specific combination of benchmark, attack framework, and pathogenicity-guided beam search may represent a novel integration, though a broader search could reveal additional overlapping efforts.

Based on the limited examination of three candidates, the work appears to occupy a sparsely populated research direction with no immediate prior work providing the same combination of benchmark, attack framework, and guided generation. However, the small search scope and the field's rapid evolution mean this assessment reflects only top-ranked semantic matches rather than exhaustive coverage. The taxonomy structure confirms this is an emerging area where systematic evaluation frameworks are still being established.

Taxonomy

Core-task Taxonomy Papers: 10
Claimed Contributions: 3
Contribution Candidate Papers Compared: 3
Refutable Papers: 0

Research Landscape Overview

Core task: biosafety evaluation of DNA language models under jailbreak attacks. The field has rapidly organized around five major branches that together address the emerging risks of generative AI in the biosciences. Jailbreak Attack Frameworks and Methodologies develop adversarial prompting techniques to elicit harmful biological outputs, including pathogenicity-guided systems that exploit domain-specific vulnerabilities. Adversarial Robustness Assessment examines how models respond to these attacks, while Safety Benchmarks and Evaluation Frameworks provide standardized testbeds for measuring dual-use risks. Defensive Mechanisms and Mitigation Strategies explore technical safeguards such as output filtering and knowledge deletion, and Biosecurity Threat Analysis and Policy Perspectives situate these technical challenges within broader governance and regulatory contexts. Representative works like Scisafeeval [2] and SafeGenes [4] illustrate how evaluation and defense efforts often intertwine, while policy-oriented studies such as Biosecurity-Aware AI [5] and Securing Dual-Use Pathogen Data [8] bridge technical and regulatory concerns.

A particularly active line of work focuses on pathogenicity-guided jailbreak systems, where adversaries craft prompts that leverage biological domain knowledge to bypass safety filters. Systematic Biosafety Evaluation of DNA Language Models under Jailbreak Attacks [0] sits squarely within this branch, developing a comprehensive framework for testing DNA language models against such attacks. Its emphasis on systematic evaluation contrasts with GeneBreaker [1], which focuses more narrowly on specific attack vectors, yet both share the common goal of exposing vulnerabilities before malicious actors can exploit them. Meanwhile, defensive approaches like SafeGenie [10] and Deleting Biological Weapons Knowledge [9] explore whether harmful capabilities can be removed post-training, raising open questions about the trade-offs between model utility and safety.
The original paper's contribution lies in its structured methodology for assessing jailbreak susceptibility, positioning it as a bridge between attack development and the broader evaluation frameworks that inform policy and mitigation strategies.

Claimed Contributions

JailbreakDNABench benchmark for biosafety evaluation

The authors introduce JailbreakDNABench, a systematic benchmark consisting of six high-priority human viral categories (e.g., large DNA viruses, small DNA viruses, positive-strand RNA viruses) together with an evaluation pipeline using BLAST and function annotation to assess biosafety vulnerabilities of DNA language models.

1 retrieved paper
GeneBreaker jailbreak attack framework

The authors develop GeneBreaker, an end-to-end jailbreak framework that integrates an LLM agent for designing high-homology non-pathogenic prompts, beam search guided by PathoLM and log-probability heuristics, and a BLAST-based evaluation pipeline to systematically expose vulnerabilities in DNA language models.

1 retrieved paper
Methodological insight combining prompt design and guided beam search

The authors propose a novel methodological approach that combines retrieving high-homology yet non-pathogenic sequences as prompts with beam search guided by pathogenicity prediction (PathoLM) and log-probability heuristics to steer DNA language models toward generating pathogen-like outputs.

1 retrieved paper
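To make the guided-decoding idea concrete, the sketch below shows a generic score-guided beam search in which candidate extensions are ranked by accumulated model log-probability plus a weighted external score. Everything here is an illustrative placeholder: the uniform `lm_logprobs` stand-in, the GC-content `external_score`, and the `alpha` weight are hypothetical, not the paper's actual PathoLM-guided implementation.

```python
import math

def lm_logprobs(prefix, alphabet):
    # Toy stand-in for a DNA language model: uniform next-token log-probs.
    p = math.log(1.0 / len(alphabet))
    return {tok: p for tok in alphabet}

def external_score(seq):
    # Toy stand-in for an external guidance model (the paper uses a
    # pathogenicity predictor); here it simply rewards G/C content.
    return sum(1 for t in seq if t in "GC") / max(len(seq), 1)

def guided_beam_search(prompt, alphabet="ACGT", beam_width=4,
                       steps=8, alpha=1.0):
    """Keep the beam_width candidates that maximize model log-probability
    plus alpha times the external guidance score."""
    beams = [(prompt, 0.0)]  # (sequence, accumulated log-probability)
    for _ in range(steps):
        candidates = []
        for seq, logp in beams:
            next_lp = lm_logprobs(seq, alphabet)
            for tok in alphabet:
                candidates.append((seq + tok, logp + next_lp[tok]))
        # Rank by log-probability plus the weighted external score.
        candidates.sort(key=lambda c: c[1] + alpha * external_score(c[0]),
                        reverse=True)
        beams = candidates[:beam_width]
    return beams

best, logp = guided_beam_search("ATG")[0]
print(best)
```

Swapping in a real model's next-token log-probabilities and a trained classifier score recovers the general decoding pattern this contribution describes; only the scoring functions change.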

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution: JailbreakDNABench benchmark for biosafety evaluation

The authors introduce JailbreakDNABench, a systematic benchmark consisting of six high-priority human viral categories (e.g., large DNA viruses, small DNA viruses, positive-strand RNA viruses) together with an evaluation pipeline using BLAST and function annotation to assess biosafety vulnerabilities of DNA language models.

Contribution: GeneBreaker jailbreak attack framework

The authors develop GeneBreaker, an end-to-end jailbreak framework that integrates an LLM agent for designing high-homology non-pathogenic prompts, beam search guided by PathoLM and log-probability heuristics, and a BLAST-based evaluation pipeline to systematically expose vulnerabilities in DNA language models.

Contribution: Methodological insight combining prompt design and guided beam search

The authors propose a novel methodological approach that combines retrieving high-homology yet non-pathogenic sequences as prompts with beam search guided by pathogenicity prediction (PathoLM) and log-probability heuristics to steer DNA language models toward generating pathogen-like outputs.
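As a rough picture of what a homology-based success check does, the toy sketch below scores a generated sequence against a reference by percent identity, using Python's `difflib.SequenceMatcher` as a crude stand-in for a real local aligner such as BLAST. The 0.9 threshold and the helper names are illustrative assumptions, not the benchmark's actual evaluation pipeline.

```python
# Toy homology check: percent identity between a generated sequence and a
# reference. SequenceMatcher is a crude stand-in for a real aligner; the
# 0.9 threshold is an arbitrary illustrative choice.
from difflib import SequenceMatcher

def percent_identity(generated: str, reference: str) -> float:
    """Fraction of matched characters relative to the longer sequence."""
    matcher = SequenceMatcher(None, generated, reference)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(generated), len(reference))

def is_successful_jailbreak(generated: str, reference: str,
                            threshold: float = 0.9) -> bool:
    # Hypothetical success criterion: high identity to the target.
    return percent_identity(generated, reference) >= threshold

ref = "ATGGTGCATCTGACTCCTGAGGAGAAG"
gen = "ATGGTGCATCTGACTCCAGAGGAGAAG"  # one substitution
print(round(percent_identity(gen, ref), 3))
```

A production pipeline would instead run a real aligner and, as the contribution notes, combine the homology score with functional annotation of the generated sequence.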