The Human Genomics Long-Range Benchmark: Advancing DNA Language Models

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: Language Models, DNA, DNA LMs, Benchmark
Abstract:

The advent of language models (LMs) in genomics necessitates benchmarks that can assess models’ capabilities and limitations. In contrast to protein models, DNA LMs can be used to study non-coding regions of the genome and must account for unique challenges, especially interactions across long sequence lengths. However, existing benchmarks for DNA LMs are defined over short sequence datasets and can involve tasks that are not considered biologically meaningful. Here, we present the Human Genomics Long-Range Benchmark (LRB), which focuses on biologically meaningful tasks and supports long-range contexts. We complement our benchmark with fine-tuning recipes that meaningfully improve performance. We evaluate DNA LMs across nine compiled human genome tasks and observe that they achieve competitive performance relative to supervised baselines on several tasks (e.g., genome annotation), but a significant gap remains in other domains, such as variant effect and gene expression prediction. Additionally, we introduce a visualization tool to examine model performance split by genomic properties.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While this system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces a benchmark suite focused on long-range genomic tasks for DNA language models, emphasizing biologically meaningful evaluations across nine human genome tasks. It resides in the 'Long-Range Genomic Task Benchmarks' leaf, which contains four papers total including this work. This represents a moderately populated research direction within the broader benchmarking landscape, suggesting active but not overcrowded interest in evaluating models on extended genomic contexts that require capturing dependencies across thousands to millions of base pairs.

The taxonomy reveals neighboring evaluation frameworks with distinct emphases: 'General Benchmark Suites' (three papers) cover diverse tasks without long-range focus, while 'Regulatory DNA Benchmarks' (two papers) target chromatin accessibility and transcription factor binding. The original work bridges these by selecting biologically meaningful long-range tasks rather than comprehensive short-context coverage. Its sibling papers in the same leaf (DNALongBench, DART-Eval, and one other) share the long-range evaluation goal but may differ in task selection, species focus, or evaluation protocols, positioning this work within an emerging subfield addressing context-length challenges.

Among thirty candidates examined, none clearly refuted the three main contributions: the benchmark suite itself, fine-tuning recipes, and the visualization tool. For each contribution, ten candidates were reviewed with zero refutable overlaps identified. This suggests that within the limited search scope, the specific combination of human-focused long-range tasks, accompanying fine-tuning strategies, and genomic property visualization appears relatively distinct. However, the analysis explicitly covers top-K semantic matches and citation expansion, not an exhaustive literature review, leaving open the possibility of unexamined overlapping work.

Given the limited search scope and the moderately populated taxonomy leaf, the work appears to offer a focused contribution to long-range genomic benchmarking. The absence of refutable candidates among thirty examined suggests novelty in the specific task compilation and methodological recipes, though the broader concept of long-range DNA model evaluation is shared with sibling papers. The analysis does not capture potential overlaps outside the top-thirty semantic matches or recent preprints.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 30
Refutable Papers: 0

Research Landscape Overview

Core task: Benchmarking DNA language models on long-range genomic tasks.

The field of DNA language modeling has matured into a structured landscape with several distinct branches. DNA Language Model Architectures and Pre-training encompasses foundational models such as HyenaDNA[4], GENA-LM[2], and Evo[16], which explore diverse architectural choices from transformers to state space models for capturing genomic sequences at scale. Benchmarking Frameworks and Evaluation Methodologies focuses on systematic assessment tools like BEND[8], DNALongBench[7], and DART-Eval[14], which provide standardized tasks to compare model performance across regulatory prediction, variant effect estimation, and other genomic challenges. Application-Specific Models and Downstream Tasks targets specialized problems such as gene expression prediction, RNA structure modeling with FlashRNA[20], and variant interpretation, while Model Interpretation and Functional Understanding investigates how these models learn biological signals and dependencies. Theoretical and Methodological Foundations addresses core questions about tokenization strategies, representational power, and the mathematical underpinnings that enable effective genomic sequence modeling.

A particularly active line of work centers on developing comprehensive benchmarks that stress-test models on tasks requiring long-range context, where dependencies span thousands or even millions of base pairs. Genomics Long-Range Benchmark[0] sits squarely within this effort, joining DNALongBench[7] and related frameworks in pushing models beyond local motif recognition toward genome-scale understanding. While Advancing DNA LMs[3] surveys broader architectural trends and DNA Foundation Benchmarking[6] examines general-purpose evaluation, the original work emphasizes the unique challenges of long-range tasks where models like HyenaDNA[4] and Evo[16] demonstrate advantages over traditional transformers. This focus on extended context contrasts with benchmarks like BEND[8], which cover a wider variety of shorter-range regulatory tasks, highlighting an ongoing tension between breadth of evaluation and depth in specific challenging regimes.

Claimed Contributions

Human Genomics Long-Range Benchmark (LRB)

A benchmark compilation of biologically meaningful tasks in human genomics that deliberately incorporates tasks spanning both short and long genomic contexts, allowing users to select arbitrary input sequence lengths for any dataset and thereby empirically measure the importance of long-range context.

10 retrieved papers
Fine-tuning recipes for DNA language models

The authors provide fine-tuning recipes demonstrating that full-model fine-tuning meaningfully outperforms the common practice of keeping the backbone DNA LM's weights frozen during downstream training.

10 retrieved papers
Visualization tool for genomic property analysis

A tool that lets users examine model results in detail by breaking down performance across different genomic properties and annotations.

10 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Human Genomics Long-Range Benchmark (LRB)

A benchmark compilation of biologically meaningful tasks in human genomics that deliberately incorporates tasks spanning both short and long genomic contexts, allowing users to select arbitrary input sequence lengths for any dataset and thereby empirically measure the importance of long-range context.
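The "arbitrary sequence length" idea can be sketched as simple window extraction around a locus of interest. The function name, coordinates, and toy sequence below are illustrative, not the benchmark's actual API.

```python
# Minimal sketch: extract a context window of a chosen size around a position.
# `context_window`, the coordinates, and the toy sequence are hypothetical.

def context_window(sequence: str, center: int, length: int) -> str:
    """Return a window of `length` bases, roughly centered on `center`."""
    half = length // 2
    start = max(0, center - half)
    return sequence[start:start + length]

seq = "ACGT" * 64                                    # 256-bp toy sequence
short = context_window(seq, center=128, length=16)   # short-range input
long_ = context_window(seq, center=128, length=128)  # long-range input
```

Evaluating the same model at several window sizes on one dataset is what allows the long-range contribution of context to be measured empirically.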

Contribution

Fine-tuning recipes for DNA language models

The authors provide fine-tuning recipes demonstrating that full-model fine-tuning meaningfully outperforms the common practice of keeping the backbone DNA LM's weights frozen during downstream training.
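The mechanical difference between the two regimes contrasted above is simply which parameters receive gradient updates. A framework-agnostic sketch, with illustrative `backbone.`/`head.` parameter names that are not taken from the paper:

```python
# Sketch: frozen-backbone probing vs. full fine-tuning differ only in
# which parameter set is marked trainable. Names here are hypothetical.

def select_trainable(param_names, full_finetune: bool) -> set:
    """Return the parameter names that should receive gradient updates."""
    if full_finetune:
        return set(param_names)  # full fine-tuning: update every weight
    # probing: freeze the backbone, train only the task head
    return {n for n in param_names if not n.startswith("backbone.")}

params = ["backbone.layer0.weight", "backbone.layer1.weight",
          "head.weight", "head.bias"]

probe = select_trainable(params, full_finetune=False)
full = select_trainable(params, full_finetune=True)
```

In a deep-learning framework, the same split is typically realized by toggling each backbone parameter's gradient flag before constructing the optimizer.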

Contribution

Visualization tool for genomic property analysis

A tool that lets users examine model results in detail by breaking down performance across different genomic properties and annotations.
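The kind of stratified analysis such a tool supports can be sketched as grouping per-example scores by a genomic annotation and comparing group means. The records below are made-up illustrative data, not results from the paper.

```python
# Sketch: stratify per-example scores by a genomic property ("region")
# and summarize each group. All values here are hypothetical.
from collections import defaultdict
from statistics import mean

records = [
    {"region": "promoter",   "score": 0.81},
    {"region": "promoter",   "score": 0.77},
    {"region": "enhancer",   "score": 0.52},
    {"region": "enhancer",   "score": 0.58},
    {"region": "intergenic", "score": 0.40},
]

by_region = defaultdict(list)
for r in records:
    by_region[r["region"]].append(r["score"])

# Mean score per genomic property: the kind of breakdown the tool would plot
summary = {region: round(mean(scores), 2)
           for region, scores in by_region.items()}
```

Breakdowns like this reveal where aggregate metrics hide weaknesses, e.g., a model that looks strong overall but underperforms on a specific annotation class.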
