Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: DNA language model, gene expression prediction, multimodal information integration
Abstract:

Gene expression prediction, which predicts mRNA expression levels from DNA sequences, presents significant challenges. Previous works often focus on extending input sequence length to locate distal enhancers, which may influence target genes from hundreds of kilobases away. Our work first reveals that for current models, long sequence modeling can decrease performance. Even carefully designed algorithms only mitigate the performance degradation caused by long sequences. Instead, we find that proximal multimodal epigenomic signals near target genes prove more essential. Hence we focus on how to better integrate these signals, which has been overlooked. We find that different signal types serve distinct biological roles, with some directly marking active regulatory elements while others reflect background chromatin patterns that may introduce confounding effects. Simple concatenation may lead models to develop spurious associations with these background patterns. To address this challenge, we propose Prism (Proximal regulatory integration of signals for mRNA expression levels prediction), a framework that learns multiple combinations of high-dimensional epigenomic features to represent distinct background chromatin states and uses backdoor adjustment to mitigate confounding effects. Our experimental results demonstrate that proper modeling of multimodal epigenomic signals achieves state-of-the-art performance using only short sequences for gene expression prediction.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Prism, a framework for predicting mRNA expression levels by integrating proximal epigenomic signals with DNA sequences. It sits within the Comprehensive Multimodal Frameworks leaf of the taxonomy, which contains only three papers total including this work. This leaf focuses on integrating multiple epigenomic modalities beyond single-mark approaches. The sparse population of this specific leaf suggests the research direction—comprehensive multimodal integration with causal intervention techniques—remains relatively underexplored compared to broader sequence-based or single-modality integration approaches.

The taxonomy reveals substantial activity in neighboring areas. The parent branch, Multimodal Integration, includes separate leaves for Histone Modification Integration (four papers) and Chromatin Accessibility Integration (two papers), indicating that single-modality integration is more established. Adjacent branches show mature work in Sequence-Based Architectures (seven papers across three leaves) and Personalized Prediction (five papers). The paper's emphasis on proximal signals and causal modeling distinguishes it from enhancer-promoter interaction models, which focus on distal regulatory elements, and from purely sequence-driven transformer approaches that avoid explicit epigenomic feature integration.

Across the three contributions analyzed, the literature search examined twenty-one candidates in total. For the claim about the limitations of long-sequence modeling, ten candidates were examined, one of which appears to provide overlapping analysis. For the identification of confounding background signals, ten candidates were examined, and none clearly refutes the observation. For the Prism framework's backdoor adjustment approach, a single candidate was examined, and it shows potential overlap. These statistics reflect a focused semantic search rather than exhaustive coverage. Among the examined candidates, the confounding-signal identification appears least contested, while the causal intervention framework shows the most direct overlap with prior work within this limited search.

Based on the top-twenty-one semantic matches examined, the work appears to occupy a relatively sparse position within comprehensive multimodal integration, particularly regarding causal intervention techniques for epigenomic confounding. The analysis does not cover the full breadth of genomics literature, and the small candidate pool means potentially relevant work in causal inference or epigenomics may exist outside this search scope. The taxonomy structure suggests the field is actively developing multimodal approaches, with this work contributing specific methodological innovations to an emerging research direction.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 21
Refutable Papers: 2

Research Landscape Overview

Core task: predicting gene expression levels from DNA sequences and epigenomic signals. The field has evolved into a rich landscape organized around several complementary directions. Sequence-based deep learning architectures explore how convolutional and transformer models can decode regulatory grammar directly from nucleotide sequences, with works like Transformers Gene Expression[1] demonstrating the power of attention mechanisms for capturing long-range dependencies. Multimodal integration frameworks combine DNA sequence with chromatin accessibility, histone modifications, and methylation data to capture the full regulatory context; representative efforts include EPInformer[8] and DNA Epigenetic Importance[39], which systematically weigh the contribution of different epigenomic marks. Personalized and population-level prediction branches address how genetic variation and individual epigenetic profiles shape expression, while specialized contexts target single-cell resolution, tissue-specific models, or non-model organisms. Methodological frameworks and benchmarking studies, such as Benchmarking Neural Networks[3], provide rigorous comparisons across architectures, and mechanistic branches aim to interpret learned features in terms of transcription factor binding and enhancer logic.

A particularly active line of work focuses on comprehensive multimodal frameworks that integrate diverse data modalities to improve predictive accuracy and biological interpretability. Multimodal Gene Expression[0] sits squarely within this branch, emphasizing the joint modeling of sequence and epigenomic signals to capture regulatory complexity. This approach contrasts with purely sequence-driven methods and aligns closely with EPInformer[8], which also prioritizes multimodal fusion, though the two may differ in architectural choices or the specific epigenomic marks they emphasize. Meanwhile, DNA Epigenetic Importance[39] offers a complementary perspective by systematically quantifying feature importance, helping to clarify which signals matter most.

Across these branches, key trade-offs revolve around model complexity versus interpretability, the challenge of integrating heterogeneous data types, and the need for large-scale benchmarks that span multiple tissues and conditions. The original work contributes to this ongoing effort by advancing multimodal integration strategies within a unified predictive framework.

Claimed Contributions

Revealing limitations of long sequence modeling for gene expression prediction

The authors demonstrate through systematic experiments that current state space models (SSMs) do not benefit from extended sequence lengths in gene expression prediction, contrary to the assumption behind prevailing approaches. They show that models trained on 200k-base-pair sequences rely primarily on proximal information and that performance degrades beyond 2k base pairs.

10 retrieved papers (Can Refute)

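
As a minimal sketch of what restricting a model to proximal information can look like in practice, the Python snippet below crops a fixed-width window centered on a transcription start site (TSS) from a longer sequence. The function name `proximal_window`, the centering convention, the `N` padding, and the default width of 2,000 bp are illustrative assumptions, not details taken from the paper.

```python
def proximal_window(seq: str, tss: int, width: int = 2000, pad: str = "N") -> str:
    """Return a `width`-long window of `seq` centered on index `tss`.

    Hypothetical helper for illustration only. Positions that fall outside
    the sequence are filled with `pad` so the output length is always `width`.
    """
    if not 0 <= tss < len(seq):
        raise ValueError("tss must be a valid index into seq")
    half = width // 2
    start = tss - half
    end = start + width
    # Pad on either side when the window runs past the sequence boundaries.
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


if __name__ == "__main__":
    long_seq = "ACGT" * 50_000  # a 200k-character toy sequence
    window = proximal_window(long_seq, tss=100_000)
    print(len(window))
```

The padding keeps the input length fixed for genes near a chromosome end; other choices (for example, shifting the window instead of padding) would be equally reasonable.
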
Identification of confounding effects from background epigenomic signals

The authors categorize epigenomic signals into foreground signals (like H3K27ac marking active regulatory elements) and background signals (like DNase-seq and Hi-C). They reveal that while background signals provide minimal standalone improvement, models develop over-dependence on them during training, creating spurious correlations rather than causal associations.

10 retrieved papers
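
The difference between a spurious correlation and a causal association can be illustrated with a toy simulation. In this sketch, a hidden chromatin state drives both a background signal and expression, while the background signal has no direct effect on expression. All variable names and numeric values are hypothetical; this is not the authors' data or analysis.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, seed=0):
    """Draw (state, background, expression) triples from a toy confounded model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                     # hidden chromatin state (assumed binary)
        b = (2.0 if z else 0.0) + rng.gauss(0, 1)  # background signal depends on z only
        y = (3.0 if z else 0.0) + rng.gauss(0, 1)  # expression depends on z only, not on b
        rows.append((z, b, y))
    return rows


if __name__ == "__main__":
    rows = simulate()
    # Marginally, background signal and expression look strongly related.
    print(round(pearson([r[1] for r in rows], [r[2] for r in rows]), 3))
    # Within a single chromatin state, the relationship largely disappears.
    open_rows = [r for r in rows if r[0]]
    print(round(pearson([r[1] for r in open_rows], [r[2] for r in open_rows]), 3))
```

A model fitted on the pooled data could lean on the background signal even though, in this toy setup, it carries no information about expression beyond the hidden state.
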
Prism framework using backdoor adjustment for causal intervention

The authors introduce Prism, a causal inference framework that learns diverse representations of background chromatin states through a confounder encoder and applies backdoor adjustment to perform causal intervention. This approach mitigates confounding effects from background signals while achieving state-of-the-art performance using only short sequences.

1 retrieved paper (Can Refute)
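
For readers unfamiliar with backdoor adjustment, the standard textbook form for a discrete confounder Z is P(Y | do(X = x)) = sum over z of P(Y | X = x, Z = z) P(Z = z). The Python sketch below evaluates this on a small hand-made probability table and compares it with the naive conditional P(Y | X = x). It is a generic illustration with hypothetical numbers, not Prism's implementation, which according to the summary above learns confounder representations with an encoder rather than using a known discrete Z.

```python
from fractions import Fraction as F

# Hypothetical toy distribution: Z is a binary confounder (e.g. a background state),
# X a binary feature, Y a binary outcome. All numbers are made up for illustration.
P_Z = {0: F(1, 2), 1: F(1, 2)}
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}          # P(X=1 | Z=z)
P_Y1_GIVEN_XZ = {                                # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): F(1, 10), (1, 0): F(1, 5),
    (0, 1): F(4, 5),  (1, 1): F(9, 10),
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1 - p1


def naive(x):
    """Observational P(Y=1 | X=x): Z is weighted by P(Z=z | X=x)."""
    num = sum(P_Z[z] * p_x_given_z(x, z) * P_Y1_GIVEN_XZ[(x, z)] for z in P_Z)
    den = sum(P_Z[z] * p_x_given_z(x, z) for z in P_Z)
    return num / den


def backdoor(x):
    """Interventional P(Y=1 | do(X=x)): Z is weighted by its marginal P(Z=z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    print("naive effect:   ", naive(1) - naive(0))
    print("adjusted effect:", backdoor(1) - backdoor(0))
```

In this table X raises P(Y=1) by the same small amount within each stratum of Z, so the adjusted effect is small, while the naive contrast is inflated because X and Y both track Z.
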

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution
