Extending Sequence Length is Not All You Need: Effective Integration of Multimodal Signals for Gene Expression Prediction

ICLR 2026 Conference Submission

Anonymous Authors

OpenReview Score: 6.5

Keywords: DNA language model, gene expression prediction, multimodal information integration

Abstract:

Gene expression prediction, which predicts mRNA expression levels from DNA sequences, presents significant challenges. Previous works often focus on extending input sequence length to locate distal enhancers, which may influence target genes from hundreds of kilobases away. Our work first reveals that for current models, long sequence modeling can decrease performance. Even carefully designed algorithms only mitigate the performance degradation caused by long sequences. Instead, we find that proximal multimodal epigenomic signals near target genes prove more essential. Hence we focus on how to better integrate these signals, which has been overlooked. We find that different signal types serve distinct biological roles, with some directly marking active regulatory elements while others reflect background chromatin patterns that may introduce confounding effects. Simple concatenation may lead models to develop spurious associations with these background patterns. To address this challenge, we propose Prism (Proximal regulatory integration of signals for mRNA expression levels prediction), a framework that learns multiple combinations of high-dimensional epigenomic features to represent distinct background chromatin states and uses backdoor adjustment to mitigate confounding effects. Our experimental results demonstrate that proper modeling of multimodal epigenomic signals achieves state-of-the-art performance using only short sequences for gene expression prediction.

Disclaimer

This report is AI-GENERATED using Large Language Models and WisPaper (A scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.

NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.

If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes Prism, a framework for predicting mRNA expression levels by integrating proximal epigenomic signals with DNA sequences. It sits within the Comprehensive Multimodal Frameworks leaf of the taxonomy, which contains only three papers total including this work. This leaf focuses on integrating multiple epigenomic modalities beyond single-mark approaches. The sparse population of this specific leaf suggests the research direction—comprehensive multimodal integration with causal intervention techniques—remains relatively underexplored compared to broader sequence-based or single-modality integration approaches.

The taxonomy reveals substantial activity in neighboring areas. The parent branch, Multimodal Integration, includes separate leaves for Histone Modification Integration (four papers) and Chromatin Accessibility Integration (two papers), indicating that single-modality integration is more established. Adjacent branches show mature work in Sequence-Based Architectures (seven papers across three leaves) and Personalized Prediction (five papers). The paper's emphasis on proximal signals and causal modeling distinguishes it from enhancer-promoter interaction models, which focus on distal regulatory elements, and from purely sequence-driven transformer approaches that avoid explicit epigenomic feature integration.

Across the three contributions analyzed, the literature search examined twenty-one candidates in total. For the claim about long sequence modeling limitations, ten candidates were examined, and one appears to provide overlapping analysis. For the identification of confounding background signals, ten candidates were examined, and none clearly refutes the observation. For the Prism framework's backdoor adjustment approach, one candidate was examined, and it shows potential overlap. These statistics reflect a focused semantic search rather than exhaustive coverage. Among the examined candidates, the confounding signal identification appears least contested, while the causal intervention framework shows the most direct overlap with prior work within this limited search.

Based on the top twenty-one semantic matches examined, the work appears to occupy a relatively sparse position within comprehensive multimodal integration, particularly regarding causal intervention techniques for epigenomic confounding. The analysis does not cover the full breadth of genomics literature, and the small candidate pool means potentially relevant work in causal inference or epigenomics may exist outside this search scope. The taxonomy structure suggests the field is actively developing multimodal approaches, with this work contributing specific methodological innovations to an emerging research direction.

Taxonomy

Research Landscape Overview

Core task: predicting gene expression levels from DNA sequences and epigenomic signals. The field has evolved into a rich landscape organized around several complementary directions. Sequence-based deep learning architectures explore how convolutional and transformer models can decode regulatory grammar directly from nucleotide sequences, with works like Transformers Gene Expression[1] demonstrating the power of attention mechanisms for capturing long-range dependencies. Multimodal integration frameworks combine DNA sequence with chromatin accessibility, histone modifications, and methylation data to capture the full regulatory context; representative efforts include EPInformer[8] and DNA Epigenetic Importance[39], which systematically weigh the contribution of different epigenomic marks. Personalized and population-level prediction branches address how genetic variation and individual epigenetic profiles shape expression, while specialized contexts target single-cell resolution, tissue-specific models, or non-model organisms. Methodological frameworks and benchmarking studies, such as Benchmarking Neural Networks[3], provide rigorous comparisons across architectures, and mechanistic branches aim to interpret learned features in terms of transcription factor binding and enhancer logic.

A particularly active line of work focuses on comprehensive multimodal frameworks that integrate diverse data modalities to improve predictive accuracy and biological interpretability. Multimodal Gene Expression[0] sits squarely within this branch, emphasizing the joint modeling of sequence and epigenomic signals to capture regulatory complexity. This approach contrasts with purely sequence-driven methods and aligns closely with EPInformer[8], which also prioritizes multimodal fusion, though the two may differ in architectural choices or the specific epigenomic marks they emphasize. Meanwhile, DNA Epigenetic Importance[39] offers a complementary perspective by systematically quantifying feature importance, helping to clarify which signals matter most. Across these branches, key trade-offs revolve around model complexity versus interpretability, the challenge of integrating heterogeneous data types, and the need for large-scale benchmarks that span multiple tissues and conditions. The original work contributes to this ongoing effort by advancing multimodal integration strategies within a unified predictive framework.

Claimed Contributions

Revealing limitations of long sequence modeling for gene expression prediction

Can Refute

10 retrieved papers

The authors demonstrate through systematic experiments that current state space models (SSMs) do not benefit from extended sequence lengths in gene expression prediction, contrary to prevailing approaches. They show that models trained on 200k sequences rely primarily on proximal information and that performance degrades beyond 2k base pairs.
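
To make the notion of "proximal information" concrete, the following is a minimal sketch of cropping a long input to a short window around a gene. It is illustrative only: the function name `proximal_window`, the choice to center on a transcription start site index, and the default width of 2,000 bases are assumptions made here, not details taken from the submission.

```python
def proximal_window(sequence: str, tss_index: int, width: int = 2000) -> str:
    """Return up to `width` bases around `tss_index`, clipped to the sequence.

    Hypothetical helper: centering on a TSS index is an assumption of this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not 0 <= tss_index < len(sequence):
        raise IndexError("tss_index is outside the sequence")
    half = width // 2
    start = max(0, tss_index - half)
    end = min(len(sequence), start + width)
    # If the window was clipped at the right edge, shift it left to keep its size.
    start = max(0, end - width)
    return sequence[start:end]


# A synthetic 200,000-base input with a single marker base at the assumed TSS.
long_input = "A" * 100_000 + "C" + "G" * 99_999
window = proximal_window(long_input, tss_index=100_000)
print(len(long_input), len(window))
```

Comparing a model fed `window` against one fed `long_input` is one simple way to probe whether the extra context helps, under the assumptions stated above.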

Identification of confounding effects from background epigenomic signals

10 retrieved papers

The authors categorize epigenomic signals into foreground signals (like H3K27ac marking active regulatory elements) and background signals (like DNase-seq and Hi-C). They reveal that while background signals provide minimal standalone improvement, models develop over-dependence on them during training, creating spurious correlations rather than causal associations.

Prism framework using backdoor adjustment for causal intervention

Can Refute

1 retrieved paper

The authors introduce Prism, a causal inference framework that learns diverse representations of background chromatin states through a confounder encoder and applies backdoor adjustment to perform causal intervention. This approach mitigates confounding effects from background signals while achieving state-of-the-art performance using only short sequences.
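
In its standard form, backdoor adjustment estimates P(Y | do(X)) as the sum over confounder states z of P(Y | X, z) P(z), rather than conditioning on X alone. The toy example below illustrates that formula on a hand-made discrete distribution. It is a sketch under stated assumptions: the binary variables, the probability tables, and all function names are hypothetical, and this report does not specify how Prism parameterizes these terms.

```python
# Hypothetical toy distribution; none of these numbers come from the paper.
# Z: background chromatin state (confounder), X: foreground signal, Y: expression.
P_Z = {0: 0.5, 1: 0.5}                      # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                           # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_x_given_z(x: int, z: int) -> float:
    """P(X = x | Z = z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x: int) -> float:
    """P(Y = 1 | X = x): weights each stratum by P(z | x), so Z leaks in."""
    p_x = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    joint = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return joint / p_x


def backdoor(x: int) -> float:
    """P(Y = 1 | do(X = x)): weights each stratum by the marginal P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


naive_effect = observational(1) - observational(0)
adjusted_effect = backdoor(1) - backdoor(0)
print(f"naive: {naive_effect:.2f}, adjusted: {adjusted_effect:.2f}")
```

In this toy setting the naive contrast is larger than the adjusted one because Z raises both X and Y; averaging over the marginal P(z) removes that shared influence.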

Core Task Comparisons

Comparisons with papers in the same taxonomy category

[8] EPInformer: a scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data

Jiecong Lin, Ruibang Luo, Luca Pinello (2024)

[39] Assessing comparative importance of DNA sequence and epigenetic modifications on gene expression using a deep convolutional neural network

Shang Gao, Jalees Rehman, Yang Dai (2022)

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution

Revealing limitations of long sequence modeling for gene expression prediction

[61] An Interventional Framework of Multimodal Epigenomic Regulation for Gene Expression Prediction

Can Refute

[62] Multi-level PEnet: A Robust Three-Stage Model for Parameter Estimation in Non-Gaussian Noise-Driven Stochastic Differential Equations (S. Li et al.)

Cannot Refute

[63] Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers

Cannot Refute

[64] MTMixG-Net: mixture of Transformer and Mamba network with a dual-path gating mechanism for plant gene expression prediction

Cannot Refute

[65] Stacked Ensemble Learning for Neuroblastoma Prediction Using Gene Expression Profiles

Cannot Refute

[66] CLAP-HMM: a biologically constrained deep learning framework for resistance gene prediction in long DNA sequences

Cannot Refute

[67] Advances of Deep Learning in Healthcare from Diagnosis to Decision Support

Cannot Refute

[68] Effects of restrained degradation on gene expression and regulation

Cannot Refute

[69] Improving Long-Horizon Forecasts with Expectation-Biased LSTM Networks

Cannot Refute

[70] Stochastic models of gene expression with delayed degradation

Cannot Refute

Contribution

Identification of confounding effects from background epigenomic signals

[51] ChromBPNet: bias factorized, base-resolution deep learning models of chromatin accessibility reveal cis-regulatory sequence syntax, transcription factor …

Cannot Refute

[52] Peak calling by Sparse Enrichment Analysis for CUT&RUN chromatin profiling

Cannot Refute

[53] Coupled single-cell CRISPR screening and epigenomic profiling reveals causal gene regulatory networks

Cannot Refute

[54] Identifying and mitigating bias in next-generation sequencing methods for chromatin biology

Cannot Refute

[55] Chromatin insulators: regulatory mechanisms and epigenetic inheritance

Cannot Refute

[56] Cell type-specific signal analysis in epigenome-wide association studies

Cannot Refute

[57] Genetic drivers of epigenetic and transcriptional variation in human immune cells

Cannot Refute

[58] Chromatin immunoprecipitation: optimization, quantitative analysis and data normalization

Cannot Refute

[59] S3norm: simultaneous normalization of sequencing depth and signal-to-noise ratio in epigenomic data

Cannot Refute

[60] Regulated noise in the epigenetic landscape of development and disease

Cannot Refute

Contribution

Prism framework using backdoor adjustment for causal intervention

[61] An Interventional Framework of Multimodal Epigenomic Regulation for Gene Expression Prediction

Can Refute
