BioBO: Biology-informed Bayesian Optimization for Perturbation Design

ICLR 2026 Conference Submission
Anonymous Authors
Keywords: Bayesian optimization; Biological priors; Perturbation design
Abstract:

Efficient design of genomic perturbation experiments is crucial for accelerating drug discovery and therapeutic target identification, yet exhaustive perturbation of the human genome remains infeasible due to the vast search space of potential genetic interactions and experimental constraints. Bayesian optimization (BO) has emerged as a powerful framework for selecting informative interventions, but existing approaches often fail to exploit domain-specific biological prior knowledge. We propose Biology-Informed Bayesian Optimization (BioBO), a method that integrates Bayesian optimization with multimodal gene embeddings and enrichment analysis, a widely used tool for gene prioritization in biology, to enhance surrogate modeling and acquisition strategies. BioBO combines biologically grounded priors with acquisition functions in a principled framework, which biases the search toward promising genes while maintaining the ability to explore uncertain regions. Through experiments on established public benchmarks and datasets, we demonstrate that BioBO improves labeling efficiency by 25-40%, and consistently outperforms conventional BO by identifying top-performing perturbations more effectively. Moreover, by incorporating enrichment analysis, BioBO yields pathway-level explanations for selected perturbations, offering mechanistic interpretability that links designs to biologically coherent regulatory circuits.

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes the tasks and contributions of academic papers against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper proposes BioBO, a Bayesian optimization framework that integrates multimodal gene embeddings and enrichment analysis to guide genomic perturbation experiments. It resides in the Biology-Informed Bayesian Optimization leaf, which contains only two papers including this one. This leaf sits within the broader Bayesian Optimization Frameworks branch, indicating a relatively sparse research direction focused specifically on incorporating biological priors into optimization strategies. The small cluster size suggests this integration of domain knowledge into BO for genomics is an emerging rather than saturated area.

The taxonomy reveals that BioBO's closest neighbors are General Bayesian Optimization Approaches, which apply BO to experimental design without domain-specific biological integration. The sibling leaf contains three papers addressing multiscale circuits and generic acquisition functions. Beyond the Bayesian Optimization branch, related work appears in Network Inference categories that use perturbation data to learn causal structures, and in AI Agents that employ LLMs for experiment design. BioBO diverges from these by maintaining classical BO machinery while enriching it with biological embeddings and pathway priors, rather than pursuing network reconstruction or autonomous reasoning.

Among sixteen candidates examined across three contributions, none were found to clearly refute the proposed methods. For the enrichment-analysis-augmented acquisition function, ten candidates were examined with zero refutable overlaps; for the combined BioBO framework, six candidates were examined with the same result. For the multimodal gene embeddings contribution, no candidates were retrieved, so no comparison was possible; this may indicate limited prior work on this specific integration, or simply a gap in retrieval. These statistics reflect a top-K semantic search scope, not an exhaustive literature review. The absence of refutable candidates among this limited set suggests the specific combination of multimodal embeddings and enrichment-based acquisition may be relatively unexplored.

Based on the limited search scope of sixteen candidates, the work appears to occupy a sparsely populated niche at the intersection of Bayesian optimization and biological prior integration. The taxonomy structure confirms that biology-informed BO is a small cluster, and the contribution-level analysis found no clear precedents among examined papers. However, the search scale is modest and focused on semantic similarity, leaving open the possibility of relevant work in adjacent domains such as active learning with biological features or transfer learning in genomics.

Taxonomy

Core-task Taxonomy Papers: 13
Claimed Contributions: 3
Contribution Candidate Papers Compared: 16
Refutable Papers: 0

Research Landscape Overview

Core task: Genomic perturbation experiment design using Bayesian optimization. The field centers on selecting which genetic interventions to perform next in order to maximize information gain or achieve specific biological objectives under resource constraints. The taxonomy reveals four main branches. Bayesian Optimization Frameworks for Perturbation Design focuses on adapting classical acquisition functions and surrogate models to biological contexts, often incorporating domain knowledge such as gene networks or pathway structure. AI and Machine Learning Agents for Perturbation Design explores how autonomous agents and large language models can propose experiments by reasoning over biological literature and data. Network Inference and Causal Discovery with Perturbations emphasizes learning gene regulatory networks or causal graphs from interventional data, treating experiment design as a means to resolve graph structure. Study Design Optimization and Model Selection addresses broader questions of sample size, batch allocation, and choosing among competing statistical models. Together, these branches reflect a spectrum from purely algorithmic optimization to biologically grounded inference.

Recent work highlights trade-offs between generality and domain specificity. Some studies pursue generic Bayesian optimization methods that can be applied across experimental platforms, while others tailor acquisition strategies to genomic constraints such as combinatorial perturbations or pathway priors. BioBO[0] sits within the Biology-Informed Bayesian Optimization cluster, emphasizing the integration of biological priors (such as known gene interactions or pathway annotations) into the optimization loop. This contrasts with more agnostic approaches that treat genes as abstract features. Sequential optimal experimental design[3], a close neighbor, similarly advocates for iterative, information-theoretic experiment selection but may place less emphasis on encoding biological structure directly. Open questions remain around scalability to high-dimensional perturbation spaces, the robustness of learned surrogates when biological assumptions are violated, and how to balance exploration of novel gene functions against exploitation of well-characterized pathways.

Claimed Contributions

Multimodal gene embeddings for improved surrogate modeling in Bayesian optimization

The authors propose using multimodal gene representations (combining Gene2Vec, GenePT, and Achilles embeddings) instead of single-modality embeddings to enhance the surrogate model in Bayesian optimization. This fusion improves predictive performance near the optimum, leading to better experimental designs.
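As a rough illustration of what feature-level fusion of several gene embeddings can look like, the sketch below standardizes each modality and concatenates them into one feature matrix. This is an assumption made for illustration only: the report does not state how the paper fuses Gene2Vec, GenePT, and Achilles embeddings, and the function name `fuse_embeddings`, the embedding dimensions, and the random placeholder data are all hypothetical.

```python
import numpy as np


def fuse_embeddings(modalities):
    """Fuse per-gene embeddings by z-scoring each modality and concatenating.

    `modalities` maps a modality name to an (n_genes, dim) array. Each column
    is standardized, then each block is scaled by 1/sqrt(dim) so that every
    modality contributes comparably to distances in the fused space.
    NOTE: concatenation is an illustrative choice, not the paper's stated method.
    """
    blocks = []
    for name in sorted(modalities):  # fixed order keeps column layout stable
        x = np.asarray(modalities[name], dtype=float)
        mean = x.mean(axis=0)
        std = x.std(axis=0)
        std[std == 0.0] = 1.0  # leave constant columns at zero after centering
        z = (x - mean) / std
        blocks.append(z / np.sqrt(x.shape[1]))
    return np.concatenate(blocks, axis=1)


# Placeholder data standing in for real embeddings (dimensions are made up).
rng = np.random.default_rng(0)
n_genes = 50
modalities = {
    "gene2vec": rng.normal(size=(n_genes, 8)),
    "genept": rng.normal(loc=3.0, scale=10.0, size=(n_genes, 16)),
    "achilles": rng.normal(size=(n_genes, 4)),
}
fused = fuse_embeddings(modalities)
print(fused.shape)  # (50, 28)
```

The fused matrix could then serve as the input features of whatever surrogate model the optimization loop uses; the per-modality rescaling is one simple way to keep a high-dimensional embedding from dominating the others.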

0 retrieved papers

Enrichment-analysis-augmented acquisition function within the π-BO framework

The authors integrate enrichment analysis as a biological prior into the acquisition function using the π-BO framework. This approach incorporates domain knowledge from pathway databases while maintaining principled exploration-exploitation trade-offs and provides interpretable pathway-level explanations for selected perturbations.
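For readers unfamiliar with π-BO, its core idea is to multiply a standard acquisition value by a user prior raised to a power that decays with the iteration count, so the prior steers early selections and fades as data accumulates. The sketch below shows that weighting with expected improvement over a discrete set of genes. How the paper turns enrichment analysis into a prior is not described in this report, so the `prior` vector, the function names, and the toy numbers are hypothetical.

```python
import math

import numpy as np


def expected_improvement(mu, sigma, best):
    """Closed-form expected improvement (maximization) for Gaussian posteriors."""
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def prior_weighted_acquisition(acq, prior, beta, n):
    """pi-BO-style weighting: acq(x) * prior(x) ** (beta / n).

    `prior` is a hypothetical per-gene prior (e.g. derived from enrichment
    scores); `beta` sets how strongly it is trusted; `n` is the iteration.
    """
    prior = np.clip(prior, 1e-12, None)
    return acq * prior ** (beta / max(n, 1))


# Toy example: genes 0 and 1 have identical posteriors, but the prior favors gene 1.
mu = np.array([0.5, 0.5, 0.2])
sigma = np.array([0.1, 0.1, 0.1])
prior = np.array([0.1, 0.8, 0.1])
ei = expected_improvement(mu, sigma, best=0.4)

early = prior_weighted_acquisition(ei, prior, beta=1.0, n=1)
late = prior_weighted_acquisition(ei, prior, beta=1.0, n=10**6)
print(int(np.argmax(early)))  # 1
```

Because the exponent shrinks toward zero as `n` grows, the weighted acquisition converges to plain expected improvement, which is how this family of methods keeps exploring even when the prior is misleading.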

10 retrieved papers

BioBO framework combining multimodal embeddings and enrichment priors

BioBO is a unified framework that combines multimodal gene representations with enrichment-analysis-based priors to guide perturbation design. The method improves labeling efficiency by 25-40% and provides mechanistic interpretability by linking designs to biologically coherent regulatory circuits.

6 retrieved papers
