AutoBio: A Simulation and Benchmark for Robotic Automation in Digital Biology Laboratory

ICLR 2026 Conference Submission · Anonymous Authors
Keywords: robotics, robot learning, vision-language-action model, biology experimental operation, AI for science
Abstract:

Vision-language-action (VLA) models have shown promise as generalist robotic policies by jointly leveraging visual, linguistic, and proprioceptive modalities to generate action trajectories. While recent benchmarks have advanced VLA research in domestic tasks, professional science-oriented domains remain underexplored. We introduce AutoBio, a simulation framework and benchmark designed to evaluate robotic automation in biology laboratory environments—an application domain that combines structured protocols with demanding precision and multimodal interaction. AutoBio extends existing simulation capabilities through a pipeline for digitizing real-world laboratory instruments, specialized physics plugins for mechanisms ubiquitous in laboratory workflows, and a rendering stack that supports dynamic instrument interfaces and transparent materials through physically based rendering. Our benchmark comprises biologically grounded tasks spanning three difficulty levels, enabling standardized evaluation of language-guided robotic manipulation in experimental protocols. We provide infrastructure for demonstration generation and seamless integration with VLA models. Baseline evaluations with state-of-the-art VLA models reveal significant gaps in precision manipulation, visual reasoning, and instruction following in scientific workflows. By releasing AutoBio, we aim to catalyze research on generalist robotic systems for complex, high-precision, and multimodal professional environments.

Disclaimer
This report is AI-generated using large language models and WisPaper (a scholarly search engine). It analyzes an academic paper's tasks and contributions against retrieved prior work. While the system identifies potential overlaps and novel directions, its coverage is not exhaustive and its judgments are approximate. These results are intended to assist human reviewers and should not be relied upon as a definitive verdict on novelty.
Note that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper introduces AutoBio, a simulation framework and benchmark for evaluating vision-language-action models in biology laboratory automation. Within the taxonomy, it occupies the 'Simulation and Benchmarking Frameworks' leaf under 'Software and Computational Infrastructure'. Notably, this leaf contains only one paper—AutoBio itself—indicating a sparse research direction. The broader parent branch includes six papers on AI/ML for laboratory automation and six on workflow orchestration, but no other work explicitly focuses on simulation-based benchmarking for robotic biology tasks.

The taxonomy reveals that most related work concentrates on physical robotic platforms (eight general-purpose systems, four specialized systems) or AI-driven workflow tools (six papers applying LLMs and machine learning to protocol generation). AutoBio bridges these areas by providing a virtual testbed for evaluating VLA models before physical deployment. Its closest conceptual neighbors are AI/ML papers like 'LLMs Robotic Scripts' and 'Design Build Test Learn', which explore computational approaches to laboratory automation but do not offer standardized simulation environments or benchmarks for systematic evaluation.

Among thirty candidates examined across three contributions, none were identified as clearly refuting AutoBio's claims. The simulator contribution examined ten candidates with zero refutable overlaps; the benchmark contribution similarly found no prior work providing biologically grounded VLA evaluation tasks in simulated laboratory settings; and the systematic VLA evaluation examined ten candidates without encountering existing assessments of vision-language-action models in scientific domains. This suggests that within the limited search scope, the combination of biology-specific simulation infrastructure, standardized benchmarking tasks, and VLA model evaluation represents a relatively unexplored intersection.

The analysis reflects a top-30 semantic search plus citation expansion, not an exhaustive literature review. While the taxonomy shows active research in physical laboratory robotics and AI-driven protocol generation, the specific niche of simulation-based benchmarking for VLA models in biology appears underrepresented. The absence of sibling papers in the same taxonomy leaf and the lack of refutable candidates across all contributions suggest novelty within the examined scope, though broader searches in robotics simulation or general VLA benchmarking domains may reveal additional context.

Taxonomy

Core-task taxonomy papers: 50
Claimed contributions: 3
Contribution candidate papers compared: 30
Refutable papers: 0

Research Landscape Overview

Core task: robotic automation in digital biology laboratory environments.

The field encompasses a diverse set of concerns spanning physical hardware, from liquid handlers and mobile robots to specialized single-cell manipulators, and the software infrastructure needed to orchestrate these systems. The taxonomy reflects this breadth through five main branches: Robotic Hardware and Physical Automation Systems addresses mechanical platforms and instrumentation (e.g., Physical Laboratory Automation[1], Modular Robotic Platform[14]); Software and Computational Infrastructure covers middleware, simulation tools, and data management layers (e.g., Property Graph Metadata[12], Genesis DB[24]); Application Domains and Experimental Workflows examines domain-specific deployments in synthetic biology, drug discovery, and diagnostics (e.g., Cell Free Biosensors[20], Accelerating Drug Discovery[28]); Sociotechnical and Organizational Perspectives explores human factors and adoption barriers (e.g., Researchers Perceptions Automation[27]); and Cross-Cutting Reviews and Surveys synthesizes overarching trends (e.g., Self-Driving Labs[32], Robotics Revolutionizing Research[2]).

A complementary line of work concerns simulation and benchmarking frameworks that let researchers prototype and validate automated workflows before deploying them on expensive physical equipment. AutoBio[0] sits squarely within the Software and Computational Infrastructure branch, providing a simulation environment for digital biology experiments. Its emphasis on virtual testbeds contrasts with hardware-centric efforts like RoboCulture[3] and Mobile Robots Workflows[4], which prioritize physical integration and real-world deployment. Meanwhile, works such as LLMs Robotic Scripts[8] and Design Build Test Learn[9] explore how computational tools, including large language models and closed-loop optimization, can streamline protocol generation and experimental iteration.

The interplay between simulation platforms like AutoBio[0] and these emerging AI-driven approaches highlights an open question: how to balance the fidelity of virtual models against the practical constraints of wet-lab execution, so that insights from simulation translate reliably into reproducible biological discoveries.

Claimed Contributions

AutoBio simulator for biology laboratory environments

The authors develop a specialized simulation framework that extends existing capabilities through a pipeline for digitizing real-world laboratory instruments using 3D Gaussian Splatting, custom physics plugins for laboratory-specific mechanisms (threads, detents, eccentric mechanisms, and quasi-static liquids), and a rendering stack that supports dynamic instrument interfaces and transparent materials via physically based rendering. An illustrative sketch of what one such mechanism plugin might compute appears below.

10 retrieved papers
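
The report does not include plugin code, so as a rough, hypothetical illustration of what one laboratory-mechanism plugin might compute, the Python sketch below models a detent dial (the click-stop mechanism on a pipette's volume knob) as a periodic restoring torque, together with the thread coupling that converts screw rotation into linear travel. All names, signatures, and constants here are assumptions for illustration, not AutoBio's actual API.

    import math

    def detent_torque(angle, angular_velocity, n_notches=8, stiffness=0.5, damping=0.02):
        # Periodic restoring torque that snaps a dial toward its nearest notch,
        # producing the click-stop feel of a detent mechanism. Hypothetical model.
        pitch = 2.0 * math.pi / n_notches             # angular spacing between notches
        nearest_notch = round(angle / pitch) * pitch  # closest stable notch angle
        return -stiffness * (angle - nearest_notch) - damping * angular_velocity

    def thread_travel(angle, thread_pitch_mm=0.5):
        # Kinematic thread (screw) coupling: one full revolution advances the
        # screw by one thread pitch, e.g. when tightening a tube cap.
        return thread_pitch_mm * angle / (2.0 * math.pi)

    print(detent_torque(angle=0.3, angular_velocity=0.0))  # -0.15: pulls dial back to notch 0
    print(thread_travel(angle=4.0 * math.pi))              # 1.0 mm of travel after two turns

In an engine such as MuJoCo, a force model of this kind would typically be registered as a passive-force callback evaluated at every simulation step; the eccentric-mechanism and quasi-static-liquid plugins described above would presumably follow the same pattern with different force models.
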
AutoBio benchmark with biologically grounded tasks

The authors introduce a benchmark consisting of 16 tasks across three difficulty levels (easy, medium, hard) that evaluate robotic automation in laboratory protocols. The benchmark includes infrastructure for demonstration generation and seamless integration with VLA models, enabling standardized evaluation of precision control, instruction following, and visual reasoning in scientific workflows. An illustrative sketch of a possible task specification and evaluation loop appears below.

10 retrieved papers
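
The task suite itself is not reproduced in this report; the sketch below shows, under assumed interfaces, how such a benchmark is commonly exposed: a declarative task specification plus an evaluation loop that scores a language-conditioned policy by per-task success rate. TaskSpec, env_factory, policy.act, and env.step are hypothetical names, not AutoBio's published API.

    from dataclasses import dataclass

    @dataclass
    class TaskSpec:
        task_id: str      # e.g. "pipette_liquid_transfer" (hypothetical name)
        difficulty: str   # "easy" | "medium" | "hard"
        instruction: str  # natural-language protocol step given to the policy
        max_steps: int    # episode horizon

    def evaluate(policy, env_factory, tasks, episodes_per_task=20):
        # Roll the policy out on every task and report per-task success rates.
        results = {}
        for task in tasks:
            successes = 0
            for _ in range(episodes_per_task):
                env = env_factory(task)              # fresh, randomized scene
                obs = env.reset()
                success = False
                for _ in range(task.max_steps):
                    action = policy.act(obs, task.instruction)
                    obs, done, success = env.step(action)
                    if done:
                        break
                successes += int(success)
            results[task.task_id] = successes / episodes_per_task
        return results
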
Systematic evaluation revealing VLA model limitations in scientific settings

The authors conduct comprehensive evaluations of state-of-the-art VLA models (π0, π0.5, and RDT) on the AutoBio benchmark, systematically identifying critical limitations of current approaches, including cross-modal grounding errors, visual reasoning failures, and a lack of closed-loop recovery in contact-rich tasks, and suggesting directions for future improvements in model architecture and training methodology. An illustrative sketch of per-difficulty result aggregation appears below.

10 retrieved papers
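
Comparing models such as π0, π0.5, and RDT then amounts to running an evaluation loop like the sketch above once per model and aggregating scores by difficulty. The helper below (again hypothetical, reusing the TaskSpec fields assumed earlier) averages per-task success rates within each difficulty level, the kind of summary that localizes where precision control and instruction following break down.

    from collections import defaultdict

    def summarize_by_difficulty(results, tasks):
        # Average per-task success rates within each difficulty level.
        buckets = defaultdict(list)
        for task in tasks:
            buckets[task.difficulty].append(results[task.task_id])
        return {level: sum(rates) / len(rates) for level, rates in buckets.items()}

    # Usage (names hypothetical): one evaluate() pass per model, then tabulate.
    # for name, policy in {"pi0": ..., "pi0.5": ..., "rdt": ...}.items():
    #     print(name, summarize_by_difficulty(evaluate(policy, make_env, tasks), tasks))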

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Within the taxonomy built over the current top-K core-task papers, the original paper is assigned to a leaf with no direct siblings and no cousin branches under the same grandparent topic. In this retrieved landscape it appears structurally isolated, which is a partial signal of novelty, though one constrained by search coverage and taxonomy granularity.

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1: AutoBio simulator for biology laboratory environments. Ten candidate papers were examined; none were found to refute the claim (see the full description under Claimed Contributions above).

Contribution 2: AutoBio benchmark with biologically grounded tasks. Ten candidate papers were examined; none provides biologically grounded VLA evaluation tasks in simulated laboratory settings.

Contribution 3: Systematic evaluation revealing VLA model limitations in scientific settings. Ten candidate papers were examined; none offers an existing assessment of vision-language-action models in scientific domains.
