Minimal life by computer | Nature Biotechnology
Progress toward a true virtual cell will depend on uniting AI’s pattern-finding power with the causal rigor of mechanistic models.
For decades, biologists have pursued the ambitious goal of constructing a ‘virtual cell’: a computational model capable of reproducing the behavior of a living organism from its molecular components [1]. Such a model would allow experiments, design and optimization to happen in silico, saving researchers time and money. A recent paper in Cell presented one of the most detailed mechanistic whole-cell simulations ever built: the complete cell cycle of a minimal bacterium, JCVI-syn3A [2]. To build the computational models for JCVI-syn3A, the authors incorporated everything known about it (biochemical reaction networks, gene expression patterns, spatial cell structure and molecular dynamics), which allowed them to visualize chromosomal replication and segregation, and the heterogeneity of these processes, across 50 replicate models. While this is an impressive computational feat, it will still be many years before a virtual cell is truly useful to biologists.
A traditional mechanistic virtual cell model is a bottom-up, equation-based simulation of cellular processes, built from known biological mechanisms to predict how cells behave under different conditions. However, mechanistic simulations are difficult to scale beyond relatively simple organisms: JCVI-syn3A is a synthetic bacterium with only 493 genes that is easy to manipulate and study. Another vision of the virtual cell is now emerging from advances in AI. Rather than assembling the cell from known biochemical mechanisms, AI models learn cellular behavior directly from large-scale transcriptomics, proteomics and imaging datasets. By training on these data, they can learn statistical representations of cellular states without requiring every underlying mechanism to be explicitly specified. In principle, such approaches could scale rapidly across organisms and conditions, enabling the construction of predictive virtual cells for many biological systems rather than a single carefully reconstructed organism. The trade-off, however, is that models learned from large-scale data may lack mechanistic transparency.
A functional virtual cell would be a powerful tool for biotechnology, whether it were an AI or a mechanistic model. Researchers could test metabolic engineering strategies in microbes such as Escherichia coli or Saccharomyces cerevisiae before building them, simulating growth and production yields and identifying the best genetic edits for applications such as biofuel manufacture. They could also predict drug toxicity by simulating drug perturbations across pathways. In drug discovery, instead of screening thousands of compounds experimentally, companies could first screen for cellular responses in silico. For genome editing with tools like CRISPR, virtual cell models could forecast off-target effects and would be especially valuable for complex edits involving multiple genes. They could also be used to optimize engineered cells and to model disease states for precision medicine.
These applications are still far in the future, mainly because, even in well-studied organisms, a large fraction of molecular and protein functions are poorly understood. Many kinetic parameters for enzymes are missing, and regulatory interactions are not completely mapped. Fully mechanistic models rely on accurate biochemical rules and require thousands of parameters, such as reaction rates and binding affinities, to be known or estimated. Small parameter errors can propagate through a model and produce unrealistic behavior.
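To make the point about error propagation concrete, here is a minimal sketch in Python. It assumes a toy two-stage gene expression model (constant transcription, first-order translation and decay) that is not taken from the JCVI-syn3A work; the rate values and function names are hypothetical and chosen only for illustration.

```python
def steady_state_protein(k_tx, k_tl, d_m, d_p):
    """Steady-state protein level of a toy two-stage expression model.

    dm/dt = k_tx - d_m * m        (mRNA: made at k_tx, decays at d_m)
    dp/dt = k_tl * m - d_p * p    (protein: made at k_tl per mRNA, decays at d_p)
    Setting both derivatives to zero gives p* = k_tx * k_tl / (d_m * d_p).
    """
    return (k_tx * k_tl) / (d_m * d_p)


# Hypothetical nominal parameter values (arbitrary units), for illustration only.
NOMINAL = {"k_tx": 2.0, "k_tl": 10.0, "d_m": 0.2, "d_p": 0.05}


def worst_case_relative_error(rel_err):
    """Relative error in p* when every parameter is off by rel_err
    in the direction that inflates the prediction."""
    perturbed = {
        "k_tx": NOMINAL["k_tx"] * (1 + rel_err),  # synthesis rates overestimated
        "k_tl": NOMINAL["k_tl"] * (1 + rel_err),
        "d_m": NOMINAL["d_m"] * (1 - rel_err),    # decay rates underestimated
        "d_p": NOMINAL["d_p"] * (1 - rel_err),
    }
    true_value = steady_state_protein(**NOMINAL)
    predicted = steady_state_protein(**perturbed)
    return (predicted - true_value) / true_value


if __name__ == "__main__":
    for err in (0.01, 0.05, 0.10):
        print(f"{err:.0%} parameter error -> "
              f"{worst_case_relative_error(err):.1%} output error")
```

In this toy case, four parameters that are each off by 10% can combine into an error of roughly 49% in the predicted protein level, because the errors multiply. A whole-cell model with thousands of coupled parameters may compound errors in less predictable ways.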
Whole-cell mechanistic simulations are also computationally intensive, and AI models would be even more so. More cell-specific data are needed to train AI models. Additionally, real biological systems show cell-to-cell variability: two identical cells could behave differently because of stochastic gene expression or environmental differences, and realistic simulations would need to capture this. Mechanistic models can represent this variability well, and parts of the JCVI-syn3A model do incorporate stochasticity.
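As an illustration of how stochastic gene expression alone can make identical cells diverge, the sketch below runs a Gillespie-style stochastic simulation of a toy mRNA birth-death process in 50 replicate 'cells' with identical parameters. This is an assumed textbook model, not the method used in the JCVI-syn3A simulation, and all names and rate values are hypothetical.

```python
import random


def simulate_mrna_count(k_tx, d_m, t_end, rng):
    """Gillespie simulation of a toy birth-death process for one mRNA species.

    Transcription occurs at constant rate k_tx; each mRNA decays at rate d_m.
    Returns the mRNA copy number at time t_end, starting from zero copies.
    """
    t, m = 0.0, 0
    while True:
        birth = k_tx        # propensity of producing one mRNA
        death = d_m * m     # propensity of degrading one mRNA
        total = birth + death
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            return m
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < birth:
            m += 1
        else:
            m -= 1


# Hypothetical parameters (arbitrary units); every replicate uses the same values.
rng = random.Random(0)  # fixed seed so the run is reproducible
counts = [simulate_mrna_count(k_tx=1.0, d_m=0.1, t_end=50.0, rng=rng)
          for _ in range(50)]

if __name__ == "__main__":
    mean = sum(counts) / len(counts)
    print(f"mRNA copies across 50 identical cells: "
          f"min={min(counts)}, max={max(counts)}, mean={mean:.1f}")
```

Even though every replicate shares the same rates, the final copy numbers differ from cell to cell. A deterministic rate equation with the same parameters would instead give every cell the same value, approaching k_tx / d_m.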
These limitations have not stopped companies and researchers from buying into the idea. The Virtual Cell Challenge, hosted by the Arc Institute and concluded at the end of last year, had thousands of submissions across 14 countries. In the middle of 2025, the Arc Institute also introduced its first-generation AI virtual cell model, State [3], which was trained on data from 170 million cells and single-cell perturbational data from over 100 million cells across 70 different cell lines. SciLifeLab recently announced the Alpha Cell project, which aims to build a predictive AI cell model that leverages the Human Protein Atlas and spatial cell data across time. The Chan Zuckerberg Initiative and NVIDIA launched the Virtual Cells Platform late last year, which is focused on scalability of data for virtual cell model development and deployment. Unsurprisingly, Google DeepMind is also showing interest.
In many ways, the quest to generate a virtual cell is reminiscent of the beginning of the Human Cell Atlas (HCA) project ten years ago. When the HCA was proposed, cataloguing all human cell types in a single resource and generating visual atlases of each tissue was ambitious, although there was hope that such a map would transform biology and medicine. The technologies needed to create the HCA were not available; they had to be developed specifically for this project. Single-cell sequencing methods were noisy and inconsistent; cell types had to be defined; it was expensive. Substantial effort had to be made to standardize protocols and integrate complex data, and the analysis challenge was just as important as the biology. Tissue atlases that have been generated to date have identified new cell types, revealed how cells change over time, and improved our understanding of diseases [4].
Generating truly useful virtual cells will not be easy and will take time, collaboration and computing power. Along the way, new tools will be developed and new biology will be discovered. As the HCA has shown, the project does not need to be complete for its discoveries to make a difference to patients' lives or to biomanufacturing. The major companies and research efforts above are all focused on AI models, but what they are calling ‘virtual cell’ models are not representations (yet) of an entire cell. They may predict transcriptomic responses to a stimulus or drug, or protein translation, but they do not capture full cellular responses and pathways as the JCVI-syn3A work does. Complete understanding of cellular response across a range of conditions and cell types will require integration of both mechanistic and AI approaches.



