Sequence Display enables large-scale sequence–activity datasets for rapid protein evolution

Arnold, F. H. Design by directed evolution. Acc. Chem. Res. 31, 125–131 (1998).
Packer, M. S. & Liu, D. R. Methods for the directed evolution of proteins. Nat. Rev. Genet. 16, 379–394 (2015).
Arnold, F. H. Directed evolution: bringing new chemistry to life. Angew. Chem. Int. Ed. Engl. 57, 4143–4148 (2018).
Wang, Y. et al. Directed evolution: methodologies and applications. Chem. Rev. 121, 12384–12444 (2021).
Yuan, T. et al. Biocatalytic synthesis of N-protected α-amino acids through 1,3-nitrogen migration by nonheme iron enzymes. J. Am. Chem. Soc. 147, 44041–44047 (2025).
Smith, G. P. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315–1317 (1985).
McCafferty, J., Griffiths, A. D., Winter, G. & Chiswell, D. J. Phage antibodies: filamentous phage displaying antibody variable domains. Nature 348, 552–554 (1990).
Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 883–891 (2020).
Blum, T. R. et al. Phage-assisted evolution of botulinum neurotoxin proteases with reprogrammed specificity. Science 371, 803–810 (2021).
Doman, J. L. et al. Phage-assisted evolution and protein engineering yield compact, efficient prime editors. Cell 186, 3983–4002.e26 (2023).
Chin, J. W. et al. An expanded eukaryotic genetic code. Science 301, 964–967 (2003).
Chatterjee, A., Xiao, H. & Schultz, P. G. Evolution of multiple, mutually orthogonal prolyl-tRNA synthetase/tRNA pairs for unnatural amino acid mutagenesis in Escherichia coli. Proc. Natl Acad. Sci. USA 109, 14841–14846 (2012).
Chatterjee, A., Xiao, H., Yang, P.-Y., Soundararajan, G. & Schultz, P. G. A tryptophanyl-tRNA synthetase/tRNA pair for unnatural amino acid mutagenesis in E. coli. Angew. Chem. Int. Ed. Engl. 52, 5106–5109 (2013).
Xiao, H., Xuan, W., Shao, S., Liu, T. & Schultz, P. G. Genetic incorporation of ε-N-2-hydroxyisobutyryl-lysine into recombinant histones. ACS Chem. Biol. 10, 1599–1603 (2015).
Xiao, H. et al. Exploring the potential impact of an expanded genetic code on protein function. Proc. Natl Acad. Sci. USA 112, 6961–6966 (2015).
Xiao, H. & Schultz, P. G. At the interface of chemical and biological synthesis: an expanded genetic code. Cold Spring Harb. Perspect. Biol. 8, a023945 (2016).
Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).
Fowler, D. M. et al. High-resolution mapping of protein sequence–function relationships. Nat. Methods 7, 741–746 (2010).
Meier, G. et al. Deep mutational scan of a drug efflux pump reveals its structure–function landscape. Nat. Chem. Biol. 19, 440–450 (2023).
Raguram, A., An, M., Chen, P. Z. & Liu, D. R. Directed evolution of engineered virus-like particles with improved production and transduction efficiencies. Nat. Biotechnol. 43, 1635–1647 (2025).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).
Ruffolo, J. A. & Madani, A. Designing proteins with language models. Nat. Biotechnol. 42, 200–202 (2024).
Su, J. et al. SaProt: protein language modeling with structure-aware vocabulary. Preprint at bioRxiv https://doi.org/10.1101/2023.10.01.560349 (2024).
Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025).
Su, J. et al. Democratizing protein language model training, sharing and collaboration. Nat. Biotechnol. https://doi.org/10.1038/s41587-025-02859-7 (2025).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Zhou, X. et al. I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat. Protoc. 17, 2326–2353 (2022).
Zheng, W. et al. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat. Methods 21, 279–289 (2024).
Rapp, J. T., Bremer, B. J. & Romero, P. A. Self-driving laboratories to autonomously navigate the protein fitness landscape. Nat. Chem. Eng. 1, 97–107 (2024).
Jiang, K. et al. Rapid in silico directed evolution by a protein language model with EVOLVEpro. Science 387, eadr6006 (2024).
Shanker, V. R., Bruun, T. U. J., Hie, B. L. & Kim, P. S. Unsupervised evolution of protein and antibody complexes with a structure-informed language model. Science 385, 46–53 (2024).
He, Y. et al. Protein language models-assisted optimization of a uracil-N-glycosylase variant enables programmable T-to-G and T-to-C base editing. Mol. Cell 84, 1257–1270 (2024).
Hollmann, N. et al. Accurate predictions on small data with a tabular foundation model. Nature 637, 319–326 (2025).
Yeh, A. H.-W. et al. De novo design of luciferases using deep learning. Nature 614, 774–780 (2023).
Kortemme, T. De novo protein design—from new structures to programmable functions. Cell 187, 526–544 (2024).
Lu, L. et al. De novo design of drug-binding proteins with predictable binding energy and specificity. Science 384, 106–112 (2024).
Vázquez Torres, S. et al. De novo designed proteins neutralize lethal snake venom toxins. Nature 639, 225–231 (2025).
Freschlin, C. R., Fahlberg, S. A. & Romero, P. A. Machine learning to navigate fitness landscapes for protein engineering. Curr. Opin. Biotechnol. 75, 102713 (2022).
Ye, L. et al. Glycosylase-based base editors for efficient T-to-G and C-to-G editing in mammalian cells. Nat. Biotechnol. 42, 1538–1547 (2024).
Tong, H. et al. Development of deaminase-free T-to-S base editor and C-to-G base editor by engineered human uracil DNA glycosylase. Nat. Commun. 15, 4897 (2024).
Hie, B. L. et al. Efficient evolution of human antibodies from general protein language models. Nat. Biotechnol. 42, 275–283 (2024).
Tang, W. & Liu, D. R. Rewritable multi-event analog recording in bacterial and mammalian cells. Science 360, eaap8992 (2018).
Yu, Y. et al. Cytosine base editors with minimized unguided DNA and RNA off-target events and high on-target activity. Nat. Commun. 11, 2052 (2020).
Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).
Mol, C. D. et al. Crystal structure of human uracil-DNA glycosylase in complex with a protein inhibitor: protein mimicry of DNA. Cell 82, 701–708 (1995).
Wang, L. et al. Enhanced base editing by co-expression of free uracil DNA glycosylase inhibitor. Cell Res. 27, 1289–1292 (2017).
Huang, Y. et al. Genetic code expansion: recent developments and emerging applications. Chem. Rev. 125, 523–598 (2025).
Osgood, A. O., Huang, Z., Szalay, K. H. & Chatterjee, A. Strategies to expand the genetic code of mammalian cells. Chem. Rev. 125, 2474–2501 (2025).
Hu, Z. et al. Discovery and engineering of small SlugCas9 with broad targeting range and high specificity and activity. Nucleic Acids Res. 49, 4008–4019 (2021).
Seo, S.-Y. et al. Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s. Nat. Methods 20, 999–1009 (2023).
Qi, T. et al. Phage-assisted evolution of compact Cas9 variants targeting a simple NNG PAM. Nat. Chem. Biol. 20, 344–352 (2024).
Putnam, C. D. et al. Protein mimicry of DNA from crystal structures of the uracil-DNA glycosylase inhibitor protein and its complex with Escherichia coli uracil-DNA glycosylase. J. Mol. Biol. 287, 331–346 (1999).
Karzai, A. W., Roche, E. D. & Sauer, R. T. The SsrA–SmpB system for protein tagging, directed degradation and ribosome rescue. Nat. Struct. Biol. 7, 449–455 (2000).
Klimecka, M. M. et al. A uniform benchmark for testing SsrA-derived degrons in the Escherichia coli ClpXP degradation pathway. Molecules 26, 5936 (2021).
Thuronyi, B. W. et al. Continuous evolution of base editors with expanded target compatibility and improved activity. Nat. Biotechnol. 37, 1070–1079 (2019).
Neugebauer, M. E. et al. Evolution of an adenine base editor into a small, efficient cytosine base editor with low off-target activity. Nat. Biotechnol. 41, 673–685 (2023).
Zhang, E., Neugebauer, M. E., Krasnow, N. A. & Liu, D. R. Phage-assisted evolution of highly active cytosine base editors with enhanced selectivity and minimal sequence context preference. Nat. Commun. 15, 1697 (2024).
Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Horvath, P. et al. Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus. J. Bacteriol. 190, 1401–1412 (2008).
Hou, Z. et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl Acad. Sci. USA 110, 15644–15649 (2013).
Harrington, L. B. et al. A thermostable Cas9 with increased lifetime in human plasma. Nat. Commun. 8, 1424 (2017).
Agudelo, D. et al. Versatile and robust genome editing with Streptococcus thermophilus CRISPR1–Cas9. Genome Res. 30, 107–117 (2020).
Legut, M. et al. High-throughput screens of PAM-flexible Cas9 variants for gene knockout and transcriptional modulation. Cell Rep. 30, 2859–2868 (2020).
Chen, L. et al. Re-engineering the adenine deaminase TadA-8e for efficient and specific CRISPR-based cytosine base editing. Nat. Biotechnol. 41, 663–672 (2023).
Yan, H. & Tang, W. Programmed RNA editing with an evolved bacterial adenosine deaminase. Nat. Chem. Biol. 20, 1361–1370 (2024).
Xiao, Y.-L., Wu, Y. & Tang, W. An adenine base editor variant expands context compatibility. Nat. Biotechnol. 42, 1442–1453 (2024).
Ibba, M. & Söll, D. Aminoacyl-tRNA synthesis. Annu. Rev. Biochem. 69, 617–650 (2000).
Liu, C. C. & Schultz, P. G. Adding new chemistries to the genetic code. Annu. Rev. Biochem. 79, 413–444 (2010).
Chin, J. W. Expanding and reprogramming the genetic code of cells and animals. Annu. Rev. Biochem. 83, 379–408 (2014).
Guo, Y. et al. Biosynthesis of halogenated tryptophans for protein engineering using genetic code expansion. ChemBioChem 25, e202400366 (2024).
Hu, Y. et al. Biosynthesis of unnatural cyclodipeptides through genetic code expansion and cyclodipeptide synthase evolution. J. Am. Chem. Soc. 147, 34517–34526 (2025).
Hu, Y. et al. Engineering unnatural cells with a 21st amino acid as a living epigenetic sensor. Nat. Commun. 16, 9388 (2025).
Cheng, L., Wang, Y., Guo, Y., Zhang, S. S. & Xiao, H. Advancing protein therapeutics through proximity-induced chemistry. Cell Chem. Biol. 31, 428–445 (2024).
Chen, Y. et al. Unleashing the potential of noncanonical amino acid biosynthesis to create cells with precision tyrosine sulfation. Nat. Commun. 13, 5434 (2022).
Zhang, M. et al. Harnessing nature-inspired catechol amino acid to engineer sticky proteins and bacteria. Small Methods 8, 2400230 (2024).
Yang, S. et al. Real-time imaging of protein microenvironment changes in cells with rotor-based fluorescent amino acids. Nat. Chem. Biol. 22, 97–108 (2026).
Bryson, D. I. et al. Continuous directed evolution of aminoacyl-tRNA synthetases. Nat. Chem. Biol. 13, 1253–1260 (2017).
Wilkins, B. J. et al. Genetically encoding lysine modifications on histone H4. ACS Chem. Biol. 10, 939–944 (2015).
Miao, H., Yu, C., Yao, A. & Xuan, W. Rational design of a function-based selection method for genetically encoding acylated lysine derivatives. Org. Biomol. Chem. 17, 6127–6130 (2019).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
Smola, M. J., Rice, G. M., Busan, S., Siegfried, N. A. & Weeks, K. M. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nat. Protoc. 10, 1643–1669 (2015).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Chen, S. Ultrafast one-pass FASTQ data preprocessing, quality control, and deduplication using fastp. iMeta 2, e107 (2023).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
Cheng, L., Zheng, C. & Jiang, S. SophieSarceau/SequenceDisplay-ML: SequenceDisplay-ML. Zenodo https://doi.org/10.5281/zenodo.18850384 (2026).
Cheng, L., Ding, H., Jiang, S., Zheng, X. & Xiao, H. Raw Illumina sequencing data for the large-scale 5NNK SlugCas9 sequence–activity dataset. Zenodo https://doi.org/10.5281/zenodo.18839434 (2026).



