Customizing CRISPR–Cas PAM specificity with protein language models

Collias, D. & Beisel, C. L. CRISPR technologies and the search for the PAM-free nuclease. Nat. Commun. 12, 555 (2021).
Karvelis, T. et al. Rapid characterization of CRISPR–Cas9 protospacer adjacent motif sequence elements. Genome Biol. 16, 253 (2015).
Gasiunas, G. et al. A catalogue of biochemically diverse CRISPR–Cas9 orthologs. Nat. Commun. 11, 5512 (2020).
Yan, W. X. et al. Functionally diverse type V CRISPR–Cas systems. Science 363, 88–91 (2019).
Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR–Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
Christie, K. A. et al. Towards personalised allele-specific CRISPR gene editing to treat autosomal dominant disorders. Sci. Rep. 7, 16174 (2017).
Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR–Cas9 variants. Science 368, 290–296 (2020).
Kleinstiver, B. P. et al. Engineered CRISPR–Cas12a variants with increased activities and improved targeting ranges for gene, epigenetic and base editing. Nat. Biotechnol. 37, 276–282 (2019).
Kleinstiver, B. P. et al. Broadening the targeting range of Staphylococcus aureus CRISPR–Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1293–1298 (2015).
Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
Huang, T. P. et al. High-throughput continuous evolution of compact Cas9 variants targeting single-nucleotide-pyrimidine PAMs. Nat. Biotechnol. 41, 96–107 (2023).
Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol. 38, 471–481 (2020).
Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 (2023).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Ruffolo, J. A. et al. Design of highly functional genome editors by modelling CRISPR–Cas sequences. Nature 645, 518–525 (2025).
Meeske, A. J. & Marraffini, L. A. RNA guide complementarity prevents self-targeting in type VI CRISPR systems. Mol. Cell 71, 791–801 (2018).
Marraffini, L. A. & Sontheimer, E. J. CRISPR interference: RNA-directed adaptive immunity in bacteria and archaea. Nat. Rev. Genet. 11, 181–190 (2010).
Camargo, A. P. et al. IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 51, D733–D743 (2023).
Camargo, A. P. et al. IMG/PR: a database of plasmids from genomes and metagenomes with rich annotations and metadata. Nucleic Acids Res. 52, D164–D173 (2024).
Ciciani, M. et al. Automated identification of sequence-tailored Cas9 proteins using massive metagenomic data. Nat. Commun. 13, 6474 (2022).
Adler, B. A. et al. CasPEDIA Database: a functional classification system for class 2 CRISPR–Cas enzymes. Nucleic Acids Res. 52, D590–D596 (2024).
Gleditzsch, D. et al. PAM identification by CRISPR–Cas effector complexes: diversified mechanisms and structures. RNA Biol. 16, 504–517 (2019).
Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573 (2014).
Ruffolo, J. A. et al. Adapting protein language models for structure-conditioned design. Preprint at bioRxiv https://doi.org/10.1101/2024.08.03.606485 (2024).
Wei, J. et al. Closely related type II-C Cas9 orthologs recognize diverse PAMs. eLife 11, e77825 (2022).
Wimmer, F., Mougiakos, I., Englert, F. & Beisel, C. L. Rapid cell-free characterization of multi-subunit CRISPR effectors and transposons. Mol. Cell 82, 1210–1224 (2022).
Sun, W. et al. Structures of Neisseria meningitidis Cas9 complexes in catalytically poised and anti-CRISPR-inhibited states. Mol. Cell 76, 938–952 (2019).
Huang, X. et al. Decoding CRISPR–Cas PAM recognition with UniDesign. Brief. Bioinform. 24, bbad133 (2023).
Hirano, S., Nishimasu, H., Ishitani, R. & Nureki, O. Structural basis for the altered PAM specificities of engineered CRISPR–Cas9. Mol. Cell 61, 886–894 (2016).
Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).
Schmidheini, L. et al. Continuous directed evolution of a compact CjCas9 variant with broad PAM compatibility. Nat. Chem. Biol. 20, 333–343 (2024).
Amrani, N. et al. NmeCas9 is an intrinsically high-fidelity genome-editing platform. Genome Biol. 19, 1–25 (2018).
Tsui, T. K. M., Hand, T. H., Duboy, E. C. & Li, H. The impact of DNA topology and guide length on target selection by a cytosine-specific Cas9. ACS Synth. Biol. 6, 1103–1113 (2017).
Luscombe, N. M., Laskowski, R. A. & Thornton, J. M. Amino acid-base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Res. 29, 2860–2874 (2001).
Walton, R. T., Hsu, J. Y., Joung, J. K. & Kleinstiver, B. P. Scalable characterization of the PAM requirements of CRISPR–Cas enzymes using HT-PAMDA. Nat. Protoc. 16, 1511–1547 (2021).
Grathwohl, W., Swersky, K., Hashemi, M., Duvenaud, D. & Maddison, C. Oops I took a gradient: scalable sampling for discrete distributions. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) (PMLR, 2021).
Li, W. & Godzik, A. CD-HIT: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
van Dongen, S. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).
Deorowicz, S., Debudaj-Grabysz, A. & Gudyś, A. FAMSA: fast and accurate multiple sequence alignment of huge protein families. Sci. Rep. 6, 33964 (2016).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
Liu, Z. et al. Versatile and efficient genome editing with Neisseria cinerea Cas9. Commun. Biol. 5, 1–7 (2022).
Hand, T. H., Das, A. & Li, H. Directed evolution studies of a thermophilic type II-C Cas9. Methods Enzymol. 616, 265–288 (2019).
Hirano, H. et al. Structure and engineering of Francisella novicida Cas9. Cell 164, 950–961 (2016).
Cui, Z. et al. FrCas9 is a CRISPR/Cas9 system with high editing efficiency and fidelity. Nat. Commun. 13, 1425 (2022).
Kim, E. et al. In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8, 14500 (2017).
Hirano, S. et al. Structural basis for the promiscuous PAM recognition by Corynebacterium diphtheriae Cas9. Nat. Commun. 10, 1968 (2019).
Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).
Zetsche, B., Abudayyeh, O. O., Gootenberg, J. S., Scott, D. A. & Zhang, F. A survey of genome editing activity for 16 Cas12a orthologs. Keio J. Med. 69, 59–65 (2020).
Strecker, J. et al. Engineering of CRISPR–Cas12b for human genome editing. Nat. Commun. 10, 212 (2019).
Harrington, L. B. et al. A scoutRNA is required for some type V CRISPR–Cas systems. Mol. Cell 79, 416–424 (2020).
Burstein, D. et al. New CRISPR–Cas systems from uncultivated microbes. Nature 542, 237–241 (2017).
Karvelis, T. et al. PAM recognition by miniature CRISPR–Cas12f nucleases triggers programmable double-stranded DNA target cleavage. Nucleic Acids Res. 48, 5016 (2020).
Wang, Y. et al. A highly specific CRISPR–Cas12j nuclease enables allele-specific genome editing. Sci. Adv. 9, eabo6405 (2023).
Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48–53 (2019).
Urbaitis, T. et al. A new family of CRISPR-type V nucleases with C-rich PAM recognition. EMBO Rep. 23, e55481 (2022).
Wu, W. Y. et al. The miniature CRISPR–Cas12m effector binds DNA to block transcription. Mol. Cell 82, 4487–4502 (2022).
Al-Shayeb, B. et al. Diverse virus-encoded CRISPR–Cas systems include streamlined genome editors. Cell 185, 4574–4586 (2022).
Zhang, Y. et al. Catalytic-state structure and engineering of Streptococcus thermophilus Cas9. Nat. Catal. 3, 813–823 (2020).
Tran, M. H. et al. A more efficient CRISPR–Cas12a variant derived from MA2020. Mol. Ther. Nucleic Acids 24, 40–53 (2021).
Gao, L. et al. Engineered Cpf1 variants with altered PAM specificities. Nat. Biotechnol. 35, 789–792 (2017).
Russel, J., Pinilla-Redondo, R., Mayo-Muñoz, D., Shah, S. A. & Sørensen, S. J. CRISPRCasTyper: automated identification, annotation, and classification of CRISPR–Cas loci. CRISPR J. 3, 462–469 (2020).
Chen, Z. & Zhao, H. A highly sensitive selection method for directed evolution of homing endonucleases. Nucleic Acids Res. 33, e154 (2005).
Rohland, N. & Reich, D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Res. 22, 939–946 (2012).
Gooden, A. A., Evans, C. N., Sheets, T. P., Clapp, M. E. & Chari, R. dbGuide: a database of functionally validated guide RNAs for genome editing in human and mouse cells. Nucleic Acids Res. 49, D871–D876 (2020).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).




