Manufacturing-aware generative models enable petascale synthesis of designed DNA

  • Russ, W. P. et al. An evolution-based model for designing chorismate mutase enzymes. Science 369, 440–445 (2020).

  • Shin, J.-E. et al. Protein design and variant prediction using autoregressive generative models. Nat. Commun. 12, 2403 (2021).

  • Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023).

  • Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  • Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

  • Hopf, T. A. et al. Mutation effects predicted from sequence co-variation. Nat. Biotechnol. 35, 128–135 (2017).

  • Weinstein, E. N., Amin, A. N., Frazer, J. & Marks, D. S. Non-identifiability and the blessings of misspecification in models of molecular fitness. In Proc. 36th International Conference on Neural Information Processing Systems (ed. Koyejo, S. et al.) (ACM, 2022).

  • Kosuri, S. & Church, G. M. Large-scale de novo DNA synthesis: technologies and applications. Nat. Methods 11, 499–507 (2014).

  • Weinstein, E. N. et al. Optimal design of stochastic DNA synthesis protocols based on generative sequence models. In Proc. 25th International Conference on Artificial Intelligence and Statistics (ed. Camps-Valls, G. et al.) (PMLR, 2022).

  • Li, J. Q. & Barron, A. R. Mixture density estimation. In Proc. 12th International Conference on Neural Information Processing Systems (ed. Kearns, M. J. et al.) (ACM, 1999).

  • Richardson, E. & Weiss, Y. On GANs and GMMs. In Proc. 32nd International Conference on Neural Information Processing Systems (ed. Bengio, S. et al.) (ACM, 2022).

  • Olsen, T. H., Boyles, F. & Deane, C. M. Observed Antibody Space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences. Protein Sci. 31, 141–146 (2022).

  • Olsen, T. H., Moal, I. H. & Deane, C. M. Addressing the antibody germline bias and its effect on language models for improved antibody design. Bioinformatics 40, btae618 (2024).

  • Amin, A. N., Weinstein, E. N. & Marks, D. S. A generative nonparametric Bayesian model for whole genomes. In Proc. 35th International Conference on Neural Information Processing Systems (ed. Ranzato, M. et al.) (ACM, 2021).

  • Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B. & Smola, A. A kernel two-sample test. J. Mach. Learn. Res. 13, 723–773 (2012).

  • Amin, A. N., Marks, D. S. & Weinstein, E. N. Biological sequence kernels with guaranteed flexibility. J. Mach. Learn. Res. 26, 1–63 (2025).

  • Shuai, R. W., Ruffolo, J. A. & Gray, J. J. IgLM: infilling language modeling for antibody sequence design. Cell Syst. 14, 979–989.e4 (2023).

  • Amin, A. N., Weinstein, E. N. & Marks, D. S. A kernelized Stein discrepancy for biological sequences. In Proc. 40th International Conference on Machine Learning (ed. Krause, A. et al.) (PMLR, 2023).

  • Lloyd, J. R. & Ghahramani, Z. Statistical model criticism using kernel two sample tests. In Proc. 29th International Conference on Neural Information Processing Systems (ed. Cortes, C. et al.) (ACM, 2015).

  • Wermke, M. et al. Autologous T cell therapy for PRAME+ advanced solid tumors in HLA-A*02+ patients: a phase 1 trial. Nat. Med. 31, 2365–2374 (2025).

  • Reynisson, B., Alvarez, B., Paul, S., Peters, B. & Nielsen, M. NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, W449–W454 (2020).

  • Nijkamp, E., Ruffolo, J. A., Weinstein, E. N., Naik, N. & Madani, A. ProGen2: exploring the boundaries of protein language models. Cell Syst. 14, 968–978 (2023).

  • Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

  • Shumailov, I. et al. AI models collapse when trained on recursively generated data. Nature 631, 755–759 (2024).

  • Framework for Nucleic Acid Synthesis Screening (National Science and Technology Council, 2024); https://aspr.hhs.gov/S3/Documents/OSTP-Nucleic-Acid-Synthesis-Screening-Framework-Sep2024.pdf

  • Baker, D. & Church, G. Protein design meets biosecurity. Science 383, 349 (2024).

  • Baum, C. et al. A system capable of verifiably and privately screening global DNA synthesis. Preprint at https://arxiv.org/abs/2403.14023 (2025).

  • Abdali, S., Anarfi, R., Barberan, C. J., He, J. & Shayegani, E. Securing large language models: threats, vulnerabilities and responsible practices. Preprint at https://arxiv.org/abs/2403.12503 (2024).

  • Weinstein, E. N., Slabodkin, A., Gollub, M. G. & Wood, E. B. Accelerated learning on large-scale displays using generative library models. Preprint at https://arxiv.org/abs/2510.16612 (2025).

  • Weinstein, E. N. et al. Acquisition of lifting biomolecular data. Preprint at https://arxiv.org/abs/2512.15984 (2025).

  • Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).

  • Daily, J. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments. BMC Bioinformatics 17, 81 (2016).

  • Jaravine, V., Mösch, A., Raffegerst, S., Schendel, D. J. & Frishman, D. Expitope 2.0: a tool to assess immunotherapeutic antigens for their potential cross-reactivity against naturally expressed proteins in human tissues. BMC Cancer 17, 892 (2017).

  • Vita, R. et al. The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Res. 47, D339–D343 (2019).

  • Huszár, F. & Duvenaud, D. Optimally-weighted herding is Bayesian quadrature. In Proc. 28th Annual Conference on Uncertainty in Artificial Intelligence (ed. de Freitas, N. & Murphy, K.) (ACM, 2012).

  • McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2020).

  • Salimans, T. et al. Improved techniques for training GANs. In Proc. 30th International Conference on Neural Information Processing Systems (ed. Lee, D. D. et al.) (ACM, 2016).

  • Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proc. 31st International Conference on Neural Information Processing Systems (ed. von Luxburg, U. et al.) (ACM, 2017).

  • Lefranc, M.-P. et al. IMGT unique numbering for immunoglobulin and T cell receptor variable domains and Ig superfamily V-like domains. Dev. Comp. Immunol. 27, 55–77 (2003).

  • Shen, S. et al. Probabilistic analysis of the frequencies of amino acid pairs within characterized protein sequences. Physica A 370, 651–662 (2006).

  • Rao, X., Fontaine Costa, A. I. C., van Baarle, D. & Kesmir, C. A comparative study of HLA binding affinity and ligand diversity: implications for generating immunodominant CD8+ T cell responses. J. Immunol. 182, 1526–1532 (2009).

  • Trolle, T. et al. The length distribution of class I-restricted T cell epitopes is determined by both peptide supply and MHC allele-specific binding preference. J. Immunol. 196, 1480–1487 (2016).
