Machine learning enables scalable and systematic hierarchical virus taxonomy

Roux, S. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016).
Google Scholar
Guidi, L. et al. Plankton networks driving carbon export in the oligotrophic ocean. Nature 532, 465–470 (2016).
Google Scholar
Zimmerman, A. E. et al. Metabolic and biogeochemical consequences of viral infection in aquatic ecosystems. Nat. Rev. Microbiol. https://doi.org/10.1038/s41579-019-0270-x (2020).
Google Scholar
Emerson, J. B. et al. Host-linked soil viral ecology along a permafrost thaw gradient. Nat. Microbiol. 3, 870–880 (2018).
Google Scholar
Jansson, J. K. & Wu, R. Soil viral diversity, ecology and climate change. Nat. Rev. Microbiol. 21, 296–311 (2023).
Google Scholar
Koskella, B. & Taylor, T. B. Multifaceted impacts of bacteriophages in the plant microbiome. Annu. Rev. Phytopathol. 56, 361–380 (2018).
Google Scholar
Yan, M. et al. Interrogating the viral dark matter of the rumen ecosystem with a global virome database. Nat. Commun. 14, 5254 (2023).
Google Scholar
Yan, M. & Yu, Z. Viruses contribute to microbial diversification in the rumen ecosystem and are associated with certain animal production traits. Microbiome 12, 82 (2024).
Google Scholar
Shkoporov, A. N. & Hill, C. Bacteriophages of the human gut: the “known unknown” of the microbiome. Cell Host Microbe 25, 195–209 (2019).
Google Scholar
Shkoporov, A. N., Turkington, C. J. & Hill, C. Mutualistic interplay between bacteriophages and bacteria in the human gut. Nat. Rev. Microbiol. 20, 737–749 (2022).
Google Scholar
Walker, P. J. et al. Changes to virus taxonomy and the Statutes ratified by the International Committee on Taxonomy of Viruses (2020). Arch. Virol. 165, 2737–2748 (2020).
Google Scholar
Walker, P. J. et al. Recent changes to virus taxonomy ratified by the International Committee on Taxonomy of Viruses (2022). Arch. Virol. 167, 2429–2440 (2022).
Google Scholar
Zerbini, F. M. et al. Changes to virus taxonomy and the ICTV Statutes ratified by the International Committee on Taxonomy of Viruses (2023). Arch. Virol. 168, 175 (2023).
Google Scholar
Gorbalenya, A. E. et al. The new scope of virus taxonomy: partitioning the virosphere into 15 hierarchical ranks. Nat. Microbiol 5, 668–674 (2020).
Google Scholar
Camargo, A. P. et al. IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 51, D733–D743 (2023).
Google Scholar
Roux, S. et al. Minimum Information about an Uncultivated Virus Genome (MIUViG): a community consensus on standards and best practices for describing genome sequences from uncultivated viruses. Nat. Biotechnol. 37, 29–37 (2018).
Google Scholar
Simmonds, P. et al. Consensus statement: virus taxonomy in the age of metagenomics. Nat. Rev. Microbiol. 15, 161–168 (2017).
Google Scholar
Simmonds, P. et al. Four principles to establish a universal virus taxonomy. PLoS Biol. 21, e3001922 (2023).
Google Scholar
Dutilh, B. E. et al. Perspective on taxonomic classification of uncultivated viruses. Curr. Opin. Virol. 51, 207–215 (2021).
Google Scholar
Koonin, E. V., Senkevich, T. G. & Dolja, V. V. The ancient Virus World and evolution of cells. Biol. Direct 1, 29 (2006).
Google Scholar
Holmes, E. C. What does virus evolution tell us about virus origins? J. Virol. 85, 5247–5251 (2011).
Google Scholar
Koonin, E. V. & Dolja, V. V. Virus World as an evolutionary network of viruses and capsidless selfish elements. Microbiol. Mol. Biol. Rev. 78, 278–303 (2014).
Google Scholar
Moraru, C. VirClust—a tool for hierarchical clustering, core protein detection and annotation of (prokaryotic) viruses. Viruses 15, 1007 (2023).
Google Scholar
Aiewsakun, P. & Simmonds, P. The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification. Microbiome 6, 38 (2018).
Google Scholar
Pons, J. C. et al. VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families. Bioinformatics https://doi.org/10.1093/bioinformatics/btab026 (2021).
Google Scholar
Camargo, A. P. et al. Identification of mobile genetic elements with geNomad. Nat. Biotechnol. 42, 1303–1312 (2024).
Google Scholar
Moraru, C., Varsani, A. & Kropinski, A. M. VIRIDIC—a novel tool to calculate the intergenomic similarities of prokaryote-infecting viruses. Viruses 12, 1268 (2020).
Google Scholar
Bao, Y., Chetvernin, V. & Tatusova, T. Improvements to pairwise sequence comparison (PASC): a genome-based web tool for virus classification. Arch. Virol. 159, 3293–3304 (2014).
Google Scholar
Tisza, M. J., Belford, A. K., Domínguez-Huerta, G., Bolduc, B. & Buck, C. B. Cenote-Taker 2 democratizes virus discovery and sequence annotation. Virus Evol. 7, veaa100 (2021).
Google Scholar
Lima-Mendez, G., Van Helden, J., Toussaint, A. & Leplae, R. Reticulate representation of evolutionary and functional relationships between phage genomes. Mol. Biol. Evol. 25, 762–777 (2008).
Google Scholar
Bolduc, B. et al. vConTACT: an iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. PeerJ 5, e3243 (2017).
Google Scholar
Bin Jang, H. et al. Taxonomic assignment of uncultivated prokaryotic virus genomes is enabled by gene-sharing networks. Nat. Biotechnol. 37, 632–639 (2019).
Google Scholar
Barylski, J. et al. Analysis of Spounaviruses as a case study for the overdue reclassification of tailed phages. Syst. Biol. 69, 110–123 (2020).
Google Scholar
Turner, D. et al. Abolishment of morphology-based taxa and change to binomial species names: 2022 taxonomy update of the ICTV bacterial viruses subcommittee. Arch. Virol. 168, 74 (2023).
Google Scholar
Van Dongen, S. Graph clustering via a discrete uncoupling process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008).
Google Scholar
Gorbalenya, A. E. & Lauber, C. Bioinformatics of virus taxonomy: foundations and tools for developing sequence-based hierarchical classification. Curr. Opin. Virol. 52, 48–56 (2022).
Google Scholar
Wertheim, J. O., Steel, M. & Sanderson, M. J. Accuracy in near-perfect virus phylogenies. Syst. Biol. 71, 426–438 (2022).
Google Scholar
Meier-Kolthoff, J. P. & Göker, M. VICTOR: genome-based phylogeny and classification of prokaryotic viruses. Bioinformatics 33, 3396–3404 (2017).
Google Scholar
Gregory, A. C. et al. Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer. BMC Genomics 17, 930 (2016).
Google Scholar
Bobay, L. & Ochman, H. Biological species in the viral world. Proc. Natl Acad. Sci. USA 115, 6040–6045 (2018).
Google Scholar
Ndovie, W. et al. Exploration of the genetic landscape of bacterial dsDNA viruses reveals an ANI gap amid extensive mosaicism. mSystems https://doi.org/10.1128/msystems.01661-24 (2025).
Cook, R. et al. INfrastructure for a PHAge REference Database: identification of large-scale biases in the current collection of cultured phage genomes. PHAGE 2, 214–223 (2021).
Google Scholar
Nelson, D. Phage taxonomy: we agree to disagree. J. Bacteriol. 186, 7029–7031 (2004).
Google Scholar
Krupovic, M., Quemin, E. R. J., Bamford, D. H., Forterre, P. & Prangishvili, D. Unification of the globally distributed spindle-shaped viruses of the Archaea. J. Virol. 88, 2354–2358 (2014).
Google Scholar
Rokyta, D. R., Burch, C. L., Caudle, S. B. & Wichman, H. A. Horizontal gene transfer and the evolution of microvirid coliphage genomes. J. Bacteriol. 188, 1134–1142 (2006).
Google Scholar
Dominguez-Huerta, G. et al. Diversity and ecological footprint of Global Ocean RNA viruses. Science 376, 1202–1208 (2022).
Google Scholar
Gregory, A. C. et al. Marine DNA viral macro- and microdiversity from Pole to Pole. Cell 177, 1109–1123 (2019).
Google Scholar
Gregory, A. C. et al. The gut virome database reveals age-dependent patterns of virome diversity in the human gut. Cell Host Microbe 28, 724–740 (2020).
Google Scholar
Graham, E. B. et al. A global atlas of soil viruses reveals unexplored biodiversity and potential biogeochemical impacts. Nat. Microbiol. 9, 1873–1883 (2024).
Google Scholar
Nayfach, S. et al. Metagenomic compendium of 189,680 DNA viruses from the human gut microbiome. Nat. Microbiol 6, 960–970 (2021).
Google Scholar
Shi, M. et al. Redefining the invertebrate RNA virosphere. Nature 540, 539–543 (2016).
Google Scholar
Schulz, F. et al. Giant virus diversity and host interactions through global metagenomics. Nature 578, 432–436 (2020).
Google Scholar
Camarillo-Guerrero, L. F., Almeida, A., Rangel-Pineros, G., Finn, R. D. & Lawley, T. D. Massive expansion of human gut bacteriophage diversity. Cell 184, 1098–1109 (2021).
Google Scholar
Rand, W. M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971).
Google Scholar
Strehl, A. & Ghosh, J. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2003).
Larralde, M. Pyrodigal: Python bindings and interface to Prodigal, an efficient method for gene prediction in prokaryotes. J. Open Source Softw. 7, 4296 (2022).
Google Scholar
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinf. 11, 119 (2010).
Google Scholar
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Google Scholar
Staudt, C. L., Sazonovs, A. & Meyerhenke, H. NetworKit: a tool suite for large-scale complex network analysis. Netw. Sci. 4, 508–530 (2016).
Google Scholar
Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
Google Scholar
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Google Scholar
Hoang, D. T., Chernomor, O., von Haeseler, A., Minh, B. Q. & Vinh, L. S. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2018).
Google Scholar
Minh, B. Q. et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Google Scholar
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Google Scholar
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. 52, W78–W82 (2024).
Google Scholar
Millard, A. et al. taxmyPHAGE: Automated taxonomy of dsDNA phage genomes at the genus and species level. Phage (New Rochelle) 6, 5–11 (2025).
Google Scholar
Bolduc, B. vConTACT3 database v.220. Zenodo https://doi.org/10.5281/zenodo.10035618 (2023).
Bolduc, B. vConTACT3 database v.223. Zenodo https://doi.org/10.5281/zenodo.10935512 (2024).
Bolduc, B. vConTACT3 database v.223 (software repository). Bitbucket https://bitbucket.org/MAVERICLab/vcontact3/src/master/ (2025).


