Comprehensive taxonomic identification of microbial species in metagenomic data using SingleM and Sandpiper

Meyer, F. et al. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat. Methods 19, 429–440 (2022).
Poussin, C. et al. Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge. BMC Genomics 23, 624 (2022).
Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022).
Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).
Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat. Biotechnol. 41, 1633–1644 (2023).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Irber, L. et al. sourmash v4: a multitool to quickly search, compare, and analyze genomic and metagenomic data sets. J. Open Source Softw. 9, 6830 (2024).
Kim, J. & Steinegger, M. Metabuli: sensitive and specific metagenomic classification via joint analysis of amino acid and DNA. Nat. Methods 21, 971–973 (2024).
Ruscheweyh, H.-J. et al. Cultivation-independent genomes greatly expand taxonomic-profiling capabilities of mOTUs across various environments. Microbiome 10, 212 (2022).
Sun, Z. et al. Removal of false positives in metagenomics-based taxonomy profiling via targeting type IIB restriction sites. Nat. Commun. 14, 5321 (2023).
Kodama, Y., Shumway, M., Leinonen, R. & International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 40, D54–D56 (2012).
Martiny, H.-M., Munk, P., Brinch, C., Aarestrup, F. M. & Petersen, T. N. A curated data resource of 214K metagenomes for characterization of the global antimicrobial resistome. PLoS Biol. 20, e3001792 (2022).
Schmidt, T. S. B. et al. SPIRE: a searchable, planetary-scale microbiome resource. Nucleic Acids Res. 52, D777–D783 (2023).
Ma, B. et al. A genomic catalogue of soil microbiomes boosts mining of biodiversity and genetic resources. Nat. Commun. 14, 7318 (2023).
Nayfach, S. et al. A genomic catalog of Earth’s microbiomes. Nat. Biotechnol. 39, 499–509 (2021).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Paoli, L. et al. Biosynthetic potential of the global ocean microbiome. Nature 607, 111–118 (2022).
Oren, A. & Garrity, G. M. Valid publication of the names of forty-two phyla of prokaryotes. Int. J. Syst. Evol. Microbiol. 71, 005056 (2021).
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662.e20 (2019).
Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nat. Microbiol. 2, 1533–1542 (2017).
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Dmitrijeva, M. et al. The mOTUs online database provides web-accessible genomic context to taxonomic profiling of microbial communities. Nucleic Acids Res. 53, D797–D805 (2025).
Coleman, G. A. et al. A rooted phylogeny resolves early bacterial evolution. Science 372, eabe0511 (2021).
Ma, S. et al. A microbial gene catalog of anaerobic digestion from full-scale biogas plants. Gigascience 10, giaa164 (2021).
Yin, Q., Gu, M., Hermanowicz, S. W., Hu, H. & Wu, G. Potential interactions between syntrophic bacteria and methanogens via type IV pili and quorum-sensing systems. Environ. Int. 138, 105650 (2020).
Yin, Q., Yang, S., Wang, Z., Xing, L. & Wu, G. Clarifying electron transfer and metagenomic analysis of microbial community in the methane production process with the addition of ferroferric oxide. Chem. Eng. J. 333, 216–225 (2018).
Cheng, H. et al. Understanding the antifouling mechanisms related to copper oxide and zinc oxide nanoparticles in anaerobic membrane bioreactors. Environ. Sci. Nano 6, 3467–3479 (2019).
Laviad-Shitrit, S. et al. Identification of chironomid species as natural reservoirs of toxigenic Vibrio cholerae strains with pandemic potential. PLoS Negl. Trop. Dis. 14, e0008959 (2020).
Cao, J. et al. Metagenomic analysis reveals the microbiome and resistome in migratory birds. Microbiome 8, 26 (2020).
Rhoades, N. S. et al. Longitudinal profiling of the macaque vaginal microbiome reveals similarities to diverse human vaginal communities. mSystems 6, e01322–20 (2021).
Pratte, Z. A. et al. Microbiome structure in large pelagic sharks with distinct feeding ecologies. Anim. Microbiome 4, 17 (2022).
Collins, F. W. J. et al. The microbiome of deep-sea fish reveals new microbial species and a sparsity of antibiotic resistance genes. Gut Microbes 13, 1–13 (2021).
Riiser, E. S. et al. Metagenomic shotgun analyses reveal complex patterns of intra- and interspecific variation in the intestinal microbiomes of codfishes. Appl. Environ. Microbiol. 86, e02788–19 (2020).
Le Doujet, T. et al. Closely-related Photobacterium strains comprise the majority of bacteria in the gut of migrating Atlantic cod (Gadus morhua). Microbiome 7, 64 (2019).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Wu, M. & Eisen, J. A. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 9, R151 (2008).
Köster, J. & Rahmann, S. Snakemake—a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).
Belmann, P. et al. Bioboxes: standardised containers for interchangeable bioinformatics software. Gigascience 4, 47 (2015).
Meyer, F. et al. Assessing taxonomic metagenome profilers with OPAL. Genome Biol. 20, 51 (2019).
Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis 2nd edn (Springer, 2016).
Youngblut, N. D. & Ley, R. E. Struo2: efficient metagenome profiling database construction for ever-expanding microbial genome datasets. PeerJ 9, e12198 (2021).
Chaumeil, P.-A., Mussig, A. J., Hugenholtz, P. & Parks, D. H. GTDB-Tk v2: memory friendly classification with the genome taxonomy database. Bioinformatics 38, 5315–5316 (2022).
Shaw, J. & Yu, Y. W. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nat. Methods 20, 1661–1665 (2023).
Aroney, S. T. N. et al. CoverM: read coverage calculator for metagenomics. Zenodo https://doi.org/10.5281/zenodo.10531253 (2024).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Sun, Z. et al. Challenges in benchmarking metagenomic profilers. Nat. Methods 18, 618–626 (2021).
Shen, W. & Ren, H. TaxonKit: a practical and efficient NCBI taxonomy toolkit. J. Genet. Genomics 48, 844–850 (2021).
Woodcroft, B. J., Cunningham, M., Gans, J. D., Bolduc, B. B. & Hodgkins, S. B. Kingfisher: a utility for procurement of public sequencing data. Zenodo https://doi.org/10.5281/zenodo.10525085 (2024).
Woodcroft, B. J. SingleM pipe search database. Zenodo https://doi.org/10.5281/zenodo.5739612 (2021).
Woodcroft, B. J. Default SingleM reference ‘metapackage’ data. Zenodo https://doi.org/10.5281/zenodo.5739611 (2023).
Woodcroft, B. Public metagenome datasets annotated using SingleM. Zenodo https://doi.org/10.5281/zenodo.10547494 (2024).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B. & Shah, M.) 785–794 (Association for Computing Machinery, 2016).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Ma, B., Lu, C. & Xu, J. New soil metagenome-assembled genomes catalogue boosts genetic resources. Zenodo https://doi.org/10.5281/zenodo.7341719 (2023).
Chklovski, A., Parks, D. H., Woodcroft, B. J. & Tyson, G. W. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat. Methods 20, 1203–1212 (2023).
Aroney, S. T. N., Camargo, A. P., Tyson, G. W. & Woodcroft, B. J. Galah: more scalable dereplication for metagenome assembled genomes. Zenodo https://doi.org/10.5281/zenodo.10526085 (2024).
Woodcroft, B. Supplemented SingleM package inclusive of MAGs beyond GTDB. Zenodo https://doi.org/10.5281/zenodo.10360136 (2024).
Woodcroft, B. Public metagenome datasets annotated using SingleM, using a supplemented reference package. Zenodo https://doi.org/10.5281/zenodo.10547501 (2024).
Woodcroft, B. J. & Aroney, S. Default SingleM reference ‘metapackage’ data. Zenodo https://doi.org/10.5281/zenodo.5739611 (2024).
Woodcroft, B. Targeted MAG recovery of novel Muirbacteria, Wallbacteria, Riflebacteria and Fusobacteria using SingleM. Zenodo https://doi.org/10.5281/zenodo.10162715 (2023).
Woodcroft, B. J. SingleM: novelty-inclusive microbial community profiling of shotgun metagenomes. Source code. GitHub https://github.com/wwood/singlem (2025).
Woodcroft, B. J. Sandpiper: website/continuous DB builds for SingleM. Source code. GitHub https://github.com/wwood/sandpiper (2025).
Woodcroft, B. J. Smafa: biological sequence aligner for pre-aligned sequences. GitHub https://github.com/wwood/smafa (2025).
Woodcroft, B. SingleM BioConda package. BioConda https://anaconda.org/bioconda/singlem (2025).
Woodcroft, B. J. SingleM PyPI archive. Python Packaging Index https://pypi.org/project/singlem/ (2025).
Woodcroft, B. J. Smafa crate. The Rust Community’s Crate Registry https://crates.io/crates/smafa (2025).
Woodcroft, B. J. SingleM docker container. DockerHub https://hub.docker.com/r/wwood/singlem (2025).
Woodcroft, B. J. singlem-installation: containerised testing of SingleM installation methods. Source code. GitHub https://github.com/wwood/singlem-installation (2025).
Woodcroft, B. J. singlem-benchmarking. Source code. GitHub https://github.com/wwood/singlem-benchmarking (2025).
Woodcroft, B. Reference genome data used for benchmarking SingleM. Zenodo https://doi.org/10.5281/zenodo.12525852 (2024).
Woodcroft, B. J. singlem_host_or_ecological_predictor: predict whether a metagenome is from a host-associated sample or not based on its SingleM profile. Source code. GitHub https://github.com/wwood/singlem_host_or_ecological_predictor (2025).



