Scalable homology detection with ERAST

  • Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  • Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).

  • Pearson, W. R. Using the FASTA program to search protein and DNA sequence databases. Methods Mol. Biol. 24, 307–331 (1994).

  • Yang, J.-M. & Tung, C.-H. Protein structure database search and evolutionary classification. Nucleic Acids Res. 34, 3646–3659 (2006).

  • Wang, S. & Zheng, W.-M. CLePAPS: fast pair alignment of protein structures based on conformational letters. J. Bioinform. Comput. Biol. 6, 347–366 (2008).

  • van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).

  • Shindyalov, I. N. & Bourne, P. E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11, 739–747 (1998).

  • Holm, L. Dali server: structural unification of protein families. Nucleic Acids Res. 50, W210–W215 (2022).

  • Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  • Richardson, L. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res. 51, D753–D759 (2023).

  • Jing, Z., Su, Y. & Han, Y. When large language models meet vector databases: a survey. Preprint at arXiv (2024).

  • Winnicki, M. J., Brown, C. A., Porter, H. L., Giles, C. B. & Wren, J. D. BioVDB: biological vector database for high-throughput gene expression meta-analysis. Front. Artif. Intell. 7, 1366273 (2024).

  • Hamamsy, T. et al. Protein remote homology detection and structural alignment using deep learning. Nat. Biotechnol. 42, 975–985 (2024).

  • Liu, W. et al. PLMSearch: protein language model powers accurate and fast sequence search for remote homology. Nat. Commun. 15, 2775 (2024).

  • Hong, L. et al. Fast, sensitive detection of protein homologs using deep dense retrieval. Nat. Biotechnol. 43, 983–995 (2025).

  • Verkuil, R. et al. Language models generalize beyond natural proteins. Preprint at bioRxiv https://doi.org/10.1101/2022.12.21.521521 (2022).

  • Gu, A. & Dao, T. Mamba: linear-time sequence modeling with selective state spaces. Preprint at https://arxiv.org/abs/2312.00752 (2023).

  • Schiff, Y. et al. Caduceus: bi-directional equivariant long-range DNA sequence modeling. Proc. Mach. Learn. Res. 235, 43632 (2024).

  • Jégou, H., Douze, M. & Schmid, C. Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 117–128 (2011).

  • Malkov, Y. A. & Yashunin, D. A. Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42, 824–836 (2020).

  • Ahmad, T., Ahmed, N., Peltenburg, J. & Al-Ars, Z. ArrowSAM: In-memory genomics data processing using Apache Arrow. In 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS) 1–6 (IEEE, 2020).

  • Sanderson, T., Bileschi, M. L., Belanger, D. & Colwell, L. J. ProteInfer, deep neural networks for protein functional inference. eLife 12, e80942 (2023).

  • Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).

  • Huson, D. H., Auch, A. F., Qi, J. & Schuster, S. C. MEGAN analysis of metagenomic data. Genome Res. 17, 377–386 (2007).

  • Durairaj, J. et al. Uncovering new families and folds in the natural protein universe. Nature 622, 646–653 (2023).

  • Muhammed, M. T. & Aki-Yalcin, E. Homology modeling in drug discovery: overview, current applications, and future perspectives. Chem. Biol. Drug Des. 93, 12–20 (2019).

  • UniProt Consortium. The universal protein resource (UniProt). Nucleic Acids Res. 36, D190–D195 (2008).

  • Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).

  • Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).

  • O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  • Chandonia, J.-M., Fox, N. K. & Brenner, S. E. SCOPe: classification of large macromolecular structures in the structural classification of proteins-extended database. Nucleic Acids Res. 47, D475–D481 (2019).

  • Mock, F., Kretschmer, F., Kriese, A., Böcker, S. & Marz, M. Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks. Proc. Natl Acad. Sci. USA 119, e2122636119 (2022).

  • Elnaggar, A. et al. Ankh ☥: optimized protein language model unlocks general-purpose modelling. Preprint at bioRxiv https://doi.org/10.1101/2023.01.16.524265 (2023).

  • Maćkiewicz, A. & Ratajczak, W. Principal component analysis (PCA). Comput. Geosci. 19, 303–342 (1993).

  • McInnes, L., Healy, J. & Astels, S. HDBSCAN: hierarchical density based clustering. J. Open Source Softw. 2, 205 (2017).
