Zero-shot de novo peptide sequencing with open posttranslational modification discovery

Aebersold, R. & Mann, M. Mass spectrometry-based proteomics. Nature 422, 198–207 (2003).
Dupree, E. J. et al. A critical review of bottom-up proteomics: the good, the bad, and the future of this field. Proteomes 8, 14 (2020).
Matthiesen, R. Methods, algorithms and tools in computational proteomics: a practical point of view. Proteomics 7, 2815–2832 (2007).
Chen, C., Hou, J., Tanner, J. J. & Cheng, J. Bioinformatics methods for mass spectrometry-based proteomics data analysis. Int. J. Mol. Sci. 21, 2873 (2020).
Szabo, Z. & Janaky, T. Challenges and developments in protein identification using mass spectrometry. Trends Anal. Chem. 69, 76–87 (2015).
Mudge, J. M. et al. Standardized annotation of translated open reading frames. Nat. Biotechnol. 40, 994–999 (2022).
Zhu, C. et al. Identification of non-canonical peptides with moPepGen. Nat. Biotechnol. 44, 568–573 (2026).
Keenan, E. K., Zachman, D. K. & Hirschey, M. D. Discovering the landscape of protein modifications. Mol. Cell 81, 1868–1878 (2021).
Kleikamp, H. B. C. et al. Database-independent de novo metaproteomics of complex microbial communities. Cell Syst. 12, 375–383 (2021).
Minegishi, Y., Haga, Y. & Ueda, K. Emerging potential of immunopeptidomics by mass spectrometry in cancer immunotherapy. Cancer Sci. 115, 1048–1059 (2024).
Muth, T. & Renard, B. Y. Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification? Brief. Bioinform. 19, 954–970 (2017).
Wen, B. et al. Deep learning in proteomics. Proteomics 20, 1900335 (2020).
Eloff, K. et al. InstaNovo enables diffusion-powered de novo peptide sequencing in large-scale proteomics experiments. Nat. Mach. Intell. 7, 565–579 (2025).
Zhang, X. et al. π-PrimeNovo: an accurate and efficient non-autoregressive deep learning model for de novo peptide sequencing. Nat. Commun. 16, 267 (2025).
Melendez, C. et al. Accounting for digestion enzyme bias in Casanovo. J. Proteome Res. 23, 4761–4769 (2024).
Grimes, M. et al. Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks. Sci. Signal. 11, eaaq1087 (2018).
Schwämmle, V. et al. Systems level analysis of histone H3 post-translational modifications (PTMs) reveals features of PTM crosstalk in chromatin regulation. Mol. Cell. Proteomics 15, 2715–2729 (2016).
Liu, J., Qian, C. & Cao, X. Post-translational modification control of innate immunity. Immunity 45, 15–30 (2016).
Sui, Y., Shen, Z., Wang, Z., Feng, J. & Zhou, G. Lactylation in cancer: metabolic mechanism and therapeutic strategies. Cell Death Discov. 11, 68 (2025).
He, X. et al. Lysine vitcylation is a vitamin C-derived protein modification that enhances STAT1-mediated immune response. Cell 188, 1858–1877 (2025).
Paik, Y.-K. et al. The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol. 30, 221–223 (2012).
Su, J. et al. RoFormer: enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024).
Mao, Z., Zhang, R., Xin, L. & Li, M. Mitigating the missing-fragmentation problem in de novo peptide sequencing with a two-stage graph-based deep learning model. Nat. Mach. Intell. 5, 1250–1260 (2023).
Zolg, D. P. et al. ProteomeTools: systematic characterization of 21 post-translational protein modifications by liquid chromatography tandem mass spectrometry (LC-MS/MS) using synthetic peptides. Mol. Cell. Proteomics 17, 1850–1863 (2018).
Zhang, J. et al. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteomics 11, M111.010587 (2012).
Tran, N. H. et al. NovoBoard: a comprehensive framework for evaluating the false discovery rate and accuracy of de novo peptide sequencing. Mol. Cell. Proteomics 23, 100849 (2024).
Yu, F. et al. Identification of modified peptides using localization-aware open search. Nat. Commun. 11, 4065 (2020).
Rebak, A. S. et al. A quantitative and site-specific atlas of the citrullinome reveals widespread existence of citrullination and insights into PADI4 substrates. Nat. Struct. Mol. Biol. 31, 977–995 (2024).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) 5998–6008 (Curran Associates, 2017).
Dao, T., Fu, D. Y., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast and memory-efficient exact attention with IO-awareness. In Proc. 36th International Conference on Neural Information Processing Systems (eds Koyejo, S. et al.) 16344–16359 (Curran Associates, 2022).
Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations (ICLR 2015), Conference Track Proceedings (eds Bengio, Y. & LeCun, Y.) (2015).
Tran, N. H., Zhang, X., Xin, L., Shan, B. & Li, M. De novo peptide sequencing by deep learning. Proc. Natl Acad. Sci. USA 114, 8247–8252 (2017).
Qiao, R. et al. Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices. Nat. Mach. Intell. 3, 420–425 (2021).
Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
Needleman, S. B. & Wunsch, C. D. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970).
Treen, D. G. C. et al. SIMILE enables alignment of tandem mass spectra with statistical significance. Nat. Commun. 13, 2510 (2022).
Zhong, H., Marcus, S. L. & Li, L. Two-dimensional mass spectra generated from the analysis of 15N-labeled and unlabeled peptides for efficient protein identification and de novo peptide sequencing. J. Proteome Res. 3, 1155–1163 (2004).
Mao, Z. RNovA test datasets. Zenodo https://doi.org/10.5281/zenodo.15715597 (2026).
Perez-Riverol, Y. et al. The PRIDE database at 20 years: 2025 update. Nucleic Acids Res. 53, D543–D553 (2024).
Zhang, Q. RNovA. GitHub https://github.com/zqq66/RNovA (2026).


