Reference-free discovery with barcoded single-cell sequencing

Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 33, 495–502 (2015).
Olivieri, J. E. et al. RNA splicing programs define tissue compartments and cell types at single-cell resolution. eLife 10, e70692 (2021).
Olivieri, J. E., Dehghannasiri, R. & Salzman, J. The SpliZ generalizes ‘percent spliced in’ to reveal regulated splicing at single-cell resolution. Nat. Methods 19, 307–310 (2022).
Xiang, X., He, Y., Zhang, Z. & Yang, X. Interrogations of single-cell RNA splicing landscapes with SCASL define new cell identities with physiological relevance. Nat. Commun. 15, 2164 (2024).
Cuddleston, W. H. et al. Cellular and genetic drivers of RNA editing variation in the human brain. Nat. Commun. 13, 2997 (2022).
Sturm, G. et al. Scirpy: a Scanpy extension for analyzing single-cell T-cell receptor-sequencing data. Bioinformatics 36, 4817–4818 (2020).
Borcherding, N., Bormann, N. L. & Kraus, G. scRepertoire: an R-based toolkit for single-cell immune receptor analysis. F1000Res. 9, 47 (2020).
Meyer, E., Chaung, K., Dehghannasiri, R. & Salzman, J. ReadZS detects cell type-specific and developmentally regulated RNA processing programs in single-cell RNA-seq. Genome Biol. 23, 226 (2022).
Gao, Y., Li, L., Amos, C. I. & Li, W. Analysis of alternative polyadenylation from single-cell RNA-seq using scDaPars reveals cell subpopulations invisible to gene expression. Genome Res. 31, 1856–1866 (2021).
Patrick, R. et al. Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data. Genome Biol. 21, 167 (2020).
Chaung, K. et al. SPLASH: a statistical, reference-free genomic algorithm unifies biological discovery. Cell 186, 5440–5456 (2023).
Ungaro, A. et al. Challenges and advances for transcriptome assembly in non-model species. PLoS ONE 12, e0185020 (2017).
Kokot, M., Dehghannasiri, R., Baharav, T., Salzman, J. & Deorowicz, S. Scalable and unsupervised discovery from raw sequencing reads using SPLASH2. Nat. Biotechnol. 43, 1084–1090 (2025).
Dehghannasiri, R. et al. Unsupervised reference-free inference reveals unrecognized regulated transcriptomic complexity in human single cells. eLife 14, RP105979 (2025).
Smith, T., Heger, A. & Sudbery, I. UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).
Baharav, T. Z., Tse, D. & Salzman, J. OASIS: an interpretable, finite-sample valid alternative to Pearson’s χ² for scientific discovery. Proc. Natl Acad. Sci. USA 121, e2304671121 (2024).
Kaminow, B., Yunusov, D. & Dobin, A. STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. Preprint at bioRxiv https://doi.org/10.1101/2021.05.05.442755 (2021).
Zheng, G. X. Y. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).
Tabula Sapiens Consortium et al. The Tabula Sapiens: a multiple-organ, single-cell transcriptomic atlas of humans. Science 376, eabl4896 (2022).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Henderson, G. et al. Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly. Preprint at bioRxiv https://doi.org/10.1101/2024.01.18.576133 (2024).
Ye, J., Ma, N., Madden, T. L. & Ostell, J. M. IgBLAST: an immunoglobulin variable domain sequence analysis tool. Nucleic Acids Res. 41, W34–W40 (2013).
Gulati, G. S., D’Silva, J. P., Liu, Y., Wang, L. & Newman, A. M. Profiling cell identity and tissue architecture with single-cell and spatial transcriptomics. Nat. Rev. Mol. Cell Biol. 26, 11–31 (2025).
Ji, A. L. et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell 182, 1661–1662 (2020).
Huang, W.-C. et al. A novel miR-365-3p/EHF/keratin 16 axis promotes oral squamous cell carcinoma metastasis, cancer stemness and drug resistance via enhancing β5-integrin/c-met signaling pathway. J. Exp. Clin. Cancer Res. 38, 89 (2019).
Moll, R., Divo, M. & Langbein, L. The human keratins: biology and pathology. Histochem. Cell Biol. 129, 705–733 (2008).
Fawkner-Corbett, D. et al. Spatiotemporal analysis of human intestinal development at single-cell resolution. Cell 184, 810–826 (2021).
Gotter, A. L., Kaetzel, M. A. & Dedman, J. R. Electrophorus electricus as a model system for the study of membrane excitability. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 119, 225–241 (1998).
Musser, J. M. et al. Profiling cellular diversity in sponges informs animal cell type and nervous system evolution. Science 374, 717–723 (2021).
Cao, C. et al. Comprehensive single-cell transcriptome lineages of a proto-vertebrate. Nature 571, 349–354 (2019).
Satou, Y. et al. A manually curated gene model set for an Ascidian, Ciona robusta (Ciona intestinalis type A). Zoolog. Sci. 39, 253–260 (2022).
Johnsen, A. H. & Rehfeld, J. F. Cionin: a disulfotyrosyl hybrid of cholecystokinin and gastrin from the neural ganglion of the protochordate Ciona intestinalis. J. Biol. Chem. 265, 3054–3058 (1990).
Longo, V. et al. The conservation and diversity of ascidian cells and molecules involved in the inflammatory reaction: the Ciona robusta model. Fish Shellfish Immunol. 119, 384–396 (2021).
Hu, H. et al. Constrained vertebrate evolution by pleiotropic genes. Nat. Ecol. Evol. 1, 1722–1730 (2017).
Parrinello, N., Cammarata, M. & Parrinello, D. in Advances in Comparative Immunology (ed. Cooper, E. L.) 521–590 (2018).
Lasda, E. L. & Blumenthal, T. Trans-splicing. Wiley Interdiscip. Rev. RNA 2, 417–434 (2011).
Satou, Y., Hamaguchi, M., Takeuchi, K., Hastings, K. E. M. & Satoh, N. Genomic overview of mRNA 5′-leader trans-splicing in the ascidian Ciona intestinalis. Nucleic Acids Res. 34, 3378–3388 (2006).
Matsumoto, J. et al. High-throughput sequence analysis of Ciona intestinalis SL trans-spliced mRNAs: alternative expression modes and gene function correlates. Genome Res. 20, 636–645 (2010).
Simmen, M. W. & Bird, A. Sequence analysis of transposable elements in the sea squirt, Ciona intestinalis. Mol. Biol. Evol. 17, 1685–1694 (2000).
Wagih, O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647 (2017).
Satou, Y., Kawashima, T., Shoguchi, E., Nakayama, A. & Satoh, N. An integrated database of the ascidian, Ciona intestinalis: towards functional genomics. Zoolog. Sci. 22, 837–843 (2005).
Kokot, M., Dlugosz, M. & Deorowicz, S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics 33, 2759–2761 (2017).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Gupta, N. T. et al. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356–3358 (2015).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Rasmont, R. Une technique de culture des éponges d’eau douce en milieu contrôlé. Ann. Soc. R. Zool. Belg. 91, 147–156 (1961).
Funayama, N., Nakatsukasa, M., Hayashi, T. & Agata, K. Isolation of the choanocyte in the fresh water sponge, Ephydatia fluviatilis and its lineage marker, Ef annexin. Dev. Growth Differ. 47, 243–253 (2005).
Nichols, S. HCR-fluorescent in situ hybridization (HCR-FISH) of gemmule-hatched freshwater sponges v1. protocols.io https://doi.org/10.17504/protocols.io.5jyl8jwkdg2w/v1 (2023).
Steentoft, C. et al. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 32, 1478–1488 (2013).
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
Hebsgaard, S. M. et al. Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res. 24, 3439–3452 (1996).
Elagoz, A. M. et al. Optimization of whole mount RNA multiplexed hybridization chain reaction with immunohistochemistry, clearing and imaging to visualize octopus embryonic neurogenesis. Front. Physiol. 13, 882413 (2022).
Kuehn, E. et al. Segment number threshold determines juvenile onset of germline cluster expansion in Platynereis dumerilii. J. Exp. Zool. B Mol. Dev. Evol. 338, 225–240 (2022).
Pisco, A. & Tabula Sapiens Consortium. Tabula Sapiens single-cell dataset. figshare https://doi.org/10.6084/m9.figshare.14267219 (2023).

