{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T16:35:35Z","timestamp":1771518935698,"version":"3.50.1"},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2018,10,22]],"date-time":"2018-10-22T00:00:00Z","timestamp":1540166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Sunstar","award":["1R01AI125982"],"award-info":[{"award-number":["1R01AI125982"]}]},{"name":"Sunstar","award":["1R01DE024523"],"award-info":[{"award-number":["1R01DE024523"]}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Sequence analysis is arguably a foundation of modern biology. Classic approaches to sequence analysis are based on sequence alignment, which is limited when dealing with large-scale sequence data. A dozen of alignment-free approaches have been developed to provide computationally efficient alternatives to alignment-based approaches. However, existing methods define sequence similarity based on various heuristics and can only provide rough approximations to alignment distances.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this article, we developed a new approach, referred to as SENSE (SiamEse Neural network for Sequence Embedding), for efficient and accurate alignment-free sequence comparison. The basic idea is to use a deep neural network to learn an explicit embedding function based on a small training dataset to project sequences into an embedding space so that the mean square error between alignment distances and pairwise distances defined in the embedding space is minimized. To the best of our knowledge, this is the first attempt to use deep learning for alignment-free sequence analysis. A large-scale experiment was performed that demonstrated that our method significantly outperformed the state-of-the-art alignment-free methods in terms of both efficiency and accuracy.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Open-source software for the proposed method is developed and freely available at https:\/\/www.acsu.buffalo.edu\/\u223cyijunsun\/lab\/SENSE.html.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty887","type":"journal-article","created":{"date-parts":[[2018,10,18]],"date-time":"2018-10-18T11:11:42Z","timestamp":1539861102000},"page":"1820-1828","source":"Crossref","is-referenced-by-count":48,"title":["SENSE: Siamese neural network for sequence embedding and alignment-free comparison"],"prefix":"10.1093","volume":"35","author":[{"given":"Wei","family":"Zheng","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA"}]},{"given":"Le","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA"}]},{"given":"Robert J","family":"Genco","sequence":"additional","affiliation":[{"name":"Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY, USA"},{"name":"Department of Microbiology and Immunology, University at Buffalo, The State University of New York, Buffalo, NY, USA"}]},{"given":"Jean","family":"Wactawski-Wende","sequence":"additional","affiliation":[{"name":"Department of Epidemiology and Environmental Health, University at Buffalo, The State University of New York, Buffalo, NY, USA"}]},{"given":"Michael","family":"Buck","sequence":"additional","affiliation":[{"name":"Department of Biochemistry, University at Buffalo, The State University of New York, Buffalo, NY, USA"}]},{"given":"Yijun","family":"Sun","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA"},{"name":"Department of Microbiology and Immunology, University at Buffalo, The State University of New York, Buffalo, NY, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,10,22]]},"reference":[{"key":"2023012713221727400_bty887-B1","article-title":"A survey on metric learning for feature vectors and structured data","volume-title":"arXiv preprint arXiv: 1306.6709","author":"Bellet","year":"2013"},{"key":"2023012713221727400_bty887-B2","doi-asserted-by":"crossref","first-page":"890","DOI":"10.1093\/bib\/bbt052","article-title":"Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis","volume":"15","author":"Bonham-Carter","year":"2014","journal-title":"Brief. Bioinformatics"},{"key":"2023012713221727400_bty887-B3","first-page":"737","article-title":"Signature verification using a \u201csiamese\u201d time delay neural network","volume-title":"Advances in Neural Information Processing Systems","author":"Bromley","year":"1994"},{"key":"2023012713221727400_bty887-B4","doi-asserted-by":"crossref","first-page":"e95","DOI":"10.1093\/nar\/gkr349","article-title":"ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time","volume":"39","author":"Cai","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012713221727400_bty887-B5","doi-asserted-by":"crossref","first-page":"e1005518","DOI":"10.1371\/journal.pcbi.1005518","article-title":"ESPRIT-Forest: parallel clustering of massive amplicon sequence data in subquadratic time","volume":"13","author":"Cai","year":"2017","journal-title":"PLoS Comput. Biol."},{"key":"2023012713221727400_bty887-B6","doi-asserted-by":"crossref","first-page":"e1500183","DOI":"10.1126\/sciadv.1500183","article-title":"The microbiome of uncontacted Amerindians","volume":"1","author":"Clemente","year":"2015","journal-title":"Sci. Adv."},{"key":"2023012713221727400_bty887-B7","first-page":"48","volume-title":"Approximation with Artificial Neural Networks","author":"Cs\u00e1ji","year":"2001"},{"key":"2023012713221727400_bty887-B8","first-page":"69","article-title":"Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts","volume-title":"International Conference on Computational Linguistics","author":"Dos Santos","year":"2014"},{"key":"2023012713221727400_bty887-B9","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1471-2148-7-41","article-title":"Whole genome molecular phylogeny of large dsDNA viruses using composition vector method","volume":"7","author":"Gao","year":"2007","journal-title":"BMC Evol. Biol."},{"key":"2023012713221727400_bty887-B10","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1016\/j.gene.2011.11.004","article-title":"Genome-based phylogeny of dsDNA viruses by a novel alignment-free method","volume":"492","author":"Gao","year":"2012","journal-title":"Gene"},{"key":"2023012713221727400_bty887-B11","first-page":"1319","article-title":"Maxout networks","volume-title":"International Conference on Machine Learning","author":"Goodfellow","year":"2013"},{"key":"2023012713221727400_bty887-B12","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1089\/cmb.2009.0106","article-title":"Estimating mutation distances from unaligned genomes","volume":"16","author":"Haubold","year":"2009","journal-title":"J. Comput. Biol."},{"key":"2023012713221727400_bty887-B13","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"2023012713221727400_bty887-B14","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/S0168-9525(00)89076-9","article-title":"Dinucleotide relative abundance extremes: a genomic signature","volume":"11","author":"Karlin","year":"1995","journal-title":"Trends Genet."},{"key":"2023012713221727400_bty887-B15","first-page":"1","article-title":"Adam: a method for stochastic optimization","volume-title":"International Conference on Learning Representations","author":"Kingma","year":"2014"},{"key":"2023012713221727400_bty887-B16","first-page":"1097","article-title":"ImageNet classification with deep convolutional neural networks","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky","year":"2012"},{"key":"2023012713221727400_bty887-B17","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"2023012713221727400_bty887-B18","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023012713221727400_bty887-B19","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1145\/1553374.1553453","article-title":"Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations","volume-title":"Proceedings of the 26th Annual International Conference on Machine Learning","author":"Lee","year":"2009"},{"key":"2023012713221727400_bty887-B20","doi-asserted-by":"crossref","first-page":"2000","DOI":"10.1093\/bioinformatics\/btu331","article-title":"Kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison","volume":"30","author":"Leimeister","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713221727400_bty887-B21","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1038\/234034a0","article-title":"Distance between sets","volume":"234","author":"Levandowsky","year":"1971","journal-title":"Nature"},{"key":"2023012713221727400_bty887-B22","first-page":"310","article-title":"Parallel hierarchical clustering in linearithmic time for large-scale sequence analysis","volume-title":"IEEE International Conference on Data Mining","author":"Mao","year":"2015"},{"key":"2023012713221727400_bty887-B23","first-page":"807","article-title":"Rectified linear units improve restricted boltzmann machines","volume-title":"Proceedings of the 27th International Conference on Machine Learning","author":"Nair","year":"2010"},{"key":"2023012713221727400_bty887-B24","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023012713221727400_bty887-B25","doi-asserted-by":"crossref","first-page":"2677","DOI":"10.1073\/pnas.0813249106","article-title":"Alignment-free genome comparison with feature frequency profiles (FFP) and optimal resolutions","volume":"106","author":"Sims","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012713221727400_bty887-B26","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1093\/bib\/bbt067","article-title":"New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing","volume":"15","author":"Song","year":"2014","journal-title":"Brief. Bioinformatics"},{"key":"2023012713221727400_bty887-B27","unstructured":"Sugar\n              C.A.\n            \n           (1998). \nTechniques for clustering and classification with applications to medical problems. PhD Thesis, Stanford University."},{"key":"2023012713221727400_bty887-B28","doi-asserted-by":"crossref","first-page":"e76","DOI":"10.1093\/nar\/gkp285","article-title":"ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences","volume":"37","author":"Sun","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012713221727400_bty887-B29","doi-asserted-by":"crossref","first-page":"e205","DOI":"10.1093\/nar\/gkq872","article-title":"Advanced computational algorithms for microbial community analysis using massive 16S rRNA sequence data","volume":"38","author":"Sun","year":"2010","journal-title":"Nucleic Acids Res."},{"key":"2023012713221727400_bty887-B30","first-page":"e69","article-title":"Computational approach for deriving cancer progression roadmaps from static sample data","volume":"45","author":"Sun","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"2023012713221727400_bty887-B31","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","article-title":"A global geometric framework for nonlinear dimensionality reduction","volume":"290","author":"Tenenbaum","year":"2000","journal-title":"Science"},{"key":"2023012713221727400_bty887-B32","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1089\/cmb.2006.13.336","article-title":"The average common substring approach to phylogenomic reconstruction","volume":"13","author":"Ulitsky","year":"2006","journal-title":"J. Comput. Biol."},{"key":"2023012713221727400_bty887-B33","first-page":"203","article-title":"Active clustering of biological sequences","volume":"13","author":"Voevodski","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"2023012713221727400_bty887-B34","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012713221727400_bty887-B35","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol."},{"key":"2023012713221727400_bty887-B36","first-page":"521","article-title":"Distance metric learning with application to clustering with side-information","volume-title":"Advances in Neural Information Processing Systems","author":"Xing","year":"2003"},{"key":"2023012713221727400_bty887-B37","article-title":"A parallel computational framework for ultra-large-scale sequence clustering analysis","volume":"35","author":"Zheng","year":"2018","journal-title":"Bioinformatics"},{"key":"2023012713221727400_bty887-B38","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/11\/1820\/48934640\/bty887.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/11\/1820\/48934640\/bty887.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T14:16:42Z","timestamp":1674829002000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/11\/1820\/5140215"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,10,22]]},"references-count":38,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2019,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty887","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,6,1]]},"published":{"date-parts":[[2018,10,22]]}}}