{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T15:20:53Z","timestamp":1768922453084,"version":"3.49.0"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2017,9,18]],"date-time":"2017-09-18T00:00:00Z","timestamp":1505692800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>As a fundamental task in bioinformatics, searching for massive short patterns over a long text has been accelerated by various compressed full-text indexes. These indexes are able to provide similar searching functionalities to classical indexes, e.g. suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, presents unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are not efficient, especially for the patterns with many occurrences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this paper, we introduce a novel locating algorithm, FMtree, to fast retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual multiway tree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the multiway tree. Compared with existing locating algorithms, our tree-based algorithm reduces large numbers of redundant operations and presents better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms, and still memory-efficient.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>FMtree is freely available at https:\/\/github.com\/chhylp123\/FMtree.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx596","type":"journal-article","created":{"date-parts":[[2017,9,16]],"date-time":"2017-09-16T11:10:14Z","timestamp":1505560214000},"page":"416-424","source":"Crossref","is-referenced-by-count":10,"title":["FMtree: a fast locating algorithm of FM-indexes for genomic data"],"prefix":"10.1093","volume":"34","author":[{"given":"Haoyu","family":"Cheng","sequence":"first","affiliation":[{"name":"School of Computer Science, University of Science and Technology of China, Heifei, Anhui, China"},{"name":"Key Laboratory on High Performance Computing, Anhui Province, National University of Defense Technology, Changsha, China"},{"name":"Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha, China"}]},{"given":"Ming","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Science and Technology of China, Heifei, Anhui, China"},{"name":"Key Laboratory on High Performance Computing, Anhui Province, National University of Defense Technology, Changsha, China"}]},{"given":"Yun","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science, University of Science and Technology of China, Heifei, Anhui, China"},{"name":"Key Laboratory on High Performance Computing, Anhui Province, National University of Defense Technology, Changsha, China"},{"name":"Collaborative Innovation Center of High Performance Computing, National University of Defense Technology, Changsha, China"}]}],"member":"286","published-online":{"date-parts":[[2017,9,18]]},"reference":[{"key":"2023012712270154900_btx596-B1","doi-asserted-by":"crossref","first-page":"e41","DOI":"10.1093\/nar\/gkr1246","article-title":"Hobbes: optimized gram-based methods for efficient read alignment","volume":"40","author":"Ahmadi","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023012712270154900_btx596-B2","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1007\/11780441_29","article-title":"Reducing the space requirement of LZ-index","author":"Arroyuelo","year":"2006","journal-title":"Annual Symposium on Combinatorial Pattern Matching"},{"key":"2023012712270154900_btx596-B3","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1007\/s00453-010-9443-8","article-title":"Stronger Lempel-Ziv based compressed text indexing","volume":"62","author":"Arroyuelo","year":"2012","journal-title":"Algorithmica"},{"key":"2023012712270154900_btx596-B4","author":"Burrows","year":"1994"},{"key":"2023012712270154900_btx596-B5","doi-asserted-by":"crossref","first-page":"192.","DOI":"10.1186\/s12859-015-0626-9","article-title":"BitMapper: an efficient all-mapper based on bit-vector computing","volume":"16","author":"Cheng","year":"2015","journal-title":"BMC Bioinform"},{"key":"2023012712270154900_btx596-B6","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.jda.2010.09.004","article-title":"String matching with alphabet sampling","volume":"11","author":"Claude","year":"2012","journal-title":"J. Discrete Algorith"},{"key":"2023012712270154900_btx596-B7","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1145\/1070838.1070856","article-title":"The locality principle","volume":"48","author":"Denning","year":"2005","journal-title":"Commun. ACM"},{"key":"2023012712270154900_btx596-B8","doi-asserted-by":"crossref","first-page":"25.","DOI":"10.1186\/1748-7188-8-25","article-title":"Data compression for sequencing data","volume":"8","author":"Deorowicz","year":"2013","journal-title":"Algorith. Mol. Biol"},{"key":"2023012712270154900_btx596-B9","first-page":"390","author":"Ferragina","year":"2000"},{"key":"2023012712270154900_btx596-B10","first-page":"12.","article-title":"Compressed text indexes: from theory to practice","volume":"13","author":"Ferragina","year":"2009","journal-title":"J. Exp. Algorith. (JEA)"},{"key":"2023012712270154900_btx596-B11","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1007\/s00453-013-9782-3","article-title":"Distribution-aware compressed full-text indexes","volume":"67","author":"Ferragina","year":"2013","journal-title":"Algorithmica"},{"key":"2023012712270154900_btx596-B12","first-page":"1287","article-title":"Optimized succinct data structures for massive data","volume":"44","author":"Gog","year":"2014","journal-title":"Software: Prac. Exp"},{"key":"2023012712270154900_btx596-B13","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.jda.2015.01.006","article-title":"Improved and extended locating functionality on compressed suffix arrays","volume":"32","author":"Gog","year":"2015","journal-title":"J. Discrete Algorith"},{"key":"2023012712270154900_btx596-B14","first-page":"73","author":"Gog","year":"2017"},{"key":"2023012712270154900_btx596-B15","first-page":"216","author":"Gonz\u00e1lez","year":"2007"},{"key":"2023012712270154900_btx596-B16","first-page":"1","article-title":"Locally compressed suffix arrays","volume":"19","author":"Gonz\u00e1lez","year":"2015","journal-title":"J. Exp. Algorith. (JEA)"},{"key":"2023012712270154900_btx596-B17","first-page":"210","author":"Grabowski","year":"2004"},{"key":"2023012712270154900_btx596-B18","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1137\/S0097539702402354","article-title":"Compressed suffix arrays and suffix trees with applications to text indexing and string matching","volume":"35","author":"Grossi","year":"2005","journal-title":"SIAM J. Comput"},{"key":"2023012712270154900_btx596-B19","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1038\/nmeth0810-576","article-title":"mrsFAST: a cache-oblivious algorithm for short-read mapping","volume":"7","author":"Hach","year":"2010","journal-title":"Nat. Methods"},{"key":"2023012712270154900_btx596-B20","first-page":"31","article-title":"Practical aspects of Compressed Suffix Arrays and FM-Index in Searching DNA Sequences","volume-title":"ALENEX\/ANALC","author":"Hon","year":"2004"},{"key":"2023012712270154900_btx596-B21","article-title":"Sparse suffix trees","author":"K\u00e4rkk\u00e4inen","year":"1996","journal-title":"COCOON: International Computing and Combinatorics Conference"},{"key":"2023012712270154900_btx596-B22","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012712270154900_btx596-B23","doi-asserted-by":"crossref","first-page":"1838","DOI":"10.1093\/bioinformatics\/bts280","article-title":"Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly","volume":"28","author":"Li","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012712270154900_btx596-B24","author":"Li","year":"2013"},{"key":"2023012712270154900_btx596-B25","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012712270154900_btx596-B26","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1093\/bioinformatics\/btv662","article-title":"rHAT: fast alignment of noisy long reads with regional hashing","volume":"32","author":"Liu","year":"2016","journal-title":"Bioinformatics"},{"key":"2023012712270154900_btx596-B27","first-page":"45","author":"M\u00e4kinen","year":"2005"},{"key":"2023012712270154900_btx596-B28","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1137\/0222058","article-title":"Suffix arrays: a new method for on-line string searches","volume":"22","author":"Manber","year":"1993","journal-title":"SIAM J. Comput"},{"key":"2023012712270154900_btx596-B29","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1038\/nmeth.2221","article-title":"The GEM mapper: fast, accurate and versatile alignment by filtration","volume":"9","author":"Marco-Sola","year":"2012","journal-title":"Nat. Methods"},{"key":"2023012712270154900_btx596-B30","doi-asserted-by":"crossref","first-page":"2.","DOI":"10.1145\/1216370.1216372","article-title":"Compressed full-text indexes","volume":"39","author":"Navarro","year":"2007","journal-title":"ACM Comput. Surveys (CSUR)"},{"key":"2023012712270154900_btx596-B31","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/S0196-6774(03)00087-7","article-title":"New text indexing functionalities of the compressed suffix arrays","volume":"48","author":"Sadakane","year":"2003","journal-title":"J. Algorith"},{"key":"2023012712270154900_btx596-B32","doi-asserted-by":"crossref","first-page":"i356","DOI":"10.1093\/bioinformatics\/btu440","article-title":"Fiona: a parallel and automatic strategy for read error correction","volume":"30","author":"Schulz","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712270154900_btx596-B33","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res"},{"key":"2023012712270154900_btx596-B34","doi-asserted-by":"crossref","first-page":"6993","DOI":"10.1093\/nar\/gks408","article-title":"Prospects and limitations of full-text index structures in genome analysis","volume":"40","author":"Vyverman","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023012712270154900_btx596-B35","doi-asserted-by":"crossref","first-page":"1632","DOI":"10.1093\/bioinformatics\/btv670","article-title":"Optimal seed solver: optimizing seed selection in read mapping","volume":"32","author":"Xin","year":"2016","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/3\/416\/48912969\/bioinformatics_34_3_416.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/3\/416\/48912969\/bioinformatics_34_3_416.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:13:54Z","timestamp":1674825234000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/3\/416\/4160683"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,9,18]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx596","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,2,1]]},"published":{"date-parts":[[2017,9,18]]}}}