{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:24:48Z","timestamp":1772166288856,"version":"3.50.1"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T00:00:00Z","timestamp":1712707200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T00:00:00Z","timestamp":1712707200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Human Genome Research Institute,United States","award":["R01HG011392"],"award-info":[{"award-number":["R01HG011392"]}]},{"name":"National Science Foundation, United States","award":["DBI2029552"],"award-info":[{"award-number":["DBI2029552"]}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["JP21K17701"],"award-info":[{"award-number":["JP21K17701"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science,Japan","doi-asserted-by":"crossref","award":["JP22H03551"],"award-info":[{"award-number":["JP22H03551"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science,Japan","doi-asserted-by":"crossref","award":["JP20H04141"],"award-info":[{"award-number":["JP20H04141"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["R01AI14180"],"award-info":[{"award-number":["R01AI14180"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN071852020"],"award-info":[{"award-number":["RGPIN071852020"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    FM-indexes are crucial data structures in DNA alignment, but searching with them usually takes at least one random access per character in the query pattern. Ferragina and Fischer [1] observed in 2007 that word-based indexes often use fewer random accesses than character-based indexes, and thus support faster searches. Since DNA lacks natural word-boundaries, however, it is necessary to parse it somehow before applying word-based FM-indexing. In 2022, Deng et al. [2] proposed parsing genomic data by induced suffix sorting, and showed that the resulting word-based FM-indexes support faster counting queries than standard FM-indexes when patterns are a few thousand characters or longer. In this paper we show that using prefix-free parsing\u2014which takes parameters that let us tune the average length of the phrases\u2014instead of induced suffix sorting, gives a significant speedup for patterns of only a few hundred characters. We implement our method and demonstrate it is between 3 and 18 times faster than competing methods on queries to GRCh38, and is consistently faster on queries made to 25,000, 50,000 and 100,000 SARS-CoV-2 genomes. Hence, it seems our method accelerates the performance of count over all state-of-the-art methods with a moderate increase in the memory. The source code for\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\\texttt {PFP-FM}$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mrow>\n                            <mml:mi>PFP<\/mml:mi>\n                            <mml:mo>-<\/mml:mo>\n                            <mml:mi>FM<\/mml:mi>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    is available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/AaronHong1024\/afm\">https:\/\/github.com\/AaronHong1024\/afm<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1186\/s13015-024-00260-8","type":"journal-article","created":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T00:01:27Z","timestamp":1712707287000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Pfp-fm: an accelerated FM-index"],"prefix":"10.1186","volume":"19","author":[{"given":"Aaron","family":"Hong","sequence":"first","affiliation":[]},{"given":"Marco","family":"Oliva","sequence":"additional","affiliation":[]},{"given":"Dominik","family":"K\u00f6ppl","sequence":"additional","affiliation":[]},{"given":"Hideo","family":"Bannai","sequence":"additional","affiliation":[]},{"given":"Christina","family":"Boucher","sequence":"additional","affiliation":[]},{"given":"Travis","family":"Gagie","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,4,10]]},"reference":[{"key":"260_CR1","doi-asserted-by":"publisher","first-page":"328","DOI":"10.1007\/978-3-540-73437-6_33","volume-title":"Proceedings of the 18th Annual Symposium Combinatorial Pattern Matching (CPM)","author":"P Ferragina","year":"2007","unstructured":"Ferragina P, Fischer J. Suffix arrays on words. In: Ma B, Zhang K, editors. Proceedings of the 18th Annual Symposium Combinatorial Pattern Matching (CPM). London: Springer; 2007. p. 328\u201339."},{"key":"260_CR2","first-page":"63","volume-title":"Proceedings of the IEEE Data Compression Conference (DCC)","author":"J-J Deng","year":"2022","unstructured":"Deng J-J, Hon W-K, K\u00f6ppl D. Sadakane K, FM-indexing grammars induced by suffix sorting for long patterns. In: Deng JJ, editor. Proceedings of the IEEE Data Compression Conference (DCC). Snowbird: IEEE; 2022. p. 63\u201372."},{"key":"260_CR3","doi-asserted-by":"publisher","first-page":"552","DOI":"10.1145\/1082036.1082039","volume":"52","author":"P Ferragina","year":"2005","unstructured":"Ferragina P, Manzini G. Indexing compressed text. J ACM. 2005;52:552\u201381.","journal-title":"J ACM"},{"issue":"3","key":"260_CR4","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1186\/gb-2009-10-3-r25","volume":"10","author":"B Langmead","year":"2009","unstructured":"Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):25\u201325.","journal-title":"Genome Biol"},{"key":"260_CR5","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1303.3997","author":"H Li","year":"2013","unstructured":"Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Cornell Univ. 2013. https:\/\/doi.org\/10.48550\/arXiv.1303.3997.","journal-title":"Cornell Univ"},{"key":"260_CR6","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1007\/11496656_5","volume-title":"Proceedings of the annual symposium on combinatorial pattern matching","author":"V M\u00e4kinen","year":"2005","unstructured":"M\u00e4kinen V, Navarro G. Succinct suffix arrays based on run-length encoding. In: Apostolico A, Crochemore M, Park K, editors. Proceedings of the annual symposium on combinatorial pattern matching. Jeju Island: Springer; 2005. p. 45\u201356."},{"key":"260_CR7","doi-asserted-by":"publisher","first-page":"1370","DOI":"10.1007\/s00453-018-0475-9","volume":"81","author":"S Gog","year":"2019","unstructured":"Gog S, K\u00e4rkk\u00e4inen J, Kempa D, Petri M, Puglisi SJ. Fixed block compression boosting in fm-indexes: theory and practice. Algorithmica. 2019;81:1370\u201391.","journal-title":"Algorithmica"},{"issue":"1","key":"260_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3375890","volume":"67","author":"T Gagie","year":"2020","unstructured":"Gagie T, Navarro G, Prezza N. Fully functional suffix trees and optimal text searching in BWT-runs bounded space. J ACM. 2020;67(1):1\u201354.","journal-title":"J ACM"},{"key":"260_CR9","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1109\/DCC.2018.00012","volume-title":"2018 Data Compression Conference","author":"DSN Nunes","year":"2018","unstructured":"Nunes DSN, Louza FA, Gog S, Ayala-Rinc\u00f3n M, Navarro G. A grammar compression algorithm based on induced suffix sorting. In: Nunes DSN, editor. 2018 Data Compression Conference. IEEE: Snowbird; 2018. p. 42\u201351. https:\/\/doi.org\/10.1109\/DCC.2018.00012."},{"issue":"10","key":"260_CR10","doi-asserted-by":"publisher","first-page":"1471","DOI":"10.1109\/TC.2010.188","volume":"60","author":"G Nong","year":"2011","unstructured":"Nong G, Zhang S, Chan WH. Two efficient algorithms for linear time suffix array construction. IEEE Trans Comput. 2011;60(10):1471\u201384. https:\/\/doi.org\/10.1109\/TC.2010.188.","journal-title":"IEEE Trans Comput"},{"issue":"1\u20133","key":"260_CR11","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1016\/S0304-3975(03)00142-7","volume":"304","author":"P Crescenzi","year":"2003","unstructured":"Crescenzi P, Lungo AD, Grossi R, Lodi E, Pagli L, Rossi G. Text sparsification via local maxima. Theor Comput Sci. 2003;304(1\u20133):341\u201364.","journal-title":"Theor Comput Sci"},{"key":"260_CR12","first-page":"326","volume-title":"Proceedings of the 13th symposium on experimental algorithms (SEA)","author":"S Gog","year":"2014","unstructured":"Gog S, Beller T, Moffat A, Petri M. From theory to practice: plug and play with succinct data structures. In: Gog S, editor. Proceedings of the 13th symposium on experimental algorithms (SEA). Copenhagen: Springer; 2014. p. 326\u201337."},{"key":"260_CR13","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1007\/978-3-642-03784-9_7","volume-title":"Proceedings of the 16th International Symposium String Processing and Information Retrieval (SPIRE)","author":"J Siren","year":"2009","unstructured":"Siren J. Compressed suffix arrays for massive data. In: Karlgren J, Tarhio J, Hyyro H, editors. Proceedings of the 16th International Symposium String Processing and Information Retrieval (SPIRE). Berlin: Springer; 2009. p. 63\u201374."},{"key":"260_CR14","unstructured":"M\u00e4kinen V, Navarro G. Run-length FM-index. In: Proceedings of the DIMACS Workshop: \u201cThe Burrows-Wheeler Transform: Ten Years Later\u201d. 2004; pp. 17\u201319."},{"issue":"5","key":"260_CR15","doi-asserted-by":"publisher","first-page":"935","DOI":"10.1137\/0222058","volume":"22","author":"U Manber","year":"1993","unstructured":"Manber U, Myers GW. Suffix arrays: a new method for on-line string searches. SIAM J Comput. 1993;22(5):935\u201348.","journal-title":"SIAM J Comput"},{"issue":"1","key":"260_CR16","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/s13015-019-0148-5","volume":"14","author":"C Boucher","year":"2019","unstructured":"Boucher C, Gagie T, Kuhnle A, Langmead B, Manzini G, Mun T. Prefix-free parsing for building big BWTs. Algorithms Mol Biol. 2019;14(1):13\u201311315.","journal-title":"Algorithms Mol Biol"},{"key":"260_CR17","doi-asserted-by":"publisher","DOI":"10.1186\/s13015-019-0148-5","author":"C Boucher","year":"2018","unstructured":"Boucher C, Gagie T, Kuhnle A, Manzini G. Prefix-free parsing for building big BWTs. Algorithms Biol (WABI). 2018. https:\/\/doi.org\/10.1186\/s13015-019-0148-5.","journal-title":"Algorithms Biol (WABI)"},{"issue":"4","key":"260_CR18","doi-asserted-by":"publisher","first-page":"514","DOI":"10.1089\/cmb.2019.0316","volume":"27","author":"T Mun","year":"2020","unstructured":"Mun T, Kuhnle A, Boucher C, Gagie T, Langmead B, Manzini G. Matching reads to many genomes with the r-index. J Comput Biol. 2020;27(4):514\u20138. https:\/\/doi.org\/10.1089\/cmb.2019.0316.","journal-title":"J Comput Biol"},{"key":"260_CR19","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1109\/DCC55655.2023.00014","volume-title":"2023 data compression conference (DCC)","author":"M Oliva","year":"2023","unstructured":"Oliva M, Gagie T, Boucher C. Recursive prefix-free parsing for building big BWTs. In: Oliva M, editor. 2023 data compression conference (DCC). IEEE: Snowbird; 2023. p. 62\u201370."},{"key":"260_CR20","doi-asserted-by":"publisher","first-page":"370","DOI":"10.1007\/11786986_33","volume":"387","author":"A Golynski","year":"2006","unstructured":"Golynski A. Optimal lower bounds for rank and select indexes. 2006;387:370\u201381. https:\/\/doi.org\/10.1007\/11786986_33.","journal-title":"Optimal lower bounds for rank and select indexes"},{"key":"260_CR21","doi-asserted-by":"publisher","first-page":"33","DOI":"10.12688\/f1000research.29032.2","volume":"10","author":"F M\u00f6lder","year":"2021","unstructured":"M\u00f6lder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, et al. Sustainable data analysis with Snakemake. F1000Research. 2021;10:33.","journal-title":"F1000Research"},{"issue":"3","key":"260_CR22","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1093\/bioinformatics\/bty648","volume":"35","author":"B Langmead","year":"2018","unstructured":"Langmead B, Wilks C, Antonescu V, Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. 2018;35(3):421\u201332. https:\/\/doi.org\/10.1093\/bioinformatics\/bty648.","journal-title":"Bioinformatics"},{"issue":"4","key":"260_CR23","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/nmeth.1923","volume":"9","author":"B Langmead","year":"2012","unstructured":"Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357\u20139.","journal-title":"Nat Methods"},{"issue":"3","key":"260_CR24","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1186\/gb-2009-10-3-r25","volume":"10","author":"B Langmead","year":"2009","unstructured":"Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):25. https:\/\/doi.org\/10.1186\/gb-2009-10-3-r25.","journal-title":"Genome Biol"},{"key":"260_CR25","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1007\/978-3-030-86692-1_8","volume-title":"Proceedings of the 28th international symposium on string processing and information retrieval (SPIRE)","author":"T Akagi","year":"2021","unstructured":"Akagi T, K\u00f6ppl D, Nakashima Y, Inenaga S, Bannai H, Takeda M. Grammar index by induced suffix sorting. In: Lecroq T, Touzet H, editors. Proceedings of the 28th international symposium on string processing and information retrieval (SPIRE). Cham: Springer; 2021. p. 85\u201399."},{"key":"260_CR26","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1137\/1.9781611977561.ch11","volume-title":"The proceedings of the symposium on algorithm engineering and experiments (ALENEX)","author":"A Hong","year":"2023","unstructured":"Hong A, Rossi M, Boucher C. LZ77 via prefix-free parsing. In: Hong A, editor. The proceedings of the symposium on algorithm engineering and experiments (ALENEX). Philadelphia: SIAM; 2023. p. 123\u201334."},{"key":"260_CR27","volume-title":"FM-indexing grammars induced by suffix sorting for long patterns","author":"JJ Deng","year":"2021","unstructured":"Deng JJ, Hon W, K\u00f6ppl D, Sadakane K. FM-indexing grammars induced by suffix sorting for long patterns. Snowbird: IEEE; 2021."}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-024-00260-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13015-024-00260-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-024-00260-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T00:08:41Z","timestamp":1712707721000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/s13015-024-00260-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,10]]},"references-count":27,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["260"],"URL":"https:\/\/doi.org\/10.1186\/s13015-024-00260-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3487536\/v1","asserted-by":"object"}]},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,10]]},"assertion":[{"value":"24 October 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"15"}}