{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T13:09:52Z","timestamp":1773493792605,"version":"3.50.1"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,31]],"date-time":"2021-12-31T00:00:00Z","timestamp":1640908800000},"content-version":"vor","delay-in-days":30,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["P20 GM103546"],"award-info":[{"award-number":["P20 GM103546"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006206","name":"Biological and Environmental Research","doi-asserted-by":"publisher","award":["DE-SC0021216"],"award-info":[{"award-number":["DE-SC0021216"]}],"id":[{"id":"10.13039\/100006206","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Algorithms Mol Biol"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Pattern matching is a key step in a variety of biological sequence analysis pipelines. The FM-index is a compressed data structure for pattern matching, with search run time that is independent of the length of the database text. Implementation of the FM-index is reasonably complicated, so that increased adoption will be aided by the availability of a fast and flexible FM-index library.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      We present AvxWindowedFMindex (AWFM-index), a lightweight, open-source, thread-parallel FM-index library written in C that is optimized for indexing nucleotide and amino acid sequences. AWFM-index introduces a new approach to storing FM-index data in a strided bit-vector format that enables extremely efficient computation of the FM-index occurrence function via AVX2 bitwise instructions, and combines this with optional on-disk storage of the index\u2019s suffix array and a cache-efficient lookup table for partial k-mer searches. The AWFM-index performs exact match count and locate queries faster than SeqAn3\u2019s FM-index implementation across a range of comparable memory footprints. When optimized for speed, AWFM-index is\n                      <jats:inline-formula>\n                        <jats:alternatives>\n                          <jats:tex-math>$$\\sim $$<\/jats:tex-math>\n                          <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                            <mml:mo>\u223c<\/mml:mo>\n                          <\/mml:math>\n                        <\/jats:alternatives>\n                      <\/jats:inline-formula>\n                      2\u20134x faster than SeqAn3 for nucleotide search, and\n                      <jats:inline-formula>\n                        <jats:alternatives>\n                          <jats:tex-math>$$\\sim $$<\/jats:tex-math>\n                          <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                            <mml:mo>\u223c<\/mml:mo>\n                          <\/mml:math>\n                        <\/jats:alternatives>\n                      <\/jats:inline-formula>\n                      2\u20136x faster for amino acid search; it is also\n                      <jats:inline-formula>\n                        <jats:alternatives>\n                          <jats:tex-math>$$\\sim $$<\/jats:tex-math>\n                          <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                            <mml:mo>\u223c<\/mml:mo>\n                          <\/mml:math>\n                        <\/jats:alternatives>\n                      <\/jats:inline-formula>\n                      4x faster with similar memory footprint when storing the suffix array in on-disk SSD storage.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>AWFM-index is easy to incorporate into bioinformatics software, offers run-time performance parameterization, and provides clients with FM-index functionality at both a high-level (count or locate all instances of a query string) and low-level (step-wise control of the FM-index backward-search process). The open-source library is available for download at https:\/\/github.com\/TravisWheelerLab\/AvxWindowFmIndex.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s13015-021-00204-6","type":"journal-article","created":{"date-parts":[[2021,12,31]],"date-time":"2021-12-31T07:02:46Z","timestamp":1640934166000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["An optimized FM-index library for nucleotide and amino acid search"],"prefix":"10.1186","volume":"16","author":[{"given":"Tim","family":"Anderson","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2004-1785","authenticated-orcid":false,"given":"Travis J.","family":"Wheeler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,12,31]]},"reference":[{"issue":"14","key":"204_CR1","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25(14):1754\u201360.","journal-title":"Bioinformatics"},{"issue":"3","key":"204_CR2","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1186\/gb-2009-10-3-r25","volume":"10","author":"B Langmead","year":"2009","unstructured":"Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome Biol. 2009;10(3):25.","journal-title":"Genome Biol"},{"issue":"12","key":"204_CR3","doi-asserted-by":"publisher","first-page":"1721","DOI":"10.1101\/gr.210641.116","volume":"26","author":"D Kim","year":"2016","unstructured":"Kim D, Song L, Breitwieser FP, Salzberg SL. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 2016;26(12):1721\u20139.","journal-title":"Genome Res"},{"issue":"1","key":"204_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/ncomms11257","volume":"7","author":"P Menzel","year":"2016","unstructured":"Menzel P, Ng KL, Krogh A. Fast and sensitive taxonomic classification for metagenomics with kaiju. Nat Commun. 2016;7(1):1\u20139.","journal-title":"Nat Commun"},{"issue":"1","key":"204_CR5","doi-asserted-by":"publisher","first-page":"524","DOI":"10.1186\/s12859-017-1940-1","volume":"18","author":"Y-T Huang","year":"2017","unstructured":"Huang Y-T, Huang Y-W. An efficient error correction algorithm using fm-index. BMC Bioinformatics. 2017;18(1):524.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"204_CR6","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1038\/nmeth.3176","volume":"12","author":"B Buchfink","year":"2015","unstructured":"Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using diamond. Nat Methods. 2015;12(1):59\u201360.","journal-title":"Nat Methods"},{"issue":"3","key":"204_CR7","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403\u201310.","journal-title":"J Mol Biol"},{"issue":"11","key":"204_CR8","doi-asserted-by":"publisher","first-page":"1026","DOI":"10.1038\/nbt.3988","volume":"35","author":"M Steinegger","year":"2017","unstructured":"Steinegger M, S\u00f6ding J. Mmseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017;35(11):1026\u20138.","journal-title":"Nat Biotechnol"},{"issue":"5","key":"204_CR9","doi-asserted-by":"publisher","first-page":"935","DOI":"10.1137\/0222058","volume":"22","author":"U Manber","year":"1993","unstructured":"Manber U, Myers G. Suffix arrays: a new method for on-line string searches. SIAM J Comput. 1993;22(5):935\u201348.","journal-title":"SIAM J Comput"},{"key":"204_CR10","unstructured":"Ferragina P, Manzini G. Opportunistic data structures with applications. In: Proceedings 41st Annual Symposium on Foundations of Computer Science, pp. 390\u2013398 (2000). IEEE."},{"key":"204_CR11","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/j.jbiotec.2017.07.017","volume":"261","author":"K Reinert","year":"2017","unstructured":"Reinert K, Dadi TH, Ehrhardt M, Hauswedell H, Mehringer S, Rahn R, Kim J, Pockrandt C, Winkler J, Siragusa E, et al. The seqan c++ template library for efficient sequence analysis: a resource for programmers. J Biotechnol. 2017;261:157\u201368.","journal-title":"J Biotechnol"},{"key":"204_CR12","doi-asserted-by":"crossref","unstructured":"Ko P, Aluru S. Space efficient linear time construction of suffix arrays. In: Annual Symposium on Combinatorial Pattern Matching, pp. 200\u2013210 (2003). Springer.","DOI":"10.1007\/3-540-44888-8_15"},{"key":"204_CR13","unstructured":"Fischer J, Kurpicz F. Dismantling divsufsort. arXiv preprint arXiv:1710.01896. 2017."},{"key":"204_CR14","unstructured":"Mori Y. libdivsufsort. https:\/\/github.com\/y-256\/libdivsufsort."},{"key":"204_CR15","unstructured":"Burrows M, Wheeler DJ. A block-sorting lossless data compression algorithm. Technical report 124, Digital Equipment Corporation. 1994."},{"key":"204_CR16","doi-asserted-by":"publisher","unstructured":"Grossi R, Gupta A, Vitter J. High-order entropy-compressed text indexes. Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms. 2002. https:\/\/doi.org\/10.1145\/644108.644250.","DOI":"10.1145\/644108.644250"},{"key":"204_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1412228.1455268","volume":"13","author":"P Ferragina","year":"2009","unstructured":"Ferragina P, Gonz\u00e1lez R, Navarro G, Venturini R. Compressed text indexes: from theory to practice. J Exp Algorithmics. 2009;13:1\u201312.","journal-title":"J Exp Algorithmics"},{"issue":"3","key":"204_CR18","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1093\/bioinformatics\/btx596","volume":"34","author":"H Cheng","year":"2018","unstructured":"Cheng H, Wu M, Xu Y. Fmtree: a fast locating algorithm of fm-indexes for genomic data. Bioinformatics. 2018;34(3):416\u201324.","journal-title":"Bioinformatics"},{"key":"204_CR19","doi-asserted-by":"publisher","unstructured":"Vigna S. Broadword implementation of rank\/select queries. In the Proceedings of the 7th International Workshop on Experimental Algorithms, 2008. https:\/\/doi.org\/10.1007\/978-3-540-68552-4_12.","DOI":"10.1007\/978-3-540-68552-4_12"},{"key":"204_CR20","unstructured":"Consortium U. et al. Uniprot: the universal protein knowledgebase in 2021. Nucleic Acids Res. 1100."},{"issue":"1","key":"204_CR21","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1093\/comjnl\/bxx046","volume":"61","author":"W Mu\u0142a","year":"2017","unstructured":"Mu\u0142a W, Kurz N, Lemire D. Faster population counts using avx2 instructions. Comput J. 2017;61(1):111\u201320.","journal-title":"Comput J"},{"key":"204_CR22","doi-asserted-by":"crossref","unstructured":"Gog S, Beller T, Moffat A, Petri M. From theory to practice: plug and play with succinct data structures. In: 13th International Symposium on Experimental Algorithms, (SEA 2014), pp. 326\u2013337. 2014.","DOI":"10.1007\/978-3-319-07959-2_28"},{"key":"204_CR23","unstructured":"Eddy S. Easel\u2014a C library for biological sequence analysis. http:\/\/bioeasel.org."},{"key":"204_CR24","doi-asserted-by":"crossref","unstructured":"Lam TW, Li R, Tam A, Wong S, Wu E, Yiu S-M. High throughput short read alignment via bi-directional bwt. In: IEEE International Conference on Bioinformatics and Biomedicine, pp. 31\u201336 (2009). IEEE.","DOI":"10.1109\/BIBM.2009.42"},{"issue":"12","key":"204_CR25","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1093\/bioinformatics\/btq217","volume":"26","author":"JT Simpson","year":"2010","unstructured":"Simpson JT, Durbin R. Efficient construction of an assembly string graph using the fm-index. Bioinformatics. 2010;26(12):367\u201373.","journal-title":"Bioinformatics"}],"container-title":["Algorithms for Molecular Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-021-00204-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13015-021-00204-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13015-021-00204-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,31]],"date-time":"2021-12-31T07:03:03Z","timestamp":1640934183000},"score":1,"resource":{"primary":{"URL":"https:\/\/almob.biomedcentral.com\/articles\/10.1186\/s13015-021-00204-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["204"],"URL":"https:\/\/doi.org\/10.1186\/s13015-021-00204-6","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.01.12.426474","asserted-by":"object"}]},"ISSN":["1748-7188"],"issn-type":[{"value":"1748-7188","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"12 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"25"}}