{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T17:12:16Z","timestamp":1760202736695,"version":"3.37.3"},"reference-count":14,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2017,4,12]],"date-time":"2017-04-12T00:00:00Z","timestamp":1491955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["2845984","294143"],"award-info":[{"award-number":["2845984","294143"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words of a sequence of length n on a fixed-sized alphabet based on suffix arrays. A standard implementation of this algorithm, when applied to a large sequence of length n, requires more than 20n\u00a0bytes of RAM. Such memory requirements are a significant hurdle to the computation of minimal absent words in large datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present emMAW, the first external-memory algorithm for computing minimal absent words. A free open-source implementation of our algorithm is made available. This allows for computation of minimal absent words on far bigger data sets than was previously possible. Our implementation requires less than 3\u2009h on a standard workstation to process the full human genome when as little as 1\u2009GB of RAM is made available. We stress that our implementation, despite making use of external memory, is fast; indeed, even on relatively smaller datasets when enough RAM is available to hold all necessary data structures, it is less than two times slower than state-of-the-art internal-memory implementations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/solonas13\/maw (free software under the terms of the GNU GPL).<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx209","type":"journal-article","created":{"date-parts":[[2017,4,6]],"date-time":"2017-04-06T14:43:48Z","timestamp":1491489828000},"page":"2746-2749","source":"Crossref","is-referenced-by-count":10,"title":["emMAW: computing minimal absent words in external memory"],"prefix":"10.1093","volume":"33","author":[{"given":"Alice","family":"H\u00e9liou","sequence":"first","affiliation":[{"name":"LIX, \u00c9cole Polytechnique, CNRS, INRIA, Universit\u00e9 Paris-Saclay, Palaiseau, France"}]},{"given":"Solon P","family":"Pissis","sequence":"additional","affiliation":[{"name":"Department of Informatics, King\u2019s College London, London, UK"}]},{"given":"Simon J","family":"Puglisi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki, Helsinki, Finland"}]}],"member":"286","published-online":{"date-parts":[[2017,4,12]]},"reference":[{"key":"2023020206272132500_btx209-B1","doi-asserted-by":"crossref","first-page":"5.","DOI":"10.1186\/s13015-017-0094-z","article-title":"On avoided words, absent words, and their application to biological sequence analysis","volume":"12","author":"Almirantis","year":"2017","journal-title":"Algorithms for Molecular Biology"},{"key":"2023020206272132500_btx209-B2","doi-asserted-by":"crossref","first-page":"388.","DOI":"10.1186\/s12859-014-0388-9","article-title":"Linear-time computation of minimal absent words using suffix array","volume":"15","author":"Barton","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023020206272132500_btx209-B3","first-page":"243","volume-title":"PPAM, Part II, Volume 9574 of LNCS","author":"Barton","year":"2015"},{"key":"2023020206272132500_btx209-B4","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1006\/aama.2000.0682","article-title":"Forbidden words in symbolic dynamics","volume":"25","author":"B\u00e9al","year":"2000","journal-title":"Advances in Applied Mathematics"},{"key":"2023020206272132500_btx209-B5","first-page":"222","volume-title":"SPIRE, Volume 9309 of LNCS","author":"Belazzougui","year":"2015"},{"key":"2023020206272132500_btx209-B6","first-page":"133","volume-title":"ESA, Volume 8125 of LNCS","author":"Belazzougui","year":"2013"},{"key":"2023020206272132500_btx209-B7","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/S0020-0190(98)00104-5","article-title":"Automata and forbidden words","volume":"67","author":"Crochemore","year":"1998","journal-title":"Information Processing Letters"},{"key":"2023020206272132500_btx209-B8","first-page":"355","volume-title":"PCB","author":"Hampikian","year":"2007"},{"key":"2023020206272132500_btx209-B9","first-page":"61:1","volume-title":"ESA 2016, Volume 57 of LIPIcs","author":"K\u00e4rkk\u00e4inen","year":"2016"},{"key":"2023020206272132500_btx209-B10","first-page":"329","volume-title":"CPM, Volume 9133 of LNCS","author":"K\u00e4rkk\u00e4inen","year":"2015"},{"key":"2023020206272132500_btx209-B11","first-page":"98","volume-title":"ALENEX","author":"K\u00e4rkk\u00e4inen","year":"2017"},{"key":"2023020206272132500_btx209-B12","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0304-3975(00)00436-9","article-title":"Words and forbidden factors","volume":"273","author":"Mignosi","year":"2002","journal-title":"Theoretical Computer Science"},{"key":"2023020206272132500_btx209-B13","doi-asserted-by":"crossref","first-page":"2421","DOI":"10.1093\/bioinformatics\/btv189","article-title":"Three minimal sequences found in Ebola virus genomes and absent from human DNA","volume":"31","author":"Silva","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020206272132500_btx209-B14","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1561\/0400000014","article-title":"Algorithms and data structures for external memory","volume":"2","author":"Vitter","year":"2006","journal-title":"Foundations and Trends in Theoretical Computer Science"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/17\/2746\/49041000\/bioinformatics_33_17_2746.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/17\/2746\/49041000\/bioinformatics_33_17_2746.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T06:29:47Z","timestamp":1675319387000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/17\/2746\/3605193"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,4,12]]},"references-count":14,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2017,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx209","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,9,1]]},"published":{"date-parts":[[2017,4,12]]}}}