{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T00:45:12Z","timestamp":1775263512234,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2020,7,30]],"date-time":"2020-07-30T00:00:00Z","timestamp":1596067200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"FCT\u2014Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia"},{"name":"Scientific Employment Stimulus\u2014Institutional Call","award":["CI-CTTI-94-ARH\/2019"],"award-info":[{"award-number":["CI-CTTI-94-ARH\/2019"]}]},{"DOI":"10.13039\/100006129","name":"FCT","doi-asserted-by":"publisher","award":["SFRH\/BD\/141851\/2018"],"award-info":[{"award-number":["SFRH\/BD\/141851\/2018"]}],"id":[{"id":"10.13039\/100006129","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006129","name":"FCT","doi-asserted-by":"publisher","award":["UIDB\/00127\/2020"],"award-info":[{"award-number":["UIDB\/00127\/2020"]}],"id":[{"id":"10.13039\/100006129","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,1,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused more than 14 million cases and more than half million deaths. Given the absence of implemented therapies, new analysis, diagnosis and therapeutics are of great importance.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Analysis of SARS-CoV-2 genomes from the current outbreak reveals the presence of short persistent DNA\/RNA sequences that are absent from the human genome and transcriptome (PmRAWs). For the PmRAWs with length 12, only four exist at the same location in all SARS-CoV-2. At the gene level, we found one PmRAW of size 13 at the Spike glycoprotein coding sequence. This protein is fundamental for binding in human ACE2 and further use as an entry receptor to invade target cells. Applying protein structural prediction, we localized this PmRAW at the surface of the Spike protein, providing a potential targeted vector for diagnostics and therapeutics. In addition, we show a new pattern of relative absent words (RAWs), characterized by the progressive increase of GC content (Guanine and Cytosine) according to the decrease of RAWs length, contrarily to the virus and host genome distributions. New analysis shows the same property during the Ebola virus outbreak. At a computational level, we improved the alignment-free method to identify pathogen-specific signatures in balance with GC measures and removed previous size limitations.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/cobilab\/eagle.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa686","type":"journal-article","created":{"date-parts":[[2020,7,22]],"date-time":"2020-07-22T19:24:53Z","timestamp":1595445893000},"page":"5129-5132","source":"Crossref","is-referenced-by-count":14,"title":["Persistent minimal sequences of SARS-CoV-2"],"prefix":"10.1093","volume":"36","author":[{"given":"Diogo","family":"Pratas","sequence":"first","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro , 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics, Telecommunications and Informatics, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal"},{"name":"Department of Virology, University of Helsinki , 00014 Helsinki, Finland"}]},{"given":"Jorge M","family":"Silva","sequence":"additional","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro , 3810-193 Aveiro, Portugal"},{"name":"Department of Electronics, Telecommunications and Informatics, University of Aveiro , Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal"}]}],"member":"286","published-online":{"date-parts":[[2020,7,30]]},"reference":[{"key":"2023062504243125100_btaa686-B1","doi-asserted-by":"crossref","first-page":"450","DOI":"10.1038\/s41591-020-0820-9","article-title":"The proximal origin of SARS-CoV-2","volume":"26","author":"Andersen","year":"2020","journal-title":"Nat. Med"},{"key":"2023062504243125100_btaa686-B2","doi-asserted-by":"crossref","first-page":"W597","DOI":"10.1093\/nar\/gks400","article-title":"ExPASy: SIB bioinformatics resource portal","volume":"40","author":"Artimo","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062504243125100_btaa686-B3","doi-asserted-by":"crossref","first-page":"388","DOI":"10.1186\/s12859-014-0388-9","article-title":"Linear-time computation of minimal absent words using suffix array","volume":"15","author":"Barton","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023062504243125100_btaa686-B4","first-page":"555","author":"B\u00e9al","year":"1996"},{"key":"2023062504243125100_btaa686-B5","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.tcs.2012.04.031","article-title":"Using minimal absent words to build phylogeny","volume":"450","author":"Chairungsee","year":"2012","journal-title":"Theor. Comput. Sci"},{"key":"2023062504243125100_btaa686-B6","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1016\/S0140-6736(20)30211-7","article-title":"Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study","volume":"395","author":"Chen","year":"2020","journal-title":"Lancet"},{"key":"2023062504243125100_btaa686-B7","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/S0020-0190(98)00104-5","article-title":"Automata and forbidden words","volume":"67","author":"Crochemore","year":"1998","journal-title":"Inf. Process. Lett"},{"key":"2023062504243125100_btaa686-B8","doi-asserted-by":"crossref","first-page":"104461","DOI":"10.1016\/j.ic.2019.104461","article-title":"Absent words in a sliding window with applications","volume":"270","author":"Crochemore","year":"2020","journal-title":"Inf. Comput"},{"key":"2023062504243125100_btaa686-B9","doi-asserted-by":"crossref","first-page":"1967","DOI":"10.1056\/NEJMoa030747","article-title":"Identification of a novel coronavirus in patients with severe acute respiratory syndrome","volume":"348","author":"Drosten","year":"2003","journal-title":"N. Engl. J. Med"},{"key":"2023062504243125100_btaa686-B10","doi-asserted-by":"crossref","first-page":"2662","DOI":"10.1093\/bioinformatics\/btu312","article-title":"keeSeek: searching distant non-existing words in genomes for PCR-based applications","volume":"30","author":"Falda","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062504243125100_btaa686-B11","doi-asserted-by":"crossref","first-page":"e16065","DOI":"10.1371\/journal.pone.0016065","article-title":"Minimal absent words in prokaryotic and eukaryotic genomes","volume":"6","author":"Garcia","year":"2011","journal-title":"PLoS One"},{"key":"2023062504243125100_btaa686-B12","doi-asserted-by":"crossref","DOI":"10.1002\/ddr.21656","article-title":"Angiotensin receptor blockers as tentative SARS-CoV-2 therapeutics","author":"Gurwitz","year":"2020","journal-title":"Drug Dev. Res"},{"key":"2023062504243125100_btaa686-B13","doi-asserted-by":"crossref","first-page":"2746","DOI":"10.1093\/bioinformatics\/btx209","article-title":"emMAW: computing minimal absent words in external memory","volume":"33","author":"H\u00e9liou","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062504243125100_btaa686-B14","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1186\/1471-2105-9-167","article-title":"Efficient computation of absent words in genomic sequences","volume":"9","author":"Herold","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023062504243125100_btaa686-B15","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1016\/S0140-6736(20)30183-5","article-title":"Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China","volume":"395","author":"Huang","year":"2020","journal-title":"Lancet"},{"key":"2023062504243125100_btaa686-B16","doi-asserted-by":"crossref","first-page":"105924","DOI":"10.1016\/j.ijantimicag.2020.105924","article-title":"Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges","volume":"55","author":"Lai","year":"2020","journal-title":"Int. J. Antimicrob. Agents"},{"key":"2023062504243125100_btaa686-B17","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Methods"},{"key":"2023062504243125100_btaa686-B18","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023062504243125100_btaa686-B19","doi-asserted-by":"crossref","DOI":"10.1128\/AAC.00483-20","article-title":"Updated approaches against SARS-CoV-2","volume":"64","author":"Li","year":"2020","journal-title":"Antimicrob. Agents Chemother"},{"key":"2023062504243125100_btaa686-B20","doi-asserted-by":"crossref","DOI":"10.1128\/JCM.00557-20","article-title":"Comparative performance of SARS-CoV-2 detection assays using seven different primer-probe sets and one assay kit","volume":"58","author":"Nalla","year":"2020","journal-title":"J. Clin. Microbiol"},{"key":"2023062504243125100_btaa686-B21","author":"Nguyen","year":"2020"},{"key":"2023062504243125100_btaa686-B22","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1186\/1471-2105-10-137","article-title":"On finding minimal absent words","volume":"10","author":"Pinho","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023062504243125100_btaa686-B23","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1038\/nbt.1754","article-title":"Integrative genomics viewer","volume":"29","author":"Robinson","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023062504243125100_btaa686-B24","doi-asserted-by":"crossref","first-page":"112787","DOI":"10.1016\/j.jim.2020.112787","article-title":"In the search of potential epitopes for Wuhan seafood market pneumonia virus using high order nullomers","volume":"481\u2013482","author":"Santoni","year":"2020","journal-title":"J. Immunol. Methods"},{"key":"2023062504243125100_btaa686-B25","doi-asserted-by":"crossref","first-page":"2421","DOI":"10.1093\/bioinformatics\/btv189","article-title":"Three minimal sequences found in Ebola virus genomes and absent from human DNA","volume":"31","author":"Silva","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062504243125100_btaa686-B26","doi-asserted-by":"crossref","first-page":"e0164540","DOI":"10.1371\/journal.pone.0164540","article-title":"Nullomers and high order nullomers in genomic sequences","volume":"11","author":"Vergni","year":"2016","journal-title":"PLoS One"},{"key":"2023062504243125100_btaa686-B27","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1016\/S0140-6736(20)30185-9","article-title":"A novel coronavirus outbreak of global health concern","volume":"395","author":"Wang","year":"2020","journal-title":"Lancet"},{"key":"2023062504243125100_btaa686-B28","doi-asserted-by":"crossref","first-page":"W296","DOI":"10.1093\/nar\/gky427","article-title":"SWISS-MODEL: homology modelling of protein structures and complexes","volume":"46","author":"Waterhouse","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023062504243125100_btaa686-B29","doi-asserted-by":"crossref","first-page":"1260","DOI":"10.1126\/science.abb2507","article-title":"Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation","volume":"367","author":"Wrapp","year":"2020","journal-title":"Science"},{"key":"2023062504243125100_btaa686-B30","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/s41586-020-2008-3","article-title":"A new coronavirus associated with human respiratory disease in China","volume":"579","author":"Wu","year":"2020","journal-title":"Nature"},{"key":"2023062504243125100_btaa686-B31","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1016\/j.ipl.2010.05.008","article-title":"Efficient computation of shortest absent words in a genomic sequence","volume":"110","author":"Wu","year":"2010","journal-title":"Inf. Process. Lett"},{"key":"2023062504243125100_btaa686-B32","doi-asserted-by":"crossref","first-page":"1814","DOI":"10.1056\/NEJMoa1211721","article-title":"Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia","volume":"367","author":"Zaki","year":"2012","journal-title":"N. Engl. J. Med"},{"key":"2023062504243125100_btaa686-B33","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1007\/s00134-020-05985-9","article-title":"Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target","volume":"46","author":"Zhang","year":"2020","journal-title":"Intensive Care Med"},{"key":"2023062504243125100_btaa686-B34","doi-asserted-by":"crossref","first-page":"16855","DOI":"10.1073\/pnas.0407821101","article-title":"GC\/AT-content spikes as genomic punctuation marks","volume":"101","author":"Zhang","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062504243125100_btaa686-B35","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1038\/s41586-020-2012-7","article-title":"A pneumonia outbreak associated with a new coronavirus of probable bat origin","volume":"579","author":"Zhou","year":"2020","journal-title":"Nature"},{"key":"2023062504243125100_btaa686-B36","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/s13059-019-1755-7","article-title":"Benchmarking of alignment-free sequence comparison methods","volume":"20","author":"Zielezinski","year":"2019","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa686\/34127031\/btaa686.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/21\/5129\/50692652\/btaa686.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/21\/5129\/50692652\/btaa686.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,25]],"date-time":"2023-06-25T04:25:28Z","timestamp":1687667128000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/21\/5129\/5878957"}},"subtitle":[],"editor":[{"given":"Pier","family":"Luigi Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,7,30]]},"references-count":36,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2021,1,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa686","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,11,1]]},"published":{"date-parts":[[2020,7,30]]}}}