{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:54Z","timestamp":1772138094561,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2019,4,16]],"date-time":"2019-04-16T00:00:00Z","timestamp":1555372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2018YFC0910401"],"award-info":[{"award-number":["2018YFC0910401"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"publisher","award":["61721003"],"award-info":[{"award-number":["61721003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"publisher","award":["61673231"],"award-info":[{"award-number":["61673231"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"US National Science Foundation","doi-asserted-by":"crossref","award":["DMS-1518001"],"award-info":[{"award-number":["DMS-1518001"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Institute of Health","award":["R01GM120624"],"award-info":[{"award-number":["R01GM120624"]}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Detecting sequences containing repetitive regions is a basic bioinformatics task with many applications. Several methods have been developed for various types of repeat detection tasks. An efficient generic method for detecting most types of repetitive sequences is still desirable. Inspired by the excellent properties and successful applications of the D2 family of statistics in comparative analyses of genomic sequences, we developed a new statistic D2R that can efficiently discriminate sequences with or without repetitive regions.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Using the statistic, we developed an algorithm of linear time and space complexity for detecting most types of repetitive sequences in multiple scenarios, including finding candidate clustered regularly interspaced short palindromic repeats regions from bacterial genomic or metagenomics sequences. Simulation and real data experiments show that the method works well on both assembled sequences and unassembled short reads.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The codes are available at https:\/\/github.com\/XuegongLab\/D2R_codes under GPL 3.0 license.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz262","type":"journal-article","created":{"date-parts":[[2019,4,10]],"date-time":"2019-04-10T07:14:22Z","timestamp":1554880462000},"page":"4596-4606","source":"Crossref","is-referenced-by-count":6,"title":["A new statistic for efficient detection of repetitive sequences"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4331-0773","authenticated-orcid":false,"given":"Sijie","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Automation, MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University , Beijing 100084, China"}]},{"given":"Yixin","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Automation, MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University , Beijing 100084, China"}]},{"given":"Fengzhu","family":"Sun","sequence":"additional","affiliation":[{"name":"Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California , Los Angeles, CA 90089, USA"},{"name":"Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University , Shanghai 200433, China"}]},{"given":"Michael S","family":"Waterman","sequence":"additional","affiliation":[{"name":"Department of Automation, MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University , Beijing 100084, China"},{"name":"Quantitative and Computational Biology Program, Department of Biological Sciences, University of Southern California , Los Angeles, CA 90089, USA"},{"name":"Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University , Shanghai 200433, China"}]},{"given":"Xuegong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Automation, MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, BNRist, Tsinghua University , Beijing 100084, China"},{"name":"School of Life Sciences, Tsinghua University , Beijing 100084, China"}]}],"member":"286","published-online":{"date-parts":[[2019,4,16]]},"reference":[{"key":"2023013108324343700_btz262-B1","doi-asserted-by":"crossref","first-page":"13579","DOI":"10.1073\/pnas.1735481100","article-title":"Extensive repetitive DNA facilitates prokaryotic genome plasticity","volume":"100","author":"Aras","year":"2003","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013108324343700_btz262-B2","doi-asserted-by":"crossref","first-page":"1269","DOI":"10.1101\/gr.88502","article-title":"Automated de novo identification of repeat sequence families in sequenced genomes","volume":"12","author":"Bao","year":"2002","journal-title":"Genome Res"},{"key":"2023013108324343700_btz262-B3","doi-asserted-by":"crossref","first-page":"1709","DOI":"10.1126\/science.1138140","article-title":"CRISPR provides acquired resistance against viruses in prokaryotes","volume":"315","author":"Barrangou","year":"2007","journal-title":"Science"},{"key":"2023013108324343700_btz262-B4","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1089\/cmb.2015.0226","article-title":"CRISPR detection from short reads using partial overlap graphs","volume":"23","author":"Ben-Bassat","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023013108324343700_btz262-B5","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023013108324343700_btz262-B6","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s10577-015-9499-z","article-title":"Repetitive DNA in eukaryotic genomes","volume":"23","author":"Biscotti","year":"2015","journal-title":"Chromosom. Res"},{"key":"2023013108324343700_btz262-B7","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1186\/1471-2105-8-209","article-title":"CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats","volume":"8","author":"Bland","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013108324343700_btz262-B8","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1093\/bioinformatics\/14.4.380","article-title":"MView: a web compatible database search or multiple alignment viewer","volume":"14","author":"Brown","year":"1998","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B9","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/nature21059","article-title":"New CRISPR-Cas systems from uncultivated microbes","volume":"542","author":"Burstein","year":"2017","journal-title":"Nature"},{"key":"2023013108324343700_btz262-B10","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1109\/TCBB.2006.16","article-title":"An efficient algorithm for the identification of structured motifs in DNA promoter sequences","volume":"3","author":"Carvalho","year":"2006","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023013108324343700_btz262-B11","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1002\/nav.20017","article-title":"Higher-order Markov chain models for categorical data sequences","volume":"51","author":"Ching","year":"2004","journal-title":"Nav. Res. Logist"},{"key":"2023013108324343700_btz262-B12","doi-asserted-by":"crossref","first-page":"e0150719.","DOI":"10.1371\/journal.pone.0150719","article-title":"REPdenovo: inferring De Novo repeat motifs from short sequence reads","volume":"11","author":"Chu","year":"2016","journal-title":"PLoS One"},{"key":"2023013108324343700_btz262-B13","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1126\/science.1231143","article-title":"Multiplex genome engineering using CRISPR\/Cas systems","volume":"339","author":"Cong","year":"2013","journal-title":"Science"},{"key":"2023013108324343700_btz262-B14","doi-asserted-by":"crossref","first-page":"S21","DOI":"10.1186\/1471-2105-8-S7-S21","article-title":"A survey of DNA motif finding algorithms","volume":"8","author":"Das","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013108324343700_btz262-B15","doi-asserted-by":"crossref","first-page":"e1002384.","DOI":"10.1371\/journal.pgen.1002384","article-title":"Repetitive elements may comprise over two-thirds of the human genome","volume":"7","author":"de Koning","year":"2011","journal-title":"PLoS Genet"},{"key":"2023013108324343700_btz262-B16","doi-asserted-by":"crossref","first-page":"1853","DOI":"10.1016\/j.cell.2016.11.038","article-title":"Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens","volume":"167","author":"Dixit","year":"2016","journal-title":"Cell"},{"key":"2023013108324343700_btz262-B17","doi-asserted-by":"crossref","first-page":"eaar4120.","DOI":"10.1126\/science.aar4120","article-title":"Systematic discovery of antiphage defense systems in the microbial pangenome","volume":"359","author":"Doron","year":"2018","journal-title":"Science"},{"key":"2023013108324343700_btz262-B18","doi-asserted-by":"crossref","first-page":"2059","DOI":"10.1093\/bioinformatics\/btl355","article-title":"Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching","volume":"22","author":"Du","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B19","doi-asserted-by":"crossref","first-page":"i152","DOI":"10.1093\/bioinformatics\/bti1003","article-title":"PILER: identification and classification of genomic repeats","volume":"21","author":"Edgar","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B20","doi-asserted-by":"crossref","first-page":"227.","DOI":"10.1186\/s12859-015-0654-5","article-title":"Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale","volume":"16","author":"Girgis","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023013108324343700_btz262-B21","doi-asserted-by":"crossref","first-page":"172.","DOI":"10.1186\/1471-2105-8-172","article-title":"The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats","volume":"8","author":"Grissa","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013108324343700_btz262-B22","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.1093\/bioinformatics\/btx717","article-title":"RepLong: de novo repeat identification using long read sequencing data","volume":"34","author":"Guo","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B23","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1126\/science.1179555","article-title":"CRISPR\/Cas, the immune system of bacteria and archaea","volume":"327","author":"Horvath","year":"2010","journal-title":"Science"},{"key":"2023013108324343700_btz262-B24","doi-asserted-by":"crossref","first-page":"678","DOI":"10.3389\/fmicb.2015.00678","article-title":"Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial","volume":"6","author":"Howe","year":"2015","journal-title":"Front. Microbiol"},{"key":"2023013108324343700_btz262-B25","doi-asserted-by":"crossref","first-page":"1262","DOI":"10.1016\/j.cell.2014.05.010","article-title":"Development and applications of CRISPR-Cas9 for genome engineering","volume":"157","author":"Hsu","year":"2014","journal-title":"Cell"},{"key":"2023013108324343700_btz262-B26","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1126\/science.1225829","article-title":"A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity","volume":"337","author":"Jinek","year":"2012","journal-title":"Science"},{"key":"2023013108324343700_btz262-B27","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1016\/S0168-9525(00)02093-X","article-title":"Repbase update: a database and an electronic journal of repetitive elements","volume":"16","author":"Jurka","year":"2000","journal-title":"Trends Genet"},{"key":"2023013108324343700_btz262-B28","doi-asserted-by":"crossref","first-page":"i249","DOI":"10.1093\/bioinformatics\/btm211","article-title":"A statistical method for alignment-free comparison of regulatory sequences","volume":"23","author":"Kantorovitz","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B52","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1038\/nature12198","article-title":"Gut metagenome in European women with normal, impaired and diabetic glucose control","volume":"498","author":"Karlsson","year":"2013","journal-title":"Nature"},{"key":"2023013108324343700_btz262-B29","doi-asserted-by":"crossref","first-page":"e80.","DOI":"10.1093\/nar\/gku210","article-title":"RepARK\u2014de novo creation of repeat libraries from whole-genome NGS reads","volume":"42","author":"Koch","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023013108324343700_btz262-B30","doi-asserted-by":"crossref","first-page":"517.","DOI":"10.1186\/1471-2164-9-517","article-title":"A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes","volume":"9","author":"Kurtz","year":"2008","journal-title":"BMC Genomics"},{"key":"2023013108324343700_btz262-B31","doi-asserted-by":"crossref","first-page":"i520","DOI":"10.1093\/bioinformatics\/btw456","article-title":"Assemble CRISPRs from metagenomic sequencing data","volume":"32","author":"Lei","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B32","doi-asserted-by":"crossref","first-page":"13980","DOI":"10.1073\/pnas.202468099","article-title":"Distributional regimes for the number of k-word matches between two random sequences","volume":"99","author":"Lippert","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023013108324343700_btz262-B33","doi-asserted-by":"crossref","first-page":"1502","DOI":"10.1001\/jama.2013.3231","article-title":"A culture-independent sequence-based metagenomics approach to the investigation of an outbreak of shiga-toxigenic Escherichia coli O104:H4","volume":"309","author":"Loman","year":"2013","journal-title":"JAMA"},{"key":"2023013108324343700_btz262-B34","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1126\/science.1232033","article-title":"RNA-guide human genome engineering via Cas9","volume":"339","author":"Mali","year":"2013","journal-title":"Science"},{"key":"2023013108324343700_btz262-B35","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s12918-015-0248-x","article-title":"Computational prediction of CRISPR cassettes in gut metagenome samples from Chinese type-2 diabetic patients and healthy controls","volume":"10","author":"Mangericao","year":"2016","journal-title":"BMC Syst. Biol"},{"key":"2023013108324343700_btz262-B36","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.ygeno.2013.03.002","article-title":"RF: a method for filtering short reads with tandem repeats for genome mapping","volume":"102","author":"Misawa","year":"2013","journal-title":"Genomics"},{"key":"2023013108324343700_btz262-B37","doi-asserted-by":"crossref","first-page":"1416","DOI":"10.1093\/nar\/gks1285","article-title":"One size does not fit all: on how Markov model order dictates performance of genomic sequence analyses","volume":"41","author":"Narlikar","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023013108324343700_btz262-B38","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1017\/S0021900200007403","article-title":"On the first k moments of the random count of a pattern in a multistate sequence generated by a Markov source","volume":"47","author":"Nuel","year":"2010","journal-title":"J. Appl. Probab"},{"key":"2023013108324343700_btz262-B39","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3389\/fbioe.2016.00035","article-title":"Accurate prediction of the statistics of repetitions in random sequences: a case study in archaea genomes","volume":"4","author":"R\u00e9gnier","year":"2016","journal-title":"Front. Bioeng. Biotechnol"},{"key":"2023013108324343700_btz262-B40","doi-asserted-by":"crossref","first-page":"1615","DOI":"10.1089\/cmb.2009.0198","article-title":"Alignment-free sequence comparison (I): statistics and power","volume":"16","author":"Reinert","year":"2009","journal-title":"J. Comput. Biol"},{"key":"2023013108324343700_btz262-B41","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/bioinformatics\/btv395","article-title":"Inference of Markovian properties of molecular sequences from NGS data and applications to comparative genomics","volume":"32","author":"Ren","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B42","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1093\/bioinformatics\/btn525","article-title":"Faster exact Markovian probability functions for motif occurrences: a DFA-only approach","volume":"24","author":"Ribeca","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B43","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1002\/pro.3290","article-title":"Clustal omega for making accurate alignments of many protein sequences","volume":"27","author":"Sievers","year":"2018","journal-title":"Protein Sci"},{"key":"2023013108324343700_btz262-B44","doi-asserted-by":"crossref","first-page":"e105","DOI":"10.1093\/nar\/gkt183","article-title":"Crass: identification and reconstruction of CRISPR from unassembled metagenomic data","volume":"41","author":"Skennerton","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023013108324343700_btz262-B45","year":"2013\u20132015"},{"key":"2023013108324343700_btz262-B46","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1214\/aos\/1074290335","article-title":"The positive false discovery rate: a Bayesian interpretation and the q-value","volume":"31","author":"Storey","year":"2003","journal-title":"Ann. Stat"},{"key":"2023013108324343700_btz262-B47","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1186\/1471-2105-12-77","article-title":"pROC: an open-source package for R and S+ to analyze and compare ROC curves","volume":"12","author":"Tiberti","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023013108324343700_btz262-B48","first-page":"109","article-title":"Computation of d2: a measure of sequence dissimilarity","author":"Torney","year":"1990","journal-title":"Computers and DNA, SFI Studies in the Sciences of Complexity"},{"key":"2023013108324343700_btz262-B49","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1093\/bioinformatics\/btl662","article-title":"Computing exact P-values for DNA motifs","volume":"23","author":"Tromp","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013108324343700_btz262-B50","doi-asserted-by":"crossref","first-page":"1467","DOI":"10.1089\/cmb.2010.0056","article-title":"Alignment-free sequence comparison (II): theoretical power of comparison statistics","volume":"17","author":"Wan","year":"2010","journal-title":"J. Comput. Biol"},{"key":"2023013108324343700_btz262-B51","author":"Waterman","year":"1995"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz262\/28560863\/btz262.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/22\/4596\/48978368\/bioinformatics_35_22_4596.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/22\/4596\/48978368\/bioinformatics_35_22_4596.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T12:41:12Z","timestamp":1675168872000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/22\/4596\/5472337"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,4,16]]},"references-count":52,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2019,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz262","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/420745","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,11,15]]},"published":{"date-parts":[[2019,4,16]]}}}