{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T06:40:15Z","timestamp":1684046415086},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"20","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Genomic-based methods have significant potential for fast and accurate identification of organisms or even genes of interest in complex environmental samples (air, water, soil, food, etc.), especially when isolation of the target organism cannot be performed by a variety of reasons. Despite this potential, the presence of the unknown, variable and usually large quantities of background DNA can cause interference resulting in false positive outcomes.<\/jats:p><jats:p>Results: In order to estimate how the genomic diversity of the background (total length of all of the different genomes present in the background), target length and target mutation rate affect the probability of misidentifications, we introduce a mathematical definition for the quality of an individual signature in the presence of a background based on its length and number of mismatches needed to transform the signature into the closest subsequence present in the background. This definition, in conjunction with a probabilistic framework, allows one to predict the minimal signature length required to identify the target in the presence of different sizes of backgrounds and the effect of the target's mutation rate on the quality of its identification. The model assumptions and predictions were validated using both Monte Carlo simulations and real genomic data examples. The proposed model can be used to determine appropriate signature lengths for various combinations of target and background genome sizes. 
It also predicted that any genomic signatures will be unable to identify target if its mutation rate is &gt;5%.<\/jats:p><jats:p>Contact: \u00a0yfofanov@bioinfo.uh.edu<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm420","type":"journal-article","created":{"date-parts":[[2007,9,20]],"date-time":"2007-09-20T00:24:46Z","timestamp":1190247886000},"page":"2665-2671","source":"Crossref","is-referenced-by-count":4,"title":["Effect of the mutation rate and background size on the quality of pathogen identification"],"prefix":"10.1093","volume":"23","author":[{"given":"Chris","family":"Reed","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Viacheslav","family":"Fofanov","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. 
Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Catherine","family":"Putonti","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"},{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sergei","family":"Chumakov","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. 
Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tom","family":"Slezak","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuriy","family":"Fofanov","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"},{"name":"1 Department of Computer Science, University of Houston, 501 Philip G. 
Hoffman Hall, Houston, TX 77204, 2Department of Statistics, Rice University, 6100 Main Street, MS138, Houston, TX 77005, 3Department of Biology and Biochemistry, University of Houston, Science and Research Building 2, Houston, TX 77204, USA, 4Departmento de Fisica, CUCEI, Universidad de Guadalajara, Revolucion 1500, Guadalajara, Jalisco 44430, Mexico and 5Computations Department, Lawrence Livermore National Laboratory, 7000 East Avenue L-174, Livermore, CA 94550, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2007,9,19]]},"reference":[{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1128\/mr.59.1.143-169.1995","article-title":"Phylogenetic identification and in situ detection of individual microbial cells without cultivation","volume":"59","author":"Amann","year":"1995","journal-title":"Microbiol. Rev"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"R23","DOI":"10.1186\/gb-2004-5-4-r23","article-title":"Hotspots of mammalian chromosomal evolution","volume":"5","author":"Bailey","year":"2004","journal-title":"Genome Biol"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"9184","DOI":"10.1073\/pnas.96.16.9184","article-title":"Genome signature comparisons among prokaryote, plasmid, and mitochondrial DNA","volume":"96","author":"Campbell","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"121","DOI":"10.4024\/40501.jbpc.05.04","article-title":"Theoretical basis for universal identification systems for bacteria and viruses","volume":"5","author":"Chumakov","year":"2005","journal-title":"J. Biol. Phys. 
Chem"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"1391","DOI":"10.1093\/oxfordjournals.molbev.a026048","article-title":"Genomic signature: characterization and classification of species assessed by chaos game representation of sequences","volume":"16","author":"Deschavanne","year":"1999","journal-title":"Mol. Biol. Evol"},{"key":"2023041105592667000_","first-page":"248","article-title":"Identification of genomic signatures for the design of assays for the detection and monitoring of anthrax threats","volume":"10","author":"Draghici","year":"2005","journal-title":"Pac. Symp. Biocomput"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"3746","DOI":"10.1093\/nar\/gkg569","article-title":"PROBEmer: a web-based software tool for selecting optimal DNA oligos","volume":"31","author":"Emrich","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"1708","DOI":"10.1109\/JPROC.2002.804680","article-title":"Rapid development of nucleic acid diagnostics","volume":"90","author":"Fitch","year":"2002","journal-title":"Proc. 
IEEE"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"2421","DOI":"10.1093\/bioinformatics\/bth266","article-title":"How independent are the appearances of n-mers in different genomes?","volume":"20","author":"Fofanov","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1101\/gr.4305906","article-title":"Mutation hot spots in mammalian mitochondrial DNA","volume":"16","author":"Galtier","year":"2006","journal-title":"Genome Res"},{"key":"2023041105592667000_","volume-title":"Fundamentals of Molecular Evolution","author":"Graur","year":"2000","edition":"2nd"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1016\/S1369-5274(98)80095-7","article-title":"Global dinucleotide signatures and analysis of genomic heterogeneity","volume":"1","author":"Karlin","year":"1998","journal-title":"Curr. Opin. Micobiol"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"12832","DOI":"10.1073\/pnas.91.26.12832","article-title":"Comparisons of eukaryotic genomic sequences","volume":"91","author":"Karlin","year":"1994","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"3899","DOI":"10.1128\/jb.179.12.3899-3913.1997","article-title":"Compositional biases of bacterial genomes and evolutionary implications","volume":"179","author":"Karlin","year":"1997","journal-title":"J. 
Bacteriol"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"Lander","year":"2001","journal-title":"Nature"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/j.femsle.2005.04.002","article-title":"Oligonucleotide microarray for identification of Enterococcus species","volume":"246","author":"Lehner","year":"2005","journal-title":"FEMS Microbiol. Lett"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"847","DOI":"10.1038\/35057039","article-title":"Evolutionary analyses of the human genome","volume":"409","author":"Li","year":"2001","journal-title":"Nature"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/dnares\/4.3.185","article-title":"Differences in dinucleotide frequencies of human, yeast, and Escherichia coli genes","volume":"4","author":"Nakashima","year":"1997","journal-title":"DNA Res"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1093\/dnares\/5.5.251","article-title":"Genes from nine genomes are separated into their organisms in the dinucleotide composition space","volume":"5","author":"Nakashima","year":"1998","journal-title":"DNA Res"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1038\/342705a0","article-title":"Mutations in the p53 gene occur in diverse human tumour types","volume":"342","author":"Nigro","year":"1989","journal-title":"Nature"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"1749","DOI":"10.1093\/nar\/12.3.1749","article-title":"Doublet frequencies in evolutionary distinct groups","volume":"12","author":"Nussinov","year":"1984","journal-title":"Nucleic Acids 
Res"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"e98","DOI":"10.1371\/journal.pcbi.0030098","article-title":"Comprehensive DNA signature discovery and validation","volume":"3","author":"Phillippy","year":"2007","journal-title":"PLoS Comput. Biol"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1111\/j.1742-4658.2005.05074.x","article-title":"Human-blind probes and primers for dengue virus identification","volume":"273","author":"Putonti","year":"2006","journal-title":"FEBS J"},{"key":"2023041105592667000_","first-page":"57","article-title":"Fast and sensitive probe selection for DNA chips using jumps in matching statistics","volume":"2","author":"Rahmann","year":"2003","journal-title":"Proc. IEEE Comput. Soc. Bioinform. Conf"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"1404","DOI":"10.1101\/gr.186401","article-title":"Capturing whole-genome characteristics in short sequences using a naive Bayesian classifier","volume":"11","author":"Sandberg","year":"2001","journal-title":"Genome Res"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1093\/bib\/4.2.133","article-title":"Comparative genomics tools applied to bioterrorism defense","volume":"4","author":"Slezak","year":"2003","journal-title":"Brief. Bioinform"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1093\/bioinformatics\/btl549","article-title":"Oligonucleotide fingerprint identification for microarray-based pathogen diagnostic assays","volume":"23","author":"Tembe","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"456","DOI":"10.1007\/s002390010235","article-title":"Intragenic variation of synonymous substitution rates is caused by nonrandom mutations at methylated CpG","volume":"53","author":"Tsunoyama","year":"2001","journal-title":"J. Mol. 
Evol"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"e199","DOI":"10.1371\/journal.pbio.0020199","article-title":"Evidence for widespread convergent evolution around human microsatellites","volume":"2","author":"Vowles","year":"2004","journal-title":"PLoS Biol"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1101\/gr.3896805","article-title":"Hotspots of mutation and breakage in dog and human chromosomes","volume":"15","author":"Webber","year":"2005","journal-title":"Genome Res"},{"key":"2023041105592667000_","doi-asserted-by":"crossref","first-page":"1710","DOI":"10.1093\/bioinformatics\/bth147","article-title":"Primer design using genetic algorithm","volume":"20","author":"Wu","year":"2004","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/20\/2665\/49818662\/bioinformatics_23_20_2665.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/20\/2665\/49818662\/bioinformatics_23_20_2665.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,14]],"date-time":"2023-05-14T05:59:31Z","timestamp":1684043971000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/20\/2665\/230621"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,9,19]]},"references-count":31,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2007,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm420","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,10,15]]},"published":{"date-parts":[[
2007,9,19]]}}}