{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T07:51:43Z","timestamp":1722844303497},"reference-count":13,"publisher":"Oxford University Press (OUP)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Standard search techniques for DNA repeats start by identifying small matching words, or seeds, that may inhabit larger repeats. Recent innovations in seed structure include spaced seeds and indel seeds which are more sensitive than contiguous seeds. Evaluating seed sensitivity requires (i) specifying a homology model for alignments and (ii) assigning probabilities to those alignments. Optimal seed selection is resource intensive because all alternative seeds must be tested. Current methods require that the model and its probability parameters be specified in advance. When the parameters change, the entire calculation has to be rerun.<\/jats:p>\n               <jats:p>Results: We show how to eliminate the need for prior parameter specification by exploiting a simple observation: given a homology model, the alignments hit by a particular seed remain the same regardless of the probability parameters. Only the weights assigned to those alignments change. Therefore, if we know all the hits, we can easily (and quickly) find optimal seeds. We describe an efficient preprocessing step, which is computed once per seed. Then we show several increasingly efficient methods to find the optimal seed when given specific probability parameters. Indeed, we show how to determine exactly which seeds can never be optimal under any set of probability parameters. This leads to the startling observation that out of thousands of seeds, only a handful have any chance of being optimal. 
We then show how to identify optimal seeds and the boundaries within probability space where they are optimal.<\/jats:p>\n               <jats:p>Contact: dyfmak@bu.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn643","type":"journal-article","created":{"date-parts":[[2008,12,19]],"date-time":"2008-12-19T01:31:48Z","timestamp":1229650308000},"page":"302-308","source":"Crossref","is-referenced-by-count":10,"title":["All hits all the time: parameter-free calculation of spaced seed sensitivity"],"prefix":"10.1093","volume":"25","author":[{"given":"Denise Y.F.","family":"Mak","sequence":"first","affiliation":[{"name":"1 Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, 2Department of Computer Science and 3Department of Biology, Boston University, Boston, MA 02215, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gary","family":"Benson","sequence":"additional","affiliation":[{"name":"1 Graduate Program in Bioinformatics, Boston University, Boston, MA 02215, 2Department of Computer Science and 3Department of Biology, Boston University, Boston, MA 02215, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2008,12,18]]},"reference":[{"key":"2023013110013935100_B1","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1142\/S0219720004000326","article-title":"Optimal spaced seeds for homologous coding regions","volume":"1","author":"Brejov\u00e1","year":"2004","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2023013110013935100_B2","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1016\/j.jcss.2004.12.003","article-title":"Designing seeds for similarity search in genomic DNA","volume":"70","author":"Buhler","year":"2005","journal-title":"J. Comput. Syst. 
Sci."},{"key":"2023013110013935100_B3","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.jcss.2003.04.002","article-title":"Sensitivity analysis and efficient method for identifying optimal spaced seeds","volume":"68","author":"Choi","year":"2004","journal-title":"J. Comput. Syst. Sci."},{"key":"2023013110013935100_B4","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1093\/bioinformatics\/bth037","article-title":"Good spaced seeds for homology search","volume":"20","author":"Choi","year":"2004","journal-title":"Bioinformatics"},{"key":"2023013110013935100_B5","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1142\/S0219720004000661","article-title":"Patternhunter II: highly sensitive and fast homology search","volume":"2","author":"Li","year":"2004","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2023013110013935100_B6","first-page":"444","article-title":"Superiority and complexity of the spaced seeds","volume-title":"Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA).","author":"Li","year":"2006"},{"key":"2023013110013935100_B7","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1093\/bioinformatics\/18.3.440","article-title":"Patternhunter: faster and more sensitive homology search","volume":"18","author":"Ma","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013110013935100_B8","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1142\/9781860947995_0035","article-title":"All hits all the time: parameter free calculation of seed sensitivity","volume-title":"Proceedings of the 5th Asia-Pacific Bioinformatics Conference.","author":"Mak","year":"2007"},{"key":"2023013110013935100_B9","doi-asserted-by":"crossref","first-page":"e341","DOI":"10.1093\/bioinformatics\/btl263","article-title":"Indel seeds for homology 
search","volume":"22","author":"Mak","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013110013935100_B10","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1186\/1471-2105-5-149","article-title":"Improved hit criteria for DNA local alignment","volume":"5","author":"No\u00e9","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023013110013935100_B11","doi-asserted-by":"crossref","first-page":"16138","DOI":"10.1073\/pnas.0406011101","article-title":"Parametric inference for biological sequence analysis","volume":"101","author":"Pachter","year":"2004","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110013935100_B12"},{"key":"2023013110013935100_B13","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.1089\/cmb.2006.13.1355","article-title":"Optimizing multiple spaced seeds for homology search","volume":"13","author":"Xu","year":"2006","journal-title":"J. Comput. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/3\/302\/48983535\/bioinformatics_25_3_302.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/3\/302\/48983535\/bioinformatics_25_3_302.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T18:46:12Z","timestamp":1675190772000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/3\/302\/244710"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,12,18]]},"references-count":13,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2009,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn643","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4
803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,2,1]]},"published":{"date-parts":[[2008,12,18]]}}}