{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:33:52Z","timestamp":1772724832054,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"11","funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Optimizing seed selection is an important problem in read mapping. The number of non-overlapping seeds a mapper selects determines the sensitivity of the mapper while the total frequency of all selected seeds determines the speed of the mapper. Modern seed-and-extend mappers usually select seeds with either an equal and fixed-length scheme or with an inflexible placement scheme, both of which limit the ability of the mapper in selecting less frequent seeds to speed up the mapping process. Therefore, it is crucial to develop a new algorithm that can adjust both the individual seed length and the seed placement, as well as derive less frequent seeds.<\/jats:p>\n               <jats:p>Results: We present the Optimal Seed Solver (OSS), a dynamic programming algorithm that discovers the least frequently-occurring set of x seeds in an L-base-pair read in O(x\u00d7L) operations on average and in O(x\u00d7L2) operations in the worst case, while generating a maximum of O(L2) seed frequency database lookups. We compare OSS against four state-of-the-art seed selection schemes and observe that OSS provides a 3-fold reduction in average seed frequency over the best previous seed selection optimizations.<\/jats:p>\n               <jats:p>Availability and implementation: We provide an implementation of the Optimal Seed Solver in C++ at: https:\/\/github.com\/CMU-SAFARI\/Optimal-Seed-Solver<\/jats:p>\n               <jats:p>Contact: \u00a0hxin@cmu.edu, calkan@cs.bilkent.edu.tr or onur@cmu.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv670","type":"journal-article","created":{"date-parts":[[2015,11,15]],"date-time":"2015-11-15T01:38:21Z","timestamp":1447551501000},"page":"1632-1642","source":"Crossref","is-referenced-by-count":23,"title":["Optimal seed solver: optimizing seed selection in read mapping"],"prefix":"10.1093","volume":"32","author":[{"given":"Hongyi","family":"Xin","sequence":"first","affiliation":[{"name":"1 Computer Science Department,"}]},{"given":"Sunny","family":"Nahar","sequence":"additional","affiliation":[{"name":"1 Computer Science Department,"}]},{"given":"Richard","family":"Zhu","sequence":"additional","affiliation":[{"name":"1 Computer Science Department,"}]},{"given":"John","family":"Emmons","sequence":"additional","affiliation":[{"name":"5 Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA"}]},{"given":"Gennady","family":"Pekhimenko","sequence":"additional","affiliation":[{"name":"1 Computer Science Department,"}]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[{"name":"3 Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA,"}]},{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[{"name":"4 Department of Computer Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey and"}]},{"given":"Onur","family":"Mutlu","sequence":"additional","affiliation":[{"name":"1 Computer Science Department,"},{"name":"2 Department of Electrical and Computer Engineering,"}]}],"member":"286","published-online":{"date-parts":[[2015,11,14]]},"reference":[{"key":"2023020112284922600_btv670-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1\u2009092 human genomes","volume":"491","author":"1000 Genomes Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B3","doi-asserted-by":"crossref","first-page":"e41","DOI":"10.1093\/nar\/gkr1246","article-title":"Hobbes: optimized gram-based methods for efficient read alignment","volume":"40","author":"Ahmadi","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023020112284922600_btv670-B4","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/ng.437","article-title":"Personalized copy number and segmental duplication maps using next-generation sequencing","volume":"41","author":"Alkan","year":"2009","journal-title":"Nat. Genet."},{"key":"2023020112284922600_btv670-B5","article-title":"A block-sorting lossless data compression algorithm Technical Report, 124, Digital Equipment Corporation","author":"Burrows","year":"1994"},{"key":"2023020112284922600_btv670-B6","doi-asserted-by":"crossref","first-page":"1550011","DOI":"10.1142\/S0219720015500110","article-title":"Multiple seeds sensitivity using a single seed with threshold","volume":"13","author":"Egidi","year":"2015","journal-title":"J. Bioinf. Comput. Biol."},{"key":"2023020112284922600_btv670-B7","doi-asserted-by":"crossref","DOI":"10.1109\/SFCS.2000.892127","article-title":"Opportunistic data structures with applications","author":"Ferragina","year":"2000"},{"key":"2023020112284922600_btv670-B8","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/ng.2915","article-title":"Loss-of-function mutations in slc30a8 protect against type 2 diabetes","volume":"46","author":"Flannick","year":"2014","journal-title":"Nat. Genet."},{"key":"2023020112284922600_btv670-B9","doi-asserted-by":"crossref","first-page":"S6","DOI":"10.1038\/nmeth.1376","article-title":"Sense from sequence reads: methods for alignment and assembly","volume":"6","author":"Flicek","year":"2009","journal-title":"Nat. Methods"},{"key":"2023020112284922600_btv670-B10","doi-asserted-by":"crossref","first-page":"710","DOI":"10.1126\/science.1188021","article-title":"A draft sequence of the Neandertal genome","volume":"328","author":"Green","year":"2010","journal-title":"Science"},{"key":"2023020112284922600_btv670-B11","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1101\/gr.113985.110","article-title":"Adaptive seeds tame genomic sequence comparison","volume":"21","author":"Kie\u0142basa","year":"2011","journal-title":"Genome Res."},{"key":"2023020112284922600_btv670-B12","first-page":"222","article-title":"Approximate string matching using a bidirectional index","volume-title":"Combinatorial Pattern Matching, Lecture Notes in Computer Science, volume 8486","author":"Kucherov","year":"2014"},{"key":"2023020112284922600_btv670-B13","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. Method"},{"key":"2023020112284922600_btv670-B14","article-title":"Aligning sequence reads, clone sequences and assembly contigs with bwa-mem","author":"Li","year":"2013"},{"key":"2023020112284922600_btv670-B15","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1093\/bioinformatics\/18.3.440","article-title":"Patternhunter: faster and more sensitive homology search","volume":"18","author":"Ma","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020112284922600_btv670-B16","doi-asserted-by":"crossref","first-page":"1185","DOI":"10.1038\/nmeth.2221","article-title":"The gem mapper: fast, accurate and versatile alignment by filtration","volume":"9","author":"Marco-Sola","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020112284922600_btv670-B17","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1038\/nature07744","article-title":"A burst of segmental duplications in the genome of the African great ape ancestor","volume":"457","author":"Marques-Bonet","year":"2009","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B18","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1126\/science.1224344","article-title":"A high-coverage genome sequence from an archaic denisovan individual","volume":"338","author":"Meyer","year":"2012","journal-title":"Science"},{"key":"2023020112284922600_btv670-B19","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1145\/316542.316550","article-title":"A fast bit-vector algorithm for approximate string matching based on dynamic programming","volume":"46","author":"Myers","year":"1999","journal-title":"J. ACM"},{"key":"2023020112284922600_btv670-B20","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1038\/nature09807","article-title":"Tumour evolution inferred by single-cell sequencing","volume":"472","author":"Navin","year":"2011","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B21","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. Biol."},{"key":"2023020112284922600_btv670-B22","doi-asserted-by":"crossref","first-page":"790","DOI":"10.1038\/ng.646","article-title":"Exome sequencing identifies MLL2 mutations as a cause of kabuki syndrome","volume":"42","author":"Ng","year":"2010","journal-title":"Nat. Genet."},{"key":"2023020112284922600_btv670-B23","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1038\/nature12228","article-title":"Great ape genetic diversity and population history","volume":"499","author":"Prado-Martinez","year":"2013","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B24","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1089\/cmb.2006.13.296","article-title":"Efficient q-gram filters for finding all e-matches over a given length","volume":"13","author":"Rasmussen","year":"2006","journal-title":"J. Comput. Biol."},{"key":"2023020112284922600_btv670-B25","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/nature09710","article-title":"Genetic history of an archaic hominin group from Denisova Cave in Siberia","volume":"468","author":"Reich","year":"2010","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B26","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1186\/1471-2105-12-221","article-title":"Faster smith-waterman database searches with inter-sequence simd parallelisation","volume":"12","author":"Rognes","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020112284922600_btv670-B27","doi-asserted-by":"crossref","first-page":"e1000386","DOI":"10.1371\/journal.pcbi.1000386","article-title":"Shrimp: Accurate mapping of short color-space reads","volume":"5","author":"Rumble","year":"2009","journal-title":"PLoS Comput. Biol."},{"key":"2023020112284922600_btv670-B28","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1038\/nature10842","article-title":"Insights into hominid evolution from the gorilla genome sequence","volume":"483","author":"Scally","year":"2012","journal-title":"Nature"},{"key":"2023020112284922600_btv670-B29","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol."},{"key":"2023020112284922600_btv670-B30","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1186\/1756-0500-1-107","article-title":"SWPS3 - fast multi-threaded vectorized Smith\u2013Waterman for IBM Cell\/B.e. and x86\/SSE2","volume":"1","author":"Szalkowski","year":"2008","journal-title":"BMC Res. Notes"},{"key":"2023020112284922600_btv670-B31","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/ng.542","article-title":"Phf6 mutations in t-cell acute lymphoblastic leukemia","volume":"42","author":"Van Vlierberghe","year":"2010","journal-title":"Nat. Genet."},{"key":"2023020112284922600_btv670-B32","doi-asserted-by":"crossref","first-page":"1640","DOI":"10.1101\/gr.124461.111","article-title":"Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee","volume":"21","author":"Ventura","year":"2011","journal-title":"Genome Res."},{"key":"2023020112284922600_btv670-B33","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1093\/bioinformatics\/bts505","article-title":"RazerS 3: faster, fully sensitive read mapping","volume":"28","author":"Weese","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020112284922600_btv670-B34","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1186\/1471-2164-14-S1-S13","article-title":"Accelerating read mapping with FastHASH","volume":"14","author":"Xin","year":"2013","journal-title":"BMC Genomics"},{"key":"2023020112284922600_btv670-B35","doi-asserted-by":"crossref","first-page":"1553","DOI":"10.1093\/bioinformatics\/btu856","article-title":"Shifted hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping","volume":"31","author":"Xin","year":"2015","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/11\/1632\/49019228\/bioinformatics_32_11_1632.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/11\/1632\/49019228\/bioinformatics_32_11_1632.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:33:14Z","timestamp":1675290794000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/11\/1632\/1742696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,11,14]]},"references-count":35,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2016,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv670","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,6,1]]},"published":{"date-parts":[[2015,11,14]]}}}