{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,5,6]],"date-time":"2023-05-06T08:10:29Z","timestamp":1683360629435},"reference-count":20,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Single nucleotide polymorphisms (SNPs) are locations at which the genomic sequences of population members differ. Since these differences are known to follow patterns, disease association studies are facilitated by identifying SNPs that allow the unique identification of such patterns. This process, known as haplotype tagging, is formulated as a combinatorial optimization problem and analyzed in terms of complexity and approximation properties.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>It is shown that the tagging problem is NP-hard but approximable within 1 + ln((<jats:italic>n<\/jats:italic><jats:sup>2<\/jats:sup>-<jats:italic>n<\/jats:italic>)\/2) for<jats:italic>n<\/jats:italic>haplotypes but not approximable within (1 -<jats:italic>\u03b5<\/jats:italic>) ln(<jats:italic>n<\/jats:italic>\/2) for any<jats:italic>\u03b5<\/jats:italic>&gt; 0 unless NP \u2282 DTIME(<jats:italic>n<\/jats:italic><jats:sup>log log<jats:italic>n<\/jats:italic><\/jats:sup>).<\/jats:p><jats:p>A simple, very easily implementable algorithm that exhibits the above upper bound on solution quality is presented. This algorithm has running time<jats:italic>O<\/jats:italic>(\"Equation missing\"<!-- image only, no MathML or LaTex -->(2<jats:italic>m<\/jats:italic>-<jats:italic>p<\/jats:italic>+ 1)) \u2264<jats:italic>O<\/jats:italic>(<jats:italic>m<\/jats:italic>(<jats:italic>n<\/jats:italic><jats:sup>2<\/jats:sup>-<jats:italic>n<\/jats:italic>)\/2) where<jats:italic>p<\/jats:italic>\u2264 min(<jats:italic>n<\/jats:italic>,<jats:italic>m<\/jats:italic>) for<jats:italic>n<\/jats:italic>haplotypes of size<jats:italic>m<\/jats:italic>. As we show that the approximation bound is asymptotically tight, the algorithm presented is optimal with respect to this asymptotic bound.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>The haplotype tagging problem is hard, but approachable with a fast, practical, and surprisingly simple algorithm that cannot be significantly improved upon on a single processor machine. Hence, significant improvement in computatational efforts expended can only be expected if the computational effort is distributed and done in parallel.<\/jats:p><\/jats:sec>","DOI":"10.1186\/1471-2105-7-8","type":"journal-article","created":{"date-parts":[[2006,1,23]],"date-time":"2006-01-23T12:36:09Z","timestamp":1138019769000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Approximation properties of haplotype tagging"],"prefix":"10.1186","volume":"7","author":[{"given":"Staal A","family":"Vinterbo","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stephan","family":"Dreiseitl","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lucila","family":"Ohno-Machado","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,1,9]]},"reference":[{"issue":"6990","key":"747_CR1","doi-asserted-by":"publisher","first-page":"446","DOI":"10.1038\/nature02623","volume":"429","author":"C Carlson","year":"2004","unstructured":"Carlson C, Eberle M, Kruglyak L, Nickerson D: Mapping complex disease loci in whole-genome association studies. Nature 2004, 429(6990):446\u2013452.","journal-title":"Nature"},{"issue":"6834","key":"747_CR2","doi-asserted-by":"publisher","first-page":"199","DOI":"10.1038\/35075590","volume":"411","author":"D Reich","year":"2001","unstructured":"Reich D, Cargill M, Bolk S, Ireland J, Sabeti P, Richter D, Lavery T, Kouyoumjian R, Farhadian S, Ward R, Lander E: Linkage disequilibrium in the human genome. Nature 2001, 411(6834):199\u2013204.","journal-title":"Nature"},{"key":"747_CR3","doi-asserted-by":"publisher","first-page":"R576","DOI":"10.1016\/S0960-9822(01)00348-7","volume":"11","author":"D Goldstein","year":"2001","unstructured":"Goldstein D, Weale M: Population genomics: Linkage disequilibrium holds the key. Current Biology 2001, 11: R576-R579.","journal-title":"Current Biology"},{"issue":"6822","key":"747_CR4","doi-asserted-by":"publisher","first-page":"928","DOI":"10.1038\/35057149","volume":"409","author":"International SNP Map Working Group","year":"2001","unstructured":"International SNP Map Working Group: A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001, 409(6822):928\u2013933.","journal-title":"Nature"},{"key":"747_CR5","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1016\/S0025-5564(03)00089-0","volume":"185","author":"C Wiuf","year":"2003","unstructured":"Wiuf C, Laidlaw Z, Stumpf MPH: Some notes on the combinatorial properties of haplotype tagging. Mathematical Biosciences 2003, 185: 205\u2013216.","journal-title":"Mathematical Biosciences"},{"issue":"1\u20133","key":"747_CR6","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1159\/000073732","volume":"56","author":"D Thompson","year":"2003","unstructured":"Thompson D, Stram D, Goldgar D, Witte J: Haplotype tagging single nucleotide polymorphisms and association studies. Hum Hered 2003, 56(1\u20133):48\u201355.","journal-title":"Hum Hered"},{"issue":"2","key":"747_CR7","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1093\/bioinformatics\/19.2.287","volume":"19","author":"X Ke","year":"2003","unstructured":"Ke X, Cardon L: Efficient selective screening of haplotype tag SNPs. Bioinformatics 2003, 19(2):287\u2013288.","journal-title":"Bioinformatics"},{"issue":"17","key":"747_CR8","doi-asserted-by":"publisher","first-page":"9900","DOI":"10.1073\/pnas.1633613100","volume":"100","author":"P Sebastiani","year":"2003","unstructured":"Sebastiani P, Lazarus R, Weiss S, Kunkel L, Kohane I, Ramoni M: Minimal haplotype tagging. Proc Natl Acad Sci USA 2003, 100(17):9900\u20139905.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"2","key":"747_CR9","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1038\/ng1001-233","volume":"29","author":"G Johnson","year":"2001","unstructured":"Johnson G, Esposito L, Barratt B, Smith A, Heward J, Genova GD, Ueda H, Cordell H, Eaves I, Dudbridge F, Twells R, Payne F, Hughes W, Nutland S, Stevens H, Carr P, Tuomilehto-Wolf E, Tuomilehto J, Gough S, Clayton D, Todd J: Haplotype tagging for the identification of common disease genes. Nature Genet 2001, 29(2):233\u2013237.","journal-title":"Nature Genet"},{"key":"747_CR10","first-page":"19","volume-title":"Proceedings of the seventh annual international conference on Computational molecular biology","author":"V Bafna","year":"2003","unstructured":"Bafna V, Halldorsson BV, Schwartz R, Clark AG, Istrail S: Haplotypes and informa tive SNP selection algorithms: don't block out information. In Proceedings of the seventh annual international conference on Computational molecular biology. ACM Press; 2003:19\u201327."},{"key":"747_CR11","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1093\/genetics\/111.1.147","volume":"111","author":"RR Hudson","year":"1985","unstructured":"Hudson RR, Kaplan N: Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 1985, 111: 147\u2013164.","journal-title":"Genetics"},{"key":"747_CR12","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1016\/S0022-0000(74)80044-9","volume":"9","author":"DS Johnson","year":"1974","unstructured":"Johnson DS: Approximation Algorithms for Combinatorial Problems. Journal of Computer and System Sciences 1974, 9: 256\u2013278.","journal-title":"Journal of Computer and System Sciences"},{"key":"747_CR13","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1002\/(SICI)1520-6750(199809)45:6<615::AID-NAV5>3.0.CO;2-5","volume":"45","author":"D Hochbaum","year":"1998","unstructured":"Hochbaum D, Pathria A: Analysis of the Greedy Approach in Covering Problems. Naval Research Quarterly 1998, 45: 615\u2013627.","journal-title":"Naval Research Quarterly"},{"key":"747_CR14","doi-asserted-by":"publisher","first-page":"634","DOI":"10.1145\/285055.285059","volume":"45","author":"U Feige","year":"1998","unstructured":"Feige U: A threshold of ln n for approximating set cover. J ACM 1998, 45: 634\u2013652.","journal-title":"J ACM"},{"key":"747_CR15","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1109\/ISTCS.1993.253482","volume-title":"Proc. 2nd Israel Symp. on Theory of Computing and Systems","author":"D Peleg","year":"1993","unstructured":"Peleg D, Schechtman G, Wool A: Approximating bounded 0\u20131 integer linear programs. In Proc. 2nd Israel Symp. on Theory of Computing and Systems. IEEE Computer Society, IEEE Computer Society; 1993:69\u201377."},{"key":"747_CR16","volume-title":"Minimum Diameter Covering Problems","author":"RH EM Arkin","year":"1994","unstructured":"EM Arkin RH: Minimum Diameter Covering Problems.1994. [Http:\/\/www.ams.sunysb.edu\/~estie\/publications.html]"},{"key":"747_CR17","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-58412-1","volume-title":"Complexity and approximation: combinatorial optimization problems and their approximability properties","author":"G Ausiello","year":"1999","unstructured":"Ausiello G, Crescenzi P, Gambosi G, Kann V, Marchetti-Spaccamela A, Protasi M: Complexity and approximation: combinatorial optimization problems and their approximability properties. Springer-Verlag; 1999."},{"key":"747_CR18","first-page":"256","volume-title":"Complexity and approximation: combinatorial optimization problems and their approximability properties","author":"R Duh","year":"1997","unstructured":"Duh R, F\u00fcrer M: Proc. 29th Ann. ACM Symp. on Theory of Comp. In Complexity and approximation: combinatorial optimization problems and their approximability properties. ACM; 1997:256\u2013265."},{"key":"747_CR19","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1145\/258533.258641","volume-title":"Proceedings of the twenty-ninth annual ACM symposium on Theory of computing","author":"R Raz","year":"1997","unstructured":"Raz R, Safra S: A sub-constant error-probability low-degree test, and a sub-constant error-probability PCP characterization of NP. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. ACM Press; 1997:475\u2013484."},{"key":"747_CR20","volume-title":"Introduction to Algorithms","author":"TH Cormen","year":"1990","unstructured":"Cormen TH, Leiserson CE, Rivest RL: Introduction to Algorithms. MIT Press\/McGraw-Hill; 1990."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,6]],"date-time":"2023-05-06T07:34:10Z","timestamp":1683358450000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,1,9]]},"references-count":20,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["747"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,1,9]]},"assertion":[{"value":"4 August 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 January 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 January 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"8"}}