{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,4]],"date-time":"2026-03-04T07:03:05Z","timestamp":1772607785301,"version":"3.50.1"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"16","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1551,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself.<\/jats:p>\n               <jats:p>Results: We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5\u201314% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA\/SAMtools polymorphism discovery pipeline in the context of the GMS, and found discovery errors are dominated by false negatives, especially in regions with poor GMS. 
These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the \u2018dark matter\u2019 of the genome, including of known clinically relevant variations in these regions.<\/jats:p>\n               <jats:p>Availability: The source code and profiles of several model organisms are available at http:\/\/gma-bio.sourceforge.net<\/jats:p>\n               <jats:p>Contact: \u00a0hlee@cshl.edu<\/jats:p>\n               <jats:p>Supplementary Information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts330","type":"journal-article","created":{"date-parts":[[2012,6,6]],"date-time":"2012-06-06T00:49:12Z","timestamp":1338943752000},"page":"2097-2105","source":"Crossref","is-referenced-by-count":117,"title":["Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score"],"prefix":"10.1093","volume":"28","author":[{"given":"Hayan","family":"Lee","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, Stony Brook University, Stony Brook, NY, USA and 2Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA"},{"name":"1 Department of Computer Science, Stony Brook University, Stony Brook, NY, USA and 2Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA"}]},{"given":"Michael C.","family":"Schatz","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Stony Brook University, Stony Brook, NY, USA and 2Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA"},{"name":"1 Department of Computer Science, Stony Brook University, Stony Brook, NY, USA and 2Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2012,7,4]]},"reference":[{"key":"2023012512532677500_B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2023012512532677500_B2","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1038\/nature07517","article-title":"Accurate whole human genome sequencing using reversible terminator chemistry","volume":"456","author":"Bentley","year":"2008","journal-title":"Nature"},{"key":"2023012512532677500_B3","article-title":"A block-sorting lossless data compression algorithm. Technical Report Digital SRC Research Report 124","author":"Burrows","year":"1994"},{"key":"2023012512532677500_B4","first-page":"207","article-title":"Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis","volume":"315","author":"Carlton","year":"2007","journal-title":"Science"},
{"key":"2023012512532677500_B5","first-page":"137","article-title":"MapReduce: simplified data processing on large clusters","volume-title":"Symposium on Operating System Design and Implementation (OSDI)","author":"Dean","year":"2004"},{"key":"2023012512532677500_B6","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1126\/science.1181498","article-title":"Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays","volume":"327","author":"Drmanac","year":"2010","journal-title":"Science (New York, N.Y.)"},{"key":"2023012512532677500_B7","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1126\/science.1105136","article-title":"The ENCODE (ENCyclopedia Of DNA Elements) project","volume":"306","author":"ENCODE Project Consortium","year":"2004","journal-title":"Science"},{"key":"2023012512532677500_B8","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1186\/1471-2164-12-245","article-title":"Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing","volume":"12","author":"Gilles","year":"2011","journal-title":"BMC Genom."},{"key":"2023012512532677500_B9","doi-asserted-by":"crossref","first-page":"3065","DOI":"10.1073\/pnas.1121491109","article-title":"Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011","volume":"109","author":"Grad","year":"2012","journal-title":"Proc. Nat. Acad. 
Sci."},{"key":"2023012512532677500_B10","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1186\/1471-2105-12-210","article-title":"A novel and well-defined benchmarking method for second generation read mapping","volume":"12","author":"Holtgrewe","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012512532677500_B11","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1038\/nature08987","article-title":"International network of cancer genome projects","volume":"464","author":"International Cancer Genome Consortium","year":"2010","journal-title":"Nature"},{"key":"2023012512532677500_B12","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"International Human Genome Sequencing Consortium","year":"2001","journal-title":"Nature"},{"key":"2023012512532677500_B13","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1093\/bioinformatics\/btq640","article-title":"The Uniqueome: a mappability resource for short-tag sequencing","volume":"27","author":"Koehler","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512532677500_B14","doi-asserted-by":"crossref","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","author":"Koren","year":"2012","journal-title":"Nat. 
Biotechnol."},{"key":"2023012512532677500_B15","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023012512532677500_B16","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512532677500_B17","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512532677500_B18","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res."},{"key":"2023012512532677500_B19","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023012512532677500_B20","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1126\/science.1198374","article-title":"Identification of functional elements and regulatory circuits by Drosophila modENCODE","volume":"330","author":"modENCODE Consortium","year":"2010","journal-title":"Science (New York, N.Y.)"},{"key":"2023012512532677500_B21","doi-asserted-by":"crossref","first-page":"348","DOI":"10.1038\/nature10242","article-title":"An integrated semiconductor device enabling non-optical genome 
sequencing","volume":"475","author":"Rothberg","year":"2011","journal-title":"Nature"},{"key":"2023012512532677500_B22","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1038\/nbt0710-691","article-title":"Cloud computing and the DNA data race","volume":"28","author":"Schatz","year":"2010","journal-title":"Nat. Biotechnol."},{"key":"2023012512532677500_B23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/MSST.2010.5496972","article-title":"The hadoop distributed file system","volume-title":"2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)","author":"Shvachko","year":"2010"},{"key":"2023012512532677500_B24","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/gb-2010-11-5-207","article-title":"The case for cloud computing in genome informatics","volume":"11","author":"Stein","year":"2010","journal-title":"Genome Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/16\/2097\/48871578\/bioinformatics_28_16_2097.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/16\/2097\/48871578\/bioinformatics_28_16_2097.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T17:52:07Z","timestamp":1674669127000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/16\/2097\/323484"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,4]]},"references-count":24,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2012,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts330","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject
":[],"published-other":{"date-parts":[[2012,8,15]]},"published":{"date-parts":[[2012,7,4]]}}}