{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T11:38:43Z","timestamp":1778845123119,"version":"3.51.4"},"reference-count":17,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1490,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment\u2014in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging the researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants.<\/jats:p>\n               <jats:p>Approach: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings), to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. 
LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality.<\/jats:p>\n               <jats:p>Results: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can \u2018resurrect\u2019 many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms.<\/jats:p>\n               <jats:p>Availability: LoQuM is available as open source at http:\/\/compbio.case.edu\/loqum\/.<\/jats:p>\n               <jats:p>Contact: \u00a0matthew.ruffalo@case.edu.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts408","type":"journal-article","created":{"date-parts":[[2012,9,7]],"date-time":"2012-09-07T20:35:22Z","timestamp":1347050122000},"page":"i349-i355","source":"Crossref","is-referenced-by-count":29,"title":["Accurate estimation of short read mapping quality for next-generation genome sequencing"],"prefix":"10.1093","volume":"28","author":[{"given":"Matthew","family":"Ruffalo","sequence":"first","affiliation":[{"name":"1 Department of Electrical Engineering & Computer Science"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mehmet","family":"Koyut\u00fcrk","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering & Computer Science"},{"name":"3 Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Soumya","family":"Ray","sequence":"additional","affiliation":[{"name":"1 Department of Electrical Engineering & Computer Science"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas","family":"LaFramboise","sequence":"additional","affiliation":[{"name":"2 Department of Genetics"},{"name":"3 Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2012,9,3]]},"reference":[{"key":"2023012513022023300_B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/ng.437","article-title":"Personalized copy number and segmental duplication maps using next-generation sequencing","volume":"41","author":"Alkan","year":"2009","journal-title":"Nat. Genet."},{"key":"2023012513022023300_B2","article-title":"FastQC: a quality control tool for high throughput sequence data","author":"Andrews","year":"2010"},{"key":"2023012513022023300_B3","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1038\/nmeth0810-576","article-title":"mrsFAST: a cache-oblivious algorithm for short-read mapping","volume":"7","author":"Hach","year":"2010","journal-title":"Nat. 
Methods"},{"key":"2023012513022023300_B4","doi-asserted-by":"crossref","first-page":"R99","DOI":"10.1186\/gb-2010-11-10-r99","article-title":"Improved variant discovery through local realignment of short-read next-generation sequencing data using SRMA","volume":"11","author":"Homer","year":"2010","journal-title":"Genome Biol."},{"key":"2023012513022023300_B5","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012513022023300_B6","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1111\/j.1471-8286.2007.02019.x","article-title":"Sequencing breakthroughs for genomic ecology and evolutionary biology","volume":"8","author":"Hudson","year":"2008","journal-title":"Mol. Ecol. Resources"},{"key":"2023012513022023300_B7","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012513022023300_B8","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012513022023300_B9","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2009","journal-title":"Genome Res."},{"key":"2023012513022023300_B10","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","article-title":"SOAP2: an improved ultrafast tool for short read 
alignment","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012513022023300_B11","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/j.tig.2007.12.007","article-title":"The impact of next-generation sequencing technology on genetics","volume":"24","author":"Mardis","year":"2008","journal-title":"Trends Genet."},{"key":"2023012513022023300_B12","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1038\/nrg2841","article-title":"Advances in understanding cancer genomes through second-generation sequencing","volume":"11","author":"Meyerson","year":"2010","journal-title":"Nat. Rev. Genet."},{"key":"2023012513022023300_B13","author":"Novocraft","year":"2010"},{"key":"2023012513022023300_B14","doi-asserted-by":"crossref","first-page":"2790","DOI":"10.1093\/bioinformatics\/btr477","article-title":"Comparative analysis of algorithms for next-generation sequencing read alignment","volume":"27","author":"Ruffalo","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012513022023300_B15","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1016\/j.ymeth.2009.03.001","article-title":"ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions","volume":"48","author":"Schmidt","year":"2009","journal-title":"Methods"},{"key":"2023012513022023300_B16","doi-asserted-by":"crossref","first-page":"e17490","DOI":"10.1371\/journal.pone.0017490","article-title":"Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing","volume":"6","author":"Sun","year":"2011","journal-title":"PLoS One"},{"key":"2023012513022023300_B17","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nrg2484","article-title":"RNA-Seq: a revolutionary tool for transcriptomics","volume":"10","author":"Wang","year":"2009","journal-title":"Nat. Rev. 
Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/18\/i349\/48883470\/bioinformatics_28_18_i349.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/18\/i349\/48883470\/bioinformatics_28_18_i349.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T18:51:33Z","timestamp":1674672693000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/18\/i349\/249968"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9,3]]},"references-count":17,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2012,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts408","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,9,15]]},"published":{"date-parts":[[2012,9,3]]}}}