{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T00:37:45Z","timestamp":1775522265346,"version":"3.50.1"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of this data can be affected by several factors and it is important to have simple and reliable approaches for monitoring it at the level of individual experiments.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We developed a fast, scalable and accurate approach to estimating error rates in short reads, which has the added advantage of not requiring a reference genome. We build on the fundamental observation that there is a linear relationship between the copy number for a given read and the number of erroneous reads that differ from the read of interest by one or two bases. The slope of this relationship can be transformed to give an estimate of the error rate, both by read and by position. We present simulation studies as well as analyses of real data sets illustrating the precision and accuracy of this method, and we show that it is more accurate than alternatives that count the difference between the sample of interest and a reference genome. We show how this methodology led to the detection of mutations in the genome of the PhiX strain used for calibration of Illumina data. 
The proposed method is implemented in an R package, which can be downloaded from <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/bcb.dfci.harvard.edu\/~vwang\/shadowRegression.html\" ext-link-type=\"uri\">http:\/\/bcb.dfci.harvard.edu\/\u223cvwang\/shadowRegression.html<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>The proposed method can be used to monitor the quality of sequencing pipelines at the level of individual experiments without the use of reference genomes. Furthermore, having an estimate of the error rates gives one the opportunity to improve analyses and inferences in many applications of next-generation sequencing data.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-13-185","type":"journal-article","created":{"date-parts":[[2012,7,30]],"date-time":"2012-07-30T10:14:56Z","timestamp":1343643296000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":56,"title":["Estimation of sequencing error rates in short reads"],"prefix":"10.1186","volume":"13","author":[{"given":"Xin","family":"Victoria Wang","sequence":"first","affiliation":[]},{"given":"Natalie","family":"Blades","sequence":"additional","affiliation":[]},{"given":"Jie","family":"Ding","sequence":"additional","affiliation":[]},{"given":"Razvan","family":"Sultana","sequence":"additional","affiliation":[]},{"given":"Giovanni","family":"Parmigiani","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,7,30]]},"reference":[{"issue":"10","key":"5439_CR1","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1038\/nbt1486","volume":"26","author":"J Shendure","year":"2008","unstructured":"Shendure J, Ji H: Next-generation DNA sequencing. Nature Biotechnology. 2008, 26 (10): 1135-1145. 
10.1038\/nbt1486.","journal-title":"Nature Biotechnology"},{"issue":"8","key":"5439_CR2","doi-asserted-by":"publisher","first-page":"679","DOI":"10.1038\/nmeth.1230","volume":"5","author":"Y Erlich","year":"2008","unstructured":"Erlich Y, Mitra PP, delaBastide M, McCombie WR, Hannon GJ: Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nature Methods. 2008, 5 (8): 679-682. 10.1038\/nmeth.1230.","journal-title":"Nature Methods"},{"key":"5439_CR3","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1186\/1471-2105-9-431","volume":"9","author":"J Rougemont","year":"2008","unstructured":"Rougemont J, Amzallag A, Iseli C, Farinelli L, Xenarios I, Naef F: Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics. 2008, 9: 431-10.1186\/1471-2105-9-431.","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"5439_CR4","doi-asserted-by":"publisher","first-page":"1884","DOI":"10.1101\/gr.095299.109","volume":"19","author":"W Kao","year":"2009","unstructured":"Kao W, Stevens K, Song Y: BayesCall: a model-based base-calling algorithm for high-throughput short-read sequencing. Genome Research. 2009, 19 (10): 1884-10.1101\/gr.095299.109.","journal-title":"Genome Research"},{"issue":"3","key":"5439_CR5","doi-asserted-by":"publisher","first-page":"665","DOI":"10.1111\/j.1541-0420.2009.01353.x","volume":"66","author":"H Bravo","year":"2010","unstructured":"Bravo H, Irizarry R: Model-based quality assessment and base-calling for second-generation sequencing data. Biometrics. 2010, 66 (3): 665-674. 10.1111\/j.1541-0420.2009.01353.x.","journal-title":"Biometrics"},{"issue":"3","key":"5439_CR6","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1101\/gr.8.3.186","volume":"8","author":"B Ewing","year":"1998","unstructured":"Ewing B, Green P: Base-calling of automated sequencer traces using Phred. II. error probabilities. Genome Research. 
1998, 8 (3): 186-","journal-title":"Genome Research"},{"issue":"16","key":"5439_CR7","doi-asserted-by":"publisher","first-page":"e105","DOI":"10.1093\/nar\/gkn425","volume":"36","author":"J Dohm","year":"2008","unstructured":"Dohm J, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research. 2008, 36 (16): e105-10.1093\/nar\/gkn425.","journal-title":"Nucleic Acids Research"},{"issue":"12","key":"5439_CR8","doi-asserted-by":"publisher","first-page":"e131","DOI":"10.1093\/nar\/gkq224","volume":"38","author":"K Hansen","year":"2010","unstructured":"Hansen K, Brenner S, Dudoit S: Biases in illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Research. 2010, 38 (12): e131-10.1093\/nar\/gkq224.","journal-title":"Nucleic Acids Research"},{"issue":"7","key":"5439_CR9","doi-asserted-by":"publisher","first-page":"R143","DOI":"10.1186\/gb-2007-8-7-r143","volume":"8","author":"S Huse","year":"2007","unstructured":"Huse S, Huber J, Morrison H, Sogin M, Welch D: Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biology. 2007, 8 (7): R143-10.1186\/gb-2007-8-7-r143.","journal-title":"Genome Biology"},{"key":"5439_CR10","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1471-2105-11-94","volume":"11","author":"J Bullard","year":"2010","unstructured":"Bullard J, Purdom E, Hansen K, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010, 11: 94-10.1186\/1471-2105-11-94.","journal-title":"BMC Bioinformatics"},{"issue":"5","key":"5439_CR11","doi-asserted-by":"publisher","first-page":"810","DOI":"10.1101\/gr.7337908","volume":"18","author":"J Butler","year":"2008","unstructured":"Butler J, MacCallum I, Kleber M, Shlyakhter I, Belmonte M, Lander E, Nusbaum C, Jaffe D: ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research. 
2008, 18 (5): 810-10.1101\/gr.7337908.","journal-title":"Genome Research"},{"issue":"17","key":"5439_CR12","first-page":"2157","volume":"25","author":"J Schr\u00f6der","year":"2009","unstructured":"Schr\u00f6der J, Schr\u00f6der H, Puglisi S, Sinha R, Schmidt B: SHREC: a short-read error correction method. Bioinformatics. 2009, 25 (17): 2157-","journal-title":"Bioinformatics"},{"issue":"11","key":"5439_CR13","doi-asserted-by":"publisher","first-page":"R116","DOI":"10.1186\/gb-2010-11-11-r116","volume":"11","author":"D Kelley","year":"2010","unstructured":"Kelley D, Schatz M, Salzberg S: Quake: quality-aware detection and correction of sequencing errors. Genome Biology. 2010, 11 (11): R116-10.1186\/gb-2010-11-11-r116.","journal-title":"Genome Biology"},{"issue":"10","key":"5439_CR14","doi-asserted-by":"publisher","first-page":"1284","DOI":"10.1093\/bioinformatics\/btq151","volume":"26","author":"L Salmela","year":"2010","unstructured":"Salmela L: Correction of sequencing errors in a mixed set of reads. Bioinformatics. 2010, 26 (10): 1284-10.1093\/bioinformatics\/btq151.","journal-title":"Bioinformatics"},{"issue":"9","key":"5439_CR15","doi-asserted-by":"publisher","first-page":"e12681","DOI":"10.1371\/journal.pone.0012681","volume":"5","author":"J Schr\u00f6der","year":"2010","unstructured":"Schr\u00f6der J, Bailey J, Conway T, Zobel J: Reference-free validation of short read data. PloS ONE. 2010, 5 (9): e12681-10.1371\/journal.pone.0012681.","journal-title":"PloS ONE"},{"issue":"2","key":"5439_CR16","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1101\/gr.097261.109","volume":"20","author":"R Li","year":"2010","unstructured":"Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Research. 
2010, 20 (2): 265-10.1101\/gr.097261.109.","journal-title":"Genome Research"},{"issue":"7","key":"5439_CR17","doi-asserted-by":"publisher","first-page":"1181","DOI":"10.1101\/gr.111351.110","volume":"21","author":"W Kao","year":"2011","unstructured":"Kao W, Chan A, Song Y: ECHO: A reference-free short-read error correction algorithm. Genome Research. 2011, 21 (7): 1181-1192. 10.1101\/gr.111351.110.","journal-title":"Genome Research"},{"issue":"suppl 1","key":"5439_CR18","doi-asserted-by":"publisher","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","volume":"39","author":"R Leinonen","year":"2011","unstructured":"Leinonen R, Sugawara H, Shumway M: The sequence read archive. Nucleic Acids Research. 2011, 39 (suppl 1): D19-","journal-title":"Nucleic Acids Research"},{"issue":"7","key":"5439_CR19","doi-asserted-by":"publisher","first-page":"1051","DOI":"10.1101\/gr.10.7.1051","volume":"10","author":"A Lash","year":"2000","unstructured":"Lash A, Tolstoshev C, Wagner L, Schuler G, Strausberg R, Riggins G, Altschul S: SAGEmap: a public gene expression resource. Genome Research. 2000, 10 (7): 1051-10.1101\/gr.10.7.1051.","journal-title":"Genome Research"},{"issue":"13","key":"5439_CR20","doi-asserted-by":"publisher","first-page":"e90","DOI":"10.1093\/nar\/gkr344","volume":"39","author":"K Nakamura","year":"2011","unstructured":"Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak M, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S: Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research. 2011, 39 (13): e90-e90. 
10.1093\/nar\/gkr344.","journal-title":"Nucleic Acids Research"},{"issue":"7218","key":"5439_CR21","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1038\/nature07517","volume":"456","author":"D Bentley","year":"2008","unstructured":"Bentley D, Balasubramanian S, Swerdlow H, Smith G, Milton J, Brown C, Hall K, Evers D, Barnes C, Bignell H, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456 (7218): 53-59. 10.1038\/nature07517.","journal-title":"Nature"},{"issue":"2","key":"5439_CR22","doi-asserted-by":"publisher","first-page":"747","DOI":"10.1534\/genetics.109.106005","volume":"183","author":"J Cuevas","year":"2009","unstructured":"Cuevas J, Duffy S, Sanjuan R: Point mutation rate of bacteriophage \u03a6X174. Genetics. 2009, 183 (2): 747-749. 10.1534\/genetics.109.106005.","journal-title":"Genetics"},{"issue":"9","key":"5439_CR23","doi-asserted-by":"publisher","first-page":"1151","DOI":"10.1038\/nbt1239","volume":"24","author":"L Shi","year":"2006","unstructured":"Shi L, Reid L, Jones W, Shippy R, Warrington J, Baker S, Collins P, De Longueville F, Kawasaki E, Lee K: The MicroArray Quality Control (MAQC) project shows inter-and intraplatform reproducibility of gene expression measurements. Nature Biotechnology. 2006, 24 (9): 1151-1161. 10.1038\/nbt1239.","journal-title":"Nature Biotechnology"},{"issue":"7146","key":"5439_CR24","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1038\/nature05874","volume":"447","author":"E Birney","year":"2007","unstructured":"Birney E, Stramatoyannopoulos JA, Dutta A, Guig\u00f3 R, Thomas R, Elliott H, Zhiping Weng M, Emmanouil T, John A, Robert E, Michael S, Christopher M, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447 (7146): 799-816. 
10.1038\/nature05874.","journal-title":"Nature"},{"key":"5439_CR25","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1007\/s11568-010-9137-y","volume":"3","author":"H Hu","year":"2009","unstructured":"Hu H, Wrogemann K, Kalscheuer V, Tzschach A, Richard H, Haas S, Menzel C, Bienek M, Froyen G, Raynaud M, Van Bokhoven H, Chelly J, Ropers H, Chen W: Mutation screening in 86 known X-linked mental retardation genes by droplet-based multiplex PCR and massive parallel sequencing. HUGO J. 2009, 3: 41-49. 10.1007\/s11568-010-9137-y.","journal-title":"HUGO J"},{"issue":"5235","key":"5439_CR26","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1126\/science.270.5235.484","volume":"270","author":"V Velculescu","year":"1995","unstructured":"Velculescu V, Zhang L, Vogelstein B, Kinzler K: Serial analysis of gene expression. Science. 1995, 270 (5235): 484-10.1126\/science.270.5235.484.","journal-title":"Science"},{"issue":"2","key":"5439_CR27","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/S0092-8674(00)81845-0","volume":"88","author":"V Velculescu","year":"1997","unstructured":"Velculescu V, Vogelstein B, Kinzler K: Characterization of the yeast transcriptome. Cell. 1997, 88 (2): 243-251. 10.1016\/S0092-8674(00)81845-0.","journal-title":"Cell"},{"issue":"5316","key":"5439_CR28","doi-asserted-by":"publisher","first-page":"1268","DOI":"10.1126\/science.276.5316.1268","volume":"276","author":"L Zhang","year":"1997","unstructured":"Zhang L, Zhou W, Velculescu V, Kern S, Hruban R, Hamilton S, Vogelstein B, Kinzler K: Gene expression profiles in normal and cancer cells. Science. 
1997, 276 (5316): 1268-10.1126\/science.276.5316.1268.","journal-title":"Science"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-185.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T20:18:59Z","timestamp":1630527539000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-185"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,7,30]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["5439"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-185","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,7,30]]},"assertion":[{"value":"20 December 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 July 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"185"}}