{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T13:34:54Z","timestamp":1775309694239,"version":"3.50.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2016,3,11]],"date-time":"2016-03-11T00:00:00Z","timestamp":1457654400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2016,3,11]],"date-time":"2016-03-11T00:00:00Z","timestamp":1457654400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000270","name":"Natural Environment Research Council","doi-asserted-by":"publisher","award":["NE\/L011956\/1"],"award-info":[{"award-number":["NE\/L011956\/1"]}],"id":[{"id":"10.13039\/501100000270","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000396","name":"Technology Strategy Board","doi-asserted-by":"publisher","award":["TS\/J000175\/1"],"award-info":[{"award-number":["TS\/J000175\/1"]}],"id":[{"id":"10.13039\/501100000396","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","award":["EP\/H003851\/1"],"award-info":[{"award-number":["EP\/H003851\/1"]}],"id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Illumina\u2019s sequencing platforms are currently the most utilised sequencing systems worldwide. The technology has rapidly evolved over recent years and provides high throughput at low costs with increasing read-lengths and true paired-end reads. However, data from any sequencing technology contains noise and our understanding of the peculiarities and sequencing errors encountered in Illumina data has lagged behind this rapid development.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We conducted a systematic investigation of errors and biases in Illumina data based on the largest collection of <jats:italic>in vitro<\/jats:italic> metagenomic data sets to date. We evaluated the Genome Analyzer II, HiSeq and MiSeq and tested state-of-the-art low input library preparation methods. Analysing <jats:italic>in vitro<\/jats:italic> metagenomic sequencing data allowed us to determine biases directly associated with the actual sequencing process. The position- and nucleotide-specific analysis revealed a substantial bias related to motifs (3mers preceding errors) ending in \u201cGG\u201d. On average the top three motifs were linked to 16 % of all substitution errors. Furthermore, a preferential incorporation of ddGTPs was recorded. We hypothesise that all of these biases are related to the engineered polymerase and ddNTPs which are intrinsic to any sequencing-by-synthesis method. We show that quality-score-based error removal strategies can on average remove 69 % of the substitution errors - however, the motif-bias remains.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>Single-nucleotide polymorphism changes in bacterial genomes can cause significant changes in phenotype, including antibiotic resistance and virulence, detecting them within metagenomes is therefore vital. Current error removal techniques are not designed to target the peculiarities encountered in Illumina sequencing data and other sequencing-by-synthesis methods, causing biases to persist and potentially affect any conclusions drawn from the data. In order to develop effective diagnostic and therapeutic approaches we need to be able to identify systematic sequencing errors and distinguish these errors from true genetic variation.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-016-0976-y","type":"journal-article","created":{"date-parts":[[2016,3,11]],"date-time":"2016-03-11T04:51:51Z","timestamp":1457671911000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":322,"title":["Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data"],"prefix":"10.1186","volume":"17","author":[{"given":"Melanie","family":"Schirmer","sequence":"first","affiliation":[]},{"given":"Rosalinda","family":"D\u2019Amore","sequence":"additional","affiliation":[]},{"given":"Umer Z.","family":"Ijaz","sequence":"additional","affiliation":[]},{"given":"Neil","family":"Hall","sequence":"additional","affiliation":[]},{"given":"Christopher","family":"Quince","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2016,3,11]]},"reference":[{"issue":"6","key":"976_CR1","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1093\/nar\/gku1341","volume":"43","author":"M Schirmer","year":"2015","unstructured":"Schirmer M, Ijaz UZ, D\u2019Amore R, Hall N, Sloan WT, Quince C. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform. Nucleic Acids Res. 2015; 43(6):37.","journal-title":"Nucleic Acids Res"},{"key":"976_CR2","unstructured":"Illumina. https:\/\/support.illumina.com\/content\/dam\/illumina-support\/documents\/myillumina\/f5f619d3-2c4c-489b-80a3-e0414baa4e89\/truseq_dna_sampleprep_guide_15026486_c.pdf (last checked March 2016)."},{"issue":"11","key":"976_CR3","doi-asserted-by":"crossref","first-page":"i","DOI":"10.1038\/nmeth.f.272","volume":"6","author":"F Syed","year":"2009","unstructured":"Syed F, Grunenwald H, Caruccio N. Next-generation sequencing library preparation: simultaneous fragmentation and tagging using in vitro transposition. Nature Methods. 2009; 6(11):i\u2013ii.","journal-title":"Nature Methods"},{"issue":"1","key":"976_CR4","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1101\/gr.124016.111","volume":"22","author":"NJ Parkinson","year":"2012","unstructured":"Parkinson NJ, Maslau S, Ferneyhough B, Zhang G, Gregory L, Buck D, Ragoussis J, Ponting CP, Fischer MD. Preparation of high-quality next-generation sequencing libraries from picogram quantities of target DNA. Genome Res. 2012; 22(1):125\u201333.","journal-title":"Genome Res"},{"issue":"16","key":"976_CR5","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1093\/nar\/gkn425","volume":"36","author":"JC Dohm","year":"2008","unstructured":"Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res. 2008; 36(16):105\u20135.","journal-title":"Nucleic Acids Res"},{"issue":"13","key":"976_CR6","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1093\/nar\/gkr344","volume":"39","author":"K Nakamura","year":"2011","unstructured":"Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, et al.Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res. 2011; 39(13):90\u20130.","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"976_CR7","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1186\/gb-2011-12-11-r112","volume":"12","author":"AE Minoche","year":"2011","unstructured":"Minoche AE, Dohm JC, Himmelbauer H, et al. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems. Genome Biol. 2011; 12(11):112.","journal-title":"Genome Biol"},{"issue":"1","key":"976_CR8","doi-asserted-by":"publisher","first-page":"451","DOI":"10.1186\/1471-2105-12-451","volume":"12","author":"F Meacham","year":"2011","unstructured":"Meacham F, Boffelli D, Dhahbi J, Martin D, Singer M, Pachter L. Identification and correction of systematic error in high-throughput sequence data. BMC Bioinforma. 2011; 12(1):451.","journal-title":"BMC Bioinforma"},{"issue":"Suppl 5","key":"976_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-14-S5-S1","volume":"14","author":"M Allhoff","year":"2013","unstructured":"Allhoff M, Sch\u00f6nhuth A, Martin M, Costa IG, Rahmann S, Marschall T. Discovering motifs that induce sequencing errors. BMC Bioinforma. 2013; 14(Suppl 5):1.","journal-title":"BMC Bioinforma"},{"key":"976_CR10","unstructured":"https:\/\/github.com\/najoshi\/sickle (last checked March 2016)."},{"issue":"5","key":"976_CR11","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1089\/cmb.2012.0021","volume":"19","author":"A Bankevich","year":"2012","unstructured":"Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012; 19(5):455\u201377.","journal-title":"J Comput Biol"},{"issue":"17","key":"976_CR12","doi-asserted-by":"publisher","first-page":"9491","DOI":"10.1073\/pnas.96.17.9491","volume":"96","author":"Y Li","year":"1999","unstructured":"Li Y, Mitaxov V, Waksman G. Structure-based design of Taq DNA polymerases with improved properties of dideoxynucleotide incorporation. Proc Natl Acad Sci. 1999; 96(17):9491\u2013496.","journal-title":"Proc Natl Acad Sci"},{"key":"976_CR13","first-page":"305","volume":"5","author":"C Chen","year":"2014","unstructured":"Chen C. DNA polymerases drive DNA sequencing-by-synthesis technologies: Both past and present. Evol Gen Microbiol. 2014; 5:305.","journal-title":"Evol Gen Microbiol"},{"issue":"7218","key":"976_CR14","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1038\/nature07517","volume":"456","author":"DR Bentley","year":"2008","unstructured":"Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008; 456(7218):53\u20139.","journal-title":"Nature"},{"issue":"1","key":"976_CR15","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/j.gpb.2013.01.003","volume":"11","author":"F Chen","year":"2013","unstructured":"Chen F, Dong M, Ge M, Zhu L, Ren L, Liu G, Mu R. The history and advances of reversible terminators used in new generations of sequencing technology. Genomics, Proteomics & Bioinformatics. 2013; 11(1):34\u201340.","journal-title":"Genomics, Proteomics & Bioinformatics"},{"issue":"1","key":"976_CR16","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/1759-8753-3-3","volume":"3","author":"B Green","year":"2012","unstructured":"Green B, Bouchier C, Fairhead C, Craig NL, Cormack BP. Insertion site preference of Mu, Tn5, and Tn7 transposons. Mobile DNA. 2012; 3(1):3.","journal-title":"Mobile DNA"},{"issue":"22","key":"976_CR17","doi-asserted-by":"publisher","first-page":"8071","DOI":"10.1128\/AEM.05610-11","volume":"77","author":"R Marine","year":"2011","unstructured":"Marine R, Polson SW, Ravel J, Hatfull G, Russell D, Sullivan M, Syed F, Dumas M, Wommack KE. Evaluation of a transposase protocol for rapid generation of shotgun high-throughput sequencing libraries from nanogram quantities of DNA. Appl Environ Microbiol. 2011; 77(22):8071\u2013079.","journal-title":"Appl Environ Microbiol"},{"issue":"5","key":"976_CR18","doi-asserted-by":"publisher","first-page":"1199","DOI":"10.1046\/j.1365-2958.2003.03382.x","volume":"47","author":"WS Reznikoff","year":"2003","unstructured":"Reznikoff WS. Tn5 as a model for understanding DNA transposition. Mole Microbiol. 2003; 47(5):1199\u20131206.","journal-title":"Mole Microbiol"},{"issue":"5","key":"976_CR19","doi-asserted-by":"publisher","first-page":"1213","DOI":"10.1016\/j.jmb.2003.11.039","volume":"335","author":"B Ason","year":"2004","unstructured":"Ason B, Reznikoff WS. DNA sequence bias during Tn5 transposition. J Mole Biol. 2004; 335(5):1213\u20131225.","journal-title":"J Mole Biol"},{"issue":"6","key":"976_CR20","doi-asserted-by":"publisher","first-page":"1882","DOI":"10.1111\/1462-2920.12086","volume":"15","author":"M Shakya","year":"2013","unstructured":"Shakya M, Quince C, Campbell JH, Yang ZK, Schadt CW, Podar M. Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities. Environ Microbiol. 2013; 15(6):1882\u201399.","journal-title":"Environ Microbiol"},{"key":"976_CR21","unstructured":"http:\/\/www.vicbioinformatics.com\/software.velvetoptimiser.shtml (last checked March 2016)."},{"issue":"14","key":"976_CR22","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinforma. 2009; 25(14):1754\u20131760.","journal-title":"Bioinforma"},{"issue":"16","key":"976_CR23","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, et al.The sequence alignment\/map format and SAMtools. Bioinforma. 2009; 25(16):2078\u2013079.","journal-title":"Bioinforma"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-0976-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-016-0976-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-0976-y","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-0976-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T17:58:19Z","timestamp":1706810299000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-0976-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,3,11]]},"references-count":23,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["976"],"URL":"https:\/\/doi.org\/10.1186\/s12859-016-0976-y","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,3,11]]},"assertion":[{"value":"16 October 2015","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 March 2016","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 March 2016","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"125"}}