{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T15:42:01Z","timestamp":1778600521846,"version":"3.51.4"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Solexa\/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-431","type":"journal-article","created":{"date-parts":[[2008,10,14]],"date-time":"2008-10-14T06:14:16Z","timestamp":1223964856000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":77,"title":["Probabilistic base calling of Solexa sequencing data"],"prefix":"10.1186","volume":"9","author":[{"given":"Jacques","family":"Rougemont","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arnaud","family":"Amzallag","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christian","family":"Iseli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Laurent","family":"Farinelli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ioannis","family":"Xenarios","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felix","family":"Naef","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2008,10,13]]},"reference":[{"issue":"6","key":"2416_CR1","doi-asserted-by":"publisher","first-page":"545","DOI":"10.1016\/j.gde.2006.10.009","volume":"16","author":"DR Bentley","year":"2006","unstructured":"Bentley DR: Whole-genome re-sequencing. Current Opinion in Genetics & Development 2006, 16(6):545\u2013552. 10.1016\/j.gde.2006.10.009","journal-title":"Current Opinion in Genetics & Development"},{"key":"2416_CR2","volume-title":"Genome Research","author":"W Chen","year":"2008","unstructured":"Chen W, Kalscheu V, Tzschach A, Menzel C, Ullmann R, Schulz M, Erdogan F, Li N, Kijas Z, Arkesteijn G, et al.: Mapping translocation breakpoints by next-generation sequencing. Genome Research 2008."},{"issue":"5849","key":"2416_CR3","doi-asserted-by":"publisher","first-page":"420","DOI":"10.1126\/science.1149504","volume":"318","author":"JO Korbel","year":"2007","unstructured":"Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons J, Kim PM, Palejev D, Carriero NJ, Du L, et al.: Paired-end mapping reveals extensive structural variation in the human genome. Science 2007, 318(5849):420\u2013426. 10.1126\/science.1149504","journal-title":"Science"},{"issue":"1","key":"2416_CR4","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/j.ymeth.2007.09.009","volume":"44","author":"M Hafner","year":"2008","unstructured":"Hafner M, Landgraf P, Ludwig J, Rice A, Ojo T, Lin C, Holoch D, Lim C, Tuschl T: Identification of microRNAs and other small regulatory RNAs using cDNA library sequencing. Methods 2008, 44(1):3\u201312. 10.1016\/j.ymeth.2007.09.009","journal-title":"Methods"},{"issue":"7","key":"2416_CR5","doi-asserted-by":"publisher","first-page":"1636","DOI":"10.1111\/j.1365-294X.2008.03666.x","volume":"17","author":"JC Vera","year":"2008","unstructured":"Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, Hanski I, Marden JH: Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol Ecol 2008, 17(7):1636\u20131647. 10.1111\/j.1365-294X.2008.03666.x","journal-title":"Mol Ecol"},{"issue":"4","key":"2416_CR6","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1038\/nbt1394","volume":"26","author":"MR Friedl\u00e4nder","year":"2008","unstructured":"Friedl\u00e4nder MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, Rajewsky N: Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 2008, 26(4):407\u2013415. 10.1038\/nbt1394","journal-title":"Nat Biotechnol"},{"issue":"7153","key":"2416_CR7","doi-asserted-by":"publisher","first-page":"553","DOI":"10.1038\/nature06008","volume":"448","author":"T Mikkelsen","year":"2007","unstructured":"Mikkelsen T, Ku M, Jaffe D, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim T, Koche R, et al.: Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 2007, 448(7153):553\u2013560. 10.1038\/nature06008","journal-title":"Nature"},{"issue":"4","key":"2416_CR8","doi-asserted-by":"publisher","first-page":"823","DOI":"10.1016\/j.cell.2007.05.009","volume":"129","author":"A Barski","year":"2007","unstructured":"Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K: High-resolution profiling of histone methylations in the human genome. Cell 2007, 129(4):823\u2013837. 10.1016\/j.cell.2007.05.009","journal-title":"Cell"},{"issue":"5","key":"2416_CR9","doi-asserted-by":"publisher","first-page":"802","DOI":"10.1101\/gr.072033.107","volume":"18","author":"D Hernandez","year":"2008","unstructured":"Hernandez D, Fran\u00e7ois P, Farinelli L, Oster\u00e5s M, Schrenzel J: De novo bacterial genome sequencing: Millions of very short reads assembled on a desktop computer. Genome Research 2008, 18(5):802\u2013809. 10.1101\/gr.072033.107","journal-title":"Genome Research"},{"issue":"7057","key":"2416_CR10","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1038\/nature03959","volume":"437","author":"M Margulies","year":"2005","unstructured":"Margulies M, Egholm M, Altman W, Attiya S, Bader J, Bemben L, Berka J, Braverman M, Chen Y, Chen Z, et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437(7057):376\u2013380.","journal-title":"Nature"},{"issue":"3","key":"2416_CR11","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1101\/gr.8.3.186","volume":"8","author":"B Ewing","year":"1998","unstructured":"Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research 1998, 8(3):186\u2013194.","journal-title":"Genome Research"},{"issue":"7184","key":"2416_CR12","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1038\/nature06745","volume":"452","author":"SJ Cokus","year":"2008","unstructured":"Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson S, Pellegrini M, Jacobsen SE: Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 2008, 452(7184):215\u2013219. 10.1038\/nature06745","journal-title":"Nature"},{"key":"2416_CR13","volume-title":"Nat Methods","author":"Y Erlich","year":"2008","unstructured":"Erlich Y, Mitra PP, Delabastide M, McCombie WR, Hannon GJ: Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods 2008."},{"key":"2416_CR14","volume-title":"Nucleic Acids Research","author":"JC Dohm","year":"2008","unstructured":"Dohm JC, Lottaz C, Borodina T, Himmelbauer H: Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research 2008."},{"key":"2416_CR15","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1186\/1471-2105-9-128","volume":"9","author":"A Smith","year":"2008","unstructured":"Smith A, Xuan Z, Zhang M: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 2008, 9: 128. 10.1186\/1471-2105-9-128","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"2416_CR16","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1186\/1471-2105-9-250","volume":"9","author":"PC Dolan","year":"2008","unstructured":"Dolan PC, Denver DR: TileQC: a system for tile-based quality control of Solexa data. BMC Bioinformatics 2008, 9(1):250. 10.1186\/1471-2105-9-250","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"2416_CR17","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1093\/nar\/gkj454","volume":"34","author":"P Yakovchuk","year":"2006","unstructured":"Yakovchuk P, Protozanova E, Frank-Kamenetskii MD: Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Research 2006, 34(2):564\u2013574. 10.1093\/nar\/gkj454","journal-title":"Nucleic Acids Research"},{"issue":"368","key":"2416_CR18","doi-asserted-by":"publisher","first-page":"829","DOI":"10.1080\/01621459.1979.10481038","volume":"74","author":"WS Cleveland","year":"1979","unstructured":"Cleveland WS: Robust locally weighted regression and smoothing scatterplots. J Amer Statist Assoc 1979, 74(368):829\u2013836. 10.2307\/2286407","journal-title":"J Amer Statist Assoc"},{"issue":"3","key":"2416_CR19","doi-asserted-by":"publisher","first-page":"803","DOI":"10.2307\/2532201","volume":"49","author":"JD Banfield","year":"1993","unstructured":"Banfield JD, Raftery AE: Model-based Gaussian and non-Gaussian clustering. Biometrics 1993, 49(3):803\u2013821. 10.2307\/2532201","journal-title":"Biometrics"},{"issue":"2","key":"2416_CR20","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1007\/s003579900058","volume":"16","author":"C Fraley","year":"1999","unstructured":"Fraley C, Raftery AE: MCLUST: Software for model-based cluster analysis. J Classification 1999, 16(2):297\u2013306. 10.1007\/s003579900058","journal-title":"J Classification"},{"issue":"458","key":"2416_CR21","doi-asserted-by":"publisher","first-page":"611","DOI":"10.1198\/016214502760047131","volume":"97","author":"C Fraley","year":"2002","unstructured":"Fraley C, Raftery AE: Model-based clustering, discriminant analysis, and density estimation. J Amer Statist Assoc 2002, 97(458):611\u2013631. 10.1198\/016214502760047131","journal-title":"J Amer Statist Assoc"},{"issue":"2","key":"2416_CR22","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1007\/s00357-003-0015-3","volume":"20","author":"C Fraley","year":"2003","unstructured":"Fraley C, Raftery AE: Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST. J Classification 2003, 20(2):263\u2013286. 10.1007\/s00357-003-0015-3","journal-title":"J Classification"},{"key":"2416_CR23","doi-asserted-by":"publisher","DOI":"10.1002\/0471200611","volume-title":"Elements of Information Theory","author":"TM Cover","year":"1991","unstructured":"Cover TM, Thomas JA: Elements of Information Theory. John Wiley; 1991."},{"issue":"6","key":"2416_CR24","doi-asserted-by":"publisher","first-page":"e579","DOI":"10.1371\/journal.pone.0000579","volume":"2","author":"C Iseli","year":"2007","unstructured":"Iseli C, Ambrosini G, Bucher P, Jongeneel CV: Indexing strategies for rapid searches of short words in genome sequences. PLoS ONE 2007, 2(6):e579. 10.1371\/journal.pone.0000579","journal-title":"PLoS ONE"},{"issue":"1","key":"2416_CR25","first-page":"11","volume":"4","author":"EW Myers","year":"1988","unstructured":"Myers EW, Miller W: Optimal alignments in linear space. Comput Appl Biosci 1988, 4(1):11\u201317.","journal-title":"Comput Appl Biosci"},{"issue":"1","key":"2416_CR26","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1186\/1471-2105-9-128","volume":"9","author":"A Smith","year":"2008","unstructured":"Smith A, Xuan Z, Zhang M: Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 2008, 9(1):128. 10.1186\/1471-2105-9-128","journal-title":"BMC Bioinformatics"},{"key":"2416_CR27","doi-asserted-by":"crossref","unstructured":"Ferragina P, Manzini G, M\u00e4kinen V, Navarro G: Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG) 2007., 3(2):","DOI":"10.1145\/1240233.1240243"},{"issue":"13","key":"2416_CR28","doi-asserted-by":"publisher","first-page":"i195","DOI":"10.1093\/bioinformatics\/btm200","volume":"23","author":"S Gr\u00e4f","year":"2007","unstructured":"Gr\u00e4f S, Nielsen FG, Kurtz S, Huynen MA, Birney E, Stunnenberg H, Flicek P: Optimized design and assessment of whole genome tiling arrays. Bioinformatics 2007, 23(13):i195\u2013204. 10.1093\/bioinformatics\/btm200","journal-title":"Bioinformatics"},{"issue":"3","key":"2416_CR29","doi-asserted-by":"publisher","first-page":"142","DOI":"10.1016\/j.tig.2007.12.006","volume":"24","author":"M Pop","year":"2008","unstructured":"Pop M, Salzberg SL: Bioinformatics challenges of new sequencing technology. Trends Genet 2008, 24(3):142\u2013149.","journal-title":"Trends Genet"},{"issue":"5712","key":"2416_CR30","doi-asserted-by":"publisher","first-page":"1072","DOI":"10.1126\/science.1105436","volume":"307","author":"DA Hinds","year":"2005","unstructured":"Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR: Whole-genome patterns of common DNA variation in three human populations. Science 2005, 307(5712):1072\u20131079. 10.1126\/science.1105436","journal-title":"Science"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-431.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:05:58Z","timestamp":1630494358000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-431"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,10,13]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2416"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-431","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,10,13]]},"assertion":[{"value":"4 June 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 October 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 October 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"431"}}