{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T16:00:56Z","timestamp":1770998456464,"version":"3.50.1"},"reference-count":19,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. 
It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-11-38","type":"journal-article","created":{"date-parts":[[2010,1,21]],"date-time":"2010-01-21T17:04:31Z","timestamp":1264093471000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":152,"title":["SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read"],"prefix":"10.1186","volume":"11","author":[{"given":"Juan","family":"Falgueras","sequence":"first","affiliation":[]},{"given":"Antonio J","family":"Lara","sequence":"additional","affiliation":[]},{"given":"No\u00e9","family":"Fern\u00e1ndez-Pozo","sequence":"additional","affiliation":[]},{"given":"Francisco R","family":"Cant\u00f3n","sequence":"additional","affiliation":[]},{"given":"Guillermo","family":"P\u00e9rez-Trabado","sequence":"additional","affiliation":[]},{"given":"M Gonzalo","family":"Claros","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,1,20]]},"reference":[{"key":"3495_CR1","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1093\/bioinformatics\/15.2.106","volume":"15","author":"GA Seluja","year":"1999","unstructured":"Seluja GA, Farmer A, McLeod M, Harger C, Schad PA: Establishing a method of vector contamination identification in database sequences. Bioinformatics 1999, 15: 106\u2013110. 
10.1093\/bioinformatics\/15.2.106","journal-title":"Bioinformatics"},{"key":"3495_CR2","doi-asserted-by":"crossref","first-page":"194","DOI":"10.2144\/04372BM03","volume":"37","author":"JS Coker","year":"2004","unstructured":"Coker JS, Davies E: Identifying adaptor contamination when mining DNA sequence data. Biotechniques 2004, 37: 194\u2013198.","journal-title":"Biotechniques"},{"key":"3495_CR3","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1186\/1471-2164-8-416","volume":"8","author":"YA Chen","year":"2007","unstructured":"Chen YA, Lin CC, Wang CD, Wu HB, Hwang PI: An optimized procedure greatly improves EST vector contamination removal. BMC Genomics 2007, 8: 416. 10.1186\/1471-2164-8-416","journal-title":"BMC Genomics"},{"key":"3495_CR4","doi-asserted-by":"publisher","first-page":"1318","DOI":"10.1093\/bioinformatics\/btg159","volume":"19","author":"TE Scheetz","year":"2003","unstructured":"Scheetz TE, Trivedi N, Roberts CA, Kucaba T, Berger B, Robinson NL, Birkett CL, Gavin AJ, O'Leary B, Braun TA, Bonaldo MF, Robinson JP, Sheffield VC, Soares MB, Casavant TL: ESTprep: preprocessing cDNA sequence reads. Bioinformatics 2003, 19: 1318\u20131324. 10.1093\/bioinformatics\/btg159","journal-title":"Bioinformatics"},{"key":"3495_CR5","doi-asserted-by":"publisher","first-page":"462","DOI":"10.1093\/bioinformatics\/btm632","volume":"24","author":"JR White","year":"2008","unstructured":"White JR, Roberts M, Yorke JA, Pop M: Figaro: a novel statistical method for vector sequence removal. Bioinformatics 2008, 24: 462\u2013467. 10.1093\/bioinformatics\/btm632","journal-title":"Bioinformatics"},{"key":"3495_CR6","doi-asserted-by":"publisher","first-page":"4992","DOI":"10.1093\/nar\/23.24.4992","volume":"23","author":"JK Bonfield","year":"1995","unstructured":"Bonfield JK, Smith K, Staden R: A new DNA sequence assembly program. Nucleic Acids Res 1995, 23: 4992\u20134999. 
10.1093\/nar\/23.24.4992","journal-title":"Nucleic Acids Res"},{"key":"3495_CR7","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1093\/bioinformatics\/17.12.1093","volume":"17","author":"HH Chou","year":"2001","unstructured":"Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics 2001, 17: 1093\u20131104. 10.1093\/bioinformatics\/17.12.1093","journal-title":"Bioinformatics"},{"key":"3495_CR8","first-page":"2865","volume":"20","author":"S Li","year":"2004","unstructured":"Li S, Chou HH: LUCY2: an interactive DNA sequence quality trimming and vector removal tool. Bioinformatics 2004, 20: 2865\u20132866.","journal-title":"Bioinformatics"},{"key":"3495_CR9","unstructured":"TIGR: SeqClean.[http:\/\/compbio.dfci.harvard.edu\/tgi\/software\/]"},{"key":"3495_CR10","doi-asserted-by":"publisher","first-page":"3716","DOI":"10.1093\/nar\/gkg566","volume":"31","author":"A Hotz-Wagenblatt","year":"2003","unstructured":"Hotz-Wagenblatt A, Hankeln T, Ernst P, Glatting KH, Schmidt ER, Suhai S: ESTAnnotator: A tool for high throughput EST annotation. Nucleic Acids Res 2003, 31: 3716\u20133719. 10.1093\/nar\/gkg566","journal-title":"Nucleic Acids Res"},{"key":"3495_CR11","doi-asserted-by":"publisher","first-page":"W159","DOI":"10.1093\/nar\/gkm369","volume":"35","author":"B Lee","year":"2007","unstructured":"Lee B, Hong T, Byun SJ, Woo T, Choi YJ: ESTpass: a web-based server for processing and annotating expressed sequence tag (EST) sequences. Nucleic Acids Res 2007, 35: W159\u2013162. 10.1093\/nar\/gkm369","journal-title":"Nucleic Acids Res"},{"key":"3495_CR12","doi-asserted-by":"publisher","first-page":"W143","DOI":"10.1093\/nar\/gkm378","volume":"35","author":"SH Nagaraj","year":"2007","unstructured":"Nagaraj SH, Deshpande N, Gasser RB, Ranganathan S: ESTExplorer: an expressed sequence tag (EST) assembly and annotation platform. Nucleic Acids Res 2007, 35: W143\u2013147. 
10.1093\/nar\/gkm378","journal-title":"Nucleic Acids Res"},{"key":"3495_CR13","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1101\/gr.8.3.175","volume":"8","author":"B Ewing","year":"1998","unstructured":"Ewing B, Hillier L, Wendl MC, Green P: Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998, 8: 175\u2013185.","journal-title":"Genome Res"},{"key":"3495_CR14","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1101\/gr.8.3.186","volume":"8","author":"B Ewing","year":"1998","unstructured":"Ewing B, Green P: Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998, 8: 186\u2013194.","journal-title":"Genome Res"},{"key":"3495_CR15","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1016\/S0168-9525(00)02093-X","volume":"16","author":"J Jurka","year":"2000","unstructured":"Jurka J: Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet 2000, 16: 418\u2013420. 10.1016\/S0168-9525(00)02093-X","journal-title":"Trends Genet"},{"key":"3495_CR16","unstructured":"Cant\u00f3n FR, Provost GL, Garcia V, Barr\u00e9 A, Frigerio JM, Paiva J, Fevereiro P, Avila C, Mouret JF, de Daruvar A, C\u00e1novas FM, Plomion C: Sustainable Forestry, Wood products and Biotechnology, . DFA-AFA Press 2003 chap. Transcriptome analysis of wood formation in maritime pine 333\u2013347."},{"key":"3495_CR17","doi-asserted-by":"publisher","first-page":"3657","DOI":"10.1093\/nar\/28.18.3657","volume":"28","author":"F Liang","year":"2000","unstructured":"Liang F, Holt I, Pertea G, Karamycheva S, Salzberg S, Quackenbush J: An optimized protocol for analysis of EST sequences. Nucleic Acids Res 2000, 28: 3657\u20133665. 
10.1093\/nar\/28.18.3657","journal-title":"Nucleic Acids Res"},{"key":"3495_CR18","doi-asserted-by":"publisher","first-page":"W459","DOI":"10.1093\/nar\/gkl066","volume":"34","author":"A Masoudi-Nejad","year":"2006","unstructured":"Masoudi-Nejad A, Tonomura K, Kawashima S, Moriya Y, Suzuki M, Itoh M, Kanehisa M, Endo T, Goto S: EGassembler: online bioinformatics service for large-scale processing, clustering and assembling ESTs and genomic DNA fragments. Nucleic Acids Res 2006, 34: W459\u2013462. 10.1093\/nar\/gkl066","journal-title":"Nucleic Acids Res"},{"key":"3495_CR19","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/1471-2105-9-5","volume":"9","author":"J Forment","year":"2008","unstructured":"Forment J, Gilabert F, Robles A, Conejero V, Nuez F, Blanca JM: EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration. BMC Bioinformatics 2008, 9: 5. 10.1186\/1471-2105-9-5","journal-title":"BMC Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-38.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T05:29:48Z","timestamp":1630474188000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-38"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,20]]},"references-count":19,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3495"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-38","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,1,20]]},"assertion":[{"value":"4 June 
2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"38"}}