{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T15:48:03Z","timestamp":1772552883732,"version":"3.50.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"S14","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>With the fast advances in nextgen sequencing technology, high-throughput RNA sequencing has emerged as a powerful and cost-effective way for transcriptome study. <jats:italic>De novo<\/jats:italic> assembly of transcripts provides an important solution to transcriptome analysis for organisms with no reference genome. However, there lacked understanding on how the different variables affected assembly outcomes, and there was no consensus on how to approach an optimal solution by selecting software tool and suitable strategy based on the properties of RNA-Seq data.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>To reveal the performance of different programs for transcriptome assembly, this work analyzed some important factors, including <jats:italic>k<\/jats:italic>-mer values, genome complexity, coverage depth, directional reads, <jats:italic>etc<\/jats:italic>. Seven program conditions, four single <jats:italic>k<\/jats:italic>-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple <jats:italic>k<\/jats:italic>-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK) were tested. While small and large <jats:italic>k<\/jats:italic>-mer values performed better for reconstructing lowly and highly expressed transcripts, respectively, MK strategy worked well for almost all ranges of expression quintiles. Among SK tools, Trinity performed well across various conditions but took the longest running time. Oases consumed the most memory whereas SOAPdenovo required the shortest runtime but worked poorly to reconstruct full-length CDS. ABySS showed some good balance between resource usage and quality of assemblies.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Our work compared the performance of publicly available transcriptome assemblers, and analyzed important factors affecting <jats:italic>de novo<\/jats:italic> assembly. Some practical guidelines for transcript reconstruction from short-read RNA-Seq data were proposed. <jats:italic>De novo<\/jats:italic> assembly of <jats:italic>C. sinensis<\/jats:italic> transcriptome was greatly improved using some optimized methods.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-s14-s2","type":"journal-article","created":{"date-parts":[[2011,12,14]],"date-time":"2011-12-14T19:42:56Z","timestamp":1323891776000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":401,"title":["Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study"],"prefix":"10.1186","volume":"12","author":[{"given":"Qiong-Yi","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Yi","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Yi-Meng","family":"Kong","sequence":"additional","affiliation":[]},{"given":"Da","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Xuan","family":"Li","sequence":"additional","affiliation":[]},{"given":"Pei","family":"Hao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,12,14]]},"reference":[{"issue":"7339","key":"4961_CR1","doi-asserted-by":"publisher","first-page":"473","DOI":"10.1038\/nature09715","volume":"471","author":"BR Graveley","year":"2010","unstructured":"Graveley BR, Brooks AN, Carlson JW, Duff MO, Landolin JM, Yang L, Artieri CG, van Baren MJ, Boley N, Booth BW, et al.: The developmental transcriptome of Drosophila melanogaster. Nature 2010, 471(7339):473\u2013479.","journal-title":"Nature"},{"issue":"12","key":"4961_CR2","doi-asserted-by":"publisher","first-page":"1060","DOI":"10.1038\/ng.703","volume":"42","author":"P Li","year":"2010","unstructured":"Li P, Ponnala L, Gandotra N, Wang L, Si Y, Tausta SL, Kebrom TH, Provart N, Patel R, Myers CR, et al.: The developmental dynamics of the maize leaf transcriptome. Nat Genet 2010, 42(12):1060\u20131067. 10.1038\/ng.703","journal-title":"Nat Genet"},{"key":"4961_CR3","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1186\/1471-2164-12-131","volume":"12","author":"CY Shi","year":"2011","unstructured":"Shi CY, Yang H, Wei CL, Yu O, Zhang ZZ, Jiang CJ, Sun J, Li YY, Chen Q, Xia T, et al.: Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds. BMC Genomics 2011, 12: 131. 10.1186\/1471-2164-12-131","journal-title":"BMC Genomics"},{"key":"4961_CR4","volume-title":"Nature","author":"I Voineagu","year":"2011","unstructured":"Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor RM, Blencowe BJ, Geschwind DH: Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 2011."},{"key":"4961_CR5","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1186\/1471-2164-11-400","volume":"11","author":"XW Wang","year":"2010","unstructured":"Wang XW, Luan JB, Li JM, Bao YY, Zhang CX, Liu SS: De novo characterization of a whitefly transcriptome and analysis of its gene expression during development. BMC Genomics 2010, 11: 400. 10.1186\/1471-2164-11-400","journal-title":"BMC Genomics"},{"issue":"22","key":"4961_CR6","doi-asserted-by":"publisher","first-page":"9172","DOI":"10.1073\/pnas.1100489108","volume":"108","author":"K Kannan","year":"2011","unstructured":"Kannan K, Wang L, Wang J, Ittmann MM, Li W, Yen L: Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing. Proc Natl Acad Sci U S A 2011, 108(22):9172\u20139177. 10.1073\/pnas.1100489108","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"7234","key":"4961_CR7","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1038\/nature07638","volume":"458","author":"CA Maher","year":"2009","unstructured":"Maher CA, Kumar-Sinha C, Cao X, Kalyana-Sundaram S, Han B, Jing X, Sam L, Barrette T, Palanisamy N, Chinnaiyan AM: Transcriptome sequencing to detect gene fusions in cancer. Nature 2009, 458(7234):97\u2013101. 10.1038\/nature07638","journal-title":"Nature"},{"issue":"15","key":"4961_CR8","doi-asserted-by":"publisher","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","volume":"25","author":"R Li","year":"2009","unstructured":"Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 2009, 25(15):1966\u20131967. 10.1093\/bioinformatics\/btp336","journal-title":"Bioinformatics"},{"issue":"21","key":"4961_CR9","doi-asserted-by":"publisher","first-page":"2872","DOI":"10.1093\/bioinformatics\/btp367","volume":"25","author":"I Birol","year":"2009","unstructured":"Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, et al.: De novo transcriptome assembly with ABySS. Bioinformatics 2009, 25(21):2872\u20132877. 10.1093\/bioinformatics\/btp367","journal-title":"Bioinformatics"},{"key":"4961_CR10","unstructured":"Oases: De novo transcriptome assembler for very short reads\n                  http:\/\/www.ebi.ac.uk\/~zerbino\/oases\/"},{"issue":"11","key":"4961_CR11","doi-asserted-by":"publisher","first-page":"909","DOI":"10.1038\/nmeth.1517","volume":"7","author":"G Robertson","year":"2010","unstructured":"Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, et al.: De novo assembly and analysis of RNA-seq data. Nat Methods 2010, 7(11):909\u2013912. 10.1038\/nmeth.1517","journal-title":"Nat Methods"},{"issue":"1","key":"4961_CR12","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1093\/dnares\/dsq028","volume":"18","author":"R Garg","year":"2010","unstructured":"Garg R, Patel RK, Tyagi AK, Jain M: De novo assembly of chickpea transcriptome using short reads for gene discovery and marker identification. DNA Res 2010, 18(1):53\u201363.","journal-title":"DNA Res"},{"issue":"1","key":"4961_CR13","doi-asserted-by":"publisher","first-page":"298","DOI":"10.1186\/1471-2164-12-298","volume":"12","author":"RW Ness","year":"2011","unstructured":"Ness RW, Siol M, Barrett SC: De novo sequence assembly and characterization of the floral transcriptome in cross- and self-fertilizing plants. BMC Genomics 2011, 12(1):298. 10.1186\/1471-2164-12-298","journal-title":"BMC Genomics"},{"key":"4961_CR14","volume-title":"Nat Biotechnol","author":"MG Grabherr","year":"2011","unstructured":"Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al.: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 2011."},{"issue":"10","key":"4961_CR15","doi-asserted-by":"publisher","first-page":"1432","DOI":"10.1101\/gr.103846.109","volume":"20","author":"Y Surget-Groba","year":"2010","unstructured":"Surget-Groba Y, Montoya-Burgos JI: Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Res 2010, 20(10):1432\u20131440. 10.1101\/gr.103846.109","journal-title":"Genome Res"},{"issue":"5","key":"4961_CR16","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1101\/gr.074492.107","volume":"18","author":"DR Zerbino","year":"2008","unstructured":"Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008, 18(5):821\u2013829. 10.1101\/gr.074492.107","journal-title":"Genome Res"},{"issue":"3","key":"4961_CR17","doi-asserted-by":"publisher","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","volume":"10","author":"B Langmead","year":"2009","unstructured":"Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009, 10(3):R25. 10.1186\/gb-2009-10-3-r25","journal-title":"Genome Biol"},{"issue":"4","key":"4961_CR18","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1101\/gr.229202. Article published online before March 2002","volume":"12","author":"WJ Kent","year":"2002","unstructured":"Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res 2002, 12(4):656\u2013664.","journal-title":"Genome Res"},{"issue":"9","key":"4961_CR19","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.1093\/bioinformatics\/btp120","volume":"25","author":"C Trapnell","year":"2009","unstructured":"Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009, 25(9):1105\u20131111. 10.1093\/bioinformatics\/btp120","journal-title":"Bioinformatics"},{"issue":"6032","key":"4961_CR20","doi-asserted-by":"publisher","first-page":"930","DOI":"10.1126\/science.1203357","volume":"332","author":"N Rhind","year":"2011","unstructured":"Rhind N, Chen Z, Yassour M, Thompson DA, Haas BJ, Habib N, Wapinski I, Roy S, Lin MF, Heiman DI, et al.: Comparative functional genomics of the fission yeasts. Science 2011, 332(6032):930\u2013936. 10.1126\/science.1203357","journal-title":"Science"},{"issue":"Database issue","key":"4961_CR21","doi-asserted-by":"publisher","first-page":"D187","DOI":"10.1093\/nar\/gkj161","volume":"34","author":"CH Wu","year":"2006","unstructured":"Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, et al.: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, 34(Database issue):D187\u2013191.","journal-title":"Nucleic Acids Res"},{"key":"4961_CR22","first-page":"1","volume":"101","author":"J Tanaka","year":"2006","unstructured":"Tanaka J, Taniguchi F: Estimation of the genome size of tea (Camellia sinensis), Camellia (C. japonica), and their interspecific hybrids by flow cytometry. Journal of Remote Sensing Society of Japan 2006, 101: 1\u20137.","journal-title":"Journal of Remote Sensing Society of Japan"},{"issue":"Database issue","key":"4961_CR23","doi-asserted-by":"publisher","first-page":"D277","DOI":"10.1093\/nar\/gkh063","volume":"32","author":"M Kanehisa","year":"2004","unstructured":"Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res 2004, 32(Database issue):D277\u2013280.","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-S14-S2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T17:47:12Z","timestamp":1630518432000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-S14-S2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,12]]},"references-count":23,"journal-issue":{"issue":"S14","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4961"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-s14-s2","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,12]]},"assertion":[{"value":"14 December 2011","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S2"}}