{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,25]],"date-time":"2024-06-25T11:51:53Z","timestamp":1719316313012},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: In genome assembly, the primary issue is how to determine upstream and downstream sequence regions of sequence seeds for constructing long contigs or scaffolds. When extending one sequence seed, repetitive regions in the genome always cause multiple feasible extension candidates which increase the difficulty of genome assembly. The universally accepted solution is choosing one based on read overlaps and paired-end (mate-pair) reads. However, this solution faces difficulties with regard to some complex repetitive regions. In addition, sequencing errors may produce false repetitive regions and uneven sequencing depth leads some sequence regions to have too few or too many reads. All the aforementioned problems prohibit existing assemblers from getting satisfactory assembly results.<\/jats:p>\n               <jats:p>Results: In this article, we develop an algorithm, called extract paths for genome assembly (EPGA), which extracts paths from De Bruijn graph for genome assembly. EPGA uses a new score function to evaluate extension candidates based on the distributions of reads and insert size. The distribution of reads can solve problems caused by sequencing errors and short repetitive regions. Through assessing the variation of the distribution of insert size, EPGA can solve problems introduced by some complex repetitive regions. For solving uneven sequencing depth, EPGA uses relative mapping to evaluate extension candidates. On real datasets, we compare the performance of EPGA and other popular assemblers. 
The experimental results demonstrate that EPGA can effectively obtain longer and more accurate contigs and scaffolds.<\/jats:p>\n               <jats:p>Availability and implementation: EPGA is publicly available for download at https:\/\/github.com\/bioinfomaticsCSU\/EPGA.<\/jats:p>\n               <jats:p>Contact: \u00a0jxwang@csu.edu.cn<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu762","type":"journal-article","created":{"date-parts":[[2014,11,19]],"date-time":"2014-11-19T01:15:08Z","timestamp":1416359708000},"page":"825-833","source":"Crossref","is-referenced-by-count":21,"title":["EPGA: <i>de novo<\/i> assembly using the distributions of reads and insert size"],"prefix":"10.1093","volume":"31","author":[{"given":"Junwei","family":"Luo","sequence":"first","affiliation":[{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA"},{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA"}]},{"given":"Jianxin","family":"Wang","sequence":"additional","affiliation":[{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical 
Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA"}]},{"given":"Zhen","family":"Zhang","sequence":"additional","affiliation":[{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA"}]},{"given":"Fang-Xiang","family":"Wu","sequence":"additional","affiliation":[{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA"}]},{"given":"Min","family":"Li","sequence":"additional","affiliation":[{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA"}]},{"given":"Yi","family":"Pan","sequence":"additional","affiliation":[{"name":"1 School of Information Science and Engineering, Central South University, ChangSha 410083, China, 2College of Computer Science and Technology, Henan Polytechnic University, JiaoZuo, 454000, China, 3Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan S7N 5A9, Canada and 4Department of Computer Science, Georgia State University, Atlanta, GA 30302, 
USA"}]}],"member":"286","published-online":{"date-parts":[[2014,11,17]]},"reference":[{"key":"2023020116175479200_btu762-B1","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nmeth.1527","article-title":"Limitations of next-generation genome sequence assembly","volume":"8","author":"Alkan","year":"2011","journal-title":"Nat. Methods"},{"key":"2023020116175479200_btu762-B2","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1093\/bioinformatics\/btq626","article-title":"PE-assembler: de novo assembly using short paired end reads","volume":"27","author":"Ariyaratne","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020116175479200_btu762-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a New Genome Assembly Algorithm and its Applications to Single-Cell Sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comp. Biol."},{"key":"2023020116175479200_btu762-B4","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1101\/gr.079053.108","article-title":"De\u00a0novo fragment assembly with short mate-paired reads: does the read length matter?","volume":"19","author":"Chaisson","year":"2009","journal-title":"Genome Res."},{"key":"2023020116175479200_btu762-B5","doi-asserted-by":"crossref","first-page":"915","DOI":"10.1038\/nbt.1966","article-title":"Efficient de novo assembly of single-cell bacterial genomes from short-read datasets","volume":"29","author":"Chitsaz","year":"2011","journal-title":"Nature Biotech."},{"key":"2023020116175479200_btu762-B6","doi-asserted-by":"crossref","first-page":"1697","DOI":"10.1101\/gr.6435207","article-title":"SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing","volume":"17","author":"Dohm","year":"2007","journal-title":"Genome 
Res."},{"key":"2023020116175479200_btu762-B27","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.1101\/gr.126599.111","article-title":"Assemblathon 1: A competitive assessment of de novo short read assembly methods","volume":"21","author":"Earl","year":"2011","journal-title":"Genome Res."},{"key":"2023020116175479200_btu762-B7","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020116175479200_btu762-B8","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1109\/TST.2013.6616523","article-title":"De\u00a0novo assembly methods for next generation sequencing data","volume":"5","author":"He","year":"2013","journal-title":"Tsinghua Sci. Technol."},{"key":"2023020116175479200_btu762-B9","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De\u00a0novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nature Genet."},{"key":"2023020116175479200_btu762-B10","doi-asserted-by":"crossref","first-page":"2942","DOI":"10.1093\/bioinformatics\/btm451","article-title":"Extending assembly of short DNA sequences to handle error","volume":"23","author":"Jeck","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020116175479200_btu762-B11","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De\u00a0novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2010","journal-title":"Genome Res"},{"key":"2023020116175479200_btu762-B12","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/2047-217X-1-18","article-title":"SOAPdenovo2: an empirically improved memory-efficient short-read de novo 
assembler","volume":"1","author":"Luo","year":"2012","journal-title":"GigaScience"},{"key":"2023020116175479200_btu762-B13","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1093\/bioinformatics\/bts399","article-title":"Telescoper: de novo assembly of highly repetitive regions","volume":"28","author":"Maayan","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020116175479200_btu762-B14","doi-asserted-by":"crossref","first-page":"R103","DOI":"10.1186\/gb-2009-10-10-r103","article-title":"ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads","volume":"10","author":"MacCallum","year":"2009","journal-title":"Genome Biol."},{"key":"2023020116175479200_btu762-B15","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1007\/978-3-642-20036-6_22","article-title":"Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers","volume-title":"Proceedings of Research in Computational Molecular Biology","author":"Medvedev","year":"2011"},{"key":"2023020116175479200_btu762-B16","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-12683-3_28","article-title":"IDBA\u2014a practical iterative de Bruijn graph de novo assembler","author":"Peng","year":"2010"},{"key":"2023020116175479200_btu762-B17","doi-asserted-by":"crossref","first-page":"1420","DOI":"10.1093\/bioinformatics\/bts174","article-title":"IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth","volume":"28","author":"Peng","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020116175479200_btu762-B18","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023020116175479200_btu762-B19","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1089\/cmb.2012.0098","article-title":"Pathset graphs: a novel approach for comprehensive utilization of paired reads in genome assembly","volume":"20","author":"Pham","year":"2013","journal-title":"J. Comput. Biol."},{"key":"2023020116175479200_btu762-B20","doi-asserted-by":"crossref","first-page":"2270","DOI":"10.1101\/gr.141515.112","article-title":"Finished bacterial genomes from shotgun sequence data","volume":"22","author":"Ribeiro","year":"2012","journal-title":"Genome Res."},{"key":"2023020116175479200_btu762-B21","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: a critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res."},{"key":"2023020116175479200_btu762-B22","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short-read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res."},{"key":"2023020116175479200_btu762-B23","doi-asserted-by":"crossref","first-page":"500","DOI":"10.1093\/bioinformatics\/btl629","article-title":"Assembling millions of short DNA sequences using SSAKE","volume":"23","author":"Warren","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020116175479200_btu762-B24","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1186\/1471-2105-12-95","article-title":"Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies","volume":"12","author":"Wetzel","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020116175479200_btu762-B25","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short-read assembly using de Bruijn 
graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."},{"key":"2023020116175479200_btu762-B26","doi-asserted-by":"crossref","first-page":"e8407","DOI":"10.1371\/journal.pone.0008407","article-title":"Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler","volume":"4","author":"Zerbino","year":"2009","journal-title":"PLoS One"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/6\/825\/49011669\/bioinformatics_31_6_825.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/6\/825\/49011669\/bioinformatics_31_6_825.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:31:05Z","timestamp":1675297865000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/6\/825\/215627"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,11,17]]},"references-count":27,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2015,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu762","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,3,15]]},"published":{"date-parts":[[2014,11,17]]}}}