{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T03:10:11Z","timestamp":1773630611104,"version":"3.50.1"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Next-generation sequencing technologies allow genomes to be sequenced more quickly and less expensively than ever before. However, as sequencing technology has improved, the difficulty of <jats:italic>de novo<\/jats:italic> genome assembly has increased, due in large part to the shorter reads generated by the new technologies. The use of mated sequences (referred to as mate-pairs) is a standard means of disambiguating assemblies to obtain a more complete picture of the genome without resorting to manual finishing. Here, we examine the effectiveness of mate-pair information in resolving repeated sequences in the DNA (a paramount issue to overcome). While it has been empirically accepted that mate-pairs improve assemblies, and a variety of assemblers use mate-pairs in the context of repeat resolution, the effectiveness of mate-pairs in this context has not been systematically evaluated in previous literature.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We show that, in high-coverage prokaryotic assemblies, libraries of short mate-pairs (about 4-6 times the read-length) more effectively disambiguate repeat regions than the libraries that are commonly constructed in current genome projects. 
We also demonstrate that the best assemblies can be obtained by 'tuning' mate-pair libraries to accommodate the specific repeat structure of the genome being assembled - information that can be obtained through an initial assembly using unpaired reads. These results are shown across 360 simulations on 'ideal' prokaryotic data as well as assembly of 8 bacterial genomes using SOAPdenovo. The simulation results provide an upper-bound on the potential value of mate-pairs for resolving repeated sequences in real prokaryotic data sets. The assembly results show that our method of tuning mate-pairs exploits fundamental properties of these genomes, leading to better assemblies even when using an off-the-shelf assembler in the presence of base-call errors.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Our results demonstrate that dramatic improvements in prokaryotic genome assembly quality can be achieved by tuning mate-pair sizes to the actual repeat structure of a genome, suggesting the possible need to change the way sequencing projects are designed. We propose that a two-tiered approach - first generate an assembly of the genome with unpaired reads in order to evaluate the repeat structure of the genome; then generate the mate-pair libraries that provide most information towards the resolution of repeats in the genome being assembled - is not only possible, but likely also more cost-effective as it will significantly reduce downstream manual finishing costs. 
In future work we intend to address the question of whether this result can be extended to larger eukaryotic genomes, where repeat structure can be quite different.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-95","type":"journal-article","created":{"date-parts":[[2011,4,13]],"date-time":"2011-04-13T18:18:39Z","timestamp":1302718719000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":42,"title":["Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies"],"prefix":"10.1186","volume":"12","author":[{"given":"Joshua","family":"Wetzel","sequence":"first","affiliation":[]},{"given":"Carl","family":"Kingsford","sequence":"additional","affiliation":[]},{"given":"Mihai","family":"Pop","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,4,13]]},"reference":[{"key":"4498_CR1","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1038\/nmeth.1527","volume":"8","author":"C Alkan","year":"2011","unstructured":"Alkan C, Sajjadian S, Eichler EE: Limitations of next-generation genome sequence assembly. Nat Meth 2011, 8: 61\u201365. 10.1038\/nmeth.1527","journal-title":"Nat Meth"},{"key":"4498_CR2","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1186\/1471-2105-11-21","volume":"11","author":"C Kingsford","year":"2010","unstructured":"Kingsford C, Schatz M, Pop M: Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 2010, 11: 21. 10.1186\/1471-2105-11-21","journal-title":"BMC Bioinformatics"},{"issue":"suppl 1","key":"4498_CR3","doi-asserted-by":"publisher","first-page":"S225","DOI":"10.1093\/bioinformatics\/17.suppl_1.S225","volume":"17","author":"P Pevzner","year":"2001","unstructured":"Pevzner P, Tang H: Fragment assembly with double-barreled data. 
Bioinformatics 2001, 17(suppl 1):S225\u2013233.","journal-title":"Bioinformatics"},{"issue":"17","key":"4498_CR4","doi-asserted-by":"publisher","first-page":"9748","DOI":"10.1073\/pnas.171285098","volume":"98","author":"P Pevzner","year":"2001","unstructured":"Pevzner P, Tang H, Waterman M: An Eulerian Path Approach to DNA Fragment Assembly. Proceedings of the National Academy of Sciences of the United States of America 2001, 98(17):9748\u20139753. 10.1073\/pnas.171285098","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"5223","key":"4498_CR5","doi-asserted-by":"publisher","first-page":"496","DOI":"10.1126\/science.7542800","volume":"269","author":"R Fleischmann","year":"1995","unstructured":"Fleischmann R, Adams M, White O, Clayton R, Kirkness E, Kerlavage A, Bult C, Tomb J, Dougherty B, Merrick J, McKenney K, Sutton G, Fitzhugh W, Fields C, Gocyne J, Scott J, Shirley R, Liu L, Glodek A, Kelley J, Jenny M, Weidman J, Phillips C, Spriggs T, Hedblom E, Cotton M, Utterback T, Hanna M, Nguyen D, Saudek D, Brandon R, Fine L, Fritchman J, Fuhrmann J, Geoghagen N, Gnehm C, McDonald L, Small K, Fraser C, Smith H, Venter J: Whole-genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Science 1995, 269(5223):496\u2013512. 10.1126\/science.7542800","journal-title":"Science"},{"issue":"5461","key":"4498_CR6","doi-asserted-by":"publisher","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","volume":"287","author":"E Myers","year":"2000","unstructured":"Myers E, Sutton G, Delcher A, Dew I, Fasulo D, Flanigan M, Kravitz S, Mobarry C, Reinert K, Remington K, Anson E, Bolanos R, Chou H, Jordan C, Halpern A, Lonardi S, Beasley E, Brandon R, Chen L, Dunn P, Lai Z, Liang Y, Nusskern D, Zhan M, Zhang Q, Zheng X, Rubin G, Adams M, Venter J: A Whole Genome Assembly of Drosophila . Science 2000, 287(5461):2196\u20132204. 
10.1126\/science.287.5461.2196","journal-title":"Science"},{"issue":"5","key":"4498_CR7","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1101\/gr.074492.107","volume":"18","author":"D Zerbino","year":"2008","unstructured":"Zerbino D, Birney E: Velvet: Algorithms for de Novo short read assembly using de Bruijn graphs. Genome Research 2008, 18(5):821\u2013829. 10.1101\/gr.074492.107","journal-title":"Genome Research"},{"key":"4498_CR8","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1101\/gr.208902","volume":"12","author":"S Batzoglou","year":"2002","unstructured":"Batzoglou S, Jaffe D, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov J, Lander E: ARACHNE: a whole genome shotgun assembler. Genome Research 2002, 12: 177\u2013189. 10.1101\/gr.208902","journal-title":"Genome Research"},{"issue":"5","key":"4498_CR9","doi-asserted-by":"publisher","first-page":"810","DOI":"10.1101\/gr.7337908","volume":"18","author":"J Butler","year":"2008","unstructured":"Butler J, MacCallum I, Kleber M, Belmonte ISM, Lander E, Nusbaum C, Jaffe D: ALLPATHS: De Novo assembly of whole-genome shotgun microreads. Genome Research 2008, 18(5):810\u2013820. 10.1101\/gr.7337908","journal-title":"Genome Research"},{"key":"4498_CR10","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1101\/gr.1536204","volume":"14","author":"M Pop","year":"2004","unstructured":"Pop M, Kosack D, Salzberg S: Hierarchical scaffolding with Bambus. Genome Research 2004, 14: 149\u2013159. 10.1101\/gr.1536204","journal-title":"Genome Research"},{"issue":"4","key":"4498_CR11","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1093\/bib\/bbp026","volume":"10","author":"M Pop","year":"2009","unstructured":"Pop M: Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 2009, 10(4):354\u2013366. 
10.1093\/bib\/bbp026","journal-title":"Briefings in Bioinformatics"},{"key":"4498_CR12","doi-asserted-by":"publisher","first-page":"853","DOI":"10.1093\/bioinformatics\/bti091","volume":"21","author":"D Bartels","year":"2005","unstructured":"Bartels D, Kespohl S, Albaum S, Dr\u00fcke T, Goesmann A, Herold J, Kaiser O, P\u00fcler A, Pfeiffer F, Raddatz G, Stoye J, Meyer F, Schuster S: BACCardi-a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison. Bioinformatics 2005, 21: 853\u2013859. 10.1093\/bioinformatics\/bti091","journal-title":"Bioinformatics"},{"key":"4498_CR13","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1101\/gr.8.3.195","volume":"8","author":"D Gordon","year":"1998","unstructured":"Gordon D, Abaijian C, Green P: Consed: A graphical tool for sequence finishing. Genome Research 1998, 8: 195\u2013202.","journal-title":"Genome Research"},{"key":"4498_CR14","doi-asserted-by":"publisher","first-page":"R34","DOI":"10.1186\/gb-2007-8-3-r34","volume":"8","author":"M Schatz","year":"2007","unstructured":"Schatz M, Phillipy A, Schneiderman B, Salzberg S: Hawkeye: an interactive visual analytics tool for genome assemblies. Genome Biology 2007, 8: R34. 10.1186\/gb-2007-8-3-r34","journal-title":"Genome Biology"},{"key":"4498_CR15","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1006\/jagm.2001.1201","volume":"42","author":"M Nyk\u00e4nen","year":"2002","unstructured":"Nyk\u00e4nen M, Ukkonen E: The exact path length problem. J Algorithms 2002, 42: 41\u201353.","journal-title":"J Algorithms"},{"key":"4498_CR16","doi-asserted-by":"publisher","first-page":"336","DOI":"10.1101\/gr.079053.108","volume":"19","author":"M Chaisson","year":"2009","unstructured":"Chaisson M, Brinza D, Pevzner P: De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Research 2009, 19: 336\u2013346. 
10.1101\/gr.079053.108","journal-title":"Genome Research"},{"issue":"12","key":"4498_CR17","doi-asserted-by":"publisher","first-page":"e8407","DOI":"10.1371\/journal.pone.0008407","volume":"4","author":"D Zerbino","year":"2009","unstructured":"Zerbino D, McEwen G, Margulies E, Birney E: Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler. PLoS one 2009, 4(12):e8407. 10.1371\/journal.pone.0008407","journal-title":"PLoS one"},{"issue":"Suppl 13","key":"4498_CR18","doi-asserted-by":"publisher","first-page":"O2","DOI":"10.1186\/1471-2105-10-S13-O2","volume":"10","author":"R Chikhi","year":"2009","unstructured":"Chikhi R, Lavenier D: Paired-end read length lower bounds for genome re-sequencing. BMC Bioinformatics 2009, 10(Suppl 13):O2. 10.1186\/1471-2105-10-S13-O2","journal-title":"BMC Bioinformatics"},{"key":"4498_CR19","doi-asserted-by":"publisher","first-page":"385","DOI":"10.1186\/1471-2164-11-385","volume":"11","author":"A Bashir","year":"2010","unstructured":"Bashir A, Bansal V, Bafna V: Designing deep sequencing experiments: structural variation, haplotype assembly, and transcript abundance. BMC Genomics 2010, 11: 385. 10.1186\/1471-2164-11-385","journal-title":"BMC Genomics"},{"issue":"2","key":"4498_CR20","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1101\/gr.097261.109","volume":"20","author":"R Li","year":"2010","unstructured":"Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J: De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 2010, 20(2):265\u2013272. 10.1101\/gr.097261.109","journal-title":"Genome Research"},{"issue":"8","key":"4498_CR21","doi-asserted-by":"publisher","first-page":"1101","DOI":"10.1089\/cmb.2009.0047","volume":"16","author":"P Medvedev","year":"2009","unstructured":"Medvedev P, Brudno M: Maximum likelihood genome assembly. 
Journal of Computational Biology 2009, 16(8):1101\u20131116. 10.1089\/cmb.2009.0047","journal-title":"Journal of Computational Biology"},{"key":"4498_CR22","first-page":"199","volume-title":"Pacific Symposium on Biocomputing","author":"Z Mulyukov","year":"2002","unstructured":"Mulyukov Z, Pevzner P: EULER-PCR: Finishing Experiments for Repeat Resolution. Pacific Symposium on Biocomputing 2002, 199\u2013210."},{"key":"4498_CR23","volume-title":"Standards in Genomic Sciences","author":"M G\u00f6ker","year":"2010","unstructured":"G\u00f6ker M, Held B, Lucas S, Nolan M, Yasawong M, Rio TD, Tice H, Cheng J, Bruce D, Detter J, Tapia R, Han C, Goodwin L, Pitluck S, Liolios K, Ivanova N, Mavromatis K, Mikhailova N, Pati A, Chen A, Palaniappan K, Land M, Hauser L, Chang Y, Jeffries C, Rohde M, Sikorski J, Pukall R, Woyke T, Bristow J, Eisen J, Markowitz V, Hugenholtz P, Kyrpides N, Klenk H, Lapidus A: Complete genome sequence of Olsenella uli type strain (VPI D76D-27CT). Standards in Genomic Sciences 2010., 3:"},{"issue":"2","key":"4498_CR24","doi-asserted-by":"publisher","first-page":"194","DOI":"10.4056\/sigs.761490","volume":"2","author":"R Wirth","year":"2010","unstructured":"Wirth R, Sikorski J, Brambilla E, Misra M, Lapidus A, Copeland A, Nolan M, Lucas S, Chen F, Tice H, Cheng J, Han C, Detter J, Tapia R, Bruce D, Goodwin L, Pitluck S, Pati A, Anderson I, Ivanova N, Mavromatis K, Mikhailova N, Chen A, Palaniappan K, Bilek Y, Hader T, Land M, Hauser L, Chang Y, Jeffries C, Tindall B, Rohde M, G\u00f6ker M, Bristow J, Eisen J, Markowitz V, Hugenholtz P, Kyrpides N, Klenk H: Complete genome sequence of Thermocrinis albus type strain (HI 11\/12T). Standards in Genomic Sciences 2010, 2(2):194. 
10.4056\/sigs.761490","journal-title":"Standards in Genomic Sciences"},{"key":"4498_CR25","first-page":"abs\/1006.4828","volume-title":"CoRR","author":"V Kundeti","year":"2010","unstructured":"Kundeti V, Rajasekaran S, Dinh H: An Efficient Algorithm For Chinese Postman Walk on Bi-directed de Bruijn Graphs. CoRR 2010, abs\/1006.4828."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-95.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T14:03:15Z","timestamp":1630504995000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-95"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,4,13]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4498"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-95","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,4,13]]},"assertion":[{"value":"17 December 2010","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 April 2011","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 April 2011","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"95"}}