{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T00:34:39Z","timestamp":1767832479266,"version":"3.49.0"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2016,6,16]],"date-time":"2016-06-16T00:00:00Z","timestamp":1466035200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2016,6,16]],"date-time":"2016-06-16T00:00:00Z","timestamp":1466035200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-12-BS02-0008"],"award-info":[{"award-number":["ANR-12-BS02-0008"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Next Generation Sequencing (NGS) has dramatically enhanced our ability to sequence genomes, but not to assemble them. In practice, many published genome sequences remain in the state of a large set of contigs. Each contig describes the sequence found along some path of the assembly graph, however, the set of contigs does not record all the sequence information contained in that graph. Although many subsequent analyses can be performed with the set of contigs, one may ask whether mapping reads on the contigs is as informative as mapping them on the paths of the assembly graph. Currently, one lacks practical tools to perform mapping on such graphs.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Here, we propose a formal definition of mapping on a de Bruijn graph, analyse the problem complexity which turns out to be NP-complete, and provide a practical solution. We propose a pipeline called <jats:italic>GGMAP<\/jats:italic> (Greedy Graph MAPping). Its novelty is a procedure to map reads on branching paths of the graph, for which we designed a heuristic algorithm called <jats:italic>BGREAT<\/jats:italic> (de Bruijn Graph REAd mapping Tool). For the sake of efficiency, <jats:italic>BGREAT<\/jats:italic> rewrites a read sequence as a succession of unitigs sequences. <jats:italic>GGMAP<\/jats:italic> can map millions of reads per CPU hour on a de Bruijn graph built from a large set of human genomic reads. Surprisingly, results show that up to 22 % more reads can be mapped on the graph but not on the contig set.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Although mapping reads on a de Bruijn graph is complex task, our proposal offers a practical solution combining efficiency with an improved mapping capacity compared to assembly-based mapping even for complex eukaryotic data.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-016-1103-9","type":"journal-article","created":{"date-parts":[[2016,6,16]],"date-time":"2016-06-16T05:32:36Z","timestamp":1466055156000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":69,"title":["Read mapping on de Bruijn graphs"],"prefix":"10.1186","volume":"17","author":[{"given":"Antoine","family":"Limasset","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bastien","family":"Cazaux","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eric","family":"Rivals","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pierre","family":"Peterlongo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2016,6,16]]},"reference":[{"key":"1103_CR1","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1186\/2047-217X-2-10","volume":"2","author":"KR Bradnam","year":"2013","unstructured":"Bradnam KR, Fass JN, et al.Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience. 2013; 2:10. [doi:10.1186\/2047-217X-2-10].","journal-title":"GigaScience"},{"issue":"7","key":"1103_CR2","doi-asserted-by":"publisher","first-page":"897","DOI":"10.1089\/cmb.2009.0005","volume":"16","author":"N Nagarajan","year":"2009","unstructured":"Nagarajan N, Pop M. Parametric complexity of sequence assembly: theory and applications to next generation sequencing. J Comput Biol. 2009; 16(7):897\u2013908. [doi:10.1089\/cmb.2009.0005].","journal-title":"J Comput Biol"},{"issue":"2","key":"1103_CR3","doi-asserted-by":"publisher","first-page":"324","DOI":"10.1101\/gr.7088808","volume":"18","author":"MJ Chaisson","year":"2008","unstructured":"Chaisson MJ, Pevzner PA. Short read fragment assembly of bacterial genomes. Genome Res. 2008; 18(2):324\u201330. [doi:10.1101\/gr.7088808].","journal-title":"Genome Res"},{"issue":"5461","key":"1103_CR4","doi-asserted-by":"publisher","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","volume":"287","author":"EW Myers","year":"2000","unstructured":"Myers EW, Sutton GG, et al.A whole-genome assembly of Drosophila. Science (New York, N.Y.) 2000; 287(5461):2196\u2013204. [doi:10.1126\/science.287.5461.2196].","journal-title":"Science (New York, N.Y.)"},{"issue":"Suppl 2","key":"1103_CR5","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1093\/bioinformatics\/bti1114","volume":"21","author":"EW Myers","year":"2005","unstructured":"Myers EW. The fragment assembly string graph. Bioinformatics. 2005; 21(Suppl 2):79\u201385. [doi:10.1093\/bioinformatics\/bti1114].","journal-title":"Bioinformatics"},{"issue":"14","key":"1103_CR6","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009; 25(14):1754\u201360. [doi:10.1093\/bioinformatics\/btp324].","journal-title":"Bioinformatics"},{"issue":"4","key":"1103_CR7","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/nmeth.1923","volume":"9","author":"B Langmead","year":"2012","unstructured":"Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012; 9(4):357\u20139. [doi:10.1038\/nmeth.1923].","journal-title":"Nat Methods"},{"issue":"16","key":"1103_CR8","doi-asserted-by":"publisher","first-page":"2097","DOI":"10.1093\/bioinformatics\/bts330","volume":"28","author":"H Lee","year":"2012","unstructured":"Lee H, Schatz MC. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score. Bioinformatics. 2012; 28(16):2097\u2013105. [doi:10.1093\/bioinformatics\/bts330].","journal-title":"Bioinformatics"},{"key":"1103_CR9","doi-asserted-by":"crossref","unstructured":"Deshpande V, Fung EDK, Pham S, Bafna V. Cerulean: A Hybrid Assembly Using High Throughput Short and Long Reads. In: Lecture Notes in Computer Science vol. 8126 LNBI. Springer: 2013. p. 349\u201363, doi:10.1007\/978-3-642-40453-5_27.","DOI":"10.1007\/978-3-642-40453-5_27"},{"issue":"11","key":"1103_CR10","doi-asserted-by":"publisher","first-page":"2270","DOI":"10.1101\/gr.141515.112","volume":"22","author":"FJ Ribeiro","year":"2012","unstructured":"Ribeiro FJ, Przybylski D, Yin S, Sharpe T, Gnerre S, Abouelleil A, Berlin AM, Montmayeur A, Shea TP, Walker BJ, Young SK, Russ C, Nusbaum C, MacCallum I, Jaffe DB. Finished bacterial genomes from shotgun sequence data. Genome Res. 2012; 22(11):2270\u20137. [doi:10.1101\/gr.141515.112].","journal-title":"Genome Res"},{"issue":"17","key":"1103_CR11","doi-asserted-by":"publisher","first-page":"9748","DOI":"10.1073\/pnas.171285098","volume":"98","author":"PA Pevzner","year":"2001","unstructured":"Pevzner PA, Tang H, Waterman MS. An Eulerian path approach to DNA fragment assembly. Proc Natl Acad Sci. 2001; 98(17):9748\u201353. [doi:10.1073\/pnas.171285098].","journal-title":"Proc Natl Acad Sci"},{"issue":"5","key":"1103_CR12","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1089\/cmb.2012.0021","volume":"19","author":"A Bankevich","year":"2012","unstructured":"Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012; 19(5):455\u201377. [doi:10.1089\/cmb.2012.0021].","journal-title":"J Comput Biol"},{"issue":"24","key":"1103_CR13","doi-asserted-by":"publisher","first-page":"3506","DOI":"10.1093\/bioinformatics\/btu538","volume":"30","author":"L Salmela","year":"2014","unstructured":"Salmela L, Rivals E. LoRDEC: accurate and efficient long read error correction. Bioinformatics. 2014; 30(24):3506\u201314. [doi:10.1093\/bioinformatics\/btu538].","journal-title":"Bioinformatics"},{"issue":"1","key":"1103_CR14","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1093\/bib\/bbs015","volume":"14","author":"X Yang","year":"2013","unstructured":"Yang X, Chockalingam SP, Aluru S. A survey of error-correction methods for next-generation sequencing. Brief Bioinform. 2013; 14(1):56\u201366. [doi:10.1093\/bib\/bbs015].","journal-title":"Brief Bioinform"},{"key":"1103_CR15","unstructured":"Benoit G, Lavenier D, Lemaitre C, Rizk G. Bloocoo, a memory efficient read corrector. In: European Conference on Computational Biology (ECCB): 2014. https:\/\/gatb.inria.fr\/software\/bloocoo\/."},{"issue":"6","key":"1103_CR16","doi-asserted-by":"publisher","first-page":"814","DOI":"10.1089\/cmb.2012.0058","volume":"19","author":"M Wang","year":"2012","unstructured":"Wang M, Ye Y, Tang H. A de Bruijn graph approach to the quantification of closely-related genomes in a microbial community. J Comput Biol. 2012; 19(6):814\u201325. [doi:10.1089\/cmb.2012.0058].","journal-title":"J Comput Biol"},{"issue":"13","key":"1103_CR17","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1093\/bioinformatics\/btt215","volume":"29","author":"L Huang","year":"2013","unstructured":"Huang L, Popic V, Batzoglou S. Short read alignment with populations of genomes. Bioinformatics. 2013; 29(13):361\u201370. [doi:10.1093\/bioinformatics\/btt215].","journal-title":"Bioinformatics"},{"issue":"6","key":"1103_CR18","doi-asserted-by":"publisher","first-page":"682","DOI":"10.1038\/ng.3257","volume":"47","author":"A Dilthey","year":"2015","unstructured":"Dilthey A, Cox C, Iqbal Z, Nelson MR, McVean G. Improved genome inference in the MHC using a population reference graph. Nat Genet. 2015; 47(6):682\u20138. [doi:10.1038\/ng.3257].","journal-title":"Nat Genet"},{"key":"1103_CR19","unstructured":"Holley G, Peterlongo P. Blastgraph: Intensive approximate pattern matching in sequence graphs and de-bruijn graphs. In: Stringology: 2012. p. 53\u201363. http:\/\/alcovna.genouest.org\/blastree\/."},{"key":"1103_CR20","volume-title":"50 Years of Integer Programming 1958-2008","author":"RM Karp","year":"2010","unstructured":"Karp RM. Reducibility Among Combinatorial Problems. In: 50 Years of Integer Programming 1958-2008. Berlin, Heidelberg: Springer: 2010. p. 219\u201341. doi:10.1007\/978-3-540-68279-0_8.  http:\/\/link.springer.com\/10.1007\/978-3-540-68279-0_8."},{"key":"1103_CR21","doi-asserted-by":"crossref","unstructured":"Chikhi R, Limasset A, Jackman S, Simpson JT, Medvedev P. On the representation of de bruijn graphs. In: Research in Computational Molecular Biology. Springer: 2014. p. 35\u201355, doi:10.1007\/978-3-319-05269-4-4.","DOI":"10.1007\/978-3-319-05269-4_4"},{"issue":"5","key":"1103_CR22","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1101\/gr.074492.107","volume":"18","author":"DR Zerbino","year":"2008","unstructured":"Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008; 18(5):821\u20139. [doi:10.1101\/gr.074492.107].","journal-title":"Genome Res"},{"issue":"1","key":"1103_CR23","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1186\/1748-7188-8-22","volume":"8","author":"R Chikhi","year":"2013","unstructured":"Chikhi R, Rizk G. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. Algorithm Mol Biol. 2013; 8(1):22. [doi:10.1186\/1748-7188-8-22].","journal-title":"Algorithm Mol Biol"},{"key":"1103_CR24","doi-asserted-by":"crossref","unstructured":"Vroland C, Salson M, Touzet H. Lossless seeds for searching short patterns with high error rates. In: Combinatorial Algorithms. Springer: 2014. p. 364\u201375.","DOI":"10.1007\/978-3-319-19315-1_32"},{"issue":"Suppl 6","key":"1103_CR25","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/1471-2105-13-S6-S5","volume":"13","author":"GA Sacomoto","year":"2012","unstructured":"Sacomoto GA, Kielbassa J, Chikhi R, Uricaru R, Antoniou P, Sagot MF, Peterlongo P, Lacroix V. Kissplice: de-novo calling alternative splicing events from rna-seq data. BMC Bioinformatics. 2012; 13(Suppl 6):5.","journal-title":"BMC Bioinformatics"},{"key":"1103_CR26","doi-asserted-by":"crossref","unstructured":"Ye Y, Tang H. Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis. Bioinformatics. 2015:btv510. Oxford Univ Press. arXiv preprint arXiv:1504.01304.","DOI":"10.1093\/bioinformatics\/btv510"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1103-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-016-1103-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1103-9","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-1103-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T18:26:09Z","timestamp":1706811969000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-1103-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,16]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["1103"],"URL":"https:\/\/doi.org\/10.1186\/s12859-016-1103-9","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,6,16]]},"assertion":[{"value":"9 December 2015","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 May 2016","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2016","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"237"}}