{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T00:12:53Z","timestamp":1759104773015},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1490,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: One of the difficulties in metagenomic assembly is that homologous genes from evolutionarily closely related species may behave like repeats and confuse assemblers. As a result, small contigs, each representing a short gene fragment, instead of complete genes, may be reported by an assembler. This further complicates annotation of metagenomic datasets, as annotation tools (such as gene predictors or similarity search tools) typically perform poorly on contigs encoding short gene fragments.<\/jats:p>\n               <jats:p>Results: We present a novel way of using the de Bruijn graph assembly of metagenomes to improve the assembly of genes. A network matching algorithm is proposed for matching the de Bruijn graph of contigs against reference genes, to derive \u2018gene paths\u2019 in the graph (sequences of contigs containing gene fragments) that have the highest similarities to known genes, allowing gene fragments contained in multiple contigs to be connected to form more complete (or intact) genes. Tests on simulated and real datasets show that our approach (called GeneStitch) is able to significantly improve the assembly of genes from metagenomic sequences, by connecting contigs with the guidance of homologous genes\u2014information that is orthogonal to the sequencing reads. 
We note that the improvement of gene assembly can be observed even when only distantly related genes are available as the reference. We further propose to use \u2018gene graphs\u2019 to represent the assembly of reads from homologous genes and discuss potential applications of gene graphs to improving functional annotation for metagenomics.<\/jats:p>\n               <jats:p>Availability: The tools are available as open source for download at http:\/\/omics.informatics.indiana.edu\/GeneStitch<\/jats:p>\n               <jats:p>Contact: \u00a0yye@indiana.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts388","type":"journal-article","created":{"date-parts":[[2012,9,7]],"date-time":"2012-09-07T20:35:22Z","timestamp":1347050122000},"page":"i363-i369","source":"Crossref","is-referenced-by-count":12,"title":["Stitching gene fragments with a network matching algorithm improves gene assembly for metagenomics"],"prefix":"10.1093","volume":"28","author":[{"given":"Yu-Wei","family":"Wu","sequence":"first","affiliation":[{"name":"1 School of Informatics and Computing"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mina","family":"Rho","sequence":"additional","affiliation":[{"name":"1 School of Informatics and Computing"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thomas G.","family":"Doak","sequence":"additional","affiliation":[{"name":"2 Department of Biology, Indiana University, Bloomington, IN 47405, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuzhen","family":"Ye","sequence":"additional","affiliation":[{"name":"1 School of Informatics and Computing"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2012,9,3]]},"reference":[{"key":"2023012513033639300_B1","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1016\/j.gde.2006.10.009","article-title":"Whole-genome re-sequencing","volume":"16","author":"Bentley","year":"2006","journal-title":"Curr. Opin. 
Genet. Dev."},{"key":"2023012513033639300_B2","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1038\/nbt.2023","article-title":"How to apply de Bruijn graphs to genome assembly","volume":"29","author":"Compeau","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023012513033639300_B3","doi-asserted-by":"crossref","first-page":"4636","DOI":"10.1093\/nar\/27.23.4636","article-title":"Improved microbial gene identification with GLIMMER","volume":"27","author":"Delcher","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023012513033639300_B4","doi-asserted-by":"crossref","first-page":"9061","DOI":"10.1073\/pnas.93.17.9061","article-title":"Gene recognition via spliced sequence alignment","volume":"93","author":"Gelfand","year":"1996","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012513033639300_B5","doi-asserted-by":"crossref","first-page":"e91","DOI":"10.1093\/nar\/gkr225","article-title":"Taxonomic classification of metagenomic shotgun sequences with CARMA3","volume":"39","author":"Gerlach","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012513033639300_B6","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.1126\/science.1124234","article-title":"Metagenomic analysis of the human distal gut microbiome","volume":"312","author":"Gill","year":"2006","journal-title":"Science"},{"key":"2023012513033639300_B7","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-Seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol."},{"key":"2023012513033639300_B8","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1038\/nmeth.1184","article-title":"Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex","volume":"5","author":"Hamady","year":"2008","journal-title":"Nat. 
Methods"},{"key":"2023012513033639300_B9","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1126\/science.1200387","article-title":"Metagenomic discovery of biomass-degrading genes and genomes from cow rumen","volume":"331","author":"Hess","year":"2011","journal-title":"Science"},{"key":"2023012513033639300_B10","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1186\/1471-2164-10-520","article-title":"The effect of sequencing errors on metagenomic gene prediction","volume":"10","author":"Hoff","year":"2009","journal-title":"BMC Genomics"},{"key":"2023012513033639300_B11","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012513033639300_B12","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2010","journal-title":"Genome Res."},{"key":"2023012513033639300_B13","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1038\/nature03959","article-title":"Genome sequencing in microfabricated high-density picolitre reactors","volume":"437","author":"Margulies","year":"2005","journal-title":"Nature"},{"key":"2023012513033639300_B14","doi-asserted-by":"crossref","first-page":"4103","DOI":"10.1093\/nar\/gkf543","article-title":"Current methods of gene prediction, their strengths and weaknesses","volume":"30","author":"Mathe","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012513033639300_B15","doi-asserted-by":"crossref","first-page":"e10209","DOI":"10.1371\/journal.pone.0010209","article-title":"Metagenomic sequencing of an in vitro-simulated microbial community","volume":"5","author":"Morgan","year":"2010","journal-title":"PLoS 
One"},{"key":"2023012513033639300_B16","doi-asserted-by":"crossref","first-page":"2317","DOI":"10.1101\/gr.096651.109","article-title":"The NIH Human Microbiome Project","volume":"19","author":"Peterson","year":"2009","journal-title":"Genome Res."},{"key":"2023012513033639300_B17","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1093\/bib\/bbp026","article-title":"Genome assembly reborn: recent computational challenges","volume":"10","author":"Pop","year":"2009","journal-title":"Brief Bioinform."},{"key":"2023012513033639300_B18","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature08821","article-title":"A human gut microbial gene catalogue established by metagenomic sequencing","volume":"464","author":"Qin","year":"2010","journal-title":"Nature"},{"key":"2023012513033639300_B19","doi-asserted-by":"crossref","first-page":"e191","DOI":"10.1093\/nar\/gkq747","article-title":"FragGeneScan: predicting genes in short and error-prone reads","volume":"38","author":"Rho","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023012513033639300_B20","doi-asserted-by":"crossref","first-page":"e3373","DOI":"10.1371\/journal.pone.0003373","article-title":"MetaSim: a sequencing simulator for genomics and metagenomics","volume":"3","author":"Richter","year":"2008","journal-title":"PLoS One"},{"key":"2023012513033639300_B21","doi-asserted-by":"crossref","first-page":"e1000186","DOI":"10.1371\/journal.pcbi.1000186","article-title":"Gene-boosted assembly of a novel bacterial genome from very short reads","volume":"4","author":"Salzberg","year":"2008","journal-title":"PLoS Comput. 
Biol."},{"key":"2023012513033639300_B22","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1186\/gb-2005-6-8-229","article-title":"Metagenomics for studying unculturable microorganisms: cutting the Gordian knot","volume":"6","author":"Schloss","year":"2005","journal-title":"Genome Biol."},{"key":"2023012513033639300_B23","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1038\/nature05414","article-title":"An obesity-associated gut microbiome with increased capacity for energy harvest","volume":"444","author":"Turnbaugh","year":"2006","journal-title":"Nature"},{"key":"2023012513033639300_B24","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1038\/nature02340","article-title":"Community structure and metabolism through reconstruction of microbial genomes from the environment","volume":"428","author":"Tyson","year":"2004","journal-title":"Nature"},{"key":"2023012513033639300_B25","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1126\/science.1093857","article-title":"Environmental genome shotgun sequencing of the Sargasso Sea","volume":"304","author":"Venter","year":"2004","journal-title":"Science"},{"key":"2023012513033639300_B26","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1128\/AEM.02181-07","article-title":"Metagenomics: read length matters","volume":"74","author":"Wommack","year":"2008","journal-title":"Appl. Environ. Microbiol."},{"key":"2023012513033639300_B27","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1142\/S0219720009004151","article-title":"An ORFome assembly approach to metagenomics sequences analysis","volume":"7","author":"Ye","year":"2009","journal-title":"J. Bioinform. Comput. 
Biol."},{"key":"2023012513033639300_B28","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1093\/bioinformatics\/btg073","article-title":"A segment alignment approach to protein comparison","volume":"19","author":"Ye","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012513033639300_B29","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1093\/bioinformatics\/btm542","article-title":"Assembly reconciliation","volume":"24","author":"Zimin","year":"2008","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/18\/i363\/48884171\/bioinformatics_28_18_i363.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/18\/i363\/48884171\/bioinformatics_28_18_i363.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T18:53:41Z","timestamp":1674672821000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/18\/i363\/246963"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9,3]]},"references-count":29,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2012,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts388","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,9,15]]},"published":{"date-parts":[[2012,9,3]]}}}