{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,16]],"date-time":"2025-06-16T17:45:59Z","timestamp":1750095959715},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2011,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Reconstruction of biological pathways is typically done through mapping well-characterized pathways of model organisms to a target genome, through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways, e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called \u201cmissing gene\u201d problem.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, to fill holes caused by missing genes as well as to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of <jats:italic>E. 
coli<\/jats:italic> in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average of 60% positive predictive value (PPV) and the performance is increased with more seed genes added. Analysis shows that our method is highly robust.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>An effective method is presented to find missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on <jats:italic>E. coli<\/jats:italic> at a genome level. Numerous missing genes are found to be related to known <jats:italic>E. coli<\/jats:italic> pathways, which can be further validated through biological experiments. Overall this method is robust and can be used for functional inference.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-12-s1-s1","type":"journal-article","created":{"date-parts":[[2011,2,18]],"date-time":"2011-02-18T20:05:54Z","timestamp":1298059554000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Genome-wide discovery of missing genes in biological pathways of prokaryotes"],"prefix":"10.1186","volume":"12","author":[{"given":"Yong","family":"Chen","sequence":"first","affiliation":[]},{"given":"Fenglou","family":"Mao","sequence":"additional","affiliation":[]},{"given":"Guojun","family":"Li","sequence":"additional","affiliation":[]},{"given":"Ying","family":"Xu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,2,15]]},"reference":[{"issue":"Databaseissue","key":"4359_CR1","first-page":"D258","volume":"32","author":"MA Harris","year":"2004","unstructured":"Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al.: The Gene Ontology (GO) database and informatics resource. 
Nucleic Acids Res 2004, 32(Databaseissue):D258\u2013261.","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"4359_CR2","doi-asserted-by":"publisher","first-page":"240","DOI":"10.1093\/bfgp\/elm027","volume":"6","author":"C Wierling","year":"2007","unstructured":"Wierling C, Herwig R, Lehrach H: Resources, standards and tools for systems biology. Brief Funct Genomic Proteomic 2007, 6(3):240\u2013251. 10.1093\/bfgp\/elm027","journal-title":"Brief Funct Genomic Proteomic"},{"issue":"Databaseissue","key":"4359_CR3","doi-asserted-by":"publisher","first-page":"D354","DOI":"10.1093\/nar\/gkj102","volume":"34","author":"M Kanehisa","year":"2006","unstructured":"Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M: From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 2006, 34(Databaseissue):D354\u2013357. 10.1093\/nar\/gkj102","journal-title":"Nucleic Acids Res"},{"key":"4359_CR4","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1186\/1471-2105-4-41","volume":"4","author":"RL Tatusov","year":"2003","unstructured":"Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al.: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4: 41. 10.1186\/1471-2105-4-41","journal-title":"BMC Bioinformatics"},{"issue":"Databaseissue","key":"4359_CR5","doi-asserted-by":"publisher","first-page":"D464","DOI":"10.1093\/nar\/gkn751","volume":"37","author":"IM Keseler","year":"2009","unstructured":"Keseler IM, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M, Nolan LM, Paley S, Paulsen IT, et al.: EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res 2009, 37(Databaseissue):D464\u2013470. 
10.1093\/nar\/gkn751","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"4359_CR6","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1016\/S1367-5931(03)00027-9","volume":"7","author":"A Osterman","year":"2003","unstructured":"Osterman A, Overbeek R: Missing genes in metabolic pathways: a comparative genomics approach. Curr Opin Chem Biol 2003, 7(2):238\u2013251. 10.1016\/S1367-5931(03)00027-9","journal-title":"Curr Opin Chem Biol"},{"issue":"5","key":"4359_CR7","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1007\/s002030050780","volume":"172","author":"SJ Cordwell","year":"1999","unstructured":"Cordwell SJ: Microbial genomes and \"missing\" enzymes: redefining biochemical pathways. Arch Microbiol 1999, 172(5):269\u2013279. 10.1007\/s002030050780","journal-title":"Arch Microbiol"},{"key":"4359_CR8","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1186\/1471-2105-5-76","volume":"5","author":"ML Green","year":"2004","unstructured":"Green ML, Karp PD: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 2004, 5: 76. 10.1186\/1471-2105-5-76","journal-title":"BMC Bioinformatics"},{"key":"4359_CR9","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1186\/1471-2105-7-177","volume":"7","author":"P Kharchenko","year":"2006","unstructured":"Kharchenko P, Chen L, Freund Y, Vitkup D, Church GM: Identifying metabolic enzymes with multiple types of association evidence. BMC Bioinformatics 2006, 7: 177. 10.1186\/1471-2105-7-177","journal-title":"BMC Bioinformatics"},{"key":"4359_CR10","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1186\/1471-2105-8-139","volume":"8","author":"M DeJongh","year":"2007","unstructured":"DeJongh M, Formsma K, Boillot P, Gould J, Rycenga M, Best A: Toward the automated generation of genome-scale metabolic networks in the SEED. BMC Bioinformatics 2007, 8: 139. 
10.1186\/1471-2105-8-139","journal-title":"BMC Bioinformatics"},{"issue":"4","key":"4359_CR11","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1006\/jmbi.2001.4701","volume":"311","author":"G Kolesov","year":"2001","unstructured":"Kolesov G, Mewes HW, Frishman D: SNAPping up functionally related genes based on context information: a colinearity-free approach. J Mol Biol 2001, 311(4):639\u2013656. 10.1006\/jmbi.2001.4701","journal-title":"J Mol Biol"},{"issue":"8","key":"4359_CR12","doi-asserted-by":"publisher","first-page":"1078","DOI":"10.1093\/bioinformatics\/btn066","volume":"24","author":"G Sanguinetti","year":"2008","unstructured":"Sanguinetti G, Noirel J, Wright PC: MMG: a probabilistic tool to identify submodules of metabolic pathways. Bioinformatics 2008, 24(8):1078\u20131084. 10.1093\/bioinformatics\/btn066","journal-title":"Bioinformatics"},{"key":"4359_CR13","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/1752-0509-1-8","volume":"1","author":"I Ulitsky","year":"2007","unstructured":"Ulitsky I, Shamir R: Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 2007, 1: 8. 10.1186\/1752-0509-1-8","journal-title":"BMC Syst Biol"},{"issue":"13","key":"4359_CR14","doi-asserted-by":"publisher","first-page":"i577","DOI":"10.1093\/bioinformatics\/btm227","volume":"23","author":"X Yan","year":"2007","unstructured":"Yan X, Mehan MR, Huang Y, Waterman MS, Yu PS, Zhou XJ: A graph-based approach to systematically reconstruct human transcriptional regulatory modules. Bioinformatics 2007, 23(13):i577\u2013586. 
10.1093\/bioinformatics\/btm227","journal-title":"Bioinformatics"},{"issue":"13","key":"4359_CR15","doi-asserted-by":"publisher","first-page":"i222","DOI":"10.1093\/bioinformatics\/btm222","volume":"23","author":"Y Huang","year":"2007","unstructured":"Huang Y, Li H, Hu H, Yan X, Waterman MS, Huang H, Zhou XJ: Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics 2007, 23(13):i222\u2013229. 10.1093\/bioinformatics\/btm222","journal-title":"Bioinformatics"},{"issue":"20","key":"4359_CR16","doi-asserted-by":"publisher","first-page":"2775","DOI":"10.1093\/bioinformatics\/btm409","volume":"23","author":"A Cakmak","year":"2007","unstructured":"Cakmak A, Ozsoyoglu G: Mining biological networks for unknown pathways. Bioinformatics 2007, 23(20):2775\u20132783. 10.1093\/bioinformatics\/btm409","journal-title":"Bioinformatics"},{"key":"4359_CR17","volume-title":"Brief Bioinform","author":"RW Brouwer","year":"2008","unstructured":"Brouwer RW, Kuipers OP, Hijum SA: The relative value of operon predictions. Brief Bioinform 2008."},{"issue":"1","key":"4359_CR18","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1093\/nar\/gkl1018","volume":"35","author":"P Dam","year":"2007","unstructured":"Dam P, Olman V, Harris K, Su Z, Xu Y: Operon prediction using both genome-specific and general genomic information. Nucleic Acids Res 2007, 35(1):288\u2013298. 10.1093\/nar\/gkl1018","journal-title":"Nucleic Acids Res"},{"issue":"Databaseissue","key":"4359_CR19","doi-asserted-by":"publisher","first-page":"D459","DOI":"10.1093\/nar\/gkn757","volume":"37","author":"F Mao","year":"2009","unstructured":"Mao F, Dam P, Chou J, Olman V, Xu Y: DOOR: a database for prokaryotic operons. Nucleic Acids Res 2009, 37(Databaseissue):D459\u2013463. 
10.1093\/nar\/gkn757","journal-title":"Nucleic Acids Res"},{"issue":"7","key":"4359_CR20","doi-asserted-by":"publisher","first-page":"911","DOI":"10.1038\/nbt988","volume":"22","author":"JO Korbel","year":"2004","unstructured":"Korbel JO, Jensen LJ, von Mering C, Bork P: Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat Biotechnol 2004, 22(7):911\u2013917. 10.1038\/nbt988","journal-title":"Nat Biotechnol"},{"issue":"8","key":"4359_CR21","doi-asserted-by":"publisher","first-page":"4285","DOI":"10.1073\/pnas.96.8.4285","volume":"96","author":"M Pellegrini","year":"1999","unstructured":"Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci U S A 1999, 96(8):4285\u20134288. 10.1073\/pnas.96.8.4285","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"16","key":"4359_CR22","doi-asserted-by":"publisher","first-page":"3409","DOI":"10.1093\/bioinformatics\/bti532","volume":"21","author":"J Sun","year":"2005","unstructured":"Sun J, Xu J, Liu Z, Liu Q, Zhao A, Shi T, Li Y: Refined phylogenetic profiles method for predicting protein-protein interactions. Bioinformatics 2005, 21(16):3409\u20133415. 10.1093\/bioinformatics\/bti532","journal-title":"Bioinformatics"},{"issue":"9","key":"4359_CR23","doi-asserted-by":"publisher","first-page":"2822","DOI":"10.1093\/nar\/gki573","volume":"33","author":"H Wu","year":"2005","unstructured":"Wu H, Su Z, Mao F, Olman V, Xu Y: Prediction of functional modules based on comparative genome analysis and Gene Ontology application. Nucleic Acids Res 2005, 33(9):2822\u20132837. 
10.1093\/nar\/gki573","journal-title":"Nucleic Acids Res"},{"issue":"23","key":"4359_CR24","doi-asserted-by":"publisher","first-page":"8774","DOI":"10.1073\/pnas.0510258103","volume":"103","author":"V Spirin","year":"2006","unstructured":"Spirin V, Gelfand MS, Mironov AA, Mirny LA: A metabolic network in the evolutionary context: multiscale structure and modularity. Proc Natl Acad Sci U S A 2006, 103(23):8774\u20138779. 10.1073\/pnas.0510258103","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"5586","key":"4359_CR25","doi-asserted-by":"publisher","first-page":"1551","DOI":"10.1126\/science.1073374","volume":"297","author":"E Ravasz","year":"2002","unstructured":"Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL: Hierarchical organization of modularity in metabolic networks. Science 2002, 297(5586):1551\u20131555. 10.1126\/science.1073374","journal-title":"Science"},{"issue":"7191","key":"4359_CR26","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1038\/nature06830","volume":"453","author":"A Clauset","year":"2008","unstructured":"Clauset A, Moore C, Newman ME: Hierarchical structure and the prediction of missing links in networks. Nature 2008, 453(7191):98\u2013101. 10.1038\/nature06830","journal-title":"Nature"},{"issue":"Databaseissue","key":"4359_CR27","doi-asserted-by":"publisher","first-page":"D394","DOI":"10.1093\/nar\/gkj156","volume":"34","author":"H Salgado","year":"2006","unstructured":"Salgado H, Gama-Castro S, Peralta-Gil M, Diaz-Peredo E, Sanchez-Solano F, Santos-Zavaleta A, Martinez-Flores I, Jimenez-Jacinto V, Bonavides-Martinez C, Segura-Salazar J, et al.: RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res 2006, 34(Databaseissue):D394\u2013397. 
10.1093\/nar\/gkj156","journal-title":"Nucleic Acids Res"},{"issue":"Databaseissue","key":"4359_CR28","doi-asserted-by":"publisher","first-page":"D273","DOI":"10.1093\/nar\/gkh053","volume":"32","author":"K Suhre","year":"2004","unstructured":"Suhre K, Claverie JM: FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res 2004, 32(Databaseissue):D273\u2013276. 10.1093\/nar\/gkh053","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-12-S1-S1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T13:28:24Z","timestamp":1630502904000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-12-S1-S1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,2,15]]},"references-count":28,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2011,12]]}},"alternative-id":["4359"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-12-s1-s1","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,2,15]]},"assertion":[{"value":"15 February 2011","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S1"}}