{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T09:25:35Z","timestamp":1777713935445,"version":"3.51.4"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2010,1,8]],"date-time":"2010-01-08T00:00:00Z","timestamp":1262908800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2010,1,8]],"date-time":"2010-01-08T00:00:00Z","timestamp":1262908800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>To quantitatively validate methods for pathway prediction, we developed a large \"gold standard\" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including na\u00efve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and\/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-11-15","type":"journal-article","created":{"date-parts":[[2010,1,9]],"date-time":"2010-01-09T19:14:42Z","timestamp":1263064482000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":134,"title":["Machine learning methods for metabolic pathway prediction"],"prefix":"10.1186","volume":"11","author":[{"given":"Joseph M","family":"Dale","sequence":"first","affiliation":[]},{"given":"Liviu","family":"Popescu","sequence":"additional","affiliation":[]},{"given":"Peter D","family":"Karp","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2010,1,8]]},"reference":[{"key":"3472_CR1","doi-asserted-by":"publisher","first-page":"D464","DOI":"10.1093\/nar\/gkn751","volume":"37","author":"I Keseler","year":"2009","unstructured":"Keseler I, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus R, Johnson DA, Krummenacker M, Nolan L, Paley S, Paulsen I, Peralta-Gil M, Santos-Zavaleta A, Shearer A, Karp P: EcoCyc: A comprehensive view of E. coli biology. Nuc Acids Res 2009, 37: D464\u201370. 10.1093\/nar\/gkn751","journal-title":"Nuc Acids Res"},{"key":"3472_CR2","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1038\/msb4100155","volume":"3","author":"A Feist","year":"2007","unstructured":"Feist A, Henry C, Reed J, Krummenacker M, Joyce A, Karp P, Broadbelt L, Hatzimanikatis V, Palsson B: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol 2007, 3: 121\u201338. 10.1038\/msb4100155","journal-title":"Mol Syst Biol"},{"issue":"5","key":"3472_CR3","doi-asserted-by":"publisher","first-page":"715","DOI":"10.1093\/bioinformatics\/18.5.715","volume":"18","author":"S Paley","year":"2002","unstructured":"Paley S, Karp P: Evaluation of computational metabolic-pathway predictions for H. pylori . Bioinformatics 2002, 18(5):715\u201324. 10.1093\/bioinformatics\/18.5.715","journal-title":"Bioinformatics"},{"key":"3472_CR4","doi-asserted-by":"publisher","first-page":"D623","DOI":"10.1093\/nar\/gkm900","volume":"36","author":"R Caspi","year":"2008","unstructured":"Caspi R, Foerster H, Fulcher C, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer A, Tissier C, Walk T, Zhang P, Karp PD: The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of Pathway\/Genome Databases. Nuc Acids Res 2008, 36: D623\u201331. 10.1093\/nar\/gkm900","journal-title":"Nuc Acids Res"},{"key":"3472_CR5","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1186\/1471-2105-5-76","volume":"5","author":"M Green","year":"2004","unstructured":"Green M, Karp P: A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 2004, 5: 76. 10.1186\/1471-2105-5-76","journal-title":"BMC Bioinformatics"},{"key":"3472_CR6","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1104\/pp.105.060376","volume":"138","author":"P Zhang","year":"2005","unstructured":"Zhang P, Foerster H, Tissier CP, Mueller L, Paley S, Karp PD, Rhee SY: MetaCyc and AraCyc. Metabolic Pathway Databases for Plant Research. Plant Physiol 2005, 138: 27\u201337. 10.1104\/pp.105.060376","journal-title":"Plant Physiol"},{"key":"3472_CR7","unstructured":"AraCyc Database[http:\/\/www.arabidopsis.org\/biocyc\/]"},{"key":"3472_CR8","unstructured":"YeastCyc Database[http:\/\/pathway.yeastgenome.org\/]"},{"key":"3472_CR9","unstructured":"MouseCyc Database[http:\/\/mousecyc.jax.org:8000\/]"},{"key":"3472_CR10","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/1752-0509-3-33","volume":"3","author":"S Seo","year":"2009","unstructured":"Seo S, Lewin HA: Reconstruction of metabolic pathways for the cattle genome. BMC Syst Biol 2009, 3: 33. 10.1186\/1752-0509-3-33","journal-title":"BMC Syst Biol"},{"key":"3472_CR11","doi-asserted-by":"publisher","first-page":"S225","DOI":"10.1093\/bioinformatics\/18.suppl_1.S225","volume":"18","author":"P Karp","year":"2002","unstructured":"Karp P, Paley S, Romero P: The Pathway Tools Software. Bioinformatics 2002, 18: S225-S232.","journal-title":"Bioinformatics"},{"key":"3472_CR12","volume-title":"Tech. Rep. FIA-91-28, NASA Ames Research Center","author":"W Buntine","year":"1991","unstructured":"Buntine W, Caruana R: Introduction to IND and recursive partitioning. Tech. Rep. FIA-91\u201328, NASA Ames Research Center 1991."},{"key":"3472_CR13","unstructured":"IND software package[http:\/\/opensource.arc.nasa.gov\/project\/ind\/]"},{"key":"3472_CR14","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1007\/BF01889584","volume":"2","author":"W Buntine","year":"1992","unstructured":"Buntine W: Learning classification trees. Statistics and Computing 1992, 2: 63\u201373. 10.1007\/BF01889584","journal-title":"Statistics and Computing"},{"key":"3472_CR15","volume-title":"A Course in Probability and Statistics","author":"CJ Stone","year":"1996","unstructured":"Stone CJ: A Course in Probability and Statistics. Duxbury Press; 1996."},{"issue":"6","key":"3472_CR16","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1109\/TAC.1974.1100705","volume":"19","author":"H Akaike","year":"1974","unstructured":"Akaike H: A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974, 19(6):716\u2013723. 10.1109\/TAC.1974.1100705","journal-title":"IEEE Transactions on Automatic Control"},{"issue":"2","key":"3472_CR17","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1214\/aos\/1176344136","volume":"6","author":"G Schwarz","year":"1978","unstructured":"Schwarz G: Estimating the Dimension of a Model. The Annals of Statistics 1978, 6(2):461\u2013464. 10.1214\/aos\/1176344136","journal-title":"The Annals of Statistics"},{"issue":"2","key":"3472_CR18","first-page":"123","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L: Bagging Predictors. Machine Learning 1996, 24(2):123\u2013140.","journal-title":"Machine Learning"},{"key":"3472_CR19","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L: Random Forests. Machine Learning 2001, 45: 5\u201332. 10.1023\/A:1010933404324","journal-title":"Machine Learning"},{"issue":"17","key":"3472_CR20","doi-asserted-by":"publisher","first-page":"5691","DOI":"10.1093\/nar\/gki866","volume":"33","author":"R Overbeek","year":"2005","unstructured":"Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, Fonstein M, Frank ED, Gerdes S, Glass EM, Goesmann A, Hanson A, Iwata-Reuyl D, Jensen R, Jamshidi N, Krause L, Kubal M, Larsen N, Linke B, McHardy AC, Meyer F, Neuweger H, Olsen G, Olson R, Osterman A, Portnoy V, Pusch GD, Rodionov DA, Ruckert C, Steiner J, Stevens R, Thiele I, Vassieva O, Ye Y, Zagnitko O, Vonstein V: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nuc Acids Res 2005, 33(17):5691\u20135702. 10.1093\/nar\/gki866","journal-title":"Nuc Acids Res"},{"key":"3472_CR21","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1186\/1471-2105-8-139","volume":"8","author":"M DeJongh","year":"2007","unstructured":"DeJongh M, Formsma K, Boillot P, Gould J, Rycenga M, Best A: Toward the automated generation of genome-scale metabolic networks in the SEED. BMC Bioinformatics 2007, 8: 139. 10.1186\/1471-2105-8-139","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"3472_CR22","doi-asserted-by":"publisher","first-page":"i478","DOI":"10.1093\/bioinformatics\/bti1052","volume":"21","author":"Y Ye","year":"2005","unstructured":"Ye Y, Osterman A, Overbeek R, Godzik A: Automatic detection of subsystem\/pathway variants in genome analysis. Bioinformatics 2005, 21(Suppl 1):i478-i486. 10.1093\/bioinformatics\/bti1052","journal-title":"Bioinformatics"},{"key":"3472_CR23","doi-asserted-by":"crossref","unstructured":"Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D, de Bono B, Garapati P, Hemish J, Hermjakob H, Jassal B, Kanapin A, Lewis S, Mahajan S, May B, Schmidt E, Vastrik I, Wu G, Birney E, Stein L, D'Eustachio P: Reactome knowledgebase of human biological pathways and processes. Nuc Acids Res 2009, (37 Database):D619\u201322. 10.1093\/nar\/gkn863","DOI":"10.1093\/nar\/gkn863"},{"key":"3472_CR24","doi-asserted-by":"publisher","first-page":"W423","DOI":"10.1093\/nar\/gkn282","volume":"36","author":"S Okuda","year":"2008","unstructured":"Okuda S, Yamada T, Hamajima M, Itoh M, Katayama T, Bork P, Goto S, Kanehisa M: KEGG Atlas mapping for global analysis of metabolic pathways. Nuc Acids Res 2008, 36: W423\u201326. 10.1093\/nar\/gkn282","journal-title":"Nuc Acids Res"},{"key":"3472_CR25","doi-asserted-by":"publisher","first-page":"3687","DOI":"10.1093\/nar\/gkl438","volume":"34","author":"M Green","year":"2006","unstructured":"Green M, Karp P: The Outcomes of Pathway Database Computations Depend on Pathway Ontology. Nuc Acids Res 2006, 34: 3687\u201397. 10.1093\/nar\/gkl438","journal-title":"Nuc Acids Res"},{"key":"3472_CR26","doi-asserted-by":"publisher","first-page":"994","DOI":"10.1038\/nbt1094-994","volume":"12","author":"A Varma","year":"1994","unstructured":"Varma A, Palsson B: Metabolic Flux Balancing: Basic concepts, Scientific and Practical Use. Bio\/Technology 1994, 12: 994\u20138. 10.1038\/nbt1094-994","journal-title":"Bio\/Technology"},{"key":"3472_CR27","first-page":"469","volume-title":"Proceedings of the 6th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES 02)","author":"L Liao","year":"2002","unstructured":"Liao L, Kim S, Tomb JF: Genome comparisons based on profiles of metabolic pathways. Proceedings of the 6th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES 02) 2002, 469\u2013476."},{"issue":"16","key":"3472_CR28","doi-asserted-by":"publisher","first-page":"i56","DOI":"10.1093\/bioinformatics\/btn302","volume":"24","author":"G Kastenmuller","year":"2008","unstructured":"Kastenmuller G, Gasteiger J, Mewes HW: An environmental perspective on large-scale genome clustering based on metabolic capabilities. Bioinformatics 2008, 24(16):i56\u201362. 10.1093\/bioinformatics\/btn302","journal-title":"Bioinformatics"},{"issue":"3","key":"3472_CR29","doi-asserted-by":"publisher","first-page":"R28","DOI":"10.1186\/gb-2009-10-3-r28","volume":"10","author":"G Kastenmuller","year":"2009","unstructured":"Kastenmuller G, Schenk ME, Gasteiger J, Mewes HW: Uncovering metabolic pathways relevant to phenotypic traits of microbial genomes. Genome Biol 2009, 10(3):R28. 10.1186\/gb-2009-10-3-r28","journal-title":"Genome Biol"},{"key":"3472_CR30","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1186\/1471-2105-5-112","volume":"5","author":"J Sun","year":"2004","unstructured":"Sun J, Zeng AP: IdentiCS - Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence. BMC Bioinformatics 2004, 5: 112. 10.1186\/1471-2105-5-112","journal-title":"BMC Bioinformatics"},{"issue":"4","key":"3472_CR31","doi-asserted-by":"publisher","first-page":"1399","DOI":"10.1093\/nar\/gki285","volume":"33","author":"JW Pinney","year":"2005","unstructured":"Pinney JW, Shirley MW, McConkey GA, Westhead DR: metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella. Nucleic Acids Research 2005, 33(4):1399\u20131409. 10.1093\/nar\/gki285","journal-title":"Nucleic Acids Research"},{"key":"3472_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/CIBCB.2005.1594924","volume-title":"Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on","author":"L Pireddu","year":"2005","unstructured":"Pireddu L, Poulin B, Szafron D, Lu P, Wishart DS: Pathway Analyst -- Automated Metabolic Pathway Prediction. Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on 2005, 1\u20138. full_text"},{"issue":"suppl 2","key":"3472_CR33","doi-asserted-by":"publisher","first-page":"W714","DOI":"10.1093\/nar\/gkl228","volume":"34","author":"L Pireddu","year":"2006","unstructured":"Pireddu L, Szafron D, Lu P, Greiner R: The Path-A metabolic pathway prediction web server. Nucleic Acids Research 2006, 34(suppl 2):W714\u2013719. 10.1093\/nar\/gkl228","journal-title":"Nucleic Acids Research"},{"issue":"13","key":"3472_CR34","doi-asserted-by":"publisher","first-page":"1692","DOI":"10.1093\/bioinformatics\/btg217","volume":"19","author":"D McShan","year":"2003","unstructured":"McShan D, Rao S, Shah I: PathMiner: Predicting metabolic pathways by heuristic search. Bioinformatics 2003, 19(13):1692\u20138. 10.1093\/bioinformatics\/btg217","journal-title":"Bioinformatics"},{"issue":"20","key":"3472_CR35","doi-asserted-by":"publisher","first-page":"2775","DOI":"10.1093\/bioinformatics\/btm409","volume":"23","author":"A Cakmak","year":"2007","unstructured":"Cakmak A, Ozsoyoglu G: Mining biological networks for unknown pathways. Bioinformatics 2007, 23(20):2775\u20132783. 10.1093\/bioinformatics\/btm409","journal-title":"Bioinformatics"},{"issue":"suppl 1","key":"3472_CR36","doi-asserted-by":"publisher","first-page":"i468","DOI":"10.1093\/bioinformatics\/bti1012","volume":"21","author":"Y Yamanishi","year":"2005","unstructured":"Yamanishi Y, Vert JP, Kanehisa M: Supervised enzyme network inference from the integration of genomic data and chemical information. Bioinformatics 2005, 21(suppl 1):i468\u2013477. 10.1093\/bioinformatics\/bti1012","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-15.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-11-15\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-11-15.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T12:13:53Z","timestamp":1630498433000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-11-15"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,1,8]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["3472"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-11-15","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,1,8]]},"assertion":[{"value":"3 August 2009","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2010","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 January 2010","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"15"}}