{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T23:33:11Z","timestamp":1770247991089,"version":"3.49.0"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T00:00:00Z","timestamp":1767744000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T00:00:00Z","timestamp":1770163200000},"content-version":"vor","delay-in-days":28,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Science Foundation","award":["2020026"],"award-info":[{"award-number":["2020026"]}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"crossref","award":["P42 ES007380"],"award-info":[{"award-number":["P42 ES007380"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>The associations of metabolites with biochemical pathways are highly useful information for interpreting molecular datasets generated in biological and biomedical research. However, such pathway annotations are sparse in most molecular datasets, limiting their utility for pathway level interpretation. To address these shortcomings, several past publications have presented machine learning models for predicting the pathway association of small biomolecule (metabolite and xenobiotic) using data from the Kyoto Encyclopedia of Genes and Genomes (KEGG). But other similar knowledgebases exist, for example MetaCyc, which has more compound entries and pathway definitions than KEGG.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>As a logical next step, we trained and evaluated multilayer perceptron models on compound entries and pathway annotations obtained from MetaCyc. From the models trained on this dataset, we observed a mean Matthews correlation coefficient (MCC) of 0.845 with 0.0101 standard deviation, compared to a mean MCC of 0.847 with 0.0098 standard deviation for the KEGG dataset. However, KEGG\u2019s 184 metabolic-only pathway predictions (out of 502 total pathways) have a mean MCC of 0.800 with 0.021 standard deviation. Since MetaCyc pathways are metabolic focused, the MetaCyc results represent over a 5.6% improvement in metabolic pathway prediction performance.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>These performance results are pragmatically the same, demonstrating that in aggregate, the 4055 MetaCyc pathways can be effectively predicted at the current state-of-the-art performance level.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-025-06358-z","type":"journal-article","created":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T03:44:58Z","timestamp":1767757498000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Predicting the pathway involvement of metabolites annotated in the MetaCyc knowledgebase"],"prefix":"10.1186","volume":"27","author":[{"given":"Erik D.","family":"Huckvale","sequence":"first","affiliation":[]},{"given":"Hunter N. B.","family":"Moseley","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2026,1,7]]},"reference":[{"key":"6358_CR1","volume-title":"Fundamentals of biochemistry: life at the molecular","author":"D Voet","year":"2016","unstructured":"Voet D, Voet JG, Pratt CW. Fundamentals of biochemistry: life at the molecular. 5th ed. Hoboken: Wiley; 2016.","edition":"5"},{"key":"6358_CR2","volume-title":"Biochemistry","author":"JM Berg","year":"2019","unstructured":"Berg JM, Tymoczko JL, Gatto GJ, Stryer L. Biochemistry. 9th ed. New York: W. H. Freeman; 2019.","edition":"9"},{"key":"6358_CR3","volume-title":"Principles of biochemistry","author":"DL Nelson","year":"2021","unstructured":"Nelson DL, Cox MM. Principles of biochemistry. 8th ed. New York: W. H. Freeman; 2021.","edition":"8"},{"issue":"1","key":"6358_CR4","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1093\/nar\/28.1.27","volume":"28","author":"M Kanehisa","year":"2000","unstructured":"Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27\u201330.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"6358_CR5","doi-asserted-by":"publisher","first-page":"D445","DOI":"10.1093\/nar\/gkz862","volume":"48","author":"R Caspi","year":"2020","unstructured":"Caspi R, Billington R, Keseler IM, Kothari A, Krummenacker M, Midford PE, et al. The metacyc database of metabolic pathways and enzymes\u2014a 2019 update. Nucleic Acids Res. 2020;48(D1):D445\u201353.","journal-title":"Nucleic Acids Res"},{"issue":"5","key":"6358_CR6","doi-asserted-by":"publisher","first-page":"e0299583","DOI":"10.1371\/journal.pone.0299583","volume":"19","author":"ED Huckvale","year":"2024","unstructured":"Huckvale ED, Moseley HNB. A cautionary Tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement. PLoS ONE. 2024;19(5):e0299583.","journal-title":"PLoS ONE"},{"issue":"11","key":"6358_CR7","doi-asserted-by":"publisher","first-page":"1120","DOI":"10.3390\/metabo13111120","volume":"13","author":"ED Huckvale","year":"2023","unstructured":"Huckvale ED, Powell CD, Jin H, Moseley HNB. Benchmark dataset for training machine learning models to predict the pathway involvement of metabolites. Metabolites. 2023;13(11):1120.","journal-title":"Metabolites"},{"issue":"5","key":"6358_CR8","doi-asserted-by":"publisher","first-page":"266","DOI":"10.3390\/metabo14050266","volume":"14","author":"ED Huckvale","year":"2024","unstructured":"Huckvale ED, Moseley HNB. Predicting the pathway involvement of metabolites based on combined metabolite and pathway features. Metabolites. 2024;14(5):266.","journal-title":"Metabolites"},{"issue":"9","key":"6358_CR9","doi-asserted-by":"publisher","first-page":"510","DOI":"10.3390\/metabo14090510","volume":"14","author":"ED Huckvale","year":"2024","unstructured":"Huckvale ED, Moseley HNB. Predicting the association of metabolites with both pathway categories and individual pathways. Metabolites. 2024;14(9):510.","journal-title":"Metabolites"},{"issue":"11","key":"6358_CR10","doi-asserted-by":"publisher","first-page":"582","DOI":"10.3390\/metabo14110582","volume":"14","author":"ED Huckvale","year":"2024","unstructured":"Huckvale ED, Moseley HNB. Predicting the pathway involvement of all pathway and associated compound entries defined in the Kyoto encyclopedia of genes and genomes. Metabolites. 2024;14(11):582.","journal-title":"Metabolites"},{"issue":"3","key":"6358_CR11","first-page":"244","volume":"32","author":"A Dalby","year":"1992","unstructured":"Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, et al. Description of several chemical structure file formats used by computer programs developed at molecular design limited. J Chem Inf Model. 1992;32(3):244\u201355.","journal-title":"J Chem Inf Model"},{"issue":"7","key":"6358_CR12","doi-asserted-by":"publisher","first-page":"431","DOI":"10.3390\/metabo11070431","volume":"11","author":"H Jin","year":"2021","unstructured":"Jin H, Moseley HNB. Hierarchical harmonization of atom-resolved metabolic reactions across metabolic databases. Metabolites. 2021;11(7):431.","journal-title":"Metabolites"},{"issue":"9","key":"6358_CR13","doi-asserted-by":"publisher","first-page":"368","DOI":"10.3390\/metabo10090368","volume":"10","author":"H Jin","year":"2020","unstructured":"Jin H, Mitchell JM, Moseley HNB. Atom identifiers generated by a neighborhood-specific graph coloring method enable compound harmonization across metabolic databases. Metabolites. 2020;10(9):368.","journal-title":"Metabolites"},{"issue":"12","key":"6358_CR14","doi-asserted-by":"publisher","first-page":"1199","DOI":"10.3390\/metabo13121199","volume":"13","author":"H Jin","year":"2023","unstructured":"Jin H, Moseley HNB. md_harmonize: a python package for atom-level harmonization of public metabolic databases. Metabolites. 2023;13(12):1199.","journal-title":"Metabolites"},{"key":"6358_CR15","unstructured":"Verstraeten G, Van den Poel D. Using predicted outcome stratified sampling to reduce the variability in predictive performance of a One-Shot Train-and-Test split for individual customer predictions. ICDM (Posters). 2006;214:1\u201310."},{"key":"6358_CR16","unstructured":"Kingma DP, Ba J. Adam: A Method for Stochastic Optimization. arXiv. 2014."},{"key":"6358_CR17","doi-asserted-by":"publisher","first-page":"47112","DOI":"10.1109\/ACCESS.2021.3068614","volume":"9","author":"D Chicco","year":"2021","unstructured":"Chicco D, Starovoitov V, Jurman G. The benefits of the Matthews correlation coefficient (MCC) over the diagnostic odds ratio (DOR) in binary classification assessment. IEEE Access. 2021;9:47112\u201324.","journal-title":"IEEE Access"},{"key":"6358_CR18","unstructured":"Rossum GV, Drake FL. Python 3 reference manual. CreateSpace; 2009."},{"key":"6358_CR19","unstructured":"The pandas development team. pandas-dev\/pandas: pandas 1.0.3. Zenodo. 2020."},{"issue":"7825","key":"6358_CR20","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/s41586-020-2649-2","volume":"585","author":"CR Harris","year":"2020","unstructured":"Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with numpy. Nature. 2020;585(7825):357\u201362.","journal-title":"Nature"},{"key":"6358_CR21","unstructured":"Collette A. Python and HDF5. O\u2019Reilly; 2013."},{"key":"6358_CR22","unstructured":"Falcon W, Borovec J, W\u00e4lchli A, Eggert N, Schock J, Jordan J et al. PyTorchLightning\/pytorch-lightning: 0.7.6 release. Zenodo. 2020."},{"key":"6358_CR23","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv. 2019."},{"key":"6358_CR24","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O et al. Scikit-learn: Machine Learning in Python. arXiv. 2012."},{"key":"6358_CR25","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, Ohta T, Koyama M, Optuna. A Next-generation Hyperparameter Optimization Framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining\u2014KDD \u201919. New York: ACM Press; 2019. pp. 2623\u201331.","DOI":"10.1145\/3292500.3330701"},{"key":"6358_CR26","doi-asserted-by":"publisher","first-page":"2753","DOI":"10.1007\/978-0-387-39940-9_1091","volume-title":"Encyclopedia of database systems","author":"D Chamberlin","year":"2009","unstructured":"Chamberlin D. SQL. In: Liu L, \u00d6zsu MT, editors. Encyclopedia of database systems. Boston: Springer US; 2009. pp. 2753\u201360."},{"key":"6358_CR27","doi-asserted-by":"crossref","unstructured":"Raasveldt M, M\u00fchleisen H. Duckdb: an embeddable analytical database. Proceedings of the 2019 International Conference on Management of Data. New York: ACM; 2019. pp. 1981\u20134.","DOI":"10.1145\/3299869.3320212"},{"key":"6358_CR28","first-page":"87","volume-title":"Positioning and power in academic publishing: Players, agents and agendas","author":"T Kluyver","year":"2016","unstructured":"Kluyver T, Ragan-Kelley B, P\u00e9rez F, Granger B, Bussonnier M, Frederic J, et al. Jupyter Notebooks\u2014a publishing format for reproducible computational workflows. In: Loizides F, Scmidt B, editors. Positioning and power in academic publishing: Players, agents and agendas. Netherlands: IOS; 2016. pp. 87\u201390."},{"issue":"60","key":"6358_CR29","doi-asserted-by":"publisher","first-page":"3021","DOI":"10.21105\/joss.03021","volume":"6","author":"Waskom M","year":"2021","unstructured":"Waskom M. Seaborn: statistical data visualization. JOSS. 2021;6(60):3021.","journal-title":"JOSS"},{"issue":"3","key":"6358_CR30","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1109\/MCSE.2007.55","volume":"9","author":"JD Hunter","year":"2007","unstructured":"Hunter JD, Matplotlib: A 2D graphics environment. Comput Sci Eng. 2007;9(3):90\u20135.","journal-title":"Comput Sci Eng"},{"key":"6358_CR31","unstructured":"Salesforce. Tableau Public. Salesforce; 2024."},{"issue":"1","key":"6358_CR32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-017-2006-0","volume":"19","author":"A Marco-Ramell","year":"2018","unstructured":"Marco-Ramell A, Palau-Rodriguez M, Alay A, Tulipani S, Urpi-Sarda M, Sanchez-Pla A, et al. Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data. BMC Bioinform. 2018;19(1):1.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"6358_CR33","doi-asserted-by":"publisher","first-page":"bbac553","DOI":"10.1093\/bib\/bbac553","volume":"24","author":"Y Lu","year":"2023","unstructured":"Lu Y, Pang Z, Xia J. Comprehensive investigation of pathway enrichment methods for functional interpretation of LC-MS global metabolomics data. Brief Bioinf. 2023;24(1):bbac553.","journal-title":"Brief Bioinf"},{"issue":"4","key":"6358_CR34","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/j.synbio.2017.11.002","volume":"2","author":"L Wang","year":"2017","unstructured":"Wang L, Dash S, Ng CY, Maranas CD. A review of computational tools for design and reconstruction of metabolic pathways. Synth Syst Biotechnol. 2017;2(4):243\u201352.","journal-title":"Synth Syst Biotechnol"},{"key":"6358_CR35","doi-asserted-by":"publisher","first-page":"634141","DOI":"10.3389\/fmolb.2021.634141","volume":"8","author":"HA Shah","year":"2021","unstructured":"Shah HA, Liu J, Yang Z, Feng J. Review of machine learning methods for the prediction and reconstruction of metabolic pathways. Front Mol Biosci. 2021;8:634141.","journal-title":"Front Mol Biosci"},{"key":"6358_CR36","doi-asserted-by":"crossref","unstructured":"Huckvale ED, Moseley HNB. Chemical representation standardization needed to generalize metabolic pathway involvement prediction across the Kyoto Encyclopedia of Genes and Genomes, Reactome, and MetaCyc knowledgebases. BioRxiv. 2025.","DOI":"10.1101\/2025.04.02.646918"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-025-06358-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-025-06358-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-025-06358-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T10:51:35Z","timestamp":1770202295000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1186\/s12859-025-06358-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,7]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,12]]}},"alternative-id":["6358"],"URL":"https:\/\/doi.org\/10.1186\/s12859-025-06358-z","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,7]]},"assertion":[{"value":"22 November 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 December 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 January 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"36"}}