{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T16:50:36Z","timestamp":1780332636115,"version":"3.54.1"},"reference-count":64,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T00:00:00Z","timestamp":1722211200000},"content-version":"vor","delay-in-days":28,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Understanding the mechanisms and structural mappings between molecules and pathway classes are critical for design of reaction predictors for synthesizing new molecules. This article studies the problem of prediction of classes of metabolic pathways (series of chemical reactions occurring within a cell) in which a given biochemical compound participates. We apply a hybrid machine learning approach consisting of graph convolutional networks used to extract molecular shape features as input to a random forest classifier. In contrast to previously applied machine learning methods for this problem, our framework automatically extracts relevant shape features directly from input SMILES representations, which are atom-bond specifications of chemical structures composing the molecules.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our method is capable of correctly predicting the respective metabolic pathway class of 95.16% of tested compounds, whereas competing methods only achieve an accuracy of 84.92% or less. 
Furthermore, our framework extends to the task of classification of compounds having mixed membership in multiple pathway classes. Our prediction accuracy for this multi-label task is 95.62%. We analyze the relative importance of various global physicochemical features to the pathway class prediction problem and show that simple linear\/logistic regression models can predict the values of these global features from the shape features extracted using our framework.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/baranwa2\/MetabolicPathwayPrediction.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae359","type":"journal-article","created":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T23:30:47Z","timestamp":1722295847000},"source":"Crossref","is-referenced-by-count":9,"title":["A deep learning architecture for metabolic pathway prediction"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9354-2826","authenticated-orcid":false,"given":"Mayank","family":"Baranwal","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, MI 48109, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Abram","family":"Magner","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University at Albany , SUNY, Albany, NY 12222, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Paolo","family":"Elvati","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jacob","family":"Saldinger","sequence":"additional","affiliation":[{"name":"Department of Mechanical 
Engineering"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Angela","family":"Violi","sequence":"additional","affiliation":[{"name":"Department of Mechanical Engineering"},{"name":"Department of Chemical Engineering and Biophysics, University of Michigan , Ann Arbor, MI 48109, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alfred O","family":"Hero","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Michigan , Ann Arbor, MI 48109, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,7,29]]},"reference":[{"key":"2024072915171602300_btae359-B1","doi-asserted-by":"crossref","first-page":"2634","DOI":"10.1093\/bioinformatics\/bty1035","article-title":"Systematic selection of chemical fingerprint features improves the Gibbs energy prediction of biochemical reactions","volume":"35","author":"Alazmi","year":"2018","journal-title":"Bioinformatics"},{"key":"2024072915171602300_btae359-B2","doi-asserted-by":"crossref","first-page":"e0158896","DOI":"10.1371\/journal.pone.0158896","article-title":"Prediction of metabolic pathway involvement in prokaryotic UniProtKB data by association rule mining","volume":"11","author":"Boudellioua","year":"2016","journal-title":"PLoS One"},{"key":"2024072915171602300_btae359-B3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2024072915171602300_btae359-B4","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1007\/s11030-008-9085-9","article-title":"Prediction of compounds\u2019 
biological function (metabolic pathways) based on functional group composition","volume":"12","author":"Cai","year":"2008","journal-title":"Mol Divers"},{"key":"2024072915171602300_btae359-B5","doi-asserted-by":"crossref","first-page":"136","DOI":"10.2174\/1386207319666151110122453","article-title":"Predicting the types of metabolic pathway of compounds using molecular fragments and sequential minimal optimization","volume":"19","author":"Chen","year":"2016","journal-title":"Combinatorial Chemistry & High Throughput Screening"},{"key":"2024072915171602300_btae359-B6","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1186\/1752-0509-4-35","article-title":"Prediction of novel synthetic pathways for the production of desired chemicals","volume":"4","author":"Cho","year":"2010","journal-title":"BMC Syst Biol"},{"key":"2024072915171602300_btae359-B7","doi-asserted-by":"crossref","first-page":"370","DOI":"10.1039\/C8SC04228D","article-title":"A graph-convolutional neural network model for the prediction of chemical reactivity","volume":"10","author":"Coley","year":"2019","journal-title":"Chem Sci"},{"key":"2024072915171602300_btae359-B8","doi-asserted-by":"crossref","first-page":"e0181991","DOI":"10.1371\/journal.pone.0181991","article-title":"A data mining approach for identifying pathway-gene biomarkers for predicting clinical outcome: a case study of erlotinib and sorafenib","volume":"12","author":"Covell","year":"2017","journal-title":"PLoS One"},{"key":"2024072915171602300_btae359-B9","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/1471-2105-11-15","article-title":"Machine learning methods for metabolic pathway prediction","volume":"11","author":"Dale","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2024072915171602300_btae359-B10","doi-asserted-by":"crossref","first-page":"1895","DOI":"10.1162\/089976698300017197","article-title":"Approximate statistical tests for comparing supervised classification learning 
algorithms","volume":"10","author":"Dietterich","year":"1998","journal-title":"Neural Comput"},{"key":"2024072915171602300_btae359-B11","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.trac.2004.11.021","article-title":"Metabolomics: current analytical platforms and methodologies","volume":"24","author":"Dunn","year":"2005","journal-title":"Trends Analyt Chem"},{"key":"2024072915171602300_btae359-B12","doi-asserted-by":"crossref","first-page":"W427","DOI":"10.1093\/nar\/gkn315","article-title":"The university of Minnesota pathway prediction system: predicting metabolic logic","volume":"36","author":"Ellis","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2024072915171602300_btae359-B13","doi-asserted-by":"crossref","first-page":"140","DOI":"10.2174\/1386207319666161215142130","article-title":"A binary classifier for prediction of the types of metabolic pathway of chemicals","volume":"20","author":"Fang","year":"2017","journal-title":"Combinatorial Chemistry & High Throughput Screening"},{"key":"2024072915171602300_btae359-B14","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/978-94-010-0448-0_11","volume-title":"Functional Genomics","author":"Fiehn","year":"2002"},{"key":"2024072915171602300_btae359-B15","doi-asserted-by":"crossref","first-page":"e45944","DOI":"10.1371\/journal.pone.0045944","article-title":"Predicting metabolic pathways of small molecules and enzymes based on interaction information of chemicals and proteins","volume":"7","author":"Gao","year":"2012","journal-title":"PLoS One"},{"key":"2024072915171602300_btae359-B16","doi-asserted-by":"crossref","first-page":"3784","DOI":"10.1093\/nar\/gkg563","article-title":"ExPaSy: the proteomics server for in-depth protein knowledge and analysis","volume":"31","author":"Gasteiger","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2024072915171602300_btae359-B17","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1021\/cc9800071","article-title":"A 
knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases","volume":"1","author":"Ghose","year":"1999","journal-title":"J Comb Chem"},{"key":"2024072915171602300_btae359-B18","author":"Goh","year":"2017"},{"key":"2024072915171602300_btae359-B19","volume-title":"Deep Learning","author":"Goodfellow","year":"2016"},{"key":"2024072915171602300_btae359-B20","doi-asserted-by":"crossref","first-page":"670","DOI":"10.2174\/1386207322666181206112641","article-title":"A network integration method for deciphering the types of metabolic pathway of chemicals with heterogeneous information","volume":"21","author":"Guo","year":"2018","journal-title":"Combinatorial Chemistry & High Throughput Screening"},{"key":"2024072915171602300_btae359-B21","doi-asserted-by":"crossref","first-page":"709","DOI":"10.1021\/ci500517v","article-title":"Metabolic pathway predictions for metabolomics: a molecular structure matching approach","volume":"55","author":"Hamdalla","year":"2015","journal-title":"J Chem Inform Model"},{"key":"2024072915171602300_btae359-B22","doi-asserted-by":"crossref","first-page":"e29491","DOI":"10.1371\/journal.pone.0029491","article-title":"Predicting biological functions of compounds based on chemical-chemical interactions","volume":"6","author":"Hu","year":"2011","journal-title":"PLoS One"},{"key":"2024072915171602300_btae359-B23","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: Kyoto encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2024072915171602300_btae359-B24","doi-asserted-by":"crossref","first-page":"D354","DOI":"10.1093\/nar\/gkj102","article-title":"From genomics to chemical genomics: new developments in KEGG","volume":"34","author":"Kanehisa","year":"2006","journal-title":"Nucleic Acids 
Res"},{"key":"2024072915171602300_btae359-B25","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1093\/nar\/28.1.56","article-title":"The EcoCyc and MetaCyc databases","volume":"28","author":"Karp","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2024072915171602300_btae359-B26","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1093\/bib\/bbp043","article-title":"Pathway tools version 13.0: integrated software for pathway\/genome informatics and systems biology","volume":"11","author":"Karp","year":"2009","journal-title":"Brief Bioinform"},{"key":"2024072915171602300_btae359-B27","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1109\/TSMC.1985.6313426","article-title":"A fuzzy K-nearest neighbor algorithm","volume":"SMC-15","author":"Keller","year":"1985","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"key":"2024072915171602300_btae359-B28","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1007\/978-1-4842-2766-4_12","volume-title":"Deep Learning with Python","author":"Ketkar","year":"2017"},{"key":"2024072915171602300_btae359-B29","doi-asserted-by":"crossref","first-page":"398","DOI":"10.1093\/bioinformatics\/btv578","article-title":"FogLight: an efficient matrix-based approach to construct metabolic pathways by search space reduction","volume":"32","author":"Khosraviani","year":"2015","journal-title":"Bioinformatics"},{"key":"2024072915171602300_btae359-B30","author":"Kingma","year":"2014"},{"key":"2024072915171602300_btae359-B31","author":"Kipf","year":"2017"},{"key":"2024072915171602300_btae359-B32","doi-asserted-by":"crossref","first-page":"W217","DOI":"10.1093\/nar\/gkw342","article-title":"MRE: a web tool to suggest foreign enzymes for the biosynthesis pathway design with competing endogenous reactions in mind","volume":"44","author":"Kuwahara","year":"2016","journal-title":"Nucleic Acids 
Res"},{"key":"2024072915171602300_btae359-B33","author":"Landrum","year":"2006"},{"key":"2024072915171602300_btae359-B34","doi-asserted-by":"crossref","first-page":"4283","DOI":"10.1021\/acs.jmedchem.7b01120","article-title":"Importance of rigidity in designing small molecule drugs to tackle protein-protein interactions (PPIs) through stabilization of desired conformers: miniperspective","volume":"61","author":"Lawson","year":"2018","journal-title":"J Med Chem"},{"key":"2024072915171602300_btae359-B35","doi-asserted-by":"crossref","first-page":"760","DOI":"10.1093\/bioinformatics\/btx680","article-title":"DEEPre: sequence-based enzyme EC number prediction by deep learning","volume":"34","author":"Li","year":"2017","journal-title":"Bioinformatics"},{"key":"2024072915171602300_btae359-B36","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/S0169-409X(96)00423-1","article-title":"Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings","volume":"23","author":"Lipinski","year":"1997","journal-title":"Adv Drug Deliv Rev"},{"key":"2024072915171602300_btae359-B37","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1021\/ci500747n","article-title":"Deep neural nets as a method for quantitative structure\u2013activity relationships","volume":"55","author":"Ma","year":"2015","journal-title":"J Chem Inform Model"},{"key":"2024072915171602300_btae359-B38","doi-asserted-by":"crossref","first-page":"2272","DOI":"10.1021\/ci900196u","article-title":"Mapping human metabolic pathways in the small molecule chemical space","volume":"49","author":"Macchiarulo","year":"2009","journal-title":"J Chem Inform Model"},{"key":"2024072915171602300_btae359-B39","doi-asserted-by":"crossref","first-page":"80","DOI":"10.3389\/fenvs.2015.00080","article-title":"DeepTox: toxicity prediction using deep learning","volume":"3","author":"Mayr","year":"2016","journal-title":"Front Environ 
Sci"},{"key":"2024072915171602300_btae359-B40","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1021\/ci6004178","article-title":"TMACC: interpretable correlation descriptors for quantitative structure-activity relationships","volume":"47","author":"Melville","year":"2007","journal-title":"J Chem Inform Model"},{"key":"2024072915171602300_btae359-B41","doi-asserted-by":"crossref","first-page":"e61318","DOI":"10.1371\/journal.pone.0061318","article-title":"Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties","volume":"8","author":"Menden","year":"2013","journal-title":"PLoS One"},{"key":"2024072915171602300_btae359-B42","first-page":"178","author":"Mendes","year":"2000"},{"key":"2024072915171602300_btae359-B43","author":"Molnar","year":"2019"},{"key":"2024072915171602300_btae359-B44","doi-asserted-by":"crossref","first-page":"2344","DOI":"10.1073\/pnas.1817074116","article-title":"Robust predictions of specialized metabolism genes through machine learning","volume":"116","author":"Moore","year":"2019","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024072915171602300_btae359-B45","doi-asserted-by":"crossref","first-page":"W138","DOI":"10.1093\/nar\/gkq318","article-title":"PathPred: an enzyme-catalyzed metabolic pathway prediction server","volume":"38","author":"Moriya","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2024072915171602300_btae359-B46","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1038\/nrd728","article-title":"Metabonomics: a platform for studying drug toxicity and gene function","volume":"1","author":"Nicholson","year":"2002","journal-title":"Nat Rev Drug Discov"},{"key":"2024072915171602300_btae359-B47","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1023\/A:1008130001697","article-title":"Property distribution of drug-related chemical databases","volume":"14","author":"Oprea","year":"2000","journal-title":"J Comput Aided Mol 
Des"},{"key":"2024072915171602300_btae359-B48","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2024072915171602300_btae359-B49","doi-asserted-by":"crossref","first-page":"W714","DOI":"10.1093\/nar\/gkl228","article-title":"The path-a metabolic pathway prediction web server","volume":"34","author":"Pireddu","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2024072915171602300_btae359-B50","doi-asserted-by":"crossref","first-page":"1011","DOI":"10.1016\/j.drudis.2009.07.014","article-title":"The impact of aromatic ring count on compound developability are too many aromatic rings a liability in drug design?","volume":"14","author":"Ritchie","year":"2009","journal-title":"Drug Discov Today"},{"key":"2024072915171602300_btae359-B51","doi-asserted-by":"crossref","first-page":"3955","DOI":"10.1093\/bioinformatics\/btx481","article-title":"Predicting novel metabolic pathways through subgraph mining","volume":"33","author":"Sankar","year":"2017","journal-title":"Bioinformatics"},{"key":"2024072915171602300_btae359-B52","doi-asserted-by":"crossref","first-page":"e43","DOI":"10.1371\/journal.pcbi.0030043","article-title":"Deciphering protein\u2013protein interactions. Part II. 
Computational methods to predict protein and domain interaction partners","volume":"3","author":"Shoemaker","year":"2007","journal-title":"PLoS Comput Biol"},{"key":"2024072915171602300_btae359-B53","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1124\/pr.112.007336","article-title":"Computational methods in drug discovery","volume":"66","author":"Sliwoski","year":"2014","journal-title":"Pharmacol Rev"},{"key":"2024072915171602300_btae359-B54","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1089\/cmb.1998.5.747","article-title":"A database for cell signaling networks","volume":"5","author":"Takai-Igarashi","year":"1998","journal-title":"J Comput Biol"},{"key":"2024072915171602300_btae359-B55","first-page":"160","article-title":"An integrated database SPAD (signaling pathway database) for signal transduction and genetic information","volume":"6","author":"Tateishi","year":"1995","journal-title":"Genome Inform"},{"key":"2024072915171602300_btae359-B56","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1093\/bioinformatics\/bty535","article-title":"Compound\u2013protein interaction prediction with end-to-end learning of neural networks for graphs and sequences","volume":"35","author":"Tsubaki","year":"2018","journal-title":"Bioinformatics"},{"key":"2024072915171602300_btae359-B57","doi-asserted-by":"crossref","first-page":"2615","DOI":"10.1021\/jm020017n","article-title":"Molecular properties that influence the oral bioavailability of drug candidates","volume":"45","author":"Veber","year":"2002","journal-title":"J Med Chem"},{"key":"2024072915171602300_btae359-B58","first-page":"1887","article-title":"Classification of skin disease using ensemble data mining techniques","volume":"20","author":"Verma","year":"2019","journal-title":"Asian Pac J Cancer Prev"},{"key":"2024072915171602300_btae359-B59","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1016\/j.synbio.2017.11.002","article-title":"A review of computational tools for design and reconstruction 
of metabolic pathways","volume":"2","author":"Wang","year":"2017","journal-title":"Synth Syst Biotechnol"},{"key":"2024072915171602300_btae359-B60","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1021\/ci990307l","article-title":"Prediction of physicochemical parameters by atomic contributions","volume":"39","author":"Wildman","year":"1999","journal-title":"J Chem Inf Comput Sci"},{"key":"2024072915171602300_btae359-B61","first-page":"6412","author":"You","year":"2018"},{"key":"2024072915171602300_btae359-B62","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/j.cels.2018.08.001","article-title":"Machine learning predicts the yeast metabolome from the quantitative proteome of kinase knockouts","volume":"7","author":"Zelezniak","year":"2018","journal-title":"Cell Syst"},{"key":"2024072915171602300_btae359-B63","doi-asserted-by":"crossref","first-page":"634","DOI":"10.1016\/j.neucom.2017.08.044","article-title":"Multi-target deep neural networks: theoretical analysis and implementation","volume":"273","author":"Zeng","year":"2018","journal-title":"Neurocomputing"},{"key":"2024072915171602300_btae359-B64","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.neucom.2018.02.097","article-title":"Protein\u2013protein interactions prediction based on ensemble deep neural 
networks","volume":"324","author":"Zhang","year":"2019","journal-title":"Neurocomputing"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/7\/btae359\/58667783\/btae359.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/7\/btae359\/58667783\/btae359.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T23:31:17Z","timestamp":1722295877000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae359\/7722000"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2024,7,1]]},"references-count":64,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae359","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,7,1]]},"article-number":"btae359"}}