{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T21:39:34Z","timestamp":1779917974126,"version":"3.53.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"15","license":[{"start":{"date-parts":[[2018,12,24]],"date-time":"2018-12-24T00:00:00Z","timestamp":1545609600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004052","name":"King Abdullah University of Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004052","name":"KAUST","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004052","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Office of Sponsored Research"},{"name":"OSR","award":["FCC\/1\/1976-23"],"award-info":[{"award-number":["FCC\/1\/1976-23"]}]},{"name":"OSR","award":["FCC\/1\/1976-26"],"award-info":[{"award-number":["FCC\/1\/1976-26"]}]},{"name":"OSR","award":["URF\/1\/2602-01"],"award-info":[{"award-number":["URF\/1\/2602-01"]}]},{"name":"OSR","award":["URF\/1\/3007-01"],"award-info":[{"award-number":["URF\/1\/3007-01"]}]},{"name":"OSR","award":["URF\/1\/3450-01"],"award-info":[{"award-number":["URF\/1\/3450-01"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Accurate and wide-ranging prediction of thermodynamic parameters for biochemical reactions can facilitate deeper insights into the workings and the design of metabolic systems.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Here, we introduce a machine learning method with chemical fingerprint-based features for the prediction of the Gibbs free energy of biochemical reactions. From a large pool of 2D fingerprint-based features, this method systematically selects a small number of relevant ones and uses them to construct a regularized linear model. Since a manual selection of 2D structure-based features can be a tedious and time-consuming task, requiring expert knowledge about the structure-activity relationship of chemical compounds, the systematic feature selection step in our method offers a convenient means to identify relevant 2D fingerprint-based features. By comparing our method with state-of-the-art linear regression-based methods for the standard Gibbs free energy prediction, we demonstrated that its prediction accuracy and prediction coverage are most favorable. Our results show direct evidence that a number of 2D fingerprints collectively provide useful information about the Gibbs free energy of biochemical reactions and that our systematic feature selection procedure provides a convenient way to identify them.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Our software is freely available for download at http:\/\/sfb.kaust.edu.sa\/Pages\/Software.aspx.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty1035","type":"journal-article","created":{"date-parts":[[2018,12,19]],"date-time":"2018-12-19T20:13:33Z","timestamp":1545250413000},"page":"2634-2643","source":"Crossref","is-referenced-by-count":15,"title":["Systematic selection of chemical fingerprint features improves the Gibbs energy prediction of biochemical reactions"],"prefix":"10.1093","volume":"35","author":[{"given":"Meshari","family":"Alazmi","sequence":"first","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5333-6729","authenticated-orcid":false,"given":"Hiroyuki","family":"Kuwahara","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Othman","family":"Soufan","sequence":"additional","affiliation":[{"name":"Institute of Parasitology, McGill University, Montreal, Quebec, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lizhong","family":"Ding","sequence":"additional","affiliation":[{"name":"Inception Institute of Artificial Intelligence (IIAI), Abu Dhabi, UAE"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2018,12,24]]},"reference":[{"key":"2023062713153524700_bty1035-B1","doi-asserted-by":"crossref","first-page":"176","DOI":"10.1016\/j.copbio.2015.08.021","article-title":"Heading in the right direction: thermodynamics-based network analysis and pathway engineering","volume":"36","author":"Ataman","year":"2015","journal-title":"Curr. Opin. Biotechnol"},{"key":"2023062713153524700_bty1035-B2","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/j.jtbi.2004.01.008","article-title":"Thermodynamic constraints for biochemical networks","volume":"228","author":"Beard","year":"2004","journal-title":"J. Theor. Biol"},{"key":"2023062713153524700_bty1035-B3","doi-asserted-by":"crossref","first-page":"W389","DOI":"10.1093\/nar\/gku362","article-title":"XTMS: pathway design in an eXTended metabolic space","volume":"42","author":"Carbonell","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062713153524700_bty1035-B4","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.ymeth.2014.08.005","article-title":"Molecular fingerprint similarity search in virtual screening","volume":"71","author":"Cereto-Massagu\u00e9","year":"2015","journal-title":"Methods"},{"key":"2023062713153524700_bty1035-B5","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.drudis.2007.01.011","article-title":"Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches","volume":"12","author":"Eckert","year":"2007","journal-title":"Drug Discov. Today"},{"key":"2023062713153524700_bty1035-B6","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/msb4100155","article-title":"A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information","volume":"3","author":"Feist","year":"2007","journal-title":"Mol. Syst. Biol"},{"key":"2023062713153524700_bty1035-B7","doi-asserted-by":"crossref","first-page":"D770","DOI":"10.1093\/nar\/gkr874","article-title":"eQuilibrator\u2014the biochemical thermodynamics calculator","volume":"40","author":"Flamholz","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062713153524700_bty1035-B8","doi-asserted-by":"crossref","first-page":"2874","DOI":"10.1093\/bioinformatics\/bth314","article-title":"Thermodynamics of enzyme-catalyzed reactions\u2014a database for quantitative biochemistry","volume":"20","author":"Goldberg","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062713153524700_bty1035-B9","doi-asserted-by":"crossref","first-page":"2725","DOI":"10.1038\/ismej.2016.49","article-title":"Microbial diversity arising from thermodynamic constraints","volume":"10","author":"Gro\u00dfkopf","year":"2016","journal-title":"ISME J"},{"key":"2023062713153524700_bty1035-B10","author":"Gunawardena","year":"2003"},{"key":"2023062713153524700_bty1035-B11","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1146\/annurev-chembioeng-080615-034704","article-title":"Thermodynamics of bioreactions","volume":"7","author":"Held","year":"2016","journal-title":"Annu. Rev. Chem. Biomol. Eng"},{"key":"2023062713153524700_bty1035-B12","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1529\/biophysj.105.071720","article-title":"Genome-scale thermodynamic analysis of Escherichia coli metabolism","volume":"90","author":"Henry","year":"2006","journal-title":"Biophys. J"},{"key":"2023062713153524700_bty1035-B13","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1529\/biophysj.107.124784","article-title":"Group contribution method for thermodynamic analysis of complex metabolic networks","volume":"95","author":"Jankowski","year":"2008","journal-title":"Biophys. J"},{"key":"2023062713153524700_bty1035-B14","doi-asserted-by":"crossref","first-page":"7022.","DOI":"10.1038\/srep07022","article-title":"Quantum chemical approach to estimating the thermodynamics of metabolic reactions","volume":"4","author":"Jinich","year":"2014","journal-title":"Sci. Rep"},{"key":"2023062713153524700_bty1035-B15","doi-asserted-by":"crossref","DOI":"10.1038\/msb4100074","article-title":"Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data","volume":"2","author":"K\u00fcmmel","year":"2006","journal-title":"Mol. Syst. Biol"},{"key":"2023062713153524700_bty1035-B16","doi-asserted-by":"crossref","first-page":"W217","DOI":"10.1093\/nar\/gkw342","article-title":"MRE: a web tool to suggest foreign enzymes for the biosynthesis pathway design with competing endogenous reactions in mind","volume":"44","author":"Kuwahara","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062713153524700_bty1035-B17","doi-asserted-by":"crossref","DOI":"10.1093\/synbio\/ysx001","article-title":"ACRE: absolute concentration robustness exploration in module-based combinatorial networks","volume":"2","author":"Kuwahara","year":"2017","journal-title":"Synth. Biol"},{"key":"2023062713153524700_bty1035-B18","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1016\/j.drudis.2014.10.012","article-title":"Machine-learning approaches in drug discovery: methods and applications","volume":"20","author":"Lavecchia","year":"2015","journal-title":"Drug Discov. Today"},{"key":"2023062713153524700_bty1035-B19","doi-asserted-by":"crossref","first-page":"536.","DOI":"10.1038\/nchembio.970","article-title":"Systems metabolic engineering of microorganisms for natural and non-natural chemicals","volume":"8","author":"Lee","year":"2012","journal-title":"Nat. Chem. Biol"},{"key":"2023062713153524700_bty1035-B20","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1016\/S0098-1354(00)00323-9","article-title":"Recursive MILP model for finding all the alternate optima in LP models for metabolic networks","volume":"24","author":"Lee","year":"2000","journal-title":"Comput. Chem. Eng"},{"key":"2023062713153524700_bty1035-B21","first-page":"1273","article-title":"A note on the lasso and related procedures in model selection","volume":"16","author":"Leng","year":"2006","journal-title":"Stat. Sin"},{"key":"2023062713153524700_bty1035-B22","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/BF01874203","article-title":"A group contribution method for the estimation of equilibrium constants for biochemical reactions","volume":"2","author":"Mavrovouniotis","year":"1988","journal-title":"Biotechnol. Tech"},{"key":"2023062713153524700_bty1035-B23","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1002\/(SICI)1097-0290(19980420)58:2\/3<125::AID-BIT3>3.0.CO;2-N","article-title":"Metabolic engineering: techniques for analysis of targets for genetic manipulations","volume":"58","author":"Nielsen","year":"1998","journal-title":"Biotechnol. Bioeng"},{"key":"2023062713153524700_bty1035-B24","doi-asserted-by":"crossref","first-page":"2037","DOI":"10.1093\/bioinformatics\/bts317","article-title":"An integrated open framework for thermodynamics of reactions that combines accuracy and coverage","volume":"28","author":"Noor","year":"2012","journal-title":"Bioinformatics"},{"key":"2023062713153524700_bty1035-B25","doi-asserted-by":"crossref","first-page":"1003098.","DOI":"10.1371\/journal.pcbi.1003098","article-title":"Consistent estimation of Gibbs energy using component contributions","volume":"9","author":"Noor","year":"2013","journal-title":"PLoS Comput. Biol"},{"key":"2023062713153524700_bty1035-B26","doi-asserted-by":"crossref","first-page":"245.","DOI":"10.1038\/nbt.1614","article-title":"What is flux balance analysis?","volume":"28","author":"Orth","year":"2010","journal-title":"Nat. Biotechnol"},{"key":"2023062713153524700_bty1035-B27","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1111\/rssb.12106","article-title":"Lasso regression: estimation and shrinkage via the limit of Gibbs sampling","volume":"78","author":"Rajaratnam","year":"2016","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023062713153524700_bty1035-B28","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1016\/j.drudis.2011.02.011","article-title":"State-of-the-art in ligand-based virtual screening","volume":"16","author":"Ripphausen","year":"2011","journal-title":"Drug Discov. Today"},{"key":"2023062713153524700_bty1035-B29","doi-asserted-by":"crossref","first-page":"2478","DOI":"10.1016\/j.bpj.2010.02.052","article-title":"IGERS: inferring Gibbs energy changes of biochemical reactions from reaction similarities","volume":"98","author":"Rother","year":"2010","journal-title":"Biophys. J"},{"key":"2023062713153524700_bty1035-B30","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B"},{"key":"2023062713153524700_bty1035-B31","doi-asserted-by":"crossref","first-page":"359","DOI":"10.1016\/j.biortech.2016.02.109","article-title":"Determination of Gibbs energies of formation in aqueous solution using chemical engineering tools","volume":"213","author":"Toure","year":"2016","journal-title":"Bioresour Technol"},{"key":"2023062713153524700_bty1035-B32","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1016\/j.drudis.2006.10.005","article-title":"Similarity-based virtual screening using 2D fingerprints","volume":"11","author":"Willett","year":"2006","journal-title":"Drug Discov. Today"},{"key":"2023062713153524700_bty1035-B33","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1021\/ci9800211","article-title":"Chemical similarity searching","volume":"38","author":"Willett","year":"1998","journal-title":"J. Chem. Inf. Comput. Sci"},{"key":"2023062713153524700_bty1035-B34","doi-asserted-by":"crossref","first-page":"445.","DOI":"10.1038\/nchembio.580","article-title":"Metabolic engineering of Escherichia coli for direct production of 1, 4-butanediol","volume":"7","author":"Yim","year":"2011","journal-title":"Nat. Chem. Biol"},{"key":"2023062713153524700_bty1035-B35","doi-asserted-by":"crossref","first-page":"1418","DOI":"10.1198\/016214506000000735","article-title":"The adaptive lasso and its oracle properties","volume":"101","author":"Zou","year":"2006","journal-title":"J. Am. Stat. Assoc"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/15\/2634\/50722477\/bioinformatics_35_15_2634.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/15\/2634\/50722477\/bioinformatics_35_15_2634.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,13]],"date-time":"2024-07-13T13:44:29Z","timestamp":1720878269000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/15\/2634\/5258101"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2018,12,24]]},"references-count":35,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2019,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty1035","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,8,1]]},"published":{"date-parts":[[2018,12,24]]}}}