{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T17:41:37Z","timestamp":1769794897839,"version":"3.49.0"},"reference-count":28,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T00:00:00Z","timestamp":1769731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec>\n                    <jats:title>Background and motivation<\/jats:title>\n                    <jats:p>\n                      The Michaelis constant Km is a key kinetic parameter for quantifying enzyme-substrate affinity in Michaelis\u2013Menten theory. While Km values are traditionally determined through labor-intensive\n                      <jats:italic>in vitro<\/jats:italic>\n                      assays, the rapid expansion of protein sequence and chemical databases has created a need for computational prediction approaches.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methodology<\/jats:title>\n                    <jats:p>Herein, we present an integrative machine learning framework, KmPred, for Km prediction that merges protein sequence embeddings from state-of-the-art language models with molecular descriptors derived from substrate SMILES strings. 
This methodology was benchmarked on the MPEK dataset and the independent dataset assembled by Kroll et al.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results and discussion<\/jats:title>\n                    <jats:p>\n                      On the MPEK dataset, the best model achieved a test MSE of 0.4995, RMSE of 0.7067, MAE of 0.5022, R\n                      <jats:sup>2<\/jats:sup>\n                      of 0.7049, and a PCC of 0.8398 (\n                      <jats:italic>p<\/jats:italic>\n                      &lt; 1 \u00d7 10\u22126), outperforming the baseline MPEK model. On the Kroll dataset, KmPred achieved a test MSE of 0.6206, RMSE of 0.7878, R\n                      <jats:sup>2<\/jats:sup>\n                      of 0.5519, PCC of 0.7440, and Spearman\u2019s \u03c1 of 0.7342, which represents reasonable results compared to state-of-the-art methods. These results demonstrate that combining multi-modal protein sequence and ligand features with advanced machine learning architectures enables robust and generalizable Km prediction across diverse datasets. Specifically, we used LSTM and Transformer models solely for feature extraction, to capture complex sequential and contextual patterns from enzyme sequences, and employed XGBoost as the primary regression model for the final Km predictions. Beyond methodological impact, this work highlights the role of AI-driven kinetic modeling in accelerating enzyme characterization, facilitating metabolic engineering, and enhancing drug discovery pipelines. 
Our approach thus establishes a foundation for predictive enzymology at scale, with significant potential to benefit biotechnology, synthetic biology, and national strategic initiatives such as Saudi Vision 2030.\n                    <\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/frai.2026.1711471","type":"journal-article","created":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T06:41:37Z","timestamp":1769755297000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["KmPred: prediction of Michaelis constants (Km) using an integrative machine learning framework"],"prefix":"10.3389","volume":"9","author":[{"given":"Meshari","family":"Alazmi","sequence":"first","affiliation":[{"name":"College of Computer Science and Engineering, University of Ha\u2019il","place":["Ha\u2019il, Saudi Arabia"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2026,1,30]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat. 
Methods"},{"key":"ref2","doi-asserted-by":"publisher","first-page":"D498","DOI":"10.1093\/nar\/gkaa1025","article-title":"BRENDA, the ELIXIR core data resource in 2021: new developments and updates","volume":"49","author":"Chang","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"ref3","doi-asserted-by":"crossref","DOI":"10.1145\/2939672.2939785","volume-title":"XGBoost: A scalable tree boosting system","author":"Chen","year":"2016"},{"key":"ref4","doi-asserted-by":"publisher","first-page":"W434","DOI":"10.1093\/nar\/gkac351","article-title":"iFeatureOmega: an integrative platform for engineering, visualization and analysis of features from molecular sequences, structural and ligand data sets","volume":"50","author":"Chen","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"ref5","doi-asserted-by":"crossref","DOI":"10.1002\/9781118540398","volume-title":"Evaluation of enzyme inhibitors in drug discovery","author":"Copeland","year":"2013"},{"key":"ref6","volume-title":"Fundamentals of enzyme kinetics","author":"Cornish-Bowden","year":"2013"},{"key":"ref7","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1021\/ci010132r","article-title":"Reoptimization of MDL keys for use in drug discovery","volume":"42","author":"Durant","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"ref8","doi-asserted-by":"publisher","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref9","doi-asserted-by":"publisher","first-page":"5252","DOI":"10.1038\/s41467-018-07652-6","article-title":"Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models","volume":"9","author":"Heckmann","year":"2018","journal-title":"Nat. 
Commun."},{"key":"ref10","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref11","doi-asserted-by":"publisher","first-page":"D545","DOI":"10.1093\/nar\/gkaa970","article-title":"KEGG: integrating viruses and cellular organisms","volume":"49","author":"Kanehisa","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"ref12","volume-title":"Saudi Vision 2030","year":"2016"},{"key":"ref13","doi-asserted-by":"publisher","first-page":"e3001402","DOI":"10.1371\/journal.pbio.3001402","article-title":"Deep learning allows genome-scale prediction of Michaelis constants from structural features","volume":"19","author":"Kroll","year":"2021","journal-title":"PLoS Biol."},{"key":"ref14","doi-asserted-by":"publisher","first-page":"bbab502","DOI":"10.1093\/bib\/bbab502","article-title":"Accurate protein function prediction via graph attention networks with predicted structure information","volume":"23","author":"Lai","year":"2022","journal-title":"Brief. Bioinform."},{"key":"ref15","article-title":"RDKit: Open-source cheminformatics software","author":"Landrum","year":"2016"},{"key":"ref16","doi-asserted-by":"publisher","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"ref17","first-page":"352","article-title":"Die kinetik der invertinwirkung","volume":"49","author":"Michaelis","year":"1913","journal-title":"Biochem. 
Z."},{"key":"ref18","doi-asserted-by":"publisher","first-page":"1185","DOI":"10.1016\/j.cell.2016.02.004","article-title":"Engineering cellular metabolism","volume":"164","author":"Nielsen","year":"2016","journal-title":"Cell"},{"key":"ref19","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1912.01703","article-title":"Pytorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref20","doi-asserted-by":"publisher","first-page":"2825","DOI":"10.1007\/s10994-011-5276-8","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Machine Learn. Res."},{"key":"ref21","doi-asserted-by":"publisher","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc. Natl. Acad. Sci."},{"key":"ref22","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J. Chem. Inf. 
Model."},{"key":"ref23","volume-title":"Enzyme kinetics: Behavior and analysis of rapid equilibrium and steady state enzyme systems","author":"Segel","year":"1975"},{"key":"ref24","doi-asserted-by":"publisher","first-page":"2292","DOI":"10.1111\/febs.16404","article-title":"A guide to enzyme kinetics in early drug discovery","volume":"290","author":"Srinivasan","year":"2023","journal-title":"FEBS J."},{"key":"ref25","volume-title":"Molecular descriptors for chemoinformatics: Volume I: Alphabetical listing\/volume II: Appendices, references","author":"Todeschini","year":"2009"},{"key":"ref26","doi-asserted-by":"publisher","first-page":"bbae387","DOI":"10.1093\/bib\/bbae387","article-title":"MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction","volume":"25","author":"Wang","year":"2024","journal-title":"Brief. Bioinform."},{"key":"ref27","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. 
Sci."},{"key":"ref28","doi-asserted-by":"publisher","first-page":"D656","DOI":"10.1093\/nar\/gkx1065","article-title":"SABIO-RK: an updated resource for manually curated biochemical reaction kinetics","volume":"46","author":"Wittig","year":"2018","journal-title":"Nucleic Acids Res."}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2026.1711471\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T06:41:43Z","timestamp":1769755303000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2026.1711471\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,30]]},"references-count":28,"alternative-id":["10.3389\/frai.2026.1711471"],"URL":"https:\/\/doi.org\/10.3389\/frai.2026.1711471","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,30]]},"article-number":"1711471"}}