{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T10:20:29Z","timestamp":1768818029594,"version":"3.49.0"},"reference-count":77,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T00:00:00Z","timestamp":1748304000000},"content-version":"vor","delay-in-days":26,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["R01GM149705"],"award-info":[{"award-number":["R01GM149705"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["R01AG057555"],"award-info":[{"award-number":["R01AG057555"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1955260"],"award-info":[{"award-number":["1955260"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Catalytic constant (Kcat) is to describe the efficiency of catalyzing reactions. The Kcat value of an enzyme-substrate pair indicates the rate an enzyme converts saturated substrates into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. 
Most existing models, including the one recently published in Nature Catalysis (Li et\u00a0al.), suffer from overfitting. In this study, we propose a novel protocol for constructing Kcat prediction models that introduces an intermediate step in which substrate and protein processors are developed separately. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) than Li et\u00a0al.\u2019s model on the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R2 value of 0.64 for the focused model. 
The high-quality performance and expandability of the model guarantee its broad applications in enzyme engineering and drug research &amp; development.<\/jats:p>","DOI":"10.1093\/bib\/bbaf212","type":"journal-article","created":{"date-parts":[[2025,5,15]],"date-time":"2025-05-15T06:27:05Z","timestamp":1747290425000},"source":"Crossref","is-referenced-by-count":2,"title":["NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2691-8867","authenticated-orcid":false,"given":"Jingchen","family":"Zhai","sequence":"first","affiliation":[{"name":"Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh , 3501 Terrace St, Pittsburgh, PA 15261 ,","place":["United States"]}]},{"given":"Xiguang","family":"Qi","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh , 3501 Terrace St, Pittsburgh, PA 15261 ,","place":["United States"]}]},{"given":"Lianjin","family":"Cai","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh , 3501 Terrace St, Pittsburgh, PA 15261 ,","place":["United States"]}]},{"given":"Yue","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh , 3501 Terrace St, Pittsburgh, PA 15261 ,","place":["United States"]}]},{"given":"Haocheng","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School 
of Pharmacy, University of Pittsburgh , 3501 Terrace St, Pittsburgh, PA 15261 ,","place":["United States"]}]},{"given":"Lei","family":"Xie","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Hunter College, The City University of New York , 695 Park Ave, New York, NY 10065 ,","place":["United States"]},{"name":"Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University , 413 E 69th St, New York, NY 10021 ,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9607-8229","authenticated-orcid":false,"given":"Junmei","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh , 3501 Terrace St, Pittsburgh, PA 15261 ,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,15]]},"reference":[{"key":"2025052711504211200_ref1","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1006\/bioo.2002.1246","article-title":"The application and usefulness of the ratio kcat\/KM","volume":"30","author":"Koshland","year":"2002","journal-title":"Bioorg Chem"},{"key":"2025052711504211200_ref2","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1016\/j.tibtech.2007.03.010","article-title":"Catalytic efficiency and kcat\/KM: a useful comparator?","volume":"25","author":"Eisenthal","year":"2007","journal-title":"Trends Biotechnol"},{"key":"2025052711504211200_ref3","doi-asserted-by":"publisher","volume-title":"Preface Methods Enzymol","author":"Lorsch","DOI":"10.1016\/B978-0-12-420070-8.09988-8"},{"key":"2025052711504211200_ref4","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1080\/02648725.2010.10648157","article-title":"Usefulness of kinetic enzyme parameters in biotechnological 
practice","volume":"27","author":"Carrillo","year":"2010","journal-title":"Biotechnol Genet Eng Rev"},{"key":"2025052711504211200_ref5","doi-asserted-by":"publisher","first-page":"8211","DOI":"10.1038\/s41467-023-44113-1","article-title":"UniKP: a unified framework for the prediction of enzyme kinetic parameters","volume":"14","author":"Yu","year":"2023","journal-title":"Nat Commun"},{"key":"2025052711504211200_ref6","doi-asserted-by":"publisher","first-page":"4713","DOI":"10.1021\/bi00747a026","article-title":"Anomalous pH dependence of kcat\/KM in enzyme reactions. Rate constants for the association of chymotrypsin with substrates","volume":"12","author":"Renard","year":"1973","journal-title":"Biochemistry"},{"key":"2025052711504211200_ref7","doi-asserted-by":"publisher","first-page":"3401","DOI":"10.1073\/pnas.1514240113","article-title":"Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements","volume":"113","author":"Davidi","year":"2016","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025052711504211200_ref8","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1146\/annurev-anchem-061417-125619","article-title":"Methods of measuring enzyme activity ex vivo and In vivo","volume":"11","author":"Ou","year":"2018","journal-title":"Annu Rev Anal Chem (Palo Alto Calif)"},{"key":"2025052711504211200_ref9","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1186\/s13007-017-0218-y","article-title":"Kinetic modelling: an integrated approach to analyze enzyme activity assays","volume":"13","author":"Boeckx","year":"2017","journal-title":"Plant Methods"},{"key":"2025052711504211200_ref10","doi-asserted-by":"publisher","first-page":"1201","DOI":"10.1016\/1357-2725(95)00075-Z","article-title":"Purification and kinetic characterization of \u03b3-aminobutyraldehyde dehydrogenase from rat liver","volume":"27","author":"Testore","year":"1995","journal-title":"Int J Biochem Cell 
Biol"},{"key":"2025052711504211200_ref11","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-0716-1554-6_10","article-title":"Enzyme kinetics, pharmacokinetics, and inhibition of aldehyde oxidase","volume":"2342","author":"Paragas","year":"2021","journal-title":"Enzyme Kinetics Drug Metabol Fundamentals Appl"},{"key":"2025052711504211200_ref12","doi-asserted-by":"publisher","first-page":"1117","DOI":"10.1016\/S0090-9556(24)15034-9","article-title":"Heterologous expression and kinetic characterization of human cytochromes P-450: validation of a pharmaceutical tool for drug metabolism research","volume":"27","author":"Masimirembwa","year":"2018","journal-title":"Drug Metab Dispos"},{"key":"2025052711504211200_ref13","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1016\/0006-291X(81)91878-7","article-title":"New human liver alcohol dehydrogenase forms with unique kinetic characteristics","volume":"98","author":"Par\u00e9s","year":"1981","journal-title":"Biochem Biophys Res Commun"},{"key":"2025052711504211200_ref14","doi-asserted-by":"publisher","first-page":"9641","DOI":"10.1074\/jbc.C100745200","article-title":"Kinetic characterization of compound I formation in the thermostable cytochrome P450 CYP119","volume":"277","author":"Kellner","year":"2002","journal-title":"J Biol Chem"},{"key":"2025052711504211200_ref15","doi-asserted-by":"publisher","first-page":"9779","DOI":"10.1039\/c3cc45250f","article-title":"A highly selective probe for human cytochrome P450 3A4: isoform selectivity, kinetic characterization and its applications","volume":"49","author":"Ge","year":"2013","journal-title":"Chem Commun"},{"key":"2025052711504211200_ref16","doi-asserted-by":"publisher","first-page":"549","DOI":"10.1046\/j.1365-2125.1997.t01-1-00626.x","article-title":"Characterization of the human cytochrome P450 enzymes involved in the metabolism of dihydrocodeine","volume":"44","author":"Kirkwood","year":"1997","journal-title":"Br J Clin 
Pharmacol"},{"key":"2025052711504211200_ref17","doi-asserted-by":"publisher","first-page":"23313","DOI":"10.1074\/jbc.M605511200","article-title":"The ferrous-dioxygen intermediate in human cytochrome P450 3A4: substrate dependence of formation and decay kinetics","volume":"281","author":"Denisov","year":"2006","journal-title":"J Biol Chem"},{"key":"2025052711504211200_ref18","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1006\/taap.1996.0326","article-title":"The kinetics of aflatoxin B1oxidation by human cDNA-expressed and human liver microsomal cytochromes P450 1A2 and 3A4","volume":"141","author":"Gallagher","year":"1996","journal-title":"Toxicol Appl Pharmacol"},{"key":"2025052711504211200_ref19","doi-asserted-by":"publisher","first-page":"1423","DOI":"10.1042\/BCJ20160101","article-title":"Kinetic characterization and regulation of the human retinaldehyde dehydrogenase 2 enzyme during production of retinoic acid","volume":"473","author":"Shabtai","year":"2016","journal-title":"Biochem J"},{"key":"2025052711504211200_ref20","doi-asserted-by":"publisher","first-page":"1903","DOI":"10.1021\/acs.chemrestox.4c00298","article-title":"Discovery and enzyme kinetic characterization of novel CYP2D6 variants","volume":"37","author":"Zhong","year":"2024","journal-title":"Chem Res Toxicol"},{"key":"2025052711504211200_ref21","author":"Boorla"},{"key":"2025052711504211200_ref22","first-page":"80","article-title":"Prediction of enzyme kinetic parameters based on statistical learning","volume":"17","author":"Borger","year":"2006","journal-title":"Genome Inform"},{"key":"2025052711504211200_ref23","doi-asserted-by":"publisher","first-page":"1758","DOI":"10.1080\/10408398.2012.725112","article-title":"Enzyme kinetics modeling as a tool to optimize food industry: a pragmatic approach based on amylolytic enzymes","volume":"55","author":"Galanakis","year":"2015","journal-title":"Crit Rev Food Sci 
Nutr"},{"key":"2025052711504211200_ref24","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1016\/j.bej.2005.04.001","article-title":"A new approach for determination of enzyme kinetic constants using response surface methodology","volume":"25","author":"Boyac\u0131","year":"2005","journal-title":"Biochem Eng J"},{"key":"2025052711504211200_ref25","first-page":"7","article-title":"Benefits of enzyme kinetics modelling","volume":"17","author":"Vasic-Racki","year":"2003","journal-title":"Chem Biochem Eng Q"},{"key":"2025052711504211200_ref26","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-62703-758-7_3","article-title":"Different enzyme kinetic models","volume":"1113","author":"Seibert","year":"2014","journal-title":"Enzyme Kinetics Drug Metabolism Fundamentals Appl"},{"key":"2025052711504211200_ref27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-8-373","article-title":"qPIPSA: relating enzymatic kinetic parameters and interaction fields","volume":"8","author":"Gabdoulline","year":"2007","journal-title":"BMC Bioinform"},{"key":"2025052711504211200_ref28","doi-asserted-by":"publisher","first-page":"btae652","DOI":"10.1093\/bioinformatics\/btae652","article-title":"ENKIE: a package for predicting enzyme kinetic parameter values and their uncertainties","volume":"40","author":"Gollub","year":"2024","journal-title":"Bioinformatics"},{"key":"2025052711504211200_ref29","doi-asserted-by":"publisher","first-page":"101094","DOI":"10.1016\/j.checat.2024.101094","article-title":"EITLEM-kinetics: a deep-learning framework for kinetic parameter prediction of mutant enzymes","volume":"4","author":"Shen","year":"2024","journal-title":"Chem Catalysis"},{"key":"2025052711504211200_ref30","doi-asserted-by":"publisher","first-page":"662","DOI":"10.1038\/s41929-022-00798-z","article-title":"Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction","volume":"5","author":"Li","year":"2022","journal-title":"Nature 
Catalysis"},{"key":"2025052711504211200_ref31","doi-asserted-by":"publisher","article-title":"DLKcat cannot predict meaningful kcat values for mutants and unfamiliar enzymes","author":"Kroll","DOI":"10.1101\/2023.02.06.526991"},{"key":"2025052711504211200_ref32","doi-asserted-by":"publisher","DOI":"10.1101\/2023.02.06.526991","article-title":"Machine learning models for the prediction of enzyme properties should be tested on proteins not used for model training","author":"Kroll","year":"2023","journal-title":"bioRxiv"},{"key":"2025052711504211200_ref33","doi-asserted-by":"publisher","first-page":"5252","DOI":"10.1038\/s41467-018-07652-6","article-title":"Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models","volume":"9","author":"Heckmann","year":"2018","journal-title":"Nat Commun"},{"key":"2025052711504211200_ref34","doi-asserted-by":"publisher","first-page":"4139","DOI":"10.1038\/s41467-023-39840-4","article-title":"Turnover number predictions for kinetically uncharacterized enzymes using machine and deep learning","volume":"14","author":"Kroll","year":"2023","journal-title":"Nat Commun"},{"key":"2025052711504211200_ref35","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae387","article-title":"MPEK: a multitask deep learning framework based on pretrained language models for enzymatic reaction kinetic parameters prediction","volume":"25","author":"Wang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025052711504211200_ref36","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbad506","article-title":"DLTKcat: deep learning-based prediction of temperature-dependent enzyme turnover rates","volume":"25","author":"Qiu","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025052711504211200_ref37","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbae409","article-title":"DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features 
of protein 3D-structures","volume":"25","author":"Wang","year":"2024","journal-title":"Brief Bioinform"},{"key":"2025052711504211200_ref38","doi-asserted-by":"publisher","first-page":"2072","DOI":"10.1038\/s41467-025-57215-9","article-title":"CatPred: a comprehensive framework for deep learning in vitro enzyme kinetic parameters","volume":"16","author":"Boorla","year":"2025","journal-title":"Nat Commun"},{"key":"2025052711504211200_ref39","doi-asserted-by":"publisher","first-page":"863","DOI":"10.1016\/j.jmb.2003.08.057","article-title":"How well is enzyme function conserved as a function of pairwise sequence identity?","volume":"333","author":"Tian","year":"2003","journal-title":"J Mol Biol"},{"key":"2025052711504211200_ref40","first-page":"19773","volume-title":"International conference on machine learning","author":"Shen"},{"key":"2025052711504211200_ref41","article-title":"Leave-one-out distinguishability in machine learning","author":"Ye","year":"2023"},{"key":"2025052711504211200_ref42","doi-asserted-by":"publisher","first-page":"430","DOI":"10.1016\/j.apsb.2016.04.004","article-title":"PBPK modeling and simulation in drug research and development","volume":"6","author":"Zhuang","year":"2016","journal-title":"Acta Pharmaceutica Sinica B"},{"key":"2025052711504211200_ref43","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1038\/clpt.2010.298","article-title":"Applications of physiologically based pharmacokinetic (PBPK) modeling and simulation during regulatory review","volume":"89","author":"Zhao","year":"2011","journal-title":"Clin Pharmacol Therapeut"},{"key":"2025052711504211200_ref44","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1016\/j.ejps.2013.09.008","article-title":"PBPK models for the prediction of in vivo performance of oral dosage forms","volume":"57","author":"Kostewicz","year":"2014","journal-title":"Eur J Pharm 
Sci"},{"key":"2025052711504211200_ref45","doi-asserted-by":"publisher","DOI":"10.1002\/9781119497813","volume-title":"Physiologically Based Pharmacokinetic (PBPK) Modeling and Simulations: Principles, Methods, and Applications in the Pharmaceutical Industry","author":"Peters","year":"2021"},{"key":"2025052711504211200_ref46","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1208\/s12248-012-9446-2","article-title":"Dose selection based on physiologically based pharmacokinetic (PBPK) approaches","volume":"15","author":"Jones","year":"2013","journal-title":"AAPS J"},{"key":"2025052711504211200_ref47","doi-asserted-by":"publisher","first-page":"1339","DOI":"10.1142\/s2737416524500479","article-title":"Graph-based bidirectional transformer decision threshold adjustment algorithm for class-imbalanced molecular data","volume":"23","author":"Hayes","year":"2024","journal-title":"J Comput Biophys Chem"},{"key":"2025052711504211200_ref48","doi-asserted-by":"publisher","first-page":"bbac131","DOI":"10.1093\/bib\/bbac131","article-title":"Knowledge-based BERT: a method to extract molecular features like computational chemists","volume":"23","author":"Wu","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025052711504211200_ref49","doi-asserted-by":"publisher","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","journal-title":"Science"},{"key":"2025052711504211200_ref50","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2406285121","article-title":"Protein language models learn evolutionary statistics of interacting sequence motifs","volume":"121","author":"Zhang","year":"2024","journal-title":"Proc Natl Acad Sci"},{"key":"2025052711504211200_ref51","doi-asserted-by":"publisher","first-page":"8749","DOI":"10.1021\/acs.jmedchem.9b00959","article-title":"Pushing the boundaries of molecular representation for drug discovery with the graph attention 
mechanism","volume":"63","author":"Xiong","year":"2020","journal-title":"J Med Chem"},{"key":"2025052711504211200_ref52","doi-asserted-by":"publisher","first-page":"bbac408","DOI":"10.1093\/bib\/bbac408","article-title":"FP-GNN: a versatile deep learning architecture for enhanced molecular property prediction","volume":"23","author":"Cai","year":"2022","journal-title":"Brief Bioinform"},{"key":"2025052711504211200_ref53","first-page":"507","volume-title":"International Conference on Intelligent Computing","author":"Lei"},{"key":"2025052711504211200_ref54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00479-8","article-title":"Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models","volume":"13","author":"Jiang","year":"2021","journal-title":"J Chem"},{"key":"2025052711504211200_ref55","doi-asserted-by":"publisher","first-page":"2981","DOI":"10.1093\/bioinformatics\/btab195","article-title":"FraGAT: a fragment-oriented multi-scale graph attention model for molecular property prediction","volume":"37","author":"Zhang","year":"2021","journal-title":"Bioinformatics"},{"key":"2025052711504211200_ref56","doi-asserted-by":"publisher","first-page":"5975","DOI":"10.1021\/acs.jcim.2c01290","article-title":"ALipSol: an attention-driven mixture-of-experts model for lipophilicity and solubility prediction","volume":"62","author":"Wu","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2025052711504211200_ref57","article-title":"Pytorch: an imperative style, high-performance deep learning library","volume":"721","author":"Paszke","year":"2019","journal-title":"Adv Neural Inform Processing Syst"},{"key":"2025052711504211200_ref58","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.623","article-title":"The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis 
evaluation","volume":"7","author":"Chicco","year":"2021","journal-title":"PeerJ Comput Sci"},{"key":"2025052711504211200_ref59","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.5194\/gmd-7-1247-2014","article-title":"Root mean square error (RMSE) or mean absolute error (MAE)?\u2013arguments against avoiding RMSE in the literature","volume":"7","author":"Chai","year":"2014","journal-title":"Geosci Model Dev"},{"key":"2025052711504211200_ref60","first-page":"1","article-title":"Root mean square error (RMSE) or mean absolute error (MAE): when to use them or not","volume":"2022","author":"Hodson","year":"2022","journal-title":"Geosci Model Develop Discuss"},{"key":"2025052711504211200_ref61","doi-asserted-by":"publisher","first-page":"2140","DOI":"10.1021\/ci800253u","article-title":"External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean","volume":"48","author":"Schuurmann","year":"2008","journal-title":"J Chem Inf Model"},{"key":"2025052711504211200_ref62","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-62703-646-7_6","article-title":"Clustal omega, accurate alignment of very large numbers of sequences","volume":"1079","author":"Sievers","year":"2014","journal-title":"Multiple Seq Alignment Methods"},{"key":"2025052711504211200_ref63","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-0716-1036-7_1","article-title":"The clustal omega multiple alignment package","volume":"2231","author":"Sievers","year":"2021","journal-title":"Multiple Seq Alignment: Methods Protocols"},{"key":"2025052711504211200_ref64","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1385\/0896032760","article-title":"CLUSTAL V: multiple alignment of DNA and protein sequences","volume":"25","author":"Griffin","year":"1994","journal-title":"Comput Anal Seq Data: Part 
II"},{"key":"2025052711504211200_ref65","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-5-113","article-title":"MUSCLE: a multiple sequence alignment method with reduced time and space complexity","volume":"5","author":"Edgar","year":"2004","journal-title":"BMC Bioinform"},{"key":"2025052711504211200_ref66","doi-asserted-by":"publisher","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2025052711504211200_ref67","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-04969-4_4","volume-title":"Advances in Sequence Analysis: Theory, Method, Applications 51\u201373","author":"Elzinga","year":"2014"},{"key":"2025052711504211200_ref68","doi-asserted-by":"publisher","first-page":"1396","DOI":"10.1093\/bioinformatics\/btv006","article-title":"Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification","volume":"31","author":"Borozan","year":"2015","journal-title":"Bioinformatics"},{"key":"2025052711504211200_ref69","doi-asserted-by":"publisher","first-page":"891","DOI":"10.1002\/prot.21770","article-title":"Sequence-similar, structure-dissimilar protein pairs in the PDB","volume":"71","author":"Kosloff","year":"2008","journal-title":"Proteins: Struct Func Bioinform"},{"key":"2025052711504211200_ref70","first-page":"1255","volume-title":"2010 IEEE fifth international conference on bio-inspired computing: theories and applications (BIC-TA)","author":"Zhang"},{"key":"2025052711504211200_ref71","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.2174\/1574893616999210805165628","article-title":"New method for sequence similarity analysis based on the position and frequency of statistically significant repeats","volume":"16","author":"Jovanovic","year":"2021","journal-title":"Curr 
Bioinform"},{"key":"2025052711504211200_ref72","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1504\/IJDS.2018.10011822","article-title":"Sequence similarity using composition method","volume":"3","author":"Munjal","year":"2018","journal-title":"Int J Data Sci"},{"key":"2025052711504211200_ref73","doi-asserted-by":"publisher","first-page":"342","DOI":"10.1016\/j.jmgm.2017.07.019","article-title":"Similarity\/dissimilarity calculation methods of DNA sequences: a survey","volume":"76","author":"Jin","year":"2017","journal-title":"J Mol Graph Model"},{"key":"2025052711504211200_ref74","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1006\/jmbi.1997.1525","article-title":"Empirical statistical estimates for sequence similarity searches","volume":"276","author":"Pearson","year":"1998","journal-title":"J Mol Biol"},{"key":"2025052711504211200_ref75","first-page":"348","volume-title":"2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE","author":"Triandini"},{"key":"2025052711504211200_ref76","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1758-2946-4-22","article-title":"Towards a universal SMILES representation-a standard method to generate canonical SMILES based on the InChI","volume":"4","author":"O\u2019Boyle","year":"2012","journal-title":"J Chem"},{"key":"2025052711504211200_ref77","article-title":"SMILES enumeration as data augmentation for neural network modeling of molecules","author":"Bjerrum","year":"2017"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/3\/bbaf212\/63183249\/bbaf212.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/3\/bbaf212\/63183249\/bbaf212.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T15:50:50Z","timestamp":1748361050000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf212\/8131740"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,1]]},"references-count":77,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf212","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,5,1]]},"article-number":"bbaf212"}}