{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T21:42:39Z","timestamp":1775252559788,"version":"3.50.1"},"reference-count":55,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2021,12,9]],"date-time":"2021-12-09T00:00:00Z","timestamp":1639008000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100007155","name":"Medical Research Council","doi-asserted-by":"publisher","award":["MR\/M026302\/1"],"award-info":[{"award-number":["MR\/M026302\/1"]}],"id":[{"id":"10.13039\/501100007155","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Health and Medical Research Council of Australia","award":["GNT1174405"],"award-info":[{"award-number":["GNT1174405"]}]},{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["093167\/Z\/10\/Z"],"award-info":[{"award-number":["093167\/Z\/10\/Z"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Victorian Government\u2019s Operational Infrastructure Support Program"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,1,17]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to biologically characterise. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data in order to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, using machine learning algorithms to accurately predict their binding affinity and rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. Across both training and independent test sets, we achieved comparable Pearson\u2019s correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 Kcal\/mol] and 0.67 on the independent test (RMSE of 1.72 Kcal\/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show CSM-carbohydrate significantly outperformed previous approaches and have implemented our method and make all data freely available through both a user-friendly web interface and application programming interface, to facilitate programmatic access at http:\/\/biosig.unimelb.edu.au\/csm_carbohydrate\/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.<\/jats:p>","DOI":"10.1093\/bib\/bbab512","type":"journal-article","created":{"date-parts":[[2021,11,9]],"date-time":"2021-11-09T12:10:57Z","timestamp":1636459857000},"source":"Crossref","is-referenced-by-count":11,"title":["CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function"],"prefix":"10.1093","volume":"23","author":[{"given":"Thanh Binh","family":"Nguyen","sequence":"first","affiliation":[{"name":"Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia"},{"name":"Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia"},{"name":"School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3004-2119","authenticated-orcid":false,"given":"Douglas E V","family":"Pires","sequence":"additional","affiliation":[{"name":"Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia"},{"name":"Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia"},{"name":"School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2948-2413","authenticated-orcid":false,"given":"David B","family":"Ascher","sequence":"additional","affiliation":[{"name":"Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia"},{"name":"Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia"},{"name":"School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia"},{"name":"Department of Biochemistry, University of Cambridge, Cambridge, UK"}]}],"member":"286","published-online":{"date-parts":[[2021,12,9]]},"reference":[{"key":"2022012000320452600_ref1","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1007\/978-1-4615-1267-7_28","article-title":"Pathogen-host protein-carbohydrate interactions as the basis of important infections","volume":"491","author":"Karlsson","year":"2001","journal-title":"Adv Exp Med Biol"},{"key":"2022012000320452600_ref2","doi-asserted-by":"crossref","first-page":"9029","DOI":"10.3390\/molecules20059029","article-title":"Protein-carbohydrate interactions as part of plant defense and animal immunity","volume":"20","author":"De Schutter","year":"2015","journal-title":"Molecules"},{"key":"2022012000320452600_ref3","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1016\/S0959-440X(02)00364-0","article-title":"Clusters, bundles, arrays and lattices: novel mechanisms for lectin-saccharide-mediated cellular interactions","volume":"12","author":"Brewer","year":"2002","journal-title":"Curr Opin Struct Biol"},{"key":"2022012000320452600_ref4","doi-asserted-by":"crossref","first-page":"1673","DOI":"10.1021\/acs.chemrev.5b00247","article-title":"Glycopolymer Nanobiotechnology","volume":"116","author":"Miura","year":"2016","journal-title":"Chem Rev"},{"key":"2022012000320452600_ref5","doi-asserted-by":"crossref","first-page":"3161","DOI":"10.1007\/s00216-011-5594-y","article-title":"Carbohydrate-protein interactions and their biosensing applications","volume":"402","author":"Zeng","year":"2012","journal-title":"Anal Bioanal Chem"},{"key":"2022012000320452600_ref6","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1016\/S0304-4165(02)00309-4","article-title":"Principles of structures of animal and plant lectins","volume":"1572","author":"Loris","year":"2002","journal-title":"Biochim Biophys Acta"},{"key":"2022012000320452600_ref7","doi-asserted-by":"crossref","first-page":"1266","DOI":"10.1021\/acscentsci.8b00453","article-title":"Structural and biochemical insights into the function and evolution of sulfoquinovosidases","volume":"4","author":"Abayakoon","year":"2018","journal-title":"ACS Cent Sci"},{"key":"2022012000320452600_ref8","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.chom.2019.08.009","article-title":"A family of dual-activity glycosyltransferase-phosphorylases mediates Mannogen turnover and virulence in leishmania parasites","volume":"26","author":"Sernee","year":"2019","journal-title":"Cell Host Microbe"},{"key":"2022012000320452600_ref9","doi-asserted-by":"crossref","first-page":"15152","DOI":"10.1021\/jacs.5b08424","article-title":"Carbohydrate-aromatic interactions in proteins","volume":"137","author":"Hudson","year":"2015","journal-title":"J Am Chem Soc"},{"key":"2022012000320452600_ref10","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1006\/jmbi.1998.2534","article-title":"Carbohydrate binding, quaternary structure and a novel hydrophobic binding site in two legume lectin oligomers from Dolichos biflorus","volume":"286","author":"Hamelryck","year":"1999","journal-title":"J Mol Biol"},{"key":"2022012000320452600_ref11","doi-asserted-by":"crossref","first-page":"6435","DOI":"10.1021\/acs.biochem.5b01058","article-title":"Neutron crystallographic studies reveal hydrogen bond and water-mediated interactions between a carbohydrate-binding module and its bound carbohydrate ligand","volume":"54","author":"Fisher","year":"2015","journal-title":"Biochemistry"},{"key":"2022012000320452600_ref12","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/S0076-6879(03)01022-X","article-title":"Exploring kinetics and mechanism of protein-sugar recognition by surface plasmon resonance","volume":"362","author":"Kapoor","year":"2003","journal-title":"Methods Enzymol"},{"key":"2022012000320452600_ref13","doi-asserted-by":"crossref","first-page":"2529","DOI":"10.1038\/nprot.2007.357","article-title":"Frontal affinity chromatography: sugar-protein interactions","volume":"2","author":"Tateno","year":"2007","journal-title":"Nat Protoc"},{"key":"2022012000320452600_ref14","doi-asserted-by":"crossref","first-page":"596","DOI":"10.1016\/j.drudis.2010.06.001","article-title":"Molecular simulations of carbohydrates and protein-carbohydrate interactions: motivation, issues and prospects","volume":"15","author":"Fadda","year":"2010","journal-title":"Drug Discov Today"},{"key":"2022012000320452600_ref15","doi-asserted-by":"crossref","first-page":"1373","DOI":"10.1016\/S0006-3495(01)75793-1","article-title":"Carbohydrate-protein recognition: molecular dynamics simulations and free energy analysis of oligosaccharide binding to concanavalin a","volume":"81","author":"Bryce","year":"2001","journal-title":"Biophys J"},{"key":"2022012000320452600_ref16","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1021\/ci800103u","article-title":"BALLDock\/SLICK: a new method for protein-carbohydrate docking","volume":"48","author":"Kerzmann","year":"2008","journal-title":"J Chem Inf Model"},{"key":"2022012000320452600_ref17","doi-asserted-by":"crossref","first-page":"6807","DOI":"10.1021\/acs.jpcb.1c00910","article-title":"Development and evaluation of GlycanDock: a protein-glycoligand docking refinement algorithm in Rosetta","volume":"125","author":"Nance","year":"2021","journal-title":"J Phys Chem B"},{"key":"2022012000320452600_ref18","first-page":"320","article-title":"An overview of scoring functions used for protein\u2013ligand interactions in molecular docking","volume":"11","author":"Li","year":"2019","journal-title":"Interdisciplinary Sciences: Comput Life Sci"},{"key":"2022012000320452600_ref19","doi-asserted-by":"crossref","first-page":"1604","DOI":"10.3390\/molecules21111604","article-title":"AutoDock-GIST: incorporating thermodynamics of active-site water into scoring function for accurate protein-ligand docking","volume":"21","author":"Uehara","year":"2016","journal-title":"Molecules"},{"key":"2022012000320452600_ref20","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1023\/A:1007996124545","article-title":"Empirical scoring functions: I. the development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes","volume":"11","author":"Eldridge","year":"1997","journal-title":"J Comput Aided Mol Des"},{"key":"2022012000320452600_ref21","doi-asserted-by":"crossref","first-page":"6177","DOI":"10.1021\/jm051256o","article-title":"Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein\u2212ligand complexes","volume":"49","author":"Friesner","year":"2006","journal-title":"J Med Chem"},{"key":"2022012000320452600_ref22","doi-asserted-by":"crossref","first-page":"2731","DOI":"10.1021\/ci200274q","article-title":"DSX: a knowledge-based scoring function for the assessment of protein\u2013ligand complexes","volume":"51","author":"Neudert","year":"2011","journal-title":"J Chem Inf Model"},{"key":"2022012000320452600_ref23","doi-asserted-by":"crossref","first-page":"e55","DOI":"10.1093\/nar\/gku077","article-title":"A knowledge-based scoring function for protein-RNA interactions derived from a statistical mechanics-based iterative method","volume":"42","author":"Huang","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref24","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1093\/bioinformatics\/btq112","article-title":"A machine learning approach to predicting protein-ligand binding affinity with applications to molecular docking","volume":"26","author":"Ballester","year":"2010","journal-title":"Bioinformatics"},{"key":"2022012000320452600_ref25","doi-asserted-by":"crossref","first-page":"2897","DOI":"10.1021\/ci2003889","article-title":"NNScore 2.0: a neural-network receptor-ligand scoring function","volume":"51","author":"Durrant","year":"2011","journal-title":"J Chem Inf Model"},{"key":"2022012000320452600_ref26","doi-asserted-by":"crossref","first-page":"1334","DOI":"10.1093\/bioinformatics\/bty757","article-title":"Development of a protein-ligand extended connectivity (PLEC) fingerprint and its application for binding affinity predictions","volume":"35","author":"Wojcikowski","year":"2019","journal-title":"Bioinformatics"},{"key":"2022012000320452600_ref27","doi-asserted-by":"crossref","first-page":"W557","DOI":"10.1093\/nar\/gkw390","article-title":"CSM-lig: a web server for assessing and comparing protein-small molecule affinities","volume":"44","author":"Pires","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref28","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1002\/wcms.1225","article-title":"Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening","volume":"5","author":"Ain","year":"2015","journal-title":"Wiley Interdiscip Rev Comput Mol Sci"},{"key":"2022012000320452600_ref29","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.ddtec.2020.09.001","article-title":"Selecting machine-learning scoring functions for structure-based virtual screening","volume":"32-33","author":"Ballester","year":"2019","journal-title":"Drug Discov Today Technol"},{"key":"2022012000320452600_ref30","volume-title":"Wiley Interdisciplinary Reviews: Computational Molecular Science,","author":"Li"},{"key":"2022012000320452600_ref31","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1007\/7355_2014_42","volume-title":"Carbohydrates as Drugs","author":"Frank","year":"2014"},{"key":"2022012000320452600_ref32","article-title":"Prediction of protein-carbohydrate complex binding affinity using structural features","volume":"22","author":"Siva Shanmugam","year":"2020","journal-title":"Brief Bioinform"},{"key":"2022012000320452600_ref33","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1093\/bioinformatics\/btt691","article-title":"mCSM: predicting the effects of mutations in proteins using graph-based signatures","volume":"30","author":"Pires","year":"2014","journal-title":"Bioinformatics"},{"key":"2022012000320452600_ref34","doi-asserted-by":"crossref","first-page":"W314","DOI":"10.1093\/nar\/gku411","article-title":"DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach","volume":"42","author":"Pires","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref35","doi-asserted-by":"crossref","first-page":"W147","DOI":"10.1093\/nar\/gkaa416","article-title":"mCSM-membrane: predicting the effects of mutations on transmembrane proteins","volume":"48","author":"Pires","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref36","doi-asserted-by":"crossref","first-page":"W350","DOI":"10.1093\/nar\/gky300","article-title":"DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability","volume":"46","author":"Rodrigues","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref37","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1002\/pro.3942","article-title":"DynaMut2: assessing changes in stability and flexibility upon single and multiple point missense mutations","volume":"30","author":"Rodrigues","year":"2021","journal-title":"Protein Sci"},{"key":"2022012000320452600_ref38","doi-asserted-by":"crossref","first-page":"W125","DOI":"10.1093\/nar\/gkaa389","article-title":"mmCSM-AB: guiding rational antibody engineering through multiple point mutations","volume":"48","author":"Myung","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref39","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1093\/bioinformatics\/btz779","article-title":"mCSM-AB2: guiding rational antibody design using graph-based signatures","volume":"36","author":"Myung","year":"2020","journal-title":"Bioinformatics"},{"key":"2022012000320452600_ref40","doi-asserted-by":"crossref","first-page":"W469","DOI":"10.1093\/nar\/gkw458","article-title":"mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures","volume":"44","author":"Pires","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref41","doi-asserted-by":"crossref","first-page":"29575","DOI":"10.1038\/srep29575","article-title":"mCSM-lig: quantifying the effects of mutations on protein-small molecule affinity in genetic disease and emergence of drug resistance","volume":"6","author":"Pires","year":"2016","journal-title":"Sci Rep"},{"key":"2022012000320452600_ref42","doi-asserted-by":"crossref","first-page":"W241","DOI":"10.1093\/nar\/gkx236","article-title":"mCSM-NA: predicting the effects of mutations on protein-nucleic acids interactions","volume":"45","author":"Pires","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref43","doi-asserted-by":"crossref","first-page":"W338","DOI":"10.1093\/nar\/gkz383","article-title":"mCSM-PPI2: predicting the effects of mutations on protein-protein interactions","volume":"47","author":"Rodrigues","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref44","doi-asserted-by":"crossref","first-page":"W417","DOI":"10.1093\/nar\/gkab273","article-title":"mmCSM-PPI: predicting the effects of multiple point mutations on protein-protein interactions","volume":"49","author":"Rodrigues","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref45","doi-asserted-by":"crossref","first-page":"D368","DOI":"10.1093\/nar\/gkz860","article-title":"ProCarbDB: a database of carbohydrate-binding proteins","volume":"48","author":"Copoiu","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2022012000320452600_ref46","doi-asserted-by":"crossref","first-page":"3615","DOI":"10.1093\/bioinformatics\/btaa141","article-title":"ProCaff: protein-carbohydrate complex binding affinity database","volume":"36","author":"Siva Shanmugam","year":"2020","journal-title":"Bioinformatics"},{"issue":"Suppl 4","key":"2022012000320452600_ref47","doi-asserted-by":"crossref","first-page":"S12","DOI":"10.1186\/1471-2164-12-S4-S12","article-title":"Cutoff scanning matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns","volume":"12","author":"Pires","year":"2011","journal-title":"BMC Genomics"},{"key":"2022012000320452600_ref48","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1016\/j.jmb.2016.12.004","article-title":"Arpeggio: a web server for calculating and visualising interatomic interactions in protein structures","volume":"429","author":"Jubb","year":"2017","journal-title":"J Mol Biol"},{"key":"2022012000320452600_ref49","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1016\/j.jmgm.2011.01.004","article-title":"BINANA: a novel algorithm for ligand-binding characterization","volume":"29","author":"Durrant","year":"2011","journal-title":"J Mol Graph Model"},{"key":"2022012000320452600_ref50","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1002\/minf.201400132","article-title":"Improving AutoDock Vina using random Forest: the growing accuracy of binding affinity prediction by the effective exploitation of larger data sets","volume":"34","author":"Li","year":"2015","journal-title":"Mol Inform"},{"key":"2022012000320452600_ref51","doi-asserted-by":"crossref","first-page":"944","DOI":"10.1021\/ci500091r","article-title":"Does a more precise chemical description of protein-ligand complexes lead to more accurate prediction of binding affinity?","volume":"54","author":"Ballester","year":"2014","journal-title":"J Chem Inf Model"},{"key":"2022012000320452600_ref52","doi-asserted-by":"crossref","DOI":"10.3390\/biom8010012","article-title":"The impact of protein structure and sequence similarity on the accuracy of machine-learning scoring functions for binding affinity prediction","volume":"8","author":"Li","year":"2018","journal-title":"Biomolecules"},{"key":"2022012000320452600_ref53","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/s13321-015-0078-2","article-title":"Open drug discovery toolkit (ODDT): a new open-source player in the drug discovery field","volume":"7","author":"Wojcikowski","year":"2015","journal-title":"J Chem"},{"key":"2022012000320452600_ref54","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/jcc.21334","article-title":"AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading","volume":"31","author":"Trott","year":"2010","journal-title":"J Comput Chem"},{"key":"2022012000320452600_ref55","doi-asserted-by":"crossref","first-page":"4200","DOI":"10.1093\/bioinformatics\/btaa480","article-title":"EasyVS: a user-friendly web-based tool for molecule library selection and structure-based virtual screening","volume":"36","author":"Pires","year":"2020","journal-title":"Bioinformatics"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab512\/42231514\/bbab512.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/1\/bbab512\/42231514\/bbab512.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,12]],"date-time":"2023-11-12T11:56:25Z","timestamp":1699790185000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab512\/6457169"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,9]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,1,17]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab512","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,1]]},"published":{"date-parts":[[2021,12,9]]},"article-number":"bbab512"}}