{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T21:08:14Z","timestamp":1773695294414,"version":"3.50.1"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T00:00:00Z","timestamp":1681689600000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,4,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A protein can be represented in several forms, including its 1D sequence, 3D atom coordinates, and molecular surface. A protein surface contains rich structural and chemical features directly related to the protein\u2019s function such as its ability to interact with other molecules. While many methods have been developed for comparing the similarity of proteins using the sequence and structural representations, computational methods based on molecular surface representation are limited.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we describe \u201cSurface ID,\u201d a geometric deep learning system for high-throughput surface comparison based on geometric and chemical features. Surface ID offers a novel grouping and alignment algorithm useful for clustering proteins by function, visualization, and in silico screening of potential binding partners to a target molecule. Our method demonstrates top performance in surface similarity assessment, indicating great potential for protein functional annotation, a major need in protein engineering and therapeutic design.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Source code for the Surface ID model, trained weights, and inference script are available at https:\/\/github.com\/Sanofi-Public\/LMR-SurfaceID.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad196","type":"journal-article","created":{"date-parts":[[2023,4,17]],"date-time":"2023-04-17T14:24:59Z","timestamp":1681741499000},"source":"Crossref","is-referenced-by-count":15,"title":["Surface ID: a geometry-aware system for protein molecular surface comparison"],"prefix":"10.1093","volume":"39","author":[{"given":"Saleh","family":"Riahi","sequence":"first","affiliation":[{"name":"Large Molecule Research, Sanofi , Cambridge, MA 02141, United States"}]},{"given":"Jae Hyeon","family":"Lee","sequence":"additional","affiliation":[{"name":"Data & Data Science, Sanofi , Cambridge, MA 02141, United States"},{"name":"Present\u00a0address:\u00a0Prescient Design, Genentech, Inc., South San Francisco, CA 94080, USA"}]},{"given":"Taylor","family":"Sorenson","sequence":"additional","affiliation":[{"name":"Data & Data Science, Sanofi , Cambridge, MA 02141, United States"}]},{"given":"Shuai","family":"Wei","sequence":"additional","affiliation":[{"name":"Large Molecule Research, Sanofi , Cambridge, MA 02141, United States"},{"name":"Present\u00a0address:\u00a0Bristol Myers Squibb, 100 Binney St, Cambridge, MA 02142, USA"}]},{"given":"Sven","family":"Jager","sequence":"additional","affiliation":[{"name":"R&D Digital Data & Computational Sciences, Sanofi, Industriepark Hoechst , Frankfurt am Main 65929, Germany"}]},{"given":"Reza","family":"Olfati-Saber","sequence":"additional","affiliation":[{"name":"Data & Data Science, Sanofi , Cambridge, MA 02141, United States"}]},{"given":"Yanfeng","family":"Zhou","sequence":"additional","affiliation":[{"name":"Large Molecule Research , Sanofi, Cambridge, MA 02141, United States"}]},{"given":"Anna","family":"Park","sequence":"additional","affiliation":[{"name":"Large Molecule Research, Sanofi , Cambridge, MA 02141, United States"}]},{"given":"Maria","family":"Wendt","sequence":"additional","affiliation":[{"name":"Large Molecule Research, Sanofi , Cambridge, MA 02141, United States"}]},{"given":"Herv\u00e9","family":"Minoux","sequence":"additional","affiliation":[{"name":"Data & Data Science, Sanofi , Chilly-Mazarin 91380, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2078-0513","authenticated-orcid":false,"given":"Yu","family":"Qiu","sequence":"additional","affiliation":[{"name":"Large Molecule Research, Sanofi , Cambridge, MA 02141, United States"}]}],"member":"286","published-online":{"date-parts":[[2023,4,17]]},"reference":[{"key":"2023060914152731700_btad196-B1","doi-asserted-by":"crossref","first-page":"e1006112","DOI":"10.1371\/journal.pcbi.1006112","article-title":"RosettaAntibodyDesign (RAbD): a general framework for computational antibody design","volume":"14","author":"Adolf-Bryfogle","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2023060914152731700_btad196-B2","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/nri980","article-title":"Viral mimicry of cytokines, chemokines and their receptors","volume":"3","author":"Alcami","year":"2003","journal-title":"Nat Rev Immunol"},{"key":"2023060914152731700_btad196-B3","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat Methods"},{"key":"2023060914152731700_btad196-B4","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2023060914152731700_btad196-B5","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2023060914152731700_btad196-B6","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1002\/prot.20123","article-title":"Towards a structural classification of phosphate binding sites in protein-nucleotide complexes: an automated all-against-all structural comparison using geometric matching","volume":"56","author":"Brakoulias","year":"2004","journal-title":"Proteins"},{"key":"2023060914152731700_btad196-B7","doi-asserted-by":"crossref","first-page":"8192","DOI":"10.1038\/s41598-018-26497-z","article-title":"A novel geometry-based approach to infer protein interface similarity","volume":"8","author":"Budowski-Tal","year":"2018","journal-title":"Sci Rep"},{"key":"2023060914152731700_btad196-B8","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1038\/s41586-022-04654-9","article-title":"Design of protein-binding proteins from the target structure alone","volume":"605","author":"Cao","year":"2022","journal-title":"Nature"},{"key":"2023060914152731700_btad196-B9","doi-asserted-by":"crossref","first-page":"3970","DOI":"10.1093\/bioinformatics\/btz236","article-title":"Protein multiple alignments: sequence-based versus structure-based programs","volume":"35","author":"Carpentier","year":"2019","journal-title":"Bioinformatics"},{"key":"2023060914152731700_btad196-B10","first-page":"1","article-title":"Kernel operations on the GPU, with autodiff, without memory overflows","volume":"22","author":"Charlier","year":"2021","journal-title":"J. Mach Learn Res"},{"key":"2023060914152731700_btad196-B11","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1093\/bib\/bbv099","article-title":"Multiple sequence alignment modeling: methods and applications","volume":"17","author":"Chatzou","year":"2016","journal-title":"Brief Bioinform"},{"key":"2023060914152731700_btad196-B12","doi-asserted-by":"crossref","first-page":"55","DOI":"10.3390\/antib8040055","article-title":"Antibody structure and function: the basis for engineering therapeutics","volume":"8","author":"Chiu","year":"2019","journal-title":"Antibodies (Basel)"},{"key":"2023060914152731700_btad196-B13","doi-asserted-by":"crossref","article-title":"Learning a similarity metric discriminatively, with application to face verification","author":"Chopra","DOI":"10.1109\/CVPR.2005.202"},{"key":"2023060914152731700_btad196-B14","doi-asserted-by":"crossref","first-page":"1870","DOI":"10.1093\/bioinformatics\/bty918","article-title":"Antibody interface prediction with 3D zernike descriptors and SVM","volume":"35","author":"Daberdaku","year":"2019","journal-title":"Bioinformatics"},{"key":"2023060914152731700_btad196-B15","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1093\/bioinformatics\/btv552","article-title":"ANARCI: antigen receptor numbering and receptor classification","volume":"32","author":"Dunbar","year":"2016","journal-title":"Bioinformatics"},{"key":"2023060914152731700_btad196-B16","doi-asserted-by":"crossref","first-page":"D1140","DOI":"10.1093\/nar\/gkt1043","article-title":"SAbDab: the structural antibody database","volume":"42","author":"Dunbar","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023060914152731700_btad196-B17","doi-asserted-by":"crossref","first-page":"3065","DOI":"10.3389\/fimmu.2018.03065","article-title":"Characterizing the diversity of the CDR-H3 loop conformational ensembles in relationship to antibody binding properties","volume":"9","author":"Fern\u00e1ndez-Quintero","year":"2019","journal-title":"Front Immunol"},{"key":"2023060914152731700_btad196-B18","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1038\/s41592-019-0666-6","article-title":"Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning","volume":"17","author":"Gainza","year":"2020","journal-title":"Nat Methods"},{"key":"2023060914152731700_btad196-B19","doi-asserted-by":"crossref","first-page":"D781","DOI":"10.1093\/nar\/gkj088","article-title":"IMGT\/LIGM-DB, the IMGT comprehensive database of immunoglobulin and T cell receptor nucleotide sequences","volume":"34","author":"Giudicelli","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023060914152731700_btad196-B20","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1016\/j.jmb.2005.11.044","article-title":"Fold independent structural comparisons of protein-ligand binding sites for exploring functional relationships","volume":"355","author":"Gold","year":"2006","journal-title":"J Mol Biol"},{"key":"2023060914152731700_btad196-B21","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkac387","article-title":"Dali server: structural unification of protein families","author":"Holm","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023060914152731700_btad196-B22","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1006\/jmbi.1993.1489","article-title":"Protein structure comparison by alignment of distance matrices","volume":"233","author":"Holm","year":"1993","journal-title":"J Mol Biol"},{"key":"2023060914152731700_btad196-B23","first-page":"10217","author":"Jin"},{"key":"2023060914152731700_btad196-B24","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023060914152731700_btad196-B25","doi-asserted-by":"crossref","first-page":"520","DOI":"10.2174\/138920311796957612","article-title":"Molecular surface representation using 3D zernike descriptors for protein shape comparison and docking","volume":"12","author":"Kihara","year":"2011","journal-title":"Curr Protein Pept Sci"},{"key":"2023060914152731700_btad196-B26","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1016\/0263-7855(96)00030-6","article-title":"Molecular recognition via face center representation of a molecular surface","volume":"14","author":"Lin","year":"1996","journal-title":"J Mol Graph"},{"key":"2023060914152731700_btad196-B27","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/B978-0-12-800168-4.00005-6","article-title":"Algorithms, applications, and challenges of protein structure alignment","volume":"94","author":"Ma","year":"2014","journal-title":"Adv Protein Chem Struct Biol"},{"key":"2023060914152731700_btad196-B28","author":"McInnes"},{"key":"2023060914152731700_btad196-B29","doi-asserted-by":"crossref","first-page":"2347","DOI":"10.1093\/bioinformatics\/bti337","article-title":"Real spherical harmonic expansion coefficients as 3D shape descriptors for protein binding pocket and ligand comparisons","volume":"21","author":"Morris","year":"2005","journal-title":"Bioinformatics"},{"key":"2023060914152731700_btad196-B30","doi-asserted-by":"crossref","first-page":"10495","DOI":"10.1073\/pnas.88.23.10495","article-title":"Efficient detection of three-dimensional structural motifs in biological macromolecules by computer vision techniques","volume":"88","author":"Nussinov","year":"1991","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023060914152731700_btad196-B31","doi-asserted-by":"crossref","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","article-title":"Improved tools for biological sequence comparison","volume":"85","author":"Pearson","year":"1988","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023060914152731700_btad196-B32","doi-asserted-by":"crossref","first-page":"D290","DOI":"10.1093\/nar\/gkr1065","article-title":"The PFAM protein families database","volume":"40","author":"Punta","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023060914152731700_btad196-B33","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1093\/bioinformatics\/btaa739","article-title":"CoV-AbDab: the coronavirus antibody database","volume":"37","author":"Raybould","year":"2021","journal-title":"Bioinformatics"},{"issue":"15","key":"2023060914152731700_btad196-B34","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023060914152731700_btad196-B35","author":"Schr\u00f6dinger LLC","year":"2015"},{"key":"2023060914152731700_btad196-B36","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1002\/prot.10628","article-title":"A method for simultaneous alignment of multiple protein structures","volume":"56","author":"Shatsky","year":"2004","journal-title":"Proteins"},{"key":"2023060914152731700_btad196-B37","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.jmb.2004.04.012","article-title":"Recognition of functional sites in protein structures","volume":"339","author":"Shulman-Peleg","year":"2004","journal-title":"J Mol Biol"},{"key":"2023060914152731700_btad196-B38","author":"Sverrisson"},{"key":"2023060914152731700_btad196-B39","doi-asserted-by":"crossref","first-page":"1170","DOI":"10.1073\/pnas.1119684109","article-title":"Classification of protein functional surfaces using structural characteristics","volume":"109","author":"Tseng","year":"2012","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023060914152731700_btad196-B40","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1098\/rsif.2011.0584","article-title":"Flexibility and binding affinity in protein-ligand, protein-protein and multi-component protein interactions: limitations of current computational approaches","volume":"9","author":"Tuffery","year":"2012","journal-title":"J R Soc Interface"},{"key":"2023060914152731700_btad196-B41","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023060914152731700_btad196-B42","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1186\/1471-2105-10-407","article-title":"Protein-protein docking using region-based 3D zernike descriptors","volume":"10","author":"Venkatraman","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023060914152731700_btad196-B43","doi-asserted-by":"crossref","first-page":"2308","DOI":"10.1002\/pro.5560061104","article-title":"TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites","volume":"6","author":"Wallace","year":"2008","journal-title":"Protein Sci"},{"key":"2023060914152731700_btad196-B44","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1186\/s12859-018-2524-4","article-title":"A benchmark study of sequence alignment methods for protein clustering","volume":"19","author":"Wang","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023060914152731700_btad196-B45","doi-asserted-by":"crossref","first-page":"16622","DOI":"10.1073\/pnas.0906146106","article-title":"Fast screening of protein surfaces using geometric invariant fingerprints","volume":"106","author":"Yin","year":"2009","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023060914152731700_btad196-B46","doi-asserted-by":"crossref","first-page":"1996","DOI":"10.1109\/TCBB.2020.2966633","article-title":"Protein family classification from scratch: a CNN based deep learning approach","volume":"18","author":"Zhang","year":"2021","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"key":"2023060914152731700_btad196-B47","doi-asserted-by":"crossref","first-page":"707","DOI":"10.1093\/bioinformatics\/btu724","article-title":"Large-scale binding ligand prediction by improved patch-based method Patch-Surfer2.0","volume":"31","author":"Zhu","year":"2015","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad196\/49989163\/btad196.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/4\/btad196\/50521120\/btad196.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/4\/btad196\/50521120\/btad196.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,9]],"date-time":"2023-06-09T16:22:35Z","timestamp":1686327755000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad196\/7126409"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,4,1]]},"references-count":47,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,4,3]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad196","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,4,1]]},"published":{"date-parts":[[2023,4,1]]},"article-number":"btad196"}}