{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:02:47Z","timestamp":1775325767553,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2020,12,1]],"date-time":"2020-12-01T00:00:00Z","timestamp":1606780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Netherlands Organization for Scientific Research"},{"DOI":"10.13039\/501100003246","name":"NWO","doi-asserted-by":"publisher","award":["TTW 15043"],"award-info":[{"award-number":["TTW 15043"]}],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003246","name":"NWO","doi-asserted-by":"publisher","award":["TTW 14516"],"award-info":[{"award-number":["TTW 14516"]}],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,12,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering and structure classification across proteins from different superfamilies as well as within the same family.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Python code available at https:\/\/git.wur.nl\/durai001\/geometricus.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa839","type":"journal-article","created":{"date-parts":[[2020,9,16]],"date-time":"2020-09-16T12:09:37Z","timestamp":1600258177000},"page":"i718-i725","source":"Crossref","is-referenced-by-count":38,"title":["Geometricus represents protein structures as shape-mers derived from moment invariants"],"prefix":"10.1093","volume":"36","author":[{"given":"Janani","family":"Durairaj","sequence":"first","affiliation":[{"name":"Bioinformatics Group, Department of Plant Sciences"}]},{"given":"Mehmet","family":"Akdel","sequence":"additional","affiliation":[{"name":"Bioinformatics Group, Department of Plant Sciences"}]},{"given":"Dick","family":"de Ridder","sequence":"additional","affiliation":[{"name":"Bioinformatics Group, Department of Plant Sciences"}]},{"given":"Aalt D J","family":"van Dijk","sequence":"additional","affiliation":[{"name":"Bioinformatics Group, Department of Plant Sciences"},{"name":"Mathematical and Statistical Methods - Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6700AP, The Netherlands"}]}],"member":"286","published-online":{"date-parts":[[2020,12,29]]},"reference":[{"key":"2023062409330469900_btaa839-B1","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat. Methods"},{"key":"2023062409330469900_btaa839-B2","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1186\/s12859-019-2932-0","article-title":"ProteinNet: a standardized data set for machine learning of protein structure","volume":"20","author":"AlQuraishi","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023062409330469900_btaa839-B3","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/bioinformatics\/btr168","article-title":"ProDy: protein dynamics inferred from theory and experiments","volume":"27","author":"Bakan","year":"2011","journal-title":"Bioinformatics"},{"key":"2023062409330469900_btaa839-B4","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1093\/nar\/30.1.276","article-title":"The Pfam protein families database","volume":"30","author":"Bateman","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023062409330469900_btaa839-B5","author":"Bepler","year":"2019"},{"key":"2023062409330469900_btaa839-B6","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1111\/j.1432-1033.1977.tb11885.x","article-title":"The Protein Data Bank: a computer-based archival file for macromolecular structures","volume":"80","author":"Bernstein","year":"1977","journal-title":"Eur. J. Biochem"},{"key":"2023062409330469900_btaa839-B7","doi-asserted-by":"crossref","first-page":"3481","DOI":"10.1073\/pnas.0914097107","article-title":"FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately","volume":"107","author":"Budowski-Tal","year":"2010","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062409330469900_btaa839-B8","first-page":"82","article-title":"PyMOL: an open-source molecular graphics tool","volume":"40","author":"DeLano","year":"2002","journal-title":"CCP4 Newsl. Protein Crystallogr"},{"key":"2023062409330469900_btaa839-B9","doi-asserted-by":"crossref","first-page":"734","DOI":"10.1093\/bib\/bbt056","article-title":"Similarity-based machine learning methods for predicting drug-target interactions: a brief review","volume":"15","author":"Ding","year":"2014","journal-title":"Brief. Bioinform"},{"key":"2023062409330469900_btaa839-B10","doi-asserted-by":"crossref","first-page":"1356","DOI":"10.1046\/j.1432-1033.2002.02767.x","article-title":"Prediction of protein\u2013protein interaction sites in heterocomplexes with neural networks","volume":"269","author":"Fariselli","year":"2002","journal-title":"Eur. J. Biochem"},{"key":"2023062409330469900_btaa839-B11","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1016\/0167-8655(94)90092-2","article-title":"Affine moment invariants: a new tool for character recognition","volume":"15","author":"Flusser","year":"1994","journal-title":"Pattern Recogn Lett"},{"key":"2023062409330469900_btaa839-B12","doi-asserted-by":"crossref","first-page":"234","DOI":"10.1109\/TPAMI.2003.1177154","article-title":"Moment forms invariant to rotation and blur in arbitrary number of dimensions","volume":"25","author":"Flusser","year":"2003","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023062409330469900_btaa839-B13","first-page":"410","volume-title":"Computational Biology and Bioinformatics","author":"Garg","year":"2016"},{"key":"2023062409330469900_btaa839-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-07652-6","article-title":"Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models","volume":"9","author":"Heckmann","year":"2018","journal-title":"Nat. Commun"},{"key":"2023062409330469900_btaa839-B15","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1109\/TIT.1962.1057692","article-title":"Visual pattern recognition by moment invariants","volume":"8","author":"Hu","year":"1962","journal-title":"IRE Trans. Inform. Theory"},{"key":"2023062409330469900_btaa839-B16","doi-asserted-by":"crossref","first-page":"e0138022","DOI":"10.1371\/journal.pone.0138022","article-title":"Structure based thermostability prediction models for protein single point mutations with machine learning tools","volume":"10","author":"Jia","year":"2015","journal-title":"PLoS One"},{"key":"2023062409330469900_btaa839-B17","doi-asserted-by":"crossref","first-page":"922","DOI":"10.1107\/S0567739476001873","article-title":"A solution for the best rotation to relate two sets of vectors","volume":"32","author":"Kabsch","year":"1976","journal-title":"Acta Crystallogr. A"},{"key":"2023062409330469900_btaa839-B18","doi-asserted-by":"crossref","first-page":"D365","DOI":"10.1093\/nar\/gkv1082","article-title":"KLIFS: a structural kinase-ligand interaction database","volume":"44","author":"Kooistra","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062409330469900_btaa839-B19","first-page":"371","author":"Kratz","year":"2011"},{"key":"2023062409330469900_btaa839-B20","first-page":"1","author":"Lam","year":"2015"},{"key":"2023062409330469900_btaa839-B21","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/j.jmb.2008.12.044","article-title":"Structural alphabets for protein structure classification: a comparison study","volume":"387","author":"Le","year":"2009","journal-title":"J. Mol. Biol"},{"key":"2023062409330469900_btaa839-B22","doi-asserted-by":"crossref","first-page":"2535","DOI":"10.3390\/molecules23102535","article-title":"Machine learning approaches for protein\u2013protein interaction hot spot prediction: progress and comparative assessment","volume":"23","author":"Liu","year":"2018","journal-title":"Molecules"},{"key":"2023062409330469900_btaa839-B23","doi-asserted-by":"crossref","first-page":"i773","DOI":"10.1093\/bioinformatics\/bty585","article-title":"Learning structural motif representations for efficient protein structure search","volume":"34","author":"Liu","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062409330469900_btaa839-B24","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1186\/1471-2105-8-307","article-title":"Protein structural similarity search by Ramachandran codes","volume":"8","author":"Lo","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023062409330469900_btaa839-B25","first-page":"121","volume-title":"Adv. Protein Chem. Struct. Biol","author":"Ma","year":"2014"},{"key":"2023062409330469900_btaa839-B26","doi-asserted-by":"crossref","first-page":"D297","DOI":"10.1093\/nar\/gkt1208","article-title":"MMDB and VAST+: tracking structural similarities between macromolecular complexes","volume":"42","author":"Madej","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062409330469900_btaa839-B27","doi-asserted-by":"crossref","first-page":"819","DOI":"10.1109\/34.709598","article-title":"N-dimensional moment invariants and conceptual mathematical theory of recognition n-dimensional solids","volume":"20","author":"Mamistvalov","year":"1998","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023062409330469900_btaa839-B28","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.media.2004.06.016","article-title":"Brain morphometry using 3D moment invariants","volume":"8","author":"Mangin","year":"2004","journal-title":"Med. Image Anal"},{"key":"2023062409330469900_btaa839-B29","article-title":"UMAP: uniform manifold approximation and projection for dimension reduction","author":"McInnes","year":"2018","journal-title":"ArXiv e-Prints"},{"key":"2023062409330469900_btaa839-B30","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1039\/fd9929300269","article-title":"Modelling the structure and function of enzymes by machine learning","volume":"93","author":"Michael","year":"1992","journal-title":"Faraday Discuss"},{"key":"2023062409330469900_btaa839-B31","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1002\/prot.25064","article-title":"Critical assessment of methods of protein structure prediction: progress and new directions in round XI","volume":"84","author":"Moult","year":"2016","journal-title":"Proteins"},{"key":"2023062409330469900_btaa839-B32","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023062409330469900_btaa839-B33","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1093\/nar\/gkg062","article-title":"The CATH database: an extended protein family resource for structural and functional genomics","volume":"31","author":"Pearl","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023062409330469900_btaa839-B34","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023062409330469900_btaa839-B35","first-page":"9686","author":"Rao","year":"2019"},{"key":"2023062409330469900_btaa839-B36","doi-asserted-by":"crossref","first-page":"1876","DOI":"10.3844\/ajbbsp.2006.1876.1878","article-title":"Object detection using geometric invariant moment","volume":"3","author":"Rizon","year":"2006","journal-title":"Am. J. Appl. Sci"},{"key":"2023062409330469900_btaa839-B37","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1109\/TPAMI.1980.4766990","article-title":"Three-dimensional moment invariants","volume":"PAMI-2","author":"Sadjadi","year":"1980","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023062409330469900_btaa839-B38","first-page":"2051","author":"Se","year":"2001"},{"key":"2023062409330469900_btaa839-B39","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/s41586-019-1923-7","article-title":"Improved protein structure prediction using potentials from deep learning","volume":"577","author":"Senior","year":"2020","journal-title":"Nature"},{"key":"2023062409330469900_btaa839-B40","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/0471250953.bi0307s03","article-title":"An overview of multiple sequence alignment","volume":"3","author":"Simossis","year":"2003","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023062409330469900_btaa839-B41","doi-asserted-by":"crossref","first-page":"3139","DOI":"10.1093\/bioinformatics\/btm503","article-title":"Moment invariants as shape recognition technique for comparing protein binding sites","volume":"23","author":"Sommer","year":"2007","journal-title":"Bioinformatics"},{"key":"2023062409330469900_btaa839-B42","doi-asserted-by":"crossref","first-page":"W582","DOI":"10.1093\/nar\/gkh430","article-title":"FATCAT: a web server for flexible structure comparison and structure similarity searching","volume":"32","author":"Ye","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023062409330469900_btaa839-B43","doi-asserted-by":"crossref","first-page":"177","DOI":"10.2174\/1389200219666180829121038","article-title":"Targeting virus-host protein interactions: feature extraction and machine learning approaches","volume":"20","author":"Zheng","year":"2019","journal-title":"Curr. Drug Metab"},{"key":"2023062409330469900_btaa839-B44","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/s00138-015-0730-x","article-title":"On a 3D analogue of the first Hu moment invariant and a family of shape ellipsoidness measures","volume":"27","author":"\u017duni\u0107","year":"2016","journal-title":"Mach. Vis. Appl"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_2\/i718\/50693680\/btaa839.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/Supplement_2\/i718\/50693680\/btaa839.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T19:56:34Z","timestamp":1687636594000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/Supplement_2\/i718\/6055902"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":44,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2020,12,30]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa839","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.09.07.285569","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,12]]},"published":{"date-parts":[[2020,12]]}}}