{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,4]],"date-time":"2024-10-04T04:15:56Z","timestamp":1728015356651},"reference-count":97,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T00:00:00Z","timestamp":1662595200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T00:00:00Z","timestamp":1662595200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Though proposing algorithmic approaches for protein domain decomposition has been of high interest, the inherent ambiguity of the problem keeps it an active area of research. Besides, accurate automated methods are in high demand as the number of solved structures for complex proteins is on the rise. While the majority of previous efforts to decompose 3D structures have centered on developing clustering algorithms, employing enhanced measures of proximity between amino acids has remained rather uncharted. If there exists a kernel function in whose reproducing kernel Hilbert space the structural domains of proteins become well separated, then protein structures can be parsed into domains without the need to use a complex clustering algorithm. Inspired by this idea, we developed a protein domain decomposition method based on diffusion kernels on protein graphs. We examined all combinations of four graph node kernels and two clustering algorithms to investigate their capability to decompose protein structures. 
The proposed method is tested on five of the most commonly used benchmark datasets for protein domain assignment plus a comprehensive non-redundant dataset. The results show a competitive performance of the method utilizing one of the diffusion kernels compared to four of the best automatic methods. Our method is also able to offer alternative partitionings for the same structure, which is in line with the subjective definition of a protein domain. With competitive accuracy and balanced performance on simple and complex structures, despite relying on a relatively naive criterion to choose the optimal decomposition, the proposed method shows that diffusion kernels on graphs in particular, and kernel functions in general, are promising measures to facilitate parsing proteins into domains and performing different structural analyses on proteins. The size and interconnectedness of the protein graphs make them promising targets for diffusion kernels as measures of affinity between amino acids. The versatility of our method allows the implementation of future kernels with higher performance. The source code of the proposed method is accessible at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/taherimo\/kludo\">https:\/\/github.com\/taherimo\/kludo<\/jats:ext-link>. 
Also, the proposed method is available as a web application from<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/cbph.ir\/tools\/kludo\">https:\/\/cbph.ir\/tools\/kludo<\/jats:ext-link>.<\/jats:p>","DOI":"10.1186\/s12859-022-04902-9","type":"journal-article","created":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T16:22:06Z","timestamp":1662654126000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Assignment of structural domains in proteins using diffusion kernels on graphs"],"prefix":"10.1186","volume":"23","author":[{"given":"Mohammad","family":"Taheri-Ledari","sequence":"first","affiliation":[]},{"given":"Amirali","family":"Zandieh","sequence":"additional","affiliation":[]},{"given":"Seyed Peyman","family":"Shariatpanahi","sequence":"additional","affiliation":[]},{"given":"Changiz","family":"Eslahchi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,9,8]]},"reference":[{"issue":"3","key":"4902_CR1","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1073\/pnas.70.3.697","volume":"70","author":"DB Wetlaufer","year":"1973","unstructured":"Wetlaufer DB. Nucleation, rapid folding, and globular intrachain regions in proteins. Proc Natl Acad Sci. 1973;70(3):697\u2013701.","journal-title":"Proc Natl Acad Sci"},{"issue":"3","key":"4902_CR2","doi-asserted-by":"publisher","first-page":"562","DOI":"10.1016\/j.jmb.2006.05.060","volume":"361","author":"TA Holland","year":"2006","unstructured":"Holland TA, Veretnik S, Shindyalov IN, Bourne PE. Partitioning protein structures into domains: why is it so difficult? J Mol Biol. 2006;361(3):562\u201390.","journal-title":"J Mol Biol"},{"key":"4902_CR3","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1016\/0079-6107(83)90003-2","volume":"42","author":"J Janin","year":"1983","unstructured":"Janin J, Wodak SJ. 
Structural domains in proteins and their role in the dynamics of protein function. Prog Biophys Mol Biol. 1983;42:21\u201378.","journal-title":"Prog Biophys Mol Biol"},{"issue":"24","key":"4902_CR4","doi-asserted-by":"publisher","first-page":"9420","DOI":"10.1073\/pnas.1202604109","volume":"109","author":"LL Porter","year":"2012","unstructured":"Porter LL, Rose GD. A thermodynamic definition of protein domains. Proc Natl Acad Sci. 2012;109(24):9420\u20135.","journal-title":"Proc Natl Acad Sci"},{"issue":"4","key":"4902_CR5","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1006\/jmbi.2001.4513","volume":"307","author":"AE Todd","year":"2001","unstructured":"Todd AE, Orengo CA, Thornton JM. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol. 2001;307(4):1113\u201343.","journal-title":"J Mol Biol"},{"key":"4902_CR6","doi-asserted-by":"publisher","first-page":"415","DOI":"10.1146\/annurev.biochem.77.062706.133317","volume":"77","author":"C Kiel","year":"2008","unstructured":"Kiel C, Beltrao P, Serrano L. Analyzing protein interaction networks using structural information. Annu Rev Biochem. 2008;77:415\u201341.","journal-title":"Annu Rev Biochem"},{"issue":"D1","key":"4902_CR7","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1093\/nar\/gks1266","volume":"41","author":"TE Lewis","year":"2012","unstructured":"Lewis TE, Sillitoe I, Andreeva A, Blundell TL, Buchan DW, Chothia C, Cuff A, Dana JM, Filippis I, Gough J. Genome3d: a UK collaborative project to annotate genomic sequences with predicted 3d structures based on scop and cath domains. Nucleic Acids Res. 2012;41(D1):499\u2013507.","journal-title":"Nucleic Acids Res"},{"key":"4902_CR8","doi-asserted-by":"crossref","unstructured":"Lu C-H, Huang S-W, Lai Y-L, Lin C-P, Shih C-H, Huang C-C, Hsu W-L, Hwang J-K. On the relationship between the protein structure and protein dynamics. Proteins: Struct Funct Bioinform. 
2008;72(2):625\u201334.","DOI":"10.1002\/prot.21954"},{"issue":"12","key":"4902_CR9","doi-asserted-by":"publisher","first-page":"4993","DOI":"10.1016\/j.bpj.2009.03.051","volume":"96","author":"R Potestio","year":"2009","unstructured":"Potestio R, Pontiggia F, Micheletti C. Coarse-grained description of protein internal dynamics: an optimal strategy for decomposing proteins in rigid subunits. Biophys J. 2009;96(12):4993\u20135002.","journal-title":"Biophys J"},{"key":"4902_CR10","first-page":"485","volume":"2","author":"S Veretnik","year":"2009","unstructured":"Veretnik S, Gu J, Wodak S. Identifying structural domains in proteins. Struct Bioinform. 2009;2:485\u2013513.","journal-title":"Struct Bioinform"},{"issue":"4","key":"4902_CR11","first-page":"536","volume":"247","author":"AG Murzin","year":"1995","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C. Scop: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247(4):536\u201340.","journal-title":"J Mol Biol"},{"issue":"6","key":"4902_CR12","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1093\/protein\/8.6.513","volume":"8","author":"SA Islam","year":"1995","unstructured":"Islam SA, Luo J, Sternberg MJ. Identification and analysis of domains in proteins. Protein Eng Des Sel. 1995;8(6):513\u201326.","journal-title":"Protein Eng Des Sel"},{"issue":"8","key":"4902_CR13","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","volume":"5","author":"CA Orengo","year":"1997","unstructured":"Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. Cath-a hierarchic classification of protein domain structures. Structure. 1997;5(8):1093\u2013109.","journal-title":"Structure"},{"issue":"1","key":"4902_CR14","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1016\/0022-2836(74)90136-3","volume":"85","author":"MG Rossman","year":"1974","unstructured":"Rossman MG, Liljas A. 
Recognition of structural domains in globular proteins. J Mol Biol. 1974;85(1):177\u201381.","journal-title":"J Mol Biol"},{"issue":"3","key":"4902_CR15","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1016\/0022-2836(78)90043-8","volume":"126","author":"GM Crippen","year":"1978","unstructured":"Crippen GM. The tree structural organization of proteins. J Mol Biol. 1978;126(3):315\u201332.","journal-title":"J Mol Biol"},{"issue":"3","key":"4902_CR16","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1016\/0022-2836(79)90363-2","volume":"134","author":"GD Rose","year":"1979","unstructured":"Rose GD. Hierarchic organization of domains in globular proteins. J Mol Biol. 1979;134(3):447\u201370.","journal-title":"J Mol Biol"},{"issue":"23","key":"4902_CR17","doi-asserted-by":"publisher","first-page":"6544","DOI":"10.1021\/bi00526a005","volume":"20","author":"SJ Wodak","year":"1981","unstructured":"Wodak SJ, Janin J. Location of structural domains in proteins. Biochemistry. 1981;20(23):6544\u201352.","journal-title":"Biochemistry"},{"key":"4902_CR18","doi-asserted-by":"crossref","unstructured":"Holm L, Sander C. Parser for protein folding units. Proteins: Struct Funct Bioinform. 1994;19(3):256\u201368.","DOI":"10.1002\/prot.340190309"},{"issue":"1","key":"4902_CR19","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1002\/pro.5560040113","volume":"4","author":"MB Swindells","year":"1995","unstructured":"Swindells MB. A procedure for detecting structural domains in proteins. Protein Sci. 1995;4(1):103\u201312.","journal-title":"Protein Sci"},{"issue":"5","key":"4902_CR20","doi-asserted-by":"publisher","first-page":"872","DOI":"10.1002\/pro.5560040507","volume":"4","author":"AS Siddiqui","year":"1995","unstructured":"Siddiqui AS, Barton GJ. Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci. 
1995;4(5):872\u201384.","journal-title":"Protein Sci"},{"key":"4902_CR21","doi-asserted-by":"crossref","unstructured":"Veretnik S, Shindyalov I. Computational methods for domain partitioning of protein structures. In: Computational Methods for Protein Structure Prediction and Modeling, 2007:125\u2013145. Springer, Berlin","DOI":"10.1007\/978-0-387-68372-0_4"},{"issue":"12","key":"4902_CR22","doi-asserted-by":"publisher","first-page":"1091","DOI":"10.1093\/bioinformatics\/16.12.1091","volume":"16","author":"Y Xu","year":"2000","unstructured":"Xu Y, Xu D, Gabow HN. Protein domain decomposition using a graph-theoretic approach. Bioinformatics. 2000;16(12):1091\u2013104.","journal-title":"Bioinformatics"},{"issue":"3","key":"4902_CR23","doi-asserted-by":"publisher","first-page":"944","DOI":"10.1093\/nar\/gkg189","volume":"31","author":"J-T Guo","year":"2003","unstructured":"Guo J-T, Xu D, Kim D, Xu Y. Improving the performance of domainparser for structural domain partition using neural network. Nucleic Acids Res. 2003;31(3):944\u201352.","journal-title":"Nucleic Acids Res"},{"key":"4902_CR24","doi-asserted-by":"crossref","unstructured":"Sistla RK, KV B, Vishveshwara S. Identification of domains and domain interface residues in multidomain proteins from graph spectral method. Proteins: Struct Funct Bioinform. 2005;59(3):616\u201326.","DOI":"10.1002\/prot.20444"},{"key":"4902_CR25","doi-asserted-by":"crossref","unstructured":"Wernisch L, Hunting M, Wodak SJ. Identification of structural domains in proteins by a graph heuristic. Proteins: Struct Funct Bioinform. 1999;35(3):338\u201352.","DOI":"10.1002\/(SICI)1097-0134(19990515)35:3<338::AID-PROT8>3.0.CO;2-I"},{"key":"4902_CR26","doi-asserted-by":"crossref","unstructured":"Ansari ES, Eslahchi C, Pezeshk H, Sadeghi M. Prodomas, protein domain assignment algorithm using center-based clustering and independent dominating set. Proteins: Struct Funct Bioinform. 
2014;82(9):1937\u201346.","DOI":"10.1002\/prot.24547"},{"issue":"2","key":"4902_CR27","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1051\/ro\/2015040","volume":"50","author":"M Milostan","year":"2016","unstructured":"Milostan M, Lukasiak P. Domgen-graph based method for protein domain delineation. RAIRO-Oper Res. 2016;50(2):363\u201374.","journal-title":"RAIRO-Oper. Res."},{"key":"4902_CR28","doi-asserted-by":"crossref","unstructured":"Kundu S, Sorensen DC, Phillips\u00a0Jr GN. Automatic domain decomposition of proteins by a gaussian network model. Proteins: Struct Funct Bioinform. 2004;57(4):725\u201333.","DOI":"10.1002\/prot.20268"},{"key":"4902_CR29","doi-asserted-by":"crossref","unstructured":"Taylor TJ, Vaisman II. Protein structural domain assignment with a Delaunay tessellation derived lattice. In: 2006 3rd International Symposium on Voronoi Diagrams in Science and Engineering, 2006;232\u2013240. IEEE","DOI":"10.1109\/ISVD.2006.29"},{"issue":"3","key":"4902_CR30","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1093\/proeng\/gzg026","volume":"16","author":"IN Berezovsky","year":"2003","unstructured":"Berezovsky IN. Discrete structure of van der Waals domains in globular proteins. Protein Eng. 2003;16(3):161\u20137.","journal-title":"Protein Eng"},{"issue":"01","key":"4902_CR31","doi-asserted-by":"publisher","first-page":"1340012","DOI":"10.1142\/S021972001340012X","volume":"11","author":"SS Arab","year":"2013","unstructured":"Arab SS, Gharamaleki MP, Pashandi Z, Mobasseri R. Putracer: a novel method for identification of continuous-domains in multi-domain proteins. J Bioinform Comput Biol. 2013;11(01):1340012.","journal-title":"J Bioinform Comput Biol"},{"issue":"3","key":"4902_CR32","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1093\/protein\/12.3.203","volume":"12","author":"WR Taylor","year":"1999","unstructured":"Taylor WR. Protein structural domain identification. Protein Eng. 
1999;12(3):203\u201316.","journal-title":"Protein Eng"},{"issue":"1","key":"4902_CR33","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1007\/s002490050246","volume":"29","author":"Z-Y Xuan","year":"2000","unstructured":"Xuan Z-Y, Ling L-J, Chen R-S. A new method for protein domain recognition. Eur Biophys J. 2000;29(1):7\u201316.","journal-title":"Eur Biophys J"},{"issue":"3","key":"4902_CR34","doi-asserted-by":"publisher","first-page":"506","DOI":"10.1002\/pro.5560040317","volume":"4","author":"R Sowdhamini","year":"1995","unstructured":"Sowdhamini R, Blundell TL. An automatic method involving cluster analysis of secondary structures for the identification of domains in proteins. Protein Sci. 1995;4(3):506\u201320.","journal-title":"Protein Sci"},{"issue":"1","key":"4902_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-8-237","volume":"8","author":"F Emmert-Streib","year":"2007","unstructured":"Emmert-Streib F, Mushegian A. A topological algorithm for identification of structural domains of proteins. BMC Bioinform. 2007;8(1):1\u201310.","journal-title":"BMC Bioinform."},{"issue":"10","key":"4902_CR36","doi-asserted-by":"publisher","first-page":"3331","DOI":"10.1021\/jp210568a","volume":"116","author":"A Genoni","year":"2012","unstructured":"Genoni A, Morra G, Colombo G. Identification of domains in protein structures from the analysis of intramolecular interactions. J Phys Chem B. 2012;116(10):3331\u201343.","journal-title":"J Phys Chem B"},{"issue":"4","key":"4902_CR37","doi-asserted-by":"publisher","first-page":"778","DOI":"10.1107\/S0021889807023874","volume":"40","author":"O Carugo","year":"2007","unstructured":"Carugo O. Identification of domains in protein crystal structures. J Appl Crystallogr. 2007;40(4):778\u201381.","journal-title":"J Appl Crystallogr"},{"key":"4902_CR38","doi-asserted-by":"crossref","unstructured":"Madej T, Gibrat J-F, Bryant SH. Threading a database of protein cores. 
Proteins: Struct Funct Bioinform. 1995;23(3):356\u201369.","DOI":"10.1002\/prot.340230309"},{"issue":"5","key":"4902_CR39","doi-asserted-by":"publisher","first-page":"947","DOI":"10.1110\/ps.062597307","volume":"16","author":"H Zhou","year":"2007","unstructured":"Zhou H, Xue B, Zhou Y. Ddomain: dividing structures into domains using a normalized domain\u2013domain interaction profile. Protein Sci. 2007;16(5):947\u201355.","journal-title":"Protein Sci"},{"issue":"3","key":"4902_CR40","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1093\/bioinformatics\/btg006","volume":"19","author":"N Alexandrov","year":"2003","unstructured":"Alexandrov N, Shindyalov I. Pdp: protein domain parser. Bioinformatics. 2003;19(3):429\u201330.","journal-title":"Bioinformatics"},{"issue":"1","key":"4902_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-11-310","volume":"11","author":"K Alden","year":"2010","unstructured":"Alden K, Veretnik S, Bourne PE. dConsensus: a tool for displaying domain assignments by multiple structure-based algorithms and for construction of a consensus assignment. BMC Bioinform. 2010;11(1):1\u20137.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4902_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-13-286","volume":"13","author":"HJ Feldman","year":"2012","unstructured":"Feldman HJ. Identifying structural domains of proteins using clustering. BMC Bioinform. 2012;13(1):1\u201312.","journal-title":"BMC Bioinform"},{"issue":"1","key":"4902_CR43","doi-asserted-by":"publisher","first-page":"1600552","DOI":"10.1126\/sciadv.1600552","volume":"3","author":"G Postic","year":"2017","unstructured":"Postic G, Ghouzam Y, Chebrek R, Gelly J-C. An ambiguity principle for assigning protein structural domains. Sci Adv. 2017;3(1):1600552.","journal-title":"Sci Adv"},{"key":"4902_CR44","doi-asserted-by":"crossref","unstructured":"Koczyk G, Berezovsky IN. 
Domain hierarchy and closed loops (DHCL): a server for exploring hierarchy of protein domain structure. Nucleic Acids Res. 36(suppl_2), 2008:239\u201345.","DOI":"10.1093\/nar\/gkn326"},{"issue":"7","key":"4902_CR45","doi-asserted-by":"publisher","first-page":"1040","DOI":"10.1093\/bioinformatics\/bts076","volume":"28","author":"F Samson","year":"2012","unstructured":"Samson F, Shrager R, Tai C-H, Sam V, Lee B, Munson PJ, Gibrat J-F, Garnier J. Domire: a web server for identifying structural domains and their neighbors in proteins. Bioinformatics. 2012;28(7):1040\u20131.","journal-title":"Bioinformatics"},{"issue":"1","key":"4902_CR46","first-page":"1","volume":"4","author":"Y Hua","year":"2014","unstructured":"Hua Y, Zhu M, Wang Y, Xie Z, Li M. A hybrid method for identification of structural domains. Sci Rep. 2014;4(1):1\u20137.","journal-title":"Sci Rep"},{"key":"4902_CR47","volume-title":"Pattern Recognition","author":"S Theodoridis","year":"2009","unstructured":"Theodoridis S. Pattern Recognition. Burlington: Academic Press; 2009."},{"key":"4902_CR48","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/j.neunet.2012.03.001","volume":"31","author":"F Fouss","year":"2012","unstructured":"Fouss F, Francoisse K, Yen L, Pirotte A, Saerens M. An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Netw. 2012;31:53\u201372.","journal-title":"Neural Netw"},{"key":"4902_CR49","doi-asserted-by":"crossref","unstructured":"Oneto L, Navarin N, Sperduti A, Anguita D. Deep graph node kernels: a convex approach. In: 2017 International joint conference on neural networks (IJCNN), 2017:316\u2013323. IEEE","DOI":"10.1109\/IJCNN.2017.7965871"},{"key":"4902_CR50","unstructured":"Kondor RI, Lafferty J. Diffusion kernels on graphs and other discrete structures. 
In: Proceedings of the 19th international conference on machine learning, 2002;2002:315\u201322."},{"issue":"D1","key":"4902_CR51","doi-asserted-by":"publisher","first-page":"475","DOI":"10.1093\/nar\/gky1134","volume":"47","author":"J-M Chandonia","year":"2019","unstructured":"Chandonia J-M, Fox NK, Brenner SE. Scope: classification of large macromolecular structures in the structural classification of proteins-extended database. Nucleic Acids Res. 2019;47(D1):475\u201381.","journal-title":"Nucleic Acids Res"},{"key":"4902_CR52","doi-asserted-by":"crossref","unstructured":"Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolym: Original Res Biomol. 1983;22(12):2577\u2013637.","DOI":"10.1002\/bip.360221211"},{"key":"4902_CR53","doi-asserted-by":"crossref","unstructured":"Joosten RP, Te\u00a0Beek TA, Krieger E, Hekkelman ML, Hooft RW, Schneider R, Sander C, Vriend G. A series of pdb related databases for everyday needs. Nucleic Acids Res. 39(suppl_1), 2010:411\u20139.","DOI":"10.1093\/nar\/gkq1105"},{"issue":"3","key":"4902_CR54","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1016\/0022-2836(87)90038-6","volume":"196","author":"S Miller","year":"1987","unstructured":"Miller S, Janin J, Lesk AM, Chothia C. Interior and surface of monomeric proteins. J Mol Biol. 1987;196(3):641\u201356.","journal-title":"J Mol Biol"},{"issue":"1","key":"4902_CR55","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/0022-2836(82)90515-0","volume":"157","author":"J Kyte","year":"1982","unstructured":"Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol. 1982;157(1):105\u201332.","journal-title":"J Mol Biol"},{"issue":"4","key":"4902_CR56","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1134\/S0026893308040195","volume":"42","author":"MY Lobanov","year":"2008","unstructured":"Lobanov MY, Bogatyreva N, Galzitskaya O. 
Radius of gyration as an indicator of protein structure compactness. Mol Biol. 2008;42(4):623\u20138.","journal-title":"Mol Biol"},{"key":"4902_CR57","doi-asserted-by":"crossref","unstructured":"Wang S, Yao X. Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE Symposium on Computational Intelligence and Data Mining, 2009:324\u2013331. IEEE","DOI":"10.1109\/CIDM.2009.4938667"},{"key":"4902_CR58","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. Smote: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321\u201357.","journal-title":"J Artif Intell Res."},{"issue":"6","key":"4902_CR59","doi-asserted-by":"publisher","first-page":"725","DOI":"10.1121\/1.1906679","volume":"22","author":"A Bavelas","year":"1950","unstructured":"Bavelas A. Communication patterns in task-oriented groups. J Acoust Soc Am. 1950;22(6):725\u201330.","journal-title":"J Acoust Soc Am"},{"key":"4902_CR60","doi-asserted-by":"crossref","unstructured":"Freeman LC. A set of measures of centrality based on betweenness. Sociometry, 1977:35\u201341.","DOI":"10.2307\/3033543"},{"issue":"2","key":"4902_CR61","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1093\/oxfordjournals.aob.a083391","volume":"18","author":"B Hopkins","year":"1954","unstructured":"Hopkins B, Skellam JG. A new method for determining the type of distribution of plant individuals. Ann Bot. 1954;18(2):213\u201327.","journal-title":"Ann Bot"},{"key":"4902_CR62","doi-asserted-by":"crossref","unstructured":"Hartigan JA, Hartigan PM. The dip test of unimodality. Ann Stat., 1985;70\u201384.","DOI":"10.1214\/aos\/1176346577"},{"issue":"2","key":"4902_CR63","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1177\/104649647100200201","volume":"2","author":"PW Holland","year":"1971","unstructured":"Holland PW, Leinhardt S. 
Transitivity in structural models of small groups. Compar Group Stud. 1971;2(2):107\u201324.","journal-title":"Compar Group Stud"},{"key":"4902_CR64","doi-asserted-by":"crossref","unstructured":"Watts DJ, Strogatz SH. Collective dynamics of \u2018small-world\u2019 networks. Nature 393(6684), 1998:440\u20132.","DOI":"10.1038\/30918"},{"issue":"1","key":"4902_CR65","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1016\/j.acha.2005.07.004","volume":"21","author":"B Nadler","year":"2006","unstructured":"Nadler B, Lafon S, Coifman RR, Kevrekidis IG. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl Comput Harmon Anal. 2006;21(1):113\u201327.","journal-title":"Appl Comput Harmon Anal"},{"key":"4902_CR66","doi-asserted-by":"crossref","unstructured":"Pons P, Latapy M. Computing communities in large networks using random walks. In: International symposium on computer and information sciences, 2005:284\u2013293. Springer","DOI":"10.1007\/11569596_31"},{"issue":"11","key":"4902_CR67","doi-asserted-by":"publisher","first-page":"1054","DOI":"10.1007\/s11427-014-4745-8","volume":"57","author":"B Chen","year":"2014","unstructured":"Chen B, Li M, Wang J, Wu F-X. Disease gene identification by using graph kernels and Markov random fields. Sci China Life Sci. 2014;57(11):1054\u201363.","journal-title":"Sci China Life Sci"},{"key":"4902_CR68","doi-asserted-by":"crossref","unstructured":"Smola AJ, Kondor R. Kernels and regularization on graphs. In: Learning Theory and Kernel Machines, 2003:144\u2013158. Springer, Berlin","DOI":"10.1007\/978-3-540-45167-9_12"},{"key":"4902_CR69","first-page":"125","volume":"9","author":"PY Chebotarev","year":"1997","unstructured":"Chebotarev PY, Shamis E. A matrix-forest theorem and measuring relations in small social group. Avtomatika i Telemekhanika. 
1997;9:125\u201337.","journal-title":"Avtomatika i Telemekhanika"},{"key":"4902_CR70","doi-asserted-by":"crossref","unstructured":"Fouss F, Yen L, Pirotte A, Saerens M. An experimental investigation of graph kernels on a collaborative recommendation task. In: Sixth International Conference on Data Mining (ICDM\u201906), 2006:863\u2013868. IEEE","DOI":"10.1109\/ICDM.2006.18"},{"issue":"10","key":"4902_CR71","doi-asserted-by":"publisher","first-page":"1119","DOI":"10.1107\/S2059798316013218","volume":"72","author":"JJ Tanner","year":"2016","unstructured":"Tanner JJ. Empirical power laws for the radii of gyration of protein oligomers. Acta Crystallographica Sect D: Struct Biol. 2016;72(10):1119\u201329.","journal-title":"Acta Crystallographica Sect D: Struct Biol"},{"key":"4902_CR72","unstructured":"Stella XY, Shi J. Multiclass spectral clustering. In: ICCV, 2003:313\u2013319."},{"key":"4902_CR73","first-page":"849","volume":"14","author":"A Ng","year":"2001","unstructured":"Ng A, Jordan M, Weiss Y. On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst. 2001;14:849\u201356.","journal-title":"Adv Neural Inf Process Syst"},{"key":"4902_CR74","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","volume":"20","author":"PJ Rousseeuw","year":"1987","unstructured":"Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53\u201365.","journal-title":"J Comput Appl Math"},{"issue":"D1","key":"4902_CR75","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1093\/nar\/gkw1098","volume":"45","author":"NL Dawson","year":"2017","unstructured":"Dawson NL, Lewis TE, Das S, Lees JG, Lee D, Ashford P, Orengo CA, Sillitoe I. Cath: an expanded resource to predict protein function through structure and sequence. Nucleic Acids Res. 
2017;45(D1):289\u201395.","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"4902_CR76","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1002\/pro.5560070202","volume":"7","author":"S Jones","year":"1998","unstructured":"Jones S, Stewart M, Michie A, Swindells MB, Orengo C, Thornton JM. Domain assignment for protein structures using a consensus approach: characterization and analysis. Protein Sci. 1998;7(2):233\u201342.","journal-title":"Protein Sci"},{"issue":"336","key":"4902_CR77","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","volume":"66","author":"WM Rand","year":"1971","unstructured":"Rand WM. Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66(336):846\u201350.","journal-title":"J Am Stat Assoc"},{"issue":"1","key":"4902_CR78","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2(1):193\u2013218.","journal-title":"J Classif"},{"key":"4902_CR79","first-page":"2837","volume":"11","author":"NX Vinh","year":"2010","unstructured":"Vinh NX, Epps J, Bailey J. Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res. 2010;11:2837\u201354.","journal-title":"J Mach Learn Res"},{"issue":"3","key":"4902_CR80","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1016\/j.jmb.2004.03.053","volume":"339","author":"S Veretnik","year":"2004","unstructured":"Veretnik S, Bourne PE, Alexandrov NN, Shindyalov IN. Toward consistent assignment of structural domains in proteins. J Mol Biol. 2004;339(3):647\u201378.","journal-title":"J Mol Biol"},{"issue":"1","key":"4902_CR81","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1093\/bioinformatics\/btq610","volume":"27","author":"J-C Gelly","year":"2011","unstructured":"Gelly J-C, de Brevern AG. 
Protein peeling 3D: new tools for analyzing protein structures. Bioinformatics. 2011;27(1):132\u20133.","journal-title":"Bioinformatics"},{"key":"4902_CR82","doi-asserted-by":"crossref","unstructured":"Tran-Van D, Sperduti A, Costa F. Link enrichment for diffusion-based graph node kernels. In: International conference on artificial neural networks, 2017:155\u2013162. Springer","DOI":"10.1007\/978-3-319-68612-7_18"},{"key":"4902_CR83","unstructured":"Navarin N, Sperduti A. Approximated neighbours minhash graph node kernel. In: ESANN, 2017:281\u2013286."},{"issue":"10","key":"4902_CR84","doi-asserted-by":"publisher","first-page":"3390","DOI":"10.1093\/nar\/gki615","volume":"33","author":"L Brocchieri","year":"2005","unstructured":"Brocchieri L, Karlin S. Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res. 2005;33(10):3390\u2013400.","journal-title":"Nucleic Acids Res"},{"key":"4902_CR85","doi-asserted-by":"crossref","unstructured":"Sch\u00f6lkopf B, Smola A, M\u00fcller K-R. Kernel principal component analysis. In: International Conference on Artificial Neural Networks, 1997:pp. 583\u2013588. Springer","DOI":"10.1007\/BFb0020217"},{"issue":"5","key":"4902_CR86","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1162\/089976698300017467","volume":"10","author":"B Sch\u00f6lkopf","year":"1998","unstructured":"Sch\u00f6lkopf B, Smola A, M\u00fcller K-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998;10(5):1299\u2013319.","journal-title":"Neural Comput"},{"issue":"5167","key":"4902_CR87","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1126\/science.7516581","volume":"264","author":"MR Sawaya","year":"1994","unstructured":"Sawaya MR, Pelletier H, Kumar A, Wilson SH, Kraut J. Crystal structure of rat DNA polymerase beta: evidence for a common polymerase mechanism. Science. 
1994;264(5167):1930\u20135.","journal-title":"Science"},{"issue":"1","key":"4902_CR88","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1016\/S0092-8674(01)00515-3","volume":"107","author":"H Ling","year":"2001","unstructured":"Ling H, Boudsocq F, Woodgate R, Yang W. Crystal structure of a y-family DNA polymerase in action: a mechanism for error-prone and lesion-bypass replication. Cell. 2001;107(1):91\u2013102.","journal-title":"Cell"},{"issue":"4","key":"4902_CR89","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1007\/s000180050165","volume":"54","author":"A Matagne","year":"1998","unstructured":"Matagne A, Dobson CM. The folding process of hen lysozyme: a perspective from the \u2018new view\u2019. Cell Mol Life Sci. 1998;54(4):363\u201371.","journal-title":"Cell Mol Life Sci"},{"key":"4902_CR90","doi-asserted-by":"crossref","unstructured":"Gilquin B, Guilbert C, Perahia D. Unfolding of hen egg lysozyme by molecular dynamics simulations at 300k: insight into the role of the interdomain interface. Proteins: Struct Funct Bioinform. 2000;41(1):58\u201374.","DOI":"10.1002\/1097-0134(20001001)41:1<58::AID-PROT90>3.0.CO;2-3"},{"issue":"6","key":"4902_CR91","doi-asserted-by":"publisher","first-page":"682","DOI":"10.1016\/j.molcel.2011.02.027","volume":"41","author":"A Khushoo","year":"2011","unstructured":"Khushoo A, Yang Z, Johnson AE, Skach WR. Ligand-driven vectorial folding of ribosome-bound human CFTR NBD1. Mol Cell. 2011;41(6):682\u201392.","journal-title":"Mol Cell"},{"key":"4902_CR92","first-page":"201","volume":"3","author":"SJ Kim","year":"2012","unstructured":"Kim SJ, Skach WR. Mechanisms of CFTR folding at the endoplasmic reticulum. Front Pharmacol. 2012;3:201.","journal-title":"Front Pharmacol"},{"issue":"1","key":"4902_CR93","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1016\/j.bpj.2019.10.028","volume":"118","author":"M Warias","year":"2020","unstructured":"Warias M, Grubm\u00fcller H, Bock LV. 
tRNA dissociation from EF-Tu after GTP hydrolysis: primary steps and antibiotic inhibition. Biophys J. 2020;118(1):151\u201361.","journal-title":"Biophys J"},{"issue":"1","key":"4902_CR94","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-018-2025-5","volume":"19","author":"G Zampieri","year":"2018","unstructured":"Zampieri G, Van Tran D, Donini M, Navarin N, Aiolli F, Sperduti A, Valle G. Scuba: scalable kernel-based gene prioritization. BMC Bioinform. 2018;19(1):1\u201312.","journal-title":"BMC Bioinform"},{"key":"4902_CR95","doi-asserted-by":"crossref","unstructured":"Bett DK, Mondal AM. Diffusion kernel to identify missing ppis in protein network biomarker. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2015;1614\u20139. IEEE","DOI":"10.1109\/BIBM.2015.7359917"},{"key":"4902_CR96","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1016\/j.neucom.2018.01.089","volume":"298","author":"DT Van","year":"2018","unstructured":"Van DT, Sperduti A, Costa F. The conjunctive disjunctive graph node kernel for disease gene prioritization. Neurocomputing. 2018;298:90\u20139.","journal-title":"Neurocomputing"},{"issue":"2","key":"4902_CR97","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1007\/s11063-017-9742-z","volume":"48","author":"L Oneto","year":"2018","unstructured":"Oneto L, Navarin N, Sperduti A, Anguita D. Multilayer graph node kernels: stacking while maintaining convexity. Neural Process Lett. 
2018;48(2):649\u201367.","journal-title":"Neural Process Lett"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04902-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-022-04902-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-022-04902-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,3]],"date-time":"2024-10-03T13:45:01Z","timestamp":1727963101000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-022-04902-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,8]]},"references-count":97,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["4902"],"URL":"https:\/\/doi.org\/10.1186\/s12859-022-04902-9","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2022,9,8]]},"assertion":[{"value":"31 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 September 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"369"}}