{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"institution":[{"name":"Research Square"}],"indexed":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T06:19:08Z","timestamp":1747203548162,"version":"3.40.5"},"posted":{"date-parts":[[2020,2,11]]},"group-title":"In Review","reference-count":65,"publisher":"Springer Science and Business Media LLC","license":[{"start":{"date-parts":[[2020,2,11]],"date-time":"2020-02-11T00:00:00Z","timestamp":1581379200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"accepted":{"date-parts":[[2019,10,2]]},"abstract":"<title>Abstract<\/title>\n        <p>Background Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences leads to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional \u201cdot plot\u201d protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric. Results Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decay quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. We assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB for method testing. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence without needing structural information. 
Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered and a large majority (82.9%) of clusters containing between 5 and 200 members were of a single functional type. Conclusions Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.<\/p>","DOI":"10.21203\/rs.2.15797\/v2","type":"posted-content","created":{"date-parts":[[2020,2,11]],"date-time":"2020-02-11T19:35:41Z","timestamp":1581449741000},"source":"Crossref","is-referenced-by-count":0,"title":["Self-Analysis of Repeat Proteins Reveals Evolutionarily Conserved Patterns"],"prefix":"10.21203","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1844-6997","authenticated-orcid":false,"given":"Matthew","family":"Merski","sequence":"first","affiliation":[{"name":"Uniwersytet Warszawski"}]},{"given":"Krzysztof","family":"M\u0142ynarczyk","sequence":"additional","affiliation":[{"name":"Uniwersytet Warszawski"}]},{"given":"Jan","family":"Ludwiczak","sequence":"additional","affiliation":[{"name":"Uniwersytet Warszawski"}]},{"given":"Jakub","family":"Skrzeczkowski","sequence":"additional","affiliation":[{"name":"Uniwersytet Warszawski"}]},{"given":"Stanis\u0142aw","family":"Dunin-Horkawicz","sequence":"additional","affiliation":[{"name":"Uniwersytet Warszawski"}]},{"given":"Maria W.","family":"G\u00f3rna","sequence":"additional","affiliation":[{"name":"Uniwersytet Warszawski"}]}],"member":"297","reference":[{"issue":"5","key":"ref1","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1007\/BF01025494","article-title":"Relation between Sequence Similarity and Structural Similarity in Proteins - Role of Important Properties of Amino-Acids","volume":"4","author":"Kidera 
A","year":"1985","unstructured":"Kidera A, Konishi Y, Ooi T, Scheraga HA. Relation between Sequence Similarity and Structural Similarity in Proteins - Role of Important Properties of Amino-Acids. J Protein Chem. 1985;4(5):265\u201397.","journal-title":"J Protein Chem"},{"issue":"6","key":"ref2","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1093\/bioinformatics\/btm006","article-title":"On the relationship between sequence and structure similarities in proteomics","volume":"23","author":"Krissinel E","year":"2007","unstructured":"Krissinel E. On the relationship between sequence and structure similarities in proteomics. Bioinformatics. 2007;23(6):717\u201323.","journal-title":"Bioinformatics"},{"key":"ref3","article-title":"Intrinsically Disordered Proteins and Their \"Mysterious\" (Meta)Physics","volume":"7","author":"Uversky VN","year":"2019","unstructured":"Uversky VN. Intrinsically Disordered Proteins and Their \"Mysterious\" (Meta)Physics. Front Phys-Lausanne 2019, 7.","journal-title":"Front Phys-Lausanne"},{"key":"ref4","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2148-12-155","article-title":"Dissecting the role of low-complexity regions in the evolution of vertebrate proteins","volume":"12","author":"Rado-Trilla N","year":"2012","unstructured":"Rado-Trilla N, Alba MM. Dissecting the role of low-complexity regions in the evolution of vertebrate proteins. Bmc Evol Biol 2012, 12.","journal-title":"Bmc Evol Biol"},{"issue":"4","key":"ref5","doi-asserted-by":"crossref","first-page":"879","DOI":"10.1021\/pr060048x","article-title":"Conservation of intrinsic disorder in protein domains and families: I. A database of conserved predicted disordered regions","volume":"5","author":"Chen JW","year":"2006","unstructured":"Chen JW, Romero P, Uversky VN, Dunker AK. Conservation of intrinsic disorder in protein domains and families: I. A database of conserved predicted disordered regions. J Proteome Res. 
2006;5(4):879\u201387.","journal-title":"J Proteome Res"},{"issue":"2","key":"ref6","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1039\/C4MB00425F","article-title":"Low complexity and disordered regions of proteins have different structural and amino acid preferences","volume":"11","author":"Kumari B","year":"2015","unstructured":"Kumari B, Kumar R, Kumar M. Low complexity and disordered regions of proteins have different structural and amino acid preferences. Mol Biosyst. 2015;11(2):585\u201394.","journal-title":"Mol Biosyst"},{"issue":"00","key":"ref7","first-page":"1","article-title":"Disentangling the complexity of low complexity proteins","volume":"00","author":"Mier P","year":"2019","unstructured":"Mier P, Paladin L, Taman S, Petrosian S, Hajdu-Soltesz B, Urbanek A, Gruca A, Plewczynski D, Grynberg M, Bernado P, et al. Disentangling the complexity of low complexity proteins. Brief Bioinform. 2019;00(00):1\u201315.","journal-title":"Brief Bioinform"},{"issue":"3","key":"ref8","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/j.jsb.2011.08.009","article-title":"Tandem repeats in proteins: From sequence to structure","volume":"179","author":"Kajava AV","year":"2012","unstructured":"Kajava AV. Tandem repeats in proteins: From sequence to structure. J Struct Biol. 2012;179(3):279\u201388.","journal-title":"J Struct Biol"},{"issue":"D1","key":"ref9","doi-asserted-by":"crossref","first-page":"D308","DOI":"10.1093\/nar\/gkw1136","article-title":"RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures","volume":"45","author":"Paladin L","year":"2017","unstructured":"Paladin L, Hirsh L, Piovesan D, Andrade-Navarro MA, Kajava AV, Tosatto SCE. RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures. Nucleic Acids Res. 
2017;45(D1):D308\u201312.","journal-title":"Nucleic Acids Res"},{"issue":"12","key":"ref10","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1111\/j.1742-4658.2010.07684.x","article-title":"Protein tandem repeats - the more perfect, the less structured","volume":"277","author":"Jorda J","year":"2010","unstructured":"Jorda J, Xue B, Uversky VN, Kajava AV. Protein tandem repeats - the more perfect, the less structured. Febs J. 2010;277(12):2673\u201382.","journal-title":"Febs J"},{"issue":"3","key":"ref11","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1128\/IAI.01035-12","article-title":"Tetratricopeptide Repeat Motifs in the World of Bacterial Pathogens: Role in Virulence Mechanisms","volume":"81","author":"Cerveny L","year":"2013","unstructured":"Cerveny L, Straskova A, Dankova V, Hartlova A, Ceckova M, Staud F, Stulik J. Tetratricopeptide Repeat Motifs in the World of Bacterial Pathogens: Role in Virulence Mechanisms. Infect Immun. 2013;81(3):629\u201335.","journal-title":"Infect Immun"},{"issue":"12","key":"ref12","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1016\/j.tplants.2008.10.001","article-title":"Pentatricopeptide repeat proteins: a socket set for organelle gene expression","volume":"13","author":"Schmitz-Linneweber C","year":"2008","unstructured":"Schmitz-Linneweber C, Small I. Pentatricopeptide repeat proteins: a socket set for organelle gene expression. Trends Plant Sci. 2008;13(12):663\u201370.","journal-title":"Trends Plant Sci"},{"issue":"6671","key":"ref13","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1038\/32204","article-title":"The 1.7 angstrom crystal structure of the regulator of chromosome condensation (RCC1) reveals a seven-bladed propeller","volume":"392","author":"Renault L","year":"1998","unstructured":"Renault L, Nassar N, Vetter I, Becker J, Klebe C, Roth M, Wittinghofer A. The 1.7 angstrom crystal structure of the regulator of chromosome condensation (RCC1) reveals a seven-bladed propeller. 
Nature. 1998;392(6671):97\u2013101.","journal-title":"Nature"},{"issue":"6","key":"ref14","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0100015","article-title":"Interferon-Induced Genes of the Expanded IFIT Family Show Conserved Antiviral Activities in Non-Mammalian Species","volume":"9","author":"Varela M","year":"2014","unstructured":"Varela M, Diaz-Rosales P, Pereiro P, Forn-Cuni G, Costa MM, Dios S, Romero A, Figueras A, Novoa B. Interferon-Induced Genes of the Expanded IFIT Family Show Conserved Antiviral Activities in Non-Mammalian Species. Plos One 2014, 9(6).","journal-title":"Plos One"},{"issue":"17","key":"ref15","doi-asserted-by":"crossref","first-page":"9292","DOI":"10.1073\/pnas.93.17.9292","article-title":"SPINDLY, a tetratricopeptide repeat protein involved in gibberellin signal transduction Arabidopsis","volume":"93","author":"Jacobsen SE","year":"1996","unstructured":"Jacobsen SE, Binkowski KA, Olszewski NE. SPINDLY, a tetratricopeptide repeat protein involved in gibberellin signal transduction Arabidopsis. P Natl Acad Sci USA. 1996;93(17):9292\u20136.","journal-title":"P Natl Acad Sci USA"},{"key":"ref16","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-13-S3-S8","article-title":"Ab initio detection of fuzzy amino acid tandem repeats in protein sequences","volume":"13","author":"Pellegrini M","year":"2012","unstructured":"Pellegrini M, Renda ME, Vecchio A. Ab initio detection of fuzzy amino acid tandem repeats in protein sequences. Bmc Bioinformatics 2012, 13.","journal-title":"Bmc Bioinformatics"},{"issue":"1","key":"ref17","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1006\/jmbi.1999.3136","article-title":"A census of protein repeats","volume":"293","author":"Marcotte EM","year":"1999","unstructured":"Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D. A census of protein repeats. J Mol Biol. 
1999;293(1):151\u201360.","journal-title":"J Mol Biol"},{"issue":"2\u20133","key":"ref18","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1006\/jsbi.2000.4328","article-title":"Review: Proteins with repeated sequence - Structural prediction and modeling","volume":"134","author":"Kajava AV","year":"2001","unstructured":"Kajava AV. Review: Proteins with repeated sequence - Structural prediction and modeling. J Struct Biol. 2001;134(2\u20133):132\u201344.","journal-title":"J Struct Biol"},{"key":"ref19","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.732","article-title":"Tandem-repeat protein domains across the tree of life","volume":"3","author":"Jernigan KK","year":"2015","unstructured":"Jernigan KK, Bordenstein SR. Tandem-repeat protein domains across the tree of life. Peerj 2015, 3.","journal-title":"Peerj"},{"issue":"20","key":"ref20","doi-asserted-by":"crossref","first-page":"10005","DOI":"10.1093\/nar\/gks726","article-title":"Repeat or not repeat?-Statistical validation of tandem repeat prediction in genomic sequences","volume":"40","author":"Schaper E","year":"2012","unstructured":"Schaper E, Kajava AV, Hauser A, Anisimova M. Repeat or not repeat?-Statistical validation of tandem repeat prediction in genomic sequences. Nucleic Acids Res. 2012;40(20):10005\u201317.","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"ref21","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/0092-8674(90)90745-Z","article-title":"A Repeating Amino-Acid Motif in Cdc23 Defines a Family of Proteins and a New Relationship among Genes Required for Mitosis and Rna-Synthesis","volume":"60","author":"Sikorski RS","year":"1990","unstructured":"Sikorski RS, Boguski MS, Goebl M, Hieter P. A Repeating Amino-Acid Motif in Cdc23 Defines a Family of Proteins and a New Relationship among Genes Required for Mitosis and Rna-Synthesis. Cell. 
1990;60(2):307\u201317.","journal-title":"Cell"},{"issue":"12","key":"ref22","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1016\/j.tibs.2003.10.007","article-title":"TPR proteins: the versatile helix","volume":"28","author":"D'Andrea LD","year":"2003","unstructured":"D'Andrea LD, Regan L. TPR proteins: the versatile helix. Trends Biochem Sci. 2003;28(12):655\u201362.","journal-title":"Trends Biochem Sci"},{"issue":"11","key":"ref23","doi-asserted-by":"crossref","first-page":"2055","DOI":"10.1016\/j.str.2015.07.022","article-title":"A Naturally Occurring Repeat Protein with High Internal Sequence Identity Defines a New Class of TPR-like Proteins","volume":"23","author":"Marold JD","year":"2015","unstructured":"Marold JD, Kavran JM, Bowman GD, Barrick D. A Naturally Occurring Repeat Protein with High Internal Sequence Identity Defines a New Class of TPR-like Proteins. Structure. 2015;23(11):2055\u201365.","journal-title":"Structure"},{"issue":"3","key":"ref24","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1007\/s00018-016-2319-6","article-title":"Metazoan evolution of the armadillo repeat superfamily","volume":"74","author":"Gul IS","year":"2017","unstructured":"Gul IS, Hulpiau P, Saeys Y, van Roy F. Metazoan evolution of the armadillo repeat superfamily. Cell Mol Life Sci. 2017;74(3):525\u201341.","journal-title":"Cell Mol Life Sci"},{"issue":"1","key":"ref25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1006\/jmbi.2001.4624","article-title":"Comparison of ARM and HEAT protein repeats","volume":"309","author":"Andrade MA","year":"2001","unstructured":"Andrade MA, Petosa C, O'Donoghue SI, Muller CW, Bork P. Comparison of ARM and HEAT protein repeats. J Mol Biol. 
2001;309(1):1\u201318.","journal-title":"J Mol Biol"},{"issue":"2","key":"ref26","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1038\/ng1095-115","article-title":"Heat Repeats in the Huntingtons-Disease Protein","volume":"11","author":"Andrade MA","year":"1995","unstructured":"Andrade MA, Bork P. Heat Repeats in the Huntingtons-Disease Protein. Nat Genet. 1995;11(2):115\u20136.","journal-title":"Nat Genet"},{"issue":"2\u20133","key":"ref27","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1006\/jsbi.2001.4392","article-title":"Protein repeats: Structures, functions, and evolution","volume":"134","author":"Andrade MA","year":"2001","unstructured":"Andrade MA, Perez-Iratxeta C, Ponting CP. Protein repeats: Structures, functions, and evolution. J Struct Biol. 2001;134(2\u20133):117\u201331.","journal-title":"J Struct Biol"},{"key":"ref28","doi-asserted-by":"crossref","first-page":"844","DOI":"10.1042\/BST20150083","article-title":"Repeat proteins challenge the concept of structural domains","volume":"43","author":"Espada R","year":"2015","unstructured":"Espada R, Parra RG, Sippl MJ, Mora T, Walczak AM, Ferreiro DU. Repeat proteins challenge the concept of structural domains. Biochem Soc T. 2015;43:844\u20139.","journal-title":"Biochem Soc T"},{"issue":"5","key":"ref29","doi-asserted-by":"crossref","first-page":"1132","DOI":"10.1093\/molbev\/msu062","article-title":"Deep Conservation of Human Protein Tandem Repeats within the Eukaryotes","volume":"31","author":"Schaper E","year":"2014","unstructured":"Schaper E, Gascuel O, Anisimova M. Deep Conservation of Human Protein Tandem Repeats within the Eukaryotes. Mol Biol Evol. 2014;31(5):1132\u201348.","journal-title":"Mol Biol Evol"},{"issue":"12","key":"ref30","doi-asserted-by":"crossref","first-page":"3170","DOI":"10.1093\/molbev\/msw194","article-title":"Evolution of Protein Domain Repeats in Metazoa","volume":"33","author":"Schuler A","year":"2016","unstructured":"Schuler A, Bornberg-Bauer E. 
Evolution of Protein Domain Repeats in Metazoa. Mol Biol Evol. 2016;33(12):3170\u201382.","journal-title":"Mol Biol Evol"},{"year":"1996","author":"Sonnhammer ELL","unstructured":"Sonnhammer ELL, Durbin R: A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis (Reprinted from Gene Combis, vol 167, pg GC1-GC10, 1996). Gene 1995, 167(1\u20132):Gc1-Gc10.","key":"ref31"},{"issue":"D1","key":"ref32","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","author":"Bateman A","year":"2019","unstructured":"Bateman A, Martin MJ, Orchard S, Magrane M, Alpi E, Bely B, Bingley M, Britto R, Bursteinas B, Busiello G, et al. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506\u201315.","journal-title":"Nucleic Acids Res"},{"issue":"22","key":"ref33","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino-Acid Substitution Matrices from Protein Blocks","volume":"89","author":"Henikoff S","year":"1992","unstructured":"Henikoff S, Henikoff JG. Amino-Acid Substitution Matrices from Protein Blocks. P Natl Acad Sci USA. 1992;89(22):10915\u20139.","journal-title":"P Natl Acad Sci USA"},{"issue":"12","key":"ref34","doi-asserted-by":"crossref","first-page":"1572","DOI":"10.1093\/bioinformatics\/btg180","article-title":"MrBayes 3: Bayesian phylogenetic inference under mixed models","volume":"19","author":"Ronquist F","year":"2003","unstructured":"Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 
2003;19(12):1572\u20134.","journal-title":"Bioinformatics"},{"key":"ref35","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1093\/cid\/ciq117","article-title":"Relationship between Immunity to Borrelia burgdorferi Outer-surface Protein A (OspA) and Lyme Arthritis","volume":"52","author":"Steere AC","year":"2011","unstructured":"Steere AC, Drouin EE, Glickstein LJ. Relationship between Immunity to Borrelia burgdorferi Outer-surface Protein A (OspA) and Lyme Arthritis. Clin Infect Dis. 2011;52:259\u201365.","journal-title":"Clin Infect Dis"},{"key":"ref36","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1107\/S139900471500704X","article-title":"Structural characterization of a novel subfamily of leucine-rich repeat proteins from the human pathogen Leptospira interrogans","volume":"71","author":"Miras I","year":"2015","unstructured":"Miras I, Saul F, Nowakowski M, Weber P, Haouz A, Shepard W, Picardeau M. Structural characterization of a novel subfamily of leucine-rich repeat proteins from the human pathogen Leptospira interrogans. Acta Crystallogr D. 2015;71:1351\u20139.","journal-title":"Acta Crystallogr D"},{"issue":"6","key":"ref37","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkx1313","article-title":"HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks","volume":"46","author":"Azad A","year":"2018","unstructured":"Azad A, Pavlopoulos GA, Ouzounis CA, Kyrpides NC, Buluc A. HipMCL: a high-performance parallel implementation of the Markov clustering algorithm for large-scale networks. 
Nucleic Acids Res 2018, 46(6).","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"ref38","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock PJA","year":"2009","unstructured":"Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422\u20133.","journal-title":"Bioinformatics"},{"issue":"18","key":"ref39","doi-asserted-by":"crossref","first-page":"3702","DOI":"10.1093\/bioinformatics\/bth444","article-title":"CLANS: a Java application for visualizing protein families based on pairwise similarity","volume":"20","author":"Frickey T","year":"2004","unstructured":"Frickey T, Lupas A. CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics. 2004;20(18):3702\u20134.","journal-title":"Bioinformatics"},{"issue":"4","key":"ref40","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1002\/(SICI)1097-0134(19990601)35:4<440::AID-PROT7>3.0.CO;2-Y","article-title":"A fast algorithm for genome-wide analysis of proteins with repeated sequences","volume":"35","author":"Pellegrini M","year":"1999","unstructured":"Pellegrini M, Marcotte EM, Yeates TO. A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins-Structure Function Genetics. 1999;35(4):440\u20136.","journal-title":"Proteins-Structure Function Genetics"},{"key":"ref41","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1093\/bioinformatics\/bth911","article-title":"Tracking repeats using significance and transitivity","volume":"20","author":"Szklarczyk R","year":"2004","unstructured":"Szklarczyk R, Heringa J. Tracking repeats using significance and transitivity. 
Bioinformatics. 2004;20:311\u20137.","journal-title":"Bioinformatics"},{"issue":"2","key":"ref42","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1002\/1097-0134(20001101)41:2<224::AID-PROT70>3.0.CO;2-Z","article-title":"Rapid automatic detection and alignment of repeats in protein sequences","volume":"41","author":"Heger A","year":"2000","unstructured":"Heger A, Holm L. Rapid automatic detection and alignment of repeats in protein sequences. Proteins-Structure Function Genetics. 2000;41(2):224\u201337.","journal-title":"Proteins-Structure Function Genetics"},{"issue":"1","key":"ref43","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/nar\/28.1.257","article-title":"SCOP: a Structural Classification of Proteins database","volume":"28","author":"Lo Conte L","year":"2000","unstructured":"Lo Conte L, Ailey B, Hubbard TJP, Brenner SE, Murzin AG, Chothia C. SCOP: a Structural Classification of Proteins database. Nucleic Acids Res. 2000;28(1):257\u20139.","journal-title":"Nucleic Acids Res"},{"key":"ref44","doi-asserted-by":"crossref","first-page":"W137","DOI":"10.1093\/nar\/gkl130","article-title":"HHrep: de novo protein repeat detection and the origin of TIM barrels","volume":"34","author":"Soding J","year":"2006","unstructured":"Soding J, Remmert M, Biegert A. HHrep: de novo protein repeat detection and the origin of TIM barrels. Nucleic Acids Res. 2006;34:W137\u201342.","journal-title":"Nucleic Acids Res"},{"issue":"12","key":"ref45","doi-asserted-by":"crossref","first-page":"i358","DOI":"10.1093\/bioinformatics\/btq209","article-title":"TRStalker: an efficient heuristic for finding fuzzy tandem repeats","volume":"26","author":"Pellegrini M","year":"2010","unstructured":"Pellegrini M, Renda ME, Vecchio A. TRStalker: an efficient heuristic for finding fuzzy tandem repeats. Bioinformatics. 
2010;26(12):i358\u201366.","journal-title":"Bioinformatics"},{"issue":"20","key":"ref46","doi-asserted-by":"crossref","first-page":"2632","DOI":"10.1093\/bioinformatics\/btp482","article-title":"T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm","volume":"25","author":"Jorda J","year":"2009","unstructured":"Jorda J, Kajava AV. T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics. 2009;25(20):2632\u20138.","journal-title":"Bioinformatics"},{"key":"ref47","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-8-382","article-title":"XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences","volume":"8","author":"Newman AM","year":"2007","unstructured":"Newman AM, Cooper JB. XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences. Bmc Bioinformatics 2007, 8.","journal-title":"Bmc Bioinformatics"},{"key":"ref48","article-title":"Genome-wide investigation of pentatricopeptide repeat gene family in poplar and their expression analysis in response to biotic and abiotic stresses","volume":"8","author":"Xing HT","year":"2018","unstructured":"Xing HT, Fu XK, Yang C, Tang XF, Guo L, Li CF, Xu CZ, Luo KM. Genome-wide investigation of pentatricopeptide repeat gene family in poplar and their expression analysis in response to biotic and abiotic stresses. Sci Rep-Uk 2018, 8.","journal-title":"Sci Rep-Uk"},{"issue":"4","key":"ref49","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1111\/j.1365-313X.2012.05111.x","article-title":"Identification of an OPR protein involved in the translation initiation of the PsaB subunit of photosystem I","volume":"72","author":"Rahire M","year":"2012","unstructured":"Rahire M, Laroche F, Cerutti L, Rochaix JD. Identification of an OPR protein involved in the translation initiation of the PsaB subunit of photosystem I. Plant J. 
2012;72(4):652\u201361.","journal-title":"Plant J"},{"issue":"3","key":"ref50","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1016\/j.ygeno.2006.11.011","article-title":"Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats","volume":"89","author":"Mularoni L","year":"2007","unstructured":"Mularoni L, Veitia RA, Alba MM. Highly constrained proteins contain an unexpectedly large number of amino acid tandem repeats. Genomics. 2007;89(3):316\u201325.","journal-title":"Genomics"},{"issue":"47","key":"ref51","doi-asserted-by":"crossref","first-page":"17753","DOI":"10.1073\/pnas.0606690103","article-title":"Atomic structures of peptide self-assembly mimics","volume":"103","author":"Makabe K","year":"2006","unstructured":"Makabe K, McElheny D, Tereshko V, Hilyard A, Gawlak G, Yan S, Koide A, Koide S. Atomic structures of peptide self-assembly mimics. P Natl Acad Sci USA. 2006;103(47):17753\u20138.","journal-title":"P Natl Acad Sci USA"},{"issue":"1","key":"ref52","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1002\/(SICI)1097-0134(199705)28:1<72::AID-PROT7>3.0.CO;2-L","article-title":"An evolutionary treasure: Unification of a broad set of amidohydrolases related to urease","volume":"28","author":"Holm L","year":"1997","unstructured":"Holm L, Sander C. An evolutionary treasure: Unification of a broad set of amidohydrolases related to urease. Proteins. 1997;28(1):72\u201382.","journal-title":"Proteins"},{"issue":"10","key":"ref53","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0077074","article-title":"beta-Propeller Blades as Ancestral Peptides in Protein Evolution","volume":"8","author":"Kopec KO","year":"2013","unstructured":"Kopec KO, Lupas AN. beta-Propeller Blades as Ancestral Peptides in Protein Evolution. 
Plos One 2013, 8(10).","journal-title":"Plos One"},{"issue":"D1","key":"ref54","doi-asserted-by":"crossref","first-page":"D315","DOI":"10.1093\/nar\/gky952","article-title":"EncoMPASS: an online database for analyzing structure and symmetry in membrane proteins","volume":"47","author":"Sarti E","year":"2019","unstructured":"Sarti E, Aleksandrova AA, Ganta SK, Yavatkar AS, Forrest LR. EncoMPASS: an online database for analyzing structure and symmetry in membrane proteins. Nucleic Acids Res. 2019;47(D1):D315\u201321.","journal-title":"Nucleic Acids Res"},{"key":"ref55","volume-title":"In., vol.\u00a0R package version 1.18.0","author":"Kaisers W","year":"2019","unstructured":"Kaisers W. seqTools: Analysis of nucleotide, sequence and quality content on fastq files. In., vol.\u00a0R package version 1.18.0; 2019."},{"year":"2014","author":"Hold-Geoffroy Y","unstructured":"Hold-Geoffroy Y, Gagnon O, Parizeau M: Once you SCOOP, no need to fork. In: Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment: July 13\u201318, 2014; Atlanta, GA, USA. 2014.","key":"ref56"},{"issue":"9","key":"ref57","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v053.i09","article-title":"fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python","volume":"53","author":"Mullner D","year":"2013","unstructured":"Mullner D. fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python. J Stat Softw. 2013;53(9):1\u201318.","journal-title":"J Stat Softw"},{"issue":"22","key":"ref58","doi-asserted-by":"crossref","first-page":"3718","DOI":"10.1093\/bioinformatics\/btv428","article-title":"dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering","volume":"31","author":"Galili T","year":"2015","unstructured":"Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 
2015;31(22):3718\u201320.","journal-title":"Bioinformatics"},{"issue":"11","key":"ref59","doi-asserted-by":"crossref","first-page":"1857","DOI":"10.1093\/bioinformatics\/btv042","article-title":"protr\/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences","volume":"31","author":"Xiao N","year":"2015","unstructured":"Xiao N, Cao DS, Zhu MF, Xu QS. protr\/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences. Bioinformatics. 2015;31(11):1857\u20139.","journal-title":"Bioinformatics"},{"key":"ref60","article-title":"Biostrings: Efficient manipulation of biological strings","author":"Pag\u00e8s H","year":"2017","unstructured":"Pag\u00e8s H, Aboyoun P. R G, S aD: Biostrings: Efficient manipulation of biological strings. In., 2.46.0 edn. R; 2017.","journal-title":"In."},{"year":"2016","author":"Warnes GR","unstructured":"Warnes GR, Bolker B, Bonebakker L, Gentleman R, Huber W, Liaw A, Lumley T, Maechler M, Magnusson A, Moeller S, et al: Gplots: Various R Programming Tools for Plotting Data. In. R; 2016.","key":"ref61"},{"issue":"1","key":"ref62","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman HM","year":"2000","unstructured":"Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000;28(1):235\u201342.","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"ref63","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic Local Alignment Search Tool","volume":"215","author":"Altschul SF","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. J Mol Biol. 
1990;215(3):403\u201310.","journal-title":"J Mol Biol"},{"issue":"D1","key":"ref64","doi-asserted-by":"crossref","first-page":"D279","DOI":"10.1093\/nar\/gkv1344","article-title":"The Pfam protein families database: towards a more sustainable future","volume":"44","author":"Finn RD","year":"2016","unstructured":"Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44(D1):D279\u201385.","journal-title":"Nucleic Acids Res"},{"issue":"W1","key":"ref65","doi-asserted-by":"crossref","first-page":"W550","DOI":"10.1093\/nar\/gkx273","article-title":"Programmatic access to bioinformatics tools from EMBL-EBI update: 2017","volume":"45","author":"Chojnacki S","year":"2017","unstructured":"Chojnacki S, Cowley A, Lee J, Foix A, Lopez R. Programmatic access to bioinformatics tools from EMBL-EBI update: 2017. Nucleic Acids Res. 2017;45(W1):W550\u20133.","journal-title":"Nucleic Acids Res"}],"container-title":[],"original-title":[],"link":[{"URL":"https:\/\/www.researchsquare.com\/article\/rs-6457\/v2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.researchsquare.com\/article\/rs-6457\/v2.html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T21:01:13Z","timestamp":1665090073000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.researchsquare.com\/article\/rs-6457\/v2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,11]]},"references-count":65,"URL":"https:\/\/doi.org\/10.21203\/rs.2.15797\/v2","relation":{"is-preprint-of":[{"id-type":"doi","id":"10.1186\/s12859-020-3493-y","asserted-by":"subject"}]},"subject":[],"published":{"date-parts":[[2020,2,11]]},"subtype":"preprint"}}