{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T01:48:25Z","timestamp":1774057705468,"version":"3.50.1"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T00:00:00Z","timestamp":1753056000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Engineering and Physical Sciences Research Council CDT training","award":["EP\/S022856\/1"],"award-info":[{"award-number":["EP\/S022856\/1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Solenoid proteins, a subset of tandem repeat proteins, have structurally distinct, modular, and elongated architectures that differentiate them from globular proteins. These proteins play essential roles in diverse biological processes, including protein binding, enzymatic catalysis, ice binding, and nucleic acid interactions. Despite their biological significance and increasing commercial applications\u2013such as in therapeutic engineered variants like DARPins and designed PPR proteins\u2013accurate identification and annotation of solenoid structures remain challenging. 
Given that solenoid structures are more conserved than their sequences, recent advances in protein structure prediction suggest that structure-based solenoid detection methods are preferable to sequence-based ones.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce SOLeNNoID, a deep-learning-based pipeline for predicting solenoid residues in protein structures. Our method employs a convolutional neural network architecture to analyse protein distance matrices, enabling accurate identification of solenoid-containing regions. SOLeNNoID covers all three solenoid subclasses: \u03b1-, \u03b1\/\u03b2-, and \u03b2-solenoids. Comparative evaluation against existing structure-based methods demonstrates the superior performance of our approach. Applying SOLeNNoID to the entire Protein Data Bank led to a 71% increase in detected solenoid-containing entries compared to the gold-standard RepeatsDB database, significantly expanding the known solenoid protein repertoire.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>SOLeNNoID is implemented in Python and available on GitHub at https:\/\/github.com\/gnik2018\/SOLeNNoID. The source code and pre-trained models are accessible under a free-software license. 
Training data are available on Zenodo at https:\/\/zenodo.org\/records\/14927497.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf415","type":"journal-article","created":{"date-parts":[[2025,7,18]],"date-time":"2025-07-18T11:19:50Z","timestamp":1752837590000},"source":"Crossref","is-referenced-by-count":4,"title":["SOLeNNoID: a deep learning pipeline for solenoid residue detection in protein structures"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0342-2423","authenticated-orcid":false,"given":"Georgi I","family":"Nikov","sequence":"first","affiliation":[{"name":"Life Sciences, Imperial College , London SW7 2AZ,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7107-6569","authenticated-orcid":false,"given":"Daniella","family":"Pretorius","sequence":"additional","affiliation":[{"name":"Life Sciences, Imperial College , London SW7 2AZ,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8897-0161","authenticated-orcid":false,"given":"James W","family":"Murray","sequence":"additional","affiliation":[{"name":"Life Sciences, Imperial College , London SW7 2AZ,","place":["United Kingdom"]}]}],"member":"286","published-online":{"date-parts":[[2025,7,21]]},"reference":[{"key":"2025081319161028800_btaf415-B1","author":"Abadi","year":"2016"},{"key":"2025081319161028800_btaf415-B2","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1016\/j.ijbiomac.2021.08.105","article-title":"Anti freeze proteins (AFP): properties, sources and applications - a review","volume":"189","author":"Baskaran","year":"2021","journal-title":"Int J Biol Macromol"},{"key":"2025081319161028800_btaf415-B3","doi-asserted-by":"crossref","first-page":"D480","DOI":"10.1093\/nar\/gkaa1100","article-title":"UniProt: the universal protein knowledgebase in 2021","volume":"49","author":"Bateman","year":"2021","journal-title":"Nucleic Acids 
Res"},{"key":"2025081319161028800_btaf415-B4","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1016\/S1471-4906(03)00242-4","article-title":"Leucine-rich repeats and pathogen recognition in toll-like receptors","volume":"24","author":"Bell","year":"2003","journal-title":"Trends Immunol"},{"key":"2025081319161028800_btaf415-B5","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2025081319161028800_btaf415-B6","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1093\/bioinformatics\/btn039","article-title":"De novo identification of highly diverged protein repeats by probabilistic consistency","volume":"24","author":"Biegert","year":"2008","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B7","doi-asserted-by":"crossref","first-page":"D437","DOI":"10.1093\/nar\/gkaa1038","article-title":"RCSB protein data bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences","volume":"49","author":"Burley","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025081319161028800_btaf415-B8","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1007\/s12038-020-00058-x","article-title":"Prigsa2: improved version of protein repeat identification by graph spectral analysis","volume":"45","author":"Chakrabarty","year":"2020","journal-title":"J Biosci"},{"key":"2025081319161028800_btaf415-B9","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1002\/pro.4052","article-title":"DbStRiPs: database of structural repeats in proteins","volume":"31","author":"Chakrabarty","year":"2022","journal-title":"Protein 
Sci"},{"key":"2025081319161028800_btaf415-B10","author":"Chollet","year":"2015"},{"key":"2025081319161028800_btaf415-B11","author":"Clevert","year":"2016"},{"key":"2025081319161028800_btaf415-B12","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B13","doi-asserted-by":"crossref","first-page":"691865","DOI":"10.3389\/fbinf.2021.691865","article-title":"TRAL 2.0: tandem repeat detection with circular profile hidden markov models and evolutionary aligner","volume":"1","author":"Delucchi","year":"2021","journal-title":"Front Bioinform"},{"key":"2025081319161028800_btaf415-B14","doi-asserted-by":"crossref","first-page":"407","DOI":"10.3390\/genes11040407","article-title":"A new census of protein tandem repeats and their relationship with intrinsic disorder","volume":"11","author":"Delucchi","year":"2020","journal-title":"Genes (Basel)"},{"key":"2025081319161028800_btaf415-B15","doi-asserted-by":"crossref","first-page":"2611","DOI":"10.1016\/j.febslet.2015.08.025","article-title":"Tapo: a combined method for the identification of tandem repeats in protein structures","volume":"589","author":"Do Viet","year":"2015","journal-title":"FEBS Lett"},{"key":"2025081319161028800_btaf415-B16","doi-asserted-by":"crossref","first-page":"D352","DOI":"10.1093\/nar\/gkt1175","article-title":"RepeatsDB: a database of tandem repeat protein structures","volume":"42","author":"Domenico","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2025081319161028800_btaf415-B17","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1093\/bioinformatics\/btx828","article-title":"mTM-align: an algorithm for fast and accurate multiple protein structure 
alignment","volume":"34","author":"Dong","year":"2018","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B18","doi-asserted-by":"crossref","first-page":"e59530","DOI":"10.7554\/eLife.59530","article-title":"Structural analysis of the Legionella pneumophila Dot\/Icm Type IV secretion system core complex","volume":"9","author":"Durie","year":"2020","journal-title":"eLife"},{"key":"2025081319161028800_btaf415-B19","first-page":"e28384","volume-title":"eLife","author":"D\u00edaz-Sant\u00edn","year":"2017"},{"key":"2025081319161028800_btaf415-B20","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1016\/j.bbapap.2009.08.026","article-title":"The \u03b3 class of carbonic anhydrases","volume":"1804","author":"Ferry","year":"2010","journal-title":"Biochim Biophys Acta"},{"key":"2025081319161028800_btaf415-B21","doi-asserted-by":"crossref","first-page":"e79894","DOI":"10.1371\/journal.pone.0079894","article-title":"Functional and genomic analyses of alpha-solenoid proteins","volume":"8","author":"Fournier","year":"2013","journal-title":"PLoS ONE"},{"key":"2025081319161028800_btaf415-B22","first-page":"48","author":"Garg","year":"2022"},{"key":"2025081319161028800_btaf415-B23","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1016\/j.sbi.2021.02.002","article-title":"Repeat proteins: designing new shapes and functions for solenoid folds","volume":"68","author":"Gidley","year":"2021","journal-title":"Curr Opin Struct Biol"},{"key":"2025081319161028800_btaf415-B24","doi-asserted-by":"crossref","first-page":"W402","DOI":"10.1093\/nar\/gky360","article-title":"Repeatsdb-lite: a web server for unit annotation of tandem repeat proteins","volume":"46","author":"Hirsh","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2025081319161028800_btaf415-B25","doi-asserted-by":"crossref","first-page":"2632","DOI":"10.1093\/bioinformatics\/btp482","article-title":"T-REKS: identification of tandem REpeats in sequences with a K-meanS based 
algorithm","volume":"25","author":"Jorda","year":"2009","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B26","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/j.jsb.2011.08.009","article-title":"Tandem repeats in proteins: from sequence to structure","volume":"179","author":"Kajava","year":"2012","journal-title":"J Struct Biol"},{"key":"2025081319161028800_btaf415-B27","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/S0065-3233(06)73003-0","article-title":"\u03b2-rolls, \u03b2-helices, and other \u03b2-solenoid proteins","volume":"73","author":"Kajava","year":"2006","journal-title":"Adv Protein Chem"},{"key":"2025081319161028800_btaf415-B28","author":"Kingma","year":"2015"},{"key":"2025081319161028800_btaf415-B29","first-page":"87","volume-title":"Positioning and Power in Academic Publishing: Players, Agents and Agendas","author":"Kluyver","year":"2016"},{"key":"2025081319161028800_btaf415-B30","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/1479-7364-4-3-207","article-title":"The CATH database","volume":"4","author":"Knudsen","year":"2010","journal-title":"Hum Genomics"},{"key":"2025081319161028800_btaf415-B31","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1016\/S0968-0004(00)01667-4","article-title":"When protein folding is simplified to protein coiling: the continuum of solenoid protein structures","volume":"25","author":"Kobe","year":"2000","journal-title":"Trends Biochem Sci"},{"key":"2025081319161028800_btaf415-B32","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025081319161028800_btaf415-B33","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1016\/j.biochi.2015.04.004","article-title":"An overview of pentatricopeptide repeat proteins and their 
applications","volume":"113","author":"Manna","year":"2015","journal-title":"Biochimie"},{"key":"2025081319161028800_btaf415-B34","doi-asserted-by":"crossref","first-page":"i289","DOI":"10.1093\/bioinformatics\/btp232","article-title":"REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform","volume":"25","author":"Marsella","year":"2009","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B35","doi-asserted-by":"crossref","first-page":"e4792","DOI":"10.1002\/pro.4792","article-title":"UCSF ChimeraX: tools for structure building and analysis","volume":"32","author":"Meng","year":"2023","journal-title":"Protein Sci"},{"key":"2025081319161028800_btaf415-B36","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J Mol Biol"},{"key":"2025081319161028800_btaf415-B37","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1093\/bioinformatics\/btx789","article-title":"NGLview\u2013interactive molecular graphics for jupyter notebooks","volume":"34","author":"Nguyen","year":"2018","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B38","doi-asserted-by":"crossref","first-page":"5113","DOI":"10.1093\/bioinformatics\/btz454","article-title":"DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures","volume":"35","author":"Pag\u00e8s","year":"2019","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B39","doi-asserted-by":"crossref","first-page":"e1000304","DOI":"10.1371\/journal.pcbi.1000304","article-title":"Detection of alpha-rod protein repeats using a neural network and application to huntingtin","volume":"5","author":"Palidwor","year":"2009","journal-title":"PLoS 
Comput Biol"},{"key":"2025081319161028800_btaf415-B40","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1016\/j.sbi.2017.02.001","article-title":"Designing repeat proteins: a modular approach to protein design","volume":"45","author":"Parmeggiani","year":"2017","journal-title":"Curr Opin Struct Biol"},{"key":"2025081319161028800_btaf415-B41","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learning Res"},{"key":"2025081319161028800_btaf415-B42","author":"Ronneberger","year":"2015"},{"key":"2025081319161028800_btaf415-B43","first-page":"242","author":"Rossum","year":"2009"},{"key":"2025081319161028800_btaf415-B44","doi-asserted-by":"crossref","first-page":"3051","DOI":"10.1093\/bioinformatics\/btv306","article-title":"TRAL: tandem repeat annotation library","volume":"31","author":"Schaper","year":"2015","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B45","author":"Schmidberger","year":"2019"},{"key":"2025081319161028800_btaf415-B46","author":"Schr\u00f6dinger"},{"key":"2025081319161028800_btaf415-B47","doi-asserted-by":"crossref","first-page":"i311","DOI":"10.1093\/bioinformatics\/bth911","article-title":"Tracking repeats using significance and transitivity","volume":"20(Suppl 1)","author":"Szklarczyk","year":"2004","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B48","doi-asserted-by":"crossref","first-page":"2272","DOI":"10.1093\/bioinformatics\/btz921","article-title":"Logomaker: beautiful sequence logos in python","volume":"36","author":"Tareen","year":"2020","journal-title":"Bioinformatics"},{"key":"2025081319161028800_btaf415-B49","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy 
models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2025081319161028800_btaf415-B50","doi-asserted-by":"crossref","first-page":"D385","DOI":"10.1093\/nar\/gkv1047","article-title":"PDBe: improved accessibility of macromolecular structure data from pdb and emdb","volume":"44","author":"Velankar","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2025081319161028800_btaf415-B51","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MCSE.2011.37","article-title":"The numpy array: a structure for efficient numerical computation","volume":"13","author":"Walt","year":"2011","journal-title":"Comput Sci Eng"},{"key":"2025081319161028800_btaf415-B52","doi-asserted-by":"crossref","first-page":"6800","DOI":"10.1073\/pnas.1821959116","article-title":"Cryo-EM structures of Helicobacter pylori vacuolating cytotoxin a oligomeric assemblies at near-atomic resolution","volume":"116","author":"Zhang","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025081319161028800_btaf415-B53","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids 
Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf415\/63810281\/btaf415.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/8\/btaf415\/63810281\/btaf415.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/8\/btaf415\/63810281\/btaf415.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,13]],"date-time":"2025-08-13T23:16:23Z","timestamp":1755126983000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf415\/8209735"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,7,21]]},"references-count":53,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf415","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.07.22.604558","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,8]]},"published":{"date-parts":[[2025,7,21]]},"article-number":"btaf415"}}