{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T22:50:56Z","timestamp":1773874256944,"version":"3.50.1"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T00:00:00Z","timestamp":1746489600000},"content-version":"vor","delay-in-days":5,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000268","name":"UK Biotechnology and Biological Sciences Research Council","doi-asserted-by":"crossref","award":["BB\/T019409\/1"],"award-info":[{"award-number":["BB\/T019409\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000268","name":"UK Biotechnology and Biological Sciences Research Council","doi-asserted-by":"crossref","award":["BB\/T019379\/1"],"award-info":[{"award-number":["BB\/T019379\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100000268","name":"UK Biotechnology and Biological Sciences Research Council","doi-asserted-by":"crossref","award":["BB\/W008556\/1"],"award-info":[{"award-number":["BB\/W008556\/1"]}],"id":[{"id":"10.13039\/501100000268","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,5,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The availability of very large numbers of protein structures from accurate computational methods poses new challenges in storing, searching and detecting relationships between these structures. 
In particular, the new-found abundance of multi-domain structures in the AlphaFold structure database introduces challenges for traditional structure comparison methods.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We address these challenges using a fast, embedding-based structure comparison method called Foldclass which detects structural similarity between protein domains. We demonstrate the accuracy of Foldclass embeddings for homology detection. In combination with a recently developed deep learning-based automatic domain segmentation tool Merizo, we develop Merizo-search, which first segments multi-domain query structures into domains, and then searches a Foldclass embedding database to determine the top matches for each constituent domain. Combining the ability of Merizo to accurately segment complete chains into domains, and Foldclass to embed and detect similar domains, the Merizo-search tool can be used to rapidly detect per-domain similarities for complete chains, taking as little as 2\u2009min to search all 365 million domains from the Encyclopedia of Domains. We anticipate that these tools will enable many analyses using the wealth of predicted structural data now available.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Foldclass and Merizo-search are available at https:\/\/github.com\/psipred\/merizo_search. The version used in this publication is archived at https:\/\/doi.org\/10.5281\/zenodo.15120830. 
Merizo-search is also available on the PSIPRED web server at http:\/\/bioinf.cs.ucl.ac.uk\/psipred.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf277","type":"journal-article","created":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T12:06:58Z","timestamp":1746533218000},"source":"Crossref","is-referenced-by-count":6,"title":["Foldclass and Merizo-search: scalable structural similarity search for single- and multi-domain proteins using geometric learning"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2671-2140","authenticated-orcid":false,"given":"Shaun M","family":"Kandathil","sequence":"first","affiliation":[{"name":"Department of Computer Science, University College London , London WC1E 6BT,","place":["United Kingdom"]}]},{"given":"Andy M","family":"Lau","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University College London , London WC1E 6BT,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7391-4696","authenticated-orcid":false,"given":"Daniel W A","family":"Buchan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University College London , London WC1E 6BT,","place":["United Kingdom"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8626-3765","authenticated-orcid":false,"given":"David T","family":"Jones","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University College London , London WC1E 6BT,","place":["United Kingdom"]},{"name":"Institute of Structural and Molecular Biology, University College London , London WC1E 6BT,","place":["United Kingdom"]}]}],"member":"286","published-online":{"date-parts":[[2025,5,6]]},"reference":[{"key":"2025052922563907400_btaf277-B1","doi-asserted-by":"publisher","first-page":"277","DOI":"10.3390\/biom13020277","article-title":"KinFams: De-novo classification of protein kinases using CATH functional 
units","volume":"13","author":"Adeyelu","year":"2023","journal-title":"Biomolecules"},{"key":"2025052922563907400_btaf277-B2","doi-asserted-by":"crossref","first-page":"D376","DOI":"10.1093\/nar\/gkz1064","article-title":"The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures","volume":"48","author":"Andreeva","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2025052922563907400_btaf277-B3","doi-asserted-by":"crossref","first-page":"W402","DOI":"10.1093\/nar\/gkz297","article-title":"The PSIPRED protein analysis workbench: 20 years on","volume":"47","author":"Buchan","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025052922563907400_btaf277-B4","doi-asserted-by":"crossref","first-page":"W287","DOI":"10.1093\/nar\/gkae328","article-title":"Deep learning for the PSIPRED protein analysis workbench","volume":"52","author":"Buchan","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025052922563907400_btaf277-B5","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: an evolutionary classification of protein domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"2025052922563907400_btaf277-B6","doi-asserted-by":"publisher","author":"Douze","year":"2024","DOI":"10.48550\/arXiv.2401.08281"},{"key":"2025052922563907400_btaf277-B7","doi-asserted-by":"crossref","first-page":"i718","DOI":"10.1093\/bioinformatics\/btaa839","article-title":"Geometricus represents protein structures as shape-mers derived from moment invariants","volume":"36","author":"Durairaj","year":"2020","journal-title":"Bioinformatics"},{"key":"2025052922563907400_btaf277-B8","doi-asserted-by":"publisher","volume-title":"Bioinform 
Adv","author":"Greener","DOI":"10.1093\/bioadv\/vbaf042"},{"key":"2025052922563907400_btaf277-B9","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1038\/s41587-023-01917-2","article-title":"Protein remote homology detection and structural alignment using deep learning","volume":"42","author":"Hamamsy","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025052922563907400_btaf277-B10","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025052922563907400_btaf277-B11","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/s41587-023-01773-0","article-title":"Fast and accurate protein structure search with Foldseek","volume":"42","author":"van Kempen","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025052922563907400_btaf277-B12","doi-asserted-by":"publisher","author":"Kingma","year":"2014","DOI":"10.48550\/arXiv.1412.6980"},{"key":"2025052922563907400_btaf277-B13","doi-asserted-by":"crossref","first-page":"8445","DOI":"10.1038\/s41467-023-43934-4","article-title":"Merizo: a rapid and accurate protein domain segmentation method using invariant point attention","volume":"14","author":"Lau","year":"2023","journal-title":"Nat Commun"},{"key":"2025052922563907400_btaf277-B14","doi-asserted-by":"crossref","first-page":"eadq4946","DOI":"10.1126\/science.adq4946","article-title":"Exploring structural diversity across the protein universe with the Encyclopedia of Domains","volume":"386","author":"Lau","year":"2024","journal-title":"Science"},{"key":"2025052922563907400_btaf277-B15","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language 
model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2025052922563907400_btaf277-B16","doi-asserted-by":"publisher","author":"Loshchilov","year":"2017","DOI":"10.48550\/arXiv.1711.05101"},{"key":"2025052922563907400_btaf277-B17","doi-asserted-by":"crossref","first-page":"617","DOI":"10.1016\/S0076-6879(96)66038-8","article-title":"SSAP: sequential structure alignment program for protein structure comparison","volume":"266","author":"Orengo","year":"1996","journal-title":"Methods Enzymol"},{"key":"2025052922563907400_btaf277-B18","doi-asserted-by":"crossref","first-page":"e1600552","DOI":"10.1126\/sciadv.1600552","article-title":"An ambiguity principle for assigning protein structural domains","volume":"3","author":"Postic","year":"2017","journal-title":"Sci Adv"},{"key":"2025052922563907400_btaf277-B19","doi-asserted-by":"publisher","author":"Satorras","year":"2021","DOI":"10.48550\/arXiv.2102.09844"},{"key":"2025052922563907400_btaf277-B20","doi-asserted-by":"crossref","first-page":"D266","DOI":"10.1093\/nar\/gkaa1079","article-title":"CATH: increased structural coverage of functional space","volume":"49","author":"Sillitoe","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2025052922563907400_btaf277-B21","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1038\/nbt.3988","article-title":"MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets","volume":"35","author":"Steinegger","year":"2017","journal-title":"Nat. 
Biotechnol"},{"key":"2025052922563907400_btaf277-B22","doi-asserted-by":"publisher","author":"Vaswani","year":"2017","DOI":"10.48550\/arXiv.1706.03762"},{"key":"2025052922563907400_btaf277-B23","volume-title":"Bioinformatics","author":"Wells"},{"key":"2025052922563907400_btaf277-B24","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2025052922563907400_btaf277-B25","doi-asserted-by":"crossref","first-page":"btad070","DOI":"10.1093\/bioinformatics\/btad070","article-title":"A unified approach to protein domain parsing with inter-residue distance matrix","volume":"39","author":"Zhu","year":"2023","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf277\/63067617\/btaf277.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/5\/btaf277\/63067617\/btaf277.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/5\/btaf277\/63067617\/btaf277.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T02:56:48Z","timestamp":1748573808000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf277\/8125651"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,5]]},"references-count":25,"journal-issue":{
"issue":"5","published-print":{"date-parts":[[2025,5,6]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf277","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,5]]},"published":{"date-parts":[[2025,5]]},"article-number":"btaf277"}}