{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T00:18:03Z","timestamp":1760660283423,"version":"build-2065373602"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T00:00:00Z","timestamp":1755907200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard","award":["#1764269"],"award-info":[{"award-number":["#1764269"]}]},{"name":"Burroughs-Wellcome Careers at the Scientific Interface"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Recent advancements in protein structure prediction methods have vastly increased the size of databases of protein structures, necessitating fast methods for protein structure comparison. Search methods that find structurally similar proteins can be applied to find remote homologs, study the functional relationships among proteins, and aid in protein engineering tasks.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We design a \u201c3Dn\u201d structural alphabet that encodes the local neighborhoods around each amino acid in an interpretable way. In a search benchmark task, a combination of our alphabet and Foldseek\u2019s 3Di alphabet, outperforms each alphabet individually and ranks best among local search methods that do not require amino acid identity information. We provide software tools that enable the exploration of novel alphabets and combinations of alphabets for protein structure search.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The code is freely available at https:\/\/github.com\/spetti\/structure_comparison and at Zenodo https:\/\/doi.org\/10.5281\/zenodo.15734371.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf458","type":"journal-article","created":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T17:43:10Z","timestamp":1755970990000},"source":"Crossref","is-referenced-by-count":0,"title":["An interpretable alphabet for local protein structure search based on amino acid neighborhoods"],"prefix":"10.1093","volume":"41","author":[{"given":"Saba","family":"Zerefa","sequence":"first","affiliation":[{"name":"School of Engineering and Applied Sciences, Harvard University , Cambridge, MA 02138,","place":["United States"]}]},{"given":"Jesse","family":"Cool","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Tufts University , Medford, MA 02155,","place":["United States"]}]},{"given":"Pramesh","family":"Singh","sequence":"additional","affiliation":[{"name":"Tufts Institute for Artificial Intelligence, Tufts University , Medford, MA 02155,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8281-8161","authenticated-orcid":false,"given":"Samantha","family":"Petti","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Tufts University , Medford, MA 02155,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,8,23]]},"reference":[{"key":"2025101607400200200_btaf458-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"2025101607400200200_btaf458-B2","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"J Stat Mech"},{"key":"2025101607400200200_btaf458-B3","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1006\/jmbi.1998.1943","article-title":"Prediction of local structure in proteins using a library of sequence-structure motifs","volume":"281","author":"Bystroff","year":"1998","journal-title":"J Mol Biol"},{"key":"2025101607400200200_btaf458-B4","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1016\/j.jmb.2004.04.005","article-title":"A hidden Markov model derived structural alphabet for proteins","volume":"339","author":"Camproux","year":"2004","journal-title":"J Mol Biol"},{"key":"2025101607400200200_btaf458-B5","doi-asserted-by":"crossref","first-page":"D475","DOI":"10.1093\/nar\/gky1134","article-title":"SCOPe: classification of large macromolecular structures in the structural classification of proteins\u2014extended database","volume":"47","author":"Chandonia","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025101607400200200_btaf458-B6","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1126\/science.add2187","article-title":"Robust deep learning\u2013based protein sequence design using ProteinMPNN","volume":"378","author":"Dauparas","year":"2022","journal-title":"Science"},{"key":"2025101607400200200_btaf458-B7","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1002\/1097-0134(20001115)41:3<271::AID-PROT10>3.0.CO;2-Z","article-title":"Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks","volume":"41","author":"de Brevern","year":"2000","journal-title":"Proteins"},{"key":"2025101607400200200_btaf458-B8","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"HMMER web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"year":"2024","author":"Gao","key":"2025101607400200200_btaf458-B9"},{"year":"2024","author":"Gaujac","key":"2025101607400200200_btaf458-B10"},{"key":"2025101607400200200_btaf458-B11","doi-asserted-by":"crossref","first-page":"1323","DOI":"10.1093\/bioinformatics\/btw006","article-title":"MMseqs software suite for fast and deep clustering and searching of large protein sequence sets","volume":"32","author":"Hauser","year":"2016","journal-title":"Bioinformatics"},{"key":"2025101607400200200_btaf458-B12","doi-asserted-by":"crossref","first-page":"lqae150","DOI":"10.1093\/nargab\/lqae150","article-title":"Bilingual language model for protein sequence and structure","volume":"6","author":"Heinzinger","year":"2024","journal-title":"NAR Genom Bioinform"},{"key":"2025101607400200200_btaf458-B13","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025101607400200200_btaf458-B14","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1002\/pro.3749","article-title":"DALI and the persistence of protein shape","volume":"29","author":"Holm","year":"2020","journal-title":"Protein Sci"},{"key":"2025101607400200200_btaf458-B15","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/s41587-024-02353-6","article-title":"Fast, sensitive detection of protein homologs using deep dense retrieval","volume":"43","author":"Hong","year":"2025","journal-title":"Nat Biotechnol"},{"key":"2025101607400200200_btaf458-B16","doi-asserted-by":"crossref","first-page":"RP91415","DOI":"10.7554\/eLife.91415","article-title":"Sensitive remote homology search by local alignment of small positional embeddings from protein language models","volume":"12","author":"Johnson","year":"2024","journal-title":"Elife"},{"key":"2025101607400200200_btaf458-B17","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with Alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2025101607400200200_btaf458-B18","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/j.jmb.2008.12.044","article-title":"Structural alphabets for protein structure classification: a comparison study","volume":"387","author":"Le","year":"2009","journal-title":"J Mol Biol"},{"key":"2025101607400200200_btaf458-B19","doi-asserted-by":"crossref","first-page":"2775","DOI":"10.1038\/s41467-024-46808-5","article-title":"PLMSearch: protein language model powers accurate and fast sequence search for remote homology","volume":"15","author":"Liu","year":"2024","journal-title":"Nat Commun"},{"key":"2025101607400200200_btaf458-B20","doi-asserted-by":"crossref","first-page":"101286","DOI":"10.1016\/j.patter.2025.101289","volume":"6","author":"Lu","year":"2025","journal-title":"Patterns"},{"key":"2025101607400200200_btaf458-B21","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/B978-0-12-800168-4.00005-6","article-title":"Algorithms, applications, and challenges of protein structure alignment","volume":"94","author":"Ma","year":"2014","journal-title":"Adv Protein Chem Struct Biol"},{"key":"2025101607400200200_btaf458-B22","doi-asserted-by":"crossref","first-page":"2722","DOI":"10.1093\/bioinformatics\/btt473","article-title":"lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests","volume":"29","author":"Mariani","year":"2013","journal-title":"Bioinformatics"},{"key":"2025101607400200200_btaf458-B23","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1142\/S0219720008003370","article-title":"Tali: local alignment of protein structures using backbone torsion angles","volume":"6","author":"Miao","year":"2008","journal-title":"J Bioinform Comput Biol"},{"key":"2025101607400200200_btaf458-B24","doi-asserted-by":"crossref","first-page":"btac724","DOI":"10.1093\/bioinformatics\/btac724","article-title":"End-to-end learning of multiple sequence alignments with differentiable Smith\u2013Waterman","volume":"39","author":"Petti","year":"2023","journal-title":"Bioinformatics"},{"volume-title":"Nucleic Acids Res","year":"2024","author":"Proch\u00e1zka","key":"2025101607400200200_btaf458-B25"},{"key":"2025101607400200200_btaf458-B26","doi-asserted-by":"crossref","first-page":"016110","DOI":"10.1103\/PhysRevE.74.016110","article-title":"Statistical mechanics of community detection","volume":"74","author":"Reichardt","year":"2006","journal-title":"Phys Rev E Stat Nonlin Soft Matter Phys"},{"key":"2025101607400200200_btaf458-B27","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1016\/S0022-2836(05)80194-9","article-title":"Automatic definition of recurrent local structure motifs in proteins","volume":"213","author":"Rooman","year":"1990","journal-title":"J Mol Biol"},{"key":"2025101607400200200_btaf458-B28","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J Mol Biol"},{"year":"2025","author":"Trinquier","key":"2025101607400200200_btaf458-B29"},{"key":"2025101607400200200_btaf458-B30","doi-asserted-by":"crossref","first-page":"R31","DOI":"10.1186\/gb-2007-8-3-r31","article-title":"Kappa-alpha plot derived structural alphabet and blosum-like substitution matrix for rapid search of protein structure database","volume":"8","author":"Tung","year":"2007","journal-title":"Genome Biol"},{"key":"2025101607400200200_btaf458-B31","doi-asserted-by":"crossref","first-page":"D204","DOI":"10.1093\/nar\/gku989","article-title":"UniProt: a hub for protein information","volume":"43","author":"UniProt Consortium","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2025101607400200200_btaf458-B32","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1038\/s41587-023-01773-0","article-title":"Fast and accurate protein structure search with Foldseek","volume":"42","author":"Van Kempen","year":"2024","journal-title":"Nat Biotechnol"},{"key":"2025101607400200200_btaf458-B33","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Zhang","year":"2004","journal-title":"Proteins Struct Funct Bioinf"},{"key":"2025101607400200200_btaf458-B34","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf458\/64114790\/btaf458.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf458\/64114790\/btaf458.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/10\/btaf458\/64114790\/btaf458.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T11:40:21Z","timestamp":1760614821000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf458\/8240328"}},"subtitle":[],"editor":[{"given":"Jianlin","family":"Cheng","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,8,23]]},"references-count":34,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2025,10,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf458","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2025,10]]},"published":{"date-parts":[[2025,8,23]]},"article-number":"btaf458"}}