{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:23:35Z","timestamp":1772173415172,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1013057","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T00:00:00Z","timestamp":1749427200000}}],"reference-count":48,"publisher":"Public Library of Science (PLoS)","issue":"5","license":[{"start":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T00:00:00Z","timestamp":1748563200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010665","name":"H2020 Marie Sk\u0142odowska-Curie Actions","doi-asserted-by":"publisher","award":["860197"],"award-info":[{"award-number":["860197"]}],"id":[{"id":"10.13039\/100010665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010665","name":"H2020 Marie Sk\u0142odowska-Curie Actions","doi-asserted-by":"publisher","award":["860197"],"award-info":[{"award-number":["860197"]}],"id":[{"id":"10.13039\/100010665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Repertoire sequencing allows us to investigate the antibody-mediated immune response. The clustering of sequences is a crucial step in the data analysis pipeline, aiding in the identification of functionally related antibodies. The conventional clustering approach of clonotyping relies on sequence information, particularly CDRH3 sequence identity and V\/J gene usage, to group sequences into clonotypes. It has been suggested that the limitations of sequence-based approaches to identify sequence-dissimilar but functionally converged antibodies can be overcome by using structure information to group antibodies. Recent advances have made structure-based methods feasible on a repertoire level. However, so far, their performance has only been evaluated on single-antigen sets of antibodies. A comprehensive comparison of the benefits and limitations of structure-based tools on realistic and diverse repertoire data is missing. Here, we aim to explore the promise of structure-based clustering algorithms to replace or augment the standard sequence-based approach, specifically by identifying low-sequence identity groups. Two methods, SAAB+ and SPACE2, are evaluated against clonotyping. We curated a dataset of well-annotated pairs of antibodies that show high overlap in epitope residues and thus bind the same region within their respective antigen. This set of antibodies was introduced into a simulated repertoire to compare the performance of clustering approaches on a diverse antibody set. Our analysis reveals that structure-based methods do group more antibodies together compared to clonotyping. However, it also highlights the limitations associated with the need for same-length CDR regions by SPACE2. This work thoroughly compares the utility of different clustering methods and provides insights into what further steps are required to effectively use antibody structural information to group immune repertoire data.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013057","type":"journal-article","created":{"date-parts":[[2025,5,30]],"date-time":"2025-05-30T13:43:12Z","timestamp":1748612592000},"page":"e1013057","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":0,"title":["Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data"],"prefix":"10.1371","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8570-7640","authenticated-orcid":true,"given":"Katharina","family":"Waury","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Stefan","family":"Lelieveld","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2779-7174","authenticated-orcid":true,"given":"Sanne","family":"Abeln","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8582-5404","authenticated-orcid":true,"given":"Henk-Jan","family":"van den Ham","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2025,5,30]]},"reference":[{"key":"pcbi.1013057.ref001","doi-asserted-by":"crossref","first-page":"626793","DOI":"10.3389\/fimmu.2021.626793","article-title":"The future of blood testing is the immunome","volume":"12","author":"RA Arnaout","year":"2021","journal-title":"Front Immunol"},{"key":"pcbi.1013057.ref002","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.coisb.2016.12.009","article-title":"Advances and applications of immune receptor sequencing in systems immunology","volume":"1","author":"P Lindau","year":"2017","journal-title":"Curr Opin Syst Biol"},{"issue":"12","key":"pcbi.1013057.ref003","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1016\/j.it.2014.09.004","article-title":"Characterizing immune repertoires by high throughput sequencing: strategies and applications","volume":"35","author":"JJA Calis","year":"2014","journal-title":"Trends Immunol"},{"key":"pcbi.1013057.ref004","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1186\/s13073-015-0243-2","article-title":"Practical guidelines for B-cell receptor repertoire sequencing analysis","volume":"7","author":"G Yaari","year":"2015","journal-title":"Genome Med"},{"issue":"1676","key":"pcbi.1013057.ref005","doi-asserted-by":"crossref","first-page":"20140239","DOI":"10.1098\/rstb.2014.0239","article-title":"The analysis of clonal expansions in normal and autoimmune B cell repertoires","volume":"370","author":"U Hershberg","year":"2015","journal-title":"Philos Trans R Soc Lond B Biol Sci"},{"issue":"1","key":"pcbi.1013057.ref006","article-title":"Adaptive immune receptor repertoire analysis","volume":"4","author":"V Mhanna","year":"2024","journal-title":"Nat Rev Methods Primers"},{"issue":"1","key":"pcbi.1013057.ref007","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/S1074-7613(00)00006-6","article-title":"Diversity in the CDR3 region of V(H) is sufficient for most antibody specificities","volume":"13","author":"JL Xu","year":"2000","journal-title":"Immunity"},{"issue":"1","key":"pcbi.1013057.ref008","doi-asserted-by":"crossref","first-page":"1996732","DOI":"10.1080\/19420862.2021.1996732","article-title":"Current strategies for detecting functional convergence across B-cell receptor repertoires","volume":"13","author":"MIJ Raybould","year":"2021","journal-title":"MAbs"},{"issue":"6049","key":"pcbi.1013057.ref009","doi-asserted-by":"crossref","first-page":"1633","DOI":"10.1126\/science.1207227","article-title":"Sequence and structural convergence of broad and potent HIV antibodies that mimic CD4 binding","volume":"333","author":"JF Scheid","year":"2011","journal-title":"Science"},{"issue":"10","key":"pcbi.1013057.ref010","doi-asserted-by":"crossref","first-page":"2308","DOI":"10.1110\/ps.0209102","article-title":"How do two unrelated antibodies, HyHEL-10 and F9.13.7, recognize the same epitope of hen egg-white lysozyme?","volume":"11","author":"J Pons","year":"2002","journal-title":"Protein Sci"},{"issue":"4","key":"pcbi.1013057.ref011","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1002\/1097-0134(20000901)40:4<572::AID-PROT30>3.0.CO;2-N","article-title":"Structural evidence for recognition of a single epitope by two distinct antibodies","volume":"40","author":"D Fleury","year":"2000","journal-title":"Proteins"},{"issue":"1","key":"pcbi.1013057.ref012","doi-asserted-by":"crossref","first-page":"1869406","DOI":"10.1080\/19420862.2020.1869406","article-title":"A computational method for immune repertoire mining that identifies novel binders from different clonotypes, demonstrated by identifying anti-pertussis toxoid antibodies","volume":"13","author":"E Richardson","year":"2021","journal-title":"MAbs"},{"issue":"2","key":"pcbi.1013057.ref013","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1006\/jmbi.1999.3110","article-title":"Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm","volume":"293","author":"PE Wright","year":"1999","journal-title":"J Mol Biol"},{"key":"pcbi.1013057.ref014","doi-asserted-by":"crossref","first-page":"1753","DOI":"10.3389\/fimmu.2017.01753","article-title":"How B-cell receptor repertoire sequencing can be enriched with structural antibody data","volume":"8","author":"A Kovaltsuk","year":"2017","journal-title":"Front Immunol"},{"key":"pcbi.1013057.ref015","doi-asserted-by":"crossref","first-page":"2000","DOI":"10.1016\/j.csbj.2020.07.008","article-title":"Methods for sequence and structural analysis of B and T cell receptor repertoires","volume":"18","author":"S Teraguchi","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"issue":"12","key":"pcbi.1013057.ref016","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1009675","article-title":"Epitope profiling using computational structural modelling demonstrated on coronavirus-binding antibodies","volume":"17","author":"SA Robinson","year":"2021","journal-title":"PLoS Comput Biol"},{"issue":"7873","key":"pcbi.1013057.ref017","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"J Jumper","year":"2021","journal-title":"Nature"},{"key":"pcbi.1013057.ref018","article-title":"Deciphering antibody affinity maturation with language models and weakly supervised learning","author":"JA Ruffolo","year":"2021","journal-title":"arXiv preprint"},{"key":"pcbi.1013057.ref019","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR. 2018. doi: abs\/1810.04805"},{"issue":"1","key":"pcbi.1013057.ref020","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1038\/s42003-023-04927-7","article-title":"ImmuneBuilder: deep-learning models for predicting the structures of immune proteins","volume":"6","author":"B Abanades","year":"2023","journal-title":"Commun Biol"},{"issue":"1","key":"pcbi.1013057.ref021","doi-asserted-by":"crossref","first-page":"2389","DOI":"10.1038\/s41467-023-38063-x","article-title":"Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies","volume":"14","author":"JA Ruffolo","year":"2023","journal-title":"Nat Commun"},{"issue":"19","key":"pcbi.1013057.ref022","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.1525510113","article-title":"Large-scale sequence and structural comparisons of human naive and antigen-experienced antibody repertoires","volume":"113","author":"BJ DeKosky","year":"2016","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"1","key":"pcbi.1013057.ref023","doi-asserted-by":"crossref","first-page":"1873478","DOI":"10.1080\/19420862.2021.1873478","article-title":"Ab-Ligity: identifying sequence-dissimilar antibodies that bind to the same epitope","volume":"13","author":"WK Wong","year":"2021","journal-title":"MAbs"},{"key":"pcbi.1013057.ref024","doi-asserted-by":"crossref","first-page":"1237621","DOI":"10.3389\/fmolb.2023.1237621","article-title":"Improved computational epitope profiling using structural models identifies a broader diversity of antibodies that bind to the same epitope","volume":"10","author":"FC Spoendlin","year":"2023","journal-title":"Front Mol Biosci"},{"issue":"2","key":"pcbi.1013057.ref025","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1007636","article-title":"Structural diversity of B-cell receptor repertoires along the B-cell differentiation axis in humans and mice","volume":"16","author":"A Kovaltsuk","year":"2020","journal-title":"PLoS Comput Biol"},{"issue":"4","key":"pcbi.1013057.ref026","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1039\/C9ME00021F","article-title":"Functional clustering of B cell receptors using sequence and structural features","volume":"4","author":"Z Xu","year":"2019","journal-title":"Mol Syst Des Eng"},{"key":"pcbi.1013057.ref027","doi-asserted-by":"crossref","first-page":"1352508","DOI":"10.3389\/fmolb.2024.1352508","article-title":"Benchmarking antibody clustering methods using sequence, structural, and machine learning similarity measures for antibody discovery applications","volume":"11","author":"D Chomicz","year":"2024","journal-title":"Front Mol Biosci"},{"issue":"7948","key":"pcbi.1013057.ref028","first-page":"521","article-title":"Imprinted SARS-CoV-2 humoral immunity induces convergent Omicron RBD evolution","volume":"614","author":"Y Cao","year":"2023","journal-title":"Nature"},{"key":"pcbi.1013057.ref029","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gky1006","article-title":"The immune epitope database (IEDB): 2018 update","volume":"47","author":"R Vita","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1013057.ref030","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkt1043","article-title":"SAbDab: the structural antibody database","volume":"42","author":"J Dunbar","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1013057.ref031","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkac1077","article-title":"RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence\/machine learning","volume":"51","author":"SK Burley","year":"2023","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"pcbi.1013057.ref032","doi-asserted-by":"crossref","first-page":"2268255","DOI":"10.1080\/19420862.2023.2268255","article-title":"A window into the human immune system: comprehensive characterization of the complexity of antibody complementary-determining regions in functional antibodies","volume":"15","author":"O Mejias-Gomez","year":"2023","journal-title":"MAbs"},{"issue":"11","key":"pcbi.1013057.ref033","doi-asserted-by":"crossref","first-page":"3594","DOI":"10.1093\/bioinformatics\/btaa158","article-title":"immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking","volume":"36","author":"CR Weber","year":"2020","journal-title":"Bioinformatics"},{"key":"pcbi.1013057.ref034","unstructured":"ENPICOM. IGX platform: unlock the full potential of your repertoire data. https:\/\/enpicom.com\/igx-platform\/. 2021."},{"issue":"1","key":"pcbi.1013057.ref035","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1038\/s42003-023-05744-8","article-title":"Contextualising the developability risk of antibodies with lambda light chains using enhanced therapeutic antibody profiling","volume":"7","author":"MIJ Raybould","year":"2024","journal-title":"Commun Biol"},{"issue":"7","key":"pcbi.1013057.ref036","doi-asserted-by":"crossref","first-page":"1311","DOI":"10.1002\/prot.25291","article-title":"The H3 loop of antibodies shows unique structural characteristics","volume":"85","author":"C Regep","year":"2017","journal-title":"Proteins"},{"issue":"6","key":"pcbi.1013057.ref037","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1010167","article-title":"Learning the statistics and landscape of somatic mutation-induced insertions and deletions in antibodies","volume":"18","author":"C Lupo","year":"2022","journal-title":"PLoS Comput Biol"},{"issue":"4","key":"pcbi.1013057.ref038","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1080\/19420862.2016.1158370","article-title":"Length-independent structural similarities enrich the antibody CDR canonical class model","volume":"8","author":"J Nowak","year":"2016","journal-title":"MAbs"},{"issue":"1","key":"pcbi.1013057.ref039","doi-asserted-by":"crossref","first-page":"2175319","DOI":"10.1080\/19420862.2023.2175319","article-title":"Challenges in antibody structure prediction","volume":"15","author":"ML Fern\u00e1ndez-Quintero","year":"2023","journal-title":"MAbs"},{"issue":"6","key":"pcbi.1013057.ref040","doi-asserted-by":"crossref","first-page":"1077","DOI":"10.1080\/19420862.2019.1618676","article-title":"CDR-H3 loop ensemble in solution - conformational selection upon antibody binding","volume":"11","author":"ML Fern\u00e1ndez-Quintero","year":"2019","journal-title":"MAbs"},{"issue":"4","key":"pcbi.1013057.ref041","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkaa1160","article-title":"Alignment free identification of clones in B cell receptor repertoires","volume":"49","author":"O Lindenbaum","year":"2021","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"pcbi.1013057.ref042","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1010723","article-title":"Inference of B cell clonal families using heavy\/light chain pairing information","volume":"18","author":"DK Ralph","year":"2022","journal-title":"PLoS Comput Biol"},{"issue":"2","key":"pcbi.1013057.ref043","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1093\/bioinformatics\/btv552","article-title":"ANARCI: antigen receptor numbering and receptor classification","volume":"32","author":"J Dunbar","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1013057.ref044","unstructured":"Bachmann M. Levenshtein Python C extension module. 2021. https:\/\/rapidfuzz.github.io\/Levenshtein\/index.html"},{"key":"pcbi.1013057.ref045","doi-asserted-by":"crossref","first-page":"1330153","DOI":"10.3389\/fimmu.2023.1330153","article-title":"AIRR-C IG reference sets: curated sets of immunoglobulin heavy and light chain germline genes","volume":"14","author":"AM Collins","year":"2024","journal-title":"Front Immunol"},{"issue":"11","key":"pcbi.1013057.ref046","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"PJA Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"pcbi.1013057.ref047","doi-asserted-by":"crossref","first-page":"358","DOI":"10.3389\/fimmu.2013.00358","article-title":"Models of somatic hypermutation targeting and substitution based on synonymous mutations from high-throughput immunoglobulin sequencing data","volume":"4","author":"G Yaari","year":"2013","journal-title":"Front Immunol"},{"key":"pcbi.1013057.ref048","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1038\/s41421-019-0137-3","article-title":"More than one antibody of individual B cells revealed by single-cell immune profiling","volume":"5","author":"Z Shi","year":"2019","journal-title":"Cell Discov"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1013057","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T00:00:00Z","timestamp":1749427200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013057","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,9]],"date-time":"2025-06-09T14:01:53Z","timestamp":1749477713000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013057"}},"subtitle":[],"editor":[{"given":"Claude","family":"Loverdo","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,5,30]]},"references-count":48,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5,30]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013057","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.06.11.598449","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,30]]}}}