{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T03:27:25Z","timestamp":1776482845913,"version":"3.51.2"},"reference-count":10,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2021,6,16]],"date-time":"2021-06-16T00:00:00Z","timestamp":1623801600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100003130","name":"Research Foundation Flanders","doi-asserted-by":"publisher","award":["1S40321N"],"award-info":[{"award-number":["1S40321N"]}],"id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Flemish Government under the \u2018Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen\u2019 programme"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,12,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large datasets exceeding 1 million sequences. To account for this limitation, we developed ClusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences, without knowledge about their antigen specificity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Benchmarking comparisons revealed similar accuracy of ClusTCR as compared to other TCR clustering methods, as measured by cluster retention, purity and consistency. ClusTCR offers a drastic improvement in clustering speed, which allows the clustering of millions of TCR sequences in just a few minutes through ultraefficient similarity searching and sequence hashing.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>ClusTCR was written in Python 3. It is available as an anaconda package (https:\/\/anaconda.org\/svalkiers\/clustcr) and on github (https:\/\/github.com\/svalkiers\/clusTCR).<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab446","type":"journal-article","created":{"date-parts":[[2021,6,15]],"date-time":"2021-06-15T15:14:17Z","timestamp":1623770057000},"page":"4865-4867","source":"Crossref","is-referenced-by-count":65,"title":["ClusTCR: a python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4940-1310","authenticated-orcid":false,"given":"Sebastiaan","family":"Valkiers","sequence":"first","affiliation":[{"name":"Adrem Data Lab, Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium"},{"name":"Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), Interdepartmental Consortium, University of Antwerp, 2020 Antwerp, Belgium"}]},{"given":"Max","family":"Van Houcke","sequence":"additional","affiliation":[{"name":"Adrem Data Lab, Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8217-2564","authenticated-orcid":false,"given":"Kris","family":"Laukens","sequence":"additional","affiliation":[{"name":"Adrem Data Lab, Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium"},{"name":"Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), Interdepartmental Consortium, University of Antwerp, 2020 Antwerp, Belgium"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5903-633X","authenticated-orcid":false,"given":"Pieter","family":"Meysman","sequence":"additional","affiliation":[{"name":"Adrem Data Lab, Department of Computer Science, University of Antwerp, 2020 Antwerp, Belgium"},{"name":"Antwerp Unit for Data Analysis and Computation in Immunology and Sequencing (AUDACIS), Interdepartmental Consortium, University of Antwerp, 2020 Antwerp, Belgium"}]}],"member":"286","published-online":{"date-parts":[[2021,6,16]]},"reference":[{"key":"2021121116305180900_btab446-B1","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1038\/nature22383","article-title":"Quantifiable predictive features define epitope-specific t cell receptor repertoires","volume":"547","author":"Dash","year":"2017","journal-title":"Nature"},{"key":"2021121116305180900_btab446-B2","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1038\/nri3020","article-title":"Interrogating the repertoire: broadening the scope of peptide-MHC multimer analysis","volume":"11","author":"Davis","year":"2011","journal-title":"Nat Rev Immunol"},{"key":"2021121116305180900_btab446-B3","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1093\/nar\/30.7.1575","article-title":"An efficient algorithm for large-scale detection of protein families","volume":"30","author":"Enright","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2021121116305180900_btab446-B4","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1038\/nature22976","article-title":"Identifying specificity groups in the t cell receptor repertoire","volume":"547","author":"Glanville","year":"2017","journal-title":"Nature"},{"key":"2021121116305180900_btab446-B5","doi-asserted-by":"crossref","first-page":"1194","DOI":"10.1038\/s41587-020-0505-4","article-title":"Analyzing the mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening","volume":"38","author":"Huang","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2021121116305180900_btab446-B6","author":"Johnson","year":"2019"},{"key":"2021121116305180900_btab446-B7","doi-asserted-by":"crossref","first-page":"e22057","DOI":"10.7554\/eLife.22057","article-title":"T cell receptor repertoires of mice and humans are clustered in similarity networks around conserved public CDR3 sequences","volume":"6","author":"Madi","year":"2017","journal-title":"Elife"},{"key":"2021121116305180900_btab446-B8","author":"Mayer-Blackwell","year":"2020"},{"key":"2021121116305180900_btab446-B9","doi-asserted-by":"crossref","first-page":"1461","DOI":"10.1093\/bioinformatics\/bty821","article-title":"On the viability of unsupervised T-cell receptor sequence clustering for epitope preference","volume":"35","author":"Meysman","year":"2019","journal-title":"Bioinformatics"},{"key":"2021121116305180900_btab446-B10","doi-asserted-by":"crossref","first-page":"1359","DOI":"10.1158\/1078-0432.CCR-19-3249","article-title":"Investigation of antigen-specific T-cell receptor clusters in human cancers","volume":"26","author":"Zhang","year":"2020","journal-title":"Clin Cancer Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab446\/38923919\/btab446.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/24\/4865\/41726822\/btab446.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/24\/4865\/41726822\/btab446.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,11]],"date-time":"2021-12-11T11:31:06Z","timestamp":1639222266000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/24\/4865\/6300511"}},"subtitle":[],"editor":[{"given":"Dr Valentina","family":"Boeva","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,6,16]]},"references-count":10,"journal-issue":{"issue":"24","published-online":{"date-parts":[[2021,6,16]]},"published-print":{"date-parts":[[2021,12,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab446","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.02.22.432291","asserted-by":"object"}]},"ISSN":["1367-4803","1460-2059"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1460-2059","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,12,15]]},"published":{"date-parts":[[2021,6,16]]}}}