{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T05:04:26Z","timestamp":1773896666755,"version":"3.50.1"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2013,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Summary: High-throughput sequencing provides an opportunity to analyse the repertoire of antigen-specific receptors with an unprecedented breadth and depth. However, the quantity of raw data produced by this technology requires efficient ways to categorize and store the output for subsequent analysis. To this end, we have defined a simple five-item identifier that uniquely and unambiguously defines each TcR sequence. We then describe a novel application of finite-state automaton to map Illumina short-read sequence data for individual TcRs to their respective identifier. An extension of the standard algorithm is also described, which allows for the presence of single-base pair mismatches arising from sequencing error. The software package, named Decombinator, is tested first on a set of artificial in silico sequences and then on a set of published human TcR-\u03b2 sequences. Decombinator assigned sequences at a rate more than two orders of magnitude faster than that achieved by classical pairwise alignment algorithms, and with a high degree of accuracy (&gt;88%), even after introducing up to 1% error rates in the in silico sequences. Analysis of the published sequence dataset highlighted the strong V and J usage bias observed in the human peripheral blood repertoire, which seems to be unconnected to antigen exposure. The analysis also highlighted the enormous size of the available repertoire and the challenge of obtaining a comprehensive description for it. 
The Decombinator package will be a valuable tool for further in-depth analysis of the T-cell repertoire.<\/jats:p>\n               <jats:p>Availability and implementation: The Decombinator package is implemented in Python (v2.6) and is freely available at https:\/\/github.com\/uclinfectionimmunity\/Decombinator along with full documentation and examples of typical usage.<\/jats:p>\n               <jats:p>Contact: \u00a0b.chain@ucl.ac.uk<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt004","type":"journal-article","created":{"date-parts":[[2013,1,10]],"date-time":"2013-01-10T05:46:12Z","timestamp":1357796772000},"page":"542-550","source":"Crossref","is-referenced-by-count":97,"title":["Decombinator: a tool for fast, efficient gene assignment in T-cell receptor sequences using a finite state machine"],"prefix":"10.1093","volume":"29","author":[{"given":"Niclas","family":"Thomas","sequence":"first","affiliation":[{"name":"1 CoMPLEX Department, UCL, Gower Street, London, WC1E 6BT, 2Division of Infection and Immunity, The Cruciform Building, UCL, Gower Street, London, WC1 6BT, UK, 3Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel and 4Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK"}]},{"given":"James","family":"Heather","sequence":"additional","affiliation":[{"name":"1 CoMPLEX Department, UCL, Gower Street, London, WC1E 6BT, 2Division of Infection and Immunity, The Cruciform Building, UCL, Gower Street, London, WC1 6BT, UK, 3Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel and 4Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK"}]},{"given":"Wilfred","family":"Ndifon","sequence":"additional","affiliation":[{"name":"1 CoMPLEX Department, UCL, Gower Street, London, WC1E 6BT, 2Division of Infection and Immunity, The Cruciform Building, UCL, Gower Street, London, WC1 6BT, UK, 3Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel and 4Department 
of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK"}]},{"given":"John","family":"Shawe-Taylor","sequence":"additional","affiliation":[{"name":"1 CoMPLEX Department, UCL, Gower Street, London, WC1E 6BT, 2Division of Infection and Immunity, The Cruciform Building, UCL, Gower Street, London, WC1 6BT, UK, 3Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel and 4Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK"}]},{"given":"Benjamin","family":"Chain","sequence":"additional","affiliation":[{"name":"1 CoMPLEX Department, UCL, Gower Street, London, WC1E 6BT, 2Division of Infection and Immunity, The Cruciform Building, UCL, Gower Street, London, WC1 6BT, UK, 3Department of Immunology, Weizmann Institute of Science, Rehovot 76100, Israel and 4Department of Computer Science, UCL, Gower Street, London, WC1E 6BT, UK"}]}],"member":"286","published-online":{"date-parts":[[2013,1,9]]},"reference":[{"key":"2023051607330781800_btt004-B1","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1145\/360825.360855","article-title":"Efficient string matching: an aid to bibliographic search","volume":"18","author":"Aho","year":"1975","journal-title":"Commun. 
ACM"},{"key":"2023051607330781800_btt004-B2","first-page":"26","article-title":"IMGT\/HighV-QUEST: the IMGT web portal for immunoglobulin (IG) or antibody and t cell receptor (TR) analysis from NGS high throughput and deep sequencing","volume":"8","author":"Alamyar","year":"2012","journal-title":"Immunome Res."},{"key":"2023051607330781800_btt004-B3","doi-asserted-by":"crossref","first-page":"3469","DOI":"10.1182\/blood-2011-11-395384","article-title":"Blood T-cell receptor diversity decreases during the course of HIV infection, but the potential for a diverse repertoire persists","volume":"119","author":"Baum","year":"2012","journal-title":"Blood"},{"key":"2023051607330781800_btt004-B4","doi-asserted-by":"crossref","first-page":"3073","DOI":"10.1002\/eji.201242517","article-title":"Next generation sequencing for TCR repertoire profiling: platform-specific features and correction algorithms","volume":"42","author":"Bolotin","year":"2012","journal-title":"Eur. J. Immunol."},{"key":"2023051607330781800_btt004-B5","doi-asserted-by":"crossref","first-page":"762","DOI":"10.1145\/359842.359859","article-title":"A fast string searching algorithm","volume":"20","author":"Boyer","year":"1977","journal-title":"Commun. 
ACM"},{"key":"2023051607330781800_btt004-B6","doi-asserted-by":"crossref","first-page":"1817","DOI":"10.1101\/gr.092924.109","article-title":"Profiling the T-cell receptor beta-chain repertoire by massively parallel sequencing","volume":"19","author":"Freeman","year":"2009","journal-title":"Genome Res."},{"key":"2023051607330781800_btt004-B7","doi-asserted-by":"crossref","first-page":"D256","DOI":"10.1093\/nar\/gki010","article-title":"IMGT\/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes","volume":"33","author":"Giudicelli","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023051607330781800_btt004-B8","doi-asserted-by":"crossref","first-page":"5109","DOI":"10.4049\/jimmunol.152.10.5109","article-title":"Circulating t cell repertoire complexity in normal individuals and bone marrow recipients analyzed by CDR3 size spectratyping. correlation with immune status","volume":"152","author":"Gorski","year":"1994","journal-title":"J. Immunol."},{"key":"2023051607330781800_btt004-B9","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1101\/gr.073262.107","article-title":"The new paradigm of flow cell sequencing","volume":"18","author":"Holt","year":"2008","journal-title":"Genome Res."},{"key":"2023051607330781800_btt004-B10","first-page":"656","article-title":"BLAT\u2013the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res."},{"key":"2023051607330781800_btt004-B11","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1016\/j.imlet.2010.06.011","article-title":"Human T-cell memory consists mainly of unexpanded clones","volume":"133","author":"Klarenbeek","year":"2010","journal-title":"Immunol. 
Lett."},{"key":"2023051607330781800_btt004-B12","doi-asserted-by":"crossref","first-page":"953","DOI":"10.1128\/CDLI.7.6.953-959.2000","article-title":"T-cell receptor V\u03b2 repertoire CDR3 length diversity differs within CD45RA and CD45RO T-cell subsets in healthy and human immunodeficiency virus-infected children","volume":"7","author":"Kou","year":"2000","journal-title":"Clin. Diagn. Lab. Immunol."},{"key":"2023051607330781800_btt004-B13","doi-asserted-by":"crossref","first-page":"e30087","DOI":"10.1371\/journal.pone.0030087","article-title":"Direct comparisons of illumina vs. roche 454 sequencing technologies on the same microbial community DNA sample","volume":"7","author":"Luo","year":"2012","journal-title":"PLoS One"},{"key":"2023051607330781800_btt004-B14","doi-asserted-by":"crossref","first-page":"16161","DOI":"10.1073\/pnas.1212755109","article-title":"Statistical inference of the generation probability of T-cell receptors from sequence repertoires","volume":"109","author":"Murugan","year":"2012","journal-title":"Proc. Natl Acad. Sci. USA."},{"key":"2023051607330781800_btt004-B15","doi-asserted-by":"crossref","first-page":"15865","DOI":"10.1073\/pnas.1203916109","article-title":"Chromatin conformation governs T-cell receptor J\u03b2 gene segment usage","volume":"109","author":"Ndifon","year":"2012","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023051607330781800_btt004-B16","volume-title":"Biostrings: String objects representing biological sequences, and matching algorithms","author":"Pages","year":"2012"},{"key":"2023051607330781800_btt004-B17","doi-asserted-by":"crossref","first-page":"4099","DOI":"10.1182\/blood-2009-04-217604","article-title":"Comprehensive assessment of T-cell receptor beta-chain diversity in alphabeta T cells","volume":"114","author":"Robins","year":"2009","journal-title":"Blood"},{"key":"2023051607330781800_btt004-B18","doi-asserted-by":"crossref","first-page":"3557","DOI":"10.1158\/0008-5472.CAN-12-0277","article-title":"The T-cell receptor repertoire of tumor-infiltrating regulatory T lymphocytes is skewed towards public sequences","volume":"72","author":"Sainz-Perez","year":"2012","journal-title":"Cancer Res."},{"key":"2023051607330781800_btt004-B19","first-page":"294","article-title":"A pattern matching algorithm for codon optimization and CpG motif-engineering in DNA expression vectors","volume":"2","author":"Satya","year":"2003","journal-title":"Proc. IEEE Comput. Soc. Bioinform. Conf."},{"key":"2023051607330781800_btt004-B20","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1038\/nbt1486","article-title":"Next-generation DNA sequencing","volume":"26","author":"Shendure","year":"2008","journal-title":"Nature Biotechnol."},{"key":"2023051607330781800_btt004-B21","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1007\/BF01769703","article-title":"Approximate string matching with suffix automata","volume":"10","author":"Ukkonen","year":"1993","journal-title":"Algorithmica"},{"key":"2023051607330781800_btt004-B22","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1038\/nri2260","article-title":"The molecular basis for public T-cell responses?","volume":"8","author":"Venturi","year":"2008","journal-title":"Nat. Rev. 
Immunol."},{"key":"2023051607330781800_btt004-B23","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1093\/bioinformatics\/btk004","article-title":"SoDA: implementation of a 3D alignment algorithm for inference of antigen receptor recombinations","volume":"22","author":"Volpe","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051607330781800_btt004-B24","doi-asserted-by":"crossref","first-page":"1518","DOI":"10.1073\/pnas.0913939107","article-title":"High throughput sequencing reveals a complex pattern of dynamic interrelationships among human T cell subsets","volume":"107","author":"Wang","year":"2010","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023051607330781800_btt004-B25","doi-asserted-by":"crossref","first-page":"790","DOI":"10.1101\/gr.115428.110","article-title":"Exhaustive T-cell repertoire sequencing of human peripheral blood samples reveals signatures of antigen selection and a directly measured repertoire size of at least 1 million clonotypes","volume":"21","author":"Warren","year":"2011","journal-title":"Genome 
Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/5\/542\/50335769\/bioinformatics_29_5_542.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/29\/5\/542\/50335769\/bioinformatics_29_5_542.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T07:56:14Z","timestamp":1684223774000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/29\/5\/542\/249065"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,1,9]]},"references-count":25,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2013,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt004","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2013,3,1]]},"published":{"date-parts":[[2013,1,9]]}}}