{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T07:19:16Z","timestamp":1773127156207,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2024,4,23]],"date-time":"2024-04-23T00:00:00Z","timestamp":1713830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R21AI58114"],"award-info":[{"award-number":["R21AI58114"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Antibody therapeutic candidates must exhibit not only tight binding to their target but also good developability properties, especially low risk of immunogenicity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this work, we fit a simple generative model, SAM, to sixty million human heavy and seventy million human light chains. We show that the probability of a sequence calculated by the model distinguishes human sequences from other species with the same or better accuracy on a variety of benchmark datasets containing &gt;400 million sequences than any other model in the literature, outperforming large language models (LLMs) by large margins. SAM can humanize sequences, generate new sequences, and score sequences for humanness. It is both fast and fully interpretable. 
Our results highlight the importance of using simple models as baselines for protein engineering tasks. We additionally introduce a new tool for numbering antibody sequences which is orders of magnitude faster than existing tools in the literature.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>All tools developed in this study are available at https:\/\/github.com\/Wang-lab-UCSD\/AntPack.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae278","type":"journal-article","created":{"date-parts":[[2024,4,19]],"date-time":"2024-04-19T20:10:19Z","timestamp":1713557419000},"source":"Crossref","is-referenced-by-count":8,"title":["For antibody sequence generative modeling, mixture models may be all you need"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7000-2082","authenticated-orcid":false,"given":"Jonathan","family":"Parkinson","sequence":"first","affiliation":[{"name":"Department of Chemistry and Biochemistry, University of California , San Diego, La Jolla, CA 92093-0359, United States"},{"name":"MAP Bioscience , La Jolla, CA 92093, United States"}]},{"given":"Wei","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Chemistry and Biochemistry, University of California , San Diego, La Jolla, CA 92093-0359, United States"},{"name":"Department of Cellular and Molecular Medicine, University of California , San Diego, La Jolla, CA 92093-0359, United States"}]}],"member":"286","published-online":{"date-parts":[[2024,4,23]]},"reference":[{"key":"2024062405552919600_btae278-B1","doi-asserted-by":"crossref","first-page":"3832","DOI":"10.1016\/j.molimm.2008.05.022","article-title":"Analysis and improvements to Kabat and structurally correct numbering of antibody variable domains","volume":"45","author":"Abhinandan","year":"2008","journal-title":"Mol 
Immunol"},{"key":"2024062405552919600_btae278-B2","doi-asserted-by":"crossref","first-page":"927","DOI":"10.1006\/jmbi.1997.1354","article-title":"Standard conformations for the canonical structures of Immunoglobulins","volume":"273","author":"Al-Lazikani","year":"1997","journal-title":"J Mol Biol"},{"key":"2024062405552919600_btae278-B3","doi-asserted-by":"crossref","first-page":"1743053","DOI":"10.1080\/19420862.2020.1743053","article-title":"Predicting antibody developability profiles through early stage discovery screening","volume":"12","author":"Bailly","year":"2020","journal-title":"MAbs"},{"key":"2024062405552919600_btae278-B4","author":"Briney","year":"2018"},{"key":"2024062405552919600_btae278-B5","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.sbi.2016.07.012","article-title":"Engineering antibody therapeutics","volume":"38","author":"Chiu","year":"2016","journal-title":"Curr Opin Struct Biol"},{"key":"2024062405552919600_btae278-B6","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.3389\/fimmu.2018.02278","article-title":"Understanding the significance and implications of antibody numbering and Antigen-Binding surface\/residue definition","volume":"9","author":"Dondelinger","year":"2018","journal-title":"Front Immunol"},{"key":"2024062405552919600_btae278-B7","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/S0022-2836(03)00530-8","article-title":"Engineering stable cytoplasmic intrabodies with designed specificity","volume":"330","author":"Donini","year":"2003","journal-title":"J Mol Biol"},{"key":"2024062405552919600_btae278-B8","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1093\/bioinformatics\/btv552","article-title":"ANARCI: antigen receptor numbering and receptor 
classification","volume":"32","author":"Dunbar","year":"2015","journal-title":"Bioinformatics"},{"key":"2024062405552919600_btae278-B9","author":"Faure","year":"2023"},{"key":"2024062405552919600_btae278-B10","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1016\/0022-2836(92)91010-M","article-title":"Antibody framework residues affecting the conformation of the hypervariable loops","volume":"224","author":"Foote","year":"1992","journal-title":"J Mol Biol"},{"key":"2024062405552919600_btae278-B11","doi-asserted-by":"crossref","first-page":"2365","DOI":"10.3389\/fimmu.2019.02365","article-title":"cAb-Rep: a database of curated antibody repertoires for exploring antibody diversity and predicting antibody prevalence","volume":"10","author":"Guo","year":"2019","journal-title":"Front Immunol"},{"key":"2024062405552919600_btae278-B12","doi-asserted-by":"crossref","first-page":"256","DOI":"10.4161\/mabs.2.3.11641","article-title":"The immunogenicity of humanized and fully human antibodies","volume":"2","author":"Harding","year":"2010","journal-title":"MAbs"},{"key":"2024062405552919600_btae278-B13","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1006\/jmbi.2001.4662","article-title":"Yet another numbering scheme for immunoglobulin variable domains: an automatic modeling and analysis tool","volume":"309","author":"Honegger","year":"2001","journal-title":"J Mol Biol"},{"key":"2024062405552919600_btae278-B14","doi-asserted-by":"crossref","first-page":"pdb.top115","DOI":"10.1101\/pdb.top115","article-title":"IMGT, the international ImMunoGeneTics information system","volume":"2011","author":"Lefranc","year":"2011","journal-title":"Cold Spring Harbor Protocols"},{"key":"2024062405552919600_btae278-B15","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/j.dci.2004.07.003","article-title":"IMGT unique numbering for immunoglobulin and T cell receptor constant domains and Ig superfamily C-like 
domains","volume":"29","author":"Lefranc","year":"2005","journal-title":"Dev Comp Immunol"},{"key":"2024062405552919600_btae278-B16","doi-asserted-by":"crossref","first-page":"1524","DOI":"10.1002\/pro.3633","article-title":"AbRSA: a robust tool for antibody numbering","volume":"28","author":"Li","year":"2019","journal-title":"Protein Sci"},{"key":"2024062405552919600_btae278-B17","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1038\/s41467-018-02832-w","article-title":"High-Throughput immune repertoire analysis with IGoR","volume":"9","author":"Marcou","year":"2018","journal-title":"Nat Commun"},{"key":"2024062405552919600_btae278-B18","doi-asserted-by":"crossref","first-page":"4041","DOI":"10.1093\/bioinformatics\/btab434","article-title":"Humanization of antibodies using a machine learning approach on large-scale repertoire data","volume":"37","author":"Marks","year":"2021","journal-title":"Bioinformatics"},{"key":"2024062405552919600_btae278-B19","doi-asserted-by":"crossref","first-page":"968","DOI":"10.1016\/j.cels.2023.10.002","article-title":"ProGen2: exploring the boundaries of protein language models","volume":"14","author":"Nijkamp","year":"2023","journal-title":"Cell Syst"},{"key":"2024062405552919600_btae278-B20","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1002\/pro.4205","article-title":"Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences","volume":"31","author":"Olsen","year":"2022","journal-title":"Protein Sci"},{"key":"2024062405552919600_btae278-B21","doi-asserted-by":"crossref","first-page":"454","DOI":"10.1038\/s41467-023-36028-8","article-title":"The RESP AI model accelerates the identification of Tight-Binding antibodies","volume":"14","author":"Parkinson","year":"2023","journal-title":"Nat Commun"},{"key":"2024062405552919600_btae278-B22","doi-asserted-by":"crossref","first-page":"4589","DOI":"10.1021\/acs.jcim.3c00601","article-title":"Linear-Scaling 
kernels for protein sequences and small molecules outperform deep learning while providing uncertainty quantitation and improved interpretability","volume":"63","author":"Parkinson","year":"2023","journal-title":"J Chem Inf Model"},{"key":"2024062405552919600_btae278-B23","doi-asserted-by":"crossref","first-page":"2020203","DOI":"10.1080\/19420862.2021.2020203","article-title":"BioPhi: a platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning","volume":"14","author":"Prihoda","year":"2022","journal-title":"MAbs"},{"key":"2024062405552919600_btae278-B24","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1038\/s42256-023-00778-3","article-title":"Assessing antibody and nanobody nativeness for hit selection and humanization with AbNatiV","volume":"6","author":"Ramon","year":"2024","journal-title":"Nat Mach Intell"},{"key":"2024062405552919600_btae278-B25","author":"Ruffolo","year":"2021"},{"key":"2024062405552919600_btae278-B26","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/02648725.2013.801235","article-title":"Antibody humanization methods\u2014a review and update","volume":"29","author":"Safdari","year":"2013","journal-title":"Biotechnol Genet Eng Rev"},{"key":"2024062405552919600_btae278-B27","doi-asserted-by":"crossref","first-page":"979","DOI":"10.1016\/j.cels.2023.10.001","article-title":"IgLM: infilling language modeling for antibody sequence design","volume":"14","author":"Shuai","year":"2023","journal-title":"Cell Syst"},{"key":"2024062405552919600_btae278-B28","doi-asserted-by":"crossref","first-page":"2474","DOI":"10.1016\/j.molimm.2008.01.016","article-title":"Humanization of a highly stable single-chain antibody by structure-based antigen-binding site grafting","volume":"45","author":"Villani","year":"2008","journal-title":"Mol 
Immunol"},{"key":"2024062405552919600_btae278-B29","doi-asserted-by":"crossref","first-page":"3594","DOI":"10.1093\/bioinformatics\/btaa158","article-title":"ImmuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking","volume":"36","author":"Weber","year":"2020","journal-title":"Bioinformatics"},{"key":"2024062405552919600_btae278-B30","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1093\/protein\/gzz031","article-title":"Quantifying the nativeness of antibody sequences using long short-term memory networks","volume":"32","author":"Wollacott","year":"2019","journal-title":"Protein Eng Des Sel"},{"key":"2024062405552919600_btae278-B31","first-page":"7057","article-title":"Pillars article: an analysis of the sequences of the variable regions of bence jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 1970. 132: 211-250","volume":"180","author":"Wu","year":"2008","journal-title":"J Immunol (Baltimore, MD.: 
1950)"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae278\/57310744\/btae278.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/5\/btae278\/58309540\/btae278.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/5\/btae278\/58309540\/btae278.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,24]],"date-time":"2024-06-24T02:00:24Z","timestamp":1719194424000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae278\/7656770"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,4,23]]},"references-count":31,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae278","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.01.27.577555","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,5,1]]},"published":{"date-parts":[[2024,4,23]]},"article-number":"btae278"}}