{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T05:13:31Z","timestamp":1770182011895,"version":"3.49.0"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T00:00:00Z","timestamp":1768435200000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Research Funds of Beijing Normal University","award":["312200502537"],"award-info":[{"award-number":["312200502537"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,1,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Sequence motif identification is crucial for understanding molecular recognition, particularly in immune responses involving peptide binding to major histocompatibility complex (MHC) Class I molecules for antigen presentation to T cells. Traditionally, MHC Class I binding motifs are assumed to be contiguous and span nine amino acids. However, structural evidence suggests that binding may involve nonadjacent residues, challenging the assumptions of existing methods.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>In this study, we propose Gap-Aware Motif Mining Algorithm (GAMMA), a probabilistic framework designed to identify noncontiguous motifs under conditions of incomplete labeling. GAMMA employs Bayesian inference with Markov chain Monte Carlo sampling to jointly estimate motif parameters, binding locations, and the relative spacing between binding positions. Through extensive simulations and real-world applications to MHC Class I peptide datasets, GAMMA outperforms existing motif discovery tools such as GLAM2 in accurately localizing binding residues and identifying the underlying motifs. Notably, our results suggest that the true number of binding residues may be eight, fewer than the commonly assumed nine. In addition, for longer peptides, the model captures increased flexibility in the central region, consistent with structural observations that peptides may bulge in the middle.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The raw data and the source codes are available on GitHub (https:\/\/github.com\/RanLIUaca\/GAMMAmotif).<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btag014","type":"journal-article","created":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T12:56:08Z","timestamp":1767963368000},"source":"Crossref","is-referenced-by-count":0,"title":["GAMMA: gap-aware motif mining under incomplete labeling with applications to MHC motifs"],"prefix":"10.1093","volume":"42","author":[{"given":"Xinyi","family":"Tang","sequence":"first","affiliation":[{"name":"Department of Mathematics, Statistics and Insurance, The Hang Seng University of Hong Kong , Shatin, Hong Kong SAR,","place":["China"]},{"name":"Department of Statistics, The Chinese University of Hong Kong , Shatin, Hong Kong SAR,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3627-3994","authenticated-orcid":false,"given":"Ran","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Statistics, Faculty of Arts and Sciences, Beijing Normal University , Zhuhai 519087,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2026,1,14]]},"reference":[{"key":"2026020310415337500_btag014-B1","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1038\/ni.3310","article-title":"Structural interplay between germline interactions and adaptive recognition determines the bandwidth of TCR-peptide-MHC cross-reactivity","volume":"17","author":"Adams","year":"2016","journal-title":"Nat Immunol"},{"key":"2026020310415337500_btag014-B2","doi-asserted-by":"crossref","first-page":"W39","DOI":"10.1093\/nar\/gkv416","article-title":"The MEME suite","volume":"43","author":"Bailey","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2026020310415337500_btag014-B3","doi-asserted-by":"crossref","first-page":"e1005725","DOI":"10.1371\/journal.pcbi.1005725","article-title":"Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity","volume":"13","author":"Bassani-Sternberg","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2026020310415337500_btag014-B4","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2026020310415337500_btag014-B5","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2026020310415337500_btag014-B6","doi-asserted-by":"crossref","first-page":"e1000071","DOI":"10.1371\/journal.pcbi.1000071","article-title":"Discovering sequence motifs with arbitrary insertions and deletions","volume":"4","author":"Frith","year":"2008","journal-title":"PLoS Comput Biol"},{"key":"2026020310415337500_btag014-B7","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1016\/j.cels.2022.12.002","article-title":"Improved predictions of antigen presentation and TCR recognition with MixMHCpred2.2 and PRIME2.0 reveal potent SARS-CoV-2 CD8+ T-cell epitopes","volume":"14","author":"Gfeller","year":"2023","journal-title":"Cell Syst"},{"key":"2026020310415337500_btag014-B8","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1111\/tan.13358","article-title":"HLA common and well-documented alleles in China","volume":"92","author":"He","year":"2018","journal-title":"HLA"},{"key":"2026020310415337500_btag014-B9","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.gde.2017.02.007","article-title":"Transcription factor-DNA binding: beyond binding site motifs","volume":"43","author":"Inukai","year":"2017","journal-title":"Curr Opin Genet Dev"},{"key":"2026020310415337500_btag014-B10","doi-asserted-by":"crossref","first-page":"bbae552","DOI":"10.1093\/bib\/bbae552","article-title":"IMGT\/RobustpMHC: robust training for class-I MHC peptide binding prediction","volume":"25","author":"Kushwaha","year":"2024","journal-title":"Brief Bioinform"},{"key":"2026020310415337500_btag014-B11","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1126\/science.8211139","article-title":"Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment","volume":"262","author":"Lawrence","year":"1993","journal-title":"Science"},{"key":"2026020310415337500_btag014-B12","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1080\/01621459.1994.10476829","article-title":"The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem","volume":"89","author":"Liu","year":"1994","journal-title":"J Am Stat Assoc"},{"key":"2026020310415337500_btag014-B13","doi-asserted-by":"crossref","first-page":"bbad208","DOI":"10.1093\/bib\/bbad208","article-title":"A Bayesian approach to estimate MHC-peptide binding threshold","volume":"24","author":"Liu","year":"2023","journal-title":"Brief Bioinform"},{"key":"2026020310415337500_btag014-B14","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1016\/0092-8674(93)90490-H","article-title":"The antigenic identity of peptide-MHC complexes: a comparison of the conformations of five viral peptides presented by HLA-A2","volume":"75","author":"Madden","year":"1993","journal-title":"Cell"},{"key":"2026020310415337500_btag014-B15","doi-asserted-by":"crossref","DOI":"10.1201\/9781315533247","volume-title":"Janeway\u2019s Immunobiology","author":"Murphy","year":"2016"},{"key":"2026020310415337500_btag014-B16","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1186\/1471-2105-10-296","article-title":"NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction","volume":"10","author":"Nielsen","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2026020310415337500_btag014-B17","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1146\/annurev-immunol-082119-124838","article-title":"T cell epitope predictions","volume":"38","author":"Peters","year":"2020","journal-title":"Annu Rev Immunol"},{"key":"2026020310415337500_btag014-B18","doi-asserted-by":"crossref","first-page":"759","DOI":"10.1007\/s00251-008-0330-2","article-title":"MHC motif viewer","volume":"60","author":"Rapin","year":"2008","journal-title":"Immunogenetics"},{"key":"2026020310415337500_btag014-B19","doi-asserted-by":"crossref","first-page":"D174","DOI":"10.1093\/nar\/gkad1059","article-title":"JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles","volume":"52","author":"Rauluseviciute","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2026020310415337500_btag014-B20","doi-asserted-by":"crossref","first-page":"W449","DOI":"10.1093\/nar\/gkaa379","article-title":"NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data","volume":"48","author":"Reynisson","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2026020310415337500_btag014-B21","doi-asserted-by":"crossref","first-page":"3296","DOI":"10.1073\/pnas.86.9.3296","article-title":"Prediction of major histocompatibility complex binding regions of protein antigens by sequence pattern analysis","volume":"86","author":"Sette","year":"1989","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2026020310415337500_btag014-B22","doi-asserted-by":"crossref","first-page":"D428","DOI":"10.1093\/nar\/gkac965","article-title":"The MHC Motif Atlas: a database of MHC binding specificities and ligands","volume":"51","author":"Tadros","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2026020310415337500_btag014-B23","doi-asserted-by":"crossref","first-page":"bbad156","DOI":"10.1093\/bib\/bbad156","article-title":"A survey on algorithms to characterize transcription factor binding sites","volume":"24","author":"Tognon","year":"2023","journal-title":"Brief Bioinform"},{"key":"2026020310415337500_btag014-B24","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.molcel.2014.05.032","article-title":"A million peptide motifs for the molecular biologist","volume":"55","author":"Tompa","year":"2014","journal-title":"Mol Cell"},{"key":"2026020310415337500_btag014-B25","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1038\/ni1432","article-title":"A T cell receptor flattens a bulged antigenic peptide presented by a major histocompatibility complex class I molecule","volume":"8","author":"Tynan","year":"2007","journal-title":"Nat Immunol"},{"key":"2026020310415337500_btag014-B26","doi-asserted-by":"crossref","first-page":"3645","DOI":"10.1093\/bioinformatics\/btx469","article-title":"Ggseqlogo: a versatile R package for drawing sequence logos","volume":"33","author":"Wagih","year":"2017","journal-title":"Bioinformatics"},{"key":"2026020310415337500_btag014-B27","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1038\/s41467-021-27669-8","article-title":"Structural assessment of HLA-A2-restricted SARS-CoV-2 spike epitopes recognized by public and private T-cell receptors","volume":"13","author":"Wu","year":"2022","journal-title":"Nat Commun"},{"key":"2026020310415337500_btag014-B28","doi-asserted-by":"crossref","first-page":"552","DOI":"10.1038\/nature00920","article-title":"Two-step binding mechanism for T-cell receptor recognition of peptide MHC","volume":"418","author":"Wu","year":"2002","journal-title":"Nature"},{"key":"2026020310415337500_btag014-B29","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1016\/j.molimm.2011.08.015","article-title":"Structural basis of cross-allele presentation by HLA-A0301 and HLA-A1101 revealed by two HIV-derived peptide complexes","volume":"49","author":"Zhang","year":"2011","journal-title":"Mol Immunol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btag014\/66419816\/btag014.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/2\/btag014\/66419816\/btag014.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/2\/btag014\/66419816\/btag014.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T15:42:01Z","timestamp":1770133321000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btag014\/8426182"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2026,1,3]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2026,1,3]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btag014","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,2]]},"published":{"date-parts":[[2026,1,3]]},"article-number":"btag014"}}