{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T03:56:52Z","timestamp":1772769412116,"version":"3.50.1"},"reference-count":56,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:00:00Z","timestamp":1750982400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T00:00:00Z","timestamp":1750982400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/100006151","name":"Basic Energy Sciences","doi-asserted-by":"crossref","award":["DE-AC02-76SF00515"],"award-info":[{"award-number":["DE-AC02-76SF00515"]}],"id":[{"id":"10.13039\/100006151","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003497","name":"Bundesbeh\u00f6rden der Schweizerischen Eidgenossenschaft","doi-asserted-by":"crossref","award":["Swiss Government Excellence Scholarships for Forei"],"award-info":[{"award-number":["Swiss Government Excellence Scholarships for Forei"]}],"id":[{"id":"10.13039\/501100003497","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["EXC-2094-390783311"],"award-info":[{"award-number":["EXC-2094-390783311"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]},{"name":"USA-Israel BSF","award":["2022641"],"award-info":[{"award-number":["2022641"]}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen 
Forschung","doi-asserted-by":"crossref","award":["200020_212127"],"award-info":[{"award-number":["200020_212127"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>In this work, we significantly enhance masked particle modeling (MPM), a self-supervised learning scheme for constructing highly expressive representations of unordered sets relevant to developing foundation models for high-energy physics. In MPM, a model is trained to recover the missing elements of a set, a learning objective that requires no labels and can be applied directly to experimental data. We achieve significant performance improvements over previous work on MPM by addressing inefficiencies in the implementation and incorporating a more powerful decoder. We compare several pre-training tasks and introduce new reconstruction methods that utilize conditional generative models without data tokenization or discretization. 
We show that these new methods outperform the tokenized learning objective from the original MPM on a new test bed for foundation models for jets, which includes using a wide variety of downstream tasks relevant to jet physics, such as classification, secondary vertex finding, and track identification.<\/jats:p>","DOI":"10.1088\/2632-2153\/addb98","type":"journal-article","created":{"date-parts":[[2025,5,21]],"date-time":"2025-05-21T18:54:36Z","timestamp":1747853676000},"page":"025075","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Is tokenization needed for masked particle modeling?"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1406-1413","authenticated-orcid":true,"given":"Matthew","family":"Leigh","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2999-6150","authenticated-orcid":true,"given":"Samuel","family":"Klein","sequence":"additional","affiliation":[]},{"given":"Fran\u00e7ois","family":"Charton","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8535-6687","authenticated-orcid":true,"given":"Tobias","family":"Golling","sequence":"additional","affiliation":[]},{"given":"Lukas","family":"Heinrich","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3386-6869","authenticated-orcid":true,"given":"Michael","family":"Kagan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6156-1790","authenticated-orcid":true,"given":"In\u00eas","family":"Ochoa","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5480-5099","authenticated-orcid":true,"given":"Margarita","family":"Osadchy","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,6,27]]},"reference":[{"key":"mlstaddb98bib1","article-title":"On the opportunities and risks of foundation 
models","author":"Bommasani","year":"2022"},{"key":"mlstaddb98bib2","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019"},{"key":"mlstaddb98bib3","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018"},{"key":"mlstaddb98bib4","doi-asserted-by":"publisher","first-page":"9630","DOI":"10.1109\/ICCV48922.2021.00951","article-title":"Emerging properties in self-supervised vision transformers","author":"Caron","year":"2021"},{"key":"mlstaddb98bib5","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2106.08254","article-title":"BEiT: BERT pre-training of image transformers","author":"Bao","year":"2022"},{"key":"mlstaddb98bib6","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/3\/08\/S08003","article-title":"The ATLAS experiment at the CERN Large Hadron Collider","volume":"3","author":"ATLAS Collaboration","year":"2008","journal-title":"J. Instrum."},{"key":"mlstaddb98bib7","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/3\/08\/S08004","article-title":"The CMS experiment at the CERN LHC","volume":"3","author":"CMS Collaboration","year":"2008","journal-title":"J. Instrum."},{"key":"mlstaddb98bib8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.physletb.2012.08.020","article-title":"Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC","volume":"716","author":"The ATLAS Collaboration","year":"2012","journal-title":"Phys. Lett. B"},{"key":"mlstaddb98bib9","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1016\/j.physletb.2012.08.021","article-title":"Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC","volume":"716","author":"The CMS Collaboration","year":"2012","journal-title":"Phys. Lett. 
B"},{"key":"mlstaddb98bib10","doi-asserted-by":"publisher","first-page":"014","DOI":"10.21468\/SciPostPhys.7.1.014","article-title":"The Machine Learning landscape of top taggers","volume":"7","author":"Kasieczka","year":"2019","journal-title":"SciPost Phys."},{"key":"mlstaddb98bib11","doi-asserted-by":"publisher","first-page":"540","DOI":"10.1140\/epjc\/s10052-021-09342-y","article-title":"Secondary vertex finding in jets with neural networks","volume":"81","author":"Shlomi","year":"2021","journal-title":"Eur. Phys. J. C"},{"key":"mlstaddb98bib12","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/19\/08\/P08018","article-title":"Accuracy versus precision in boosted top tagging with the ATLAS detector","volume":"19","author":"The ATLAS Collaboration","year":"2024","journal-title":"J. Instrum."},{"key":"mlstaddb98bib13","doi-asserted-by":"publisher","first-page":"250","DOI":"10.1016\/S0168-9002(03)01368-8","article-title":"GEANT4: A Simulation toolkit","volume":"A506","author":"Geant4 Collaboration","year":"2003","journal-title":"Nucl. Instrum. Methods A"},{"key":"mlstaddb98bib14","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ad64a8","article-title":"Masked particle modeling on sets: towards self-supervised high energy physics foundation models","volume":"5","author":"Golling","year":"2024","journal-title":"Mach. Learn.: Sci. 
Technol."},{"key":"mlstaddb98bib15","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1145\/1390156.1390294","article-title":"Extracting and composing robust features with denoising autoencoders","author":"Vincent","year":"2008"},{"key":"mlstaddb98bib16","doi-asserted-by":"publisher","first-page":"16000","DOI":"10.48550\/arXiv.2111.06377","article-title":"Masked autoencoders are scalable vision learners","author":"He","year":"2022"},{"key":"mlstaddb98bib17","doi-asserted-by":"publisher","first-page":"9653","DOI":"10.48550\/arXiv.2111.09886","article-title":"SimMIM: a simple framework for masked image modeling","author":"Xie","year":"2022"},{"key":"mlstaddb98bib18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1711.00937","article-title":"Neural discrete representation learning","author":"van den Oord","year":"2017"},{"key":"mlstaddb98bib19","article-title":"iBOT: image BERT pre-training with online tokenizer","author":"Zhou","year":"2022"},{"key":"mlstaddb98bib20","doi-asserted-by":"publisher","first-page":"188","DOI":"10.21468\/SciPostPhys.12.6.188","article-title":"Symmetries, safety and self-supervision","volume":"12","author":"Dillon","year":"2022","journal-title":"SciPost Phys."},{"key":"mlstaddb98bib21","doi-asserted-by":"publisher","first-page":"1597","DOI":"10.48550\/arXiv.2002.05709","article-title":"A simple framework for contrastive learning of visual representations","author":"Chen","year":"2020"},{"key":"mlstaddb98bib22","article-title":"Re-simulation-based self-supervised learning for pre-training foundation models","volume":"111","author":"Harris","year":"2024","journal-title":"Phys. Rev. 
D"},{"key":"mlstaddb98bib23","doi-asserted-by":"crossref","DOI":"10.1088\/2632-2153\/ad66ad","article-title":"OmniJet-\u03b1: The first cross-task foundation model for particle physics","author":"Birk","year":"2024"},{"key":"mlstaddb98bib24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2312.06909","article-title":"Pre-training strategy using real particle collision data for event classification in collider physics","author":"Kishimoto","year":"2023"},{"key":"mlstaddb98bib25","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ad55a3","article-title":"Finetuning foundation models for joint analysis optimization in high energy physics","volume":"5","author":"Vigl","year":"2024","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstaddb98bib26","article-title":"OmniLearn: a method to simultaneously facilitate all jet physics tasks","author":"Mikuni","year":"2024"},{"key":"mlstaddb98bib27","doi-asserted-by":"publisher","first-page":"JHEP02(2014)057","DOI":"10.1007\/JHEP02(2014)057","article-title":"DELPHES 3, A modular framework for fast simulation of a generic collider experiment","author":"de Favereau","year":"2014","journal-title":"J. High Energy Phys."},{"key":"mlstaddb98bib28","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.6619768","article-title":"JetClass: A Large-Scale Dataset for Deep Learning in Jet Physics","author":"Qu","year":"2022"},{"key":"mlstaddb98bib29","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.13350327","article-title":"Dataset for flavour tagging R&D","author":"Ochoa","year":"2024"},{"key":"mlstaddb98bib30","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1016\/j.cpc.2008.01.036","article-title":"A brief introduction to PYTHIA 8.1","volume":"178","author":"Sj\u00f6strand","year":"2008","journal-title":"Comput. Phys. 
Commun."},{"key":"mlstaddb98bib31","doi-asserted-by":"publisher","first-page":"JHEP07(2014)079","DOI":"10.1007\/JHEP07(2014)079","article-title":"The automated computation of tree-level and next-to-leading order differential cross sections and their matching to parton shower simulations","author":"Alwall","year":"2014","journal-title":"J. High Energy Phys."},{"key":"mlstaddb98bib32","doi-asserted-by":"publisher","first-page":"JHEP04(2008)063","DOI":"10.1088\/1126-6708\/2008\/04\/063","article-title":"The anti-kt jet clustering algorithm","author":"Cacciari","year":"2008","journal-title":"J. High Energy Phys."},{"key":"mlstaddb98bib33","article-title":"Deep sets based neural networks for impact parameter flavour tagging in ATLAS","author":"ATLAS Collaboration","year":"2020"},{"key":"mlstaddb98bib34","article-title":"FlashAttention-2: faster attention with better parallelism and work partitioning","author":"Dao","year":"2023"},{"key":"mlstaddb98bib35","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2110.04627","article-title":"Vector-quantized image modeling with improved VQGAN","author":"Yu","year":"2022"},{"key":"mlstaddb98bib36","article-title":"TorchPQ","author":"Omer","year":"2021"},{"key":"mlstaddb98bib37","doi-asserted-by":"publisher","first-page":"1530","DOI":"10.48550\/arXiv.1505.05770","article-title":"Variational inference with normalizing flows","author":"Rezende","year":"2015"},{"key":"mlstaddb98bib38","doi-asserted-by":"publisher","first-page":"5361","DOI":"10.21105\/joss.05361","article-title":"normflows: a PyTorch package for normalizing flows","volume":"8","author":"Stimper","year":"2023","journal-title":"J. 
Open Source Softw."},{"key":"mlstaddb98bib39","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2210.02747","article-title":"Flow matching for generative modeling","author":"Lipman","year":"2023"},{"key":"mlstaddb98bib40","doi-asserted-by":"crossref","DOI":"10.1145\/3680528.3687625","article-title":"Fast high-resolution image synthesis with latent adversarial diffusion distillation","author":"Sauer","year":"2024"},{"key":"mlstaddb98bib41","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.109.012010","article-title":"Faster diffusion model with improved quality for particle cloud generation","volume":"109","author":"Leigh","year":"2024","journal-title":"Phys. Rev. D"},{"key":"mlstaddb98bib42","doi-asserted-by":"publisher","first-page":"16284","DOI":"10.48550\/arXiv.2304.03283","article-title":"Diffusion models as masked autoencoders","author":"Wei","year":"2023"},{"key":"mlstaddb98bib43","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV51070.2023.00387","article-title":"Scalable diffusion models with transformers","author":"Peebles","year":"2023"},{"key":"mlstaddb98bib44","article-title":"Decoupled weight decay regularization","author":"Loshchilov","year":"2019"},{"key":"mlstaddb98bib45","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1109\/ICCV48922.2021.00010","article-title":"Going deeper with image transformers","author":"Touvron","year":"2021"},{"key":"mlstaddb98bib46","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2309.16588","article-title":"Vision transformers need registers","author":"Darcet","year":"2024"},{"key":"mlstaddb98bib47","article-title":"Particle transformer for jet tagging","author":"Qu","year":"2022"},{"key":"mlstaddb98bib48","doi-asserted-by":"publisher","first-page":"JHEP10(2017)174","DOI":"10.1007\/JHEP10(2017)174","article-title":"Classification without labels: learning from mixed samples in high energy physics","author":"Metodiev","year":"2017","journal-title":"J. 
High Energy Phys."},{"key":"mlstaddb98bib49","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.108.092008","article-title":"Learning to isolate muons in data","volume":"108","author":"Witkowski","year":"2023","journal-title":"Phys. Rev. D"},{"key":"mlstaddb98bib50","doi-asserted-by":"publisher","first-page":"JHEP04(2011)069","DOI":"10.1007\/JHEP04(2011)069","article-title":"Multivariate discrimination and the Higgs+W\/Z search","author":"Gallicchio","year":"2011","journal-title":"J. High Energy Phys."},{"key":"mlstaddb98bib51","article-title":"Graph neural network jet flavour tagging with the ATLAS detector","author":"ATLAS Collaboration","year":"2022"},{"key":"mlstaddb98bib52","first-page":"1","article-title":"Siamese neural networks for one-shot image recognition","volume":"vol 2","author":"Koch","year":"2015"},{"key":"mlstaddb98bib53","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. 
Class."},{"key":"mlstaddb98bib54","article-title":"NormFormer: improved transformer pretraining with extra normalization","author":"Shleifer","year":"2021"},{"key":"mlstaddb98bib55","doi-asserted-by":"publisher","first-page":"10524","DOI":"10.48550\/arXiv.2002.04745","article-title":"On layer normalization in the transformer architecture","author":"Xiong","year":"2020"},{"key":"mlstaddb98bib56","article-title":"GLU variants improve transformer","author":"Shazeer","year":"2020"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":
[[2025,6,27]],"date-time":"2025-06-27T07:05:03Z","timestamp":1751007903000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/addb98"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,27]]},"references-count":56,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,6,27]]},"published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/addb98","relation":{"has-review":[{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v5\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v5\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v1\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v1\/review2","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v3\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v4\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v2\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v3\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v3\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v2\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v2\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v4\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/ADDB98\/v1\/decision1","asserted-by":"object"}]},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,27]]},"assertion":[{"value":"Is tokenization needed for masked particle modeling?","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal 
Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-10-11","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-05-21","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-06-27","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}