{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T03:38:00Z","timestamp":1774409880334,"version":"3.50.1"},"reference-count":70,"publisher":"IOP Publishing","issue":"3","license":[{"start":{"date-parts":[[2024,8,2]],"date-time":"2024-08-02T00:00:00Z","timestamp":1722556800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2024,8,2]],"date-time":"2024-08-02T00:00:00Z","timestamp":1722556800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"PUNCH4NFDI","award":["460248186"],"award-info":[{"award-number":["460248186"]}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["EXC 2121 Quantum Universe \u2013 390833306"],"award-info":[{"award-number":["EXC 2121 Quantum Universe \u2013 390833306"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2024,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Foundation models are multi-dataset and multi-task machine learning methods that once pre-trained can be fine-tuned for a large variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough as they could improve the achievable physics performance while at the same time drastically reduce the required amount of training time and data. We report significant progress on this challenge on several fronts. 
First, a comprehensive set of evaluation methods is introduced to judge the quality of an encoding from physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization compared to previous works. Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new <jats:sc>OmniJet<\/jats:sc>-<jats:italic>\u03b1<\/jats:italic> model. This is the first successful transfer between two different and actively studied classes of tasks and constitutes a major step in the building of foundation models for particle physics.<\/jats:p>","DOI":"10.1088\/2632-2153\/ad66ad","type":"journal-article","created":{"date-parts":[[2024,7,23]],"date-time":"2024-07-23T15:51:19Z","timestamp":1721749879000},"page":"035031","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":25,"title":["OmniJet-\u03b1: the first cross-task foundation model for particle physics"],"prefix":"10.1088","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1931-0127","authenticated-orcid":false,"given":"Joschka","family":"Birk","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1551-814X","authenticated-orcid":true,"given":"Anna","family":"Hallin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3457-2755","authenticated-orcid":false,"given":"Gregor","family":"Kasieczka","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2024,8,2]]},"reference":[{"key":"mlstad66adbib1","article-title":"On the opportunities and risks of foundation models","author":"Bommasani","year":"2022"},{"key":"mlstad66adbib2","article-title":"BERT: pre-training of deep bidirectional transformers for language 
understanding","author":"Devlin","year":"2019"},{"key":"mlstad66adbib3","article-title":"BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","author":"Lewis","year":"2019"},{"key":"mlstad66adbib4","article-title":"Language models are few-shot learners","author":"Brown","year":"2020"},{"key":"mlstad66adbib5","article-title":"LLaMA: open and efficient foundation language models","author":"Touvron","year":"2023"},{"key":"mlstad66adbib6","article-title":"Zero-shot text-to-image generation","author":"Ramesh","year":"2022"},{"key":"mlstad66adbib7","doi-asserted-by":"publisher","first-page":"014","DOI":"10.21468\/SciPostPhys.7.1.014","article-title":"The machine learning landscape of top taggers","volume":"7","author":"Kasieczka","year":"2019","journal-title":"SciPost Phys."},{"key":"mlstad66adbib8","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1038\/s42254-022-00455-1","article-title":"Machine learning in the search for new fundamental physics","volume":"4","author":"Karagiorgi","year":"2022","journal-title":"Nat. Rev. Phys."},{"key":"mlstad66adbib9","doi-asserted-by":"publisher","first-page":"JHEP10(2018)121","DOI":"10.1007\/JHEP10(2018)121","article-title":"Pulling out all the tops with computer vision and deep learning","author":"Macaluso","year":"2018","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib10","first-page":"pp 18281","article-title":"Particle transformer for jet tagging","author":"Qu","year":"2022"},{"key":"mlstad66adbib11","article-title":"Finetuning foundation models for joint analysis optimization","author":"Vigl","year":"2024"},{"key":"mlstad66adbib12","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1007\/s41781-018-0018-8","article-title":"A roadmap for HEP software and computing R&D for the 2020s","volume":"3","author":"(HEP Software Foundation)","year":"2019","journal-title":"Comput. Softw. 
Big Sci."},{"key":"mlstad66adbib13","author":"Boehnlein","year":"2022"},{"key":"mlstad66adbib14","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.120.042003","article-title":"Accelerating science with generative adversarial networks: an application to 3D particle showers in multilayer calorimeters","volume":"120","author":"Paganini","year":"2018","journal-title":"Phys. Rev. Lett."},{"key":"mlstad66adbib15","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/s41781-021-00056-0","article-title":"Getting high: high fidelity simulation of high granularity calorimeters with high speed","volume":"5","author":"Buhmann","year":"2021","journal-title":"Comput. Softw. Big Sci."},{"key":"mlstad66adbib16","article-title":"CaloClouds II: ultra-fast geometry-independent highly-granular calorimeter simulation","author":"Buhmann","year":"2023"},{"key":"mlstad66adbib17","article-title":"New directions for surrogate models and differentiable programming for high energy physics detector simulation","author":"Adelmann","year":"2022"},{"key":"mlstad66adbib18","doi-asserted-by":"publisher","first-page":"079","DOI":"10.21468\/SciPostPhys.14.4.079","article-title":"Machine learning and LHC event generation","volume":"14","author":"Butter","year":"2023","journal-title":"SciPost Phys."},{"key":"mlstad66adbib19","article-title":"Deep generative models for detector signature simulation: an analytical taxonomy","author":"Hashemi","year":"2023"},{"key":"mlstad66adbib20","doi-asserted-by":"publisher","first-page":"075","DOI":"10.21468\/SciPostPhys.7.6.075","article-title":"How to GAN LHC events","volume":"7","author":"Butter","year":"2019","journal-title":"SciPost Phys."},{"key":"mlstad66adbib21","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1007\/s41781-017-0004-6","article-title":"Learning particle physics by example: location-aware generative adversarial networks for physics synthesis","volume":"1","author":"de Oliveira","year":"2017","journal-title":"Comput. 
Softw. Big Sci."},{"key":"mlstad66adbib22","article-title":"Les Houches guide to reusable ML models in LHC analyses","author":"Jack","year":"2024"},{"key":"mlstad66adbib23","doi-asserted-by":"crossref","DOI":"10.1140\/epjc\/s10052-024-13353-w","article-title":"Classifier surrogates: sharing AI-based searches with the world","author":"Bieringer","year":"2024"},{"key":"mlstad66adbib24","doi-asserted-by":"publisher","first-page":"188","DOI":"10.21468\/SciPostPhys.12.6.188","article-title":"Symmetries, safety and self-supervision","volume":"12","author":"Dillon","year":"2022","journal-title":"SciPost Phys."},{"key":"mlstad66adbib25","article-title":"Semi-visible jets, energy-based models, and self-supervision","author":"Favaro","year":"2023"},{"key":"mlstad66adbib26","article-title":"Anomalies, representations, and self-supervision","author":"Dillon","year":"2023"},{"key":"mlstad66adbib27","doi-asserted-by":"publisher","first-page":"JHEP07(2023)108","DOI":"10.1007\/JHEP07(2023)108","article-title":"Neural embedding: learning the embedding of the manifold of physics data","author":"Park","year":"2023","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib28","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.106.056005","article-title":"Self-supervised anomaly detection for new physics","volume":"106","author":"Dillon","year":"2022","journal-title":"Phys. Rev. D"},{"key":"mlstad66adbib29","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/s41781-022-00082-6","article-title":"Shared data and algorithms for deep learning in fundamental physics","volume":"6","author":"Benato","year":"2022","journal-title":"Comput. Softw. Big Sci."},{"key":"mlstad66adbib30","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/18\/11\/P11003","article-title":"Generalizing to new geometries with geometry-aware autoregressive models (GAAMs) for fast calorimeter simulation","volume":"18","author":"Liu","year":"2023","journal-title":"J. 
Instrum."},{"key":"mlstad66adbib31","doi-asserted-by":"publisher","DOI":"10.1016\/j.physletb.2023.138079","article-title":"MetaHEP: meta learning for fast shower simulation of high energy physics experiments","volume":"844","author":"Salamani","year":"2023","journal-title":"Phys. Lett. B"},{"key":"mlstad66adbib32","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.105.094030","article-title":"Metalearning and data augmentation for mass-generalized jet taggers","volume":"105","author":"Dolan","year":"2022","journal-title":"Phys. Rev. D"},{"key":"mlstad66adbib33","doi-asserted-by":"publisher","first-page":"JHEP02(2024)138","DOI":"10.1007\/JHEP02(2024)138","article-title":"Improving the performance of weak supervision searches using transfer and meta-learning","author":"Beauchesne","year":"2024","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib34","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.2603256","article-title":"Top quark tagging reference dataset","author":"Kasieczka","year":"2019"},{"key":"mlstad66adbib35","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.6619768","article-title":"JetClass: a large-scale dataset for deep learning in jet physics","author":"Qu","year":"2022"},{"key":"mlstad66adbib36","article-title":"Attention is all you need","author":"Vaswani","year":"2017"},{"key":"mlstad66adbib37","doi-asserted-by":"publisher","first-page":"JHEP06(2023)184","DOI":"10.1007\/JHEP06(2023)184","article-title":"Learning the language of QCD jets with transformers","author":"Finke","year":"2023","journal-title":"J. 
High Energy Phys."},{"key":"mlstad66adbib38","article-title":"Jet diffusion versus JetGPT\u2014modern networks for the LHC","author":"Butter","year":"2023"},{"key":"mlstad66adbib39","doi-asserted-by":"crossref","DOI":"10.1088\/2632-2153\/ad64a8","article-title":"Masked particle modeling on sets: towards self-supervised high energy physics foundation models","author":"Heinrich","year":"2024"},{"key":"mlstad66adbib40","article-title":"A language model for particle tracking","author":"Huang","year":"2024"},{"key":"mlstad66adbib41","article-title":"Ultra-high-resolution detector simulation with intra-event aware GAN and self-supervised relational reasoning","author":"Hashemi","year":"2023"},{"key":"mlstad66adbib42","article-title":"Re-simulation-based self-supervised learning for pre-training foundation models","author":"Harris","year":"2024"},{"key":"mlstad66adbib43","doi-asserted-by":"publisher","first-page":"JHEP01(2019)121","DOI":"10.1007\/JHEP01(2019)121","article-title":"Energy flow networks: deep sets for particle jets","author":"Komiske","year":"2019","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib44","doi-asserted-by":"crossref","DOI":"10.21468\/SciPostPhys.15.4.130","article-title":"EPiC-GAN: equivariant point cloud generation for particle jets","author":"Buhmann","year":"2023"},{"key":"mlstad66adbib45","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/18\/11\/P11025","article-title":"CaloClouds: fast geometry-independent highly-granular calorimeter simulation","volume":"18","author":"Buhmann","year":"2023","journal-title":"J. 
Instrum."},{"key":"mlstad66adbib46","article-title":"Neural discrete representation learning","author":"van den Oord","year":"2018"},{"key":"mlstad66adbib47","article-title":"BEiT: BERT pre-training of image transformers","author":"Bao","year":"2022"},{"key":"mlstad66adbib48","doi-asserted-by":"publisher","first-page":"JHEP07(2014)079","DOI":"10.1007\/JHEP07(2014)079","article-title":"The automated computation of tree-level and next-to-leading order differential cross sections and their matching to parton shower simulations","author":"Alwall","year":"2014","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib49","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1016\/j.cpc.2015.01.024","article-title":"An introduction to PYTHIA 8.2","volume":"191","author":"Sj\u00f6strand","year":"2015","journal-title":"Comput. Phys. Commun."},{"key":"mlstad66adbib50","doi-asserted-by":"publisher","first-page":"JHEP02(2014)057","DOI":"10.1007\/JHEP02(2014)057","article-title":"DELPHES 3: a modular framework for fast simulation of a generic collider experiment","author":"de Favereau","year":"2014","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib51","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/3\/08\/S08004","article-title":"The CMS experiment at the CERN LHC","volume":"3","author":"The CMS Collaboration","year":"2008","journal-title":"J. Instrum."},{"key":"mlstad66adbib52","doi-asserted-by":"publisher","first-page":"JHEP04(2008)063","DOI":"10.1088\/1126-6708\/2008\/04\/063","article-title":"The anti-kt jet clustering algorithm","author":"Cacciari","year":"2008","journal-title":"J. 
High Energy Phys."},{"key":"mlstad66adbib53","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.7671687","article-title":"vector","author":"Schreiner","year":"2023"},{"key":"mlstad66adbib54","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.10498548","article-title":"Awkward Array","author":"Pivarski","year":"2024"},{"key":"mlstad66adbib55","article-title":"Straightening out the straight-through estimator: overcoming optimization challenges in vector quantized networks","author":"Huh","year":"2023"},{"key":"mlstad66adbib56","article-title":"Improving language understanding by generative pre-training","author":"Radford","year":"2018"},{"key":"mlstad66adbib57","article-title":"Layer normalization","author":"Ba","year":"2016"},{"key":"mlstad66adbib58","doi-asserted-by":"publisher","first-page":"JHEP03(2011)015","DOI":"10.1007\/JHEP03(2011)015","article-title":"Identifying boosted objects with N-subjettiness","author":"Thaler","year":"2011","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib59","article-title":"Deep sets","author":"Zaheer","year":"2018"},{"key":"mlstad66adbib60","article-title":"Normformer: improved transformer pretraining with extra normalization","author":"Shleifer","year":"2021"},{"key":"mlstad66adbib61","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.107.113003","article-title":"Fast and accurate simulations of calorimeter showers with normalizing flows","volume":"107","author":"Krause","year":"2023","journal-title":"Phys. Rev. 
D"},{"key":"mlstad66adbib62","article-title":"How to understand limitations of generative networks","author":"Das","year":"2023"},{"key":"mlstad66adbib63","article-title":"Flow matching beyond kinematics: generating jets with particle-id and trajectory displacement information","author":"Birk","year":"2023"},{"key":"mlstad66adbib64","doi-asserted-by":"publisher","first-page":"JHEP05(2017)006","DOI":"10.1007\/JHEP05(2017)006","article-title":"Deep-learning top taggers or the end of QCD?","author":"Kasieczka","year":"2017","journal-title":"J. High Energy Phys."},{"key":"mlstad66adbib65","first-page":"pp 8024","article-title":"PyTorch: an imperative style, high-performance deep learning library","volume":"vol 32","author":"Paszke","year":"2019"},{"key":"mlstad66adbib66","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.10779019","article-title":"Pytorch lightning","author":"(The PyTorch Lightning Team)","year":"2024"},{"key":"mlstad66adbib67","article-title":"vqtorch: PyTorch package for vector quantization","author":"Huh","year":"2022"},{"key":"mlstad66adbib68","article-title":"Decoupled weight decay regularization","author":"Loshchilov","year":"2019"},{"key":"mlstad66adbib69","article-title":"A disciplined approach to neural network hyper-parameters: Part 1\u2014learning rate, batch size, momentum, and weight decay","author":"Smith","year":"2018"},{"key":"mlstad66adbib70","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2017"}],"container-title":["Machine Learning: Science and 
Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,24]],"date-time":"2024-11-24T17:11:43Z","timestamp":1732468303000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad66ad"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,2]]},"references-count":70,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2024,8,2]]},"published-print":{"date-parts":[[2024,9,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ad66ad","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"elec
tronic"}],"subject":[],"published":{"date-parts":[[2024,8,2]]},"assertion":[{"value":"OmniJet-\u03b1: the first cross-task foundation model for particle physics","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2024 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-04-11","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-07-23","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-08-02","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}