{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T22:53:39Z","timestamp":1777676019622,"version":"3.51.4"},"reference-count":74,"publisher":"SAGE Publications","issue":"5-6","license":[{"start":{"date-parts":[[2023,10,7]],"date-time":"2023-10-07T00:00:00Z","timestamp":1696636800000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["39KJ95010"],"award-info":[{"award-number":["39KJ95010"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of High Performance Computing Applications"],"published-print":{"date-parts":[[2022,11]]},"abstract":"<jats:p>The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on \u223c9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. 
Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.<\/jats:p>","DOI":"10.1177\/10943420221121804","type":"journal-article","created":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T16:55:56Z","timestamp":1665161756000},"page":"587-602","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":24,"title":["Language models for the prediction of SARS-CoV-2 inhibitors"],"prefix":"10.1177","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7330-9656","authenticated-orcid":false,"given":"Andrew E","family":"Blanchard","sequence":"first","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John","family":"Gounley","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Debsindhu","family":"Bhowmik","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mayanka","family":"Chandra Shekar","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Isaac","family":"Lyngaas","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shang","family":"Gao","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junqi","family":"Yin","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aristeidis","family":"Tsaris","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feiyi","family":"Wang","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1852-3849","authenticated-orcid":false,"given":"Jens","family":"Glaser","sequence":"additional","affiliation":[{"name":"Oak Ridge National Laboratory, Oak Ridge, TN, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2022,10,7]]},"reference":[{"key":"bibr74-10943420221121804","unstructured":"Glaser J (2021) Binding Affinity Training Data Set URL https:\/\/huggingface.co\/datasets\/jglaser\/binding_affinity"},{"key":"bibr1-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.0c01010"},{"key":"bibr2-10943420221121804","unstructured":"Achdout H, Aimon A, Bar-David E et al. (2020) COVID moonshot: open science discovery of SARS-CoV-2 main protease inhibitors by combining crowdsourcing, high-throughput experiments, computational simulations, and machine learning. 
bioRxiv."},{"key":"bibr3-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9005703"},{"key":"bibr4-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-019-0393-0"},{"key":"bibr5-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1038\/nchem.1243"},{"key":"bibr6-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-021-00494-3"},{"key":"bibr7-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/ci034290p"},{"key":"bibr8-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.8b00839"},{"key":"bibr9-10943420221121804","unstructured":"Chithrananda S, Grand G, Ramsundar B (2020) ChemBERTa: large-scale self-supervised pretraining for molecular property prediction (NeurIPS). URL http:\/\/arxiv.org\/abs\/2010.09885"},{"key":"bibr10-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1038\/nbt.1990"},{"key":"bibr11-10943420221121804","unstructured":"De Cao N, Kipf T (2018) MolGAN: an implicit generative model for small molecular graphs. ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models."},{"key":"bibr12-10943420221121804","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"bibr13-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1016\/S1473-3099(20)30120-1"},{"key":"bibr14-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-44874-8"},{"key":"bibr15-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3095381"},{"key":"bibr16-10943420221121804","unstructured":"Enamine (2020) Enamine REAL Database. https:\/\/enamine.net\/compound-collections\/real-compounds\/real-database. 
Accessed: 2020-04-01, through https:\/\/virtual-flow.org\/"},{"key":"bibr17-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-1-8"},{"key":"bibr18-10943420221121804","unstructured":"Fischer W, Eron JJ, Holman W, et al. (2021) Molnupiravir, an oral antiviral treatment for COVID-19 URL https:\/\/www.medrxiv.org\/content\/early\/2021\/06\/17\/2021.06.17.21258639"},{"key":"bibr19-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acscentsci.0c00229"},{"key":"bibr20-10943420221121804","unstructured":"Goyal P, Doll\u00e1r P, Girshick R, et al. (2017) Accurate, large minibatch SGD: training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677."},{"key":"bibr21-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.9b00943"},{"key":"bibr22-10943420221121804","unstructured":"Gu Y, Tinn R, Cheng H, et al. (2020) Domain-specific language model pretraining for biomedical natural language processing. arXiv URL https:\/\/arxiv.org\/abs\/2007.15779"},{"key":"bibr23-10943420221121804","unstructured":"Gurbych O, Druchok M, Yarish D, et al. 
(2020) High throughput screening with machine learning URL http:\/\/arxiv.org\/abs\/2012.08275"},{"key":"bibr24-10943420221121804","author":"HHS","year":"2021","journal-title":"Health & Human Services"},{"key":"bibr25-10943420221121804","unstructured":"Honda S, Shi S, Ueda HR (2019) Smiles transformer: pre-trained molecular fingerprint for low data drug discovery URL https:\/\/arxiv.org\/abs\/1911.04738"},{"key":"bibr26-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1007\/0-387-28111-8_5"},{"key":"bibr27-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1002\/prot.20512"},{"key":"bibr28-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1111\/j.1476-5381.2010.01127.x"},{"key":"bibr29-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1177\/10943420211010930"},{"key":"bibr30-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1039\/c8sc05372c"},{"key":"bibr31-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-021-03819-2"},{"key":"bibr32-10943420221121804","volume-title":"19th international workshop on data mining in bioinformatics","author":"Koyama K","year":"2020"},{"key":"bibr33-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00054"},{"key":"bibr34-10943420221121804","unstructured":"Laanait N, Romero J, Yin J, et al. (2019) Exascale deep learning for scientific inverse problems. arXiv preprint arXiv:1909.11150."},{"key":"bibr35-10943420221121804","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.400"},{"key":"bibr36-10943420221121804","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.500"},{"key":"bibr37-10943420221121804","unstructured":"Lin Y, Han S, Mao H, et al. 
(2017) Deep gradient compression: reducing the communication bandwidth for distributed training URL https:\/\/arxiv.org\/abs\/1712.01887"},{"key":"bibr38-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkl999"},{"key":"bibr39-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu626"},{"key":"bibr40-10943420221121804","unstructured":"Liu Y, Ott M, Goyal N, et al. (2019) RoBERTa: a robustly optimized BERT pretraining approach URL http:\/\/arxiv.org\/abs\/1907.11692"},{"key":"bibr41-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/ci300124c"},{"key":"bibr42-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.9b01053"},{"key":"bibr43-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1145\/2908812.2908916"},{"key":"bibr44-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"bibr45-10943420221121804","unstructured":"NIH (2021) COVID-19 treatment guidelines panel. Coronavirus disease 2019 (COVID-19) treatment guidelines. National Institutes of Health. URL https:\/\/www.covid19treatmentguidelines.nih.gov\/"},{"key":"bibr46-10943420221121804","unstructured":"Gurbych O, Druchok M, Yarish D, et al. (2020) High throughput screening with machine learning URL https:\/\/arxiv.org\/abs\/2012.08275"},{"key":"bibr47-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bty593"},{"key":"bibr48-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3406703"},{"key":"bibr49-10943420221121804","unstructured":"RDKit (2021) RDKit: open-source cheminformatics. 
http:\/\/www.rdkit.org"},{"key":"bibr50-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1016\/j.vaccine.2017.04.082"},{"key":"bibr51-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/ar500432k"},{"key":"bibr52-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1126\/science.aat2663"},{"key":"bibr53-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"bibr54-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1039\/c8sc02339e"},{"key":"bibr55-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acscentsci.7b00512"},{"key":"bibr56-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1101\/2021.02.13.431008"},{"key":"bibr57-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1037\/11491-005"},{"key":"bibr58-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.6b00290"},{"key":"bibr59-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/acsmedchemlett.0c00088"},{"key":"bibr60-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2018.00055"},{"key":"bibr61-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/ja401184g"},{"key":"bibr62-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1145\/3307339.3342186"},{"key":"bibr63-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/ci00057a005"},{"key":"bibr64-10943420221121804","volume-title":"Therapeutics and COVID-19: Living Guideline","author":"WHO","year":"2021"},{"key":"bibr65-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1021\/ci990307l"},{"key":"bibr66-10943420221121804","unstructured":"Wolf T, Debut L, Sanh V et al. 
(2019) Huggingface\u2019s transformers: state-of-the-art natural language processing URL https:\/\/arxiv.org\/abs\/1910.03771"},{"key":"bibr67-10943420221121804","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"bibr68-10943420221121804","unstructured":"Wu Y, Schuster M, Chen Z, et al. (2016) Google\u2019s neural machine translation system: bridging the gap between human and machine translation URL http:\/\/arxiv.org\/abs\/1609.08144"},{"key":"bibr69-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1101\/2020.12.23.424259"},{"key":"bibr70-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gks966"},{"key":"bibr71-10943420221121804","doi-asserted-by":"publisher","DOI":"10.1246\/cl.180665"},{"key":"bibr72-10943420221121804","unstructured":"You Y, Li J, Reddi S, et al. (2019) Large batch optimization for deep learning: training BERT in 76 minutes URL http:\/\/arxiv.org\/abs\/1904.00962"},{"key":"bibr73-10943420221121804","unstructured":"Zheng S, Lin H, Zha S, et al. 
(2020) Accelerated large batch optimization of BERT pretraining in 54 minutes. arXiv preprint arXiv:2006.13484."}],"container-title":["The International Journal of High Performance Computing Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221121804","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/10943420221121804","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221121804","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/10943420221121804","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T08:17:24Z","timestamp":1777450644000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/10943420221121804"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,7]]},"references-count":74,"journal-issue":{"issue":"5-6","published-print":{"date-parts":[[2022,11]]}},"alternative-id":["10.1177\/10943420221121804"],"URL":"https:\/\/doi.org\/10.1177\/10943420221121804","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.12.10.471928","asserted-by":"object"}]},"ISSN":["1094-3420","1741-2846"],"issn-type":[{"value":"1094-3420","type":"print"},{"value":"1741-2846","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,7]]}}}