{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,2]],"date-time":"2026-03-02T21:51:35Z","timestamp":1772488295296,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T00:00:00Z","timestamp":1716940800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100004339","name":"Sanofi","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004339","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression including biodegradability, synthetic accessibility, and transfection efficiency.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To optimize LNPs, we developed and tested models that enable the virtual screening of LNPs with high transfection efficiency. Our best method uses the lipid Simplified Molecular-Input Line-Entry System (SMILES) as inputs to a large language model. Large language model-generated embeddings are then used by a downstream gradient-boosting classifier. 
As we show, our method can more accurately predict lipid properties, which could lead to higher efficiency and reduced experimental time and costs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Code and data links available at: https:\/\/github.com\/Sanofi-Public\/LipoBART.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae342","type":"journal-article","created":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T18:47:37Z","timestamp":1717008457000},"source":"Crossref","is-referenced-by-count":23,"title":["Representations of lipid nanoparticles using large language models for transfection efficiency prediction"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-6375-3084","authenticated-orcid":false,"given":"Saeed","family":"Moayedpour","sequence":"first","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]},{"given":"Jonathan","family":"Broadbent","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Toronto, ON, M5V 1V6,","place":["Canada"]}]},{"given":"Saleh","family":"Riahi","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]},{"given":"Michael","family":"Bailey","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]},{"given":"Hoa","family":"V. 
Thu","sequence":"additional","affiliation":[{"name":"DataSentics , Brno 602 00, Czech Republic"}]},{"given":"Dimitar","family":"Dobchev","sequence":"additional","affiliation":[{"name":"mRNA Center of Excellence, Marcy L\u2019Etoile , Sanofi, 69280,","place":["France"]}]},{"given":"Akshay","family":"Balsubramani","sequence":"additional","affiliation":[{"name":"mRNA Center of Excellence, Sanofi , Waltham, MA, 02451,","place":["United States"]}]},{"given":"Ricardo","family":"N.D. Santos","sequence":"additional","affiliation":[{"name":"mRNA Center of Excellence, Marcy L\u2019Etoile , Sanofi, 69280,","place":["France"]}]},{"given":"Lorenzo","family":"Kogler-Anele","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Toronto, ON, M5V 1V6,","place":["Canada"]}]},{"given":"Alejandro","family":"Corrochano-Navarro","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]},{"given":"Sizhen","family":"Li","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]},{"given":"Fernando","family":"U. 
Montoya","sequence":"additional","affiliation":[{"name":"mRNA Center of Excellence, Marcy L\u2019Etoile , Sanofi, 69280,","place":["France"]}]},{"given":"Vikram","family":"Agarwal","sequence":"additional","affiliation":[{"name":"mRNA Center of Excellence, Sanofi , Waltham, MA, 02451,","place":["United States"]}]},{"given":"Ziv","family":"Bar-Joseph","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]},{"given":"Sven","family":"Jager","sequence":"additional","affiliation":[{"name":"Digital R&D, Sanofi , Cambridge, MA, 02141,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2024,5,29]]},"reference":[{"key":"2024121222494047600_btae342-B1","doi-asserted-by":"crossref","first-page":"2860","DOI":"10.1093\/bioinformatics\/btv285","article-title":"The SwissLipids knowledgebase for lipid biology","volume":"31","author":"Aimo","year":"2015","journal-title":"Bioinformatics"},{"key":"2024121222494047600_btae342-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1056\/NEJMoa2035389","article-title":"Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine","volume":"384","author":"Baden","year":"2021","journal-title":"N Engl J Med"},{"key":"2024121222494047600_btae342-B27","author":"Bjerrum E, Edwards L"},{"key":"2024121222494047600_btae342-B3","first-page":"5027","article-title":"Quantifying lipid nanoparticle-mediated GFP expression in the murine retina","volume":"64","author":"Curtis","year":"2023","journal-title":"Invest Ophthalmol Vis Sci"},{"key":"2024121222494047600_btae342-B4","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/j.colsurfb.2019.01.027","article-title":"Erythropoietin-loaded solid lipid nanoparticles: preparation, optimization, and in vivo evaluation","volume":"178","author":"Dara","year":"2019","journal-title":"Colloids Surf B Biointerfaces"},{"key":"2024121222494047600_btae342-B5","article-title":"Machine learning-guided lipid 
nanoparticle design for mRNA delivery","author":"Ding","year":"2023"},{"key":"2024121222494047600_btae342-B6","first-page":"2224","article-title":"Convolutional networks on graphs for learning molecular fingerprints","volume":"28","author":"Duvenaud","year":"2015","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024121222494047600_btae342-B7","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1021\/acs.accounts.1c00544","article-title":"Chemistry of lipid nanoparticles for RNA delivery","volume":"55","author":"Eygeris","year":"2021","journal-title":"Acc Chem Res"},{"key":"2024121222494047600_btae342-B8","doi-asserted-by":"crossref","first-page":"D1100","DOI":"10.1093\/nar\/gkr777","article-title":"ChEMBL: a large-scale bioactivity database for drug discovery","volume":"40","author":"Gaulton","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2024121222494047600_btae342-B9","author":"Gilmer"},{"key":"2024121222494047600_btae342-B10","article-title":"Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR\/QSPR models","author":"Goh","year":"2017"},{"key":"2024121222494047600_btae342-B11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/natrevmats.2017.56","article-title":"Tools for translation: non-viral materials for therapeutic mRNA delivery","volume":"2","author":"Hajj","year":"2017","journal-title":"Nat Rev Mater"},{"key":"2024121222494047600_btae342-B12","doi-asserted-by":"crossref","first-page":"7233","DOI":"10.1038\/s41467-021-27493-0","article-title":"An ionizable lipid toolbox for RNA delivery","volume":"12","author":"Han","year":"2021","journal-title":"Nat Commun"},{"key":"2024121222494047600_btae342-B13","article-title":"Improving neural networks by preventing co-adaptation of feature 
detectors","author":"Hinton","year":"2012"},{"key":"2024121222494047600_btae342-B14","doi-asserted-by":"crossref","first-page":"1078","DOI":"10.1038\/s41578-021-00358-0","article-title":"Lipid nanoparticles for mRNA delivery","volume":"6","author":"Hou","year":"2021","journal-title":"Nat Rev Mater"},{"key":"2024121222494047600_btae342-B15","doi-asserted-by":"crossref","first-page":"015022","DOI":"10.1088\/2632-2153\/ac3ffb","article-title":"Chemformer: a pre-trained transformer for computational chemistry","volume":"3","author":"Irwin","year":"2022","journal-title":"Mach Learn Sci Technol"},{"key":"2024121222494047600_btae342-B16","doi-asserted-by":"crossref","first-page":"7300","DOI":"10.1021\/acs.nanolett.5b02497","article-title":"Optimization of lipid nanoparticle formulations for mRNA delivery in vivo with fractional factorial and definitive screening designs","volume":"15","author":"Kauffman","year":"2015","journal-title":"Nano Lett"},{"key":"2024121222494047600_btae342-B17","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.addr.2020.12.014","article-title":"Self-assembled mRNA vaccines","volume":"170","author":"Kim","year":"2021","journal-title":"Adv Drug Deliv Rev"},{"key":"2024121222494047600_btae342-B18","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/j.copbio.2021.09.016","article-title":"Principles for designing an optimal mRNA lipid nanoparticle vaccine","volume":"73","author":"Kon","year":"2022","journal-title":"Curr Opin Biotechnol"},{"key":"2024121222494047600_btae342-B19","article-title":"BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension","author":"Lewis","year":"2019"},{"key":"2024121222494047600_btae342-B20","doi-asserted-by":"crossref","first-page":"8099","DOI":"10.1021\/acs.nanolett.5b03528","article-title":"An orthogonal array optimization of lipid-like nanoparticles for mRNA delivery in 
vivo","volume":"15","author":"Li","year":"2015","journal-title":"Nano Lett"},{"key":"2024121222494047600_btae342-B21","doi-asserted-by":"crossref","DOI":"10.1101\/2023.09.09.556981","article-title":"CodonBERT: large language models for mRNA design and optimization","author":"Li","year":"2023"},{"key":"2024121222494047600_btae342-B22","doi-asserted-by":"crossref","first-page":"1835","DOI":"10.1021\/acs.bioconjchem.0c00295","article-title":"Combinatorial library of cyclic benzylidene acetal-containing pH-responsive lipidoid nanoparticles for intracellular mRNA delivery","volume":"31","author":"Li","year":"2020","journal-title":"Bioconjug Chem"},{"key":"2024121222494047600_btae342-B23","article-title":"Language models of protein sequences at the scale of evolution enable accurate structure prediction","author":"Lin","year":"2022","journal-title":"bioRxiv"},{"key":"2024121222494047600_btae342-B24","doi-asserted-by":"crossref","first-page":"701","DOI":"10.1038\/s41563-020-00886-0","article-title":"Membrane-destabilizing ionizable phospholipids for organ-selective mRNA delivery and CRISPR\u2013Cas gene editing","volume":"20","author":"Liu","year":"2021","journal-title":"Nat Mater"},{"key":"2024121222494047600_btae342-B25","doi-asserted-by":"crossref","first-page":"2000099","DOI":"10.1002\/adtp.202000099","article-title":"Nanoplatforms for mRNA therapeutics","volume":"4","author":"Meng","year":"2021","journal-title":"Adv Therap"},{"key":"2024121222494047600_btae342-B26","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.3390\/ijms22041676","article-title":"Advances in de novo drug design: from conventional to machine learning methods","volume":"22","author":"Mouchlis","year":"2021","journal-title":"Int J Mol Sci"},{"key":"2024121222494047600_btae342-B28","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1186\/s13321-020-00469-w","article-title":"DECIMER: towards deep learning for chemical image 
recognition","volume":"12","author":"Rajan","year":"2020","journal-title":"J Cheminform"},{"key":"2024121222494047600_btae342-B29","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J Chem Inf Model"},{"key":"2024121222494047600_btae342-B30","first-page":"12559","article-title":"Self-supervised graph transformer on large-scale molecular data","volume":"33","author":"Rong","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024121222494047600_btae342-B31","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1038\/s42256-022-00580-7","article-title":"Large-scale chemical language representations capture molecular structure and properties","volume":"4","author":"Ross","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2024121222494047600_btae342-B32","doi-asserted-by":"crossref","first-page":"2324","DOI":"10.1021\/acs.jcim.5b00559","article-title":"ZINC 15\u2013ligand discovery for everyone","volume":"55","author":"Sterling","year":"2015","journal-title":"J Chem Inf Model"},{"key":"2024121222494047600_btae342-B33","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1007\/s11095-022-03460-2","article-title":"Structure and function of cationic and ionizable lipids for nucleic acid delivery","volume":"40","author":"Sun","year":"2023","journal-title":"Pharm Res"},{"key":"2024121222494047600_btae342-B34","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1186\/s13321-023-00725-9","article-title":"Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization","volume":"15","author":"Ucak","year":"2023","journal-title":"J Cheminform"},{"key":"2024121222494047600_btae342-B35","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1039\/D2DD00058J","article-title":"A smile is all you need: predicting limiting activity coefficients from SMILES with natural language 
processing","volume":"1","author":"Winter","year":"2022","journal-title":"Digit Discov"},{"issue":"34","key":"2024121222494047600_btae342-B36","doi-asserted-by":"crossref","DOI":"10.1126\/sciadv.abc2315","article-title":"Functionalized lipid-like nanoparticles for in vivo mRNA delivery and base editing","volume":"6","author":"Zhang","year":"2020","journal-title":"Sci Adv"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae342\/57986583\/btae342.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/7\/btae342\/61025775\/btae342.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/7\/btae342\/61025775\/btae342.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,12]],"date-time":"2024-12-12T22:51:11Z","timestamp":1734043871000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae342\/7684951"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,29]]},"references-count":36,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae342","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,5,29]]},"article-number":"btae342"}}