{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T17:49:40Z","timestamp":1778176180269,"version":"3.51.4"},"reference-count":38,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2021,3,31]],"date-time":"2021-03-31T00:00:00Z","timestamp":1617148800000},"content-version":"vor","delay-in-days":30,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,3,31]],"date-time":"2021-03-31T00:00:00Z","timestamp":1617148800000},"content-version":"tdm","delay-in-days":30,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2021,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Artificial intelligence is driving one of the most important revolutions in organic chemistry. Multiple platforms, including tools for reaction prediction and synthesis planning based on machine learning, have successfully become part of the organic chemists\u2019 daily laboratory, assisting in domain-specific synthetic problems. Unlike reaction prediction and retrosynthetic models, the prediction of reaction yields has received less attention in spite of the enormous potential of accurately predicting reaction conversion rates. Reaction yields models, describing the percentage of the reactants converted to the desired products, could guide chemists and help them select high-yielding reactions and score synthesis routes, reducing the number of attempts. So far, yield predictions have been predominantly performed for high-throughput experiments using a categorical (one-hot) encoding of reactants, concatenated molecular fingerprints, or computed chemical descriptors. Here, we extend the application of natural language processing architectures to predict reaction properties given a text-based representation of the reaction, using an encoder transformer model combined with a regression layer. We demonstrate outstanding prediction performance on two high-throughput experiment reactions sets. An analysis of the yields reported in the open-source USPTO data set shows that their distribution differs depending on the mass scale, limiting the data set applicability in reaction yields predictions.<\/jats:p>","DOI":"10.1088\/2632-2153\/abc81d","type":"journal-article","created":{"date-parts":[[2021,5,3]],"date-time":"2021-05-03T07:18:43Z","timestamp":1620026323000},"page":"015016","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":202,"title":["Prediction of chemical reaction yields using deep learning"],"prefix":"10.1088","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3046-6576","authenticated-orcid":false,"given":"Philippe","family":"Schwaller","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7554-0288","authenticated-orcid":false,"given":"Alain C","family":"Vaucher","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8717-0456","authenticated-orcid":false,"given":"Teodoro","family":"Laino","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2724-2942","authenticated-orcid":false,"given":"Jean-Louis","family":"Reymond","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2021,3,31]]},"reference":[{"key":"mlstabc81dbib1","doi-asserted-by":"crossref","DOI":"10.26434\/chemrxiv.12298559.v1","article-title":"Unsupervised Attention-Guided Atom-Mapping","author":"Schwaller","year":"2020"},{"key":"mlstabc81dbib2","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/nature25978","article-title":"Planning chemical syntheses with deep neural networks and symbolic AI","volume":"555","author":"Segler","year":"2018","journal-title":"Nature"},{"key":"mlstabc81dbib3","doi-asserted-by":"publisher","first-page":"eaax1566","DOI":"10.1126\/science.aax1566","article-title":"A robotic platform for flow synthesis of organic compounds informed by AI planning","volume":"365","author":"Coley","year":"2019","journal-title":"Science"},{"key":"mlstabc81dbib4","doi-asserted-by":"publisher","first-page":"3316","DOI":"10.1039\/C9SC05704H","article-title":"Predicting retrosynthetic pathways using transformer-based models and a hyper-graph exploration strategy","volume":"11","author":"Schwaller","year":"2020","journal-title":"Chem. Sci."},{"key":"mlstabc81dbib5","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1186\/s13321-020-00472-1","article-title":"AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning","volume":"12","author":"Genheden","year":"2020","journal-title":"J. Cheminform."},{"key":"mlstabc81dbib6","doi-asserted-by":"publisher","first-page":"1572","DOI":"10.1021\/acscentsci.9b00576","article-title":"Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction","volume":"5","author":"Schwaller","year":"2019","journal-title":"ACS Cent. Sci."},{"key":"mlstabc81dbib7","doi-asserted-by":"publisher","first-page":"L173","DOI":"10.1016\/0926-860x(94)80169-x","article-title":"Estimation of catalytic performance by neural network\u2014product distribution in oxidative dehydrogenation of ethylbenzene","volume":"114","author":"Kite","year":"1994","journal-title":"Appl. Catal. A"},{"key":"mlstabc81dbib8","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1038\/nature17439","article-title":"Machine-learning-assisted materials discovery using failed experiments","volume":"533","author":"Raccuglia","year":"2016","journal-title":"Nature"},{"key":"mlstabc81dbib9","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1126\/science.aar5169","article-title":"Predicting reaction performance in C\u2013N cross-coupling using machine learning","volume":"360","author":"Ahneman","year":"2018","journal-title":"Science"},{"key":"mlstabc81dbib10","doi-asserted-by":"publisher","first-page":"6416","DOI":"10.1126\/science.aat8603","article-title":"Comment on \u201cPredicting reaction performance in C\u2013N cross-coupling using machine learning\"","volume":"362","author":"Chuang","year":"2018","journal-title":"Science"},{"key":"mlstabc81dbib11","doi-asserted-by":"publisher","first-page":"1379","DOI":"10.1016\/j.chempr.2020.02.017","article-title":"A structure-based platform for predicting chemical reactivity","volume":"6","author":"Sandfort","year":"2020","journal-title":"Chem."},{"key":"mlstabc81dbib12","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1038\/s41586-018-0307-8","article-title":"Controlling an organic synthesis robot with machine learning to search for new reactivity","volume":"559","author":"Granda","year":"2018","journal-title":"Nature"},{"key":"mlstabc81dbib13","doi-asserted-by":"publisher","first-page":"2269","DOI":"10.1039\/D0QO00544D","article-title":"Optimizing chemical reaction conditions using deep learning: a case study for the Suzuki\u2013Miyaura cross-coupling reaction","volume":"7","author":"Fu","year":"2020","journal-title":"Org. Chem. Front."},{"key":"mlstabc81dbib14","doi-asserted-by":"publisher","first-page":"1963","DOI":"10.1039\/D0RE00232A","article-title":"Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening","volume":"5","author":"Eyke","year":"2020","journal-title":"React. Chem. Eng."},{"key":"mlstabc81dbib15","doi-asserted-by":"publisher","first-page":"3582","DOI":"10.1038\/s41598-017-02303-0","article-title":"Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient?","volume":"7","author":"Skoraczy\u0144ski","year":"2017","journal-title":"Sci. Rep."},{"key":"mlstabc81dbib16","doi-asserted-by":"publisher","first-page":"6091","DOI":"10.1039\/C8SC02339E","article-title":"\u201cFound in translation\u201d: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models","volume":"9","author":"Schwaller","year":"2018","journal-title":"Chem. Sci."},{"key":"mlstabc81dbib17","doi-asserted-by":"publisher","first-page":"144","DOI":"10.1038\/s42256-020-00284-w","article-title":"Mapping the space of chemical reactions using attention-based neural networks","volume":"3","author":"Schwaller","year":"2021","journal-title":"Nat. Mach. Intell."},{"key":"mlstabc81dbib18","doi-asserted-by":"publisher","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2019"},{"key":"mlstabc81dbib19","first-page":"5998","article-title":"Attention is all you need","author":"Vaswani","year":"2017"},{"key":"mlstabc81dbib20","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Model."},{"key":"mlstabc81dbib21","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1126\/science.aap9112","article-title":"A platform for automated nanomole-scale reaction screening and micromole-scale synthesis in flow","volume":"359","author":"Perera","year":"2018","journal-title":"Science"},{"key":"mlstabc81dbib22","doi-asserted-by":"publisher","DOI":"10.17863\/CAM.16293","article-title":"Extraction of chemical structures and reactions from the literature","author":"Lowe","year":"2012"},{"key":"mlstabc81dbib23","doi-asserted-by":"publisher","author":"Lowe","year":"2017","DOI":"10.6084\/m9.figshare.5104873.v1"},{"key":"mlstabc81dbib24","author":""},{"key":"mlstabc81dbib25","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","author":"Wolf","year":"2020"},{"key":"mlstabc81dbib26","first-page":"8026","article-title":"PyTorch: an imperative style, high-performance deep learning library","author":"Paszke","year":"2019"},{"key":"mlstabc81dbib27","author":"Landrum","year":"2019"},{"key":"mlstabc81dbib28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-020-18671-7","article-title":"Transfer learning enables the molecular transformer to predict regio-and stereoselective reactions on carbohydrates","volume":"11","author":"Pesciullesi","year":"2020","journal-title":"Nat. Commun."},{"key":"mlstabc81dbib29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-0416-x","article-title":"Visualization of very large high-dimensional data sets as minimum spanning trees","volume":"12","author":"Probst","year":"2020","journal-title":"J. Cheminform."},{"key":"mlstabc81dbib30","doi-asserted-by":"publisher","first-page":"1433","DOI":"10.1093\/bioinformatics\/btx760","article-title":"Fun: a framework for interactive visualizations of large, high-dimensional datasets on the web","volume":"34","author":"Probst","year":"2017","journal-title":"Bioinformatics"},{"key":"mlstabc81dbib31","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572","article-title":"Billion-scale similarity search with GPUs","author":"Johnson","year":"2019","journal-title":"IEEE Trans. Big Data"},{"key":"mlstabc81dbib32","doi-asserted-by":"crossref","DOI":"10.26434\/chemrxiv.12395120.v1","article-title":"Unassisted noise-reduction of chemical reactions data sets","author":"Toniato","year":"2020"},{"key":"mlstabc81dbib33","doi-asserted-by":"publisher","first-page":"5575","DOI":"10.1038\/s41467-020-19266-y","article-title":"State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis","volume":"11","author":"Tetko","year":"2020","journal-title":"Nat. Commun."},{"key":"mlstabc81dbib34","doi-asserted-by":"publisher","first-page":"187","DOI":"10.18653\/v1\/2020.acl-demos.22","article-title":"Exbert: a visual analysis tool to explore learned representations in transformers models","author":"Hoover","year":"2019"},{"key":"mlstabc81dbib35","doi-asserted-by":"publisher","first-page":"63","DOI":"10.18653\/v1\/W19-4808","article-title":"Analyzing the structure of attention in a transformer language model","author":"Vig","year":"2019"},{"key":"mlstabc81dbib36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41597-020-0460-4","article-title":"Reactants, products and transition states of elementary chemical reactions based on quantum chemistry","volume":"7","author":"Grambow","year":"2020","journal-title":"Sci. Data"},{"key":"mlstabc81dbib37","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba822","article-title":"Thousands of reactants and transition states for competing E2 and SN2 reactions","volume":"1","author":"von Rudorff","year":"2020","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstabc81dbib38","doi-asserted-by":"publisher","first-page":"1163","DOI":"10.1039\/D0SC04896H","article-title":"Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies","volume":"12","author":"Jorner","year":"2020","journal-title":"Chem. Sci."}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abc81d","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abc81d\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abc81d\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abc81d\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T11:19:12Z","timestamp":1679915952000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abc81d"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,1]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,3,31]]},"published-print":{"date-parts":[[2021,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/abc81d","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv.12758474.v2","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv.12758474.v1","asserted-by":"object"},{"id-type":"doi","id":"10.26434\/chemrxiv.12758474","asserted-by":"object"}]},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,1]]},"assertion":[{"value":"Prediction of chemical reaction yields using deep learning","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2021 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2020-08-03","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2020-11-05","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2021-03-31","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}