{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,26]],"date-time":"2026-06-26T11:05:38Z","timestamp":1782471938798,"version":"3.54.5"},"reference-count":89,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2024,3,21]],"date-time":"2024-03-21T00:00:00Z","timestamp":1710979200000},"content-version":"vor","delay-in-days":20,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,21]],"date-time":"2024-03-21T00:00:00Z","timestamp":1710979200000},"content-version":"tdm","delay-in-days":20,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"Ed Clark Chair of Advanced Materials"},{"DOI":"10.13039\/501100010785","name":"Canada First Research Excellence Fund","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100010785","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003579","name":"University of Toronto","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003579","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Canada CIFAR AI Chair"},{"name":"European Research Council"},{"name":"Acceleration Consortium"},{"DOI":"10.13039\/501100000780","name":"European Union","doi-asserted-by":"crossref","award":["772834"],"award-info":[{"award-number":["772834"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2024,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We present an automated data-collection pipeline involving a convolutional neural network and a large language model to extract user-specified tabular data from peer-reviewed literature. 
The pipeline is applied to 74 reports published between 1957 and 2014 with experimentally-measured oxidation potentials for 592 organic molecules (\u22120.75 to 3.58\u2009V). After data curation (solvents, reference electrodes, and missed data points), we trained multiple supervised machine learning (ML) models reaching prediction errors similar to experimental uncertainty (\u223c0.2\u2009V). For experimental measurements of identical molecules reported in multiple studies, we identified the most likely value based on out-of-sample ML predictions. Using the trained ML models, we then estimated oxidation potentials of \u223c132k small organic molecules from the QM9 (quantum mechanics data for organic molecules with up to 9 atoms not counting hydrogens) data set, with predicted values spanning 0.21\u20133.46\u2009V. Analysis of the QM9 predictions in terms of plausible descriptor-property trends suggests that aliphaticity increases the oxidation potential of an organic molecule on average from \u223c1.5\u2009V to \u223c2\u2009V, while an increase in number of heavy atoms lowers it systematically. 
The pipeline introduced offers significant reductions in human labor otherwise required for conventional manual data collection of experimental results, and exemplifies how to accelerate scientific research through automation.<\/jats:p>","DOI":"10.1088\/2632-2153\/ad2f52","type":"journal-article","created":{"date-parts":[[2024,3,2]],"date-time":"2024-03-02T22:22:30Z","timestamp":1709418150000},"page":"015052","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Autonomous data extraction from peer reviewed literature for training machine learning models of oxidation potentials"],"prefix":"10.1088","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4521-389X","authenticated-orcid":true,"given":"Siwoo","family":"Lee","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9382-2342","authenticated-orcid":true,"given":"Stefan","family":"Heinen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Danish","family":"Khan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7419-0466","authenticated-orcid":false,"given":"O","family":"Anatole von Lilienfeld","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"266","published-online":{"date-parts":[[2024,3,21]]},"reference":[{"key":"mlstad2f52bib1","volume":"vol 331","author":"Akhter","year":"2019"},{"key":"mlstad2f52bib2","doi-asserted-by":"publisher","first-page":"103","DOI":"10.4097\/kjae.2018.71.2.103","volume":"71","author":"Ahn","year":"2018","journal-title":"Korean J. Anesthesiol."},{"key":"mlstad2f52bib3","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1111\/nae2.28","volume":"31","author":"Owens","year":"2021","journal-title":"Nurse Auth. 
Ed."},{"key":"mlstad2f52bib4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12874-019-0863-0","volume":"20","author":"B\u00fcchter","year":"2020","journal-title":"BMC Med. Res. Methodol."},{"key":"mlstad2f52bib5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1057\/s41599-021-00903-w","volume":"8","author":"Bornmann","year":"2021","journal-title":"Humanit. Soc. Sci. Commun."},{"key":"mlstad2f52bib6","doi-asserted-by":"publisher","first-page":"575","DOI":"10.1007\/s11192-010-0202-z","volume":"84","author":"Larsen","year":"2010","journal-title":"Scientometrics"},{"key":"mlstad2f52bib7","doi-asserted-by":"publisher","first-page":"3383","DOI":"10.1007\/s11837-021-04902-9","volume":"73","author":"Hong","year":"2021","journal-title":"JOM"},{"key":"mlstad2f52bib8","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1126\/science.aaa8415","volume":"349","author":"Jordan","year":"2015","journal-title":"Science"},{"key":"mlstad2f52bib9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.rse.2006.03.004","volume":"104","author":"Foody","year":"2006","journal-title":"Remote Sens. 
Environ."},{"key":"mlstad2f52bib10","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3087865","volume":"9","author":"Hashmi","year":"2021","journal-title":"IEEE Access"},{"key":"mlstad2f52bib11","doi-asserted-by":"publisher","DOI":"10.1016\/j.array.2022.100220","volume":"15","author":"Colter","year":"2022","journal-title":"Array"},{"key":"mlstad2f52bib12","first-page":"pp 128","author":"Paliwal","year":"2019"},{"key":"mlstad2f52bib13","first-page":"pp 1449","author":"G\u00f6bel","year":"2013"},{"key":"mlstad2f52bib14","author":"Islam","year":"2017"},{"key":"mlstad2f52bib15","first-page":"pp 629","volume":"vol 2","author":"Smith","year":"2007"},{"key":"mlstad2f52bib16","volume":"vol 8658","author":"Smith","year":"2013"},{"key":"mlstad2f52bib17","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3012542","volume":"8","author":"Memon","year":"2020","journal-title":"IEEE Access"},{"key":"mlstad2f52bib18","doi-asserted-by":"publisher","DOI":"10.1016\/j.websem.2022.100761","volume":"76","author":"Liu","year":"2022","journal-title":"J. Web Semant."},{"key":"mlstad2f52bib19","author":"Zhao","year":"2023"},{"key":"mlstad2f52bib20","author":"Fan","year":"2023"},{"key":"mlstad2f52bib21","doi-asserted-by":"publisher","first-page":"3293","DOI":"10.1038\/s41467-022-30839-x","volume":"13","author":"Flam-Shepherd","year":"2022","journal-title":"Nat. Commun."},{"key":"mlstad2f52bib22","doi-asserted-by":"publisher","DOI":"10.1016\/j.sbi.2023.102527","volume":"79","author":"Grisoni","year":"2023","journal-title":"Curr. Opin. Struct. Biol."},{"key":"mlstad2f52bib23","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1039\/D1DD00009H","volume":"1","author":"Hocky","year":"2022","journal-title":"Digit. 
Discov."},{"key":"mlstad2f52bib24","doi-asserted-by":"publisher","author":"Jablonka","year":"2023","DOI":"10.26434\/chemrxiv-2023-fw8n4"},{"key":"mlstad2f52bib25","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/acadcd","volume":"4","author":"Fu","year":"2023","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstad2f52bib26","doi-asserted-by":"publisher","first-page":"1894","DOI":"10.1021\/acs.jcim.6b00207","volume":"56","author":"Swain","year":"2016","journal-title":"J. Chem. Inf. Model."},{"key":"mlstad2f52bib27","article-title":"Introducing chatGPT","author":"Open AI"},{"key":"mlstad2f52bib28","author":"Eloundou","year":"2023"},{"key":"mlstad2f52bib29","article-title":"Preprint","author":"OpenAI","year":"2023"},{"key":"mlstad2f52bib30","author":"Koubaa","year":"2023"},{"key":"mlstad2f52bib31","doi-asserted-by":"publisher","DOI":"10.1148\/radiol.230163","article-title":"ChatGPT and other large language models are double-edged swords","volume":"307","author":"Shen","year":"2023","journal-title":"Radiology"},{"key":"mlstad2f52bib32","doi-asserted-by":"publisher","first-page":"451","DOI":"10.3389\/fchem.2020.00451","volume":"8","author":"Zhong","year":"2020","journal-title":"Front. Chem."},{"key":"mlstad2f52bib33","doi-asserted-by":"publisher","first-page":"5513","DOI":"10.1039\/D0SE00687D","volume":"4","author":"de la Cruz","year":"2020","journal-title":"Sustain. Energy Fuels"},{"key":"mlstad2f52bib34","doi-asserted-by":"publisher","DOI":"10.1021\/acs.energyfuels.0c02855","volume":"34","author":"Cao","year":"2020","journal-title":"Energy Fuels"},{"key":"mlstad2f52bib35","doi-asserted-by":"publisher","first-page":"4370","DOI":"10.1039\/D0SE00800A","volume":"4","author":"Li","year":"2020","journal-title":"Sustain. Energy Fuels"},{"key":"mlstad2f52bib36","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2014.22","volume":"1","author":"Ramakrishnan","year":"2014","journal-title":"Sci. 
Data"},{"key":"mlstad2f52bib37","doi-asserted-by":"publisher","DOI":"10.1039\/c4cp01572j","volume":"16","author":"Marenich","year":"2014","journal-title":"Phys. Chem. Chem. Phys."},{"key":"mlstad2f52bib38","doi-asserted-by":"publisher","first-page":"7407","DOI":"10.1021\/jp025853n","volume":"106","author":"Baik","year":"2002","journal-title":"J. Phys. Chem. A"},{"key":"mlstad2f52bib39","doi-asserted-by":"publisher","first-page":"8852","DOI":"10.1021\/jp5060777","volume":"118","author":"Bachman","year":"2014","journal-title":"J. Phys. Chem. A"},{"key":"mlstad2f52bib40","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1021\/acs.jctc.1c01040","volume":"18","author":"Hruska","year":"2022","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad2f52bib41","doi-asserted-by":"publisher","first-page":"1034","DOI":"10.1021\/acs.jctc.7b00169","volume":"13","author":"Zhang","year":"2017","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad2f52bib42","doi-asserted-by":"publisher","first-page":"2161","DOI":"10.1021\/cr960149m","volume":"99","author":"Cramer","year":"1999","journal-title":"Chem. Rev."},{"key":"mlstad2f52bib43","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1021\/acs.iecr.0c05055","volume":"60","author":"Zhang","year":"2020","journal-title":"Ind. Eng. Chem. Res."},{"key":"mlstad2f52bib44","doi-asserted-by":"publisher","DOI":"10.1021\/acsomega.1c06856","volume":"7","author":"Ghule","year":"2022","journal-title":"ACS Omega"},{"key":"mlstad2f52bib45","doi-asserted-by":"publisher","DOI":"10.1063\/5.0098330","volume":"157","author":"Wang","year":"2022","journal-title":"J. Chem. Phys."},{"key":"mlstad2f52bib46","doi-asserted-by":"publisher","DOI":"10.1016\/j.mtener.2020.100482","volume":"17","author":"Allam","year":"2020","journal-title":"Mater. 
Today Energy"},{"key":"mlstad2f52bib47","article-title":"pytesseract \u2014 pypi.org"},{"key":"mlstad2f52bib48","first-page":"pp 4700","author":"Huang","year":"2017"},{"key":"mlstad2f52bib49","first-page":"pp 445","author":"Fang","year":"2012"},{"key":"mlstad2f52bib50","doi-asserted-by":"publisher","first-page":"9297","DOI":"10.1021\/jo501761c","volume":"79","author":"Luo","year":"2014","journal-title":"J. Org. Chem."},{"key":"mlstad2f52bib51","article-title":"pdf2image \u2014 pypi.org"},{"key":"mlstad2f52bib52","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1021\/acs.jchemed.7b00361","volume":"95","author":"Elgrishi","year":"2018","journal-title":"J. Chem. Educ."},{"key":"mlstad2f52bib53","author":"Lemm","year":"2021"},{"key":"mlstad2f52bib54","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"Weininger","year":"1988","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"mlstad2f52bib55","first-page":"31","volume":"8","author":"Landrum","year":"2013","journal-title":"Greg Landrum"},{"key":"mlstad2f52bib56","doi-asserted-by":"publisher","first-page":"e1493","DOI":"10.1002\/wcms.1493","volume":"11","author":"Bannwarth","year":"2021","journal-title":"Wiley Interdiscip. Rev.-Comput. Mol. Sci."},{"key":"mlstad2f52bib57","doi-asserted-by":"publisher","first-page":"1989","DOI":"10.1021\/acs.jctc.7b00118","volume":"13","author":"Grimme","year":"2017","journal-title":"J. Chem. Theory Comput."},{"key":"mlstad2f52bib58","author":"Izutsu","year":"2009"},{"key":"mlstad2f52bib59","volume":"vol 541","author":"Inzelt","year":"2013"},{"key":"mlstad2f52bib60","first-page":"pp 785","author":"Chen","year":"2016"},{"key":"mlstad2f52bib61","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1002\/9781119356059.ch5","volume":"30","author":"Ramakrishnan","year":"2017","journal-title":"Rev. Comput. 
Chem."},{"key":"mlstad2f52bib62","first-page":"1883","article-title":"Quantum machine learning in chemistry and materials","author":"Huang","year":"2020"},{"key":"mlstad2f52bib63","author":"Christensen","year":"2017"},{"key":"mlstad2f52bib64","article-title":"sklearn.kernel_ridge.KernelRidge \u2014 scikit-learn.org"},{"key":"mlstad2f52bib65","first-page":"pp 115","author":"Bergstra","year":"2013"},{"key":"mlstad2f52bib66","doi-asserted-by":"publisher","DOI":"10.1063\/1.3553717","volume":"134","author":"Behler","year":"2011","journal-title":"J. Chem. Phys."},{"key":"mlstad2f52bib67","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-018-0258-y","volume":"10","author":"Moriwaki","year":"2018","journal-title":"J. Cheminform."},{"key":"mlstad2f52bib68","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.87.184115","volume":"87","author":"Bart\u00f3k","year":"2013","journal-title":"Phys. Rev. B"},{"key":"mlstad2f52bib69","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1038\/s41557-020-0527-z","volume":"12","author":"Huang","year":"2020","journal-title":"Nat. Chem."},{"key":"mlstad2f52bib70","doi-asserted-by":"publisher","first-page":"9759","DOI":"10.1021\/acs.chemrev.1c00021","volume":"121","author":"Musil","year":"2021","journal-title":"Chem. Rev."},{"key":"mlstad2f52bib71","doi-asserted-by":"publisher","DOI":"10.1016\/j.cpc.2019.106949","volume":"247","author":"Himanen","year":"2020","journal-title":"Comput. Phys. Commun."},{"key":"mlstad2f52bib72","doi-asserted-by":"publisher","first-page":"2864","DOI":"10.1021\/ci300415d","volume":"52","author":"Ruddigkeit","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"mlstad2f52bib73","doi-asserted-by":"publisher","first-page":"2193","DOI":"10.1063\/1.455064","volume":"89","author":"Petersson","year":"1988","journal-title":"J. Chem. 
Phys."},{"key":"mlstad2f52bib74","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1103\/PhysRevB.37.785","volume":"37","author":"Lee","year":"1988","journal-title":"Phys. Rev. B"},{"key":"mlstad2f52bib75","doi-asserted-by":"publisher","first-page":"2155","DOI":"10.1063\/1.464913","volume":"96","author":"Becke","year":"1993","journal-title":"J. Chem. Phys."},{"key":"mlstad2f52bib76","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-019-0407-y","volume":"12","author":"Rajan","year":"2020","journal-title":"J. Cheminform."},{"key":"mlstad2f52bib77","doi-asserted-by":"publisher","first-page":"16","DOI":"10.1186\/s13321-021-00538-8","volume":"13","author":"Rajan","year":"2021","journal-title":"J. Cheminform."},{"key":"mlstad2f52bib78","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-020-00477-w","volume":"13","author":"Rajan","year":"2021","journal-title":"J. Cheminform."},{"key":"mlstad2f52bib79","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1016\/0893-6080(93)90013-M","volume":"6","author":"Amari","year":"1993","journal-title":"Neural Netw."},{"key":"mlstad2f52bib80","volume":"vol 6","author":"Cortes","year":"1993"},{"key":"mlstad2f52bib81","doi-asserted-by":"publisher","first-page":"916","DOI":"10.1021\/jo00971a023","volume":"37","author":"Miller","year":"1972","journal-title":"J. Org. Chem."},{"key":"mlstad2f52bib82","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1021\/cr60254a003","volume":"68","author":"Weinberg","year":"1968","journal-title":"Chem. Rev."},{"key":"mlstad2f52bib83","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1016\/S0040-4020(01)96458-0","volume":"41","author":"Minsky","year":"1985","journal-title":"Tetrahedron"},{"key":"mlstad2f52bib84","doi-asserted-by":"publisher","first-page":"3121","DOI":"10.1039\/b002601h","volume":"2","author":"Aihara","year":"2000","journal-title":"Phys. Chem. Chem. 
Phys."},{"key":"mlstad2f52bib85","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1039\/B811056P","volume":"106","author":"Jalan","year":"2010","journal-title":"Ann. Rep. C"},{"key":"mlstad2f52bib86","doi-asserted-by":"publisher","DOI":"10.1039\/C8CP07562J","volume":"21","author":"Borhani","year":"2019","journal-title":"Phys. Chem. Chem. Phys."},{"key":"mlstad2f52bib87","doi-asserted-by":"publisher","first-page":"8306","DOI":"10.1039\/D2MA00742H","volume":"3","author":"Mazouin","year":"2022","journal-title":"Mater. Adv."},{"key":"mlstad2f52bib88","doi-asserted-by":"publisher","first-page":"483","DOI":"10.1038\/s44160-022-00231-0","volume":"1","author":"Abolhasani","year":"2023","journal-title":"Nat. Synth."},{"key":"mlstad2f52bib89","doi-asserted-by":"publisher","author":"Lee","year":"2023","DOI":"10.5281\/ZENODO.8203072"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.io
p.org\/article\/10.1088\/2632-2153\/ad2f52\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,21]],"date-time":"2024-03-21T14:53:20Z","timestamp":1711032800000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ad2f52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,1]]},"references-count":89,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,3,21]]},"published-print":{"date-parts":[[2024,3,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ad2f52","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,1]]},"assertion":[{"value":"Autonomous data extraction from peer reviewed literature for training machine learning models of oxidation potentials","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2024 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2023-08-03","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-03-01","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2024-03-21","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}