{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,1]],"date-time":"2025-05-01T16:14:43Z","timestamp":1746116083103,"version":"3.40.4"},"reference-count":143,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T00:00:00Z","timestamp":1745971200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T00:00:00Z","timestamp":1745971200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100009150","name":"National Center of Competence in Research Materials\u2019 Revolution: Computational Design and Discovery of Novel Materials","doi-asserted-by":"crossref","award":["205602"],"award-info":[{"award-number":["205602"]}],"id":[{"id":"10.13039\/501100009150","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100023650","name":"NCCR Catalysis","doi-asserted-by":"crossref","award":["180544"],"award-info":[{"award-number":["180544"]}],"id":[{"id":"10.13039\/501100023650","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"crossref","award":["185030"],"award-info":[{"award-number":["185030"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"crossref"}]},{"name":"European Research Council","award":["817977"],"award-info":[{"award-number":["817977"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Integer linear programming (ILP) is an elegant approach to solve linear optimization problems, naturally described using integer decision variables. Within the context of physics-inspired machine learning (ML) applied to chemistry, we demonstrate the relevance of an ILP formulation to select molecular training sets for predictions of size-extensive properties. We show that our algorithm outperforms existing unsupervised training set selection approaches, especially when predicting properties of molecules larger than those present in the training set. We argue that the reason for the improved performance is due to the selection that is based on the notion of local similarity (i.e. per-atom) and a unique ILP approach that finds optimal solutions efficiently. Altogether, this work provides a practical algorithm to improve the performance of physics-inspired ML models and offers insights into the conceptual differences with existing training set selection approaches.<\/jats:p>","DOI":"10.1088\/2632-2153\/adcd38","type":"journal-article","created":{"date-parts":[[2025,4,15]],"date-time":"2025-04-15T22:58:41Z","timestamp":1744757921000},"page":"025030","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Integer linear programming for unsupervised training set selection in molecular machine learning"],"prefix":"10.1088","volume":"6","author":[{"given":"Matthieu","family":"Haeberle","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7992-5529","authenticated-orcid":false,"given":"Puck","family":"van Gerwen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6315-4398","authenticated-orcid":true,"given":"Ruben","family":"Laplaza","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2250-9898","authenticated-orcid":true,"given":"Ksenia R","family":"Briling","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9332-4543","authenticated-orcid":false,"given":"Jan","family":"Weinreich","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7928-1076","authenticated-orcid":false,"given":"Friedrich","family":"Eisenbrand","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7993-2879","authenticated-orcid":true,"given":"Cl\u00e9mence","family":"Corminboeuf","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,4,30]]},"reference":[{"key":"mlstadcd38bib1","doi-asserted-by":"publisher","first-page":"368","DOI":"10.1016\/j.sbi.2006.04.004","volume":"16","author":"Edgar","year":"2006","journal-title":"Curr. Opin. Struct. Biol."},{"first-page":"pp 199","year":"2009","author":"Althaus","key":"mlstadcd38bib2"},{"key":"mlstadcd38bib3","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1093\/bioinformatics\/15.3.203","volume":"15","author":"Lenhof","year":"1999","journal-title":"Bioinformatics"},{"key":"mlstadcd38bib4","doi-asserted-by":"publisher","first-page":"S4","DOI":"10.1093\/bioinformatics\/18.suppl_2.S4","volume":"18","author":"Althaus","year":"2002","journal-title":"Bioinformatics"},{"key":"mlstadcd38bib5","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1007\/s10107-005-0659-3","volume":"105","author":"Althaus","year":"2006","journal-title":"Math. Program."},{"key":"mlstadcd38bib6","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1007\/s10878-008-9139-z","volume":"16","author":"Althaus","year":"2008","journal-title":"J. Comb. Optim."},{"key":"mlstadcd38bib7","first-page":"pp 153","article-title":"A polyhedral approach to RNA sequence structure alignment","author":"Lenhof","year":"1998"},{"key":"mlstadcd38bib8","first-page":"pp 100","article-title":"Structural alignment of large-size proteins via lagrangian relaxation","author":"Caprara","year":"2002"},{"key":"mlstadcd38bib9","first-page":"pp 15","article-title":"A combinatorial approach to protein docking with flexible side-chains","author":"Althaus","year":"2000"},{"key":"mlstadcd38bib10","doi-asserted-by":"publisher","first-page":"1028","DOI":"10.1093\/bioinformatics\/bti144","volume":"21","author":"Kingsford","year":"2004","journal-title":"Bioinformatics"},{"key":"mlstadcd38bib11","doi-asserted-by":"publisher","first-page":"593","DOI":"10.1021\/ci800228y","volume":"49","author":"Law","year":"2009","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib12","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1021\/op500373e","volume":"19","author":"B\u00f8gevig","year":"2015","journal-title":"Org. Process. Res. Dev."},{"key":"mlstadcd38bib13","doi-asserted-by":"publisher","first-page":"5966","DOI":"10.1002\/chem.201605499","volume":"23","author":"Segler","year":"2017","journal-title":"Chem. \u2013 Eur. J."},{"key":"mlstadcd38bib14","doi-asserted-by":"publisher","first-page":"1281","DOI":"10.1021\/acs.accounts.8b00087","volume":"51","author":"Coley","year":"2018","journal-title":"Acc. Chem. Res."},{"key":"mlstadcd38bib15","doi-asserted-by":"publisher","first-page":"3398","DOI":"10.1021\/acs.jcim.0c00403","volume":"60","author":"Fortunato","year":"2020","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib16","doi-asserted-by":"publisher","first-page":"2884","DOI":"10.1021\/ci400442f","volume":"53","author":"Kraut","year":"2013","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib17","doi-asserted-by":"publisher","first-page":"1296","DOI":"10.1021\/ci020023s","volume":"42","author":"Chen","year":"2002","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"mlstadcd38bib18","doi-asserted-by":"publisher","first-page":"2140","DOI":"10.1021\/acs.jcim.6b00319","volume":"56","author":"Lin","year":"2016","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib19","doi-asserted-by":"publisher","first-page":"932","DOI":"10.1039\/D3DD00175J","volume":"3","author":"van Gerwen","year":"2024","journal-title":"Dig. Disc."},{"key":"mlstadcd38bib20","doi-asserted-by":"publisher","first-page":"5771","DOI":"10.1021\/acs.jcim.4c00104","volume":"64","author":"van Gerwen","year":"2024","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib21","doi-asserted-by":"publisher","first-page":"2523","DOI":"10.1016\/j.dam.2012.01.026","volume":"160","author":"Bahiense","year":"2012","journal-title":"Discrete Appl. Math."},{"key":"mlstadcd38bib22","doi-asserted-by":"publisher","first-page":"S6","DOI":"10.1186\/1471-2105-7-S4-S6","volume":"7","author":"Huang","year":"2006","journal-title":"BMC Bioinform."},{"key":"mlstadcd38bib23","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1021\/ci200351b","volume":"52","author":"First","year":"2012","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib24","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1186\/s13015-014-0023-3","volume":"9","author":"Mann","year":"2014","journal-title":"Algorithms Mol. Bio."},{"key":"mlstadcd38bib25","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1021\/ci00052a009","volume":"26","author":"Fujita","year":"1986","journal-title":"J. Chem. Inf. Comput. Sci."},{"key":"mlstadcd38bib26","doi-asserted-by":"publisher","first-page":"2516","DOI":"10.1021\/acs.jcim.9b00102","volume":"59","author":"Nugmanov","year":"2019","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib27","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1007\/s10822-005-9008-0","volume":"19","author":"Varnek","year":"2005","journal-title":"J. Comput. Aided Mol. Des."},{"key":"mlstadcd38bib28","doi-asserted-by":"publisher","first-page":"678","DOI":"10.1016\/j.mcm.2006.02.004","volume":"44","author":"Sen","year":"2006","journal-title":"Math. Comput. Model."},{"key":"mlstadcd38bib29","doi-asserted-by":"publisher","first-page":"1267","DOI":"10.1021\/ci049645z","volume":"45","author":"Froeyen","year":"2005","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib30","doi-asserted-by":"publisher","first-page":"9777","DOI":"10.1021\/ac402180c","volume":"85","author":"Baran","year":"2013","journal-title":"Anal. Chem."},{"key":"mlstadcd38bib31","first-page":"pp 1273","article-title":"Computing h\/d-exchange speeds of single residues from data of peptic fragments","author":"Althaus","year":"2008"},{"key":"mlstadcd38bib32","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1007\/s10910-011-9911-7","volume":"50","author":"Johnston","year":"2012","journal-title":"J. Math. Chem."},{"key":"mlstadcd38bib33","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1016\/j.compchemeng.2016.04.019","volume":"90","author":"Willis","year":"2016","journal-title":"Comput. Chem. Eng."},{"key":"mlstadcd38bib34","first-page":"pp 40","article-title":"A method for the inverse qsar\/qspr based on artificial neural networks and mixed integer linear programming","author":"Chiewvanichakorn","year":"2020"},{"first-page":"pp 433","year":"2020","author":"Zhang","key":"mlstadcd38bib35"},{"key":"mlstadcd38bib36","first-page":"pp 21","article-title":"A method for molecular design based on linear regression and integer programming","author":"Zhu","year":"2022"},{"key":"mlstadcd38bib37","doi-asserted-by":"publisher","first-page":"17058","DOI":"10.1007\/s10489-021-03088-6","volume":"52","author":"Zhang","year":"2022","journal-title":"Appl. Intell."},{"year":"2021","author":"Ido","key":"mlstadcd38bib38"},{"key":"mlstadcd38bib39","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1016\/S0097-8485(02)00049-9","volume":"26","author":"Ostrovsky","year":"2002","journal-title":"Comput. Chem."},{"key":"mlstadcd38bib40","doi-asserted-by":"publisher","first-page":"839","DOI":"10.1021\/ie0605985","volume":"46","author":"Zhu","year":"2007","journal-title":"Ind. Eng. Chem. Res."},{"key":"mlstadcd38bib41","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1000246","volume":"4","author":"Toussaint","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"mlstadcd38bib42","doi-asserted-by":"publisher","DOI":"10.1209\/0295-5075\/77\/50006","volume":"77","author":"Lavor","year":"2007","journal-title":"Europhys. Lett."},{"key":"mlstadcd38bib43","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1007\/s10898-012-9868-5","volume":"56","author":"Janes","year":"2013","journal-title":"J. Glob. Optim."},{"key":"mlstadcd38bib44","doi-asserted-by":"publisher","first-page":"10001","DOI":"10.1021\/acs.chemrev.0c01303","volume":"121","author":"Huang","year":"2021","journal-title":"Chem. Rev."},{"key":"mlstadcd38bib45","doi-asserted-by":"publisher","first-page":"9759","DOI":"10.1021\/acs.chemrev.1c00021","volume":"121","author":"Musil","year":"2021","journal-title":"Chem. Rev."},{"key":"mlstadcd38bib46","doi-asserted-by":"publisher","first-page":"10073","DOI":"10.1021\/acs.chemrev.1c00022","volume":"121","author":"Deringer","year":"2021","journal-title":"Chem. Rev."},{"key":"mlstadcd38bib47","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41524-022-00721-x","volume":"8","author":"Langer","year":"2022","journal-title":"npj Comput. Mater."},{"key":"mlstadcd38bib48","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1146\/annurev-physchem-042018-052331","volume":"71","author":"No\u00e9","year":"2020","journal-title":"Annu. Rev. Phys. Chem."},{"key":"mlstadcd38bib49","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac8e4f","volume":"3","author":"Fabregat","year":"2022","journal-title":"Mach. Learn.: Sci. Technol."},{"article-title":"Distance metric learning: a comprehensive survey","year":"2006","author":"Yang","key":"mlstadcd38bib50"},{"key":"mlstadcd38bib51","doi-asserted-by":"publisher","first-page":"2120","DOI":"10.1021\/acs.jctc.5b00141","volume":"11","author":"Dral","year":"2015","journal-title":"J. Chem. Theory Comput."},{"key":"mlstadcd38bib52","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/abd486","volume":"2","author":"Reddy","year":"2021","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadcd38bib53","doi-asserted-by":"publisher","DOI":"10.1063\/1.5020710","volume":"148","author":"Faber","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib54","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.87.184115","volume":"87","author":"Bart\u00f3k","year":"2013","journal-title":"Phys. Rev. B"},{"key":"mlstadcd38bib55","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.98.146401","volume":"98","author":"Behler","year":"2007","journal-title":"Phys. Rev. Lett."},{"key":"mlstadcd38bib56","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.99.014104","volume":"99","author":"Drautz","year":"2019","journal-title":"Phys. Rev. B"},{"key":"mlstadcd38bib57","doi-asserted-by":"publisher","DOI":"10.1063\/5.0021116","volume":"153","author":"Nigam","year":"2020","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib58","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.108.058301","volume":"108","author":"Rupp","year":"2012","journal-title":"Phys. Rev. Lett."},{"key":"mlstadcd38bib59","doi-asserted-by":"publisher","first-page":"2326","DOI":"10.1021\/acs.jpclett.5b00831","volume":"6","author":"Hansen","year":"2015","journal-title":"J. Chem. Phys. Lett."},{"key":"mlstadcd38bib60","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aca005","volume":"3","author":"Huo","year":"2022","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadcd38bib61","doi-asserted-by":"publisher","first-page":"1467","DOI":"10.1021\/acs.jctc.1c00813","volume":"18","author":"Fabregat","year":"2022","journal-title":"J. Chem. Theory Comput."},{"key":"mlstadcd38bib62","doi-asserted-by":"publisher","first-page":"945","DOI":"10.1038\/s41557-020-0527-z","volume":"12","author":"Huang","year":"2020","journal-title":"Nat. Chem."},{"key":"mlstadcd38bib63","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ad0fa3","volume":"4","author":"Lemm","year":"2023","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadcd38bib64","doi-asserted-by":"publisher","first-page":"3401","DOI":"10.1073\/pnas.1816132116","volume":"116","author":"Wilkins","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"mlstadcd38bib65","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1021\/acscentsci.8b00551","volume":"5","author":"Grisafi","year":"2018","journal-title":"ACS Cent. Sci."},{"key":"mlstadcd38bib66","doi-asserted-by":"publisher","first-page":"3309","DOI":"10.1021\/acs.jpclett.5b01456","volume":"6","author":"Rupp","year":"2015","journal-title":"J. Chem. Phys. Lett."},{"key":"mlstadcd38bib67","doi-asserted-by":"publisher","DOI":"10.1002\/syst.201900052","volume":"2","author":"Jung","year":"2020","journal-title":"ChemSystemsChem"},{"key":"mlstadcd38bib68","doi-asserted-by":"publisher","first-page":"3404","DOI":"10.1021\/ct400195d","volume":"9","author":"Hansen","year":"2013","journal-title":"J. Chem. Theory Comput."},{"key":"mlstadcd38bib69","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.1701816","volume":"3","author":"Bart\u00f3k","year":"2017","journal-title":"Sci. Adv."},{"key":"mlstadcd38bib70","doi-asserted-by":"publisher","first-page":"1923","DOI":"10.1021\/acs.jcim.7b00090","volume":"57","author":"Unke","year":"2017","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib71","doi-asserted-by":"publisher","first-page":"6879","DOI":"10.1039\/D1SC00482D","volume":"12","author":"Gallarati","year":"2021","journal-title":"Chem. Sci."},{"key":"mlstadcd38bib72","first-page":"pp 612","article-title":"Metric learning for kernel regression","author":"Weinberger","year":"2007"},{"key":"mlstadcd38bib73","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ad8f13","volume":"5","author":"Ullah","year":"2024","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadcd38bib74","doi-asserted-by":"publisher","first-page":"809","DOI":"10.1038\/s41929-018-0176-4","volume":"1","author":"Bo","year":"2018","journal-title":"Nat. Catal."},{"key":"mlstadcd38bib75","doi-asserted-by":"publisher","DOI":"10.1016\/j.coche.2021.100778","volume":"36","author":"Nandy","year":"2022","journal-title":"Curr. Opin. Chem. Eng."},{"key":"mlstadcd38bib76","doi-asserted-by":"publisher","first-page":"676","DOI":"10.1557\/mrs.2018.208","volume":"43","author":"Draxl","year":"2018","journal-title":"MRS Bull."},{"key":"mlstadcd38bib77","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2017.193","volume":"4","author":"Smith","year":"2017","journal-title":"Sci. Data"},{"key":"mlstadcd38bib78","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2014.22","volume":"1","author":"Ramakrishnan","year":"2014","journal-title":"Sci. Data"},{"key":"mlstadcd38bib79","doi-asserted-by":"publisher","DOI":"10.1063\/1.4812323","volume":"1","author":"Jain","year":"2013","journal-title":"APL Mater."},{"key":"mlstadcd38bib80","doi-asserted-by":"publisher","first-page":"1300","DOI":"10.1021\/acs.jcim.7b00083","volume":"57","author":"Nakata","year":"2017","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib81","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1038\/s41597-021-00812-2","volume":"8","author":"Hoja","year":"2021","journal-title":"Sci. Data"},{"key":"mlstadcd38bib82","doi-asserted-by":"publisher","DOI":"10.1088\/1367-2630\/15\/9\/095003","volume":"15","author":"Montavon","year":"2013","journal-title":"New J. Phys."},{"key":"mlstadcd38bib83","doi-asserted-by":"publisher","DOI":"10.1063\/1.4928757","volume":"143","author":"Ramakrishnan","year":"2015","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib84","doi-asserted-by":"publisher","DOI":"10.1016\/j.cdc.2023.101040","volume":"46","author":"Neeser","year":"2023","journal-title":"Chem. Data Coll."},{"key":"mlstadcd38bib85","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.commatsci.2015.02.050","volume":"103","author":"Qu","year":"2015","journal-title":"Comput. Mater. Sci."},{"key":"mlstadcd38bib86","doi-asserted-by":"publisher","first-page":"3704","DOI":"10.1021\/acs.jcim.2c00503","volume":"62","author":"Wahab","year":"2022","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib87","doi-asserted-by":"publisher","first-page":"1717","DOI":"10.1021\/acscentsci.9b00804","volume":"5","author":"Yamada","year":"2019","journal-title":"ACS Cent. Sci."},{"key":"mlstadcd38bib88","doi-asserted-by":"publisher","first-page":"5316","DOI":"10.1038\/s41467-019-13297-w","volume":"10","author":"Jha","year":"2019","journal-title":"Nat. Commun."},{"key":"mlstadcd38bib89","doi-asserted-by":"publisher","first-page":"5826","DOI":"10.1021\/acs.jpca.9b04195","volume":"123","author":"Grambow","year":"2019","journal-title":"J. Phys. Chem. A"},{"key":"mlstadcd38bib90","doi-asserted-by":"publisher","first-page":"2357","DOI":"10.3390\/molecules25102357","volume":"25","author":"Bai","year":"2020","journal-title":"Molecules"},{"key":"mlstadcd38bib91","doi-asserted-by":"publisher","first-page":"10022","DOI":"10.1039\/D1SC01206A","volume":"12","author":"Jackson","year":"2021","journal-title":"Chem. Sci."},{"key":"mlstadcd38bib92","doi-asserted-by":"publisher","first-page":"6655","DOI":"10.1039\/D1SC06932B","volume":"13","author":"Shim","year":"2022","journal-title":"Chem. Sci."},{"key":"mlstadcd38bib93","doi-asserted-by":"publisher","first-page":"5143","DOI":"10.1039\/D3SC04928K","volume":"15","author":"King-Smith","year":"2024","journal-title":"Chem. Sci."},{"key":"mlstadcd38bib94","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1214\/10-BA521","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadcd38bib95","first-page":"1633","volume":"10","author":"Taylor","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadcd38bib96","doi-asserted-by":"publisher","first-page":"4874","DOI":"10.1038\/s41467-020-18671-7","volume":"11","author":"Pesciullesi","year":"2020","journal-title":"Nat. Commun."},{"key":"mlstadcd38bib97","doi-asserted-by":"publisher","DOI":"10.1063\/1.5023802","volume":"148","author":"Smith","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib98","doi-asserted-by":"publisher","DOI":"10.1063\/1.5005095","volume":"148","author":"Gubaev","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib99","doi-asserted-by":"publisher","first-page":"104","DOI":"10.1038\/s41524-020-00367-7","volume":"6","author":"Sivaraman","year":"2020","journal-title":"npj Comput. Mater."},{"key":"mlstadcd38bib100","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1016\/j.commatsci.2017.08.031","volume":"140","author":"Podryabinkin","year":"2017","journal-title":"Comput. Mater. Sci."},{"key":"mlstadcd38bib101","doi-asserted-by":"publisher","first-page":"458","DOI":"10.1016\/j.drudis.2014.12.004","volume":"20","author":"Reker","year":"2015","journal-title":"Drug Discov. Today"},{"key":"mlstadcd38bib102","doi-asserted-by":"publisher","first-page":"1134","DOI":"10.1039\/D3DD00037K","volume":"2","author":"Wen","year":"2023","journal-title":"Dig. Disc."},{"key":"mlstadcd38bib103","doi-asserted-by":"publisher","first-page":"4146","DOI":"10.1039\/D3SC04653B","volume":"15","author":"Dodds","year":"2024","journal-title":"Chem. Sci."},{"year":"2023","author":"Heinen","key":"mlstadcd38bib104"},{"year":"2017","author":"Bachem","key":"mlstadcd38bib105"},{"key":"mlstadcd38bib106","first-page":"1","volume":"18","author":"Lucic","year":"2018","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadcd38bib107","first-page":"pp 14879","article-title":"Coresets via bilevel optimization for continual learning and streaming","author":"Borsos","year":"2020"},{"year":"2020","author":"Mirzasoleiman","key":"mlstadcd38bib108"},{"key":"mlstadcd38bib109","doi-asserted-by":"publisher","DOI":"10.1063\/1.4989536","volume":"146","author":"Dral","year":"2017","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib110","doi-asserted-by":"publisher","DOI":"10.1063\/1.5024611","volume":"148","author":"Imbalzano","year":"2018","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib111","doi-asserted-by":"publisher","first-page":"5139","DOI":"10.1021\/acs.jctc.0c00362","volume":"16","author":"Rossi","year":"2020","journal-title":"J. Chem. Theory Comput."},{"key":"mlstadcd38bib112","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1073\/pnas.0803205106","volume":"106","author":"Mahoney","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"mlstadcd38bib113","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/abfe7c","volume":"2","author":"Cersonsky","year":"2021","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadcd38bib114","doi-asserted-by":"publisher","first-page":"1351","DOI":"10.1021\/acs.jpclett.7b00038","volume":"8","author":"Browning","year":"2017","journal-title":"J. Chem. Phys. Lett."},{"article-title":"Enamine real compounds","year":"2020","author":"","key":"mlstadcd38bib115"},{"key":"mlstadcd38bib116","doi-asserted-by":"publisher","first-page":"888","DOI":"10.1162\/neco.1992.4.6.888","volume":"4","author":"Bottou","year":"1992","journal-title":"Neural Comput."},{"article-title":"Linear programming methods and the bipartite matching polytope","year":"2012","author":"Schrijver","key":"mlstadcd38bib117"},{"year":"2014","author":"Wolsey","key":"mlstadcd38bib118"},{"article-title":"Machine-learning quantum-chemical properties of molecules and chemical reactions","year":"2024","author":"Van Gerwen","key":"mlstadcd38bib119"},{"key":"mlstadcd38bib120","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1016\/S0166-1280(99)00235-3","volume":"493","author":"Adamo","year":"1999","journal-title":"J. Mol. Struct. THEOCHEM"},{"key":"mlstadcd38bib121","doi-asserted-by":"publisher","first-page":"3297","DOI":"10.1039\/b508541a","volume":"7","author":"Weigend","year":"2005","journal-title":"Phys. Chem. Chem. Phys."},{"key":"mlstadcd38bib122","doi-asserted-by":"publisher","DOI":"10.1063\/1.3382344","volume":"132","author":"Grimme","year":"2010","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib123","doi-asserted-by":"publisher","first-page":"e1606","DOI":"10.1002\/wcms.1606","volume":"12","author":"Neese","year":"2022","journal-title":"Wiley Interdiscip. Rev. Comput. Mol. Sci."},{"key":"mlstadcd38bib124","doi-asserted-by":"publisher","DOI":"10.1063\/1.5126701","volume":"152","author":"Christensen","year":"2020","journal-title":"J. Chem. Phys."},{"article-title":"QML: a python toolkit for quantum machine learning","year":"2017","author":"Christensen","key":"mlstadcd38bib125"},{"article-title":"Gurobi optimizer reference manual","year":"2022","author":"Gurobi Optimization, LLC","key":"mlstadcd38bib126"},{"key":"mlstadcd38bib127","doi-asserted-by":"publisher","first-page":"81","DOI":"10.12688\/openreseurope.15789.2","volume":"3","author":"Goscinski","year":"2023","journal-title":"Open Res. Eur."},{"key":"mlstadcd38bib128","doi-asserted-by":"publisher","first-page":"13754","DOI":"10.1039\/C6CP00415F","volume":"18","author":"De","year":"2016","journal-title":"Phys. Chem. Chem. Phys."},{"key":"mlstadcd38bib129","doi-asserted-by":"publisher","first-page":"3084","DOI":"10.1021\/acs.jctc.0c00100","volume":"16","author":"Fabregat","year":"2020","journal-title":"J. Chem. Theory Comput."},{"key":"mlstadcd38bib130","doi-asserted-by":"publisher","first-page":"3640","DOI":"10.1039\/D3SC06208B","volume":"15","author":"Gallarati","year":"2024","journal-title":"Chem. Sci."},{"key":"mlstadcd38bib131","doi-asserted-by":"publisher","DOI":"10.1063\/5.0036522","volume":"154","author":"Imbalzano","year":"2021","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib132","doi-asserted-by":"publisher","DOI":"10.1088\/1367-2630\/ab4509","volume":"21","author":"Raimbault","year":"2019","journal-title":"New J. Phys."},{"key":"mlstadcd38bib133","doi-asserted-by":"publisher","first-page":"1201","DOI":"10.1021\/acs.jcim.3c01953","volume":"64","author":"C\u00e9lerse","year":"2024","journal-title":"J. Chem. Inf. Model."},{"key":"mlstadcd38bib134","doi-asserted-by":"publisher","first-page":"221","DOI":"10.1021\/acs.accounts.1c00503","volume":"55","author":"Mouvet","year":"2022","journal-title":"Acc. Chem. Res."},{"key":"mlstadcd38bib135","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/aba9ef","volume":"1","author":"Helfrecht","year":"2020","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadcd38bib136","first-page":"2579","volume":"9","author":"Van der Maaten","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadcd38bib137","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1038\/s41592-018-0308-4","volume":"16","author":"Linderman","year":"2019","journal-title":"Nat. Methods"},{"key":"mlstadcd38bib138","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v109.i03","volume":"109","author":"Poli\u010dar","year":"2024","journal-title":"J. Stat. Softw."},{"key":"mlstadcd38bib139","doi-asserted-by":"publisher","first-page":"188","DOI":"10.1038\/s41524-022-00874-9","volume":"8","author":"Vela","year":"2022","journal-title":"npj Comput. Mater."},{"key":"mlstadcd38bib140","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2022.100588","volume":"3","author":"Krenn","year":"2022","journal-title":"Patterns"},{"key":"mlstadcd38bib141","doi-asserted-by":"publisher","DOI":"10.1063\/5.0085153","volume":"156","author":"Jur\u00e1skov\u00e1","year":"2022","journal-title":"J. Chem. Phys."},{"key":"mlstadcd38bib142","doi-asserted-by":"publisher","first-page":"10350","DOI":"10.1021\/acs.jctc.4c01201","volume":"20","author":"C\u00e9lerse","year":"2024","journal-title":"J. Chem. Theory Comput."},{"year":"2024","author":"Haeberle","key":"mlstadcd38bib143"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T09:56:49Z","timestamp":1746007009000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adcd38"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,30]]},"references-count":143,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,4,30]]},"published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/adcd38","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2025,4,30]]},"assertion":[{"value":"Integer linear programming for unsupervised training set selection in molecular machine learning","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-10-22","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-04-15","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-04-30","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}