{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T08:35:16Z","timestamp":1778574916116,"version":"3.51.4"},"reference-count":147,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T00:00:00Z","timestamp":1730073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"publisher","award":["HR00111920027"],"award-info":[{"award-number":["HR00111920027"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NSERC Discovery","award":["RGPIN-2022-04910"],"award-info":[{"award-number":["RGPIN-2022-04910"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:sec>\n                    <jats:title>Abstract<\/jats:title>\n                    <jats:p>Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, the majority of existing ML research has focused on the predictions of aqueous solubility and\/or solubility at specific temperatures, which restricts the model applicability in pharmaceutical development. 
To bridge this gap, we compiled a dataset of 27,000 solubility datapoints, including solubility of small molecules measured in a range of binary solvent mixtures under various temperatures. Next, a panel of ML models was trained on this dataset with their hyperparameters tuned using Bayesian optimization. The resulting top-performing models, both gradient boosted decision trees (light gradient boosting machine and extreme gradient boosting), achieved mean absolute errors (MAE) of 0.33 for LogS (S in g\/100\u00a0g) on the holdout set. These models were further validated through a prospective study, wherein the solubility of four drug molecules was predicted by the models and then validated with in-house solubility experiments. This prospective study demonstrated that the models accurately predicted the solubility of solutes in specific binary solvent mixtures under different temperatures, especially for drugs whose features closely align with those of the solutes in the dataset (MAE\u2009&lt;\u20090.5 for LogS). To support future research and facilitate advancements in the field, we have made the dataset and code openly available.<\/jats:p>\n                    <jats:p>\n                      <jats:bold>Scientific contribution<\/jats:bold>\n                    <\/jats:p>\n                    <jats:p>Our research advances the state of the art in predicting solubility for small molecules by leveraging ML and a uniquely comprehensive dataset. Unlike existing ML studies that predominantly focus on solubility in aqueous solvents at fixed temperatures, our work enables prediction of drug solubility in a variety of binary solvent mixtures over a broad temperature range, providing practical insights on the modeling of solubility for realistic pharmaceutical applications. 
These advancements along with the open access dataset and code support significant steps in the drug development process including new molecule discovery, drug analysis and formulation.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Graphical Abstract<\/jats:title>\n                  <\/jats:sec>","DOI":"10.1186\/s13321-024-00911-3","type":"journal-article","created":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T10:02:42Z","timestamp":1730109762000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning"],"prefix":"10.1186","volume":"16","author":[{"given":"Zeqing","family":"Bao","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gary","family":"Tom","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Austin","family":"Cheng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeffrey","family":"Watchorn","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Al\u00e1n","family":"Aspuru-Guzik","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christine","family":"Allen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,10,28]]},"reference":[{"key":"911_CR1","doi-asserted-by":"publisher","first-page":"546","DOI":"10.1016\/j.addr.2007.05.007","volume":"59","author":"J Alsenz","year":"2007","unstructured":"Alsenz J, Kansy M (2007) High throughput solubility measurement in drug discovery and development. Adv Drug Deliv Rev 59:546\u2013567. 
https:\/\/doi.org\/10.1016\/j.addr.2007.05.007","journal-title":"Adv Drug Deliv Rev"},{"key":"911_CR2","doi-asserted-by":"publisher","first-page":"71","DOI":"10.3390\/pr9010071","volume":"9","author":"OMH Salo-Ahen","year":"2021","unstructured":"Salo-Ahen OMH, Alanko I, Bhadane R, Bonvin AMJJ, Honorato RV, Hossain S, Juffer AH, Kabedev A, Lahtela-Kakkonen M, Larsen AS, Lescrinier E, Marimuthu P, Mirza MU, Mustafa G, Nunes-Alves A, Pantsar T, Saadabadi A, Singaravelu K, Vanmeert M (2021) Molecular dynamics simulations in drug discovery and pharmaceutical development. Processes 9:71. https:\/\/doi.org\/10.3390\/pr9010071","journal-title":"Processes"},{"key":"911_CR3","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.drudis.2020.10.010","volume":"26","author":"D Paul","year":"2021","unstructured":"Paul D, Sanap G, Shenoy S, Kalyane D, Kalia K, Tekade RK (2021) Artificial intelligence in drug discovery and development. Drug Discov Today 26:80\u201393. https:\/\/doi.org\/10.1016\/j.drudis.2020.10.010","journal-title":"Drug Discov Today"},{"key":"911_CR4","doi-asserted-by":"publisher","first-page":"1717","DOI":"10.1080\/03639045.2019.1665062","volume":"45","author":"A Veseli","year":"2019","unstructured":"Veseli A, \u017dakelj S, Kristl A (2019) A review of methods for solubility determination in biopharmaceutical drug characterization. Drug Dev Ind Pharm 45:1717\u20131724. https:\/\/doi.org\/10.1080\/03639045.2019.1665062","journal-title":"Drug Dev Ind Pharm"},{"key":"911_CR5","doi-asserted-by":"publisher","first-page":"1195","DOI":"10.1002\/jssc.200401935","volume":"28","author":"S Pedersen-Bjergaard","year":"2005","unstructured":"Pedersen-Bjergaard S, Rasmussen KE, Brekke A, Ho TS, Gr\u00f8nhaug Halvorsen T (2005) Liquid-phase microextraction of basic drugs\u2014selection of extraction mode based on computer calculated solubility data. J Sep Sci 28:1195\u20131203. 
https:\/\/doi.org\/10.1002\/jssc.200401935","journal-title":"J Sep Sci"},{"key":"911_CR6","doi-asserted-by":"publisher","first-page":"114507","DOI":"10.1016\/j.addr.2022.114507","volume":"190","author":"S Salunke","year":"2022","unstructured":"Salunke S, O\u2019Brien F, Cheng Thiam Tan D, Harris D, Math M-C, Ari\u00ebn T, Klein S, Timpe C (2022) Oral drug delivery strategies for development of poorly water soluble drugs in paediatric patient population. Adv Drug Delivery Rev 190:114507. https:\/\/doi.org\/10.1016\/j.addr.2022.114507","journal-title":"Adv Drug Delivery Rev"},{"key":"911_CR7","doi-asserted-by":"publisher","DOI":"10.1016\/j.lfs.2022.120301","volume":"291","author":"KU Khan","year":"2022","unstructured":"Khan KU, Minhas MU, Badshah SF, Suhail M, Ahmad A, Ijaz S (2022) Overview of nanoparticulate strategies for solubility enhancement of poorly soluble drugs. Life Sci 291:120301. https:\/\/doi.org\/10.1016\/j.lfs.2022.120301","journal-title":"Life Sci"},{"key":"911_CR8","doi-asserted-by":"publisher","first-page":"137","DOI":"10.4103\/jrptps.JRPTPS_134_19","volume":"10","author":"A Ainurofiq","year":"2021","unstructured":"Ainurofiq A, Putro DS, Ramadhani DA, Putra GM, Do Espirito Santo LDC (2021) A review on solubility enhancement methods for poorly water-soluble drugs. J Reports Pharm Sci 10:137. https:\/\/doi.org\/10.4103\/jrptps.JRPTPS_134_19","journal-title":"J Reports Pharm Sci"},{"key":"911_CR9","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1016\/j.ejps.2012.07.019","volume":"47","author":"C Saal","year":"2012","unstructured":"Saal C, Petereit AC (2012) Optimizing solubility: kinetic versus thermodynamic solubility temptations and risks. Eur J Pharm Sci 47:589\u2013595. 
https:\/\/doi.org\/10.1016\/j.ejps.2012.07.019","journal-title":"Eur J Pharm Sci"},{"key":"911_CR10","doi-asserted-by":"publisher","first-page":"1315","DOI":"10.1016\/j.drudis.2022.01.017","volume":"27","author":"JA Barrett","year":"2022","unstructured":"Barrett JA, Yang W, Skolnik SM, Belliveau LM, Patros KM (2022) Discovery solubility measurement and assessment of small molecules with drug development in mind. Drug Discovery Today 27:1315\u20131325. https:\/\/doi.org\/10.1016\/j.drudis.2022.01.017","journal-title":"Drug Discovery Today"},{"key":"911_CR11","doi-asserted-by":"publisher","first-page":"11618","DOI":"10.1039\/D1NJ01349A","volume":"45","author":"D Csics\u00e1k","year":"2021","unstructured":"Csics\u00e1k D, Borb\u00e1s E, K\u00e1d\u00e1r S, T\u0151zs\u00e9r P, Bagi P, Pataki H, Sink\u00f3 B, Tak\u00e1cs-Nov\u00e1k K, V\u00f6lgyi G (2021) Towards more accurate solubility measurements with real time monitoring: a carvedilol case study. New J Chem 45:11618\u201311625. https:\/\/doi.org\/10.1039\/D1NJ01349A","journal-title":"New J Chem"},{"key":"911_CR12","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1016\/j.ddtec.2018.04.004","volume":"27","author":"T Sou","year":"2018","unstructured":"Sou T, Bergstr\u00f6m CAS (2018) Automated assays for thermodynamic (equilibrium) solubility determination. Drug Discov Today Technol 27:11\u201319. https:\/\/doi.org\/10.1016\/j.ddtec.2018.04.004","journal-title":"Drug Discov Today Technol"},{"key":"911_CR13","doi-asserted-by":"publisher","first-page":"5977","DOI":"10.3390\/ma16175977","volume":"16","author":"G Huang","year":"2023","unstructured":"Huang G, Guo Y, Chen Y, Nie Z (2023) Application of machine learning in material synthesis and property prediction. Materials 16:5977. 
https:\/\/doi.org\/10.3390\/ma16175977","journal-title":"Materials"},{"key":"911_CR14","doi-asserted-by":"publisher","first-page":"468","DOI":"10.1002\/wcms.1183","volume":"4","author":"JBO Mitchell","year":"2014","unstructured":"Mitchell JBO (2014) Machine learning methods in chemoinformatics. Wiley Interdiscip Rev Comput Mol Sci 4:468\u2013481. https:\/\/doi.org\/10.1002\/wcms.1183","journal-title":"Wiley Interdiscip Rev Comput Mol Sci"},{"key":"911_CR15","doi-asserted-by":"publisher","first-page":"10309","DOI":"10.1021\/acs.analchem.3c00921","volume":"95","author":"CMK Stienstra","year":"2023","unstructured":"Stienstra CMK, Ieritano C, Haack A, Hopkins WS (2023) Bridging the Gap between differential mobility, Log S, and Log P using machine learning and SHAP analysis. Anal Chem 95:10309\u201310321. https:\/\/doi.org\/10.1021\/acs.analchem.3c00921","journal-title":"Anal Chem"},{"key":"911_CR16","doi-asserted-by":"publisher","first-page":"5753","DOI":"10.1038\/s41467-020-19594-z","volume":"11","author":"S Boobier","year":"2020","unstructured":"Boobier S, Hose DRJ, Blacker AJ, Nguyen BN (2020) Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat Commun 11:5753. https:\/\/doi.org\/10.1038\/s41467-020-19594-z","journal-title":"Nat Commun"},{"key":"911_CR17","doi-asserted-by":"publisher","first-page":"42027","DOI":"10.1021\/acsomega.2c03885","volume":"7","author":"M Li","year":"2022","unstructured":"Li M, Chen H, Zhang H, Zeng M, Chen B, Guan L (2022) Prediction of the aqueous solubility of compounds based on light gradient boosting machines with molecular fingerprints and the cuckoo search algorithm. ACS Omega 7:42027\u201342035. 
https:\/\/doi.org\/10.1021\/acsomega.2c03885","journal-title":"ACS Omega"},{"key":"911_CR18","doi-asserted-by":"publisher","first-page":"1101","DOI":"10.3390\/pharmaceutics13071101","volume":"13","author":"EM Tosca","year":"2021","unstructured":"Tosca EM, Bartolucci R, Magni P (2021) Application of artificial neural networks to predict the intrinsic solubility of drug-like molecules. Pharmaceutics 13:1101. https:\/\/doi.org\/10.3390\/pharmaceutics13071101","journal-title":"Pharmaceutics"},{"key":"911_CR19","doi-asserted-by":"publisher","first-page":"3236","DOI":"10.1021\/acsomega.2c06702","volume":"8","author":"W Ahmad","year":"2023","unstructured":"Ahmad W, Tayara H, Chong KT (2023) Attention-Based graph neural network for molecular solubility prediction. ACS Omega 8:3236\u20133244. https:\/\/doi.org\/10.1021\/acsomega.2c06702","journal-title":"ACS Omega"},{"key":"911_CR20","doi-asserted-by":"publisher","DOI":"10.3389\/fonc.2020.00121","author":"Q Cui","year":"2020","unstructured":"Cui Q, Lu S, Ni B, Zeng X, Tan Y, Chen YD, Zhao H (2020) Improved prediction of aqueous solubility of novel compounds by going deeper with deep learning. Front Oncol. https:\/\/doi.org\/10.3389\/fonc.2020.00121","journal-title":"Front Oncol"},{"key":"911_CR21","doi-asserted-by":"publisher","DOI":"10.1002\/cem.3349","volume":"35","author":"M Lovri\u0107","year":"2021","unstructured":"Lovri\u0107 M, Pavlovi\u0107 K, \u017duvela P, Spataru A, Lu\u010di\u0107 B, Kern R, Wong MW (2021) Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: generalization, complexity, or predictive ability? J Chemom 35:e3349. https:\/\/doi.org\/10.1002\/cem.3349","journal-title":"J Chemom"},{"key":"911_CR22","doi-asserted-by":"publisher","first-page":"1000","DOI":"10.1021\/ci034243x","volume":"44","author":"JS Delaney","year":"2004","unstructured":"Delaney JS (2004) ESOL: estimating aqueous solubility directly from molecular structure. J Chem Inf Comput Sci 44:1000\u20131005. 
https:\/\/doi.org\/10.1021\/ci034243x","journal-title":"J Chem Inf Comput Sci"},{"key":"911_CR23","doi-asserted-by":"publisher","first-page":"759","DOI":"10.1039\/D2DD00146B","volume":"2","author":"G Tom","year":"2023","unstructured":"Tom G, Hickman RJ, Zinzuwadia A, Mohajeri A, Sanchez-Lengeling B, Aspuru-Guzik A (2023) Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS. Digital Discovery 2:759\u2013774. https:\/\/doi.org\/10.1039\/D2DD00146B","journal-title":"Digital Discovery"},{"key":"911_CR24","doi-asserted-by":"publisher","unstructured":"Griffiths RR, Klarner L, Moss H, Ravuri A, Truong S, Du Y, Stanton S, Tom G, Rankovic B, Jamasb A, Deshwal A, Schwartz J, Tripp A, Kell G, Frieder S, Bourached A, Chan A, Moss J, Guo C, Durholt J, Chaurasia S, Strieth-Kalthoff F, Lee AA, Cheng B, Aspuru-Guzik A, Schwaller P, Tang J (2023) GAUCHE: a library for gaussian processes in chemistry. https:\/\/doi.org\/10.48550\/arXiv.2212.04450","DOI":"10.48550\/arXiv.2212.04450"},{"key":"911_CR25","doi-asserted-by":"publisher","first-page":"657","DOI":"10.1021\/acs.jcim.6b00332","volume":"57","author":"S Kim","year":"2017","unstructured":"Kim S, Jinich A, Aspuru-Guzik A (2017) MultiDK: a multiple descriptor multiple kernel approach for molecular discovery and its application to organic flow battery electrolytes. J Chem Inf Model 57:657\u2013668. https:\/\/doi.org\/10.1021\/acs.jcim.6b00332","journal-title":"J Chem Inf Model"},{"key":"911_CR26","doi-asserted-by":"publisher","first-page":"356","DOI":"10.1039\/D2DD00024E","volume":"2","author":"AD Vassileiou","year":"2023","unstructured":"Vassileiou AD, Robertson MN, Wareham BG, Soundaranathan M, Ottoboni S, Florence AJ, Hartwig T, Johnston BF (2023) A unified ML framework for solubility prediction across organic solvents. Digital Discovery 2:356\u2013367. 
https:\/\/doi.org\/10.1039\/D2DD00024E","journal-title":"Digital Discovery"},{"key":"911_CR27","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1186\/s13321-021-00575-3","volume":"13","author":"Z Ye","year":"2021","unstructured":"Ye Z, Ouyang D (2021) Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms. J Cheminform 13:98. https:\/\/doi.org\/10.1186\/s13321-021-00575-3","journal-title":"J Cheminform"},{"key":"911_CR28","doi-asserted-by":"publisher","first-page":"10785","DOI":"10.1021\/jacs.2c01768","volume":"144","author":"FH Vermeire","year":"2022","unstructured":"Vermeire FH, Chung Y, Green WH (2022) Predicting solubility limits of organic solutes for a wide range of solvents and temperatures. J Am Chem Soc 144:10785\u201310797. https:\/\/doi.org\/10.1021\/jacs.2c01768","journal-title":"J Am Chem Soc"},{"key":"911_CR29","doi-asserted-by":"publisher","first-page":"890","DOI":"10.1080\/00319104.2020.1858420","volume":"59","author":"IP Osorio","year":"2021","unstructured":"Osorio IP, Mart\u00ednez F, Pe\u00f1a M\u00c1, Jouyban A, Acree WE Jr (2021) Solubility of sulphadiazine in some Carbitol\u00ae (1) + water (2) mixtures: determination, correlation, and preferential solvation. Phys Chem Liq 59:890\u2013906. https:\/\/doi.org\/10.1080\/00319104.2020.1858420","journal-title":"Phys Chem Liq"},{"key":"911_CR30","doi-asserted-by":"publisher","first-page":"827","DOI":"10.1080\/00319104.2020.1849208","volume":"59","author":"E Rahimpour","year":"2021","unstructured":"Rahimpour E, Azarmir O, Hassanzadeh D, Nokhodchi A, Jouyban A (2021) Solubility of paracetamol in the ternary solvent mixtures of water + ethanol + glycerol at 298.2 and 303.2 K. Phys Chem Liq 59:827\u2013834. 
https:\/\/doi.org\/10.1080\/00319104.2020.1849208","journal-title":"Phys Chem Liq"},{"key":"911_CR31","doi-asserted-by":"publisher","first-page":"817","DOI":"10.1080\/00319104.2020.1836640","volume":"59","author":"A Maheri","year":"2021","unstructured":"Maheri A, Ghanbarpour P, Rahimpour E, Acree WE Jr, Jouyban A, Azarbayjani AF, Kouhkan M (2021) Solubilisation of dexamethasone: experimental data, co-solvency and Polarised Continuum Modelling. Phys Chem Liq 59:817\u2013826. https:\/\/doi.org\/10.1080\/00319104.2020.1836640","journal-title":"Phys Chem Liq"},{"key":"911_CR32","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1007\/s12247-019-09384-6","volume":"15","author":"SK Jagdale","year":"2020","unstructured":"Jagdale SK, Nawale RB (2020) Estimation and correlation of solubility of practically insoluble drug itraconazole in 1,4-butanediol + water mixtures using extended hildebrand solubility approach. J Pharm Innov 15:344\u2013356. https:\/\/doi.org\/10.1007\/s12247-019-09384-6","journal-title":"J Pharm Innov"},{"key":"911_CR33","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/j.ijpharm.2016.08.032","volume":"514","author":"H Gasmi","year":"2016","unstructured":"Gasmi H, Siepmann F, Hamoudi MC, Danede F, Verin J, Willart J-F, Siepmann J (2016) Towards a better understanding of the different release phases from PLGA microparticles: dexamethasone-loaded systems. Int J Pharm 514:189\u2013199. https:\/\/doi.org\/10.1016\/j.ijpharm.2016.08.032","journal-title":"Int J Pharm"},{"key":"911_CR34","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1038\/s41467-022-35343-w","volume":"14","author":"P Bannigan","year":"2023","unstructured":"Bannigan P, Bao Z, Hickman RJ, Aldeghi M, H\u00e4se F, Aspuru-Guzik A, Allen C (2023) Machine learning models to accelerate the design of polymeric long-acting injectables. Nat Commun 14:35. 
https:\/\/doi.org\/10.1038\/s41467-022-35343-w","journal-title":"Nat Commun"},{"key":"911_CR35","doi-asserted-by":"publisher","first-page":"3082","DOI":"10.1021\/acs.iecr.8b04584","volume":"58","author":"S Chinta","year":"2019","unstructured":"Chinta S, Rengaswamy R (2019) Machine learning derived quantitative structure property relationship (QSPR) to predict drug solubility in binary solvent systems. Ind Eng Chem Res 58:3082\u20133092. https:\/\/doi.org\/10.1021\/acs.iecr.8b04584","journal-title":"Ind Eng Chem Res"},{"key":"911_CR36","unstructured":"Drugs@FDA: FDA-Approved Drugs (n.d.) https:\/\/www.accessdata.fda.gov\/scripts\/cder\/daf\/index.cfm (accessed March 13, 2024)"},{"key":"911_CR37","doi-asserted-by":"publisher","first-page":"2791","DOI":"10.3390\/molecules25122791","volume":"25","author":"B Zheng","year":"2020","unstructured":"Zheng B, McClements DJ (2020) Formulation of more efficacious curcumin delivery systems using colloid science: enhanced solubility, stability, and bioavailability. Molecules 25:2791. https:\/\/doi.org\/10.3390\/molecules25122791","journal-title":"Molecules"},{"key":"911_CR38","doi-asserted-by":"publisher","first-page":"2611","DOI":"10.1021\/acs.jced.0c00015","volume":"65","author":"M An","year":"2020","unstructured":"An M, Yi D, Qiu J, Liu H, Hu S, Han J, Guo Y, Huang H, He H, Wang P (2020) Measurement and correlation for solubility of moroxydine hydrochloride in pure and binary solvents. J Chem Eng Data 65:2611\u20132618. https:\/\/doi.org\/10.1021\/acs.jced.0c00015","journal-title":"J Chem Eng Data"},{"key":"911_CR39","doi-asserted-by":"publisher","DOI":"10.1016\/j.molliq.2020.113546","volume":"314","author":"M Moradi","year":"2020","unstructured":"Moradi M, Rahimpour E, Hemmati S, Martinez F, Barzegar-Jalali M, Jouyban A (2020) Solubility of mesalazine in polyethylene glycol 400 + water mixtures at different temperatures. J Mol Liq 314:113546. 
https:\/\/doi.org\/10.1016\/j.molliq.2020.113546","journal-title":"J Mol Liq"},{"key":"911_CR40","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-06042-2","author":"T Verdonck","year":"2021","unstructured":"Verdonck T, Baesens B, \u00d3skarsd\u00f3ttir M, van den Broucke S (2021) Special issue on feature engineering editorial. Mach Learn. https:\/\/doi.org\/10.1007\/s10994-021-06042-2","journal-title":"Mach Learn"},{"key":"911_CR41","unstructured":"Zheng A, Casari A (2018) Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, O\u2019Reilly Media, Inc."},{"key":"911_CR42","doi-asserted-by":"publisher","DOI":"10.1016\/j.addr.2021.05.016","volume":"175","author":"P Bannigan","year":"2021","unstructured":"Bannigan P, Aldeghi M, Bao Z, H\u00e4se F, Aspuru-Guzik A, Allen C (2021) Machine learning directed drug formulation development. Adv Drug Deliv Rev 175:113806. https:\/\/doi.org\/10.1016\/j.addr.2021.05.016","journal-title":"Adv Drug Deliv Rev"},{"key":"911_CR43","doi-asserted-by":"publisher","DOI":"10.1016\/j.addr.2023.115108","volume":"202","author":"Z Bao","year":"2023","unstructured":"Bao Z, Bufton J, Hickman RJ, Aspuru-Guzik A, Bannigan P, Allen C (2023) Revolutionizing drug formulation development: the increasing impact of machine learning. Adv Drug Deliv Rev 202:115108. https:\/\/doi.org\/10.1016\/j.addr.2023.115108","journal-title":"Adv Drug Deliv Rev"},{"key":"911_CR44","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1016\/j.ejps.2008.10.009","volume":"36","author":"FL Nordstr\u00f6m","year":"2009","unstructured":"Nordstr\u00f6m FL, Rasmuson \u00c5C (2009) Prediction of solubility curves and melting properties of organic and pharmaceutical compounds. Eur J Pharm Sci 36:330\u2013344. 
https:\/\/doi.org\/10.1016\/j.ejps.2008.10.009","journal-title":"Eur J Pharm Sci"},{"key":"911_CR45","doi-asserted-by":"publisher","first-page":"2660","DOI":"10.1021\/acs.molpharmaceut.0c00355","volume":"17","author":"N Wyttenbach","year":"2020","unstructured":"Wyttenbach N, Niederquell A, Kuentz M (2020) Machine estimation of drug melting properties and influence on solubility prediction. Mol Pharmaceutics 17:2660\u20132671. https:\/\/doi.org\/10.1021\/acs.molpharmaceut.0c00355","journal-title":"Mol Pharmaceutics"},{"key":"911_CR46","doi-asserted-by":"publisher","first-page":"44205","DOI":"10.1039\/D0RA08947H","volume":"10","author":"H Tam Do","year":"2020","unstructured":"Tam Do H, Zen Chua Y, Kumar A, Pabsch D, Hallermann M, Zaitsau D, Schick C, Held C (2020) Melting properties of amino acids and their solubility in water. RSC Adv 10:44205\u201344215. https:\/\/doi.org\/10.1039\/D0RA08947H","journal-title":"RSC Adv"},{"key":"911_CR47","unstructured":"Empowering Innovation & Scientific Discoveries | CAS (n.d.) https:\/\/www.cas.org\/ (accessed February 7, 2024)"},{"key":"911_CR48","unstructured":"Online Chemical Modeling Environment (n.d.) https:\/\/ochem.eu\/predictor\/show.do (accessed July 19, 2024)"},{"key":"911_CR49","unstructured":"RDKit (n.d.) https:\/\/www.rdkit.org\/ (accessed February 7, 2024)"},{"key":"911_CR50","doi-asserted-by":"publisher","first-page":"3211","DOI":"10.3390\/app10093211","volume":"10","author":"H Jeon","year":"2020","unstructured":"Jeon H, Oh S (2020) Hybrid-recursive feature elimination for efficient feature selection. Appl Sci 10:3211. https:\/\/doi.org\/10.3390\/app10093211","journal-title":"Appl Sci"},{"key":"911_CR51","doi-asserted-by":"publisher","unstructured":"Singh D, Climente-Gonzalez H, Petrovich M, Kawakami E, Yamada M (2023) FsNet: Feature Selection Network on High-dimensional Biological Data, in: 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1\u20139. 
https:\/\/doi.org\/10.1109\/IJCNN54540.2023.10191985","DOI":"10.1109\/IJCNN54540.2023.10191985"},{"key":"911_CR52","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2019.106839","volume":"143","author":"A Bommert","year":"2020","unstructured":"Bommert A, Sun X, Bischl B, Rahnenf\u00fchrer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 143:106839. https:\/\/doi.org\/10.1016\/j.csda.2019.106839","journal-title":"Comput Stat Data Anal"},{"key":"911_CR53","doi-asserted-by":"publisher","DOI":"10.1016\/j.actamat.2023.119132","volume":"256","author":"H Meng","year":"2023","unstructured":"Meng H, Yu R, Tang Z, Wen Z, Yu H, Chu Y (2023) Formation ability descriptors for high-entropy diborides established through high-throughput experiments and machine learning. Acta Mater 256:119132. https:\/\/doi.org\/10.1016\/j.actamat.2023.119132","journal-title":"Acta Mater"},{"key":"911_CR54","doi-asserted-by":"publisher","first-page":"39","DOI":"10.12691\/ajams-8-2-1","volume":"8","author":"N Shrestha","year":"2020","unstructured":"Shrestha N (2020) Detecting multicollinearity in regression analysis. Am J Appl Math Stat 8:39\u201342","journal-title":"Am J Appl Math Stat"},{"key":"911_CR55","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.radonc.2019.11.023","volume":"145","author":"W Zhang","year":"2020","unstructured":"Zhang W, Fang M, Dong D, Wang X, Ke X, Zhang L, Hu C, Guo L, Guan X, Zhou J, Shan X, Tian J (2020) Development and validation of a CT-based radiomic nomogram for preoperative prediction of early recurrence in advanced gastric cancer. Radiother Oncol 145:13\u201320. 
https:\/\/doi.org\/10.1016\/j.radonc.2019.11.023","journal-title":"Radiother Oncol"},{"key":"911_CR56","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1007\/s11063-021-10632-5","volume":"54","author":"B Zhao","year":"2022","unstructured":"Zhao B, Dong X, Guo Y, Jia X, Huang Y (2022) PCA dimensionality reduction method for image classification. Neural Process Lett 54:347\u2013368. https:\/\/doi.org\/10.1007\/s11063-021-10632-5","journal-title":"Neural Process Lett"},{"key":"911_CR57","doi-asserted-by":"publisher","first-page":"2603","DOI":"10.1016\/S0098-1354(00)00616-5","volume":"24","author":"N Brauner","year":"2000","unstructured":"Brauner N, Shacham M (2000) Considering precision of data in reduction of dimensionality and PCA. Comput Chem Eng 24:2603\u20132611. https:\/\/doi.org\/10.1016\/S0098-1354(00)00616-5","journal-title":"Comput Chem Eng"},{"key":"911_CR58","unstructured":"van der Maaten L, Postma E, Herik (2007) Dimensionality reduction: a comparative review. J Mach Learn Res JMLR 10"},{"key":"911_CR59","doi-asserted-by":"publisher","first-page":"1697","DOI":"10.1039\/D3DD00009E","volume":"2","author":"S Stuart","year":"2023","unstructured":"Stuart S, Watchorn J, Gu FX (2023) An interpretable machine learning framework for modelling macromolecular interaction mechanisms with nuclear magnetic resonance. Digital Discovery 2:1697\u20131709. https:\/\/doi.org\/10.1039\/D3DD00009E","journal-title":"Digital Discovery"},{"key":"911_CR60","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1007\/s40572-019-00229-5","volume":"6","author":"EA Gibson","year":"2019","unstructured":"Gibson EA, Goldsmith J, Kioumourtzoglou M-A (2019) Complex mixtures complex analyses: an emphasis on interpretable results. Curr Envir Health Rpt 6:53\u201361. 
https:\/\/doi.org\/10.1007\/s40572-019-00229-5","journal-title":"Curr Envir Health Rpt"},{"key":"911_CR61","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0232296","volume":"15","author":"RP Monti","year":"2020","unstructured":"Monti RP, Gibberd A, Roy S, Nunes M, Lorenz R, Leech R, Ogawa T, Kawanabe M, Hyv\u00e4rinen A (2020) Interpretable brain age prediction using linear latent variable models of functional connectivity. PLoS ONE 15:e0232296. https:\/\/doi.org\/10.1371\/journal.pone.0232296","journal-title":"PLoS ONE"},{"key":"911_CR62","doi-asserted-by":"publisher","first-page":"1456","DOI":"10.3390\/pr9081456","volume":"9","author":"C Trinh","year":"2021","unstructured":"Trinh C, Meimaroglou D, Hoppe S (2021) Machine learning in chemical product engineering: the state of the art and a guide for newcomers. Processes 9:1456. https:\/\/doi.org\/10.3390\/pr9081456","journal-title":"Processes"},{"key":"911_CR63","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1007\/s10064-023-03403-0","volume":"82","author":"S Kim","year":"2023","unstructured":"Kim S, Yoon H-K (2023) Application of classification coupled with PCA and SMOTE, for obtaining safety factor of landslide based on HRA. Bull Eng Geol Environ 82:381. https:\/\/doi.org\/10.1007\/s10064-023-03403-0","journal-title":"Bull Eng Geol Environ"},{"key":"911_CR64","unstructured":"scikit-optimize: sequential model-based optimization in Python\u2014scikit-optimize 0.8.1 documentation, (n.d.). 
https:\/\/scikit-optimize.github.io\/stable\/ (accessed February 7, 2024)"},{"key":"911_CR65","unstructured":"Snoek J, Larochelle H, Adams RP (2012) Practical Bayesian Optimization of Machine Learning Algorithms, in: Advances in Neural Information Processing Systems, Curran Associates, Inc., https:\/\/papers.nips.cc\/paper_files\/paper\/2012\/hash\/05311655a15b75fab86956663e1819cd-Abstract.html (accessed February 8, 2024)"},{"key":"911_CR66","first-page":"26","volume":"17","author":"J Wu","year":"2019","unstructured":"Wu J, Chen X-Y, Zhang H, Xiong L-D, Lei H, Deng S-H (2019) Hyperparameter optimization for machine learning models based on Bayesian optimization. J Electron Sci Technol 17:26\u201340","journal-title":"J Electron Sci Technol"},{"key":"911_CR67","doi-asserted-by":"publisher","unstructured":"Ban T, Ohue M, Akiyama Y (2017) Efficient hyperparameter optimization by using Bayesian optimization for drug-target interaction prediction, in: 2017 IEEE 7th International Conference on Computational Advances in Bio and Medical Sciences (ICCABS), pp. 1\u20136. https:\/\/doi.org\/10.1109\/ICCABS.2017.8114299","DOI":"10.1109\/ICCABS.2017.8114299"},{"key":"911_CR68","doi-asserted-by":"publisher","unstructured":"Shekhar S, Bansode A, Salim A (2021) A Comparative study of Hyper-Parameter Optimization Tools, in: 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), pp. 1\u20136. https:\/\/doi.org\/10.1109\/CSDE53843.2021.9718485","DOI":"10.1109\/CSDE53843.2021.9718485"},{"key":"911_CR69","doi-asserted-by":"publisher","first-page":"035022","DOI":"10.1088\/2632-2153\/abee59","volume":"2","author":"A Stuke","year":"2021","unstructured":"Stuke A, Rinke P, Todorovi\u0107 M (2021) Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization. Mach Learn Sci Technol 2:035022. 
https:\/\/doi.org\/10.1088\/2632-2153\/abee59","journal-title":"Mach Learn Sci Technol"},{"key":"911_CR70","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.inffus.2021.11.011","volume":"81","author":"R Shwartz-Ziv","year":"2022","unstructured":"Shwartz-Ziv R, Armon A (2022) Tabular data: deep learning is not all you need. Inf Fusion 81:84\u201390. https:\/\/doi.org\/10.1016\/j.inffus.2021.11.011","journal-title":"Inf Fusion"},{"key":"911_CR71","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1186\/s13321-023-00743-7","volume":"15","author":"D Boldini","year":"2023","unstructured":"Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA (2023) Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform 15:73. https:\/\/doi.org\/10.1186\/s13321-023-00743-7","journal-title":"J Cheminform"},{"key":"911_CR72","doi-asserted-by":"publisher","first-page":"1937","DOI":"10.1007\/s10462-020-09896-5","volume":"54","author":"C Bent\u00e9jac","year":"2021","unstructured":"Bent\u00e9jac C, Cs\u00f6rg\u0151 A, Mart\u00ednez-Mu\u00f1oz G (2021) A comparative analysis of gradient boosting algorithms. Artif Intell Rev 54:1937\u20131967. https:\/\/doi.org\/10.1007\/s10462-020-09896-5","journal-title":"Artif Intell Rev"},{"key":"911_CR73","doi-asserted-by":"publisher","DOI":"10.1016\/j.expneurol.2022.114081","volume":"353","author":"Y Xie","year":"2022","unstructured":"Xie Y, Zou X, Han J, Zhang Z, Feng Z, Ouyang Q, Hua S, Liu Z, Li C, Cai Y, Zou Y, Tang Y, Jiang X (2022) Indole-3-propionic acid alleviates ischemic brain injury in a mouse middle cerebral artery occlusion model. Exp Neurol 353:114081. 
https:\/\/doi.org\/10.1016\/j.expneurol.2022.114081","journal-title":"Exp Neurol"},{"key":"911_CR74","doi-asserted-by":"publisher","first-page":"2897","DOI":"10.1021\/acschemneuro.2c00418","volume":"13","author":"Q Zhao","year":"2022","unstructured":"Zhao Q, Chen T, Ni C, Hu Y, Nan Y, Lin W, Liu Y, Zheng F, Shi X, Lin Z, Zhu J, Lin Z (2022) Indole-3-propionic acid attenuates HI-related blood-brain barrier injury in neonatal rats by modulating the PXR signaling pathway. ACS Chem Neurosci 13:2897\u20132912. https:\/\/doi.org\/10.1021\/acschemneuro.2c00418","journal-title":"ACS Chem Neurosci"},{"key":"911_CR75","doi-asserted-by":"publisher","first-page":"3467","DOI":"10.3390\/nu14173467","volume":"14","author":"Z Zheng","year":"2022","unstructured":"Zheng Z, Wang S, Wu C, Cao Y, Gu Q, Zhu Y, Zhang W, Hu W (2022) Gut Microbiota dysbiosis after traumatic brain injury contributes to persistent microglial activation associated with upregulated Lyz2 and shifted tryptophan metabolic phenotype. Nutrients 14:3467. https:\/\/doi.org\/10.3390\/nu14173467","journal-title":"Nutrients"},{"key":"911_CR76","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuropharm.2023.109690","volume":"239","author":"Y Zhou","year":"2023","unstructured":"Zhou Y, Chen Y, He H, Peng M, Zeng M, Sun H (2023) The role of the indoles in microbiota-gut-brain axis and potential therapeutic targets: a focus on human neurological and neuropsychiatric diseases. Neuropharmacology 239:109690. https:\/\/doi.org\/10.1016\/j.neuropharm.2023.109690","journal-title":"Neuropharmacology"},{"key":"911_CR77","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1002\/ana.26552","volume":"93","author":"VM Bhave","year":"2023","unstructured":"Bhave VM, Ament Z, Patki A, Gao Y, Kijpaisalratana N, Guo B, Chaudhary NS, Guarniz A-LG, Gerszten R, Correa A, Cushman M, Judd S, Irvin MR, Kimberly WT (2023) Plasma metabolites link dietary patterns to stroke risk. Ann Neurol 93:500\u2013510. 
https:\/\/doi.org\/10.1002\/ana.26552","journal-title":"Ann Neurol"},{"key":"911_CR78","doi-asserted-by":"publisher","DOI":"10.1016\/j.biopha.2023.114559","volume":"162","author":"S Zhang","year":"2023","unstructured":"Zhang S, Jin M, Ren J, Sun X, Zhang Z, Luo Y, Sun X (2023) New insight into gut microbiota and their metabolites in ischemic stroke: a promising therapeutic target. Biomed Pharmacother 162:114559. https:\/\/doi.org\/10.1016\/j.biopha.2023.114559","journal-title":"Biomed Pharmacother"},{"key":"911_CR79","doi-asserted-by":"publisher","DOI":"10.3389\/fendo.2022.841703","author":"B Zhang","year":"2022","unstructured":"Zhang B, Jiang M, Zhao J, Song Y, Du W, Shi J (2022) The mechanism underlying the influence of indole-3-propionic acid: a relevance to metabolic disorders. Front Endocrinol. https:\/\/doi.org\/10.3389\/fendo.2022.841703","journal-title":"Front Endocrinol"},{"key":"911_CR80","doi-asserted-by":"publisher","first-page":"151","DOI":"10.3390\/nu15010151","volume":"15","author":"H Jiang","year":"2023","unstructured":"Jiang H, Chen C, Gao J (2023) Extensive summary of the important roles of indole propionic acid, a gut microbial metabolite in host health and disease. Nutrients 15:151. https:\/\/doi.org\/10.3390\/nu15010151","journal-title":"Nutrients"},{"key":"911_CR81","doi-asserted-by":"publisher","DOI":"10.1177\/1178646920978404","author":"ML Garcez","year":"2020","unstructured":"Garcez ML, Tan VX, Heng B, Guillemin GJ (2020) Sodium butyrate and indole-3-propionic acid prevent the increase of cytokines and kynurenine levels in LPS-induced human primary astrocytes. Int J Tryptophan Res. 
https:\/\/doi.org\/10.1177\/1178646920978404","journal-title":"Int J Tryptophan Res"},{"key":"911_CR82","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1016\/S0045-6535(02)00118-2","volume":"48","author":"Y Ran","year":"2002","unstructured":"Ran Y, He Y, Yang G, Johnson JLH, Yalkowsky SH (2002) Estimation of aqueous solubility of organic compounds by using the general solubility equation. Chemosphere 48:487\u2013509. https:\/\/doi.org\/10.1016\/S0045-6535(02)00118-2","journal-title":"Chemosphere"},{"key":"911_CR83","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1021\/ci000338c","volume":"41","author":"Y Ran","year":"2001","unstructured":"Ran Y, Yalkowsky SH (2001) Prediction of drug solubility by the general solubility equation (GSE). J Chem Inf Comput Sci 41:354\u2013357. https:\/\/doi.org\/10.1021\/ci000338c","journal-title":"J Chem Inf Comput Sci"},{"key":"911_CR84","doi-asserted-by":"publisher","first-page":"9259","DOI":"10.1021\/acs.iecr.1c00998","volume":"60","author":"K Ge","year":"2021","unstructured":"Ge K, Ji Y (2021) Novel computational approach by combining machine learning with molecular thermodynamics for predicting drug solubility in solvents. Ind Eng Chem Res 60:9259\u20139268. https:\/\/doi.org\/10.1021\/acs.iecr.1c00998","journal-title":"Ind Eng Chem Res"},{"key":"911_CR85","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1007\/s11705-021-2083-5","volume":"16","author":"Y Ma","year":"2022","unstructured":"Ma Y, Gao Z, Shi P, Chen M, Wu S, Yang C, Wang J, Cheng J, Gong J (2022) Machine learning-based solubility prediction and methodology evaluation of active pharmaceutical ingredients in industrial crystallization. Front Chem Sci Eng 16:523\u2013535. 
https:\/\/doi.org\/10.1007\/s11705-021-2083-5","journal-title":"Front Chem Sci Eng"},{"key":"911_CR86","doi-asserted-by":"publisher","first-page":"3320","DOI":"10.1021\/ci5005288","volume":"54","author":"IV Tetko","year":"2014","unstructured":"Tetko IV, Sushko Y, Novotarskyi S, Patiny L, Kondratov I, Petrenko AE, Charochkina L, Asiri AM (2014) How accurately can we predict the melting points of drug-like compounds? J Chem Inf Model 54:3320\u20133329. https:\/\/doi.org\/10.1021\/ci5005288","journal-title":"J Chem Inf Model"},{"key":"911_CR87","doi-asserted-by":"publisher","first-page":"025015","DOI":"10.1088\/2632-2153\/ab8aa3","volume":"1","author":"G Sivaraman","year":"2020","unstructured":"Sivaraman G, Jackson NE, Sanchez-Lengeling B, V\u00e1zquez-Mayagoitia \u00c1, Aspuru-Guzik A, Vishwanath V, de Pablo JJ (2020) A machine learning workflow for molecular analysis: application to melting points. Mach Learn Sci Technol 1:025015. https:\/\/doi.org\/10.1088\/2632-2153\/ab8aa3","journal-title":"Mach Learn Sci Technol"},{"key":"911_CR88","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1039\/D1EA00090J","volume":"2","author":"T Galeazzo","year":"2022","unstructured":"Galeazzo T, Shiraiwa M (2022) Predicting glass transition temperature and melting point of organic compounds via machine learning and molecular embeddings. Environ Sci Atmos 2:362\u2013374. https:\/\/doi.org\/10.1039\/D1EA00090J","journal-title":"Environ Sci Atmos"},{"key":"911_CR89","doi-asserted-by":"publisher","first-page":"318","DOI":"10.1016\/j.molliq.2018.03.090","volume":"264","author":"V Venkatraman","year":"2018","unstructured":"Venkatraman V, Evjen S, Knuutila HK, Fiksdahl A, Alsberg BK (2018) Predicting ionic liquid melting points using machine learning. J Mol Liq 264:318\u2013326. 
https:\/\/doi.org\/10.1016\/j.molliq.2018.03.090","journal-title":"J Mol Liq"},{"key":"911_CR90","doi-asserted-by":"publisher","first-page":"2948","DOI":"10.1021\/acs.jcim.3c00308","volume":"63","author":"X Zhu","year":"2023","unstructured":"Zhu X, Polyakov VR, Bajjuri K, Hu H, Maderna A, Tovee CA, Ward SC (2023) Building machine learning small molecule melting points and solubility models using CCDC melting points dataset. J Chem Inf Model 63:2948\u20132959. https:\/\/doi.org\/10.1021\/acs.jcim.3c00308","journal-title":"J Chem Inf Model"},{"key":"911_CR91","doi-asserted-by":"publisher","first-page":"646","DOI":"10.3390\/app8040646","volume":"8","author":"MF Uddin","year":"2018","unstructured":"Uddin MF, Lee J, Rizvi S, Hamada S (2018) Proposing enhanced feature engineering and a selection model for machine learning processes. Appl Sci 8:646. https:\/\/doi.org\/10.3390\/app8040646","journal-title":"Appl Sci"},{"key":"911_CR92","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1016\/j.cattod.2016.04.013","volume":"280","author":"Z Li","year":"2017","unstructured":"Li Z, Ma X, Xin H (2017) Feature engineering of machine-learning chemisorption models for catalyst design. Catal Today 280:232\u2013238. https:\/\/doi.org\/10.1016\/j.cattod.2016.04.013","journal-title":"Catal Today"},{"key":"911_CR93","doi-asserted-by":"publisher","first-page":"1878","DOI":"10.1093\/bib\/bby061","volume":"20","author":"AS Rifaioglu","year":"2019","unstructured":"Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Do\u011fan T (2019) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 20:1878\u20131912. 
https:\/\/doi.org\/10.1093\/bib\/bby061","journal-title":"Brief Bioinform"},{"key":"911_CR94","doi-asserted-by":"publisher","first-page":"1315","DOI":"10.1007\/s11030-021-10217-3","volume":"25","author":"R Gupta","year":"2021","unstructured":"Gupta R, Srivastava D, Sahu M, Tiwari S, Ambasta RK, Kumar P (2021) Artificial intelligence to deep learning: machine intelligence approach for drug discovery. Mol Divers 25:1315\u20131360. https:\/\/doi.org\/10.1007\/s11030-021-10217-3","journal-title":"Mol Divers"},{"key":"911_CR95","doi-asserted-by":"publisher","unstructured":"Huang K, Fu T, Gao W, Zhao Y, Roohani Y, Leskovec J, Coley CW, Xiao C, Sun J, Zitnik M (2021) Therapeutics data commons: machine learning datasets and tasks for drug discovery and development. https:\/\/doi.org\/10.48550\/arXiv.2102.09548","DOI":"10.48550\/arXiv.2102.09548"},{"key":"911_CR96","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1186\/s13321-020-00445-4","volume":"12","author":"A Capecchi","year":"2020","unstructured":"Capecchi A, Probst D, Reymond J-L (2020) One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J Cheminf 12:43. https:\/\/doi.org\/10.1186\/s13321-020-00445-4","journal-title":"J Cheminf"},{"key":"911_CR97","doi-asserted-by":"publisher","first-page":"2726","DOI":"10.1021\/acs.jcim.2c00242","volume":"62","author":"MC Hutter","year":"2022","unstructured":"Hutter MC (2022) Differential multimolecule fingerprint for similarity search\u2500making use of active and inactive compound sets in virtual screening. J Chem Inf Model 62:2726\u20132736. https:\/\/doi.org\/10.1021\/acs.jcim.2c00242","journal-title":"J Chem Inf Model"},{"key":"911_CR98","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2020.606668","author":"L Xie","year":"2020","unstructured":"Xie L, Xu L, Kong R, Chang S, Xu X (2020) Improvement of prediction performance with conjoint molecular fingerprint in deep learning. Front Pharmacol. 
https:\/\/doi.org\/10.3389\/fphar.2020.606668","journal-title":"Front Pharmacol"},{"key":"911_CR99","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1186\/s12859-022-05076-0","volume":"24","author":"W Breslin","year":"2023","unstructured":"Breslin W, Pham D (2023) Machine learning and drug discovery for neglected tropical diseases. BMC Bioinformatics 24:165. https:\/\/doi.org\/10.1186\/s12859-022-05076-0","journal-title":"BMC Bioinformatics"},{"key":"911_CR100","doi-asserted-by":"publisher","first-page":"2147","DOI":"10.1021\/acs.jcim.0c01318","volume":"61","author":"P Nguyen","year":"2021","unstructured":"Nguyen P, Loveland D, Kim JT, Karande P, Hiszpanski AM, Han TY-J (2021) Predicting energetics materials\u2019 crystalline density from chemical structure by machine learning. J Chem Inf Model 61:2147\u20132158. https:\/\/doi.org\/10.1021\/acs.jcim.0c01318","journal-title":"J Chem Inf Model"},{"key":"911_CR101","doi-asserted-by":"publisher","DOI":"10.1002\/qua.27230","volume":"123","author":"KM Katubi","year":"2023","unstructured":"Katubi KM, Saqib M, Mubashir T, Tahir MH, Halawa MI, Akbar A, Basha B, Sulaman M, Alrowaili ZA, Al-Buriahi MS (2023) Predicting the multiple parameters of organic acceptors through machine learning using RDkit descriptors: an easy and fast pipeline. Int J Quantum Chem 123:e27230. https:\/\/doi.org\/10.1002\/qua.27230","journal-title":"Int J Quantum Chem"},{"key":"911_CR102","doi-asserted-by":"publisher","DOI":"10.1016\/j.mlwa.2022.100265","volume":"8","author":"D Packwood","year":"2022","unstructured":"Packwood D, Nguyen LTH, Cesana P, Zhang G, Staykov A, Fukumoto Y, Nguyen DH (2022) Machine learning in materials chemistry: an invitation. Mach Learn Appl 8:100265. 
https:\/\/doi.org\/10.1016\/j.mlwa.2022.100265","journal-title":"Mach Learn Appl"},{"key":"911_CR103","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s43246-022-00315-6","volume":"3","author":"P Reiser","year":"2022","unstructured":"Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P (2022) Graph neural networks for materials science and chemistry. Commun Mater 3:1\u201318. https:\/\/doi.org\/10.1038\/s43246-022-00315-6","journal-title":"Commun Mater"},{"key":"911_CR104","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1039\/C3EE42756K","volume":"7","author":"J Hachmann","year":"2014","unstructured":"Hachmann J, Olivares-Amaya R, Jinich A, Appleton AL, Blood-Forsythe MA, Seress LR, Rom\u00e1n-Salgado C, Trepte K, Atahan-Evrenk S, Er S, Shrestha S, Mondal R, Sokolov A, Bao Z, Aspuru-Guzik A (2014) Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry-the Harvard Clean Energy Project. Energy Environ Sci 7:698\u2013704. https:\/\/doi.org\/10.1039\/C3EE42756K","journal-title":"Energy Environ Sci"},{"key":"911_CR105","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1039\/C5MH00282F","volume":"3","author":"EO Pyzer-Knapp","year":"2016","unstructured":"Pyzer-Knapp EO, Simm GN, Guzik AA (2016) A Bayesian approach to calibrating high-throughput virtual screening results and application to organic photovoltaic materials. Mater Horiz 3:226\u2013233. https:\/\/doi.org\/10.1039\/C5MH00282F","journal-title":"Mater Horiz"},{"key":"911_CR106","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41524-023-01040-5","volume":"9","author":"S Stuart","year":"2023","unstructured":"Stuart S, Watchorn J, Gu FX (2023) Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials. Npj Comput Mater 9:1\u201310. 
https:\/\/doi.org\/10.1038\/s41524-023-01040-5","journal-title":"Npj Comput Mater"},{"key":"911_CR107","doi-asserted-by":"publisher","first-page":"8705","DOI":"10.1021\/acs.jmedchem.0c00385","volume":"63","author":"KV Chuang","year":"2020","unstructured":"Chuang KV, Gunsalus LM, Keiser MJ (2020) Learning molecular representations for medicinal chemistry. J Med Chem 63:8705\u20138722. https:\/\/doi.org\/10.1021\/acs.jmedchem.0c00385","journal-title":"J Med Chem"},{"key":"911_CR108","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1021\/ci960373c","volume":"37","author":"RD Brown","year":"1997","unstructured":"Brown RD, Martin YC (1997) The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. J Chem Inf Comput Sci 37:1\u20139. https:\/\/doi.org\/10.1021\/ci960373c","journal-title":"J Chem Inf Comput Sci"},{"key":"911_CR109","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1007\/s10822-020-00361-7","volume":"35","author":"A Sato","year":"2021","unstructured":"Sato A, Miyao T, Jasial S, Funatsu K (2021) Comparing predictive ability of QSAR\/QSPR models using 2D and 3D molecular representations. J Comput Aided Mol Des 35:179\u2013193. https:\/\/doi.org\/10.1007\/s10822-020-00361-7","journal-title":"J Comput Aided Mol Des"},{"key":"911_CR110","doi-asserted-by":"publisher","first-page":"6802","DOI":"10.1021\/jm060902w","volume":"49","author":"JH Nettles","year":"2006","unstructured":"Nettles JH, Jenkins JL, Bender A, Deng Z, Davies JW, Glick M (2006) Bridging chemical and biological space: \u201ctarget fishing\u201d using 2D and 3D molecular descriptors. J Med Chem 49:6802\u20136810. 
https:\/\/doi.org\/10.1021\/jm060902w","journal-title":"J Med Chem"},{"key":"911_CR111","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1186\/s12902-022-01121-4","volume":"22","author":"Y Zhang","year":"2022","unstructured":"Zhang Y, Zhang X, Razbek J, Li D, Xia W, Bao L, Mao H, Daken M, Cao M (2022) Opening the black box: interpretable machine learning for predictor finding of metabolic syndrome. BMC Endocr Disord 22:214. https:\/\/doi.org\/10.1186\/s12902-022-01121-4","journal-title":"BMC Endocr Disord"},{"key":"911_CR112","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","volume":"1","author":"C Rudin","year":"2019","unstructured":"Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1:206\u2013215. https:\/\/doi.org\/10.1038\/s42256-019-0048-x","journal-title":"Nat Mach Intell"},{"key":"911_CR113","doi-asserted-by":"publisher","first-page":"674","DOI":"10.1021\/ci0202741","volume":"43","author":"DT Manallack","year":"2003","unstructured":"Manallack DT, Tehan BG, Gancia E, Hudson BD, Ford MG, Livingstone DJ, Whitley DC, Pitt WR (2003) A consensus neural network-based technique for discriminating soluble and poorly soluble compounds. J Chem Inf Comput Sci 43:674\u2013679. https:\/\/doi.org\/10.1021\/ci0202741","journal-title":"J Chem Inf Comput Sci"},{"key":"911_CR114","doi-asserted-by":"publisher","first-page":"7078","DOI":"10.1016\/j.bmc.2010.08.003","volume":"18","author":"R Gozalbes","year":"2010","unstructured":"Gozalbes R, Pineda-Lucena A (2010) QSAR-based solubility model for drug-like compounds. Bioorg Med Chem 18:7078\u20137084. 
https:\/\/doi.org\/10.1016\/j.bmc.2010.08.003","journal-title":"Bioorg Med Chem"},{"key":"911_CR115","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1023\/A:1015103914543","volume":"19","author":"H Gao","year":"2002","unstructured":"Gao H, Shanmugasundaram V, Lee P (2002) Estimation of aqueous solubility of organic compounds with QSPR approach. Pharm Res 19:497\u2013503. https:\/\/doi.org\/10.1023\/A:1015103914543","journal-title":"Pharm Res"},{"key":"911_CR116","doi-asserted-by":"publisher","unstructured":"Xue N, Zhang Y, Liu S (2024) Evaluation of Machine Learning Models for Aqueous Solubility Prediction in Drug Discovery, https:\/\/doi.org\/10.1101\/2024.06.10.598383","DOI":"10.1101\/2024.06.10.598383"},{"key":"911_CR117","unstructured":"Christine-Allen-Lab\/Solubility_ML, GitHub (n.d.). https:\/\/github.com\/Christine-Allen-Lab\/Solubility_ML (accessed March 26, 2024)"},{"key":"911_CR118","unstructured":"PubChem, PubChem, (n.d.). https:\/\/pubchem.ncbi.nlm.nih.gov\/ (accessed March 21, 2024)"},{"key":"911_CR119","unstructured":"Main Page, Wikipedia, the Free Encyclopedia (2024). https:\/\/en.wikipedia.org\/w\/index.php?title=Main_Page&oldid=1212457119 (accessed March 21, 2024)"},{"key":"911_CR120","doi-asserted-by":"publisher","first-page":"D1074","DOI":"10.1093\/nar\/gkx1037","volume":"46","author":"DS Wishart","year":"2018","unstructured":"Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46:D1074\u2013D1082. https:\/\/doi.org\/10.1093\/nar\/gkx1037","journal-title":"Nucleic Acids Res"},{"key":"911_CR121","unstructured":"ChemSpider | Search and share chemistry, (n.d.). https:\/\/www.chemspider.com\/ (accessed March 21, 2024)"},{"key":"911_CR122","unstructured":"Chemical Database Online, (n.d.). 
https:\/\/www.chembk.com\/en (accessed July 19, 2024)."},{"key":"911_CR123","unstructured":"Pesticide Properties Database, (n.d.). https:\/\/sitem.herts.ac.uk\/aeru\/ppdb\/en\/ (accessed July 19, 2024)"},{"key":"911_CR124","unstructured":"CAS Number Search\u2014Chemsrc, (n.d.). https:\/\/www.chemsrc.com\/en\/ (accessed July 19, 2024)"},{"key":"911_CR125","unstructured":"LKT Labs\u2014Biochemicals for Life Science Research, (n.d.). https:\/\/lktlabs.com\/ (accessed March 21, 2024)"},{"key":"911_CR126","unstructured":"ChemicalBook, (n.d.). https:\/\/www.chemicalbook.com\/ProductIndex_EN.aspx (accessed March 21, 2024)"},{"key":"911_CR127","unstructured":"MilliporeSigma | Life Science Products & Service Solutions, (n.d.). https:\/\/www.sigmaaldrich.com\/CA\/en (accessed March 21, 2024)"},{"key":"911_CR128","unstructured":"Antibodies, Gene Editors, Chemicals & Lab Supplies For Research | Santa Cruz Biotechnology, (n.d.). https:\/\/www.scbt.com\/home (accessed March 21, 2024)"},{"key":"911_CR129","unstructured":"Lab Equipment and Lab Supplies | Fisher Scientific, (n.d.). https:\/\/www.fishersci.com\/us\/en\/home.html (accessed March 21, 2024)"},{"key":"911_CR130","unstructured":"Home\u2014AK Scientific (n.d.) https:\/\/aksci.com\/ (accessed July 19, 2024)"},{"key":"911_CR131","unstructured":"Aziridine, Benzyl Isothiocyanate & Benzoyl Isothiocyanate Manufacturers, MOLTUS RESEARCH LABORATORIES PRIVATE LIMITED (n.d.) https:\/\/www.moltuslab.com\/ (accessed July 19, 2024)"},{"key":"911_CR132","unstructured":"TCI AMERICA | Homepage (n.d.) https:\/\/www.tcichemicals.com\/CA\/en\/ (accessed July 19, 2024)"},{"key":"911_CR133","unstructured":"Guidechem chemical B2B network provides information on china and global chemical market quotation and relative chemical Information.Guidechem Chemical Network providing the most complete information of the chemical industry., GuideChem (n.d.). 
https:\/\/www.guidechem.com (accessed July 19, 2024)"},{"key":"911_CR134","unstructured":"ECHEMI: Online Chemical Company to Buy Chemical Products, ECHEMI (n.d.) https:\/\/www.echemi.com (accessed July 19, 2024)"},{"key":"911_CR135","unstructured":"EBCLink, Drug Delivery (2024). http:\/\/www.ebclink.com\/ (accessed July 19, 2024)."},{"key":"911_CR136","unstructured":"Dielectric Constant (n.d.) https:\/\/macro.lsu.edu\/HowTo\/solvents\/Dielectric%20Constant%20.htm (accessed July 19, 2024)"},{"key":"911_CR137","unstructured":"Solvent Physical Properties (n.d.) https:\/\/people.chem.umass.edu\/xray\/solvent.html (accessed July 19, 2024)"},{"key":"911_CR138","unstructured":"Dielectric constant (n.d.) https:\/\/depts.washington.edu\/eooptic\/linkfiles\/ (accessed July 19, 2024)"},{"key":"911_CR139","unstructured":"rdkit.Chem.Descriptors3D (n.d.) https:\/\/www.rdkit.org\/docs\/source\/rdkit.Chem.Descriptors3D.html# (accessed July 19, 2024)"},{"key":"911_CR140","unstructured":"Jacot-Descombes L, Turcani L, Jorner K, morfeus (2024) https:\/\/github.com\/digital-chemistry-laboratory\/morfeus (accessed July 19, 2024)"},{"key":"911_CR141","unstructured":"scikit-learn: machine learning in Python\u2014scikit-learn 1.4.0 documentation (n.d.) https:\/\/scikit-learn.org\/stable\/ (accessed February 7, 2024)"},{"key":"911_CR142","unstructured":"Welcome to LightGBM\u2019s documentation!\u2014LightGBM 4.3.0.99 documentation (n.d.) https:\/\/lightgbm.readthedocs.io\/en\/latest\/ (accessed February 7, 2024)"},{"key":"911_CR143","unstructured":"XGBoost Python Package\u2014xgboost 2.1.0-dev documentation (n.d.) https:\/\/xgboost.readthedocs.io\/en\/latest\/python\/index.html (accessed February 7, 2024)"},{"key":"911_CR144","unstructured":"lightgbm.plot_importance\u2014LightGBM 4.3.0.99 documentation (n.d.) 
https:\/\/lightgbm.readthedocs.io\/en\/latest\/pythonapi\/lightgbm.plot_importance.html (accessed March 12, 2024)"},{"key":"911_CR145","unstructured":"StandardScaler, Scikit-Learn (n.d.) https:\/\/www.scikit-learn\/stable\/modules\/generated\/sklearn.preprocessing.StandardScaler.html (accessed July 19, 2024)"},{"key":"911_CR146","unstructured":"KMeans, Scikit-Learn (n.d.) https:\/\/www.scikit-learn\/stable\/modules\/generated\/sklearn.cluster.KMeans.html (accessed July 19, 2024)"},{"key":"911_CR147","unstructured":"mahalanobis\u2014SciPy v1.14.0 Manual (n.d.) https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.mahalanobis.html (accessed July 19, 2024)"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00911-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00911-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00911-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,28]],"date-time":"2024-10-28T10:07:56Z","timestamp":1730110076000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00911-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,28]]},"references-count":147,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["911"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00911-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-4170106\/v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject
":[],"published":{"date-parts":[[2024,10,28]]},"assertion":[{"value":"26 March 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 October 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"C.A. is a cofounder and CEO of Intrepid Labs Inc., A.A.G. is a cofounder of Intrepid Labs Inc., Kebotix Inc., and Zapata AI.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"117"}}