{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T08:12:07Z","timestamp":1777709527105,"version":"3.51.4"},"reference-count":70,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T00:00:00Z","timestamp":1568764800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T00:00:00Z","timestamp":1568764800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n              <jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction.<\/jats:p>\n              <\/jats:sec>\n              <jats:sec>\n                <jats:title>Methods<\/jats:title>\n                <jats:p>The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure\u2013activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and\/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods, (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB) and (3) deep neural networks (DNN).<\/jats:p>\n              <\/jats:sec>\n              <jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>The three methods delivered comparable performances on the training and test sets with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R<jats:sup>2<\/jats:sup>) around 0.80. Two commercial pKa predictors from ACD\/Labs and ChemAxon were used to benchmark the three best models developed in this work, and performance of our models compared favorably to the commercial products.<\/jats:p>\n              <\/jats:sec>\n              <jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s13321-019-0384-1","type":"journal-article","created":{"date-parts":[[2019,9,18]],"date-time":"2019-09-18T09:04:12Z","timestamp":1568797452000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":144,"title":["Open-source QSAR models for pKa prediction using multiple machine learning approaches"],"prefix":"10.1186","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6426-8036","authenticated-orcid":false,"given":"Kamel","family":"Mansouri","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6135-752X","authenticated-orcid":false,"given":"Neal F.","family":"Cariello","sequence":"additional","affiliation":[]},{"given":"Alexandru","family":"Korotcov","sequence":"additional","affiliation":[]},{"given":"Valery","family":"Tkachenko","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5606-7560","authenticated-orcid":false,"given":"Chris M.","family":"Grulke","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9436-5673","authenticated-orcid":false,"given":"Catherine S.","family":"Sprankle","sequence":"additional","affiliation":[]},{"given":"David","family":"Allen","sequence":"additional","affiliation":[]},{"given":"Warren M.","family":"Casey","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7914-3682","authenticated-orcid":false,"given":"Nicole C.","family":"Kleinstreuer","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2668-4821","authenticated-orcid":false,"given":"Antony J.","family":"Williams","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,9,18]]},"reference":[{"key":"384_CR1","unstructured":"Wikipedia (2019) Acid dissociation constant. \n                    https:\/\/en.wikipedia.org\/w\/index.php?title=Acid_dissociation_constant&oldid=897688731\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR2","unstructured":"US EPA-OCSPP (2015) Guidance for reporting on the environmental fate and transport of the stressors of concern in problem formulations. In: US EPA. \n                    https:\/\/www.epa.gov\/pesticide-science-and-assessing-pesticide-risks\/guidance-reporting-environmental-fate-and-transport\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR3","doi-asserted-by":"publisher","first-page":"294","DOI":"10.1016\/0147-6513(82)90019-7","volume":"6","author":"W Kl\u00f6pffer","year":"1982","unstructured":"Kl\u00f6pffer W, Rippen G, Frische R (1982) Physicochemical properties as useful tools for predicting the environmental fate of organic chemicals. Ecotoxicol Environ Saf 6:294\u2013301. \n                    https:\/\/doi.org\/10.1016\/0147-6513(82)90019-7","journal-title":"Ecotoxicol Environ Saf"},{"key":"384_CR4","unstructured":"Linde CD (1994) Physico-chemical properties and environmental fate of pesticides. In: Environmental hazards assessment program, state of California EPA. \n                    http:\/\/agris.fao.org\/agris-search\/search.do?recordID=US201300074742\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR5","doi-asserted-by":"publisher","DOI":"10.17226\/18872","volume-title":"A framework to guide selection of chemical alternatives","author":"National Research Council","year":"2014","unstructured":"National Research Council (2014) A framework to guide selection of chemical alternatives. The National Academies Press, Washington, D.C. \n                    https:\/\/doi.org\/10.17226\/18872"},{"key":"384_CR6","doi-asserted-by":"publisher","first-page":"1812","DOI":"10.1002\/cbdv.200900153","volume":"6","author":"G Cruciani","year":"2009","unstructured":"Cruciani G, Milletti F, Storchi L et al (2009) In silico pKa prediction and ADME profiling. Chem Biodivers 6:1812\u20131821. \n                    https:\/\/doi.org\/10.1002\/cbdv.200900153","journal-title":"Chem Biodivers"},{"key":"384_CR7","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1016\/j.ddtec.2004.08.011","volume":"1","author":"EH Kerns","year":"2004","unstructured":"Kerns EH, Di L (2004) Physicochemical profiling: overview of the screens. Drug Discov Today Technol 1:343\u2013348. \n                    https:\/\/doi.org\/10.1016\/j.ddtec.2004.08.011","journal-title":"Drug Discov Today Technol"},{"key":"384_CR8","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1093\/toxsci\/kfv171","volume":"148","author":"BA Wetmore","year":"2015","unstructured":"Wetmore BA, Wambaugh JF, Allen B et al (2015) Incorporating high-throughput exposure predictions with dosimetry-adjusted in vitro bioactivity to inform chemical toxicity testing. Toxicol Sci 148:121\u2013136. \n                    https:\/\/doi.org\/10.1093\/toxsci\/kfv171","journal-title":"Toxicol Sci"},{"key":"384_CR9","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1016\/j.scitotenv.2017.09.033","volume":"615","author":"CL Strope","year":"2018","unstructured":"Strope CL, Mansouri K, Clewell HJ et al (2018) High-throughput in silico prediction of ionization equilibria for pharmacokinetic modeling. Sci Total Environ 615:150\u2013160. \n                    https:\/\/doi.org\/10.1016\/j.scitotenv.2017.09.033","journal-title":"Sci Total Environ"},{"key":"384_CR10","doi-asserted-by":"publisher","first-page":"3103","DOI":"10.1002\/jps.20217","volume":"93","author":"IV Tetko","year":"2004","unstructured":"Tetko IV, Bruneau P (2004) Application of ALOGPS to predict 1-octanol\/water distribution coefficients, logP, and logD, of AstraZeneca in-house database. J Pharm Sci 93:3103\u20133110. \n                    https:\/\/doi.org\/10.1002\/jps.20217","journal-title":"J Pharm Sci"},{"key":"384_CR11","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1038\/194178b0","volume":"194","author":"C Hansch","year":"1962","unstructured":"Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194:178\u2013180. \n                    https:\/\/doi.org\/10.1038\/194178b0","journal-title":"Nature"},{"key":"384_CR12","doi-asserted-by":"publisher","first-page":"1243","DOI":"10.1021\/acs.jcim.6b00129","volume":"56","author":"D Fourches","year":"2016","unstructured":"Fourches D, Muratov E, Tropsha A (2016) Trust, but verify II: a practical guide to chemogenomics data curation. J Chem Inf Model 56:1243\u20131252. \n                    https:\/\/doi.org\/10.1021\/acs.jcim.6b00129","journal-title":"J Chem Inf Model"},{"key":"384_CR13","doi-asserted-by":"publisher","first-page":"911","DOI":"10.1080\/1062936X.2016.1253611","volume":"27","author":"K Mansouri","year":"2016","unstructured":"Mansouri K, Grulke CM, Richard AM et al (2016) An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ Res 27:911\u2013937. \n                    https:\/\/doi.org\/10.1080\/1062936X.2016.1253611","journal-title":"SAR QSAR Environ Res"},{"key":"384_CR14","unstructured":"BioByte Corporation (2019) BioByte. \n                    http:\/\/www.biobyte.com\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR15","unstructured":"Advanced Chemistry Development ACDLabs (2019) Chemistry software for analytical and chemical knowledge management. \n                    https:\/\/www.acdlabs.com\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR16","unstructured":"Simulations Plus (2019) Simulations Plus: model-based drug development to make better data-driven decisions. \n                    https:\/\/www.simulations-plus.com\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR17","unstructured":"ChemAxon Ltd. (2019) Chemicalize. \n                    https:\/\/chemaxon.com\/products\/chemicalize\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR18","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1007\/s10822-011-9440-2","volume":"25","author":"I Sushko","year":"2011","unstructured":"Sushko I, Novotarskyi S, K\u00f6rner R et al (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25:533\u2013554. \n                    https:\/\/doi.org\/10.1007\/s10822-011-9440-2","journal-title":"J Comput Aided Mol Des"},{"key":"384_CR19","unstructured":"Online Chemical Modeling Environment (OCHEM) (2019) Online chemical database with modeling environment. \n                    https:\/\/ochem.eu\/home\/show.do\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR20","unstructured":"QSAR DataBank (2019) Institute of Chemistry, University of Tartu, Tartu, Estonia. \n                    https:\/\/qsardb.org\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR21","unstructured":"Chembench (2019) Carolina Exploratory Center for Cheminformatics Research, Chapel Hill, NC. \n                    https:\/\/chembench.mml.unc.edu\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR22","volume-title":"Making open and machine readable the new default for government information","author":"B Obama","year":"2013","unstructured":"Obama B (2013) Making open and machine readable the new default for government information. Office of the Executive, Washington, D.C"},{"key":"384_CR23","unstructured":"Burwell SM, VanRoekel S, Mancini DJ (2013) Memorandum for the heads of executive departments and agencies\u2014project open data. \n                    https:\/\/project-open-data.cio.gov\/policy-memo\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR24","first-page":"25","volume":"1","author":"DT Manallack","year":"2007","unstructured":"Manallack DT (2007) The pK(a) distribution of drugs: application to drug discovery. Perspect Med Chem 1:25\u201338","journal-title":"Perspect Med Chem"},{"key":"384_CR25","doi-asserted-by":"publisher","first-page":"2013","DOI":"10.1021\/ci900209w","volume":"49","author":"AC Lee","year":"2009","unstructured":"Lee AC, Crippen GM (2009) Predicting pKa. J Chem Inf Model 49:2013\u20132033","journal-title":"J Chem Inf Model"},{"key":"384_CR26","doi-asserted-by":"publisher","first-page":"307","DOI":"10.2174\/138620711795508403","volume":"14","author":"M Rupp","year":"2011","unstructured":"Rupp M, K\u00f6rner R, Tetko IV (2011) Predicting the pK a of small molecules. Comb Chem High Throughput Screen 14:307\u2013327. \n                    https:\/\/doi.org\/10.2174\/138620711795508403","journal-title":"Comb Chem High Throughput Screen"},{"key":"384_CR27","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-018-0263-1","author":"K Mansouri","year":"2018","unstructured":"Mansouri K, Grulke CM, Judson RS, Williams AJ (2018) OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform. \n                    https:\/\/doi.org\/10.1186\/s13321-018-0263-1","journal-title":"J Cheminform"},{"key":"384_CR28","doi-asserted-by":"publisher","first-page":"2801","DOI":"10.1021\/ci900289x","volume":"49","author":"C Liao","year":"2009","unstructured":"Liao C, Nicklaus MC (2009) Comparison of nine programs predicting pKa values of pharmaceutical substances. J Chem Inf Model 49:2801\u20132812. \n                    https:\/\/doi.org\/10.1021\/ci900289x","journal-title":"J Chem Inf Model"},{"key":"384_CR29","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1021\/ci500588j","volume":"55","author":"T Sander","year":"2015","unstructured":"Sander T, Freyss J, von Korff M, Rufener C (2015) DataWarrior: an open-source program for chemistry aware data visualization and analysis. J Chem Inf Model 55:460\u2013473. \n                    https:\/\/doi.org\/10.1021\/ci500588j","journal-title":"J Chem Inf Model"},{"key":"384_CR30","doi-asserted-by":"publisher","first-page":"1023","DOI":"10.1289\/ehp.1510267","volume":"124","author":"K Mansouri","year":"2016","unstructured":"Mansouri K, Abdelaziz A, Rybacka A et al (2016) CERAPP: collaborative estrogen receptor activity prediction project. Environ Health Perspect 124:1023\u20131033. \n                    https:\/\/doi.org\/10.1289\/ehp.1510267","journal-title":"Environ Health Perspect"},{"key":"384_CR31","doi-asserted-by":"publisher","first-page":"1466","DOI":"10.1002\/jcc.21707","volume":"32","author":"CW Yap","year":"2011","unstructured":"Yap CW (2011) PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J Comput Chem 32:1466\u20131474. \n                    https:\/\/doi.org\/10.1002\/jcc.21707","journal-title":"J Comput Chem"},{"key":"384_CR32","unstructured":"Sander T (2019) Openmolecules.org: free services all around molecules. \n                    http:\/\/www.openmolecules.org\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR33","doi-asserted-by":"publisher","first-page":"510","DOI":"10.1021\/ci500667v","volume":"55","author":"C Yang","year":"2015","unstructured":"Yang C, Tarkhov A, Marusczyk J et al (2015) New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J Chem Inf Model 55:510\u2013528. \n                    https:\/\/doi.org\/10.1021\/ci500667v","journal-title":"J Chem Inf Model"},{"key":"384_CR34","doi-asserted-by":"crossref","unstructured":"Berthold MR, Cebron N, Dill F et al (2008) KNIME: the Konstanz information miner. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications: proceedings of the 31st annual conference of the Gesellschaft f\u00fcr Klassifikation e.V., Albert-Ludwigs-Universit\u00e4t Freiburg, March 7\u20139, 2007. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 319\u2013326","DOI":"10.1007\/978-3-540-78246-9_38"},{"key":"384_CR35","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1016\/j.talanta.2018.01.022","volume":"182","author":"AD McEachran","year":"2018","unstructured":"McEachran AD, Mansouri K, Newton SR et al (2018) A comparison of three liquid chromatography (LC) retention time prediction models. Talanta 182:371\u2013379. \n                    https:\/\/doi.org\/10.1016\/j.talanta.2018.01.022","journal-title":"Talanta"},{"key":"384_CR36","doi-asserted-by":"publisher","first-page":"1225","DOI":"10.1021\/acs.chemrestox.6b00135","volume":"29","author":"AM Richard","year":"2016","unstructured":"Richard AM, Judson RS, Houck KA et al (2016) ToxCast chemical landscape: paving the road to 21st century toxicology. Chem Res Toxicol 29:1225\u20131251. \n                    https:\/\/doi.org\/10.1021\/acs.chemrestox.6b00135","journal-title":"Chem Res Toxicol"},{"key":"384_CR37","doi-asserted-by":"crossref","unstructured":"Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory. ACM, New York, pp 144\u2013152","DOI":"10.1145\/130385.130401"},{"key":"384_CR38","first-page":"273","volume-title":"Machine learning","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Machine learning. McGraw Hill, New York, pp 273\u2013297"},{"key":"384_CR39","unstructured":"Chang C-C, Lin C-J (2001) LIBSVM 3.1: a library for support vector machines. \n                    http:\/\/www.csie.ntu.edu.tw\/~cjlin\/libsvm\n                    \n                  . National Taiwan University, Department of Computer Science, Taipei 106, Taiwan"},{"issue":"3","key":"384_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1961189.1961199","volume":"2","author":"CC Chang","year":"2011","unstructured":"Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1\u201327. \n                    https:\/\/doi.org\/10.1145\/1961189.1961199","journal-title":"ACM Trans Intell Syst Technol"},{"key":"384_CR41","doi-asserted-by":"publisher","first-page":"194","DOI":"10.1002\/cem.1290","volume":"24","author":"V Consonni","year":"2010","unstructured":"Consonni V, Ballabio D, Todeschini R (2010) Evaluation of model predictive ability by external validation techniques. J Chemom 24:194\u2013201. \n                    https:\/\/doi.org\/10.1002\/cem.1290","journal-title":"J Chemom"},{"key":"384_CR42","doi-asserted-by":"publisher","first-page":"1905","DOI":"10.1021\/acs.jcim.6b00277","volume":"56","author":"R Todeschini","year":"2016","unstructured":"Todeschini R, Ballabio D, Grisoni F (2016) Beware of unreliable Q2! A comparative study of regression metrics for predictivity assessment of QSAR models. J Chem Inf Model 56:1905\u20131913. \n                    https:\/\/doi.org\/10.1021\/acs.jcim.6b00277","journal-title":"J Chem Inf Model"},{"key":"384_CR43","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.chemolab.2010.10.010","volume":"105","author":"D Ballabio","year":"2011","unstructured":"Ballabio D, Vasighi M, Consonni V, Kompany-Zareh M (2011) Genetic algorithms for architecture optimisation of counter-propagation artificial neural networks. Chemom Intell Lab Syst 105:56\u201364","journal-title":"Chemom Intell Lab Syst"},{"key":"384_CR44","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/S0169-7439(98)00051-3","volume":"41","author":"R Leardi","year":"1998","unstructured":"Leardi R, Lupi\u00e1\u00f1ez Gonz\u00e1lez A (1998) Genetic algorithms applied to feature selection in PLS regression: how and when to use them. Chemom Intell Lab Syst 41:195\u2013207. \n                    https:\/\/doi.org\/10.1016\/S0169-7439(98)00051-3","journal-title":"Chemom Intell Lab Syst"},{"key":"384_CR45","unstructured":"Mansouri K (2019) OPERA\u2014open structure\u2013activity\/property relationship app. National Institute of Environmental Health Science, Research Triangle Park, NC. \n                    https:\/\/github.com\/NIEHS\/OPERA"},{"key":"384_CR46","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.3390\/molecules17054791","volume":"17","author":"F Sahigara","year":"2012","unstructured":"Sahigara F, Mansouri K, Ballabio D et al (2012) Comparison of different approaches to define the applicability domain of QSAR models. Molecules 17:4791\u20134810. \n                    https:\/\/doi.org\/10.3390\/molecules17054791","journal-title":"Molecules"},{"key":"384_CR47","unstructured":"MathWorks (2018) MATLAB 2018a. \n                    www.mathworks.com"},{"key":"384_CR48","doi-asserted-by":"publisher","unstructured":"Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining\u2014KDD\u201916, pp 785\u2013794. \n                    https:\/\/doi.org\/10.1145\/2939672.2939785","DOI":"10.1145\/2939672.2939785"},{"key":"384_CR49","unstructured":"XGBoost (2019) XGBoost documentation. \n                    https:\/\/xgboost.readthedocs.io\/en\/latest\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR50","unstructured":"Nishida K (2017) Introduction to extreme gradient boosting in exploratory. \n                    https:\/\/blog.exploratory.io\/introduction-to-extreme-gradient-boosting-in-exploratory-7bbec554ac7\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR51","doi-asserted-by":"publisher","first-page":"2353","DOI":"10.1021\/acs.jcim.6b00591","volume":"56","author":"RP Sheridan","year":"2016","unstructured":"Sheridan RP, Wang WM, Liaw A et al (2016) Extreme gradient boosting as a method for quantitative structure\u2013activity relationships. J Chem Inf Model 56:2353\u20132360. \n                    https:\/\/doi.org\/10.1021\/acs.jcim.6b00591","journal-title":"J Chem Inf Model"},{"key":"384_CR52","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v028.i05","volume":"28","author":"M Kuhn","year":"2008","unstructured":"Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28:1\u201326. \n                    https:\/\/doi.org\/10.18637\/jss.v028.i05","journal-title":"J Stat Softw"},{"key":"384_CR53","unstructured":"Chen T, He T, Benesty M et al (2019) xgboost: extreme gradient boosting. \n                    https:\/\/CRAN.R-project.org\/package=xgboost\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR54","unstructured":"Cariello N (2018) NIEHS\/machine-learning-pipeline development. \n                    https:\/\/github.com\/NIEHS\/Machine-Learning-Pipeline\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR55","doi-asserted-by":"publisher","first-page":"878","DOI":"10.15252\/msb.20156651","volume":"12","author":"C Angermueller","year":"2016","unstructured":"Angermueller C, P\u00e4rnamaa T, Parts L, Stegle O (2016) Deep learning for computational biology. Mol Syst Biol 12:878. \n                    https:\/\/doi.org\/10.15252\/msb.20156651","journal-title":"Mol Syst Biol"},{"key":"384_CR56","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1042\/ETLS20160025","volume":"1","author":"W Jones","year":"2017","unstructured":"Jones W, Alasoo K, Fishman D, Parts L (2017) Computational biology: deep learning. Emerg Top Life Sci 1:257\u2013274. \n                    https:\/\/doi.org\/10.1042\/ETLS20160025","journal-title":"Emerg Top Life Sci"},{"key":"384_CR57","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1021\/acs.molpharmaceut.5b00982","volume":"13","author":"P Mamoshina","year":"2016","unstructured":"Mamoshina P, Vieira A, Putin E, Zhavoronkov A (2016) Applications of deep learning in biomedicine. Mol Pharm 13:1445\u20131454. \n                    https:\/\/doi.org\/10.1021\/acs.molpharmaceut.5b00982","journal-title":"Mol Pharm"},{"key":"384_CR58","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1002\/jcc.24764","volume":"38","author":"GB Goh","year":"2017","unstructured":"Goh GB, Hodas NO, Vishnu A (2017) Deep learning for computational chemistry. J Comput Chem 38:1291\u20131307. \n                    https:\/\/doi.org\/10.1002\/jcc.24764","journal-title":"J Comput Chem"},{"key":"384_CR59","doi-asserted-by":"publisher","first-page":"642","DOI":"10.1021\/acs.chemrestox.6b00385","volume":"30","author":"TB Hughes","year":"2017","unstructured":"Hughes TB, Swamidass SJ (2017) Deep learning to predict the formation of quinone species in drug metabolism. Chem Res Toxicol 30:642\u2013656. \n                    https:\/\/doi.org\/10.1021\/acs.chemrestox.6b00385","journal-title":"Chem Res Toxicol"},{"key":"384_CR60","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1021\/ci500747n","volume":"55","author":"J Ma","year":"2015","unstructured":"Ma J, Sheridan RP, Liaw A et al (2015) Deep neural nets as a method for quantitative structure\u2013activity relationships. J Chem Inf Model 55:263\u2013274. \n                    https:\/\/doi.org\/10.1021\/ci500747n","journal-title":"J Chem Inf Model"},{"key":"384_CR61","unstructured":"Chollet F Keras: the Python deep learning library. \n                    https:\/\/keras.io\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR62","unstructured":"Google, Inc (2019) TensorFlow. \n                    https:\/\/www.tensorflow.org\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR63","unstructured":"Sci-kit Learn Developers (2019) scikit-learn: machine learning in Python. \n                    https:\/\/scikit-learn.org\/stable\/\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR64","doi-asserted-by":"publisher","unstructured":"Voosen P, 2017, Pm 2:00 (2017) How AI detectives are cracking open the black box of deep learning. \n                    https:\/\/doi.org\/10.1126\/science.aan7059\n                    \n                  . Accessed 21 May 2019","DOI":"10.1126\/science.aan7059"},{"key":"384_CR65","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1038\/538020a","volume":"538","author":"D Castelvecchi","year":"2016","unstructured":"Castelvecchi D (2016) Can we open the black box of AI? Nat News 538:20. \n                    https:\/\/doi.org\/10.1038\/538020a","journal-title":"Nat News"},{"key":"384_CR66","unstructured":"US EPA-NCCT (2019) EPA | TSCA: TSCA inventory, active non-confidential portion. \n                    https:\/\/comptox.epa.gov\/dashboard\/chemical_lists\/tscaactivenonconf\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR67","unstructured":"US EPA-NCCT (2019) Chemistry Dashboard | Batch Search. \n                    https:\/\/comptox.epa.gov\/dashboard\/dsstoxdb\/batch_search\n                    \n                  . Accessed 21 May 2019"},{"key":"384_CR68","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1093\/toxsci\/kfy020","volume":"163","author":"JF Wambaugh","year":"2018","unstructured":"Wambaugh JF, Hughes MF, Ring CL et al (2018) Evaluating in vitro\u2013in vivo extrapolation of toxicokinetics. Toxicol Sci 163:152\u2013169. \n                    https:\/\/doi.org\/10.1093\/toxsci\/kfy020","journal-title":"Toxicol Sci"},{"key":"384_CR69","doi-asserted-by":"publisher","first-page":"2046","DOI":"10.1021\/acs.chemrestox.7b00084","volume":"30","author":"J Liu","year":"2017","unstructured":"Liu J, Patlewicz G, Williams AJ et al (2017) Predicting organ toxicity using in vitro bioactivity data and chemical structure. Chem Res Toxicol 30:2046\u20132059. \n                    https:\/\/doi.org\/10.1021\/acs.chemrestox.7b00084","journal-title":"Chem Res Toxicol"},{"key":"384_CR70","unstructured":"US EPA-NCCT (2019) Chemistry dashboard predictions. \n                    https:\/\/comptox.epa.gov\/dashboard\/predictions\/index\n                    \n                  . Accessed 23 Aug 2019"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-019-0384-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13321-019-0384-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-019-0384-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,9,17]],"date-time":"2020-09-17T12:33:32Z","timestamp":1600346012000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-019-0384-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,18]]},"references-count":70,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["384"],"URL":"https:\/\/doi.org\/10.1186\/s13321-019-0384-1","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,18]]},"assertion":[{"value":"23 May 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 September 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 September 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"60"}}