{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T12:21:53Z","timestamp":1774959713799,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,3,30]],"date-time":"2020-03-30T00:00:00Z","timestamp":1585526400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,3,30]],"date-time":"2020-03-30T00:00:00Z","timestamp":1585526400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Ensemble learning helps improve machine learning results by combining several models and allows the production of better predictive performance compared to a single model. It also benefits and accelerates the researches in quantitative structure\u2013activity relationship (QSAR) and quantitative structure\u2013property relationship (QSPR). With the growing number of ensemble learning models such as random forest, the effectiveness of QSAR\/QSPR will be limited by the machine\u2019s inability to interpret the predictions to researchers. In fact, many implementations of ensemble learning models are able to quantify the overall magnitude of each feature. For example, feature importance allows us to assess the relative importance of features and to interpret the predictions. However, different ensemble learning methods or implementations may lead to different feature selections for interpretation. In this paper, we compared the predictability and interpretability of four typical well-established ensemble learning models (Random forest, extreme randomized trees, adaptive boosting and gradient boosting) for regression and binary classification modeling tasks. Then, the blending methods were built by summarizing four different ensemble learning methods. The blending method led to better performance and a unification interpretation by summarizing individual predictions from different learning models. The important features of two case studies which gave us some valuable information to compound properties were discussed in detail in this report. QSPR modeling with interpretable machine learning techniques can move the chemical design forward to work more efficiently, confirm hypothesis and establish knowledge for better results.<\/jats:p>","DOI":"10.1186\/s13321-020-0417-9","type":"journal-article","created":{"date-parts":[[2020,3,30]],"date-time":"2020-03-30T17:02:42Z","timestamp":1585587762000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":68,"title":["Comparison and improvement of the predictability and interpretability with ensemble learning models in QSPR applications"],"prefix":"10.1186","volume":"12","author":[{"given":"Chia-Hsiu","family":"Chen","sequence":"first","affiliation":[]},{"given":"Kenichi","family":"Tanaka","sequence":"additional","affiliation":[]},{"given":"Masaaki","family":"Kotera","sequence":"additional","affiliation":[]},{"given":"Kimito","family":"Funatsu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,3,30]]},"reference":[{"key":"417_CR1","doi-asserted-by":"publisher","first-page":"468","DOI":"10.1002\/wcms.1183","volume":"4","author":"JBO Mitchell","year":"2014","unstructured":"Mitchell JBO (2014) Machine learning methods in chemoinformatics. Wiley Interdiscip Rev Comput Mol Sci 4:468\u2013481","journal-title":"Wiley Interdiscip Rev Comput Mol Sci"},{"key":"417_CR2","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1039\/cs9952400279","volume":"24","author":"AR Katritzky","year":"1995","unstructured":"Katritzky AR, Lobanov VS, Karelson M (1995) QSPR: the correlation and quantitative prediction of chemical and physical properties from structure. Chem Soc Rev 24:279\u2013287","journal-title":"Chem Soc Rev"},{"key":"417_CR3","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1038\/194178b0","volume":"194","author":"C Hansch","year":"1962","unstructured":"Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194:178","journal-title":"Nature"},{"key":"417_CR4","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and regression trees. CRC Press, Boca Raton"},{"key":"417_CR5","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1016\/0954-1810(94)00011-S","volume":"9","author":"ATC Goh","year":"1995","unstructured":"Goh ATC (1995) Back-propagation neural networks for modeling complex systems. Artif Intell Eng 9:143\u2013151. https:\/\/doi.org\/10.1016\/0954-1810(94)00011-S","journal-title":"Artif Intell Eng"},{"key":"417_CR6","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273\u2013297. https:\/\/doi.org\/10.1007\/BF00994018","journal-title":"Mach Learn"},{"key":"417_CR7","unstructured":"Kim B, Khanna R, Koyejo OO (2016) Examples are not enough, learn to criticize! criticism for interpretability. In: Advances in neural information processing systems. pp 2280\u20132288"},{"key":"417_CR8","doi-asserted-by":"crossref","unstructured":"Lakkaraju H, Bach SH, Leskovec J (2016) Interpretable decision sets: A joint framework for description and prediction. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp 1675\u20131684","DOI":"10.1145\/2939672.2939874"},{"key":"417_CR9","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45:5\u201332. https:\/\/doi.org\/10.1023\/A:1010933404324","journal-title":"Mach Learn"},{"key":"417_CR10","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1186\/1471-2105-9-307","volume":"9","author":"C Strobl","year":"2008","unstructured":"Strobl C, Boulesteix A-L, Kneib T et al (2008) Conditional variable importance for random forests. BMC Bioinform 9:307","journal-title":"BMC Bioinform"},{"key":"417_CR11","doi-asserted-by":"crossref","unstructured":"Svetnik V, Liaw A, Tong C, Wang T (2004) Application of Breiman\u2019s random forest to modeling structure\u2013activity relationships of pharmaceutical molecules BT. In: Roli F, Kittler J, Windeatt T (eds) Multiple classifier systems: 5th international workshop, MCS 2004, Cagliari, Italy, June 9\u201311, 2004. Proceedings. Springer Berlin Heidelberg, Berlin, pp 334\u2013343","DOI":"10.1007\/978-3-540-25966-4_33"},{"key":"417_CR12","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1186\/1758-2946-5-9","volume":"5","author":"AL Teixeira","year":"2013","unstructured":"Teixeira AL, Leal JP, Falcao AO (2013) Random forests for feature selection in QSPR models\u2014an application for predicting standard enthalpy of formation of hydrocarbons. J Cheminform 5:9","journal-title":"J Cheminform"},{"key":"417_CR13","doi-asserted-by":"publisher","first-page":"2179","DOI":"10.1021\/ci049849f","volume":"44","author":"R Guha","year":"2004","unstructured":"Guha R, Jurs PC (2004) Development of linear, ensemble, and nonlinear models for the prediction and interpretation of the biological activity of a set of PDGFR inhibitors. J Chem Inf Comput Sci 44:2179\u20132189. https:\/\/doi.org\/10.1021\/ci049849f","journal-title":"J Chem Inf Comput Sci"},{"key":"417_CR14","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1021\/ci900203n","volume":"49","author":"PG Polishchuk","year":"2009","unstructured":"Polishchuk PG, Muratov EN, Artemenko AG et al (2009) Application of random forest approach to QSAR prediction of aquatic toxicity. J Chem Inf Model 49:2481\u20132488. https:\/\/doi.org\/10.1021\/ci900203n","journal-title":"J Chem Inf Model"},{"key":"417_CR15","doi-asserted-by":"publisher","first-page":"1773","DOI":"10.1021\/acs.jcim.6b00753","volume":"57","author":"RL Marchese Robinson","year":"2017","unstructured":"Marchese Robinson RL, Palczewska A, Palczewski J, Kidley N (2017) Comparison of the predictive performance and interpretability of random forest and linear models on benchmark data sets. J Chem Inf Model 57:1773\u20131792","journal-title":"J Chem Inf Model"},{"key":"417_CR16","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1007\/BF00058655","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L (1996) Bagging predictors. Mach Learn 24:123\u2013140. https:\/\/doi.org\/10.1007\/BF00058655","journal-title":"Mach Learn"},{"key":"417_CR17","first-page":"1612","volume":"14","author":"Y Freund","year":"1999","unstructured":"Freund Y, Schapire R, Abe N (1999) A short introduction to boosting. J Jpn Soc Artif Intell 14:1612","journal-title":"J Jpn Soc Artif Intell"},{"key":"417_CR18","doi-asserted-by":"publisher","first-page":"766","DOI":"10.1021\/ci700443v","volume":"48","author":"H Zhu","year":"2008","unstructured":"Zhu H, Tropsha A, Fourches D et al (2008) Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. J Chem Inf Model 48:766\u2013784","journal-title":"J Chem Inf Model"},{"key":"417_CR19","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","volume":"5","author":"DH Wolpert","year":"1992","unstructured":"Wolpert DH (1992) Stacked generalization. Neural Netw 5:241\u2013259","journal-title":"Neural Netw"},{"key":"417_CR20","unstructured":"Bennett J, Lanning S et al (2007) The netflix prize. In: Proceedings of KDD cup and workshop. p 35"},{"key":"417_CR21","unstructured":"fluorophores.org. http:\/\/www.fluorophores.tugraz.at\/. Accessed 1 May 2007"},{"key":"417_CR22","doi-asserted-by":"publisher","first-page":"3075","DOI":"10.1021\/bi00581a025","volume":"18","author":"G Weber","year":"1979","unstructured":"Weber G, Farris FJ (1979) Synthesis and spectral properties of a hydrophobic fluorescent probe: 6-propionyl-2-(dimethylamino)naphthalene. Biochemistry 18:3075\u20133078. https:\/\/doi.org\/10.1021\/bi00581a025","journal-title":"Biochemistry"},{"key":"417_CR23","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1021\/jz9003685","volume":"1","author":"OA Kucherak","year":"2010","unstructured":"Kucherak OA, Didier P, M\u00e9ly Y, Klymchenko AS (2010) Fluorene analogues of prodan with superior fluorescence brightness and solvatochromism. J Phys Chem Lett 1:616\u2013620. https:\/\/doi.org\/10.1021\/jz9003685","journal-title":"J Phys Chem Lett"},{"key":"417_CR24","doi-asserted-by":"publisher","first-page":"9651","DOI":"10.1021\/jo0616660","volume":"71","author":"Z Lu","year":"2006","unstructured":"Lu Z, Lord SJ, Wang H et al (2006) Long-wavelength analogue of PRODAN: synthesis and properties of anthradan, a fluorophore with a 2,6-donor\u2013acceptor anthracene structure. J Org Chem 71:9651\u20139657. https:\/\/doi.org\/10.1021\/jo0616660","journal-title":"J Org Chem"},{"key":"417_CR25","volume-title":"LiqCryst 4.6 database","author":"V Vill","year":"2005","unstructured":"Vill V (2005) LiqCryst 4.6 database. LCI, Fujitsu"},{"key":"417_CR26","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1613\/jair.614","volume":"11","author":"D Opitz","year":"1999","unstructured":"Opitz D, Maclin R (1999) Popular ensemble methods: an empirical study. J Artif Intell Res 11:169\u2013198","journal-title":"J Artif Intell Res"},{"key":"417_CR27","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/MCAS.2006.1688199","volume":"6","author":"R Polikar","year":"2006","unstructured":"Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Syst Mag 6:21\u201345","journal-title":"IEEE Circuits Syst Mag"},{"key":"417_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-009-9124-7","volume":"33","author":"L Rokach","year":"2010","unstructured":"Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33:1\u201339","journal-title":"Artif Intell Rev"},{"key":"417_CR29","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","volume":"63","author":"P Geurts","year":"2006","unstructured":"Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63:3\u201342. https:\/\/doi.org\/10.1007\/s10994-006-6226-1","journal-title":"Mach Learn"},{"key":"417_CR30","unstructured":"Breiman L (1997) Arcing the edge"},{"key":"417_CR31","unstructured":"Friedman JH (2016) Greedy function approximation: a gradient boosting machine. https:\/\/statweb.stanford.edu\/~jhf\/ftp\/trebst.pdf"},{"key":"417_CR32","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","volume":"38","author":"JH Friedman","year":"2002","unstructured":"Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367\u2013378","journal-title":"Comput Stat Data Anal"},{"key":"417_CR33","first-page":"49","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L (1996) Stacked regressions. Mach Learn 24:49\u201364","journal-title":"Mach Learn"},{"key":"417_CR34","doi-asserted-by":"publisher","first-page":"1205","DOI":"10.4155\/fmc.10.194","volume":"2","author":"EN Muratov","year":"2010","unstructured":"Muratov EN, Artemenko AG, Varlamova EV et al (2010) Per aspera ad astra: application of simplex QSAR approach in antiviral research. Future Med Chem 2:1205\u20131226","journal-title":"Future Med Chem"},{"key":"417_CR35","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1038\/nature17439","volume":"533","author":"P Raccuglia","year":"2016","unstructured":"Raccuglia P, Elbert KC, Adler PDF et al (2016) Machine-learning-assisted materials discovery using failed experiments. Nature 533:73","journal-title":"Nature"},{"key":"417_CR36","unstructured":"Kode-Chemoinformatics (2016) Dragon version 7.0.4"},{"key":"417_CR37","unstructured":"Frisch MJ, Trucks GW, Schlegel HB, et al (2016) Gaussian 09 Revision A.02"},{"key":"417_CR38","unstructured":"RDKit. http:\/\/rdkit.org\/. Accessed 1 Apr 2017"},{"key":"417_CR39","doi-asserted-by":"publisher","first-page":"1372","DOI":"10.1063\/1.464304","volume":"98","author":"AD Becke","year":"1993","unstructured":"Becke AD (1993) A new mixing of Hartree\u2013Fock and local density-functional theories. J Chem Phys 98:1372\u20131377. https:\/\/doi.org\/10.1063\/1.464304","journal-title":"J Chem Phys"},{"key":"417_CR40","doi-asserted-by":"publisher","first-page":"695","DOI":"10.1007\/s10895-018-2233-4","volume":"28","author":"C-H Chen","year":"2018","unstructured":"Chen C-H, Tanaka K, Funatsu K (2018) Random forest approach to QSPR study of fluorescence properties combining quantum chemical descriptors and solvent conditions. J Fluoresc 28:695\u2013706","journal-title":"J Fluoresc"},{"key":"417_CR41","doi-asserted-by":"publisher","first-page":"17128","DOI":"10.1021\/jp1097487","volume":"114","author":"A Marini","year":"2010","unstructured":"Marini A, Mu\u00f1oz-Losa A, Biancardi A, Mennucci B (2010) What is solvatochromism? J Phys Chem B 114:17128\u201317135. https:\/\/doi.org\/10.1021\/jp1097487","journal-title":"J Phys Chem B"},{"key":"417_CR42","doi-asserted-by":"publisher","first-page":"1800095","DOI":"10.1002\/minf.201800095","volume":"38","author":"C-H Chen","year":"2018","unstructured":"Chen C-H, Tanaka K, Funatsu K (2019) Random forest model with combined features: a practical approach to predict liquid-crystalline property. Mol Inform 38:1800095","journal-title":"Mol Inform"},{"key":"417_CR43","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"417_CR44","doi-asserted-by":"publisher","first-page":"2937","DOI":"10.1021\/ja01264a059","volume":"64","author":"SE Sheppard","year":"1942","unstructured":"Sheppard SE, Newsome PT (1942) The effect of solvents on the absorption spectra of dyes. II. Some dyes other than cyanines. J Am Chem Soc 64:2937\u20132946","journal-title":"J Am Chem Soc"},{"key":"417_CR45","volume-title":"Molecular structure and the properties of liquid crystals","author":"GW Gray","year":"1962","unstructured":"Gray GW (1962) Molecular structure and the properties of liquid crystals. Academic Press, Cambridge"},{"key":"417_CR46","volume-title":"Introduction to liquid crystals","author":"E Priestly","year":"2012","unstructured":"Priestly E (2012) Introduction to liquid crystals. Springer Science & Business Media, Berlin"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-0417-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s13321-020-0417-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-020-0417-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T00:00:02Z","timestamp":1617062402000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-020-0417-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,30]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["417"],"URL":"https:\/\/doi.org\/10.1186\/s13321-020-0417-9","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,30]]},"assertion":[{"value":"12 December 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 February 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 March 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"19"}}