{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,23]],"date-time":"2026-07-23T12:47:18Z","timestamp":1784810838713,"version":"3.55.0"},"reference-count":38,"publisher":"Walter de Gruyter GmbH","issue":"3","license":[{"start":{"date-parts":[[2022,8,26]],"date-time":"2022-08-26T00:00:00Z","timestamp":1661472000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["SFRH\/BD\/130913\/2017"],"award-info":[{"award-number":["SFRH\/BD\/130913\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UIDB\/04469\/2020"],"award-info":[{"award-number":["UIDB\/04469\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008530","name":"European Regional Development Fund","doi-asserted-by":"publisher","award":["NORTE-01-0247-FEDER- 039831"],"award-info":[{"award-number":["NORTE-01-0247-FEDER- 039831"]}],"id":[{"id":"10.13039\/501100008530","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Machine learning (ML) is increasingly being used to guide drug discovery processes. When applying ML approaches to chemical datasets, molecular descriptors and fingerprints are typically used to represent compounds as numerical vectors. 
However, in recent years, end-to-end deep learning (DL) methods that can learn feature representations directly from line notations or molecular graphs have been proposed as alternatives to using precomputed features. This study set out to investigate which compound representation methods are the most suitable for drug sensitivity prediction in cancer cell lines. Twelve different representations were benchmarked on 5 compound screening datasets, using DeepMol, a new chemoinformatics package developed by our research group, to perform these analyses. The results of this study show that the predictive performance of end-to-end DL models is comparable to, and at times surpasses, that of models trained on molecular fingerprints, even when less training data is available. This study also found that combining several compound representation methods into an ensemble can improve performance. Finally, we show that a <jats:italic>post hoc<\/jats:italic> feature attribution method can boost the explainability of the DL models.<\/jats:p>","DOI":"10.1515\/jib-2022-0006","type":"journal-article","created":{"date-parts":[[2022,8,26]],"date-time":"2022-08-26T08:04:47Z","timestamp":1661501087000},"source":"Crossref","is-referenced-by-count":50,"title":["Evaluating molecular representations in machine learning models for drug response prediction and interpretability"],"prefix":"10.1515","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9258-5303","authenticated-orcid":false,"given":"Delora","family":"Baptista","sequence":"first","affiliation":[{"name":"Centre of Biological Engineering , University of Minho, Campus of Gualtar , Braga , Portugal"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jo\u00e3o","family":"Correia","sequence":"additional","affiliation":[{"name":"Centre of Biological Engineering , University of Minho, Campus of Gualtar , Braga , 
Portugal"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bruno","family":"Pereira","sequence":"additional","affiliation":[{"name":"Centre of Biological Engineering , University of Minho, Campus of Gualtar , Braga , Portugal"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8439-8172","authenticated-orcid":false,"given":"Miguel","family":"Rocha","sequence":"additional","affiliation":[{"name":"Centre of Biological Engineering , University of Minho, Campus of Gualtar , Braga , Portugal"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"374","published-online":{"date-parts":[[2022,8,26]]},"reference":[{"key":"2023033120301377396_j_jib-2022-0006_ref_001","doi-asserted-by":"crossref","unstructured":"Ali, M, Aittokallio, T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 2019;11:31\u20139. https:\/\/doi.org\/10.1007\/s12551-018-0446-z.","DOI":"10.1007\/s12551-018-0446-z"},{"key":"2023033120301377396_j_jib-2022-0006_ref_002","doi-asserted-by":"crossref","unstructured":"Adam, G, Ramp\u00e1\u0161ek, L, Safikhani, Z, Smirnov, P, Haibe-Kains, B, Goldenberg, A. Machine learning approaches to drug response prediction: challenges and recent progress. npj Precis Oncol 2020;4:19. https:\/\/doi.org\/10.1038\/s41698-020-0122-1.","DOI":"10.1038\/s41698-020-0122-1"},{"key":"2023033120301377396_j_jib-2022-0006_ref_003","doi-asserted-by":"crossref","unstructured":"Cereto-Massagu\u00e9, A, Ojeda, MJ, Valls, C, Mulero, M, Garcia-Vallv\u00e9, S, Pujadas, G. Molecular fingerprint similarity search in virtual screening. Methods 2015;71:58\u201363. https:\/\/doi.org\/10.1016\/j.ymeth.2014.08.005.","DOI":"10.1016\/j.ymeth.2014.08.005"},{"key":"2023033120301377396_j_jib-2022-0006_ref_004","unstructured":"Duvenaud, D, Maclaurin, D, Aguilera-Iparraguirre, J, G\u00f3mez-Bombarelli, R, Hirzel, T, Aspuru-Guzik, A, et al.. 
Convolutional networks on graphs for learning molecular fingerprints. J Chem Inf Model 2015;56:399\u2013411."},{"key":"2023033120301377396_j_jib-2022-0006_ref_005","doi-asserted-by":"crossref","unstructured":"Xiong, Z, Wang, D, Liu, X, Zhong, F, Wan, X, Li, X, et al.. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism. J Med Chem 2020;63:8749\u201360. https:\/\/doi.org\/10.1021\/acs.jmedchem.9b00959.","DOI":"10.1021\/acs.jmedchem.9b00959"},{"key":"2023033120301377396_j_jib-2022-0006_ref_006","doi-asserted-by":"crossref","unstructured":"Jaeger, S, Fulle, S, Turk, S. Mol2vec: unsupervised machine learning approach with chemical intuition. J Chem Inf Model 2018;58:27\u201335. https:\/\/doi.org\/10.1021\/acs.jcim.7b00616.","DOI":"10.1021\/acs.jcim.7b00616"},{"key":"2023033120301377396_j_jib-2022-0006_ref_007","doi-asserted-by":"crossref","unstructured":"Mayr, A, Klambauer, G, Unterthiner, T, Steijaert, M, Wegner, JK, Ceulemans, H, et al.. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci 2018;9:5441\u201351. https:\/\/doi.org\/10.1039\/c8sc00148k.","DOI":"10.1039\/C8SC00148K"},{"key":"2023033120301377396_j_jib-2022-0006_ref_008","doi-asserted-by":"crossref","unstructured":"Jiang, D, Wu, Z, Hsieh, CY, Chen, G, Liao, B, Wang, Z, et al.. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J Cheminf 2021;13:12. https:\/\/doi.org\/10.1186\/s13321-020-00479-8.","DOI":"10.1186\/s13321-020-00479-8"},{"key":"2023033120301377396_j_jib-2022-0006_ref_009","doi-asserted-by":"crossref","unstructured":"Hop, P, Allgood, B, Yu, J. Geometric deep learning autonomously learns chemical features that outperform those engineered by domain experts. Mol Pharm 2018;15:4371\u20137. 
https:\/\/doi.org\/10.1021\/acs.molpharmaceut.7b01144.","DOI":"10.1021\/acs.molpharmaceut.7b01144"},{"key":"2023033120301377396_j_jib-2022-0006_ref_010","doi-asserted-by":"crossref","unstructured":"Zagidullin, B, Wang, Z, Guan, Y, Pitk\u00e4nen, E, Tang, J. Comparative analysis of molecular fingerprints in prediction of drug combination effects. Briefings Bioinf 2021;22:bbab291. https:\/\/doi.org\/10.1093\/bib\/bbab291.","DOI":"10.1093\/bib\/bbab291"},{"key":"2023033120301377396_j_jib-2022-0006_ref_011","doi-asserted-by":"crossref","unstructured":"Wu, Z, Ramsundar, B, Feinberg, EN, Gomes, J, Geniesse, C, Pappu, AS, et al.. MoleculeNet: a benchmark for molecular machine learning. Chem Sci 2018;9:513\u201330. https:\/\/doi.org\/10.1039\/c7sc02664a.","DOI":"10.1039\/C7SC02664A"},{"key":"2023033120301377396_j_jib-2022-0006_ref_012","unstructured":"Pappu, A, Paige, B. Making graph neural networks worth it for low-data molecular machine learning. In: Machine learning for molecules workshop @ NeurIPS 2020; 2020. Available from: http:\/\/arxiv.org\/abs\/2011.12203."},{"key":"2023033120301377396_j_jib-2022-0006_ref_013","doi-asserted-by":"crossref","unstructured":"Yang, K, Swanson, K, Jin, W, Coley, C, Eiden, P, Gao, H, et al.. Analyzing learned molecular representations for property prediction. J Chem Inf Model 2019;59:3370\u201388. https:\/\/doi.org\/10.1021\/acs.jcim.9b00237.","DOI":"10.1021\/acs.jcim.9b00237"},{"key":"2023033120301377396_j_jib-2022-0006_ref_014","doi-asserted-by":"crossref","unstructured":"Pan, S, Wu, J, Zhu, X, Long, G, Zhang, C. Finding the best not the most: regularized loss minimization subgraph selection for graph classification. Pattern Recogn 2015;48:3783\u201396. https:\/\/doi.org\/10.1016\/j.patcog.2015.05.019.","DOI":"10.1016\/j.patcog.2015.05.019"},{"key":"2023033120301377396_j_jib-2022-0006_ref_015","doi-asserted-by":"crossref","unstructured":"Cort\u00e9s-Ciriano, I, Bender, A. 
KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminf 2019;11:41. https:\/\/doi.org\/10.1186\/s13321-019-0364-5.","DOI":"10.1186\/s13321-019-0364-5"},{"key":"2023033120301377396_j_jib-2022-0006_ref_016","doi-asserted-by":"crossref","unstructured":"Mendez, D, Gaulton, A, Bento, AP, Chambers, J, De Veij, M, F\u00e9lix, E, et al.. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res 2019;47:D930\u201340. https:\/\/doi.org\/10.1093\/nar\/gky1075.","DOI":"10.1093\/nar\/gky1075"},{"key":"2023033120301377396_j_jib-2022-0006_ref_017","doi-asserted-by":"crossref","unstructured":"Yang, W, Soares, J, Greninger, P, Edelman, EJ, Lightfoot, H, Forbes, S, et al.. Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res 2013;41:D955\u201361. https:\/\/doi.org\/10.1093\/nar\/gks1111.","DOI":"10.1093\/nar\/gks1111"},{"key":"2023033120301377396_j_jib-2022-0006_ref_018","doi-asserted-by":"crossref","unstructured":"Seashore-Ludlow, B, Rees, MG, Cheah, JH, Coko, M, Price, EV, Coletti, ME, et al.. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov 2015;5:1210\u201323. https:\/\/doi.org\/10.1158\/2159-8290.cd-15-0235.","DOI":"10.1158\/2159-8290.CD-15-0235"},{"key":"2023033120301377396_j_jib-2022-0006_ref_019","doi-asserted-by":"crossref","unstructured":"Bento, AP, Hersey, A, F\u00e9lix, E, Landrum, G, Gaulton, A, Atkinson, F, et al.. An open source chemical structure curation pipeline using RDKit. J Cheminf 2020;12:51. https:\/\/doi.org\/10.1186\/s13321-020-00456-1.","DOI":"10.1186\/s13321-020-00456-1"},{"key":"2023033120301377396_j_jib-2022-0006_ref_020","doi-asserted-by":"crossref","unstructured":"Rogers, D, Hahn, M. Extended-connectivity fingerprints. J Chem Inf Model 2010;50:742\u201354. 
https:\/\/doi.org\/10.1021\/ci100050t.","DOI":"10.1021\/ci100050t"},{"key":"2023033120301377396_j_jib-2022-0006_ref_021","doi-asserted-by":"crossref","unstructured":"Morgan, HL. The generation of a unique machine description for chemical structures-A technique developed at chemical abstracts service. J Chem Doc 1965;5:107\u201313. https:\/\/doi.org\/10.1021\/c160017a018.","DOI":"10.1021\/c160017a018"},{"key":"2023033120301377396_j_jib-2022-0006_ref_022","doi-asserted-by":"crossref","unstructured":"Durant, JL, Leland, BA, Henry, DR, Nourse, JG. Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 2002;42:1273\u201380. https:\/\/doi.org\/10.1021\/ci010132r.","DOI":"10.1021\/ci010132r"},{"key":"2023033120301377396_j_jib-2022-0006_ref_023","doi-asserted-by":"crossref","unstructured":"Carhart, RE, Smith, DH, Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: definition and applications. J Chem Inf Comput Sci 1985;25:64\u201373. https:\/\/doi.org\/10.1021\/ci00046a002.","DOI":"10.1021\/ci00046a002"},{"key":"2023033120301377396_j_jib-2022-0006_ref_024","unstructured":"Landrum, G. RDKit: Open-source cheminformatics; 2006. Available from: https:\/\/www.rdkit.org\/."},{"key":"2023033120301377396_j_jib-2022-0006_ref_025","doi-asserted-by":"crossref","unstructured":"Kim, Y. Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics; 2014:1746\u201351 pp.","DOI":"10.3115\/v1\/D14-1181"},{"key":"2023033120301377396_j_jib-2022-0006_ref_026","unstructured":"Ramsundar, B, Eastman, P, Walters, P, Pande, V, Leswing, K, Wu, Z. Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More. Sebastopol, CA: O\u2019Reilly Media; 2019."},{"key":"2023033120301377396_j_jib-2022-0006_ref_027","unstructured":"Kipf, TN, Welling, M. 
Semi-supervised classification with graph convolutional networks. In: 5th International conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, conference track proceedings; 2017. Available from: OpenReview.net."},{"key":"2023033120301377396_j_jib-2022-0006_ref_028","unstructured":"Velickovic, P, Cucurull, G, Casanova, A, Romero, A, Li\u00f2, P, Bengio, Y. Graph attention networks. In: 6th International conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30\u2013May 3, 2018, conference track proceedings; 2018. Available from: OpenReview.net."},{"key":"2023033120301377396_j_jib-2022-0006_ref_029","unstructured":"Kingma, DP, Ba, J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd international conference on learning representations; 2014."},{"key":"2023033120301377396_j_jib-2022-0006_ref_030","unstructured":"Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I, Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929\u201358."},{"key":"2023033120301377396_j_jib-2022-0006_ref_031","unstructured":"Lundberg, SM, Lee, SI. A unified approach to interpreting model predictions. In: Guyon, I, Luxburg, UV, Bengio, S, Wallach, H, Fergus, R, Vishwanathan, S, editors, et al.. Advances in neural information processing systems 30. Red Hook, NY: Curran Associates, Inc.; 2017:4765\u201374 pp."},{"key":"2023033120301377396_j_jib-2022-0006_ref_032","unstructured":"Shrikumar, A, Greenside, P, Kundaje, A. Learning important features through propagating activation differences. In: Proceedings of the 34th international conference on machine learning-volume 70; 2017:3145\u201353 pp. JMLR.org."},{"key":"2023033120301377396_j_jib-2022-0006_ref_033","unstructured":"Abadi, M, Barham, P, Chen, J, Chen, Z, Davis, A, Dean, J, et al.. Tensorflow: a system for large-scale machine learning. 
In: Proceedings of the 12th USENIX symposium on operating systems design and implementation, vol 16; 2016. p.\u00a0265\u201383."},{"key":"2023033120301377396_j_jib-2022-0006_ref_034","unstructured":"Chollet, F, et al.. Keras; 2015. Available from: https:\/\/keras.io."},{"key":"2023033120301377396_j_jib-2022-0006_ref_035","unstructured":"Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O, et al.. Scikit-learn: machine learning in Python. J Mach Learn Res 2012;12:2825\u201330."},{"key":"2023033120301377396_j_jib-2022-0006_ref_036","doi-asserted-by":"crossref","unstructured":"McLoughlin, EC, O\u2019Boyle, NM. Colchicine-binding site inhibitors from chemistry to clinic: a review. Pharmaceuticals 2020;13:8. https:\/\/doi.org\/10.3390\/ph13010008.","DOI":"10.3390\/ph13010008"},{"key":"2023033120301377396_j_jib-2022-0006_ref_037","doi-asserted-by":"crossref","unstructured":"Nguyen, TL, McGrath, C, Hermone, AR, Burnett, JC, Zaharevitz, DW, Day, BW, et al.. A common pharmacophore for a diverse set of colchicine site inhibitors using a structure-based approach. J Med Chem 2005;48:6107\u201316. https:\/\/doi.org\/10.1021\/jm058275i.","DOI":"10.1021\/jm050502t"},{"key":"2023033120301377396_j_jib-2022-0006_ref_038","unstructured":"Ying, R, Bourgeois, D, You, J, Zitnik, M, Leskovec, J. Gnnexplainer: generating explanations for graph neural networks. 
Adv Neural Inf Process Syst 2019;32:9240."}],"container-title":["Journal of Integrative Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2022-0006\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2022-0006\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,1]],"date-time":"2023-04-01T09:53:35Z","timestamp":1680342815000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2022-0006\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,26]]},"references-count":38,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,4,8]]},"published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1515\/jib-2022-0006"],"URL":"https:\/\/doi.org\/10.1515\/jib-2022-0006","relation":{},"ISSN":["1613-4516"],"issn-type":[{"value":"1613-4516","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,26]]},"article-number":"20220006"}}