{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T11:41:48Z","timestamp":1769946108514,"version":"3.49.0"},"reference-count":40,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2018,12,5]],"date-time":"2018-12-05T00:00:00Z","timestamp":1543968000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,2,25]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We present a flexible deep convolutional neural network method for the analysis of arbitrary sized graph structures representing molecules. This method, which makes use of the Lipinski RDKit module, an open-source cheminformatics software, enables the incorporation of any global molecular (such as molecular charge and molecular weight) and local (such as atom hybridization and bond orders) information. In this paper, we show that this method significantly outperforms another recently proposed method based on deep convolutional neural networks on several datasets that are studied. Several best practices for training deep convolutional neural networks on chemical datasets are also highlighted within the article, such as how to select the information to be included in the model, how to prevent overfitting and how unbalanced classes in the data can be handled.<\/jats:p>","DOI":"10.1515\/jib-2018-0065","type":"journal-article","created":{"date-parts":[[2018,12,5]],"date-time":"2018-12-05T09:03:32Z","timestamp":1544000612000},"source":"Crossref","is-referenced-by-count":13,"title":["Deep Convolutional Neural Networks for the Prediction of Molecular Properties: Challenges and Opportunities Connected to the Data"],"prefix":"10.1515","volume":"16","author":[{"given":"Niclas","family":"St\u00e5hl","sequence":"first","affiliation":[{"name":"School of Informatics , University of Sk\u00f6vde , H\u00f6gskolev\u00e4gen 28, SE 54145 , Sk\u00f6vde , Sweden"}]},{"given":"G\u00f6ran","family":"Falkman","sequence":"additional","affiliation":[{"name":"School of Informatics , University of Sk\u00f6vde , Sk\u00f6vde , Sweden"}]},{"given":"Alexander","family":"Karlsson","sequence":"additional","affiliation":[{"name":"School of Informatics , University of Sk\u00f6vde , Sk\u00f6vde , Sweden"}]},{"given":"Gunnar","family":"Mathiason","sequence":"additional","affiliation":[{"name":"School of Informatics , University of Sk\u00f6vde , Sk\u00f6vde , Sweden"}]},{"given":"Jonas","family":"Bostr\u00f6m","sequence":"additional","affiliation":[{"name":"Department of Medicinal Chemistry , CVMD iMED, AstraZeneca , M\u00f6lndal , Sweden"}]}],"member":"374","published-online":{"date-parts":[[2018,12,5]]},"reference":[{"key":"2023033120511952090_j_jib-2018-0065_ref_001_w2aab3b7b3b1b6b1ab1b7b1Aa","doi-asserted-by":"crossref","unstructured":"Dickson M, Gagnon JP. Key factors in the rising cost of new drug discovery and development. Nat Rev Drug Discov 2004;3:417\u201329.1513678910.1038\/nrd1382","DOI":"10.1038\/nrd1382"},{"key":"2023033120511952090_j_jib-2018-0065_ref_002_w2aab3b7b3b1b6b1ab1b7b2Aa","doi-asserted-by":"crossref","unstructured":"Jorgensen WL. The many roles of computation in drug discovery. Science 2004;303:1813\u20138.10.1126\/science.109636115031495","DOI":"10.1126\/science.1096361"},{"key":"2023033120511952090_j_jib-2018-0065_ref_003_w2aab3b7b3b1b6b1ab1b7b3Aa","doi-asserted-by":"crossref","unstructured":"Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: quo vadis? J Chem Inf and Model 2012;52:1413\u201337.10.1021\/ci200409x","DOI":"10.1021\/ci200409x"},{"key":"2023033120511952090_j_jib-2018-0065_ref_004_w2aab3b7b3b1b6b1ab1b7b4Aa","unstructured":"Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, NIPS\u201912. Curran Associates Inc., 2012:1097\u2013105."},{"key":"2023033120511952090_j_jib-2018-0065_ref_005_w2aab3b7b3b1b6b1ab1b7b5Aa","doi-asserted-by":"crossref","unstructured":"Hinton G, Deng L, Yu D, Dahl GE, Mohamed AR, Jaitly N, Senior A, Vanhoucke V, Nguyen P, Sainath TN. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Mag 2012;29:82\u201397.10.1109\/MSP.2012.2205597","DOI":"10.1109\/MSP.2012.2205597"},{"key":"2023033120511952090_j_jib-2018-0065_ref_006_w2aab3b7b3b1b6b1ab1b7b6Aa","doi-asserted-by":"crossref","unstructured":"LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436\u201344.2601744210.1038\/nature14539","DOI":"10.1038\/nature14539"},{"key":"2023033120511952090_j_jib-2018-0065_ref_007_w2aab3b7b3b1b6b1ab1b7b7Aa","doi-asserted-by":"crossref","unstructured":"Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inform 2016;35:3\u201314.10.1002\/minf.20150100827491648","DOI":"10.1002\/minf.201501008"},{"key":"2023033120511952090_j_jib-2018-0065_ref_008_w2aab3b7b3b1b6b1ab1b7b8Aa","doi-asserted-by":"crossref","unstructured":"Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 2003;43:1947\u201358.10.1021\/ci034160g","DOI":"10.1021\/ci034160g"},{"key":"2023033120511952090_j_jib-2018-0065_ref_009_w2aab3b7b3b1b6b1ab1b7b9Aa","doi-asserted-by":"crossref","unstructured":"Bradford JR, Westhead DR. Improved prediction of protein\u2013protein binding sites using a support vector machines approach. Bioinformatics 2005;21:1487\u201394.1561338410.1093\/bioinformatics\/bti242","DOI":"10.1093\/bioinformatics\/bti242"},{"key":"2023033120511952090_j_jib-2018-0065_ref_010_w2aab3b7b3b1b6b1ab1b7c10Aa","doi-asserted-by":"crossref","unstructured":"Zheng W, Tropsha A. Novel variable selection quantitative structure- property relationship approach based on the k-nearest-neighbor principle. J Chem Inf Comput Sci 2000;40:185\u201394.1066156610.1021\/ci980033m","DOI":"10.1021\/ci980033m"},{"key":"2023033120511952090_j_jib-2018-0065_ref_011_w2aab3b7b3b1b6b1ab1b7c11Aa","doi-asserted-by":"crossref","unstructured":"Gasteiger J, Zupan J. Neural networks in chemistry. Angewandte Chem Int Ed Engl 1993;32:503\u201327.10.1002\/anie.199305031","DOI":"10.1002\/anie.199305031"},{"key":"2023033120511952090_j_jib-2018-0065_ref_012_w2aab3b7b3b1b6b1ab1b7c12Aa","doi-asserted-by":"crossref","unstructured":"Burbidge R, Trotter M, Buxton B, Holden S. Drug design by machine learning: support vector machines for pharmaceutical data analysis. Computers & Chemistry 2001;26:5\u201314.10.1016\/S0097-8485(01)00094-811765851","DOI":"10.1016\/S0097-8485(01)00094-8"},{"key":"2023033120511952090_j_jib-2018-0065_ref_013_w2aab3b7b3b1b6b1ab1b7c13Aa","doi-asserted-by":"crossref","unstructured":"Mitchell JBO. Machine learning methods in chemoinformatics. Wiley Interdiscip Rev: Comput Mol Sci 2014;4:468\u201381.25285160","DOI":"10.1002\/wcms.1183"},{"key":"2023033120511952090_j_jib-2018-0065_ref_014_w2aab3b7b3b1b6b1ab1b7c14Aa","doi-asserted-by":"crossref","unstructured":"Rogers D, Hahn M. Extended-connectivity fingerprints. J Chem Inf and Model 2010;50:742\u201354.10.1021\/ci100050t","DOI":"10.1021\/ci100050t"},{"key":"2023033120511952090_j_jib-2018-0065_ref_015_w2aab3b7b3b1b6b1ab1b7c15Aa","doi-asserted-by":"crossref","unstructured":"Huuskonen J, Salo M, Taskinen J. Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci 1998;38:450\u20136.961178510.1021\/ci970100x","DOI":"10.1021\/ci970100x"},{"key":"2023033120511952090_j_jib-2018-0065_ref_016_w2aab3b7b3b1b6b1ab1b7c16Aa","doi-asserted-by":"crossref","unstructured":"Ma J, Sheridan RP, Liaw A, Dahl GE, Svetnik V. Deep neural nets as a method for quantitative structure\u2013activity relationships. J Chem Inf and Model 2015;55:263\u201374.10.1021\/ci500747n","DOI":"10.1021\/ci500747n"},{"key":"2023033120511952090_j_jib-2018-0065_ref_017_w2aab3b7b3b1b6b1ab1b7c17Aa","doi-asserted-by":"crossref","unstructured":"Ekins S. The next era: Deep learning in pharmaceutical research. Pharm Res 2016;33:2594\u2013603.2759999110.1007\/s11095-016-2029-7","DOI":"10.1007\/s11095-016-2029-7"},{"key":"2023033120511952090_j_jib-2018-0065_ref_018_w2aab3b7b3b1b6b1ab1b7c18Aa","doi-asserted-by":"crossref","unstructured":"Mayr A, Klambauer G, Unterthiner T, Hochreiter S. Deeptox: toxicity prediction using deep learning. Front Environ Sci 2016;3:80.","DOI":"10.3389\/fenvs.2015.00080"},{"key":"2023033120511952090_j_jib-2018-0065_ref_019_w2aab3b7b3b1b6b1ab1b7c19Aa","unstructured":"Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, Adams RP. Convolutional networks on graphs for learning molecular fingerprints. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds., Advances in Neural Information Processing Systems 28. Curran Associates, Inc., 2015:2224\u201332. URL http:\/\/papers.nips.cc\/paper\/5954-convolutional-networks-on-graphs-for-learning-molecular-fingerprints.pdf."},{"key":"2023033120511952090_j_jib-2018-0065_ref_020_w2aab3b7b3b1b6b1ab1b7c20Aa","doi-asserted-by":"crossref","unstructured":"Kearnes S, McCloskey K, Berndl M, Pande V, Riley P. Molecular graph convolutions: moving beyond fingerprints. J Comput-Aided Mol Des 2016;30:595\u2013608. ISSN 1573-4951. doi: 10.1007\/s10822-016-9938-8. URL http:\/\/dx.doi.org\/10.1007\/s10822-016-9938-8.27558503","DOI":"10.1007\/s10822-016-9938-8"},{"key":"2023033120511952090_j_jib-2018-0065_ref_021_w2aab3b7b3b1b6b1ab1b7c21Aa","doi-asserted-by":"crossref","unstructured":"Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V. MoleculeNet: a benchmark for molecular machine learning. Chemical Science. 2018;9(2):513\u2013530.2962911810.1039\/C7SC02664A","DOI":"10.1039\/C7SC02664A"},{"key":"2023033120511952090_j_jib-2018-0065_ref_022_w2aab3b7b3b1b6b1ab1b7c22Aa","doi-asserted-by":"crossref","unstructured":"Chen JJ, Tsai CA, Young JF, Kodell RL. Classification ensembles for unbalanced class sizes in predictive toxicology. SAR QSAR Environ Res 2005;16:517\u201329.10.1080\/1065936050046846816428129","DOI":"10.1080\/10659360500468468"},{"key":"2023033120511952090_j_jib-2018-0065_ref_023_w2aab3b7b3b1b6b1ab1b7c23Aa","doi-asserted-by":"crossref","unstructured":"Kuhn M, Letunic I, Jensen LJ, Bork P. The sider database of drugs and side effects. Nucleic Acids Research 2016;44:D1075\u20139.2648135010.1093\/nar\/gkv1075","DOI":"10.1093\/nar\/gkv1075"},{"key":"2023033120511952090_j_jib-2018-0065_ref_024_w2aab3b7b3b1b6b1ab1b7c24Aa","doi-asserted-by":"crossref","unstructured":"Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology 1982;143:29\u201336.10.1148\/radiology.143.1.7063747","DOI":"10.1148\/radiology.143.1.7063747"},{"key":"2023033120511952090_j_jib-2018-0065_ref_025_w2aab3b7b3b1b6b1ab1b7c25Aa","unstructured":"Dahl GE, Jaitly N, Salakhutdinov R. Multi-task neural networks for QSAR predictions. arXiv preprint arXiv:1406.1231 2014."},{"key":"2023033120511952090_j_jib-2018-0065_ref_026_w2aab3b7b3b1b6b1ab1b7c26Aa","doi-asserted-by":"crossref","unstructured":"Lusci A, Pollastri G, Baldi P. Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf and Model 2013;53:1563\u201375.10.1021\/ci400187y","DOI":"10.1021\/ci400187y"},{"key":"2023033120511952090_j_jib-2018-0065_ref_027_w2aab3b7b3b1b6b1ab1b7c27Aa","doi-asserted-by":"crossref","unstructured":"Xu Y, Dai Z, Chen F, Gao S, Pei J, Lai L. Deep learning for drug-induced liver injury. J Chem Inf and Model 2015;55:2085\u201393.10.1021\/acs.jcim.5b00238","DOI":"10.1021\/acs.jcim.5b00238"},{"key":"2023033120511952090_j_jib-2018-0065_ref_028_w2aab3b7b3b1b6b1ab1b7c28Aa","unstructured":"Wallach I, Dzamba M, Heifets A. Atomnet: a deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv preprint arXiv:1510.02855 2015."},{"key":"2023033120511952090_j_jib-2018-0065_ref_029_w2aab3b7b3b1b6b1ab1b7c29Aa","doi-asserted-by":"crossref","unstructured":"Segler MHS, Kogej T, Tyrchan C, Waller MP. Generating focussed molecule libraries for drug discovery with recurrent neural networks. arXiv preprint arXiv:1701.01329 2017.","DOI":"10.1021\/acscentsci.7b00512"},{"key":"2023033120511952090_j_jib-2018-0065_ref_030_w2aab3b7b3b1b6b1ab1b7c30Aa","unstructured":"Weininger D. Smiles, a chemical language and information system. 1. introduction to methodology and encoding rules. In: Proc. Edinburgh Math. SOC. vol. 17. 1970:1\u201314."},{"key":"2023033120511952090_j_jib-2018-0065_ref_031_w2aab3b7b3b1b6b1ab1b7c31Aa","unstructured":"Graves A. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 2013."},{"key":"2023033120511952090_j_jib-2018-0065_ref_032_w2aab3b7b3b1b6b1ab1b7c32Aa","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:770\u201378.","DOI":"10.1109\/CVPR.2016.90"},{"key":"2023033120511952090_j_jib-2018-0065_ref_033_w2aab3b7b3b1b6b1ab1b7c33Aa","unstructured":"Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929\u201358."},{"key":"2023033120511952090_j_jib-2018-0065_ref_034_w2aab3b7b3b1b6b1ab1b7c34Aa","unstructured":"Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. In: Proc. ICML. vol. 30. 2013."},{"key":"2023033120511952090_j_jib-2018-0065_ref_035_w2aab3b7b3b1b6b1ab1b7c35Aa","unstructured":"Landrum G. Rdkit: Open-source cheminformatics. Online). http:\/\/www.rdkit.org. Accessed, 3(04):2012, 2006."},{"key":"2023033120511952090_j_jib-2018-0065_ref_036_w2aab3b7b3b1b6b1ab1b7c36Aa","doi-asserted-by":"crossref","unstructured":"Schmidt CW. Tox 21: new dimensions of toxicity testing. Environ Health Perspect 2009;117:A348.19672388","DOI":"10.1289\/ehp.117-a348"},{"key":"2023033120511952090_j_jib-2018-0065_ref_037_w2aab3b7b3b1b6b1ab1b7c37Aa","unstructured":"Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs\/1605.02688, 2016. URL http:\/\/arxiv.org\/abs\/1605.02688."},{"key":"2023033120511952090_j_jib-2018-0065_ref_038_w2aab3b7b3b1b6b1ab1b7c38Aa","unstructured":"Kingma D, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014."},{"key":"2023033120511952090_j_jib-2018-0065_ref_039_w2aab3b7b3b1b6b1ab1b7c39Aa","doi-asserted-by":"crossref","unstructured":"Shi T, Horvath S. Unsupervised learning with random forest predictors. J Comput Graph Stat 2006;15:118\u201338.10.1198\/106186006X94072","DOI":"10.1198\/106186006X94072"},{"key":"2023033120511952090_j_jib-2018-0065_ref_040_w2aab3b7b3b1b6b1ab1b7c40Aa","unstructured":"Ganganwar V. An overview of classification algorithms for imbalanced datasets. Int J Emer Tech Adv Engg 2012;2:42\u20137."}],"container-title":["Journal of Integrative Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/view\/journals\/jib\/16\/1\/article-20180065.xml","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2018-0065\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2018-0065\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,1]],"date-time":"2023-04-01T10:16:34Z","timestamp":1680344194000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/jib-2018-0065\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,5]]},"references-count":40,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2018,12,22]]},"published-print":{"date-parts":[[2019,2,25]]}},"alternative-id":["10.1515\/jib-2018-0065"],"URL":"https:\/\/doi.org\/10.1515\/jib-2018-0065","relation":{},"ISSN":["1613-4516"],"issn-type":[{"value":"1613-4516","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,5]]},"article-number":"20180065"}}