{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T09:11:39Z","timestamp":1772701899662,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2013,7,12]],"date-time":"2013-07-12T00:00:00Z","timestamp":1373587200000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The rapid access to intrinsic physicochemical properties of molecules is highly desired for large scale chemical data mining explorations such as mass spectrum prediction in metabolomics, toxicity risk assessment and drug discovery. Large volumes of data are being produced by quantum chemistry calculations, which provide increasing accurate estimations of several properties, e.g. by Density Functional Theory (DFT), but are still too computationally expensive for those large scale uses. This work explores the possibility of using large amounts of data generated by DFT methods for thousands of molecular structures, extracting relevant molecular properties and applying machine learning (ML) algorithms to learn from the data. Once trained, these ML models can be applied to new structures to produce ultra-fast predictions. An approach is presented for homolytic bond dissociation energy (BDE).<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Machine learning models were trained with a data set of &gt;12,000 BDEs calculated by B3LYP\/6-311++G(d,p)\/\/DFTB. Descriptors were designed to encode atom types and connectivity in the 2D topological environment of the bonds. The best model, an Associative Neural Network (ASNN) based on 85 bond descriptors, was able to predict the BDE of 887 bonds in an independent test set (covering a range of 17.67\u2013202.30\u00a0kcal\/mol) with RMSD of 5.29\u00a0kcal\/mol, mean absolute deviation of 3.35\u00a0kcal\/mol, and <jats:italic>R<\/jats:italic>\n              <jats:sup>2<\/jats:sup>\u2009=\u20090.953. The predictions were compared with semi-empirical PM6 calculations, and were found to be superior for all types of bonds in the data set, except for O-H, N-H, and N-N bonds. The B3LYP\/6-311++G(d,p)\/\/DFTB calculations can approach the higher-level calculations B3LYP\/6-311++G(3df,2p)\/\/B3LYP\/6-31G(d,p) with an RMSD of 3.04\u00a0kcal\/mol, which is less than the RMSD of ASNN (against both DFT methods). An experimental web service for on-line prediction of BDEs is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/joao.airesdesousa.com\/bde\" ext-link-type=\"uri\">http:\/\/joao.airesdesousa.com\/bde<\/jats:ext-link>.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Knowledge could be automatically extracted by machine learning techniques from a data set of calculated BDEs, providing ultra-fast access to accurate estimations of DFT-calculated BDEs. This demonstrates how to extract value from large volumes of data currently being produced by quantum chemistry calculations at an increasing speed mostly without human intervention. In this way, high-level theoretical quantum calculations can be used in large-scale applications that otherwise would not afford the intrinsic computational cost.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1758-2946-5-34","type":"journal-article","created":{"date-parts":[[2013,7,12]],"date-time":"2013-07-12T18:17:50Z","timestamp":1373653070000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":63,"title":["A big data approach to the ultra-fast prediction of DFT-calculated bond energies"],"prefix":"10.1186","volume":"5","author":[{"given":"Xiaohui","family":"Qu","sequence":"first","affiliation":[]},{"given":"Diogo ARS","family":"Latino","sequence":"additional","affiliation":[]},{"given":"Joao","family":"Aires-de-Sousa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2013,7,12]]},"reference":[{"key":"475_CR1","doi-asserted-by":"publisher","first-page":"931","DOI":"10.1021\/ct100684s","volume":"7","author":"M Gaus","year":"2011","unstructured":"Gaus M, Cui Q, Elstner M: DFTB3: extension of the Self-Consistent-Charge Density-Functional Tight-Binding Method (SCC-DFTB). J Chem Theory Comput. 2011, 7: 931-948. 10.1021\/ct100684s.","journal-title":"J Chem Theory Comput"},{"key":"475_CR2","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1021\/cr200107z","volume":"112","author":"AJ Cohen","year":"2012","unstructured":"Cohen AJ, Mori-S\u00e1nchez P, Yang WT: Challenges for density functional theory. Chem Rev. 2012, 112: 289-320. 10.1021\/cr200107z.","journal-title":"Chem Rev"},{"key":"475_CR3","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L: Random forests. Mach Learn. 2001, 45: 5-32. 10.1023\/A:1010933404324.","journal-title":"Mach Learn"},{"key":"475_CR4","doi-asserted-by":"publisher","first-page":"717","DOI":"10.1021\/ci010379o","volume":"42","author":"IV Tetko","year":"2002","unstructured":"Tetko IV: Neural network studies. 4. Introduction to associative neural networks. J Chem Inf Comput Sci. 2002, 42: 717-728. 10.1021\/ci010379o.","journal-title":"J Chem Inf Comput Sci"},{"key":"475_CR5","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.1021\/ja002455u","volume":"123","author":"JS Wright","year":"2001","unstructured":"Wright JS, Johnson ER, DiLabio GA: Predicting the activity of phenolic antioxidants: theoretical method, analysis of substituent effects, and application to major families of antioxidants. J Am Chem Soc. 2001, 123: 1173-1183. 10.1021\/ja002455u.","journal-title":"J Am Chem Soc"},{"key":"475_CR6","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1016\/j.ejmech.2012.08.017","volume":"56","author":"KLM Drew","year":"2012","unstructured":"Drew KLM, Reynisson J: The impact of carbon\u2013hydrogen bond dissociation energies on the prediction of the cytochrome P450 mediated major metabolic site of drug-like compounds. Eur J Med Chem. 2012, 56: 48-55.","journal-title":"Eur J Med Chem"},{"key":"475_CR7","doi-asserted-by":"publisher","first-page":"3111","DOI":"10.1002\/rcm.2177","volume":"19","author":"AW Hill","year":"2005","unstructured":"Hill AW, Mortishire-Smith RJ: Automated assignment of high-resolution collisionally activated dissociation mass spectra using a systematic bond disconnection approach. Rapid Commun Mass Spectrom. 2005, 19: 3111-3118. 10.1002\/rcm.2177.","journal-title":"Rapid Commun Mass Spectrom"},{"key":"475_CR8","doi-asserted-by":"publisher","first-page":"1222","DOI":"10.1021\/ci000387p","volume":"40","author":"A Cherkasov","year":"2000","unstructured":"Cherkasov A, Jonsson M: A new method for estimation of homolytic C-H bond dissociation enthalpies. J Chem Inf Comput Sci. 2000, 40: 1222-1226. 10.1021\/ci000387p.","journal-title":"J Chem Inf Comput Sci"},{"key":"475_CR9","doi-asserted-by":"publisher","first-page":"669","DOI":"10.1021\/ci034248u","volume":"44","author":"CX Xue","year":"2004","unstructured":"Xue CX, Zhang RS, Liu HX, Yao XJ, Liu MC, Hu ZC, Fan BT: An accurate QSPR study of O-H bond dissociation energy in substituted phenols based on support vector machines. J Chem Inf Comput Sci. 2004, 44: 669-677. 10.1021\/ci034248u.","journal-title":"J Chem Inf Comput Sci"},{"key":"475_CR10","doi-asserted-by":"publisher","first-page":"5717","DOI":"10.1002\/ejoc.200700419","volume":"2007","author":"A Stanger","year":"2007","unstructured":"Stanger A: A simple and intuitive description of C\u2013H bond energies. Eur J Org Chem. 2007, 2007: 5717-5725. 10.1002\/ejoc.200700419.","journal-title":"Eur J Org Chem"},{"key":"475_CR11","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1016\/j.theochem.2010.06.012","volume":"955","author":"KR Przybylak","year":"2010","unstructured":"Przybylak KR, Cronin MTD: Correlation between bond dissociation energies and spin distribution. J Mol Struct. 2010, 955: 165-170. 10.1016\/j.theochem.2010.06.012.","journal-title":"J Mol Struct"},{"key":"475_CR12","doi-asserted-by":"publisher","first-page":"3129","DOI":"10.1021\/jo035306d","volume":"69","author":"Y Feng","year":"2004","unstructured":"Feng Y, Liu L, Wang JT, Zhao SW, Guo QX: Homolytic C-H and N-H bond dissociation energies of strained organic compounds. J Org Chem. 2004, 69: 3129-3138. 10.1021\/jo035306d.","journal-title":"J Org Chem"},{"key":"475_CR13","doi-asserted-by":"publisher","first-page":"754","DOI":"10.1002\/qua.21522","volume":"108","author":"JVA dos Santos","year":"2008","unstructured":"dos Santos JVA, Newton AS, Bernardino R, Guedes RC: Substituent effects on O\u2013H and S\u2013H bond dissociation enthalpies of disubstituted phenols and thiophenols. Int J Quantum Chem. 2008, 108: 754-761.","journal-title":"Int J Quantum Chem"},{"key":"475_CR14","doi-asserted-by":"publisher","first-page":"1757","DOI":"10.1021\/ci3001277","volume":"52","author":"JJ Irwin","year":"2012","unstructured":"Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG: ZINC: a free tool to discover chemistry for biology. J Chem Inf Model. 2012, 52: 1757-1768. 10.1021\/ci3001277.","journal-title":"J Chem Inf Model"},{"key":"475_CR15","doi-asserted-by":"publisher","first-page":"987","DOI":"10.1016\/S1359-6446(05)03511-7","volume":"10","author":"RAE Carr","year":"2005","unstructured":"Carr RAE, Congreve M, Murray CW, Rees DC: Fragment-based lead discovery: leads by design. Drug Discov Today. 2005, 10: 987-992. 10.1016\/S1359-6446(05)03511-7.","journal-title":"Drug Discov Today"},{"key":"475_CR16","unstructured":"ChemAxon. JChem. 5.8.2 [http:\/\/www.chemaxon.com] (accessed February 2012)"},{"key":"475_CR17","doi-asserted-by":"publisher","first-page":"5678","DOI":"10.1021\/jp070186p","volume":"111","author":"B Aradi","year":"2007","unstructured":"Aradi B, Hourahine B, Frauenheim T: DFTB+, a sparse matrix-based implementation of the DFTB method. J Phys Chem A. 2007, 111: 5678-5684. 10.1021\/jp070186p.","journal-title":"J Phys Chem A"},{"key":"475_CR18","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1109\/5992.998641","volume":"4","author":"SR Bahn","year":"2002","unstructured":"Bahn SR, Jacobsen KW: An object-oriented scripting interface to a legacy electronic structure code. Comput Sci Eng. 2002, 4: 56-66.","journal-title":"Comput Sci Eng"},{"key":"475_CR19","volume-title":"Numerical Optimization","author":"J Nocedal","year":"2006","unstructured":"Nocedal J, Wright SJ: Numerical Optimization. 2006, New York: Springer, 2","edition":"2"},{"key":"475_CR20","doi-asserted-by":"publisher","first-page":"170201","DOI":"10.1103\/PhysRevLett.97.170201","volume":"97","author":"E Bitzek","year":"2006","unstructured":"Bitzek E, Koskinen P, G\u00e4hler F, Moseler M, Gumbsch P: Structural relaxation made simple. Phys Rev Lett. 2006, 97: 170201-","journal-title":"Phys Rev Lett"},{"key":"475_CR21","doi-asserted-by":"publisher","first-page":"1347","DOI":"10.1002\/jcc.540141112","volume":"14","author":"MW Schmidt","year":"1993","unstructured":"Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JJ, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA: General atomic and molecular electronic structure system. J Comput Chem. 1993, 14: 1347-1363. 10.1002\/jcc.540141112. GAMESS Version 11 Aug 2011 (R1)","journal-title":"J Comput Chem"},{"key":"475_CR22","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.1007\/s00894-007-0233-4","volume":"13","author":"JJP Stewart","year":"2007","unstructured":"Stewart JJP: Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. J Mol Model. 2007, 13: 1173-1213. 10.1007\/s00894-007-0233-4.","journal-title":"J Mol Model"},{"key":"475_CR23","unstructured":"Stewart JJP: MOPAC2009 Version 11.366L. [http:\/\/openmopac.net]"},{"key":"475_CR24","doi-asserted-by":"publisher","first-page":"2111","DOI":"10.2174\/138161206777585274","volume":"12","author":"C Steinbeck","year":"2006","unstructured":"Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen EL: Recent developments of the Chemistry Development Kit (CDK) - an open-source java library for chemo- and bioinformatics. Curr Pharm Des. 2006, 12: 2111-2120. 10.2174\/138161206777585274.","journal-title":"Curr Pharm Des"},{"key":"475_CR25","volume-title":"R: A language and environment for statistical computing","author":"R Development Core Team","year":"2011","unstructured":"R Development Core Team: R: A language and environment for statistical computing. 2011, Vienna, Austria: R Foundation for Statistical Computing, [http:\/\/www.R-project.org]"},{"key":"475_CR26","first-page":"18","volume":"2","author":"A Liaw","year":"2002","unstructured":"Liaw A, Wiener A: Classification and regression by RandomForest. R News. 2002, 2: 18-22.","journal-title":"R News"},{"key":"475_CR27","doi-asserted-by":"publisher","first-page":"2005","DOI":"10.1021\/ci034033k","volume":"43","author":"Y Feng","year":"2003","unstructured":"Feng Y, Liu L, Wang JT, Huang H, Guo QX: Assessment of experimental bond dissociation energies using composite ab initio methods and evaluation of the performances of density functional methods in the calculation of bond dissociation energies. J Chem Inf Comput Sci. 2003, 43: 2005-2013. 10.1021\/ci034033k.","journal-title":"J Chem Inf Comput Sci"},{"key":"475_CR28","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1145\/1656274.1656278","volume":"11","author":"M Hall","year":"2009","unstructured":"Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA data mining software: an update. SIGKDD Explorations. 2009, 11: 10-18. 10.1145\/1656274.1656278.","journal-title":"SIGKDD Explorations"},{"key":"475_CR29","doi-asserted-by":"publisher","first-page":"1824","DOI":"10.1002\/jcc.21764","volume":"32","author":"IY Zhang","year":"2011","unstructured":"Zhang IY, Wu J, Luo Y, Xu X: Accurate bond dissociation enthalpies by using doubly hybrid XYG3 functional. J Comput Chem. 2011, 32: 1824-1838. 10.1002\/jcc.21764.","journal-title":"J Comput Chem"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-5-34.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/1758-2946-5-34\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1758-2946-5-34.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T00:45:42Z","timestamp":1630543542000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/1758-2946-5-34"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,7,12]]},"references-count":29,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["475"],"URL":"https:\/\/doi.org\/10.1186\/1758-2946-5-34","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,7,12]]},"assertion":[{"value":"18 April 2013","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 July 2013","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 July 2013","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"34"}}