{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T00:40:24Z","timestamp":1774572024981,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T00:00:00Z","timestamp":1654560000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T00:00:00Z","timestamp":1654560000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Lhasa Limited"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Recently, imputation techniques have been adapted to predict activity values among sparse bioactivity matrices, showing improvements in predictive performance over traditional QSAR models. These models are able to use experimental activity values for auxiliary assays when predicting the activity of a test compound on a specific assay. In this study, we tested three different multi-task imputation techniques on three classification-based toxicity datasets: two of small scale (12 assays each) and one large scale with 417 assays. Moreover, we analyzed in detail the improvements shown by the imputation models. We found that test compounds that were dissimilar to training compounds, as well as test compounds with a large number of experimental values for other assays, showed the largest improvements. We also investigated the impact of sparsity on the improvements seen as well as the relatedness of the assays being considered. Our results show that even a small amount of additional information can provide imputation methods with a strong boost in predictive performance over traditional single task and multi-task predictive models.<\/jats:p>","DOI":"10.1186\/s13321-022-00611-w","type":"journal-article","created":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T08:05:45Z","timestamp":1654589145000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Analysis of the benefits of imputation models over traditional QSAR models for toxicity prediction"],"prefix":"10.1186","volume":"14","author":[{"given":"Moritz","family":"Walter","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Luke N.","family":"Allen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonio","family":"de la Vega de Le\u00f3n","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samuel J.","family":"Webb","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Valerie J.","family":"Gillet","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,6,7]]},"reference":[{"key":"611_CR1","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1111\/j.1468-0319.1995.tb00042.x","volume":"28","author":"R Caruana","year":"1997","unstructured":"Caruana R (1997) Multitask learning. Mach Learn 28:41\u201375. https:\/\/doi.org\/10.1111\/j.1468-0319.1995.tb00042.x","journal-title":"Mach Learn"},{"key":"611_CR2","doi-asserted-by":"publisher","first-page":"80","DOI":"10.3389\/fenvs.2015.00080","volume":"3","author":"A Mayr","year":"2016","unstructured":"Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80. https:\/\/doi.org\/10.3389\/fenvs.2015.00080","journal-title":"Front Environ Sci"},{"key":"611_CR3","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1021\/ci500747n","volume":"55","author":"J Ma","year":"2015","unstructured":"Ma J, Sheridan RP, Liaw A et al (2015) Deep neural nets as a method for quantitative structure-activity relationships. J Chem Inf Model 55:263\u2013274. https:\/\/doi.org\/10.1021\/ci500747n","journal-title":"J Chem Inf Model"},{"key":"611_CR4","unstructured":"Simm J, Arany A, Zakeri P, et al (2015) Macau: scalable Bayesian multi-relational factorization with side information using MCMC. arxiv:150904610v2"},{"key":"611_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-018-0281-z","volume":"10","author":"A de la Vega de Le\u00f3n","year":"2018","unstructured":"de la Vega de Le\u00f3n A, Chen B, Gillet VJ (2018) Effect of missing data on multitask prediction methods. J Cheminform 10:1\u201312. https:\/\/doi.org\/10.1186\/s13321-018-0281-z","journal-title":"J Cheminform"},{"key":"611_CR6","doi-asserted-by":"publisher","first-page":"1444","DOI":"10.1021\/acs.jcim.0c00864","volume":"61","author":"MA Trapotsi","year":"2021","unstructured":"Trapotsi MA, Mervin LH, Afzal AM et al (2021) Comparison of chemical structure and cell morphology information for multitask bioactivity predictions. J Chem Inf Model 61:1444\u20131456. https:\/\/doi.org\/10.1021\/acs.jcim.0c00864","journal-title":"J Chem Inf Model"},{"key":"611_CR7","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1021\/ci8002914","volume":"49","author":"A Varnek","year":"2009","unstructured":"Varnek A, Gaudin C, Marcou G et al (2009) Inductive transfer of knowledge: application of multi-task learning and Feature Net approaches to model tissue-air partition coefficients. J Chem Inf Model 49:133\u2013144. https:\/\/doi.org\/10.1021\/ci8002914","journal-title":"J Chem Inf Model"},{"key":"611_CR8","doi-asserted-by":"publisher","first-page":"1062","DOI":"10.1021\/acs.jcim.8b00685","volume":"59","author":"S Sosnin","year":"2019","unstructured":"Sosnin S, Karlov D, Tetko IV, Fedorov MV (2019) Comparative study of multitask toxicity modeling on a broad chemical space. J Chem Inf Model 59:1062\u20131072. https:\/\/doi.org\/10.1021\/acs.jcim.8b00685","journal-title":"J Chem Inf Model"},{"key":"611_CR9","doi-asserted-by":"publisher","first-page":"2830","DOI":"10.1021\/acs.jcim.0c00250","volume":"60","author":"U Norinder","year":"2020","unstructured":"Norinder U, Spjuth O, Svensson F (2020) Using predicted bioactivity profiles to improve predictive modeling. J Chem Inf Model 60:2830\u20132837. https:\/\/doi.org\/10.1021\/acs.jcim.0c00250","journal-title":"J Chem Inf Model"},{"key":"611_CR10","doi-asserted-by":"publisher","first-page":"4450","DOI":"10.1021\/acs.jcim.9b00375","volume":"59","author":"EJ Martin","year":"2019","unstructured":"Martin EJ, Polyakov VR, Zhu XW et al (2019) All-Assay-Max2 pQSAR: activity predictions as accurate as four-concentration IC50s for 8558 novartis assays. J Chem Inf Model. https:\/\/doi.org\/10.1021\/acs.jcim.9b00375","journal-title":"J Chem Inf Model"},{"key":"611_CR11","doi-asserted-by":"publisher","first-page":"1197","DOI":"10.1021\/acs.jcim.8b00768","volume":"59","author":"TM Whitehead","year":"2019","unstructured":"Whitehead TM, Irwin BWJ, Hunt P et al (2019) Imputation of assay bioactivity data using deep learning. J Chem Inf Model 59:1197\u20131204. https:\/\/doi.org\/10.1021\/acs.jcim.8b00768","journal-title":"J Chem Inf Model"},{"key":"611_CR12","doi-asserted-by":"publisher","first-page":"2848","DOI":"10.1021\/acs.jcim.0c00443","volume":"60","author":"BWJ Irwin","year":"2020","unstructured":"Irwin BWJ, Levell JR, Whitehead TM et al (2020) Practical applications of deep learning to impute heterogeneous drug discovery data. J Chem Inf Model. https:\/\/doi.org\/10.1021\/acs.jcim.0c00443","journal-title":"J Chem Inf Model"},{"key":"611_CR13","doi-asserted-by":"publisher","first-page":"2077","DOI":"10.1021\/acs.jcim.7b00166","volume":"57","author":"EJ Martin","year":"2017","unstructured":"Martin EJ, Polyakov VR, Tian L, Perez RC (2017) Profile-QSAR 2.0: kinase virtual screening accuracy comparable to four-concentration IC50s for realistically novel compounds. J Chem Inf Model 57:2077\u20132088. https:\/\/doi.org\/10.1021\/acs.jcim.7b00166","journal-title":"J Chem Inf Model"},{"key":"611_CR14","doi-asserted-by":"publisher","first-page":"4977","DOI":"10.1021\/jm4004285","volume":"57","author":"A Cherkasov","year":"2014","unstructured":"Cherkasov A, Muratov EN, Fourches D et al (2014) QSAR modeling: Where have you been? Where are you going to? J Med Chem 57:4977\u20135010. https:\/\/doi.org\/10.1021\/jm4004285","journal-title":"J Med Chem"},{"key":"611_CR15","unstructured":"ISSSTY database. https:\/\/www.iss.it\/isstox. Accessed 25 May 2021"},{"key":"611_CR16","doi-asserted-by":"publisher","DOI":"10.1088\/0305-4470\/31\/2\/004","author":"OECD","year":"1997","unstructured":"OECD (1997) Test No. 471: bacterial reverse mutation test. J Phys A Math Gen. https:\/\/doi.org\/10.1088\/0305-4470\/31\/2\/004","journal-title":"J Phys A Math Gen"},{"key":"611_CR17","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1093\/mutage\/get016","volume":"28","author":"R Benigni","year":"2013","unstructured":"Benigni R, Battistelli CL, Bossa C et al (2013) New perspectives in toxicological information management, and the role of ISSTOX databases in assessing chemical mutagenicity and carcinogenicity. Mutagenesis 28:401\u2013409. https:\/\/doi.org\/10.1093\/mutage\/get016","journal-title":"Mutagenesis"},{"key":"611_CR18","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/S0027-5107(00)00064-6","volume":"455","author":"K Mortelmans","year":"2000","unstructured":"Mortelmans K, Zeiger E (2000) The Ames Salmonella\/microsome mutagenicity assay. Mutat Res Mol Mech Mutagen 455:29\u201360. https:\/\/doi.org\/10.1016\/S0027-5107(00)00064-6","journal-title":"Mutat Res Mol Mech Mutagen"},{"key":"611_CR19","unstructured":"Tox21 Challenge dataset. https:\/\/tripod.nih.gov\/tox21\/challenge\/data.jsp. Accessed 25 May 2021"},{"key":"611_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/ncomms10425","volume":"7","author":"R Huang","year":"2016","unstructured":"Huang R, Xia M, Sakamuru S et al (2016) Modelling the Tox21 10 K chemical profiles for in vivo toxicity prediction and mechanism characterization. Nat Commun 7:1\u201310. https:\/\/doi.org\/10.1038\/ncomms10425","journal-title":"Nat Commun"},{"key":"611_CR21","doi-asserted-by":"publisher","first-page":"1225","DOI":"10.1021\/acs.chemrestox.6b00135","volume":"29","author":"AM Richard","year":"2016","unstructured":"Richard AM, Judson RS, Houck KA et al (2016) ToxCast chemical landscape: paving the road to 21st century toxicology. Chem Res Toxicol 29:1225\u20131251. https:\/\/doi.org\/10.1021\/acs.chemrestox.6b00135","journal-title":"Chem Res Toxicol"},{"key":"611_CR22","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1039\/c7sc02664a","volume":"9","author":"Z Wu","year":"2018","unstructured":"Wu Z, Ramsundar B, Feinberg EN et al (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513\u2013530. https:\/\/doi.org\/10.1039\/c7sc02664a","journal-title":"Chem Sci"},{"key":"611_CR23","unstructured":"RDKit: Open-source cheminformatics. http:\/\/www.rdkit.org. Accessed 25 May 2021"},{"key":"611_CR24","unstructured":"Swain M MolVS. https:\/\/github.com\/mcs07\/MolVS. Accessed 25 May 2021"},{"key":"611_CR25","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system: 1: introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31\u201336. https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comput Sci"},{"key":"611_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-015-0068-4","volume":"7","author":"SR Heller","year":"2015","unstructured":"Heller SR, McNaught A, Pletnev I et al (2015) InChI, the IUPAC international chemical identifier. J Cheminform 7:1\u201334. https:\/\/doi.org\/10.1186\/s13321-015-0068-4","journal-title":"J Cheminform"},{"key":"611_CR27","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742\u2013754","journal-title":"J Chem Inf Model"},{"key":"611_CR28","doi-asserted-by":"publisher","first-page":"1947","DOI":"10.1021\/ci034160g","volume":"43","author":"V Svetnik","year":"2003","unstructured":"Svetnik V, Liaw A, Tong C et al (2003) Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 43:1947\u20131958. https:\/\/doi.org\/10.1021\/ci034160g","journal-title":"J Chem Inf Comput Sci"},{"key":"611_CR29","unstructured":"Python Software Foundation Python Language Reference, version 3. https:\/\/www.python.org\/. Accessed 25 May 2021"},{"key":"611_CR30","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"611_CR31","doi-asserted-by":"publisher","unstructured":"Meir R, R\u00e4tsch G (2003) An introduction to boosting and leveraging. Lect Notes Comput Sci. https:\/\/doi.org\/10.1007\/3-540-36434-x_4","DOI":"10.1007\/3-540-36434-x_4"},{"key":"611_CR32","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.2307\/2699986","volume":"29","author":"JH Friedman","year":"2001","unstructured":"Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189\u20131232. https:\/\/doi.org\/10.2307\/2699986","journal-title":"Ann Stat"},{"key":"611_CR33","doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining. pp 785\u2013794","DOI":"10.1145\/2939672.2939785"},{"key":"611_CR34","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436\u2013444. https:\/\/doi.org\/10.1038\/nature14539","journal-title":"Nature"},{"key":"611_CR35","unstructured":"Abadi M, Barham P, Chen J, et al (2016) TensorFlow: A System for Large-scale Machine Learning. In: Proceedings of the 12th USENIX Conference on operating systems design and implementation. USENIX Association, Berkeley, CA, USA, pp 265\u2013283"},{"key":"611_CR36","unstructured":"Chollet F, others (2015) Keras. https:\/\/keras.io. Accessed 25 May 2021"},{"key":"611_CR37","unstructured":"Davis IL, Stentz A (1995) Sensor fusion for autonomous outdoor navigation using neural networks. In: Proceedings 1995 IEEE\/RSJ international conference on intelligent robots and systems. Human robot interaction and cooperative robots. IEEE Computer Society Press, pp 338\u2013343"},{"key":"611_CR38","doi-asserted-by":"publisher","first-page":"2490","DOI":"10.1021\/acs.jcim.7b00087","volume":"57","author":"Y Xu","year":"2017","unstructured":"Xu Y, Ma J, Liaw A et al (2017) Demystifying multitask deep neural networks for quantitative structure\u2212activity relationships. J Chem Inf Model 57:2490\u20132504. https:\/\/doi.org\/10.1021\/acs.jcim.7b00087","journal-title":"J Chem Inf Model"},{"key":"611_CR39","doi-asserted-by":"publisher","first-page":"2623","DOI":"10.1021\/acs.jcim.1c00160","volume":"61","author":"C Esposito","year":"2021","unstructured":"Esposito C, Landrum GA, Schneider N et al (2021) GHOST: adjusting the decision threshold to handle imbalanced data in machine learning. J Chem Inf Model 61:2623\u20132640. https:\/\/doi.org\/10.1021\/acs.jcim.1c00160","journal-title":"J Chem Inf Model"},{"key":"611_CR40","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1007\/s11548-013-0913-8","volume":"9","author":"B Song","year":"2014","unstructured":"Song B, Zhang G, Zhu W, Liang Z (2014) ROC operating point selection for classification of imbalanced data with application to computer-aided polyp detection in CT colonography. Int J Comput Assist Radiol Surg 9:79\u201389. https:\/\/doi.org\/10.1007\/s11548-013-0913-8","journal-title":"Int J Comput Assist Radiol Surg"},{"key":"611_CR41","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","volume":"27","author":"CE Shannon","year":"1948","unstructured":"Shannon CE (1948) A Mathematical theory of communication. Bell Syst Tech J 27:379\u2013423. https:\/\/doi.org\/10.1002\/j.1538-7305.1948.tb01338.x","journal-title":"Bell Syst Tech J"},{"key":"611_CR42","doi-asserted-by":"publisher","unstructured":"Irwin BWJ, Mahmoud S, Whitehead TM et al (2020) Imputation versus prediction: applications in machine learning for drug discovery. Futur Drug Discov 2:FDD38. https:\/\/doi.org\/10.4155\/fdd-2020-0008","DOI":"10.4155\/fdd-2020-0008"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00611-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-022-00611-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00611-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,7]],"date-time":"2022-06-07T20:12:35Z","timestamp":1654632755000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-022-00611-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,7]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["611"],"URL":"https:\/\/doi.org\/10.1186\/s13321-022-00611-w","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,7]]},"assertion":[{"value":"1 April 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 June 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing financial interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"32"}}