{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T04:19:02Z","timestamp":1777349942145,"version":"3.51.4"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T00:00:00Z","timestamp":1673395200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T00:00:00Z","timestamp":1673395200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Ubiquitin-specific-processing protease 7 (USP7) is a promising target protein for cancer therapy, and great attention has been given to the identification of USP7 inhibitors. Traditional virtual screening methods have now been successfully applied to discover USP7 inhibitors aiming at reducing costs and speeding up time in several studies. However, due to their unsatisfactory accuracy, it is still a difficult task to develop USP7 inhibitors. In this study, multiple supervised learning classifiers were built to distinguish active USP7 inhibitors from inactive ligands. Physicochemical descriptors, MACCS keys, ECFP4 fingerprints and SMILES were first calculated to represent the compounds in our in-house dataset. Two deep learning (DL) models and nine classical machine learning (ML) models were then constructed based on different combinations of the above molecular representations under three activity cutoff values, and a total of 15 groups of experiments (75 experiments) were implemented. The performance of the models in these experiments was evaluated, compared and discussed using a variety of metrics. 
The optimal models are ensemble learning models when the dataset is balanced or severely imbalanced, and SMILES-based DL performs the best when the dataset is slightly imbalanced. Meanwhile, multimodal data fusion in some cases can improve the performance of ML and DL models. In addition, SMOTE, unbiased decoy selection and SMILES enumeration can improve the performance of ML and DL models when the dataset is severely imbalanced, and SMOTE works the best. Our study established highly accurate supervised learning classification models, which would accelerate the development of USP7 inhibitors. Some guidance was also provided for drug researchers in selecting supervised models and molecular representations as well as handling imbalanced datasets.<\/jats:p>","DOI":"10.1186\/s13321-022-00675-8","type":"journal-article","created":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T14:02:56Z","timestamp":1673445776000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Multimodal data fusion for supervised learning-based identification of USP7 inhibitors: a systematic comparison"],"prefix":"10.1186","volume":"15","author":[{"given":"Wen-feng","family":"Shen","sequence":"first","affiliation":[]},{"given":"He-wei","family":"Tang","sequence":"additional","affiliation":[]},{"given":"Jia-bo","family":"Li","sequence":"additional","affiliation":[]},{"given":"Xiang","family":"Li","sequence":"additional","affiliation":[]},{"given":"Si","family":"Chen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,1,11]]},"reference":[{"key":"675_CR1","doi-asserted-by":"publisher","first-page":"534","DOI":"10.1038\/nature24006","volume":"550","author":"L Kategaya","year":"2017","unstructured":"Kategaya L, Di Lello P, Roug\u00e9 L et al (2017) USP7 small-molecule inhibitors interfere with 
ubiquitin binding. Nature 550:534\u2013538","journal-title":"Nature"},{"key":"675_CR2","doi-asserted-by":"publisher","first-page":"490","DOI":"10.1016\/j.drudis.2020.10.028","volume":"26","author":"L Nininahazwe","year":"2021","unstructured":"Nininahazwe L, Liu B, He C et al (2021) The emerging nature of ubiquitin-specific protease 7 (USP7): a new target in cancer therapy. Drug Discov Today 26:490\u2013502","journal-title":"Drug Discov Today"},{"key":"675_CR3","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1016\/j.gendis.2020.10.004","volume":"9","author":"A Al-Eidan","year":"2022","unstructured":"Al-Eidan A, Wang Y, Skipp P, Ewing RM (2022) The USP7 protein interaction network and its roles in tumorigenesis. Genes Dis 9:41\u201350","journal-title":"Genes Dis"},{"key":"675_CR4","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbab135","author":"Y Zhao","year":"2021","unstructured":"Zhao Y, Wang X-G, Ma Z-Y et al (2021) Systematic comparison of ligand-based and structure-based virtual screening methods on poly (ADP-ribose) polymerase-1 inhibitors. Brief Bioinform. https:\/\/doi.org\/10.1093\/bib\/bbab135","journal-title":"Brief Bioinform"},{"key":"675_CR5","doi-asserted-by":"publisher","first-page":"10056","DOI":"10.1021\/acs.jmedchem.7b01293","volume":"60","author":"P Di Lello","year":"2017","unstructured":"Di Lello P, Pastor R, Murray JM et al (2017) Discovery of small-molecule inhibitors of ubiquitin specific protease 7 (USP7) using integrated NMR and in silico techniques. J Med Chem 60:10056\u201310070","journal-title":"J Med Chem"},{"key":"675_CR6","doi-asserted-by":"publisher","DOI":"10.1002\/minf.202100273","author":"S Zhang","year":"2022","unstructured":"Zhang S, Wang Y, Liu L et al (2022) Virtual screening inhibitors of ubiquitin-specific protease 7 combining pharmacophore modeling and molecular docking. Mol Inf. 
https:\/\/doi.org\/10.1002\/minf.202100273","journal-title":"Mol Inf"},{"key":"675_CR7","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1002\/cmdc.202000675","volume":"16","author":"D Kanan","year":"2021","unstructured":"Kanan D, Kanan T, Dogan B et al (2021) An integrated in silico approach and in vitro study for the discovery of small-molecule USP7 inhibitors as potential cancer therapies. ChemMedChem 16:555\u2013567","journal-title":"ChemMedChem"},{"key":"675_CR8","doi-asserted-by":"publisher","first-page":"3255","DOI":"10.1021\/acs.jcim.0c00154","volume":"60","author":"S Liu","year":"2020","unstructured":"Liu S, Zhou X, Li M et al (2020) Discovery of ubiquitin-specific protease 7 (USP7) inhibitors with novel scaffold structures by virtual screening, molecular dynamics simulation, and biological evaluation. J Chem Inf Model 60:3255\u20133264","journal-title":"J Chem Inf Model"},{"key":"675_CR9","doi-asserted-by":"publisher","first-page":"10520","DOI":"10.1021\/acs.chemrev.8b00728","volume":"119","author":"X Yang","year":"2019","unstructured":"Yang X, Wang Y, Byrne R et al (2019) Concepts of artificial intelligence for computer-assisted drug discovery. Chem Rev 119:10520\u201310594","journal-title":"Chem Rev"},{"key":"675_CR10","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1021\/tx500389q","volume":"28","author":"H Shi","year":"2015","unstructured":"Shi H, Tian S, Li Y et al (2015) Absorption, distribution, metabolism, excretion, and toxicity evaluation in drug discovery. 14. Prediction of human pregnane X receptor activators by using naive Bayesian classification technique. Chem Res Toxicol 28:116\u2013125","journal-title":"Chem Res Toxicol"},{"key":"675_CR11","doi-asserted-by":"publisher","first-page":"755","DOI":"10.1080\/17460441.2020.1745183","volume":"15","author":"II Baskin","year":"2020","unstructured":"Baskin II (2020) The power of deep learning to ligand-based novel drug discovery. 
Expert Opin Drug Discov 15:755\u2013764","journal-title":"Expert Opin Drug Discov"},{"key":"675_CR12","doi-asserted-by":"crossref","unstructured":"Chauhan NK, Singh K (2018) A review on conventional machine learning vs deep learning. In: 2018 International conference on computing, power and communication technologies (GUCON), Greater Noida, India, 28\u201329 September 2018","DOI":"10.1109\/GUCON.2018.8675097"},{"key":"675_CR13","doi-asserted-by":"publisher","DOI":"10.1002\/minf.201600118","author":"DA Winkler","year":"2017","unstructured":"Winkler DA, Le TC (2017) Performance of deep and shallow neural networks, the universal approximation theorem, activity cliffs, and QSAR. Mol Inform. https:\/\/doi.org\/10.1002\/minf.201600118","journal-title":"Mol Inform"},{"key":"675_CR14","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-020-00460-5","author":"L David","year":"2020","unstructured":"David L, Thakkar A, Mercado R, Engkvist O (2020) Molecular representations in AI-driven drug discovery: a review and practical guide. J Cheminform. https:\/\/doi.org\/10.1186\/s13321-020-00460-5","journal-title":"J Cheminform"},{"key":"675_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113885","author":"R Bokade","year":"2021","unstructured":"Bokade R, Navato A, Ouyang R et al (2021) A cross-disciplinary comparison of multimodal data fusion approaches and applications: accelerating learning through trans-disciplinary information sharing. Expert Syst Appl. https:\/\/doi.org\/10.1016\/j.eswa.2020.113885","journal-title":"Expert Syst Appl"},{"key":"675_CR16","doi-asserted-by":"publisher","first-page":"829","DOI":"10.1162\/neco_a_01273","volume":"32","author":"J Gao","year":"2020","unstructured":"Gao J, Li P, Chen Z, Zhang J (2020) A survey on deep learning for multimodal data fusion. 
Neural Comput 32:829\u2013864","journal-title":"Neural Comput"},{"key":"675_CR17","first-page":"33","volume":"8","author":"PH Foo","year":"2013","unstructured":"Foo PH, Ng GW (2013) High-level information fusion: an overview. J Adv Inf Fusion 8:33\u201372","journal-title":"J Adv Inf Fusion"},{"key":"675_CR18","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbab569","author":"SR Stahlschmidt","year":"2022","unstructured":"Stahlschmidt SR, Ulfenborg B, Synnergren J (2022) Multimodal deep learning for biomedical data fusion: a review. Brief Bioinform. https:\/\/doi.org\/10.1093\/bib\/bbab569","journal-title":"Brief Bioinform"},{"key":"675_CR19","doi-asserted-by":"publisher","DOI":"10.12688\/f1000research.8357.1","author":"S Jasial","year":"2016","unstructured":"Jasial S, Hu Y, Vogt M, Bajorath J (2016) Activity-relevant similarity values for fingerprints and implications for similarity searching. F1000Research. https:\/\/doi.org\/10.12688\/f1000research.8357.1","journal-title":"F1000Research"},{"key":"675_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-016-0148-0","volume":"8","author":"NM O\u2019Boyle","year":"2016","unstructured":"O\u2019Boyle NM, Sayle RA (2016) Comparing structural fingerprints using a literature-based similarity benchmark. J Cheminform 8:1\u201314","journal-title":"J Cheminform"},{"key":"675_CR21","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I et al (2017) Attention is all you need. In: 31st conference on neural information processing systems (NIPS 2017), Long Beach, CA, USA, 2017"},{"key":"675_CR22","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. 
J Chem Inf Model 28:31\u201336","journal-title":"J Chem Inf Model"},{"key":"675_CR23","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321\u2013357","journal-title":"J Artif Intell Res"},{"key":"675_CR24","doi-asserted-by":"publisher","first-page":"1433","DOI":"10.1021\/ci500062f","volume":"54","author":"J Xia","year":"2014","unstructured":"Xia J, Jin H, Liu Z et al (2014) An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs. J Chem Inf Model 54:1433\u20131450","journal-title":"J Chem Inf Model"},{"key":"675_CR25","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1703.07076","author":"EJ Bjerrum","year":"2017","unstructured":"Bjerrum EJ (2017) SMILES enumeration as data augmentation for neural network modeling of molecules. arXiv. https:\/\/doi.org\/10.48550\/arXiv.1703.07076","journal-title":"arXiv"},{"issue":"2","key":"675_CR26","first-page":"281","volume":"13","author":"J Bergstra","year":"2012","unstructured":"Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. 
J Mach Learn Res 13(2):281\u2013305","journal-title":"J Mach Learn Res"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00675-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-022-00675-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00675-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T14:07:31Z","timestamp":1673446051000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-022-00675-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,11]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["675"],"URL":"https:\/\/doi.org\/10.1186\/s13321-022-00675-8","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,11]]},"assertion":[{"value":"13 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 January 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This is not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The 
authors declare no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"5"}}