{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T03:32:38Z","timestamp":1768534358402,"version":"3.49.0"},"reference-count":75,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T00:00:00Z","timestamp":1646611200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep learning\u2019s automatic feature extraction has been a revolutionary addition to computational drug discovery, infusing both the capabilities of learning abstract features and discovering complex molecular patterns via learning from molecular data. Since biological and chemical knowledge are necessary for overcoming the challenges of data curation, balancing, training, and evaluation, it is important for databases to contain information regarding the exact target and disease of each bioassay. The existing depositories such as PubChem or ChEMBL offer the screening data for millions of molecules against a variety of cells and targets, however, their bioassays contain complex biological descriptions which can hinder their usage by the machine learning community. In this work, a comprehensive disease and target-based dataset is collected from PubChem in order to facilitate and accelerate molecular machine learning for better drug discovery. 
MolData is one of the largest efforts to date for democratizing molecular machine learning, with roughly 170 million drug screening results from 1.4 million unique molecules assigned to specific diseases and targets. It also provides 30 unique categories of targets and diseases. Correlation analysis of the MolData bioassays unveils valuable information for drug repurposing for multiple diseases including cancer, metabolic disorders, and infectious diseases. Finally, we provide a benchmark of more than 30 models trained on each category using multitask learning. MolData aims to pave the way for computational drug discovery and accelerate the advancement of molecular artificial intelligence in a practical manner. The MolData benchmark data is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/GitHub.com\/Transilico\/MolData\">https:\/\/GitHub.com\/Transilico\/MolData<\/jats:ext-link> as well as within the additional files.<\/jats:p>","DOI":"10.1186\/s13321-022-00590-y","type":"journal-article","created":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T08:04:34Z","timestamp":1646640274000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":26,"title":["MolData, a molecular benchmark for disease and target based machine learning"],"prefix":"10.1186","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4050-0897","authenticated-orcid":false,"given":"Arash","family":"Keshavarzi Arshadi","sequence":"first","affiliation":[]},{"given":"Milad","family":"Salem","sequence":"additional","affiliation":[]},{"given":"Arash","family":"Firouzbakht","sequence":"additional","affiliation":[]},{"given":"Jiann 
Shiun","family":"Yuan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,3,7]]},"reference":[{"issue":"2","key":"590_CR1","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1039\/C7SC02664A","volume":"9","author":"Z Wu","year":"2018","unstructured":"Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513\u2013530. https:\/\/doi.org\/10.1039\/C7SC02664A","journal-title":"Chem Sci"},{"key":"590_CR2","doi-asserted-by":"publisher","DOI":"10.1016\/j.drudis.2018.01.039","author":"H Chen","year":"2018","unstructured":"Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018) The rise of deep learning in drug discovery. Drug Discovery Today. https:\/\/doi.org\/10.1016\/j.drudis.2018.01.039","journal-title":"Drug Discovery Today"},{"issue":"10","key":"590_CR3","doi-asserted-by":"publisher","first-page":"4311","DOI":"10.1021\/acs.molpharmaceut.8b00930","volume":"15","author":"A Zhavoronkov","year":"2018","unstructured":"Zhavoronkov A (2018) Artificial intelligence for drug discovery, biomarker development, and generation of novel chemistry. Mol Pharm 15(10):4311\u20134313. https:\/\/doi.org\/10.1021\/acs.molpharmaceut.8b00930","journal-title":"Mol Pharm"},{"issue":"6","key":"590_CR4","doi-asserted-by":"publisher","first-page":"2697","DOI":"10.1021\/ACS.JCIM.0C01489","volume":"61","author":"D Deng","year":"2021","unstructured":"Deng D, Chen X, Zhang R, Lei Z, Wang X, Zhou F (2021) XGraphBoost: extracting graph neural network-based features for a better prediction of molecular properties. J Chem Inf Model 61(6):2697\u20132705. https:\/\/doi.org\/10.1021\/ACS.JCIM.0C01489","journal-title":"J Chem Inf Model"},{"key":"590_CR5","doi-asserted-by":"crossref","unstructured":"Minnich AJ, McLoughlin K, Tse M, Deng J, Weber A, Murad N, Allen JE. AMPL: A Data-Driven Modeling Pipeline for Drug Discovery. 
2019.","DOI":"10.1021\/acs.jcim.9b01053"},{"key":"590_CR6","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/J.IJINFOMGT.2019.01.021","volume":"48","author":"Y Duan","year":"2019","unstructured":"Duan Y, Edwards JS, Dwivedi YK (2019) Artificial intelligence for decision making in the era of Big Data \u2013 evolution, challenges and research agenda. Int J Inf Manage 48:63\u201371. https:\/\/doi.org\/10.1016\/J.IJINFOMGT.2019.01.021","journal-title":"Int J Inf Manage"},{"key":"590_CR7","doi-asserted-by":"publisher","DOI":"10.1155\/2021\/6675279","author":"SK Hussin","year":"2021","unstructured":"Hussin SK, Abdelmageid SM, Alkhalil A, Omar YM, Marie MI, Ramadan RA (2021) Handling imbalance classification virtual screening big data using machine learning algorithms. Complexity. https:\/\/doi.org\/10.1155\/2021\/6675279","journal-title":"Complexity"},{"issue":"1","key":"590_CR8","doi-asserted-by":"publisher","first-page":"1874","DOI":"10.1021\/ACSOMEGA.8B03173","volume":"4","author":"A Karim","year":"2019","unstructured":"Karim A, Mishra A, Newton MAH, Sattar A (2019) Efficient toxicity prediction via simple features using shallow neural networks and decision trees. ACS Omega 4(1):1874\u20131888. https:\/\/doi.org\/10.1021\/ACSOMEGA.8B03173","journal-title":"ACS Omega"},{"key":"590_CR9","doi-asserted-by":"publisher","first-page":"80","DOI":"10.3389\/fenvs.2015.00080","volume":"3","author":"A Mayr","year":"2016","unstructured":"Mayr A, Klambauer G, Unterthiner T, Hochreiter S (2016) DeepTox: toxicity prediction using deep learning. Front Environ Sci 3:80. https:\/\/doi.org\/10.3389\/fenvs.2015.00080","journal-title":"Front Environ Sci"},{"issue":"D1","key":"590_CR10","doi-asserted-by":"publisher","first-page":"D400","DOI":"10.1093\/NAR\/GKR1132","volume":"40","author":"Y Wang","year":"2012","unstructured":"Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Bryant SH (2012) PubChem\u2019s BioAssay database. Nucleic Acids Res 40(D1):D400\u2013D412. 
https:\/\/doi.org\/10.1093\/NAR\/GKR1132","journal-title":"Nucleic Acids Res"},{"key":"590_CR11","unstructured":"PubChem. (n.d.). Accessed 6 Oct 2021. https:\/\/pubchem.ncbi.nlm.nih.gov\/"},{"issue":"W1","key":"590_CR12","doi-asserted-by":"publisher","first-page":"W612","DOI":"10.1093\/NAR\/GKV352","volume":"43","author":"M Davies","year":"2015","unstructured":"Davies M, Nowotka M, Papadatos G, Dedman N, Gaulton A, Atkinson F, Overington JP (2015) ChEMBL web services: Streamlining access to drug discovery data and utilities. Nucleic Acids Res 43(W1):W612\u2013W620. https:\/\/doi.org\/10.1093\/NAR\/GKV352","journal-title":"Nucleic Acids Res"},{"key":"590_CR13","unstructured":"ChemSpider | Search and share chemistry. (n.d.). Accessed 6 Oct 2021. http:\/\/www.chemspider.com\/"},{"issue":"11","key":"590_CR14","doi-asserted-by":"publisher","DOI":"10.1371\/JOURNAL.PONE.0049198","volume":"7","author":"UD Vempati","year":"2012","unstructured":"Vempati UD, Przydzial MJ, Chung C, Abeyruwan S, Mir A, Sakurai K, Sch\u00fcrer SC (2012) Formalization, annotation and analysis of diverse drug and probe screening assay datasets using the BioAssay Ontology (BAO). PLoS ONE 7(11):e49198. https:\/\/doi.org\/10.1371\/JOURNAL.PONE.0049198","journal-title":"PLoS ONE"},{"issue":"11","key":"590_CR15","doi-asserted-by":"publisher","first-page":"49198","DOI":"10.1371\/journal.pone.0049198","volume":"7","author":"UD Vempati","year":"2012","unstructured":"Vempati UD, Przydzial MJ, Chung C, Abeyruwan S, Mir A (2012) Formalization, annotation and analysis of diverse drug and probe screening assay datasets using the BioAssay Ontology (BAO). PLoS ONE 7(11):49198. https:\/\/doi.org\/10.1371\/journal.pone.0049198","journal-title":"PLoS ONE"},{"key":"590_CR16","unstructured":"Ramsundar B, Kearnes S, Riley P, Webster D, Konerding D, Pande V. Massively Multitask Networks for Drug Discovery. arXiv:1502.02072v1 [stat.ML]. 
2015."},{"key":"590_CR17","unstructured":"Merck Molecular Activity Challenge | Kaggle. (n.d.). https:\/\/www.kaggle.com\/c\/MerckActivity. Accessed 7 Oct 2021."},{"issue":"2","key":"590_CR18","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1021\/ACS.CHEMRESTOX.0C00264","volume":"34","author":"AM Richard","year":"2020","unstructured":"Richard AM, Huang R, Waidyanatha S, Shinn P, Collins BJ, Thillainadarajah I, Tice RR (2020) The Tox21 10K Compound Library: collaborative chemistry advancing toxicology. Chem Res Toxicol 34(2):189\u2013216. https:\/\/doi.org\/10.1021\/ACS.CHEMRESTOX.0C00264","journal-title":"Chem Res Toxicol"},{"key":"590_CR19","doi-asserted-by":"publisher","DOI":"10.3389\/fenvs.2015.00080","author":"T Unterthiner","year":"2015","unstructured":"Unterthiner T, Mayr A, Klambauer G, Hochreiter S (2015). Toxicity Predict Deep Learn. https:\/\/doi.org\/10.3389\/fenvs.2015.00080","journal-title":"Toxicity Predict Deep Learn."},{"key":"590_CR20","unstructured":"chemprop\/chemprop: Message Passing Neural Networks for Molecule Property Prediction. (n.d.). https:\/\/github.com\/chemprop\/chemprop. Accessed 2 Jan 2022."},{"key":"590_CR21","doi-asserted-by":"publisher","unstructured":"Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, Kang J. Data and text mining BioBERT: a pre-trained biomedical language representation model for biomedical text mining. https:\/\/doi.org\/10.1093\/bioinformatics\/btz682","DOI":"10.1093\/bioinformatics\/btz682"},{"key":"590_CR22","unstructured":"Devlin J, Chang M-W, Lee K, Google KT, Language AI. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding."},{"issue":"8","key":"590_CR23","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1007\/s10822-016-9938-8","volume":"30","author":"S Kearnes","year":"2016","unstructured":"Kearnes S, McCloskey K, Berndl M, Pande V, Riley P (2016) Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 30(8):595\u2013608. 
https:\/\/doi.org\/10.1007\/s10822-016-9938-8","journal-title":"J Comput Aided Mol Des"},{"key":"590_CR24","unstructured":"Data Sources - PubChem. https:\/\/pubchem.ncbi.nlm.nih.gov\/sources\/#sort=Live-BioAssay-Count. Accessed 7 Oct 2021."},{"key":"590_CR25","unstructured":"Tox21 - PubChem Data Source. https:\/\/pubchem.ncbi.nlm.nih.gov\/source\/824. Accessed 7 Oct 2021."},{"key":"590_CR26","doi-asserted-by":"publisher","DOI":"10.1093\/DATABASE\/BAW068","author":"J Li","year":"2016","unstructured":"Li J, Sun Y, Johnson RJ, Sciaky D, Wei C-H, Leaman R, Lu Z (2016) BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database. https:\/\/doi.org\/10.1093\/DATABASE\/BAW068","journal-title":"Database"},{"key":"590_CR27","unstructured":"Diabetes. https:\/\/www.who.int\/news-room\/fact-sheets\/detail\/diabetes. Accessed 2 Jan 2022."},{"key":"590_CR28","doi-asserted-by":"publisher","DOI":"10.1038\/nrd.2016.230","author":"R Santos","year":"2017","unstructured":"Santos R, Ursu O, Gaulton A, Patr\u00edcia Bento A, Donadi RS, Bologa CG, Overington JP (2017) A comprehensive map of molecular drug targets. Nat Publ Group. https:\/\/doi.org\/10.1038\/nrd.2016.230","journal-title":"Nat Publ Group"},{"issue":"5","key":"590_CR29","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-Connectivity Fingerprints. J Chem Inf Model 50(5):742\u2013754. https:\/\/doi.org\/10.1021\/ci100050t","journal-title":"J Chem Inf Model"},{"issue":"1","key":"590_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/S13321-017-0195-1\/FIGURES\/6","volume":"9","author":"E Fern\u00e1ndez-De Gortari","year":"2017","unstructured":"Fern\u00e1ndez-De Gortari E, Garc\u00eda-Jacas CR, Martinez-Mayorga K, Medina-Franco JL (2017) Database fingerprint (DFP): an approach to represent molecular databases. J Cheminform 9(1):1\u20139. 
https:\/\/doi.org\/10.1186\/S13321-017-0195-1\/FIGURES\/6","journal-title":"J Cheminform"},{"issue":"1","key":"590_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/S13321-015-0069-3\/FIGURES\/7","volume":"7","author":"D Bajusz","year":"2015","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K (2015) Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform 7(1):1\u201313. https:\/\/doi.org\/10.1186\/S13321-015-0069-3\/FIGURES\/7","journal-title":"J Cheminform"},{"key":"590_CR32","doi-asserted-by":"publisher","first-page":"65","DOI":"10.3389\/frai.2020.00065","volume":"3","author":"A Keshavarzi Arshadi","year":"2020","unstructured":"Keshavarzi Arshadi A, Webb J, Salem M, Cruz E, Calad-Thomson S, Ghadirian N, Yuan JS (2020) Artificial Intelligence for COVID-19 drug discovery and vaccine development. Front Artif Intell 3:65. https:\/\/doi.org\/10.3389\/frai.2020.00065","journal-title":"Front Artif Intell"},{"key":"590_CR33","doi-asserted-by":"publisher","DOI":"10.1021\/acsinfecdis.5b00030","author":"PB Madrid","year":"2015","unstructured":"Madrid PB, Panchal RG, Warren TK, Shurtleff AC, Endsley AN, Green CE, Tanga MJ (2015) Evaluation of Ebola Virus Inhibitors for Drug Repurposing. Drug. https:\/\/doi.org\/10.1021\/acsinfecdis.5b00030","journal-title":"Drug"},{"issue":"6","key":"590_CR34","doi-asserted-by":"publisher","first-page":"941","DOI":"10.3201\/EID2006.131302","volume":"20","author":"SE Schachterle","year":"2014","unstructured":"Schachterle SE, Mtove G, Levens JP, Clemens E, Shi L, Raj A, Sullivan DJ (2014) Short-term malaria reduction by single-dose azithromycin during mass drug administration for Trachoma, Tanzania. Emerg Infect Dis 20(6):941\u2013949. 
https:\/\/doi.org\/10.3201\/EID2006.131302","journal-title":"Emerg Infect Dis"},{"key":"590_CR35","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2019.01526","author":"AK Arshadi","year":"2020","unstructured":"Arshadi AK, Salem M, Collins J, Yuan JS, Chakrabarti D (2020) Deepmalaria: Artificial intelligence driven discovery of potent antiplasmodials. Front Pharmacol. https:\/\/doi.org\/10.3389\/fphar.2019.01526","journal-title":"Front Pharmacol"},{"issue":"1","key":"590_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1475-2875-13-458","volume":"13","author":"I Sagara","year":"2014","unstructured":"Sagara I, Oduro AR, Mulenga M, Dieng Y, Ogutu B, Tiono AB, Dunne MW (2014) Efficacy and safety of a combination of azithromycin and chloroquine for the treatment of uncomplicated Plasmodium falciparum malaria in two multi-country randomised clinical trials in African adults. Malar J 13(1):1\u201310. https:\/\/doi.org\/10.1186\/1475-2875-13-458","journal-title":"Malar J"},{"issue":"2","key":"590_CR37","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1210\/ER.2017-00176","volume":"39","author":"A Lamri","year":"2018","unstructured":"Lamri A, Pigeyre M, Garver WS, Meyre D (2018) The Extending Spectrum of NPC1-related human disorders: from Niemann-Pick C1 Disease to Obesity. Endocr Rev 39(2):192. https:\/\/doi.org\/10.1210\/ER.2017-00176","journal-title":"Endocr Rev"},{"issue":"11","key":"590_CR38","doi-asserted-by":"publisher","first-page":"1558","DOI":"10.1096\/FJ.04-2714FJE","volume":"19","author":"K, N., A, C., K, D., DK, S., EL, H.","year":"2005","unstructured":"K, N., A, C., K, D., DK, S., EL, H., DL, M., RE, P. (2005) Protein transduction of Rab9 in Niemann-Pick C cells reduces cholesterol storage. FASEB J 19(11):1558\u20131560. 
https:\/\/doi.org\/10.1096\/FJ.04-2714FJE","journal-title":"FASEB J"},{"issue":"9","key":"590_CR39","doi-asserted-by":"publisher","first-page":"4928","DOI":"10.1093\/NAR\/GKAA255","volume":"48","author":"S Giovannini","year":"2020","unstructured":"Giovannini S, Weller M-C, Hanzl\u00edkov\u00e1 H, Shiota T, Takeda S, Jiricny J (2020) ATAD5 deficiency alters DNA damage metabolism and sensitizes cells to PARP inhibition. Nucleic Acids Res 48(9):4928\u20134939. https:\/\/doi.org\/10.1093\/NAR\/GKAA255","journal-title":"Nucleic Acids Res"},{"key":"590_CR40","doi-asserted-by":"crossref","unstructured":"Pensa S, Regis G, Boselli D, Novelli F, Poli V. STAT1 and STAT3 in Tumorigenesis: Two Sides of the Same Coin? 2013.","DOI":"10.4161\/jkst.20045"},{"issue":"8","key":"590_CR41","doi-asserted-by":"publisher","first-page":"5078","DOI":"10.4049\/JIMMUNOL.176.8.5078","volume":"176","author":"A Chapgier","year":"2006","unstructured":"Chapgier A, Wynn RF, Jouanguy E, Filipe-Santos O, Zhang S, Feinberg J, Arkwright PD (2006) Human Complete Stat-1 Deficiency Is Associated with Defective Type I and II IFN responses in vitro but immunity to some low virulence viruses in vivo. J Immunol 176(8):5078\u20135083. https:\/\/doi.org\/10.4049\/JIMMUNOL.176.8.5078","journal-title":"J Immunol"},{"issue":"7426","key":"590_CR42","doi-asserted-by":"publisher","first-page":"1271","DOI":"10.1136\/BMJ.327.7426.1271","volume":"327","author":"JK Richmond","year":"2003","unstructured":"Richmond JK, Baglole DJ (2003) Lassa fever: epidemiology, clinical features, and social consequences. BMJ 327(7426):1271. https:\/\/doi.org\/10.1136\/BMJ.327.7426.1271","journal-title":"BMJ"},{"key":"590_CR43","unstructured":"Lassa fever. https:\/\/www.who.int\/health-topics\/lassa-fever#tab=tab_1. 
Accessed 7 Oct 2021."},{"issue":"2","key":"590_CR44","doi-asserted-by":"publisher","first-page":"191","DOI":"10.2174\/187152609787847730","volume":"9","author":"G Og","year":"2009","unstructured":"Og G, Be J, Mr V, Wj V, Gw T, He L (2009) Drug targets in infections with Ebola and Marburg viruses. Infect Disord Drug Targets 9(2):191\u2013200. https:\/\/doi.org\/10.2174\/187152609787847730","journal-title":"Infect Disord Drug Targets"},{"key":"590_CR45","unstructured":"Marburg virus disease. https:\/\/www.who.int\/news-room\/fact-sheets\/detail\/marburg-virus-disease. Accessed 7 Oct 2021."},{"issue":"9","key":"590_CR46","doi-asserted-by":"publisher","first-page":"1696","DOI":"10.3201\/EID2409.180233","volume":"24","author":"K Rosenke","year":"2018","unstructured":"Rosenke K, Feldmann H, Westover JB, Hanley PW, Martellaro C, Feldmann F, Safronetz D (2018) Use of favipiravir to treat lassa virus infection in macaques. Emerg Infect Dis 24(9):1696\u20131699. https:\/\/doi.org\/10.3201\/EID2409.180233","journal-title":"Emerg Infect Dis"},{"key":"590_CR47","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1016\/J.ANTIVIRAL.2017.12.021","volume":"151","author":"B Sl","year":"2018","unstructured":"Sl B, Tm B, J, W., KS, W., SA, V. T., L, D., TK, W. (2018) Efficacy of favipiravir (T-705) in nonhuman primates infected with Ebola virus or Marburg virus. Antiviral Res 151:97\u2013104. https:\/\/doi.org\/10.1016\/J.ANTIVIRAL.2017.12.021","journal-title":"Antiviral Res"},{"key":"590_CR48","doi-asserted-by":"publisher","first-page":"5","DOI":"10.3390\/molecules21050559","volume":"21","author":"H Yuan","year":"2016","unstructured":"Yuan H, Ma Q, Ye L, Piao G (2016) The Traditional Medicine and Modern Medicine from Natural Products. Molecules (Basel, Switzerland) 21:5. 
https:\/\/doi.org\/10.3390\/molecules21050559","journal-title":"Molecules (Basel, Switzerland)"},{"issue":"2","key":"590_CR49","doi-asserted-by":"publisher","first-page":"303","DOI":"10.3390\/metabo2020303","volume":"2","author":"DA Dias","year":"2012","unstructured":"Dias DA, Urban S, Roessner U (2012) A historical overview of natural products in drug discovery. Metabolites 2(2):303\u2013336. https:\/\/doi.org\/10.3390\/metabo2020303","journal-title":"Metabolites"},{"issue":"3","key":"590_CR50","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/nrd1657","volume":"4","author":"FE Koehn","year":"2005","unstructured":"Koehn FE, Carter GT (2005) The evolving role of natural products in drug discovery. Nat Rev Drug Discovery 4(3):206\u2013220. https:\/\/doi.org\/10.1038\/nrd1657","journal-title":"Nat Rev Drug Discovery"},{"key":"590_CR51","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/978-3-7643-8117-2_6","volume-title":"Natural Compounds as Drugs","author":"JM Rollinger","year":"2008","unstructured":"Rollinger JM, Stuppner H, Langer T (2008) Virtual screening for the discovery of bioactive natural products. Natural Compounds as Drugs, vol I. Basel, Birkh\u00e4user Basel, pp 211\u2013249"},{"issue":"3","key":"590_CR52","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1016\/j.cbpa.2011.03.004","volume":"15","author":"J Hong","year":"2011","unstructured":"Hong J (2011) Role of natural product diversity in chemical biology. Curr Opin Chem Biol 15(3):350\u2013354. https:\/\/doi.org\/10.1016\/j.cbpa.2011.03.004","journal-title":"Curr Opin Chem Biol"},{"key":"590_CR53","unstructured":"Early Translation Branch (ETB) | National Center for Advancing Translational Sciences. https:\/\/ncats.nih.gov\/etb. Accessed 22 Oct 2021."},{"key":"590_CR54","unstructured":"Broad Institute. https:\/\/www.broadinstitute.org\/. 
Accessed 22 Oct 2021."},{"key":"590_CR55","doi-asserted-by":"publisher","DOI":"10.1161\/CIRCULATIONAHA.121.057480","author":"S Khurshid","year":"2021","unstructured":"Khurshid S, Friedman S, Reeder C, di Achille P, Diamant N, Singh P, Lubitz SA (2021) Electrocardiogram-based Deep Learning and Clinical Risk Factors to Predict Atrial Fibrillation. Circulation. https:\/\/doi.org\/10.1161\/CIRCULATIONAHA.121.057480","journal-title":"Circulation"},{"key":"590_CR56","unstructured":"Home | SBP. https:\/\/www.sbpdiscovery.org\/. Accessed 22 Oct 2021."},{"issue":"2","key":"590_CR57","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1016\/J.CELL.2020.11.042","volume":"184","author":"JZ Shen","year":"2021","unstructured":"Shen JZ, Qiu Z, Wu Q, Finlay D, Garcia G, Sun D, Spruck C (2021) FBXO44 promotes DNA replication-coupled repetitive element silencing in cancer cells. Cell 184(2):352-369.e23. https:\/\/doi.org\/10.1016\/J.CELL.2020.11.042","journal-title":"Cell"},{"key":"590_CR58","unstructured":"UNM Center for Molecular Discovery | University of New Mexico flow cytometry research center. http:\/\/nmmlsc.health.unm.edu\/. Accessed 22 Oct 2021."},{"issue":"8","key":"590_CR59","doi-asserted-by":"publisher","first-page":"733","DOI":"10.1016\/S1074-5521(03)00170-4","volume":"10","author":"A Vogt","year":"2003","unstructured":"Vogt A, Cooley KA, Brisson M, Tarpley MG, Wipf P, Lazo JS (2003) Cell-active dual specificity phosphatase inhibitors identified by high-content screening. Chem Biol 10(8):733\u2013742. https:\/\/doi.org\/10.1016\/S1074-5521(03)00170-4","journal-title":"Chem Biol"},{"key":"590_CR60","unstructured":"Biological Discovery through Chemical Innovation | Emory University | Atlanta GA. https:\/\/bdci.emory.edu\/. 
Accessed 22 Oct 2021."},{"key":"590_CR61","doi-asserted-by":"publisher","DOI":"10.1016\/J.CELREP.2021.108991\/ATTACHMENT\/0319A4A3-170A-4D46-A15C-1AD356390813\/MMC1.PDF","author":"N Raj","year":"2021","unstructured":"Raj N, McEachin ZT, Harousseau W, Zhou Y, Zhang F, Merritt-Garza ME, Bassell GJ (2021) Cell-type-specific profiling of human cellular models of fragile X syndrome reveal PI3K-dependent defects in translation and neurogenesis. Cell Rep. https:\/\/doi.org\/10.1016\/J.CELREP.2021.108991\/ATTACHMENT\/0319A4A3-170A-4D46-A15C-1AD356390813\/MMC1.PDF","journal-title":"Cell Rep"},{"key":"590_CR62","unstructured":"Toxicology in the 21st Century (Tox21) | National Center for Advancing Translational Sciences. https:\/\/ncats.nih.gov\/tox21. Accessed 22 Oct 2021."},{"key":"590_CR63","doi-asserted-by":"publisher","unstructured":"Linnenbrink EPA. United states federal government tox21 collaboration advancing toxicology to improve environmental health and pharmaceutical safety. Overview. https:\/\/doi.org\/10.14573\/altex.1803011","DOI":"10.14573\/altex.1803011"},{"key":"590_CR64","unstructured":"Lead Identification | Scripps Florida. https:\/\/hts.florida.scripps.edu\/. Accessed 22 Oct 2021."},{"key":"590_CR65","doi-asserted-by":"publisher","unstructured":"Identification of potent small molecule inhibitors of SARS-CoV-2 entry. (2021). SLAS Discovery. https:\/\/doi.org\/10.1016\/J.SLASD.2021.10.012","DOI":"10.1016\/J.SLASD.2021.10.012"},{"key":"590_CR66","unstructured":"Johns Hopkins Ion Channel Center - PubChem Data Source. https:\/\/pubchem.ncbi.nlm.nih.gov\/source\/Johns Hopkins Ion Channel Center. Accessed 22 Oct 2021."},{"key":"590_CR67","doi-asserted-by":"publisher","DOI":"10.1021\/ACSCHEMBIO.1C00721","author":"M Dasovich","year":"2021","unstructured":"Dasovich M, Zhuo J, Goodman JA, Thomas A, McPherson RL, Jayabalan AK, Leung AKL (2021) High-Throughput Activity Assay for Screening Inhibitors of the SARS-CoV-2 Mac1 Macrodomain. ACS Chem Biol. 
https:\/\/doi.org\/10.1021\/ACSCHEMBIO.1C00721","journal-title":"ACS Chem Biol"},{"key":"590_CR68","unstructured":"ICCB-Longwood Screening Facility. https:\/\/iccb.med.harvard.edu\/. Accessed 22 Oct 2021."},{"issue":"11","key":"590_CR69","doi-asserted-by":"publisher","first-page":"2309","DOI":"10.1038\/NPROT.2013.130","volume":"8","author":"EH Mashalidis","year":"2013","unstructured":"Mashalidis EH, \u015aled\u00e5 P, Lang S, Abell C (2013) A three-stage biophysical screening cascade for fragment-based drug discovery. Nat Protoc 8(11):2309\u20132324. https:\/\/doi.org\/10.1038\/NPROT.2013.130","journal-title":"Nat Protoc"},{"key":"590_CR70","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In: Advances in Neural Information Processing Systems; 2013. https:\/\/arxiv.org\/abs\/1310.4546v1"},{"key":"590_CR71","unstructured":"Pyysalo S, Ginter F, Moen H, Salakoski T, Ananiadou S. Distributional Semantics Resources for Biomedical Text Processing. https:\/\/github.com\/spyysalo\/nxml2txt"},{"key":"590_CR72","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference; 2018; 1. p. 4171\u20134186. https:\/\/arxiv.org\/abs\/1810.04805v2"},{"key":"590_CR73","unstructured":"HMMER. http:\/\/hmmer.org\/. Accessed 7 Oct 2021."},{"issue":"1","key":"590_CR74","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/CI00057A005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules. J Chem Inf Comput Sci 28(1):31\u201336. 
https:\/\/doi.org\/10.1021\/CI00057A005","journal-title":"J Chem Inf Comput Sci"},{"issue":"15","key":"590_CR75","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/JM9602928","volume":"39","author":"WB Gub","year":"1996","unstructured":"Gub WB, Murcko MA (1996) The Properties of Known Drugs. 1. Molecular Frameworks. J Medic Chem 39(15):2887\u20132893. https:\/\/doi.org\/10.1021\/JM9602928","journal-title":"J Medic Chem"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00590-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-022-00590-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-022-00590-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,7]],"date-time":"2022-03-07T08:09:38Z","timestamp":1646640578000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-022-00590-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,7]]},"references-count":75,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["590"],"URL":"https:\/\/doi.org\/10.1186\/s13321-022-00590-y","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,7]]},"assertion":[{"value":"26 October 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 February 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 March 
2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"10"}}