{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T19:34:38Z","timestamp":1774380878256,"version":"3.50.1"},"reference-count":145,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T00:00:00Z","timestamp":1701302400000},"content-version":"vor","delay-in-days":8,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Within drug discovery, the goal of AI scientists and cheminformaticians is to help identify molecular starting points that will develop into safe and efficacious drugs while reducing costs, time and failure rates. To achieve this goal, it is crucial to represent molecules in a digital format that makes them machine-readable and facilitates the accurate prediction of properties that drive decision-making. Over the years, molecular representations have evolved from intuitive and human-readable formats to bespoke numerical descriptors and fingerprints, and now to learned representations that capture patterns and salient features across vast chemical spaces. Among these, sequence-based and graph-based representations of small molecules have become highly popular. However, each approach has strengths and weaknesses across dimensions such as generality, computational cost, inversibility for generative applications and interpretability, which can be critical in informing practitioners\u2019 decisions. As the drug discovery landscape evolves, opportunities for innovation continue to emerge. 
These include the creation of molecular representations for high-value, low-data regimes, the distillation of broader biological and chemical knowledge into novel learned representations and the modeling of up-and-coming therapeutic modalities.<\/jats:p>","DOI":"10.1093\/bib\/bbad422","type":"journal-article","created":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T20:45:20Z","timestamp":1701377120000},"source":"Crossref","is-referenced-by-count":46,"title":["From intuition to AI: evolution of small molecule representations in drug discovery"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8277-3178","authenticated-orcid":false,"given":"Miles","family":"McGibbon","sequence":"first","affiliation":[{"name":"Institute of Quantitative Biology , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"},{"name":"University of Edinburgh , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6996-3663","authenticated-orcid":false,"given":"Steven","family":"Shave","sequence":"additional","affiliation":[{"name":"Institute of Quantitative Biology , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"},{"name":"University of Edinburgh , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3324-9000","authenticated-orcid":false,"given":"Jie","family":"Dong","sequence":"additional","affiliation":[{"name":"Xiangya School of Pharmaceutical Sciences, Central South University , Changsha, 410013 , China"}]},{"given":"Yumiao","family":"Gao","sequence":"additional","affiliation":[{"name":"Institute of Quantitative Biology , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"},{"name":"University of Edinburgh , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United 
Kingdom"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3469-1546","authenticated-orcid":false,"given":"Douglas R","family":"Houston","sequence":"additional","affiliation":[{"name":"Institute of Quantitative Biology , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"},{"name":"University of Edinburgh , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"}]},{"given":"Jiancong","family":"Xie","sequence":"additional","affiliation":[{"name":"Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University , Guangzhou, 510000 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6782-2813","authenticated-orcid":false,"given":"Yuedong","family":"Yang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Machine Intelligence and Advanced Computing, Sun Yat-Sen University , Guangzhou, 510000 , China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3046-6576","authenticated-orcid":false,"given":"Philippe","family":"Schwaller","sequence":"additional","affiliation":[{"name":"Laboratory of Artificial Chemical Intelligence (LIAC), Institut des Sciences et Ing\u00e9nierie Chimiques, Ecole Polytechnique F\u00e9d\u00e9rale de Lausanne (EPFL) , Lausanne , Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9602-2375","authenticated-orcid":false,"given":"Vincent","family":"Blay","sequence":"additional","affiliation":[{"name":"Institute of Quantitative Biology , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"},{"name":"University of Edinburgh , Biochemistry and Biotechnology, , Edinburgh, Scotland EH9 3BF , United Kingdom"}]}],"member":"286","published-online":{"date-parts":[[2023,11,29]]},"reference":[{"key":"2023113020443340900_ref1","doi-asserted-by":"crossref","DOI":"10.1039\/9781849733069","volume-title":"Nomenclature of Organic Chemistry: IUPAC Recommendations and Preferred 
Names","author":"Favre","year":"2013"},{"key":"2023113020443340900_ref2","first-page":"56","article-title":"Molecular representations in AI-driven drug discovery: a review and practical guide","volume":"12","author":"David","year":"2020","journal-title":"J Chem"},{"key":"2023113020443340900_ref3","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","article-title":"SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules","volume":"28","author":"Weininger","year":"1988","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref4","doi-asserted-by":"crossref","first-page":"2294","DOI":"10.1021\/ci7004687","article-title":"SYBYL line notation (SLN): a single notation to represent chemical structures, queries, reactions, and virtual libraries","volume":"48","author":"Homer","year":"2008","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref5","doi-asserted-by":"crossref","first-page":"045024","DOI":"10.1088\/2632-2153\/aba947","article-title":"Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation","volume":"1","author":"Krenn","year":"2020","journal-title":"Mach Learn Sci Technol"},{"key":"2023113020443340900_ref6","first-page":"23","article-title":"InChI, the IUPAC international chemical identifier","volume":"7","author":"Heller","year":"2015","journal-title":"J Chem"},{"key":"2023113020443340900_ref7","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1021\/ci00007a012","article-title":"Description of several chemical structure file formats used by computer programs developed at molecular design limited","volume":"32","author":"Dalby","year":"1992","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref8","doi-asserted-by":"crossref","first-page":"e1603","DOI":"10.1002\/wcms.1603","article-title":"A review of molecular representation in the age of machine 
learning","volume":"12","author":"Wigh","year":"2022","journal-title":"WIREs Comput Mol Sci"},{"key":"2023113020443340900_ref9","first-page":"27","article-title":"USRCAT: real-time ultrafast shape recognition with pharmacophoric constraints","volume":"4","author":"Schreyer","year":"2012","journal-title":"J Chem"},{"key":"2023113020443340900_ref10","doi-asserted-by":"crossref","first-page":"6144","DOI":"10.1021\/jm049654z","article-title":"A 3D similarity method for scaffold hopping from known drugs or natural ligands to new Chemotypes","volume":"47","author":"Jenkins","year":"2004","journal-title":"J Med Chem"},{"key":"2023113020443340900_ref11","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/j.ddtec.2004.11.007","article-title":"Lead- and drug-like compounds: the rule-of-five revolution","volume":"1","author":"Lipinski","year":"2004","journal-title":"Drug Discov Today Technol"},{"key":"2023113020443340900_ref12","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1038\/nchem.1243","article-title":"Quantifying the chemical beauty of drugs","volume":"4","author":"Bickerton","year":"2012","journal-title":"Nat Chem"},{"key":"2023113020443340900_ref13","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1016\/j.drudis.2006.10.005","article-title":"Similarity-based virtual screening using 2D fingerprints","volume":"11","author":"Willett","year":"2006","journal-title":"Drug Discov Today"},{"key":"2023113020443340900_ref14","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1021\/ci010132r","article-title":"Reoptimization of MDL keys for use in drug discovery","volume":"42","author":"Durant","year":"2002","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref15","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","article-title":"Extended-connectivity fingerprints","volume":"50","author":"Rogers","year":"2010","journal-title":"J Chem Inf 
Model"},{"key":"2023113020443340900_ref16","first-page":"43","article-title":"One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome","volume":"12","author":"Capecchi","year":"2020","journal-title":"J Chem"},{"key":"2023113020443340900_ref17","doi-asserted-by":"crossref","first-page":"3789","DOI":"10.1021\/ja01479a015","article-title":"Chemical Topology1","volume":"83","author":"Frisch","year":"1961","journal-title":"J Am Chem Soc"},{"key":"2023113020443340900_ref18","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/BF01200821","article-title":"Generalized molecular descriptors","volume":"7","author":"Randi\u0107","year":"1991","journal-title":"J Math Chem"},{"key":"2023113020443340900_ref19","doi-asserted-by":"crossref","DOI":"10.1002\/9783527613106","volume-title":"Handbook of Molecular Descriptors","author":"Todeschini","year":"2000"},{"key":"2023113020443340900_ref20","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/S0169-409X(00)00129-0","article-title":"Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings","volume":"46","author":"Lipinski","year":"2001","journal-title":"Adv Drug Deliv Rev"},{"key":"2023113020443340900_ref21","doi-asserted-by":"crossref","first-page":"3743","DOI":"10.1002\/(SICI)1521-3773(19991216)38:24<3743::AID-ANIE3743>3.0.CO;2-U","article-title":"The Design of Leadlike Combinatorial Libraries","volume":"38","author":"Teague","year":"1999","journal-title":"Angew Chem Int Ed"},{"key":"2023113020443340900_ref22","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1023\/A:1015952613760","article-title":"An electrotopological-state index for atoms in molecules","volume":"07","author":"Kier","year":"1990","journal-title":"Pharm Res"},{"key":"2023113020443340900_ref23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0097-8485(94)80016-2","article-title":"Structural descriptors in organic 
chemistry\u2014new topological parameter based on electrotopological state of graph vertices","volume":"18","author":"Voelkel","year":"1994","journal-title":"Comput Chem"},{"key":"2023113020443340900_ref24","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1021\/ci00046a002","article-title":"Atom pairs as molecular features in structure-activity studies: definition and applications","volume":"25","author":"Carhart","year":"1985","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref25","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1021\/ci00019a008","article-title":"Charge indexes. New topological descriptors","volume":"34","author":"Galvez","year":"1994","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref26","doi-asserted-by":"crossref","first-page":"1296","DOI":"10.1002\/(SICI)1096-987X(199608)17:11<1296::AID-JCC2>3.0.CO;2-H","article-title":"Different electrostatic descriptors in comparative molecular field analysis: a comparison of molecular electrostatic and coulomb potentials","volume":"17","author":"Kroemer","year":"1996","journal-title":"J Comput Chem"},{"key":"2023113020443340900_ref27","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1007\/s12039-009-0097-5","article-title":"Signatures of molecular recognition from the topography of electrostatic potential","volume":"121","author":"Roy","year":"2009","journal-title":"J Chem Sci"},{"key":"2023113020443340900_ref28","doi-asserted-by":"crossref","first-page":"1092","DOI":"10.1093\/bioinformatics\/btt105","article-title":"ChemoPy: freely available python package for computational biology and chemoinformatics","volume":"29","author":"Cao","year":"2013","journal-title":"Bioinformatics"},{"key":"2023113020443340900_ref29","doi-asserted-by":"crossref","first-page":"3086","DOI":"10.1021\/ci400127q","article-title":"PyDPI: freely available python package for chemoinformatics, bioinformatics, and chemogenomics 
studies","volume":"53","author":"Cao","year":"2013","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref30","first-page":"16","article-title":"PyBioMed: a python library for various molecular representations of chemicals, proteins and DNAs and their interactions","volume":"10","author":"Dong","year":"2018","journal-title":"J Chem"},{"key":"2023113020443340900_ref31","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1021\/acs.jcim.2c00229","article-title":"MACAW: an accessible tool for molecular embedding and inverse molecular design","volume":"62","author":"Blay","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref32","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1021\/ci025584y","article-title":"The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics","volume":"43","author":"Steinbeck","year":"2003","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref33","first-page":"3","article-title":"jCompoundMapper: an open source Java library and command-line tool for chemical fingerprints","volume":"3","author":"Hinselmann","year":"2011","journal-title":"J Chem"},{"key":"2023113020443340900_ref34","doi-asserted-by":"crossref","first-page":"1337","DOI":"10.1021\/ci800038f","article-title":"Mold2, molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics","volume":"48","author":"Hong","year":"2008","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref35","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1002\/jcc.21707","article-title":"PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints","volume":"32","author":"Yap","year":"2011","journal-title":"J Comput Chem"},{"key":"2023113020443340900_ref36","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1093\/bioinformatics\/btu624","article-title":"Rcpi: R\/Bioconductor package to generate various descriptors of 
proteins, compounds and their interactions","volume":"31","author":"Cao","year":"2015","journal-title":"Bioinformatics"},{"key":"2023113020443340900_ref37","doi-asserted-by":"crossref","first-page":"474","DOI":"10.1093\/bib\/bbz150","article-title":"BioMedR: an R\/CRAN package for integrated data analysis pipeline in biomedical study","volume":"22","author":"Dong","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023113020443340900_ref38","first-page":"60","article-title":"ChemDes: an integrated web-based platform for molecular descriptor and fingerprint computation","volume":"7","author":"Dong","year":"2015","journal-title":"J Chem"},{"key":"2023113020443340900_ref39","first-page":"34","article-title":"BioTriangle: a web-accessible platform for generating various molecular representations for chemicals, proteins, DNAs\/RNAs and their interactions","volume":"8","author":"Dong","year":"2016","journal-title":"J Chem"},{"key":"2023113020443340900_ref40","doi-asserted-by":"crossref","first-page":"160018","DOI":"10.1038\/sdata.2016.18","article-title":"The FAIR guiding principles for scientific data management and stewardship","volume":"3","author":"Wilkinson","year":"2016","journal-title":"Sci Data"},{"key":"2023113020443340900_ref41","first-page":"39","article-title":"From FAIR research data toward FAIR and open research software","volume":"62","author":"Hasselbring","year":"2020","journal-title":"Inf Technol"},{"key":"2023113020443340900_ref42","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1021\/ci00016a005","article-title":"Traditional topological indexes vs electronic, geometrical, and combined molecular descriptors in QSAR\/QSPR research","volume":"33","author":"Katritzky","year":"1993","journal-title":"J Chem Inf Comput Sci"},{"key":"2023113020443340900_ref43","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1021\/ac00220a013","article-title":"Development and use of charged partial surface area structural descriptors in 
computer-assisted quantitative structure-property relationship studies","volume":"62","author":"Stanton","year":"1990","journal-title":"Anal Chem"},{"key":"2023113020443340900_ref44","doi-asserted-by":"crossref","first-page":"681","DOI":"10.2174\/1389557043403765","article-title":"Predicting synthetic accessibility: application in drug discovery and development","volume":"4","author":"Baber","year":"2004","journal-title":"Mini Rev Med Chem"},{"key":"2023113020443340900_ref45","doi-asserted-by":"crossref","first-page":"4350","DOI":"10.1021\/jm020155c","article-title":"Do structurally similar molecules have similar biological activity?","volume":"45","author":"Martin","year":"2002","journal-title":"J Med Chem"},{"key":"2023113020443340900_ref46","doi-asserted-by":"crossref","first-page":"3204","DOI":"10.1039\/b409813g","article-title":"Molecular similarity: a key technique in molecular informatics","volume":"2","author":"Bender","year":"2004","journal-title":"Org Biomol Chem"},{"key":"2023113020443340900_ref47","doi-asserted-by":"crossref","first-page":"3186","DOI":"10.1021\/jm401411z","article-title":"Molecular similarity in medicinal chemistry","volume":"57","author":"Maggiora","year":"2014","journal-title":"J Med Chem"},{"key":"2023113020443340900_ref48","doi-asserted-by":"crossref","first-page":"1340","DOI":"10.3390\/pr11051340","article-title":"Deep learning based methods for molecular similarity searching: a systematic review","volume":"11","author":"Nasser","year":"2023","journal-title":"Processes"},{"key":"2023113020443340900_ref49","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1016\/j.ejphar.2009.06.065","article-title":"Rational drug design","volume":"625","author":"Mandal","year":"2009","journal-title":"Eur J Pharmacol"},{"key":"2023113020443340900_ref50","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1039\/C7SC02664A","article-title":"MoleculeNet: a benchmark for molecular machine learning 
\u2020","volume":"9","author":"Wu","year":"2018","journal-title":"Chem Sci"},{"key":"2023113020443340900_ref51","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1146\/annurev-physchem-042018-052331","article-title":"Machine learning for molecular simulation","volume":"71","author":"No\u00e9","year":"2020","journal-title":"Annu Rev Phys Chem"},{"key":"2023113020443340900_ref52","doi-asserted-by":"crossref","first-page":"e26870","DOI":"10.1002\/qua.26870","article-title":"Molecular representations for machine learning applications in chemistry","volume":"122","author":"Raghunathan","year":"2022","journal-title":"Int J Quantum Chem"},{"key":"2023113020443340900_ref53","doi-asserted-by":"crossref","first-page":"10995","DOI":"10.3390\/ijms222010995","article-title":"Quantum artificial neural network approach to derive a highly predictive 3D-QSAR model for blood-brain barrier passage","volume":"22","author":"Kim","year":"2021","journal-title":"Int J Mol Sci"},{"key":"2023113020443340900_ref54","doi-asserted-by":"crossref","first-page":"5024","DOI":"10.1038\/s41467-019-12875-2","article-title":"Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions","volume":"10","author":"Sch\u00fctt","year":"2019","journal-title":"Nat Commun"},{"key":"2023113020443340900_ref55","doi-asserted-by":"crossref","first-page":"015023","DOI":"10.1088\/2632-2153\/acb900","article-title":"Quantum machine learning framework for virtual screening in drug discovery: a prospective quantum advantage","volume":"4","author":"Mensa","year":"2023","journal-title":"Mach Learn Sci Technol"},{"key":"2023113020443340900_ref56","doi-asserted-by":"crossref","first-page":"10775","DOI":"10.1039\/D2CP00834C","article-title":"\u0394-quantum machine-learning for medicinal chemistry","volume":"24","author":"Atz","year":"2022","journal-title":"Phys Chem Chem 
Phys"},{"key":"2023113020443340900_ref57","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/1752-153X-2-5","article-title":"Pybel: a python wrapper for the OpenBabel cheminformatics toolkit","volume":"2","author":"O\u2019Boyle","year":"2008","journal-title":"Chem Cent J"},{"key":"2023113020443340900_ref58","first-page":"4","article-title":"Mordred: a molecular descriptor calculator","volume":"10","author":"Moriwaki","year":"2018","journal-title":"J Chem"},{"key":"2023113020443340900_ref59","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1007\/978-1-0716-0150-1_32","article-title":"alvaDesc: a tool to calculate and analyze molecular descriptors and fingerprints","author":"Mauri","year":"2020","journal-title":"Ecotoxicol QSARs"},{"key":"2023113020443340900_ref60","first-page":"27","article-title":"ChemSAR: an online pipelining platform for molecular SAR modeling","volume":"9","author":"Dong","year":"2017","journal-title":"J Chem"},{"key":"2023113020443340900_ref61","doi-asserted-by":"crossref","first-page":"8705","DOI":"10.1021\/acs.jmedchem.0c00385","article-title":"Learning molecular representations for medicinal chemistry","volume":"63","author":"Chuang","year":"2020","journal-title":"J Med Chem"},{"key":"2023113020443340900_ref62","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998","journal-title":"Proc IEEE"},{"key":"2023113020443340900_ref63","volume-title":"Algorithms for Computational Biology","author":"Chen","year":"2020"},{"key":"2023113020443340900_ref64","doi-asserted-by":"crossref","first-page":"3400","DOI":"10.1016\/j.bmcl.2018.08.032","article-title":"Quantitative structure\u2013activity relationship analysis using deep learning based on a novel molecular image input technique","volume":"28","author":"Uesawa","year":"2018","journal-title":"Bioorg Med Chem 
Lett"},{"key":"2023113020443340900_ref65","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1186\/s12859-018-2523-5","article-title":"Convolutional neural network based on SMILES representation of compounds for detecting chemical motif","volume":"19","author":"Hirohara","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023113020443340900_ref66","doi-asserted-by":"crossref","first-page":"3383","DOI":"10.3390\/molecules24183383","article-title":"Toxicity prediction method based on multi-channel convolutional neural network","volume":"24","author":"Yuan","year":"2019","journal-title":"Molecules"},{"key":"2023113020443340900_ref67","doi-asserted-by":"crossref","first-page":"4378","DOI":"10.1021\/acs.molpharmaceut.7b01134","article-title":"3D molecular representations based on the wave transform for convolutional neural networks","volume":"15","author":"Kuzminykh","year":"2018","journal-title":"Mol Pharm"},{"key":"2023113020443340900_ref68","doi-asserted-by":"crossref","first-page":"bbab474","DOI":"10.1093\/bib\/bbab474","article-title":"A point cloud-based deep learning strategy for protein-ligand binding affinity prediction","volume":"23","author":"Wang","year":"2022","journal-title":"Brief Bioinform"},{"key":"2023113020443340900_ref69","first-page":"155","volume-title":"Advances in Neural Information Processing Systems","author":"Gens","year":"2014"},{"key":"2023113020443340900_ref70","article-title":"Finding symmetry breaking order parameters with Euclidean neural networks","volume":"3","author":"Geiger","year":"2022","journal-title":"e3nn: Euclidean Neural Networks"},{"key":"2023113020443340900_ref71","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ddtec.2020.11.009","article-title":"A compact review of molecular property prediction with graph neural networks","volume":"37","author":"Wieder","year":"2020","journal-title":"Drug Discov Today 
Technol"},{"key":"2023113020443340900_ref72","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1021\/acs.jcim.6b00601","article-title":"Convolutional embedding of attributed molecular graphs for physical property prediction","volume":"57","author":"Coley","year":"2017","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref73","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.3390\/ijms20143389","article-title":"Chemi-net: a molecular graph convolutional network for accurate drug property prediction","volume":"20","author":"Liu","year":"2019","journal-title":"Int J Mol Sci"},{"key":"2023113020443340900_ref74","doi-asserted-by":"crossref","first-page":"bbac566","DOI":"10.1093\/bib\/bbac566","article-title":"CasANGCL: pre-training and fine-tuning model based on cascaded attention network and graph contrastive learning for molecular property prediction","volume":"24","author":"Zheng","year":"2023","journal-title":"Brief Bioinform"},{"key":"2023113020443340900_ref75","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/j.aiopen.2021.01.001","article-title":"Graph neural networks: a review of methods and applications","volume":"1","author":"Zhou","year":"2020","journal-title":"AI Open"},{"key":"2023113020443340900_ref76","volume-title":"A Fair Comparison of Graph Neural Networks for Graph Classification","author":"Errica","year":"2022"},{"key":"2023113020443340900_ref77","first-page":"2220","article-title":"Rethinking pooling in graph neural networks","volume":"33","author":"Mesquita","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023113020443340900_ref78","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1038\/s42256-022-00447-x","article-title":"Molecular contrastive learning of representations via graph neural networks","volume":"4","author":"Wang","year":"2022","journal-title":"Nat Mach 
Intell"},{"key":"2023113020443340900_ref79","doi-asserted-by":"crossref","first-page":"3948","DOI":"10.1021\/acs.jcim.2c00521","article-title":"SMICLR: contrastive learning on multiple molecular representations for semisupervised and unsupervised representation learning","volume":"62","author":"Pinheiro","year":"2022","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref80","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","article-title":"Automatic chemical design using a data-driven continuous representation of molecules","volume":"4","author":"G\u00f3mez-Bombarelli","year":"2018","journal-title":"ACS Cent Sci"},{"key":"2023113020443340900_ref81","author":"O\u2019Boyle","year":"2018"},{"key":"2023113020443340900_ref82","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1016\/j.aiopen.2022.10.001","article-title":"A survey of transformers","volume":"3","author":"Lin","year":"2022","journal-title":"AI Open"},{"key":"2023113020443340900_ref83","doi-asserted-by":"crossref","first-page":"e82819","DOI":"10.7554\/eLife.82819","article-title":"Transformer-based deep learning for predicting protein properties in the life sciences","volume":"12","author":"Chandra","journal-title":"Elife"},{"key":"2023113020443340900_ref84","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1126\/science.add2187","article-title":"Robust deep learning\u2013based protein sequence design using ProteinMPNN","volume":"378","author":"Dauparas","year":"2022","journal-title":"Science"},{"key":"2023113020443340900_ref85","doi-asserted-by":"crossref","first-page":"lqad021","DOI":"10.1093\/nargab\/lqad021","article-title":"TIS transformer: remapping the human proteome using deep learning","volume":"5","author":"Clauwaert","year":"2023","journal-title":"NAR Genom Bioinform"},{"key":"2023113020443340900_ref86","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","article-title":"Applications of machine learning in 
drug discovery and development","volume":"18","author":"Vamathevan","year":"2019","journal-title":"Nat Rev Drug Discov"},{"key":"2023113020443340900_ref87","doi-asserted-by":"crossref","DOI":"10.1016\/j.isci.2022.105231","article-title":"Improving molecular property prediction through a task similarity enhanced transfer learning strategy","volume":"25","author":"Li","year":"2022","journal-title":"iScience"},{"key":"2023113020443340900_ref88","first-page":"429","volume-title":"Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics","author":"Wang","year":"2023"},{"key":"2023113020443340900_ref89","doi-asserted-by":"crossref","first-page":"e220007","DOI":"10.1148\/ryai.220007","article-title":"Performance of multiple Pretrained BERT models to automate and accelerate data annotation for large datasets","volume":"4","author":"Tejani","year":"2022","journal-title":"Radiol Artif Intell"},{"key":"2023113020443340900_ref90","doi-asserted-by":"crossref","first-page":"015022","DOI":"10.1088\/2632-2153\/ac3ffb","article-title":"Chemformer: a pre-trained transformer for computational chemistry","volume":"3","author":"Irwin","year":"2022","journal-title":"Mach Learn Sci Technol"},{"key":"2023113020443340900_ref91","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1038\/s42256-022-00557-6","article-title":"Accurate prediction of molecular properties and drug targets using a self-supervised image representation learning framework","volume":"4","author":"Zeng","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2023113020443340900_ref92","article-title":"GADTI: graph autoencoder approach for DTI prediction from heterogeneous network","volume":"12","author":"Liu","year":"2021","journal-title":"Front Genet"},{"key":"2023113020443340900_ref93","volume-title":"Attention Is All You 
Need","author":"Vaswani","year":"2017"},{"key":"2023113020443340900_ref94","author":"OpenAI","year":"2023"},{"key":"2023113020443340900_ref95","doi-asserted-by":"crossref","first-page":"2797","DOI":"10.3390\/ijms23052797","article-title":"Unsupervised learning in drug design from self-organization to deep chemistry","volume":"23","author":"Polanski","year":"2022","journal-title":"Int J Mol Sci"},{"key":"2023113020443340900_ref96","doi-asserted-by":"crossref","first-page":"18642","DOI":"10.1021\/acsomega.0c01149","article-title":"Generative model for proposing drug candidates satisfying anticancer properties using a conditional Variational autoencoder","volume":"5","author":"Joo","year":"2020","journal-title":"ACS Omega"},{"key":"2023113020443340900_ref97","first-page":"31","article-title":"Molecular generative model based on conditional variational autoencoder for de novo molecular design","volume":"10","author":"Lim","year":"2018","journal-title":"J Chem"},{"key":"2023113020443340900_ref98","doi-asserted-by":"crossref","first-page":"5714","DOI":"10.1021\/acs.jcim.0c00174","article-title":"The synthesizability of molecules proposed by generative models","volume":"60","author":"Gao","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2023113020443340900_ref99","author":"Chithrananda","year":"2020"},{"key":"2023113020443340900_ref100","volume-title":"Molecular Representation Learning with Language Models and Domain-Relevant Auxiliary Tasks","author":"Fabian","year":"2020"},{"key":"2023113020443340900_ref101","volume-title":"MegaMolBart: Generally Applicable Chemical AI Models with Large-Scale Pretrained Transformers"},{"key":"2023113020443340900_ref102","doi-asserted-by":"crossref","first-page":"1256","DOI":"10.1038\/s42256-022-00580-7","article-title":"Large-scale chemical language representations capture molecular structure and properties","volume":"4","author":"Ross","year":"2022","journal-title":"Nat Mach 
Intell"},{"key":"2023113020443340900_ref103","doi-asserted-by":"crossref","first-page":"899","DOI":"10.1016\/j.scib.2022.01.029","article-title":"X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis","volume":"67","author":"Xue","year":"2022","journal-title":"Sci Bull"},{"key":"2023113020443340900_ref104","first-page":"12","article-title":"Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models","volume":"13","author":"Jiang","year":"2021","journal-title":"J Chem"},{"key":"2023113020443340900_ref105","doi-asserted-by":"crossref","first-page":"8683","DOI":"10.1021\/acs.jmedchem.9b02147","article-title":"Transfer learning for drug discovery","volume":"63","author":"Cai","year":"2020","journal-title":"J Med Chem"},{"key":"2023113020443340900_ref106","doi-asserted-by":"crossref","first-page":"74","DOI":"10.3389\/fphar.2018.00074","article-title":"Transfer and multi-task learning in QSAR Modeling: advances and challenges","volume":"9","author":"Sim\u00f5es","year":"2018","journal-title":"Front Pharmacol"},{"key":"2023113020443340900_ref107","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2023113020443340900_ref108","doi-asserted-by":"crossref","first-page":"4874","DOI":"10.1038\/s41467-020-18671-7","article-title":"Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates","volume":"11","author":"Pesciullesi","year":"2020","journal-title":"Nat Commun"},{"key":"2023113020443340900_ref109","first-page":"27","article-title":"Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT","volume":"12","author":"Li","year":"2020","journal-title":"J 
Chem"},{"key":"2023113020443340900_ref110","volume-title":"Language Models are Few-Shot Learners","author":"Brown","year":"2020"},{"key":"2023113020443340900_ref111","volume-title":"Is GPT All You Need for Low-Data Discovery in Chemistry?","author":"Jablonka","year":"2023"},{"key":"2023113020443340900_ref112","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.1016\/j.drudis.2020.11.037","article-title":"Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data","volume":"26","author":"Bender","year":"2021","journal-title":"Drug Discov Today"},{"key":"2023113020443340900_ref113","volume-title":"LightSeq2: Accelerated Training for Transformer-based Models on GPUs","author":"Wang","year":"2022"},{"key":"2023113020443340900_ref114","volume-title":"Sparks of Artificial General Intelligence: Early Experiments with GPT-4","author":"Bubeck","year":"2023"},{"key":"2023113020443340900_ref115","author":"Huang","year":"2023"},{"key":"2023113020443340900_ref116","author":"Bran","year":"2023"},{"key":"2023113020443340900_ref117","author":"Schick","year":"2023"},{"key":"2023113020443340900_ref118","volume-title":"Emergent Autonomous Scientific Research Capabilities of Large Language Models","author":"Boiko","year":"2023"},{"key":"2023113020443340900_ref119","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1038\/s42256-022-00465-9","article-title":"Dual use of artificial-intelligence-powered drug discovery","volume":"4","author":"Urbina","year":"2022","journal-title":"Nat Mach Intell"},{"key":"2023113020443340900_ref120","volume-title":"Censoring Chemical Data to Mitigate Dual Use Risk","author":"Campbell","year":"2023"},{"key":"2023113020443340900_ref121","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1038\/nbt.2786","article-title":"Clinical development success rates for investigational drugs","volume":"32","author":"Hay","year":"2014","journal-title":"Nat 
Biotechnol"},{"key":"2023113020443340900_ref122","doi-asserted-by":"crossref","first-page":"e0147215","DOI":"10.1371\/journal.pone.0147215","article-title":"When quality beats quantity: decision theory, drug discovery, and the reproducibility crisis","volume":"11","author":"Scannell","year":"2016","journal-title":"PloS One"},{"key":"2023113020443340900_ref123","doi-asserted-by":"crossref","first-page":"712","DOI":"10.1038\/nrd3439-c1","article-title":"Believe it or not: how much can we rely on published data on potential drug targets?","volume":"10","author":"Prinz","year":"2011","journal-title":"Nat Rev Drug Discov"},{"key":"2023113020443340900_ref124","first-page":"9","article-title":"Towards reproducible computational drug discovery","volume":"12","author":"Schaduangrat","year":"2020","journal-title":"J Chem"},{"key":"2023113020443340900_ref125","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1016\/j.cell.2015.11.031","article-title":"A new golden age of natural products drug discovery","volume":"163","author":"Shen","year":"2015","journal-title":"Cell"},{"key":"2023113020443340900_ref126","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1038\/s41573-020-00114-z","article-title":"Natural products in drug discovery: advances and opportunities","volume":"20","author":"Atanasov","year":"2021","journal-title":"Nat Rev Drug Discov"},{"key":"2023113020443340900_ref127","doi-asserted-by":"crossref","first-page":"5498","DOI":"10.1039\/D2CS00197G","article-title":"Chasing molecular glue degraders: screening approaches","volume":"51","author":"Domostegui","year":"2022","journal-title":"Chem Soc Rev"},{"key":"2023113020443340900_ref128","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1038\/s41573-021-00371-6","article-title":"PROTAC targeted protein degraders: the past is prologue","volume":"21","author":"B\u00e9k\u00e9s","year":"2022","journal-title":"Nat Rev Drug 
Discov"},{"key":"2023113020443340900_ref129","doi-asserted-by":"crossref","first-page":"1153","DOI":"10.1039\/D0BM01755H","article-title":"Cell-penetrating peptides (CPPs): an overview of applications for improving the potential of nanotherapeutics","volume":"9","author":"Desale","year":"2021","journal-title":"Biomater Sci"},{"key":"2023113020443340900_ref130","doi-asserted-by":"crossref","first-page":"1807","DOI":"10.1016\/j.drudis.2020.07.024","article-title":"High-throughput screening: today\u2019s biochemical and cell-based approaches","volume":"25","author":"Blay","year":"2020","journal-title":"Drug Discov Today"},{"key":"2023113020443340900_ref131","doi-asserted-by":"crossref","first-page":"103351","DOI":"10.1016\/j.drudis.2022.103351","article-title":"Combining DELs and machine learning for toxicology prediction","volume":"27","author":"Blay","year":"2022","journal-title":"Drug Discov Today"},{"key":"2023113020443340900_ref132","first-page":"08.16.504181","author":"Bachas","year":"2022"},{"key":"2023113020443340900_ref133","article-title":"Pre-training molecular graph representation with 3D geometry","author":"Liu","year":"2021"},{"key":"2023113020443340900_ref134","doi-asserted-by":"crossref","first-page":"7946","DOI":"10.1021\/acs.jmedchem.2c00487","article-title":"On the frustration to predict binding affinities from protein\u2013ligand structures with deep neural networks","volume":"65","author":"Volkov","year":"2022","journal-title":"J Med Chem"},{"key":"2023113020443340900_ref135","doi-asserted-by":"crossref","first-page":"2614","DOI":"10.1093\/bioinformatics\/bty114","article-title":"A global network of biomedical relationships derived from text","volume":"34","author":"Percha","year":"2018","journal-title":"Bioinformatics"},{"key":"2023113020443340900_ref136","doi-asserted-by":"crossref","first-page":"bbaa344","DOI":"10.1093\/bib\/bbaa344","article-title":"PharmKG: a dedicated knowledge graph benchmark for bomedical data 
mining","volume":"22","author":"Zheng","year":"2021","journal-title":"Brief Bioinform"},{"key":"2023113020443340900_ref137","doi-asserted-by":"crossref","first-page":"3275","DOI":"10.1016\/j.chempr.2022.08.015","article-title":"Selective functionalization of hindered meta-C\u2013H bond of o-alkylaryl ketones promoted by automation and deep learning","volume":"8","author":"Qiu","year":"2022","journal-title":"Chem"},{"key":"2023113020443340900_ref138","doi-asserted-by":"crossref","first-page":"1087","DOI":"10.1038\/s41587-020-0502-7","article-title":"Extending the small-molecule similarity principle to all levels of biology with the chemical checker","volume":"38","author":"Duran-Frigola","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2023113020443340900_ref139","article-title":"Multi-view graph neural networks for molecular property prediction","author":"Ma","year":"2020"},{"key":"2023113020443340900_ref140","doi-asserted-by":"crossref","first-page":"16297","DOI":"10.1021\/acs.jpcc.2c03051","article-title":"Improving material property prediction by leveraging the large-scale computational database and deep learning","volume":"126","author":"Chen","year":"2022","journal-title":"J Phys Chem C"},{"key":"2023113020443340900_ref141","first-page":"6","article-title":"Development of natural compound molecular fingerprint (NC-MFP) with the dictionary of natural products (DNP) for natural product-based drug development","volume":"12","author":"Seo","year":"2020","journal-title":"J Chem"},{"key":"2023113020443340900_ref142","doi-asserted-by":"crossref","DOI":"10.1016\/j.patter.2022.100628","article-title":"Quantitative evaluation of explainable graph neural networks for molecular property prediction","volume":"3","author":"Rao","year":"2022","journal-title":"Patterns"},{"key":"2023113020443340900_ref143","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1038\/nature06684","article-title":"The MC-fold and MC-Sym pipeline infers RNA structure from sequence 
data","volume":"452","author":"Parisien","year":"2008","journal-title":"Nature"},{"key":"2023113020443340900_ref144","doi-asserted-by":"crossref","first-page":"998","DOI":"10.1016\/j.chembiol.2020.07.020","article-title":"PROTACs: an emerging therapeutic modality in precision medicine","volume":"27","author":"Nalawansha","year":"2020","journal-title":"Cell Chem Biol"},{"key":"2023113020443340900_ref145","doi-asserted-by":"crossref","first-page":"7133","DOI":"10.1038\/s41467-022-34807-3","article-title":"DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs","volume":"13","author":"Li","year":"2022","journal-title":"Nat Commun"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad422\/53933271\/bbad422.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad422\/53933271\/bbad422.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,30]],"date-time":"2023-11-30T20:46:01Z","timestamp":1701377161000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad422\/7455245"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,22]]},"references-count":145,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad422","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2023,11,22]]},"article-number":"bbad422"}}