{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T18:22:09Z","timestamp":1776363729626,"version":"3.51.2"},"reference-count":158,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T00:00:00Z","timestamp":1721001600000},"content-version":"vor","delay-in-days":53,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100027426","name":"Schmidt Futures","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100027426","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,5,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. 
We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https:\/\/github.com\/gersteinlab\/GenAI4Drug.<\/jats:p>","DOI":"10.1093\/bib\/bbae338","type":"journal-article","created":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T11:20:29Z","timestamp":1721042429000},"source":"Crossref","is-referenced-by-count":74,"title":["A survey of generative AI for <i>de novo<\/i> drug design: new frontiers in molecule and protein generation"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-2700-4513","authenticated-orcid":false,"given":"Xiangru","family":"Tang","sequence":"first","affiliation":[{"name":"Department of Computer Science, Yale University , New Haven, CT 06520 , United States"}]},{"given":"Howard","family":"Dai","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Yale University , New Haven, CT 06520 , United States"}]},{"given":"Elizabeth","family":"Knight","sequence":"additional","affiliation":[{"name":"School of Medicine, Yale University , New Haven, CT 06520 , United States"}]},{"given":"Fang","family":"Wu","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University , CA 94305 , United States"}]},{"given":"Yunyang","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Yale University , New Haven, CT 06520 , United States"}]},{"given":"Tianxiao","family":"Li","sequence":"additional","affiliation":[{"name":"Program in Computational Biology & Bioinformatics, Yale University , New Haven, CT 06520 , United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9746-3719","authenticated-orcid":false,"given":"Mark","family":"Gerstein","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Yale University , New Haven, CT 06520 , United 
States"},{"name":"Program in Computational Biology & Bioinformatics, Yale University , New Haven, CT 06520 , United States"},{"name":"Department of Statistics & Data Science, Yale University , New Haven, CT 06520 , United States"},{"name":"Department of Biomedical Informatics & Data Science, Yale University , New Haven, CT 06520 , United States"},{"name":"Department of Molecular Biophysics & Biochemistry, Yale University , New Haven, CT 06520 , United States"}]}],"member":"286","published-online":{"date-parts":[[2024,7,15]]},"reference":[{"key":"2024071511201558800_ref1","doi-asserted-by":"crossref","first-page":"1960","DOI":"10.1126\/science.287.5460.1960","article-title":"Drug discovery: a historical perspective","volume":"287","author":"Drews","year":"2000","journal-title":"Science"},{"key":"2024071511201558800_ref2","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1016\/j.ejphar.2009.06.065","article-title":"Rational drug design","volume":"625","author":"Mandal","year":"2009","journal-title":"Eur J Pharmacol"},{"key":"2024071511201558800_ref3","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.sbi.2018.01.006","article-title":"Statistical and machine learning approaches to predicting protein\u2013ligand interactions","volume":"49","author":"Colwell","year":"2018","journal-title":"Curr Opin Struct Biol"},{"key":"2024071511201558800_ref4","first-page":"1","article-title":"Comparison of preclinical development programs for small molecules (drugs\/pharmaceuticals) and large molecules (biologics\/biopharmaceuticals): studies, timing, materials, and costs","author":"Horvath","year":"2010","journal-title":"Pharmaceutical Sciences Encyclopedia: Drug Discovery, Development, and Manufacturing"},{"key":"2024071511201558800_ref5","doi-asserted-by":"crossref","first-page":"334","DOI":"10.1124\/pr.112.007336","article-title":"Computational methods in drug discovery","volume":"66","author":"Sliwoski","year":"2014","journal-title":"Pharmacol 
Rev"},{"key":"2024071511201558800_ref6","doi-asserted-by":"crossref","first-page":"353","DOI":"10.1038\/s41573-019-0050-3","article-title":"Rethinking drug design in the artificial intelligence era","volume":"19","author":"Petra Schneider","year":"2020","journal-title":"Nat Rev Drug Discov"},{"key":"2024071511201558800_ref7","first-page":"1","article-title":"Deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era","volume":"20","author":"Jing","year":"2018","journal-title":"AAPS J"},{"key":"2024071511201558800_ref8","doi-asserted-by":"crossref","first-page":"2618","DOI":"10.1021\/acs.jcim.7b00274","article-title":"Interpretation of quantitative structure\u2013activity relationship models: past, present, and future","volume":"57","author":"Polishchuk","year":"2017","journal-title":"J Chem Inf Model"},{"key":"2024071511201558800_ref9","article-title":"A practical overview of quantitative structure-activity relationship","volume":"8","author":"Isarankura-Na-Ayudhya","year":"2009","journal-title":"EXCLI"},{"key":"2024071511201558800_ref10","doi-asserted-by":"crossref","first-page":"24131","DOI":"10.1039\/C7TA01812F","article-title":"High-throughput screening of bimetallic catalysts enabled by machine learning","volume":"5","author":"Li","year":"2017","journal-title":"J Mater Chem A"},{"key":"2024071511201558800_ref11","first-page":"e1478","volume":"11","author":"Li","year":"2021","journal-title":"Wiley interdisciplinary reviews: computational molecular. 
Science"},{"key":"2024071511201558800_ref12","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1038\/s41592-019-0496-6","article-title":"Machine-learning-guided directed evolution for protein engineering","volume":"16","author":"Yang","year":"2019","journal-title":"Nat Methods"},{"key":"2024071511201558800_ref13","doi-asserted-by":"crossref","first-page":"8852","DOI":"10.1073\/pnas.1901979116","article-title":"Machine learning-assisted directed protein evolution with combinatorial libraries","volume":"116","author":"Wu","year":"2019","journal-title":"Proc Natl Acad Sci"},{"key":"2024071511201558800_ref14","first-page":"299","article-title":"De novo drug design","author":"Hartenfeller","year":"2011","journal-title":"Chemoinformatics and computational chemical biology"},{"key":"2024071511201558800_ref15","doi-asserted-by":"crossref","first-page":"1676","DOI":"10.3390\/ijms22041676","article-title":"Advances in de novo drug design: from conventional to machine learning methods","volume":"22","author":"Mouchlis","year":"2021","journal-title":"Int J Mol Sci"},{"key":"2024071511201558800_ref16","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1517\/17460441.2016.1146250","article-title":"Use of machine learning approaches for novel drug discovery","volume":"11","author":"Lima","year":"2016","journal-title":"Expert Opin Drug Discovery"},{"key":"2024071511201558800_ref17","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.sbi.2021.10.001","article-title":"Deep learning approaches for de novo drug design: an overview","volume":"72","author":"Wang","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2024071511201558800_ref18","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1517\/17460441.2010.497534","article-title":"De novo design: balancing novelty and confined chemical space","volume":"5","author":"Kutchukian","year":"2010","journal-title":"Expert Opin Drug 
Discovery"},{"key":"2024071511201558800_ref19","first-page":"139","article-title":"Computational approaches for de novo drug design: past, present, and future","author":"Liu","year":"2020","journal-title":"Artificial neural networks"},{"key":"2024071511201558800_ref20","doi-asserted-by":"crossref","first-page":"1972","DOI":"10.1056\/NEJMc1504317","article-title":"The cost of drug development","volume":"372","author":"DiMasi","year":"2015","journal-title":"N Engl J Med"},{"key":"2024071511201558800_ref21","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1016\/j.copbio.2007.04.009","article-title":"Progress in computational protein design","volume":"18","author":"Lippow","year":"2007","journal-title":"Curr Opin Biotechnol"},{"key":"2024071511201558800_ref22","doi-asserted-by":"crossref","first-page":"246","DOI":"10.3390\/ijms17020246","article-title":"Systems pharmacology in small molecular drug discovery","volume":"17","author":"Zhou","year":"2016","journal-title":"Int J Mol Sci"},{"key":"2024071511201558800_ref23","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1038\/nchem.1243","article-title":"Quantifying the chemical beauty of drugs","volume":"4","author":"Richard Bickerton","year":"2012","journal-title":"Nat Chem"},{"key":"2024071511201558800_ref24","first-page":"760","article-title":"Understanding drug-likeness","volume":"1","author":"Ursu","year":"2011","journal-title":"Wiley interdisciplinary reviews: computational molecular Science"},{"key":"2024071511201558800_ref25","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1007\/s10822-013-9672-4","article-title":"Estimation of the size of drug-like chemical space based on gdb-17 data","volume":"27","author":"Polishchuk","year":"2013","journal-title":"J Comput Aided Mol Des"},{"key":"2024071511201558800_ref26","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.jhealeco.2016.01.012","article-title":"Innovation in the pharmaceutical industry: new estimates of r&d 
costs","volume":"47","author":"DiMasi","year":"2016","journal-title":"J Health Econ"},{"key":"2024071511201558800_ref27","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1038\/d41573-022-00025-1","article-title":"Ai in small-molecule drug discovery: a coming wave","volume":"21","author":"Jayatunga","year":"2022","journal-title":"Nat Rev Drug Discov"},{"key":"2024071511201558800_ref28","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbac102","article-title":"Protein design via deep learning","volume":"23","author":"Ding","year":"2022","journal-title":"Brief Bioinform"},{"key":"2024071511201558800_ref29","doi-asserted-by":"crossref","first-page":"100142","DOI":"10.1016\/j.patter.2020.100142","article-title":"Deep learning in protein structural modeling and design","volume":"1","author":"Gao","year":"2020","journal-title":"Patterns"},{"key":"2024071511201558800_ref30","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1038\/nature19946","article-title":"The coming of age of de novo protein design","volume":"537","author":"Huang","year":"2016","journal-title":"Nature"},{"key":"2024071511201558800_ref31","article-title":"Ontoprotein: protein pretraining with gene ontology embedding","author":"Zhang","year":"2022"},{"key":"2024071511201558800_ref32","first-page":"2023\u201301","article-title":"Protein representation learning via knowledge enhanced primary structure modeling","author":"Zhou","year":"2023"},{"key":"2024071511201558800_ref33","first-page":"2023\u201302","article-title":"Retrieved sequence augmentation for protein representation learning","author":"Ma","year":"2023"},{"key":"2024071511201558800_ref34","doi-asserted-by":"crossref","first-page":"866","DOI":"10.1038\/nrm2805","article-title":"Exploring protein fitness landscapes by directed evolution","volume":"10","author":"Romero","year":"2009","journal-title":"Nat Rev Mol Cell 
Biol"},{"key":"2024071511201558800_ref35","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1126\/science.278.5335.82","article-title":"De novo protein design: fully automated sequence selection","volume":"278","author":"Dahiyat","year":"1997","journal-title":"Science"},{"key":"2024071511201558800_ref36","article-title":"A systematic survey in geometric deep learning for structure-based drug design","author":"Zhang","year":"2023"},{"key":"2024071511201558800_ref37","doi-asserted-by":"crossref","first-page":"102559","DOI":"10.1016\/j.sbi.2023.102559","article-title":"Integrating structure-based approaches in generative molecular design","volume":"79","author":"Thomas","year":"2023","journal-title":"Curr Opin Struct Biol"},{"key":"2024071511201558800_ref38","doi-asserted-by":"crossref","first-page":"2008790","DOI":"10.1080\/19420862.2021.2008790","article-title":"Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies","volume":"14","author":"Akbar","year":"2022","journal-title":"MAbs"},{"key":"2024071511201558800_ref39","doi-asserted-by":"crossref","first-page":"102379","DOI":"10.1016\/j.sbi.2022.102379","article-title":"Advances in computational structure-based antibody design","volume":"74","author":"Hummer","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2024071511201558800_ref40","doi-asserted-by":"crossref","first-page":"100473","DOI":"10.1016\/j.cobme.2023.100473","article-title":"Ai models for protein design are driving antibody engineering","volume":"28","author":"Chungyoun","year":"2023","journal-title":"Current opinion Biomed Eng"},{"key":"2024071511201558800_ref41","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.tips.2022.12.005","article-title":"Computational and artificial intelligence-based methods for antibody development","volume":"44","author":"Kim","year":"2023","journal-title":"Trends Pharmacol Sci"},{"key":"2024071511201558800_ref42","article-title":"A survey 
on graph diffusion models: generative ai in science for molecule, protein and material","author":"Zhang","year":"2023"},{"key":"2024071511201558800_ref43","article-title":"Diffusion models in bioinformatics: a new wave of deep learning revolution in action","author":"Guo","year":"2023"},{"key":"2024071511201558800_ref44","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Advances in neural information processing systems"},{"key":"2024071511201558800_ref45","article-title":"Auto-encoding variational bayes","author":"Kingma","year":"2013"},{"key":"2024071511201558800_ref46","first-page":"1530","article-title":"Variational inference with normalizing flows","volume-title":"International conference on machine learning","author":"Rezende","year":"2015"},{"key":"2024071511201558800_ref47","article-title":"Diffusion models: a comprehensive survey of methods and applications","author":"Yang","year":"2022"},{"key":"2024071511201558800_ref48","doi-asserted-by":"crossref","first-page":"3797","DOI":"10.1109\/TIT.2014.2320500","article-title":"R\u00e9nyi divergence and kullback-leibler divergence","volume":"60","author":"Van Erven","year":"2014","journal-title":"IEEE Trans Inf Theory"},{"key":"2024071511201558800_ref49","first-page":"17981","article-title":"Structured denoising diffusion models in discrete state-spaces","volume":"34","author":"Austin","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024071511201558800_ref50","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Advances in neural information processing systems"},{"key":"2024071511201558800_ref51","first-page":"579","article-title":"Multilayer perceptron and neural networks","volume":"8","author":"Popescu","year":"2009","journal-title":"WSEAS Transactions on Circuits and Systems"},{"key":"2024071511201558800_ref52","article-title":"A tutorial on 
energy-based learning","volume":"1","author":"LeCun","year":"2006","journal-title":"Predicting structured data"},{"key":"2024071511201558800_ref53","first-page":"1105","article-title":"Learning deep energy models","volume-title":"Proceedings of the 28th international conference on machine learning (ICML-11)","author":"Ngiam","year":"2011"},{"key":"2024071511201558800_ref54","article-title":"Schnet: a continuous-filter convolutional neural network for modeling quantum interactions","volume":"30","author":"Sch\u00fctt","year":"2017","journal-title":"Advances in neural information processing systems"},{"key":"2024071511201558800_ref55","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2024071511201558800_ref56","article-title":"Bert: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018"},{"key":"2024071511201558800_ref57","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1109\/TNN.2008.2005605","article-title":"The graph neural network model","volume":"20","author":"Scarselli","year":"2008","journal-title":"IEEE Trans Neural Netw"},{"key":"2024071511201558800_ref58","first-page":"9323","article-title":"E (n) equivariant graph neural networks","volume-title":"International conference on machine learning","author":"Satorras","year":"2021"},{"key":"2024071511201558800_ref59","first-page":"1263","article-title":"Neural message passing for quantum chemistry","volume-title":"International conference on machine learning","author":"Gilmer","year":"2017"},{"key":"2024071511201558800_ref60","article-title":"Semi-supervised classification with graph convolutional networks","author":"Kipf","year":"2016"},{"key":"2024071511201558800_ref61","article-title":"How powerful are graph neural 
networks?","author":"Xu","year":"2018"},{"key":"2024071511201558800_ref62","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput"},{"key":"2024071511201558800_ref63","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.patcog.2017.10.013","article-title":"Recent advances in convolutional neural networks","volume":"77","author":"Jiuxiang","year":"2018","journal-title":"Pattern Recognit"},{"key":"2024071511201558800_ref64","article-title":"An introduction to convolutional neural networks","author":"O\u2019Shea","year":"2015"},{"key":"2024071511201558800_ref65","first-page":"2023","article-title":"Mollm: a unified language model for integrating biomedical text with 2d and 3d molecular representations","author":"Tang","year":"2023\u201311"},{"key":"2024071511201558800_ref66","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2014.22","article-title":"Quantum chemistry structures and properties of 134 kilo molecules","volume":"1","author":"Ramakrishnan","year":"2014","journal-title":"Scientific data"},{"key":"2024071511201558800_ref67","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1038\/s41597-022-01288-4","article-title":"Geom, energy-annotated molecular conformations for property prediction and molecular generation","volume":"9","author":"Axelrod","year":"2022","journal-title":"Scientific Data"},{"key":"2024071511201558800_ref68","article-title":"Top-n: Equivariant set and graph generation without exchangeability","author":"Vignac","year":"2021"},{"key":"2024071511201558800_ref69","article-title":"Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules","volume":"32","author":"Gebauer","year":"2019","journal-title":"Advances in neural information processing 
systems"},{"key":"2024071511201558800_ref70","article-title":"Geometric latent diffusion models for 3d molecule generation","author":"Xu","year":"2023"},{"key":"2024071511201558800_ref71","first-page":"4181","volume-title":"E(n) equivariant normalizing flows. Advances in Neural Information Processing Systems","year":"2021"},{"key":"2024071511201558800_ref72","article-title":"Geometry-complete diffusion for 3d molecule generation","author":"Morehead","year":"2023"},{"key":"2024071511201558800_ref73","article-title":"Mdm: molecular diffusion model for 3d molecule generation","author":"Huang","year":"2022"},{"key":"2024071511201558800_ref74","article-title":"Learning joint 2d & 3d diffusion models for complete molecule generation","author":"Huang","year":"2023"},{"key":"2024071511201558800_ref75","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-43415-0_33","article-title":"Midi: mixed graph and 3d denoising diffusion for molecule generation","author":"Vignac","year":"2023"},{"key":"2024071511201558800_ref76","first-page":"8867","article-title":"Equivariant diffusion for molecule generation in 3d","volume-title":"International Conference on Machine Learning","author":"Hoogeboom","year":"2022"},{"key":"2024071511201558800_ref77","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","article-title":"Automatic chemical design using a data-driven continuous representation of molecules","volume":"4","author":"G\u00f3mez-Bombarelli","year":"2018","journal-title":"ACS Cent Sci."},{"key":"2024071511201558800_ref78","first-page":"1945","article-title":"Grammar variational autoencoder","volume-title":"International conference on machine learning","author":"Kusner","year":"2017"},{"key":"2024071511201558800_ref79","article-title":"Syntax-directed variational autoencoder for structured data","author":"Dai","year":"2018"},{"key":"2024071511201558800_ref80","first-page":"2323","article-title":"Junction tree variational autoencoder for molecular 
graph generation","volume-title":"International conference on machine learning","author":"Jin","year":"2018"},{"key":"2024071511201558800_ref81","doi-asserted-by":"crossref","first-page":"4200","DOI":"10.1021\/acs.jcim.0c00411","article-title":"Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design","volume":"60","author":"Francoeur","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2024071511201558800_ref82","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1002\/prot.20512","article-title":"Binding moad (mother of all databases)","volume":"60","author":"Liegi","year":"2005","journal-title":"Proteins"},{"key":"2024071511201558800_ref83","doi-asserted-by":"crossref","first-page":"6065","DOI":"10.1021\/acs.jcim.0c00675","article-title":"Zinc20\u2014a free ultralarge-scale chemical database for ligand discovery","volume":"60","author":"Irwin","year":"2020","journal-title":"J Chem Inf Model"},{"key":"2024071511201558800_ref84","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref85","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1002\/jcc.21334","article-title":"Autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading","volume":"31","author":"Trott","year":"2010","journal-title":"J Comput Chem"},{"key":"2024071511201558800_ref86","first-page":"1","article-title":"Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions","volume":"1","author":"Ertl","year":"2009","journal-title":"J Chem"},{"key":"2024071511201558800_ref87","article-title":"Tanimoto","author":"Taffee","year":"1958","journal-title":"Elementary mathematical theory of classification and 
prediction"},{"key":"2024071511201558800_ref88","doi-asserted-by":"crossref","DOI":"10.1101\/2023.06.29.543848","article-title":"Druggpt: a gpt-based strategy for designing potential ligands targeting specific proteins","author":"Li","year":"2023"},{"key":"2024071511201558800_ref89","article-title":"Generating 3d molecular structures conditional on a receptor binding site with deep generative models","author":"Masuda","year":"2020"},{"key":"2024071511201558800_ref90","first-page":"17644","article-title":"Pocket2mol: Efficient molecular sampling based on 3d protein pockets","volume-title":"International Conference on Machine Learning","author":"Peng","year":"2022"},{"key":"2024071511201558800_ref91","first-page":"6229","article-title":"A 3d generative model for structure-based drug design","volume":"34","author":"Luo","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024071511201558800_ref92","article-title":"3d equivariant diffusion for target-aware molecule generation and affinity prediction","author":"Guan","year":"2023"},{"key":"2024071511201558800_ref93","article-title":"Structure-based drug design with equivariant diffusion models","author":"Schneuing","year":"2022"},{"key":"2024071511201558800_ref94","volume-title":"Biochemistry, essential amino acids","author":"Lopez","year":"2020"},{"key":"2024071511201558800_ref95","first-page":"D465","article-title":"Norine: update of the nonribosomal peptide resource","volume":"48","author":"Flissi","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref96","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1002\/prot.340230308","article-title":"Protein structure prediction by threading methods: evaluation of current techniques","volume":"23","author":"Lemer","year":"1995","journal-title":"Proteins"},{"key":"2024071511201558800_ref97","doi-asserted-by":"crossref","first-page":"509","DOI":"10.1002\/0471721204.ch25","volume-title":"Homology 
modeling Structural bioinformatics","author":"Krieger","year":"2003"},{"key":"2024071511201558800_ref98","doi-asserted-by":"crossref","first-page":"1607","DOI":"10.1002\/prot.26237","article-title":"Critical assessment of methods of protein structure prediction (casp)\u2014round xiv","volume":"89","author":"Kryshtafovych","year":"2021","journal-title":"Proteins"},{"key":"2024071511201558800_ref99","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1002\/prot.25431","article-title":"Continuous automated model evaluation (cameo) complementing the critical assessment of structure prediction in casp12","volume":"86","author":"Haas","year":"2018","journal-title":"Proteins"},{"key":"2024071511201558800_ref100","doi-asserted-by":"crossref","first-page":"3370","DOI":"10.1093\/nar\/gkg571","article-title":"Lga: a method for finding 3d similarities in protein structures","volume":"31","author":"Zemla","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref101","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Yang","year":"2004","journal-title":"Proteins"},{"key":"2024071511201558800_ref102","doi-asserted-by":"crossref","first-page":"2722","DOI":"10.1093\/bioinformatics\/btt473","article-title":"Lddt: a local superposition-free score for comparing protein structures and models using distance difference tests","volume":"29","author":"Mariani","year":"2013","journal-title":"Bioinformatics"},{"key":"2024071511201558800_ref103","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2024071511201558800_ref104","article-title":"Eigenfold: generative protein structure prediction with diffusion 
models","author":"Jing","year":"2023"},{"key":"2024071511201558800_ref105","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2024071511201558800_ref106","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2024071511201558800_ref107","article-title":"The trrosetta server for fast and accurate protein structure prediction","author":"Zongyang","year":"2021","journal-title":"Nature News"},{"key":"2024071511201558800_ref108","doi-asserted-by":"crossref","first-page":"2389","DOI":"10.1038\/s41467-023-38063-x","article-title":"Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies","volume":"14","author":"Ruffolo","year":"2023","journal-title":"Nat Commun"},{"key":"2024071511201558800_ref109","article-title":"Deciphering antibody affinity maturation with language models and weakly supervised learning","author":"Ruffolo","year":"2021"},{"key":"2024071511201558800_ref110","first-page":"2022\u201311","article-title":"Tfold-ab: fast and accurate antibody structure prediction without sequence homologs","author":"Wu","year":"2022"},{"key":"2024071511201558800_ref111","doi-asserted-by":"crossref","first-page":"953","DOI":"10.1098\/rsif.2008.0085","article-title":"How much of protein sequence space has been explored by life on earth?","volume":"5","author":"Dryden","year":"2008","journal-title":"Journal of The Royal Society 
Interface"},{"key":"2024071511201558800_ref112","doi-asserted-by":"crossref","first-page":"btae037","DOI":"10.1093\/bioinformatics\/btae037","article-title":"Multi-indicator comparative evaluation for deep learning-based protein sequence design methods","volume":"40","author":"Yu","year":"2024","journal-title":"Bioinformatics"},{"key":"2024071511201558800_ref113","doi-asserted-by":"crossref","first-page":"115D","DOI":"10.1093\/nar\/gkh131","article-title":"Uniprot: the universal protein knowledgebase","volume":"32","author":"Apweiler","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref114","doi-asserted-by":"crossref","first-page":"D376","DOI":"10.1093\/nar\/gku947","volume":"43","author":"Sillitoe","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref115","doi-asserted-by":"crossref","first-page":"2565","DOI":"10.1002\/prot.24620","article-title":"Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles","volume":"82","author":"Li","year":"2014","journal-title":"Proteins"},{"key":"2024071511201558800_ref116","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1093\/bioinformatics\/btg224","article-title":"Pisces: a protein sequence culling server","volume":"19","author":"Wang","year":"2003","journal-title":"Bioinformatics"},{"key":"2024071511201558800_ref117","doi-asserted-by":"crossref","first-page":"2947","DOI":"10.1093\/bioinformatics\/btm404","article-title":"Clustal w and clustal x version 2.0","volume":"23","author":"Larkin","year":"2007","journal-title":"Bioinformatics"},{"key":"2024071511201558800_ref118","first-page":"2023\u201303","article-title":"Proteinvae: Variational autoencoder for translational protein 
design","author":"Lyu","year":"2023"},{"key":"2024071511201558800_ref119","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"Prottrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2021","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2024071511201558800_ref120","doi-asserted-by":"crossref","DOI":"10.1101\/2023.01.23.525232","article-title":"Prot-vae: protein transformer variational autoencoder for functional protein design","author":"Sevgen","year":"2023"},{"key":"2024071511201558800_ref121","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1038\/s42256-021-00310-5","article-title":"Expanding functional protein sequence spaces using generative adversarial networks","volume":"3","author":"Repecka","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2024071511201558800_ref122","doi-asserted-by":"crossref","first-page":"402","DOI":"10.1016\/j.cels.2020.08.016","article-title":"Fast and flexible protein design using deep graph neural networks","volume":"11","author":"Strokach","year":"2020","journal-title":"Cell Syst"},{"key":"2024071511201558800_ref123","article-title":"Pifold: toward effective and efficient protein inverse folding","author":"Gao","year":"2022"},{"key":"2024071511201558800_ref124","doi-asserted-by":"crossref","first-page":"746","DOI":"10.1038\/s41467-022-28313-9","article-title":"Protein sequence design with a learned potential","volume":"13","author":"Anand","year":"2022","journal-title":"Nat Commun"},{"key":"2024071511201558800_ref125","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1038\/s43588-022-00273-6","article-title":"Rotamer-free protein sequence design based on deep learning and self-consistency","volume":"2","author":"Liu","year":"2022","journal-title":"Nat Comput 
Sci"},{"key":"2024071511201558800_ref126","doi-asserted-by":"crossref","first-page":"7434","DOI":"10.1038\/s41467-023-43166-6","article-title":"Prorefiner: an entropy-based refining strategy for inverse protein folding with global graph attention","volume":"14","author":"Zhou","year":"2023","journal-title":"Nat Commun"},{"key":"2024071511201558800_ref127","author":"Jing"},{"key":"2024071511201558800_ref128","first-page":"8946","article-title":"Learning inverse folding from millions of predicted structures","volume-title":"International Conference on Machine Learning","author":"Hsu","year":"2022"},{"key":"2024071511201558800_ref129","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1126\/science.add2187","article-title":"Robust deep learning\u2013based protein sequence design using proteinmpnn","volume":"378","author":"Dauparas","year":"2022","journal-title":"Science"},{"key":"2024071511201558800_ref130","doi-asserted-by":"crossref","first-page":"bbae135","DOI":"10.1093\/bib\/bbae135","article-title":"Graphormer supervised de novo protein design method and function validation","volume":"25","author":"Junxi","year":"2024","journal-title":"Brief Bioinform"},{"key":"2024071511201558800_ref131","first-page":"28877","article-title":"Do transformers really perform badly for graph representation?","volume":"34","author":"Ying","year":"2021","journal-title":"Advances in neural information processing systems"},{"key":"2024071511201558800_ref132","first-page":"8844","article-title":"Msa transformer","volume-title":"International Conference on Machine Learning","author":"Rao","year":"2021"},{"key":"2024071511201558800_ref133","doi-asserted-by":"crossref","first-page":"D439","DOI":"10.1093\/nar\/gkab1061","article-title":"Alphafold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models","volume":"50","author":"Varadi","year":"2022","journal-title":"Nucleic Acids 
Res"},{"key":"2024071511201558800_ref134","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"Scop: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J Mol Biol"},{"key":"2024071511201558800_ref135","doi-asserted-by":"crossref","first-page":"D553","DOI":"10.1093\/nar\/gkab1054","article-title":"Scope: improvements to the structural classification of proteins\u2013extended database to facilitate variant interpretation and machine learning","volume":"50","author":"Chandonia","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref136","article-title":"Diffusion probabilistic modeling of protein backbones in 3d for the motif-scaffolding problem","author":"Trippe","year":"2022"},{"key":"2024071511201558800_ref137","article-title":"A latent diffusion model for protein structure generation","author":"Fu","year":"2023"},{"key":"2024071511201558800_ref138","article-title":"Protein structure generation via folding diffusion","author":"Wu","year":"2022"},{"key":"2024071511201558800_ref139","article-title":"Se (3) diffusion model with application to protein backbone generation","author":"Yim"},{"key":"2024071511201558800_ref140","article-title":"Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds","author":"Lin","year":"2023"},{"key":"2024071511201558800_ref141","doi-asserted-by":"crossref","first-page":"1089","DOI":"10.1038\/s41586-023-06415-8","article-title":"De novo design of protein structure and function with rfdiffusion","volume":"620","author":"Watson","year":"2023","journal-title":"Nature"},{"key":"2024071511201558800_ref142","article-title":"Joint design of protein sequence and structure based on motifs","author":"Song","year":"2023"},{"key":"2024071511201558800_ref143","article-title":"Protein sequence and structure 
co-design with equivariant translation","author":"Shi","year":"2022"},{"key":"2024071511201558800_ref144","first-page":"2023\u201305","article-title":"An all-atom protein generative model","author":"Chu","year":"2023","journal-title":"bioRxiv"},{"key":"2024071511201558800_ref145","first-page":"2023\u201310","article-title":"Protein language model supervised precise and efficient protein backbone design method","author":"Zhang","year":"2023"},{"key":"2024071511201558800_ref146","doi-asserted-by":"crossref","first-page":"2031482","DOI":"10.1080\/19420862.2022.2031482","article-title":"In silico proof of principle of machine learning-based antibody design at unconstrained scale","volume":"14","author":"Akbar","year":"2022","journal-title":"MAbs"},{"key":"2024071511201558800_ref147","article-title":"Iterative refinement graph neural network for antibody sequence-structure co-design","author":"Jin","year":"2021"},{"key":"2024071511201558800_ref148","article-title":"End-to-end full-atom antibody design","author":"Kong","year":"2023"},{"key":"2024071511201558800_ref149","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1038\/s41573-020-00135-8","article-title":"Trends in peptide drug discovery","volume":"20","author":"Muttenthaler","year":"2021","journal-title":"Nat Rev Drug Discov"},{"key":"2024071511201558800_ref150","first-page":"3","article-title":"A multi-modal contrastive diffusion model for therapeutic peptide generation","volume-title":"AAAI","author":"Wang","year":"2024"},{"key":"2024071511201558800_ref151","article-title":"Pepgb: facilitating peptide drug discovery via graph neural networks","author":"Lei","year":"2024"},{"key":"2024071511201558800_ref152","article-title":"Pepharmony: a multi-view contrastive learning framework for integrated sequence and structure-based peptide 
encoding","author":"Zhang","year":"2024"},{"key":"2024071511201558800_ref153","doi-asserted-by":"crossref","first-page":"2192","DOI":"10.1111\/j.1742-4658.2012.08603.x","article-title":"Proso ii\u2013a new method for protein solubility prediction","volume":"279","author":"Smialowski","year":"2012","journal-title":"FEBS J"},{"key":"2024071511201558800_ref154","doi-asserted-by":"crossref","first-page":"D901","DOI":"10.1093\/nar\/gkm958","article-title":"Drugbank: a knowledgebase for drugs, drug actions and drug targets","volume":"36","author":"Wishart","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2024071511201558800_ref155","volume-title":"Adanovo: Adaptive De Novo peptide sequencing with conditional mutual information","author":"Xia","year":"2024"},{"key":"2024071511201558800_ref156","doi-asserted-by":"crossref","first-page":"8247","DOI":"10.1073\/pnas.1705691114","article-title":"De novo peptide sequencing by deep learning","volume":"114","author":"Tran","year":"2017","journal-title":"Proc Natl Acad Sci"},{"key":"2024071511201558800_ref157","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1038\/s42256-021-00304-3","article-title":"Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices","volume":"3","author":"Qiao","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2024071511201558800_ref158","first-page":"25514","article-title":"De novo mass spectrometry peptide sequencing with a transformer model","volume-title":"International Conference on Machine Learning","author":"Yilmaz","year":"2022"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/4\/bbae338\/58544272\/bbae338.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/4\/bbae338\/58544272\/bbae338.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T11:21:39Z","timestamp":1721042499000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae338\/7713723"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,23]]},"references-count":158,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,5,23]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae338","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,5,23]]},"article-number":"bbae338"}}