{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T07:50:56Z","timestamp":1777881056891,"version":"3.51.4"},"reference-count":134,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2023,10,21]],"date-time":"2023-10-21T00:00:00Z","timestamp":1697846400000},"content-version":"vor","delay-in-days":29,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Department of Chemical Engineering and Materials Science at Michigan State University"},{"DOI":"10.13039\/100000199","name":"USDA","doi-asserted-by":"publisher","award":["13700968"],"award-info":[{"award-number":["13700968"]}],"id":[{"id":"10.13039\/100000199","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,9,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The widespread adoption of high-throughput omics technologies has exponentially increased the amount of protein sequence data involved in many salient disease pathways and their respective therapeutics and diagnostics. Despite the availability of large-scale sequence data, the lack of experimental fitness annotations underpins the need for self-supervised and unsupervised machine learning (ML) methods. These techniques leverage the meaningful features encoded in abundant unlabeled sequences to accomplish complex protein engineering tasks. Proficiency in the rapidly evolving fields of protein engineering and generative AI is required to realize the full potential of ML models as a tool for protein fitness landscape navigation. Here, we support this work by (i) providing an overview of the architecture and mathematical details of the most successful ML models applicable to sequence data (e.g. 
variational autoencoders, autoregressive models, generative adversarial neural networks, and diffusion models), (ii) guiding how to effectively implement these models on protein sequence data to predict fitness or generate high-fitness sequences and (iii) highlighting several successful studies that implement these techniques in protein engineering (from paratope regions and subcellular localization prediction to high-fitness sequences and protein design rules generation). By providing a comprehensive survey of model details, novel architecture developments, comparisons of model applications, and current challenges, this study intends to provide structured guidance and robust framework for delivering a prospective outlook in the ML-driven protein engineering field.<\/jats:p>","DOI":"10.1093\/bib\/bbad358","type":"journal-article","created":{"date-parts":[[2023,10,21]],"date-time":"2023-10-21T03:54:51Z","timestamp":1697860491000},"source":"Crossref","is-referenced-by-count":49,"title":["Generative models for protein sequence modeling: recent advances and future directions"],"prefix":"10.1093","volume":"24","author":[{"given":"Mehrsa","family":"Mardikoraem","sequence":"first","affiliation":[{"name":"Michigan State University (MSU)\u2018s Department of Chemical Engineering and Materials Science"}]},{"given":"Zirui","family":"Wang","sequence":"additional","affiliation":[{"name":"Regeneron Pharmaceuticals, Inc. Having received his B.S. in Chemical Engineering from MSU, he is currently pursuing a M.S. in Computer Science from Syracuse University"}]},{"given":"Nathaniel","family":"Pascual","sequence":"additional","affiliation":[{"name":"B.S. 
in Chemical Engineering from MSU"}]},{"given":"Daniel","family":"Woldring","sequence":"additional","affiliation":[{"name":"MSU\u2019s Department of Chemical Engineering and Materials Science and a member of MSU\u2019s Institute for Quantitative Health Sciences and Engineering"}]}],"member":"286","published-online":{"date-parts":[[2023,10,20]]},"reference":[{"key":"2023102103544369900_ref1","doi-asserted-by":"crossref","first-page":"1293","DOI":"10.1002\/cbic.200900062","article-title":"Engineered two-helix small proteins for molecular recognition","volume":"10","author":"Webster","year":"2009","journal-title":"Chem Bio Chem"},{"key":"2023102103544369900_ref2","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1109\/JBHI.2020.2984355","article-title":"Early detection of Alzheimer\u2019s disease with blood plasma proteins using support vector machines","volume":"25","author":"Eke","year":"2021","journal-title":"IEEE J Biomed Health Inform"},{"key":"2023102103544369900_ref3","doi-asserted-by":"crossref","first-page":"1302","DOI":"10.3389\/fimmu.2018.01302","article-title":"The clinical significance and potential role of C-reactive protein in chronic inflammatory and neurodegenerative diseases","volume":"9","author":"Luan","year":"2018","journal-title":"Front Immunol"},{"key":"2023102103544369900_ref4","doi-asserted-by":"crossref","first-page":"2140","DOI":"10.1158\/1078-0432.CCR-19-1655","article-title":"Efficacy of Affibody-based ultrasound molecular imaging of vascular B7-H3 for breast cancer detection","volume":"26","author":"Bam","year":"2020","journal-title":"Clin Cancer Res"},{"key":"2023102103544369900_ref5","doi-asserted-by":"crossref","first-page":"2506","DOI":"10.3390\/polym13152506","article-title":"Proteins in food systems\u2014bionanomaterials, conventional and unconventional sources, functional properties, and development 
opportunities","volume":"13","author":"Ma\u0142ecki","year":"2021","journal-title":"Polymers"},{"key":"2023102103544369900_ref6","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1016\/0958-1669(94)90026-4","article-title":"Engineering proteins for environmental applications","volume":"5","author":"Janssen","year":"1994","journal-title":"Curr Opin Biotechnol"},{"key":"2023102103544369900_ref7","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.copbio.2010.12.006","article-title":"Molecular Design of the Microbial Cell Surface toward the recovery of metal ions","volume":"22","author":"Kuroda","year":"2011","journal-title":"Curr Opin Biotechnol"},{"key":"2023102103544369900_ref8","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1111\/1751-7915.12059","article-title":"Bioremediation: a genuine technology to remediate radionuclides from the environment","volume":"6","author":"Prakash","year":"2013","journal-title":"J Microbial Biotechnol"},{"key":"2023102103544369900_ref9","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1080\/15226514.2011.568537","article-title":"Toward protein engineering for phytoremediation: possibilities and challenges","volume":"13","author":"Jez","year":"2011","journal-title":"Int J Phytoremediation"},{"key":"2023102103544369900_ref10","doi-asserted-by":"crossref","first-page":"3820","DOI":"10.1002\/bit.27525","article-title":"Display of lead-binding proteins on Escherichia coli surface for lead bioremediation","volume":"117","author":"Jia","year":"2020","journal-title":"Biotechnol Bioeng"},{"key":"2023102103544369900_ref11","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1093\/protein\/gzu016","article-title":"Selection of high-affinity Centyrin FN3 domains from a simple library diversified at a combination of strand and loop positions","volume":"27","author":"Diem","year":"2014","journal-title":"Protein Eng Des 
Sel"},{"key":"2023102103544369900_ref12","doi-asserted-by":"crossref","first-page":"e2026658118","DOI":"10.1073\/pnas.2026658118","article-title":"High-throughput developability assays enable library-scale identification of producible protein scaffold variants","volume":"118","author":"Golinski","year":"2021","journal-title":"Proc Natl Acad Sci"},{"key":"2023102103544369900_ref13","doi-asserted-by":"crossref","first-page":"1271","DOI":"10.1110\/ps.0239303","article-title":"Protein\u2013protein docking with a reduced protein model accounting for side-chain flexibility","volume":"12","author":"Zacharias","year":"2003","journal-title":"Protein Sci"},{"key":"2023102103544369900_ref14","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/j.pisc.2016.08.002","article-title":"Reconstruction of ancestral enzymes","volume":"9","author":"Merkl","year":"2016","journal-title":"Perspect Sci"},{"key":"2023102103544369900_ref15","volume-title":"Attention Is All You Need","author":"Vaswani","year":"2017"},{"key":"2023102103544369900_ref16","volume-title":"Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey","author":"Ghojogh","year":"2020"},{"key":"2023102103544369900_ref17","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"2023102103544369900_ref18","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"key":"2023102103544369900_ref19","doi-asserted-by":"crossref","first-page":"bbab072","DOI":"10.1093\/bib\/bbab072","article-title":"DeepDTAF: a deep learning method to predict protein\u2013ligand binding affinity","volume":"22","author":"Wang","year":"2021","journal-title":"Brief 
Bioinform"},{"key":"2023102103544369900_ref20","doi-asserted-by":"crossref","first-page":"1411","DOI":"10.1021\/acssynbio.9b00099","article-title":"Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima","volume":"8","author":"Li","year":"2019","journal-title":"ACS Synth Biol"},{"key":"2023102103544369900_ref21","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1093\/bioinformatics\/bty166","article-title":"DeepSol: a deep learning framework for sequence-based protein solubility prediction","volume":"34","author":"Khurana","year":"2018","journal-title":"Bioinformatics"},{"key":"2023102103544369900_ref22","doi-asserted-by":"crossref","first-page":"i802","DOI":"10.1093\/bioinformatics\/bty573","article-title":"Predicting protein\u2013protein interactions through sequence-based deep learning","volume":"34","author":"Hashemifar","year":"2018","journal-title":"Bioinformatics"},{"key":"2023102103544369900_ref23","doi-asserted-by":"crossref","first-page":"9848","DOI":"10.1038\/s41598-019-46369-4","article-title":"Predicting protein-protein interactions from matrix-based protein sequence using convolution neural network and feature-selective rotation Forest","volume":"9","author":"Wang","year":"2019","journal-title":"Sci Rep"},{"key":"2023102103544369900_ref24","first-page":"2022.03.09.483666","article-title":"A deep unsupervised language model for protein design","author":"Ferruz","year":"2022"},{"key":"2023102103544369900_ref25","first-page":"9689","article-title":"Evaluating protein transfer learning with TAPE","volume":"32","author":"Rao","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2023102103544369900_ref26","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat 
Methods"},{"key":"2023102103544369900_ref27","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci"},{"key":"2023102103544369900_ref28","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.1109\/TPAMI.2021.3095381","article-title":"ProtTrans: toward understanding the language of life through self-supervised learning","volume":"44","author":"Elnaggar","year":"2022","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2023102103544369900_ref29","volume-title":"How to Hallucinate Functional Proteins","author":"Costello","year":"2019"},{"key":"2023102103544369900_ref30","doi-asserted-by":"crossref","first-page":"4348","DOI":"10.1038\/s41467-022-32007-7","article-title":"ProtGPT2 is a deep unsupervised language model for protein design","volume":"13","author":"Ferruz","year":"2022","journal-title":"Nat Commun"},{"key":"2023102103544369900_ref31","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1038\/s42256-021-00310-5","article-title":"Expanding functional protein sequence spaces using generative adversarial networks","volume":"3","author":"Repecka","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2023102103544369900_ref32","doi-asserted-by":"crossref","first-page":"1089","DOI":"10.1038\/s41586-023-06415-8","article-title":"De novo design of protein structure and function with RFdiffusion","volume":"620","author":"Watson","year":"2023","journal-title":"Nature"},{"key":"2023102103544369900_ref33","volume-title":"Auto-Encoding Variational Bayes","author":"Kingma","year":"2022"},{"key":"2023102103544369900_ref34","volume-title":"Generative Adversarial Networks","author":"Goodfellow","year":"2014"},{"key":"2023102103544369900_ref35","first-page":"2256","volume-title":"Proceedings of 
the 32nd International Conference on Machine Learning","author":"Sohl-Dickstein"},{"key":"2023102103544369900_ref36","doi-asserted-by":"crossref","first-page":"132306","DOI":"10.1016\/j.physd.2019.132306","article-title":"Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network","volume":"404","author":"Sherstinsky","year":"2020","journal-title":"Phys Nonlinear Phenom"},{"key":"2023102103544369900_ref37","doi-asserted-by":"crossref","first-page":"2673","DOI":"10.1109\/78.650093","article-title":"Bidirectional recurrent neural networks","volume":"45","author":"Schuster","year":"1997","journal-title":"IEEE Trans Signal Process"},{"key":"2023102103544369900_ref38","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023102103544369900_ref39","volume-title":"Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling","author":"Chung","year":"2014"},{"key":"2023102103544369900_ref40","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1021\/acs.jcim.7b00414","article-title":"Recurrent neural network model for constructive peptide design","volume":"58","author":"M\u00fcller","year":"2018","journal-title":"J Chem Inf Model"},{"key":"2023102103544369900_ref41","doi-asserted-by":"crossref","first-page":"5852","DOI":"10.1038\/s41598-021-85274-7","article-title":"Antibody design using LSTM based deep generative model from phage display library for affinity maturation","volume":"11","author":"Saka","year":"2021","journal-title":"Sci Rep"},{"key":"2023102103544369900_ref42","first-page":"671552","article-title":"RamaNet: computational de novo helical protein backbone design using a long short-term memory generative neural network","volume-title":"F1000 Research 
Full","author":"Sabban","year":"2020"},{"key":"2023102103544369900_ref43","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1186\/s12859-018-2280-5","article-title":"Prediction of 8-state protein secondary structures by a novel deep learning architecture","volume":"19","author":"Zhang","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023102103544369900_ref44","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2021.naacl-main.405","volume-title":"Limitations of Autoregressive Models and Their Alternatives","author":"Lin","year":"2021"},{"key":"2023102103544369900_ref45","doi-asserted-by":"crossref","first-page":"5800","DOI":"10.1038\/s41467-021-25756-4","article-title":"Efficient generative Modeling of protein sequences using simple autoregressive models","volume":"12","author":"Trinquier","year":"2021","journal-title":"Nat Commun"},{"key":"2023102103544369900_ref46","doi-asserted-by":"crossref","first-page":"2403","DOI":"10.1038\/s41467-021-22732-w","article-title":"Protein design and variant prediction using autoregressive generative models","volume":"12","author":"Shin","year":"2021","journal-title":"Nat Commun"},{"key":"2023102103544369900_ref47","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1109\/BIBM52615.2021.9669631","volume-title":"Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"Zhang","year":"2021"},{"key":"2023102103544369900_ref48","doi-asserted-by":"crossref","DOI":"10.1101\/103994","volume-title":"Deep Recurrent Neural Network for Protein Function Prediction from Sequence","author":"Liu","year":"2017"},{"key":"2023102103544369900_ref49","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1007\/s12065-018-0171-3","article-title":"A novel improved prediction of protein structural class using deep recurrent neural network","volume":"14","author":"Panda","year":"2021","journal-title":"Evol 
Intell"},{"key":"2023102103544369900_ref50","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1126\/science.aba3304","article-title":"An evolution-based model for designing Chorismate mutase enzymes","volume":"369","author":"Russ","year":"2020","journal-title":"Science"},{"key":"2023102103544369900_ref51","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1038\/s41598-020-79682-4","article-title":"Transformer neural network for protein-specific de novo drug generation as a machine translation problem","volume":"11","author":"Grechishnikova","year":"2021","journal-title":"Sci Rep"},{"key":"2023102103544369900_ref52","doi-asserted-by":"crossref","first-page":"2154","DOI":"10.1021\/acssynbio.0c00219","article-title":"Signal peptides generated by attention-based neural networks","volume":"9","author":"Wu","year":"2020","journal-title":"ACS Synth Biol"},{"key":"2023102103544369900_ref53","doi-asserted-by":"crossref","first-page":"2269","DOI":"10.1093\/bioinformatics\/btac104","article-title":"TransformerGO: predicting protein\u2013protein interactions by modelling the attention between sets of gene ontology terms","volume":"38","author":"Ieremie","year":"2022","journal-title":"Bioinformatics"},{"key":"2023102103544369900_ref54","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1002\/prot.26052","article-title":"Combination of deep neural network with attention mechanism enhances the explainability of protein contact prediction","volume":"89","author":"Chen","year":"2021","journal-title":"Proteins Struct Funct Bioinforma"},{"key":"2023102103544369900_ref55","volume-title":"An Introduction to Convolutional Neural Networks","author":"O\u2019Shea","year":"2015"},{"key":"2023102103544369900_ref56","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1093\/bioinformatics\/btab715","article-title":"HyperAttentionDTI: improving drug\u2013protein interaction prediction by sequence-based deep learning with attention 
mechanism","volume":"38","author":"Zhao","year":"2022","journal-title":"Bioinformatics"},{"key":"2023102103544369900_ref57","author":"Devlin","year":"2019"},{"key":"2023102103544369900_ref58","article-title":"Improving language understanding by generative pre-training","author":"Radford"},{"key":"2023102103544369900_ref59","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems 33 (NeurIPS 2020)","author":"Brown","year":"2020"},{"key":"2023102103544369900_ref60","first-page":"200","volume-title":"Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.","author":"Tsimpoukelli","year":"2021"},{"key":"2023102103544369900_ref61","article-title":"BART: Denoising sequence-to-sequence pre-training for natural language generation","volume":"58","author":"Lewis","year":"2019","journal-title":"Transl Comprehen"},{"key":"2023102103544369900_ref62","volume-title":"Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers","author":"Choromanski","year":"2020"},{"key":"2023102103544369900_ref63","doi-asserted-by":"crossref","first-page":"123912","DOI":"10.1109\/ACCESS.2021.3110269","article-title":"Pre-training of deep bidirectional protein sequence representations with structural information","volume":"9","author":"Min","year":"2021","journal-title":"IEEE Access"},{"key":"2023102103544369900_ref64","doi-asserted-by":"crossref","first-page":"1732","DOI":"10.3390\/molecules22101732","article-title":"ProLanGO: protein function prediction using neural machine translation based on a recurrent neural network","volume":"22","author":"Cao","year":"2017","journal-title":"Molecules"},{"key":"2023102103544369900_ref65","doi-asserted-by":"crossref","first-page":"e0225317","DOI":"10.1371\/journal.pone.0225317","article-title":"An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid 
sequences","volume":"14","author":"Hu","year":"2019","journal-title":"PloS One"},{"key":"2023102103544369900_ref66","first-page":"2021.01.26.428322","article-title":"Generating novel protein sequences using Gibbs sampling of masked language models","author":"Johnson","year":"2021"},{"key":"2023102103544369900_ref67","first-page":"16990","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Notin"},{"key":"2023102103544369900_ref68","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1038\/s42256-022-00532-1","article-title":"Transformer-based protein generation with regularized latent space optimization","volume":"4","author":"Castro","year":"2022","journal-title":"Nat. Mach. Intell."},{"key":"2023102103544369900_ref69","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1109\/TKDE.2009.191","article-title":"A survey on transfer learning","volume":"22","author":"Pan","year":"2010","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2023102103544369900_ref70","volume-title":"Nucleic Acids Research","author":"UniProt: A Hub for Protein Information"},{"key":"2023102103544369900_ref71","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: comprehensive and non-redundant UniProt reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023102103544369900_ref72","doi-asserted-by":"crossref","first-page":"D387","DOI":"10.1093\/nar\/gkab1053","article-title":"The sequence read archive: a decade more of explosive growth","volume":"50","author":"Katz","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023102103544369900_ref73","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41592-021-01100-y","article-title":"Low-N protein engineering with data-efficient deep learning","volume":"18","author":"Biswas","year":"2021","journal-title":"Nat 
Methods"},{"key":"2023102103544369900_ref74","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Lin","year":"2023","journal-title":"Science"},{"key":"2023102103544369900_ref75","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1186\/s12859-019-3220-8","article-title":"Modeling aspects of the language of life through transfer-learning protein sequences","volume":"20","author":"Heinzinger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023102103544369900_ref76","doi-asserted-by":"crossref","DOI":"10.1145\/3388440.3412467","article-title":"Transforming the language of life: transformer neural networks for protein prediction tasks","volume-title":"Journal of Computational Biology","author":"Nambiar","year":"2020"},{"key":"2023102103544369900_ref77","article-title":"BERTology meets biology: interpreting attention in protein language models","author":"Vig","year":"2021"},{"key":"2023102103544369900_ref78","volume-title":"Pre-Training Co-Evolutionary Protein Representation via A Pairwise Masked Language Model","author":"He","year":"2021"},{"key":"2023102103544369900_ref79","article-title":"Protein sequence profile prediction using ProtAlbert transformer","volume-title":"Computational Biology and Chemistry","author":"Behjati","year":"2021"},{"key":"2023102103544369900_ref80","doi-asserted-by":"crossref","DOI":"10.3390\/pharmaceutics15051337","article-title":"Protein fitness prediction is impacted by the interplay of language models, ensemble learning, and sampling methods","volume":"15","author":"Mardikoraem","year":"2023","journal-title":"Pharmaceutics"},{"key":"2023102103544369900_ref81","volume-title":"Is Transfer Learning Necessary for Protein Landscape 
Prediction?","author":"Shanehsazzadeh","year":"2020"},{"key":"2023102103544369900_ref82","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1016\/j.cels.2021.07.008","article-title":"Informed training set design enables efficient machine learning-assisted directed protein evolution","volume":"12","author":"Wittmann","year":"2021","journal-title":"Cell Syst."},{"key":"2023102103544369900_ref83","volume-title":"Variational Auto-Encoding of Protein Sequences","author":"Sinai","year":"2018"},{"key":"2023102103544369900_ref84","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","article-title":"Variational inference: a review for statisticians","volume":"112","author":"Blei","year":"2017","journal-title":"J Am Stat Assoc"},{"key":"2023102103544369900_ref85","article-title":"Beta-VAE: learning basic visual concepts with a constrained variational framework","volume-title":"Conference Paper for International Conference on Learning Representations (ICLR)","author":"Higgins","year":"2022"},{"key":"2023102103544369900_ref86","volume-title":"Preventing Posterior Collapse with \u03b4-VAES","author":"Razavi","year":"2019"},{"key":"2023102103544369900_ref87","doi-asserted-by":"crossref","first-page":"e46935","DOI":"10.7554\/eLife.46935","article-title":"IV deep generative models for T cell receptor protein sequences","volume":"8","author":"Davidsen","year":"2019","journal-title":"Elife"},{"key":"2023102103544369900_ref88","doi-asserted-by":"crossref","first-page":"16189","DOI":"10.1038\/s41598-018-34533-1","article-title":"Design of metalloproteins and novel protein folds using variational autoencoders","volume":"8","author":"Greener","year":"2018","journal-title":"Sci Rep"},{"key":"2023102103544369900_ref89","first-page":"000135","article-title":"Analysing protein dynamics using machine learning based generative models","volume-title":"Proceedings of the 2020 IEEE 14th International Symposium on Applied Computational Intelligence and 
Informatics (SACI)","author":"Albu","year":"2020"},{"key":"2023102103544369900_ref90","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.cels.2020.05.007","article-title":"A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences","volume":"11","author":"Linder","year":"2020","journal-title":"Cell Syst"},{"key":"2023102103544369900_ref91","doi-asserted-by":"crossref","first-page":"e1008736","DOI":"10.1371\/journal.pcbi.1008736","article-title":"Generating functional protein variants with variational autoencoders","volume":"17","author":"Hawkins-Hooker","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2023102103544369900_ref92","doi-asserted-by":"crossref","first-page":"6302","DOI":"10.1038\/s41467-021-26529-9","article-title":"The generative capacity of probabilistic protein sequence models","volume":"12","author":"McGee","year":"2021","journal-title":"Nat Commun"},{"key":"2023102103544369900_ref93","volume-title":"Ancestral Protein Sequence Reconstruction Using a Tree-Structured Ornstein-Uhlenbeck Variational Autoencoder","author":"Moreta","year":"2022"},{"key":"2023102103544369900_ref94","volume-title":"Wasserstein GAN","author":"Arjovsky","year":"2017"},{"key":"2023102103544369900_ref95","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s42256-020-0222-1","article-title":"Protein function prediction is improved by creating synthetic feature samples with generative adversarial networks","volume":"2","author":"Wan","year":"2020","journal-title":"Nat. Mach. 
Intell."},{"key":"2023102103544369900_ref96","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1038\/s42256-019-0017-4","article-title":"Feedback GAN for DNA optimizes protein functions","volume":"1","author":"Gupta","year":"2019","journal-title":"Nat Mach Intell"},{"key":"2023102103544369900_ref97","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1038\/nbt.1990","article-title":"Comprehensive analysis of kinase inhibitor selectivity","volume":"29","author":"Davis","year":"2011","journal-title":"Nat Biotechnol"},{"key":"2023102103544369900_ref98","doi-asserted-by":"crossref","DOI":"10.3389\/fgene.2019.01243","article-title":"GANsDTA: predicting drug-target binding affinity using GANs","volume":"10","author":"Zhao","year":"2020","journal-title":"Front Genet"},{"key":"2023102103544369900_ref99","doi-asserted-by":"crossref","DOI":"10.1101\/2020.04.12.024844","article-title":"Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks","author":"Amimeur","year":"2020"},{"key":"2023102103544369900_ref100","volume-title":"MutaGAN: A Seq2seq GAN Framework to Predict Mutations of Evolving Protein Populations","author":"Berman","year":"2020"},{"key":"2023102103544369900_ref101","doi-asserted-by":"crossref","first-page":"e0244430","DOI":"10.1371\/journal.pone.0244430","article-title":"PFP-WGAN: protein function prediction by discovering gene ontology term correlations with generative adversarial networks","volume":"16","author":"Seyyedsalehi","year":"2021","journal-title":"PloS One"},{"key":"2023102103544369900_ref102","volume-title":"Hierarchical Text-Conditional Image Generation with CLIP Latents","author":"Ramesh","year":"2022"},{"key":"2023102103544369900_ref103","first-page":"8821","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Ramesh","year":"2021"},{"key":"2023102103544369900_ref104","volume-title":"Denoising Diffusion Probabilistic 
Models","author":"Ho","year":"2020"},{"key":"2023102103544369900_ref105","doi-asserted-by":"crossref","first-page":"16591","DOI":"10.1109\/ACCESS.2021.3053408","article-title":"INet: convolutional networks for biomedical image segmentation","volume":"9","author":"Weng","year":"2021","journal-title":"IEEE Access"},{"key":"2023102103544369900_ref106","volume-title":"Improved Denoising Diffusion Probabilistic Models","author":"Nichol","year":"2021"},{"key":"2023102103544369900_ref107","first-page":"8780","article-title":"Diffusion models beat GANs on image synthesis","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Dhariwal","year":"2021"},{"key":"2023102103544369900_ref108","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1016\/j.cels.2020.10.007","article-title":"Inferring protein sequence-function relationships with large-scale positive-Unlabeled learning","volume":"12","author":"Song","year":"2021","journal-title":"Cell Syst"},{"key":"2023102103544369900_ref109","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/s13662-018-1466-5","article-title":"Numerical methods for simulation of stochastic differential equations","volume":"2018","author":"Bayram","year":"2018","journal-title":"Adv Differ Equ"},{"key":"2023102103544369900_ref110","doi-asserted-by":"crossref","DOI":"10.1145\/3626235","article-title":"Diffusion models: a comprehensive survey of methods and applications","author":"Yang","year":"2023"},{"issue":"12","key":"2023102103544369900_ref111","first-page":"01.518682","article-title":"Illuminating protein space with a programmable generative model","volume":"2022","author":"Ingraham","year":"2022","journal-title":"bioRxiv"},{"key":"2023102103544369900_ref112","doi-asserted-by":"crossref","DOI":"10.1101\/2023.05.08.539766","article-title":"Joint generation of protein sequence and structure with RoseTTAFold sequence space 
diffusion","author":"Lisanza","year":"2023"},{"key":"2023102103544369900_ref113","doi-asserted-by":"crossref","first-page":"1828","DOI":"10.1016\/j.chempr.2023.03.020","article-title":"Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model","volume":"9","author":"Ni","year":"2023","journal-title":"Chem"},{"key":"2023102103544369900_ref114","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2023102103544369900_ref115","volume-title":"Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models","author":"Anand","year":"2022"},{"key":"2023102103544369900_ref116","article-title":"Joint protein sequence-structure co-design via Equivariant diffusion","author":"Vinod"},{"key":"2023102103544369900_ref117","doi-asserted-by":"crossref","first-page":"1156","DOI":"10.1021\/acsbiomaterials.1c01343","article-title":"End-to-end deep learning model to predict and design secondary structure content of structural proteins","volume":"8","author":"Yu","year":"2022","journal-title":"ACS Biomater Sci Eng"},{"key":"2023102103544369900_ref118","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1038\/s41551-021-00699-9","article-title":"Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning","volume":"5","author":"Mason","year":"2021","journal-title":"Nat Biomed Eng"},{"key":"2023102103544369900_ref119","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1002\/pro.4205","article-title":"Observed antibody space: a diverse database of cleaned, annotated, and translated unpaired and paired antibody sequences","volume":"31","author":"Olsen","year":"2022","journal-title":"Protein Sci Publ Protein 
Soc"},{"key":"2023102103544369900_ref120","article-title":"AbDiffuser: full-atom generation of in-vitro functioning antibodies","author":"Martinkus","year":"2023"},{"key":"2023102103544369900_ref121","article-title":"Generating images with perceptual similarity metrics based on deep networks","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Dosovitskiy","year":"2016"},{"key":"2023102103544369900_ref122","volume-title":"Learning Hierarchical Priors in VAEs","author":"Klushyn","year":"2019"},{"key":"2023102103544369900_ref123","volume-title":"Ladder Variational Autoencoders","author":"S\u00f8nderby","year":"2016"},{"key":"2023102103544369900_ref124","volume-title":"Neural Discrete Representation Learning","author":"Oord","year":"2018"},{"key":"2023102103544369900_ref125","article-title":"Reformer: the efficient transformer","author":"Kitaev","year":"2020"},{"key":"2023102103544369900_ref126","first-page":"17283","article-title":"Big bird: transformers for longer sequences","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Zaheer","year":"2020"},{"key":"2023102103544369900_ref127","volume-title":"Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks","author":"Zhu","year":"2020"},{"key":"2023102103544369900_ref128","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.2017.304","volume-title":"Least Squares Generative Adversarial Networks","author":"Mao","year":"2017"},{"key":"2023102103544369900_ref129","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Srivastava","year":"2017"},{"key":"2023102103544369900_ref130","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1007\/978-3-031-20050-2_17","volume-title":"Computer Vision \u2013 ECCV 2022","author":"Jing","year":"2022"},{"key":"2023102103544369900_ref131","volume-title":"Training Diffusion Models with Reinforcement 
Learning","author":"Black","year":"2023"},{"key":"2023102103544369900_ref132","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023102103544369900_ref133","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.sbi.2021.11.008","article-title":"Deep generative modeling for protein design","volume":"72","author":"Strokach","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2023102103544369900_ref134","first-page":"2021.07.09.450648","article-title":"Language models enable zero-shot prediction of the effects of mutations on protein function","author":"Meier","year":"2021"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad358\/52312609\/bbad358.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/24\/6\/bbad358\/52312609\/bbad358.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,21]],"date-time":"2023-10-21T03:55:33Z","timestamp":1697860533000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad358\/7325909"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":134,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,9,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad358","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,9,22]]},"article-number":"bbad358"}}