{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:32Z","timestamp":1772138012669,"version":"3.50.1"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T00:00:00Z","timestamp":1624665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001700","name":"Ministry of Education, Culture, Sports, Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001700","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["17H06410"],"award-info":[{"award-number":["17H06410"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["19K20409"],"award-info":[{"award-number":["19K20409"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["19K06502"],"award-info":[{"award-number":["19K06502"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["19K06077"],"award-info":[{"award-number":["19K06077"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of 
Science","doi-asserted-by":"publisher","award":["20H00315"],"award-info":[{"award-number":["20H00315"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","award":["JP19am0401023"],"award-info":[{"award-number":["JP19am0401023"]}],"id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","award":["JP19ak0101122"],"award-info":[{"award-number":["JP19ak0101122"]}],"id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Accurate variant effect prediction has broad impacts on protein engineering. Recent machine learning approaches toward this end are based on representation learning, by which feature vectors are learned and generated from unlabeled sequences. However, it is unclear how to effectively learn evolutionary properties of an engineering target protein from homologous sequences, taking into account the protein\u2019s sequence-level structure called domain architecture (DA). Additionally, no optimal protocols are established for incorporating such properties into Transformer, the neural network well-known to perform the best in natural language processing research. This article proposes DA-aware evolutionary fine-tuning, or \u2018evotuning\u2019, protocols for Transformer-based variant effect prediction, considering various combinations of homology search, fine-tuning and sequence vectorization strategies. We exhaustively evaluated our protocols on diverse proteins with different functions and DAs. 
The results indicated that our protocols achieved significantly better performances than previous DA-unaware ones. The visualizations of attention maps suggested that the structural information was incorporated by evotuning without direct supervision, possibly leading to better prediction accuracy.<\/jats:p>","DOI":"10.1093\/bib\/bbab234","type":"journal-article","created":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T15:27:02Z","timestamp":1622647622000},"source":"Crossref","is-referenced-by-count":15,"title":["Evotuning protocols for Transformer-based variant effect prediction on multi-domain proteins"],"prefix":"10.1093","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9096-5591","authenticated-orcid":false,"given":"Hideki","family":"Yamaguchi","sequence":"first","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan"},{"name":"Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4853-0153","authenticated-orcid":false,"given":"Yutaka","family":"Saito","sequence":"additional","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan"},{"name":"Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Koto-ku, Tokyo 135-0064, Japan"},{"name":"AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Shinjuku-ku, Tokyo 169-8555, 
Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2021,6,26]]},"reference":[{"key":"2021110815065645100_ref1","doi-asserted-by":"crossref","first-page":"5618","DOI":"10.1073\/pnas.90.12.5618","article-title":"Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide","volume":"90","author":"Chen","year":"1993","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110815065645100_ref2","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1038\/nbt1172","article-title":"Engineering and characterization of a superfolder green fluorescent protein","volume":"24","author":"P\u00e9delacq","year":"2006","journal-title":"Nat Biotechnol"},{"key":"2021110815065645100_ref3","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature10975","article-title":"Exploiting a natural conformational switch to engineer an interleukin-2 \u2018superkine\u2019","volume":"484","author":"Levin","year":"2012","journal-title":"Nature"},{"key":"2021110815065645100_ref4","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1038\/nature24644","article-title":"Programmable base editing of A\u2022T to G\u2022C in genomic DNA without DNA cleavage","volume":"551","author":"Gaudelli","year":"2017","journal-title":"Nature"},{"key":"2021110815065645100_ref5","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/j.jmb.2005.02.007","article-title":"Multi-domain proteins in the three kingdoms of life: orphan domains and other unassigned regions","volume":"348","author":"Ekman","year":"2005","journal-title":"J Mol Biol"},{"key":"2021110815065645100_ref6","doi-asserted-by":"crossref","DOI":"10.1155\/2012\/980250","article-title":"scFv antibody: principles and clinical application","volume":"2012","author":"Ahmad","year":"2012","journal-title":"Clin Dev 
Immunol"},{"key":"2021110815065645100_ref7","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1038\/s41579-019-0299-x","article-title":"Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants","volume":"18","author":"Makarova","year":"2020","journal-title":"Nat Rev Microbiol"},{"key":"2021110815065645100_ref8","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1016\/j.cels.2017.11.003","article-title":"Quantitative missense variant effect prediction using large-scale mutagenesis data","volume":"6","author":"Gray","year":"2018","journal-title":"Cell Syst"},{"key":"2021110815065645100_ref9","doi-asserted-by":"crossref","first-page":"874","DOI":"10.1038\/s41588-018-0122-z","article-title":"Multiplex assessment of protein variant abundance by massively parallel sequencing","volume":"50","author":"Matreyek","year":"2018","journal-title":"Nat Genet"},{"key":"2021110815065645100_ref10","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/nmeth.3223","article-title":"Massively parallel single-amino-acid mutagenesis","volume":"12","author":"Kitzman","year":"2015","journal-title":"Nat Methods"},{"key":"2021110815065645100_ref11","doi-asserted-by":"crossref","first-page":"1581","DOI":"10.1093\/molbev\/msu081","article-title":"A comprehensive, high-resolution map of a gene's fitness landscape","volume":"31","author":"Firnberg","year":"2014","journal-title":"Mol Biol Evol"},{"key":"2021110815065645100_ref12","doi-asserted-by":"crossref","first-page":"e112","DOI":"10.1093\/nar\/gku511","article-title":"Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes","volume":"42","author":"Melnikov","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2021110815065645100_ref13","doi-asserted-by":"crossref","first-page":"2854","DOI":"10.1016\/j.jmb.2014.05.019","article-title":"Systematic exploration of ubiquitin sequence, E1 activation efficiency, and experimental fitness in 
yeast","volume":"426","author":"Roscoe","year":"2014","journal-title":"J Mol Biol"},{"key":"2021110815065645100_ref14","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1038\/nmeth.1492","article-title":"High-resolution mapping of protein sequence-function relationships","volume":"7","author":"Fowler","year":"2010","journal-title":"Nat Methods"},{"key":"2021110815065645100_ref15","doi-asserted-by":"crossref","first-page":"E1263","DOI":"10.1073\/pnas.1303309110","article-title":"Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis","volume":"110","author":"Starita","year":"2013","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110815065645100_ref16","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1038\/nature11500","article-title":"The spatial architecture of protein function and adaptation","volume":"491","author":"McLaughlin","year":"2012","journal-title":"Nature"},{"key":"2021110815065645100_ref17","doi-asserted-by":"crossref","first-page":"1537","DOI":"10.1261\/rna.040709.113","article-title":"Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein","volume":"19","author":"Melamed","year":"2013","journal-title":"RNA"},{"key":"2021110815065645100_ref18","doi-asserted-by":"crossref","first-page":"2014","DOI":"10.1021\/acssynbio.8b00155","article-title":"Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins","volume":"7","author":"Saito","year":"2018","journal-title":"ACS Synth Biol"},{"key":"2021110815065645100_ref19","doi-asserted-by":"crossref","first-page":"8852","DOI":"10.1073\/pnas.1901979116","article-title":"Machine learning-assisted directed protein evolution with combinatorial libraries","volume":"116","author":"Wu","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110815065645100_ref20","doi-asserted-by":"crossref","first-page":"1176","DOI":"10.1038\/s41592-019-0583-8","article-title":"Machine 
learning-guided channel rhodopsin engineering enables minimally invasive optogenetics","volume":"16","author":"Bedbrook","year":"2019","journal-title":"Nat Methods"},{"key":"2021110815065645100_ref21","volume-title":"33rd Conference on Neural Information Processing Systems","author":"Rao","year":"2019"},{"key":"2021110815065645100_ref22","doi-asserted-by":"publisher","DOI":"10.1101\/2020.07.12.199554","article-title":"ProtTrans: towards cracking the language of life\u2019s code through self-supervised deep learning and high performance computing","author":"Elnaggar","journal-title":"bioRxiv"},{"key":"2021110815065645100_ref23","doi-asserted-by":"publisher","DOI":"10.1101\/622803","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","author":"Rives","journal-title":"bioRxiv"},{"key":"2021110815065645100_ref24","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/28.1.374","article-title":"AAindex: amino acid index database","volume":"28","author":"Kawashima","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2021110815065645100_ref25","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1016\/j.molstruc.2006.07.004","article-title":"T-scale as a novel vector of topological descriptors for amino acids and its application in QSARs of peptides","volume":"830","author":"Tian","year":"2007","journal-title":"J Mol Struct"},{"key":"2021110815065645100_ref26","doi-asserted-by":"crossref","first-page":"2642","DOI":"10.1093\/bioinformatics\/bty178","article-title":"Learned protein embeddings for machine learning","volume":"34","author":"Yang","year":"2018","journal-title":"Bioinformatics"},{"key":"2021110815065645100_ref27","article-title":"Multiplicative LSTM for sequence modelling","author":"Krause","journal-title":"arXiv"},{"key":"2021110815065645100_ref28","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified 
rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat Methods"},{"key":"2021110815065645100_ref29","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2021110815065645100_ref30","first-page":"353","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Wang"},{"key":"2021110815065645100_ref31","volume-title":"33rd Conference on Neural Information Processing Systems","author":"Wang","year":"2019"},{"key":"2021110815065645100_ref32","doi-asserted-by":"crossref","first-page":"D427","DOI":"10.1093\/nar\/gky995","article-title":"The Pfam protein families database in 2019","volume":"47","author":"El-Gebali","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2021110815065645100_ref33","first-page":"8024","article-title":"PyTorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv Neural Inf Process Syst"},{"key":"2021110815065645100_ref34","article-title":"Adam: a method for stochastic optimization","author":"Kingma","journal-title":"arXiv"},{"key":"2021110815065645100_ref35","article-title":"Mixed precision training","author":"Micikevicius","journal-title":"arXiv"},{"key":"2021110815065645100_ref36","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin"},{"key":"2021110815065645100_ref37","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated Profile HMM Searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput 
Biol"},{"key":"2021110815065645100_ref38","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","article-title":"UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches","volume":"31","author":"Suzek","year":"2015","journal-title":"Bioinformatics"},{"key":"2021110815065645100_ref39","doi-asserted-by":"crossref","first-page":"3636","DOI":"10.1073\/pnas.1814684116","article-title":"Grammar of protein domain architectures","volume":"116","author":"Yu","year":"2019","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110815065645100_ref40","doi-asserted-by":"crossref","first-page":"24294","DOI":"10.1073\/pnas.2007201117","article-title":"Supertertiary protein structure affects an allosteric network","volume":"117","author":"Laursen","year":"2020","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2021110815065645100_ref41","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1016\/S0092-8674(00)81517-2","article-title":"Recognition of polyadenylate RNA by the poly(A)-binding protein","volume":"98","author":"Deo","year":"1999","journal-title":"Cell"},{"key":"2021110815065645100_ref42","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1016\/j.molcel.2012.09.001","article-title":"Interdomain allostery promotes assembly of the poly(A) mRNA complex with PABP and eIF4G","volume":"48","author":"Safaee","year":"2012","journal-title":"Mol Cell"},{"key":"2021110815065645100_ref43","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/S0092-8674(00)81663-3","article-title":"Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association","volume":"99","author":"Lee","year":"1999","journal-title":"Cell"},{"key":"2021110815065645100_ref44","doi-asserted-by":"crossref","first-page":"588","DOI":"10.1016\/j.celrep.2016.03.046","article-title":"Systematic mutant analyses elucidate general and client-specific 
aspects of Hsp90 function","volume":"15","author":"Mishra","year":"2016","journal-title":"Cell Rep"},{"issue":"36","key":"2021110815065645100_ref45","doi-asserted-by":"crossref","first-page":"33689","DOI":"10.1074\/jbc.M103832200","article-title":"Coordinated ATP hydrolysis by the Hsp90 dimer","volume":"276","author":"Richter","year":"2001","journal-title":"J Biol Chem"},{"issue":"7","key":"2021110815065645100_ref46","doi-asserted-by":"crossref","first-page":"1019","DOI":"10.1016\/j.str.2008.03.015","article-title":"Structural basis for dimerization in DNA recognition by Gal4","volume":"16","author":"Hong","year":"2008","journal-title":"Structure"},{"key":"2021110815065645100_ref47","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1214\/009053604000000067","article-title":"Least angle regression","volume":"32","author":"Efron","year":"2004","journal-title":"Ann Statist"},{"key":"2021110815065645100_ref48","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2021110815065645100_ref49","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1038\/s41592-019-0686-2","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in Python","volume":"17","author":"Virtanen","year":"2020","journal-title":"Nat Methods"},{"key":"2021110815065645100_ref50","volume-title":"International Conference on Learning Representations","author":"Rao","year":"2021"},{"key":"2021110815065645100_ref51","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/s41592-018-0138-4","article-title":"Deep generative models of genetic variation capture the effects of mutations","volume":"15","author":"Riesselman","year":"2018","journal-title":"Nat Methods"},{"key":"2021110815065645100_ref52","doi-asserted-by":"publisher","DOI":"10.1101\/2020.01.23.917682","article-title":"Low-N protein engineering with data-efficient deep 
learning","author":"Biswas","journal-title":"bioRxiv"},{"key":"2021110815065645100_ref53","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1038\/ng0398-280","article-title":"An intramolecular SH3-domain interaction regulates c-Abl activity","volume":"18","author":"Baril\u00e1","year":"1998","journal-title":"Nat Genet"},{"key":"2021110815065645100_ref54","article-title":"JAX: composable transformations of Python+NumPy programs","author":"Bradbury"},{"key":"2021110815065645100_ref55","doi-asserted-by":"publisher","DOI":"10.1101\/2020.05.11.088344","article-title":"Reimplementing Unirep in JAX","author":"Ma","journal-title":"bioRxiv"},{"key":"2021110815065645100_ref56","article-title":"Auto-encoding variational bayes","author":"Kingma","year":"2014","journal-title":"arXiv"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab234\/41089186\/bbab234.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab234\/41089186\/bbab234.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T10:12:26Z","timestamp":1636366346000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab234\/6309928"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,26]]},"references-count":56,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab234","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.03.05.434175","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"pu
blished":{"date-parts":[[2021,6,26]]},"article-number":"bbab234"}}