{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T01:32:56Z","timestamp":1781659976773,"version":"3.54.5"},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T00:00:00Z","timestamp":1716508800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Science Foundation Project of Jilin Provincial Department of Science and Technology","award":["20210101174JC"],"award-info":[{"award-number":["20210101174JC"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Due to the varying delivery methods of mRNA vaccines, codon optimization plays a critical role in vaccine design to improve the stability and expression of proteins in specific tissues. Considering the many-to-one relationship between synonymous codons and amino acids, the number of mRNA sequences encoding the same amino acid sequence could be enormous. Finding stable and highly expressed mRNA sequences from the vast sequence space using in silico methods can generally be viewed as a path-search problem or a machine translation problem. However, current deep learning-based methods inspired by machine translation may have some limitations, such as recurrent neural networks, which have a weak ability to capture the long-term dependencies of codon preferences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We develop a BERT-based architecture that uses the cross-attention mechanism for codon optimization. 
In CodonBERT, the codon sequence is randomly masked with each codon serving as a key and a value. In the meantime, the amino acid sequence is used as the query. CodonBERT was trained on high-expression transcripts from Human Protein Atlas mixed with different proportions of high codon adaptation index codon sequences. The result showed that CodonBERT can effectively capture the long-term dependencies between codons and amino acids, suggesting that it can be used as a customized training framework for specific optimization targets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>CodonBERT is freely available on https:\/\/github.com\/FPPGroup\/CodonBERT.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae330","type":"journal-article","created":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T21:35:07Z","timestamp":1716586507000},"source":"Crossref","is-referenced-by-count":27,"title":["CodonBERT: a BERT-based architecture tailored for codon optimization using the cross-attention mechanism"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3621-1024","authenticated-orcid":false,"given":"Zilin","family":"Ren","sequence":"first","affiliation":[{"name":"Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control , Changchun 130122, China"},{"name":"School of Information Science and Technology, Northeast Normal University , Changchun 130117, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lili","family":"Jiang","sequence":"additional","affiliation":[{"name":"Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for 
Zoonosis Prevention and Control , Changchun 130122, China"},{"name":"School of Information Science and Technology, Northeast Normal University , Changchun 130117, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yaxin","family":"Di","sequence":"additional","affiliation":[{"name":"Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control , Changchun 130122, China"},{"name":"College of Veterinary Medicine, Northeast Agricultural University , Harbin 150038, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dufei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control , Changchun 130122, China"},{"name":"School of Information Science and Technology, Northeast Normal University , Changchun 130117, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianli","family":"Gong","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Wuhan Technology and Business University , Wuhan 340000, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianting","family":"Gong","sequence":"additional","affiliation":[{"name":"Department of Regenerative Medicine, Institute of Health Service and Transfusion Medicine , Beijing 100850, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qiwei","family":"Jiang","sequence":"additional","affiliation":[{"name":"Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control , Changchun 130122, 
China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhiguo","family":"Fu","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northeast Normal University , Changchun 130117, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pingping","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northeast Normal University , Changchun 130117, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bo","family":"Zhou","sequence":"additional","affiliation":[{"name":"Changchun Veterinary Research Institute, Chinese Academy of Agricultural Sciences, State Key Laboratory of Pathogen and Biosecurity, Key Laboratory of Jilin Province for Zoonosis Prevention and Control , Changchun 130122, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ming","family":"Ni","sequence":"additional","affiliation":[{"name":"Department of Regenerative Medicine, Institute of Health Service and Transfusion Medicine , Beijing 100850, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,5,24]]},"reference":[{"key":"2024070608104783400_btae330-B1","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1038\/nature16509","article-title":"Codon influence on protein expression in E. 
coli correlates with mRNA levels","volume":"529","author":"Bo\u00ebl","year":"2016","journal-title":"Nature"},{"key":"2024070608104783400_btae330-B2","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1093\/bioinformatics\/btac020","article-title":"ProteinBERT: a universal deep-learning model of protein sequence and function","volume":"38","author":"Brandes","year":"2022","journal-title":"Bioinformatics"},{"key":"2024070608104783400_btae330-B3","doi-asserted-by":"crossref","first-page":"D916","DOI":"10.1093\/nar\/gkaa1087","article-title":"GENCODE 2021","volume":"49","author":"Frankish","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024070608104783400_btae330-B4","doi-asserted-by":"crossref","first-page":"17617","DOI":"10.1038\/s41598-020-74091-z","article-title":"Codon optimization with deep learning to enhance protein expression","volume":"10","author":"Fu","year":"2020","journal-title":"Sci Rep"},{"key":"2024070608104783400_btae330-B5","doi-asserted-by":"crossref","first-page":"W526","DOI":"10.1093\/nar\/gki376","article-title":"JCat: a novel tool to adapt codon usage of a target gene to its potential expression host","volume":"33","author":"Grote","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2024070608104783400_btae330-B6","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1038\/nrm.2017.91","article-title":"Codon optimality, bias and usage in translation and mRNA decay","volume":"19","author":"Hanson","year":"2018","journal-title":"Nat Rev Mol Cell Biol"},{"key":"2024070608104783400_btae330-B7","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s13059-023-02868-2","article-title":"Using protein-per-mRNA differences among human tissues in codon optimization","volume":"24","author":"Hernandez-Alias","year":"2023","journal-title":"Genome Biol"},{"key":"2024070608104783400_btae330-B8","first-page":"13","article-title":"Codon usage and tRNA content in unicellular and multicellular 
organisms","volume":"2","author":"Ikemura","year":"1985","journal-title":"Mol Biol Evol"},{"key":"2024070608104783400_btae330-B9","doi-asserted-by":"publisher","author":"Jain","year":"2022","DOI":"10.1101\/2021.11.08.467706"},{"key":"2024070608104783400_btae330-B10","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/1748-7188-6-26","article-title":"ViennaRNA package 2.0","volume":"6","author":"Lorenz","year":"2011","journal-title":"Algorithms Mol Biol"},{"key":"2024070608104783400_btae330-B11","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1093\/bib\/3.1.87","article-title":"EMBOSS opens up sequence analysis. European molecular biology open software suite","volume":"3","author":"Olson","year":"2002","journal-title":"Brief Bioinform"},{"key":"2024070608104783400_btae330-B12","doi-asserted-by":"crossref","first-page":"W126","DOI":"10.1093\/nar\/gkm219","article-title":"OPTIMIZER: a web server for optimizing the codon usage of DNA sequences","volume":"35","author":"Puigb\u00f2","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2024070608104783400_btae330-B13","doi-asserted-by":"crossref","first-page":"eaay5947","DOI":"10.1126\/science.aay5947","article-title":"An atlas of the protein-coding genes in the human, pig, and mouse brain","volume":"367","author":"Sj\u00f6stedt","year":"2020","journal-title":"Science"},{"issue":"7","key":"2024070608104783400_btae330-B14","article-title":"Detailed dissection and critical evaluation of the pfizer\/BioNTech and moderna mRNA vaccines","volume":"9","author":"Xia","year":"2021","journal-title":"Vaccines (Basel)"},{"key":"2024070608104783400_btae330-B15","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1038\/s41586-023-06127-z","article-title":"Algorithm for optimized mRNA design improves stability and 
immunogenicity","volume":"621","author":"Zhang","year":"2023","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae330\/57890129\/btae330.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/7\/btae330\/58460623\/btae330.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/7\/btae330\/58460623\/btae330.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T08:11:05Z","timestamp":1720253465000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae330\/7681883"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2024,5,24]]},"references-count":15,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae330","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,5,24]]},"article-number":"btae330"}}