{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,17]],"date-time":"2026-06-17T01:36:50Z","timestamp":1781660210719,"version":"3.54.5"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,4,4]],"date-time":"2023-04-04T00:00:00Z","timestamp":1680566400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,4]],"date-time":"2023-04-04T00:00:00Z","timestamp":1680566400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>\n                      In protein sequences\u2014as there are 61 sense codons but only 20 standard amino acids\u2014most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the expression of the resulting protein. Codon optimization of synthetic DNA sequences is important for heterologous expression. However, existing solutions are primarily based on choosing high-frequency codons only, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network based codon optimization tool, ICOR, that aims to learn codon usage bias on a genomic dataset of\n                      <jats:italic>Escherichia coli<\/jats:italic>\n                      . We compile a dataset of over 7,000 non-redundant, high-expression, robust genes which are used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing for the sequential context of codon usage in genes to be learned. Our tool can predict synonymous codons for synthetic genes toward optimal expression in\n                      <jats:italic>Escherichia coli<\/jats:italic>\n                      .\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>\n                      We demonstrate that sequential context achieved via RNN may yield codon selection that is more similar to the host genome. Based on computational metrics that predict protein expression, ICOR theoretically optimizes protein expression more than frequency-based approaches. ICOR is evaluated on 1,481\n                      <jats:italic>Escherichia coli<\/jats:italic>\n                      genes as well as a benchmark set of 40 select DNA sequences whose heterologous expression has been previously characterized. ICOR\u2019s performance is measured across five metrics: the Codon Adaptation Index, GC-content, negative repeat elements, negative cis-regulatory elements, and codon frequency distribution.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>The results, based on in silico metrics, indicate that ICOR codon optimization is theoretically more effective in enhancing recombinant expression of proteins over other established codon optimization techniques. Our tool is provided as an open-source software package that includes the benchmark set of sequences used in this study.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-023-05246-8","type":"journal-article","created":{"date-parts":[[2023,4,5]],"date-time":"2023-04-05T02:51:35Z","timestamp":1680663095000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":57,"title":["ICOR: improving codon optimization with recurrent neural networks"],"prefix":"10.1186","volume":"24","author":[{"given":"Rishab","family":"Jain","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aditya","family":"Jain","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6628-5956","authenticated-orcid":false,"given":"Elizabeth","family":"Mauro","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kevin","family":"LeShane","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Douglas","family":"Densmore","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,4,4]]},"reference":[{"key":"5246_CR1","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1038\/nature04342","volume":"438","author":"D Endy","year":"2005","unstructured":"Endy D. Foundations for engineering biology. Nature. 2005;438:449\u201353.","journal-title":"Nature"},{"key":"5246_CR2","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/j.pep.2003.11.006","volume":"34","author":"Z Zhou","year":"2004","unstructured":"Zhou Z, Schnake P, Xiao L, Lal AA. Enhanced expression of a recombinant malaria candidate vaccine in Escherichia coli by codon optimization. Protein Expr Purif. 2004;34:87\u201394.","journal-title":"Protein Expr Purif"},{"key":"5246_CR3","doi-asserted-by":"publisher","first-page":"1102","DOI":"10.1590\/S0100-879X2012007500142","volume":"45","author":"IP Nascimento","year":"2012","unstructured":"Nascimento IP, Leite LCC. Recombinant vaccines and the development of new vaccine strategies. Braz J Med Biol Res. 2012;45:1102\u201311.","journal-title":"Braz J Med Biol Res"},{"key":"5246_CR4","doi-asserted-by":"publisher","first-page":"239","DOI":"10.3390\/applmicrobiol1020018","volume":"1","author":"AM Mitchell","year":"2021","unstructured":"Mitchell AM, Gogulancea V, Smith W, Wipat A, Ofi\u0163eru ID. Recombinant protein production with Escherichia coli in Glucose and glycerol limited chemostats. Appl Microbiol. 2021;1:239\u201354.","journal-title":"Appl Microbiol"},{"key":"5246_CR5","doi-asserted-by":"publisher","first-page":"2656","DOI":"10.1021\/acssynbio.8b00332","volume":"7","author":"Z Lipinszki","year":"2018","unstructured":"Lipinszki Z, Vernyik V, Farago N, Sari T, Puskas LG, Blattner FR, et al. Enhancing the translational capacity of E coli by resolving the codon bias. ACS Synthetic Biol. 2018;7:2656\u201364.","journal-title":"ACS Synthetic Biol"},{"key":"5246_CR6","first-page":"E6117","volume":"113","author":"Z Zhoua","year":"2016","unstructured":"Zhoua Z, Danga Y, Zhou M, Li L, Yu CH, Fu J, et al. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci U S A. 2016;113:E6117\u201325.","journal-title":"Proc Natl Acad Sci U S A"},{"key":"5246_CR7","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1016\/j.tibtech.2004.04.006","volume":"22","author":"C Gustafsson","year":"2004","unstructured":"Gustafsson C, Govindarajan S, Minshull J. Codon bias and heterologous protein expression. Trends Biotechnol. 2004;22:346\u201353.","journal-title":"Trends Biotechnol"},{"key":"5246_CR8","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/j.tig.2017.02.001","volume":"33","author":"CE Brule","year":"2017","unstructured":"Brule CE, Grayhack EJ. Synonymous codons: choose wisely for expression. Trends Genet. 2017;33:283\u201397.","journal-title":"Trends Genet"},{"key":"5246_CR9","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1016\/0022-2836(81)90003-6","volume":"151","author":"T Ikemura","year":"1981","unstructured":"Ikemura T. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Molecul Biol. 1981;151:389\u2013409.","journal-title":"J Molecul Biol"},{"key":"5246_CR10","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1186\/1471-2105-7-285","volume":"7","author":"A Villalobos","year":"2006","unstructured":"Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S. Gene designer: a synthetic biology tool for constructuring artificial DNA segments. BMC Bioinformatics. 2006;7:285.","journal-title":"BMC Bioinformatics"},{"key":"5246_CR11","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1038\/nrg2899","volume":"12","author":"JB Plotkin","year":"2011","unstructured":"Plotkin JB, Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet. 2011;12:32\u201342.","journal-title":"Nat Rev Genet"},{"key":"5246_CR12","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1021\/bp0300467","volume":"20","author":"W Gao","year":"2004","unstructured":"Gao W, Rzewski A, Sun H, Robbins PD, Gambotto A. UpGene: Application of a web-based dna codon optimization algorithm. Biotechnol Prog. 2004;20:443\u20138.","journal-title":"Biotechnol Prog"},{"issue":"324","key":"5246_CR13","first-page":"255","volume":"2009","author":"G Kudla","year":"1979","unstructured":"Kudla G, Murray AW, Tollervey D, Plotkin JB. Coding-sequence determinants of expression in Escherichia coli. Science. 1979;2009(324):255\u20138.","journal-title":"Science"},{"key":"5246_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1475-2859-8-41","volume":"8","author":"GL Rosano","year":"2009","unstructured":"Rosano GL, Ceccarelli EA. Rare codon content affects the solubility of recombinant proteins in a codon bias-adjusted Escherichia coli strain. Microb Cell Fact. 2009;8:1\u20139.","journal-title":"Microb Cell Fact"},{"key":"5246_CR15","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1016\/j.molmed.2014.09.003","volume":"20","author":"VP Mauro","year":"2014","unstructured":"Mauro VP, Chappell SA. A critical analysis of codon optimization in human therapeutics. Trends Mol Med. 2014;20:604\u201313.","journal-title":"Trends Mol Med"},{"key":"5246_CR16","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/s12934-016-0437-3","volume":"15","author":"L Sanchez-Garcia","year":"2016","unstructured":"Sanchez-Garcia L, Mart\u00edn L, Mangues R, Ferrer-Miralles N, V\u00e1zquez E, Villaverde A. Recombinant pharmaceuticals from microbial cells: a 2015 update. Microb Cell Fact. 2016;15:33.","journal-title":"Microb Cell Fact"},{"key":"5246_CR17","doi-asserted-by":"publisher","first-page":"3872","DOI":"10.3390\/ijms19123872","volume":"19","author":"J Tian","year":"2018","unstructured":"Tian J, Li Q, Chu X, Wu N. Presyncodon, a web server for gene design with the evolutionary information of the expression hosts. Int J Molecul Sci. 2018;19:3872.","journal-title":"Int J Molecul Sci"},{"key":"5246_CR18","doi-asserted-by":"publisher","first-page":"126","DOI":"10.1093\/nar\/gkm219","volume":"35","author":"P Puigb\u00f2","year":"2007","unstructured":"Puigb\u00f2 P, Guzm\u00e1 E, Romeu A, Garcia-Vallv\u00e9 S. OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res. 2007;35:126.","journal-title":"Nucleic Acids Res"},{"key":"5246_CR19","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1002\/biot.201000332","volume":"6","author":"E Angov","year":"2011","unstructured":"Angov E. Codon usage: nature\u2019s roadmap to expression and folding of proteins. Biotechnol J. 2011;6:650.","journal-title":"Biotechnol J"},{"key":"5246_CR20","doi-asserted-by":"publisher","first-page":"7439","DOI":"10.1038\/nature11952","volume":"495","author":"JM Hurley","year":"2013","unstructured":"Hurley JM, Dunlap JC. A fable of too much too fast. Nature. 2013;495:7439.","journal-title":"Nature"},{"key":"5246_CR21","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1005531","volume":"13","author":"JL Chaney","year":"2017","unstructured":"Chaney JL, Steele A, Carmichael R, Rodriguez A, Specht AT, Ngo K, et al. Widespread position-specific conservation of synonymous rare codons within coding sequences. PLoS Comput Biol. 2017;13: e1005531.","journal-title":"PLoS Comput Biol"},{"key":"5246_CR22","doi-asserted-by":"publisher","first-page":"1236","DOI":"10.1093\/bib\/bbx044","volume":"19","author":"R Miotto","year":"2017","unstructured":"Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform. 2017;19:1236\u201346.","journal-title":"Brief Bioinform"},{"key":"5246_CR23","doi-asserted-by":"publisher","first-page":"214","DOI":"10.3389\/fgene.2019.00214","volume":"10","author":"B Tang","year":"2019","unstructured":"Tang B, Pan Z, Yin K, Khateeb A. Recent advances of deep learning in bioinformatics and computational biology. Front Genetics. 2019;10:214.","journal-title":"Front Genetics"},{"key":"5246_CR24","unstructured":"Liu P, Qiu X, Huang X. Recurrent neural network for text classification with multi-task learning. 2016. arXiv preprint arXiv:http:\/\/arxiv.org\/abs\/1605.05101."},{"key":"5246_CR25","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735\u201380.","journal-title":"Neural Comput"},{"key":"5246_CR26","doi-asserted-by":"publisher","first-page":"2673","DOI":"10.1109\/78.650093","volume":"45","author":"M Schuster","year":"1997","unstructured":"Schuster M, Paliwal KK. Bidirectional recurrent neural networks. IEEE Trans Signal Process. 1997;45:2673\u201381.","journal-title":"IEEE Trans Signal Process"},{"key":"5246_CR27","unstructured":"GenSmartTM Codon optimization tool-genscript. https:\/\/www.genscript.com\/gensmart-free-gene-codon-optimization.html. Accessed 2 Oct 2021."},{"key":"5246_CR28","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1038\/nbt.4172","volume":"36","author":"LW Koblan","year":"2018","unstructured":"Koblan LW, Doman JL, Wilson C, Levy JM, Tay T, Newby GA, et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat Biotechnol. 2018;36:843.","journal-title":"Nat Biotechnol"},{"key":"5246_CR29","unstructured":"National Center for Biotechnology Information. Genome Escherichia coli. Bethesda. 2021."},{"key":"5246_CR30","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1093\/bioinformatics\/btq003","volume":"26","author":"Y Huang","year":"2010","unstructured":"Huang Y, Niu B, Gao Y, Fu L, Li W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010;26:680\u20132.","journal-title":"Bioinformatics"},{"key":"5246_CR31","unstructured":"MATLAB. version 7.10.0 (R2010a). Natick, Massachusetts: The MathWorks Inc.; 2010."},{"key":"5246_CR32","doi-asserted-by":"publisher","first-page":"3185","DOI":"10.1016\/j.eswa.2010.09.005","volume":"38","author":"L Nanni","year":"2011","unstructured":"Nanni L, Lumini A. A new encoding technique for peptide classification. Expert Syst Appl. 2011;38:3185\u201391.","journal-title":"Expert Syst Appl"},{"key":"5246_CR33","unstructured":"Rare codon analysis tool. https:\/\/www.genscript.com\/tools\/rare-codon-analysis. Accessed 2 Oct 2021."},{"key":"5246_CR34","doi-asserted-by":"publisher","first-page":"494","DOI":"10.1016\/0958-1669(95)80082-4","volume":"6","author":"JF Kane","year":"1995","unstructured":"Kane JF. Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr Opin Biotechnol. 1995;6:494\u2013500.","journal-title":"Curr Opin Biotechnol"},{"key":"5246_CR35","doi-asserted-by":"publisher","first-page":"1281","DOI":"10.1093\/nar\/15.3.1281","volume":"15","author":"PM Sharp","year":"1987","unstructured":"Sharp PM, Li WH. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987;15:1281\u201395.","journal-title":"Nucleic Acids Res"},{"key":"5246_CR36","doi-asserted-by":"publisher","first-page":"6976","DOI":"10.1093\/nar\/gkg897","volume":"31","author":"M dos Reis","year":"2003","unstructured":"dos Reis M, Wernisch L, Savva R. Unexpected correlations between gene expression and codon usage bias from microarray data for the whole Escherichia coli K-12 genome. Nucleic Acids Res. 2003;31:6976\u201385.","journal-title":"Nucleic Acids Res"},{"key":"5246_CR37","first-page":"1","volume":"9","author":"JH Tr\u00f6semeier","year":"2019","unstructured":"Tr\u00f6semeier JH, Rudorf S, Loessner H, Hofner B, Reuter A, Schulenborg T, et al. Optimizing the dynamics of protein expression. Sci Reports. 2019;9:1\u201315.","journal-title":"Sci Reports"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05246-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05246-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05246-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,5]],"date-time":"2023-04-05T02:52:47Z","timestamp":1680663167000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05246-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,4]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5246"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05246-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.11.08.467706","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,4]]},"assertion":[{"value":"9 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not Applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not Applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Aditya Jain has declared that no competing interests exist. Rishab Jain has declared that no competing interests exist.\u00a0Douglas Densmore has read the journal's policy and has declared the following competing interests: commercial interests at Lattice Automation and BioSens8, Professorship at Boston University, and co-founder of Asimov, Inc. Kevin LeShane has read the journal's policy and has declared the following competing interests: I have financial competing interests at Lattice Automation and Asimov Inc. Elizabeth Mauro has declared that no competing interests exist.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"132"}}