{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:56Z","timestamp":1772138036667,"version":"3.50.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2020,3,16]],"date-time":"2020-03-16T00:00:00Z","timestamp":1584316800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["DP2-GM-123641"],"award-info":[{"award-number":["DP2-GM-123641"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more nonfunctional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein-screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein-variant pools from only a few low-cost synthesis reactions. 
However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce a novel algorithm for total DC library optimization, degenerate codon design (DeCoDe), based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>github.com\/OrensteinLab\/DeCoDe.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Contact<\/jats:title>\n                    <jats:p>yaronore@bgu.ac.il<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa162","type":"journal-article","created":{"date-parts":[[2020,3,13]],"date-time":"2020-03-13T16:25:08Z","timestamp":1584116708000},"page":"3357-3364","source":"Crossref","is-referenced-by-count":13,"title":["DeCoDe: 
degenerate codon design for complete protein-coding DNA libraries"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1441-0222","authenticated-orcid":false,"given":"Tyler C","family":"Shimko","sequence":"first","affiliation":[{"name":"Department of Genetics"}]},{"given":"Polly M","family":"Fordyce","sequence":"additional","affiliation":[{"name":"Department of Genetics"},{"name":"Department of Bioengineering"},{"name":"Stanford ChEM-H , Stanford University, Stanford, CA 94305, USA"},{"name":"Chan Zuckerberg Biohub , San Francisco, CA 94158, USA"}]},{"given":"Yaron","family":"Orenstein","sequence":"additional","affiliation":[{"name":"School of Electrical and Computer Engineering , Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel"}]}],"member":"286","published-online":{"date-parts":[[2020,3,16]]},"reference":[{"key":"2023062300082609200_btaa162-B1","doi-asserted-by":"crossref","first-page":"4004","DOI":"10.1073\/pnas.0910781107","article-title":"Ultrahigh-throughput screening in drop-based microfluidics for directed evolution","volume":"107","author":"Agresti","year":"2010","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062300082609200_btaa162-B2","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1038\/nbt0392-297","article-title":"Optimizing nucleotide mixtures to encode specific subsets of amino acids for semi-random mutagenesis","volume":"10","author":"Arkin","year":"1992","journal-title":"Nat. Biotechnol"},{"key":"2023062300082609200_btaa162-B3","doi-asserted-by":"crossref","first-page":"7978","DOI":"10.1073\/pnas.88.18.7978","article-title":"Assembly of combinatorial antibody libraries on phage surfaces: the gene III site","volume":"88","author":"Barbas","year":"1991","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023062300082609200_btaa162-B4","doi-asserted-by":"crossref","first-page":"1859","DOI":"10.1016\/S0040-4039(01)90461-7","article-title":"Deoxynucleoside phosphoramidites \u2013 a new class of key intermediates for deoxypolynucleotide synthesis","volume":"22","author":"Beaucage","year":"1981","journal-title":"Tetrahedron Lett"},{"key":"2023062300082609200_btaa162-B5","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1038\/nbt0697-553","article-title":"Yeast surface display for screening combinatorial polypeptide libraries","volume":"15","author":"Boder","year":"1997","journal-title":"Nat. Biotechnol"},{"key":"2023062300082609200_btaa162-B6","doi-asserted-by":"crossref","first-page":"3390","DOI":"10.1093\/nar\/gki615","article-title":"Protein length in eukaryotic and prokaryotic proteomes","volume":"33","author":"Brocchieri","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023062300082609200_btaa162-B7","doi-asserted-by":"crossref","first-page":"16757","DOI":"10.1038\/s41598-018-35033-y","article-title":"A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes","volume":"8","author":"Cadet","year":"2018","journal-title":"Sci. Rep"},{"key":"2023062300082609200_btaa162-B8","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1471-2105-12-S1-S14","article-title":"An ILP solution for the gene duplication problem","volume":"12","author":"Chang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023062300082609200_btaa162-B9","first-page":"221","article-title":"CVXPY: a python-embedded modeling language for convex optimization","volume":"17","author":"Diamond","year":"2016","journal-title":"J. Mach. Learn. 
Res"},{"key":"2023062300082609200_btaa162-B10","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1038\/nature04105","article-title":"Intrinsic dynamics of an enzyme underlies catalysis","volume":"438","author":"Eisenmesser","year":"2005","journal-title":"Nature"},{"key":"2023062300082609200_btaa162-B11","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1016\/0022-2836(86)90171-3","article-title":"Cell surface exposure of the outer membrane protein OmpA of Escherichia coli K-12","volume":"188","author":"Freudl","year":"1986","journal-title":"J. Mol. Biol"},{"key":"2023062300082609200_btaa162-B12","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1126\/science.153.3734.420","article-title":"Genetic code: aspects of organization","volume":"153","author":"Goldberg","year":"1966","journal-title":"Science"},{"key":"2023062300082609200_btaa162-B13","author":"Gurobi Optimization","year":"2018"},{"key":"2023062300082609200_btaa162-B14","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1016\/j.cell.2009.07.038","article-title":"Protein sectors: evolutionary units of three-dimensional structure","volume":"138","author":"Halabi","year":"2009","journal-title":"Cell"},{"key":"2023062300082609200_btaa162-B15","doi-asserted-by":"crossref","first-page":"e34","DOI":"10.1093\/nar\/gku1323","article-title":"SwiftLib: rapid degenerate-codon-library optimization through dynamic programming","volume":"43","author":"Jacobs","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023062300082609200_btaa162-B16","doi-asserted-by":"crossref","first-page":"1364","DOI":"10.1126\/science.1089427","article-title":"Design of a novel globular protein fold with atomic-level accuracy","volume":"302","author":"Kuhlman","year":"2003","journal-title":"Science"},{"key":"2023062300082609200_btaa162-B17","doi-asserted-by":"crossref","first-page":"1249","DOI":"10.1002\/pro.5560020807","article-title":"Design of synthetic gene libraries encoding random sequence proteins 
with desired ensemble characteristics","volume":"2","author":"LaBean","year":"1993","journal-title":"Protein Sci"},{"key":"2023062300082609200_btaa162-B18","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1038\/s41592-019-0352-8","article-title":"FPbase: a community-editable fluorescent protein database","volume":"16","author":"Lambert","year":"2019","journal-title":"Nat. Methods"},{"key":"2023062300082609200_btaa162-B19","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1016\/B978-0-12-381270-4.00019-6","article-title":"ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules","volume":"487","author":"Leaver-Fay","year":"2011","journal-title":"Methods Enzymol"},{"key":"2023062300082609200_btaa162-B20","doi-asserted-by":"crossref","first-page":"2522","DOI":"10.1093\/nar\/gkq163","article-title":"Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process","volume":"38","author":"LeProust","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023062300082609200_btaa162-B21","doi-asserted-by":"crossref","first-page":"13045","DOI":"10.1073\/pnas.1611781113","article-title":"Evolutionary trend toward kinetic stability in the folding trajectory of RNases H","volume":"113","author":"Lim","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062300082609200_btaa162-B22","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1126\/science.286.5438.295","article-title":"Evolutionarily conserved pathways of energetic connectivity in protein families","volume":"286","author":"Lockless","year":"1999","journal-title":"Science"},{"key":"2023062300082609200_btaa162-B23","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1093\/protein\/gzi061","article-title":"Automated design of degenerate codon libraries","volume":"18","author":"Mena","year":"2005","journal-title":"Protein Eng. Des. 
Sel"},{"key":"2023062300082609200_btaa162-B24","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1038\/nature13001","article-title":"The ensemble nature of allostery","volume":"508","author":"Motlagh","year":"2014","journal-title":"Nature"},{"key":"2023062300082609200_btaa162-B25","doi-asserted-by":"crossref","first-page":"2317","DOI":"10.1021\/acssynbio.8b00118","article-title":"Large scale synthetic site saturation GPCR libraries reveal novel mutations that alter glucose signaling","volume":"7","author":"Oling","year":"2018","journal-title":"ACS Synth. Biol"},{"key":"2023062300082609200_btaa162-B26","doi-asserted-by":"crossref","first-page":"1743","DOI":"10.1089\/cmb.2011.0152","article-title":"Optimization of combinatorial mutagenesis","volume":"18","author":"Parker","year":"2011","journal-title":"J. Comput. Biol"},{"key":"2023062300082609200_btaa162-B27","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1093\/protein\/15.10.779","article-title":"Protein design is NP-hard","volume":"15","author":"Pierce","year":"2002","journal-title":"Protein Eng"},{"key":"2023062300082609200_btaa162-B28","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1126\/science.aao5167","article-title":"Multiplexed gene synthesis in emulsions for exploring protein functional landscapes","volume":"359","author":"Plesa","year":"2018","journal-title":"Science"},{"key":"2023062300082609200_btaa162-B29","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/0378-1119(92)90691-H","article-title":"Primary structure of the Aequorea victoria green-fluorescent protein","volume":"111","author":"Prasher","year":"1992","journal-title":"Gene"},{"key":"2023062300082609200_btaa162-B30","doi-asserted-by":"crossref","first-page":"12297","DOI":"10.1073\/pnas.94.23.12297","article-title":"RNA-peptide fusions for the in vitro selection of peptides and proteins","volume":"94","author":"Roberts","year":"1997","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023062300082609200_btaa162-B31","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1038\/nmeth.1272","article-title":"Epitope mapping of antibodies using bacterial surface display","volume":"5","author":"Rockberg","year":"2008","journal-title":"Nat. Methods"},{"key":"2023062300082609200_btaa162-B32","doi-asserted-by":"crossref","first-page":"7159","DOI":"10.1073\/pnas.1422285112","article-title":"Dissecting enzyme function with microfluidic-based deep mutational scanning","volume":"112","author":"Romero","year":"2015","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062300082609200_btaa162-B33","doi-asserted-by":"crossref","first-page":"2014","DOI":"10.1021\/acssynbio.8b00155","article-title":"Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins","volume":"7","author":"Saito","year":"2018","journal-title":"ACS Synth. Biol"},{"key":"2023062300082609200_btaa162-B34","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1038\/nature17995","article-title":"Local fitness landscape of the green fluorescent protein","volume":"533","author":"Sarkisyan","year":"2016","journal-title":"Nature"},{"key":"2023062300082609200_btaa162-B35","doi-asserted-by":"crossref","first-page":"1588","DOI":"10.1073\/pnas.83.6.1588","article-title":"Site-saturation studies of beta-lactamase: production and characterization of mutant beta-lactamases with all possible amino acid substitutions at residue 71","volume":"83","author":"Schultz","year":"1986","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023062300082609200_btaa162-B36","doi-asserted-by":"crossref","first-page":"8308","DOI":"10.1073\/pnas.1532535100","article-title":"Molecular analysis of the evolutionary significance of ultraviolet vision in vertebrates","volume":"100","author":"Shi","year":"2003","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023062300082609200_btaa162-B37","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1126\/science.4001944","article-title":"Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface","volume":"228","author":"Smith","year":"1985","journal-title":"Science"},{"key":"2023062300082609200_btaa162-B38","doi-asserted-by":"crossref","first-page":"512","DOI":"10.1038\/nature03991","article-title":"Evolutionary information for specifying a protein fold","volume":"437","author":"Socolich","year":"2005","journal-title":"Nature"},{"key":"2023062300082609200_btaa162-B39","doi-asserted-by":"crossref","first-page":"9657","DOI":"10.1021\/bi050568q","article-title":"The crystal structure of the Methanocaldococcus jannaschii multifunctional L7Ae RNA-binding protein reveals an induced-fit interaction with the box C\/D RNAs","volume":"44","author":"Suryadi","year":"2005","journal-title":"Biochemistry"},{"key":"2023062300082609200_btaa162-B40","doi-asserted-by":"crossref","first-page":"e36","DOI":"10.1093\/nar\/gnh030","article-title":"Shuffled antibody libraries created by in vivo homologous recombination and yeast surface display","volume":"32","author":"Swers","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023062300082609200_btaa162-B41","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1016\/S0014-5793(01)03075-7","article-title":"An in vitro DNA virus for in vitro protein evolution","volume":"508","author":"Tabuchi","year":"2001","journal-title":"FEBS Lett"},{"key":"2023062300082609200_btaa162-B42","volume-title":"GNU Parallel 2018","author":"Tange","year":"2018","edition":"1st edn"},{"key":"2023062300082609200_btaa162-B43","doi-asserted-by":"crossref","first-page":"1668","DOI":"10.1002\/prot.24559","article-title":"Canonical structures of short CDR-L3 in 
antibodies","volume":"82","author":"Teplyakov","year":"2014","journal-title":"Proteins"},{"key":"2023062300082609200_btaa162-B44","doi-asserted-by":"crossref","first-page":"1714","DOI":"10.1126\/science.1086185","article-title":"Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling","volume":"301","author":"Thornton","year":"2003","journal-title":"Science"},{"key":"2023062300082609200_btaa162-B45","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1110\/ps.8.3.680","article-title":"Combinatorial codons: a computer program to approximate amino acid probabilities with biased nucleotide usage","volume":"8","author":"Wolf","year":"1999","journal-title":"Protein Sci"},{"key":"2023062300082609200_btaa162-B46","doi-asserted-by":"crossref","first-page":"8852","DOI":"10.1073\/pnas.1901979116","article-title":"Machine learning-assisted directed protein evolution with combinatorial libraries","volume":"116","author":"Wu","year":"2019","journal-title":"Proc. Natl. Acad. Sci. 
USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa162\/33130057\/btaa162.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/11\/3357\/50670609\/bioinformatics_36_11_3357.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/11\/3357\/50670609\/bioinformatics_36_11_3357.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T14:21:53Z","timestamp":1687616513000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/11\/3357\/5807608"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,3,16]]},"references-count":46,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2020,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa162","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/809004","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,6]]},"published":{"date-parts":[[2020,3,16]]}}}