{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:22Z","timestamp":1772138062371,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2023,7,3]],"date-time":"2023-07-03T00:00:00Z","timestamp":1688342400000},"content-version":"vor","delay-in-days":2,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Being able to artificially design novel proteins of desired function is pivotal in many biological and biomedical applications. Generative statistical modeling has recently emerged as a new paradigm for designing amino acid sequences, including in particular models and embedding methods borrowed from natural language processing (NLP). However, most approaches target single proteins or protein domains, and do not take into account any functional specificity or interaction with the context. To extend beyond current computational strategies, we develop a method for generating protein domain sequences intended to interact with another protein domain. Using data from natural multidomain proteins, we cast the problem as a translation problem from a given interactor domain to the new domain to be generated, i.e. we generate artificial partner sequences conditional on an input sequence. We also show in an example that the same procedure can be applied to interactions between distinct proteins.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Evaluating our model\u2019s quality using diverse metrics, in part related to distinct biological questions, we show that our method outperforms state-of-the-art shallow autoregressive strategies. We also explore the possibility of fine-tuning pretrained large language models for the same task and of using Alphafold 2 for assessing the quality of sampled sequences.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Data and code on https:\/\/github.com\/barthelemymp\/Domain2DomainProteinTranslation.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad401","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T10:47:35Z","timestamp":1688122055000},"source":"Crossref","is-referenced-by-count":5,"title":["Generating interacting protein sequences using domain-to-domain translation"],"prefix":"10.1093","volume":"39","author":[{"given":"Barthelemy","family":"Meynard-Piganeau","sequence":"first","affiliation":[{"name":"Computational and Quantitative Biology, LCQB UMR 7238, Institut de Biologie Paris Seine, CNRS, Sorbonne Universit\u00e9 , Paris 75005, France"},{"name":"Department of Computing Sciences, Bocconi University , Milan 20100, Italy"}]},{"given":"Caterina","family":"Fabbri","sequence":"additional","affiliation":[{"name":"Department of Computing Sciences, Bocconi University , Milan 20100, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0492-3684","authenticated-orcid":false,"given":"Martin","family":"Weigt","sequence":"additional","affiliation":[{"name":"Computational and Quantitative Biology, LCQB UMR 7238, Institut de Biologie Paris Seine, CNRS, Sorbonne Universit\u00e9 , Paris 75005, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6509-0807","authenticated-orcid":false,"given":"Andrea","family":"Pagnani","sequence":"additional","affiliation":[{"name":"Politecnico di Torino , Duca degli Abruzzi, 24 , Turin 10129, Italy"},{"name":"Italian Institute for Genomic Medicine , Strada Provinciale, 142 , Candiolo 10060, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8941-7333","authenticated-orcid":false,"given":"Christoph","family":"Feinauer","sequence":"additional","affiliation":[{"name":"Department of Computing Sciences, Bocconi University , Milan 20100, Italy"}]}],"member":"286","published-online":{"date-parts":[[2023,7,3]]},"reference":[{"key":"2023070819160951500_btad401-B1","volume-title":"Molecular Biology of the Cell","author":"Alberts","year":"2008","edition":"5th edn"},{"key":"2023070819160951500_btad401-B2","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat Methods"},{"key":"2023070819160951500_btad401-B3","doi-asserted-by":"crossref","first-page":"9122","DOI":"10.1073\/pnas.1702664114","article-title":"Origins of coevolution between residues distant in protein 3d structures","volume":"114","author":"Anishchenko","year":"2017","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023070819160951500_btad401-B4","author":"Armenteros","year":"2020"},{"key":"2023070819160951500_btad401-B5","doi-asserted-by":"crossref","first-page":"12180","DOI":"10.1073\/pnas.1606762113","article-title":"Inferring interaction partners from protein sequences","volume":"113","author":"Bitbol","year":"2016","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023070819160951500_btad401-B6","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1007\/978-1-4939-7000-1_26","article-title":"Protein data bank (pdb): the single global macromolecular structure archive","volume":"1607","author":"Burley","year":"2017","journal-title":"Protein Crystallogr"},{"key":"2023070819160951500_btad401-B7","doi-asserted-by":"crossref","first-page":"E563","DOI":"10.1073\/pnas.1323734111","article-title":"Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information","volume":"111","author":"Cheng","year":"2014","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023070819160951500_btad401-B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-32986-y","article-title":"Humanization of antibodies using a statistical inference approach","volume":"8","author":"Clavero-\u00c1lvarez","year":"2018","journal-title":"Sci Rep"},{"key":"2023070819160951500_btad401-B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023070819160951500_btad401-B10","doi-asserted-by":"crossref","first-page":"012707","DOI":"10.1103\/PhysRevE.87.012707","article-title":"Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models","volume":"87","author":"Ekeberg","year":"2013","journal-title":"Phys Rev E Stat Nonlin Soft Matter Phys"},{"key":"2023070819160951500_btad401-B11","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1093\/molbev\/msy007","article-title":"How pairwise coevolutionary models capture the collective residue variability in proteins?","volume":"35","author":"Figliuzzi","year":"2018","journal-title":"Mol Biol Evol"},{"key":"2023070819160951500_btad401-B12","doi-asserted-by":"crossref","first-page":"W29","DOI":"10.1093\/nar\/gkr367","article-title":"Hmmer web server: interactive sequence similarity searching","volume":"39","author":"Finn","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023070819160951500_btad401-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-020-79682-4","article-title":"Transformer neural network for protein-specific de novo drug generation as a machine translation problem","volume":"11","author":"Grechishnikova","year":"2021","journal-title":"Sci Rep"},{"key":"2023070819160951500_btad401-B14","doi-asserted-by":"crossref","first-page":"12186","DOI":"10.1073\/pnas.1607570113","article-title":"Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis","volume":"113","author":"Gueudr\u00e9","year":"2016","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023070819160951500_btad401-B15","doi-asserted-by":"crossref","first-page":"e1008736","DOI":"10.1371\/journal.pcbi.1008736","article-title":"Generating functional protein variants with variational autoencoders","volume":"17","author":"Hawkins-Hooker","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2023070819160951500_btad401-B16","author":"Hesslow","year":"2022"},{"key":"2023070819160951500_btad401-B17","author":"Hsu","year":"2022"},{"key":"2023070819160951500_btad401-B18","author":"Jang","year":"2016"},{"key":"2023070819160951500_btad401-B19","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with alphafold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023070819160951500_btad401-B20","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The hungarian method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Nav Res Logist"},{"key":"2023070819160951500_btad401-B21","first-page":"2022","article-title":"Evolutionary-scale prediction of atomic level protein structure with a language model","volume":"379","author":"Lin","year":"2022","journal-title":"bioRxiv"},{"key":"2023070819160951500_btad401-B22","author":"Madani","year":"2020"},{"key":"2023070819160951500_btad401-B23","doi-asserted-by":"crossref","first-page":"102370","DOI":"10.1016\/j.sbi.2022.102370","article-title":"Computational design of novel protein\u2013protein interactions\u2014an overview on methodological approaches and applications","volume":"74","author":"Marchand","year":"2022","journal-title":"Curr Opin Struct Biol"},{"key":"2023070819160951500_btad401-B24","author":"McPartlon","year":"2022"},{"key":"2023070819160951500_btad401-B25","first-page":"29287","article-title":"Language models enable zero-shot prediction of the effects of mutations on protein function","volume":"34","author":"Meier","year":"2021","journal-title":"Adv Neural Inform Process Syst"},{"key":"2023070819160951500_btad401-B26","doi-asserted-by":"crossref","first-page":"e1007621","DOI":"10.1371\/journal.pcbi.1007621","article-title":"Filterdca: interpretable supervised contact prediction using inter-domain coevolution","volume":"16","author":"Muscat","year":"2020","journal-title":"PLoS Comput Biol"},{"key":"2023070819160951500_btad401-B27","first-page":"1","author":"Nambiar","year":"2020"},{"key":"2023070819160951500_btad401-B28","first-page":"8844","author":"Rao","year":"2021"},{"key":"2023070819160951500_btad401-B29","doi-asserted-by":"crossref","first-page":"eaaw4388","DOI":"10.1126\/science.aaw4388","article-title":"Structures of a dimodular nonribosomal peptide synthetase reveal conformational flexibility","volume":"366","author":"Reimer","year":"2019","journal-title":"Science"},{"key":"2023070819160951500_btad401-B30","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1038\/s42256-021-00310-5","article-title":"Expanding functional protein sequence spaces using generative adversarial networks","volume":"3","author":"Repecka","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2023070819160951500_btad401-B31","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/s41592-018-0138-4","article-title":"Deep generative models of genetic variation capture the effects of mutations","volume":"15","author":"Riesselman","year":"2018","journal-title":"Nat Methods"},{"key":"2023070819160951500_btad401-B32","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2021","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023070819160951500_btad401-B33","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1126\/science.aba3304","article-title":"An evolution-based model for designing chorismate mutase enzymes","volume":"369","author":"Russ","year":"2020","journal-title":"Science"},{"key":"2023070819160951500_btad401-B34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-22732-w","article-title":"Protein design and variant prediction using autoregressive generative models","volume":"12","author":"Shin","year":"2021","journal-title":"Nat Commun"},{"key":"2023070819160951500_btad401-B35","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.sbi.2017.10.014","article-title":"Inter-residue, inter-protein and inter-family coevolution: bridging the scales","volume":"50","author":"Szurmant","year":"2018","journal-title":"Curr Opin Struct Biol"},{"key":"2023070819160951500_btad401-B36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-25756-4","article-title":"Efficient generative modeling of protein sequences using simple autoregressive models","volume":"12","author":"Trinquier","year":"2021","journal-title":"Nat Commun"},{"key":"2023070819160951500_btad401-B37","doi-asserted-by":"crossref","first-page":"e39397","DOI":"10.7554\/eLife.39397","article-title":"Learning protein constitutive motifs from sequence data","volume":"8","author":"Tubiana","year":"2019","journal-title":"Elife"},{"key":"2023070819160951500_btad401-B38","first-page":"5998","article-title":"Attention is all you need","author":"Vaswani","year":"2017","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2023070819160951500_btad401-B39","doi-asserted-by":"crossref","first-page":"2154","DOI":"10.1021\/acssynbio.0c00219","article-title":"Signal peptides generated by attention-based neural networks","volume":"9","author":"Wu","year":"2020","journal-title":"ACS Synth Biol"},{"key":"2023070819160951500_btad401-B40","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1016\/j.cbpa.2021.04.004","article-title":"Protein sequence design with deep generative models","volume":"65","author":"Wu","year":"2021","journal-title":"Curr Opin Chem Biol"},{"key":"2023070819160951500_btad401-B41","first-page":"14252","article-title":"Co-evolution transformer for protein contact prediction","volume":"34","author":"Zhang","year":"2021","journal-title":"Adv Neural Inform Process Syst"},{"key":"2023070819160951500_btad401-B42","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1038\/s43588-022-00232-1","article-title":"Progressive assembly of multi-domain protein structures from cryo-em density maps","volume":"2","author":"Zhou","year":"2022","journal-title":"Nat Comput Sci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad401\/50789468\/btad401.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad401\/50843030\/btad401.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad401\/50843030\/btad401.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,8]],"date-time":"2023-07-08T15:16:56Z","timestamp":1688829416000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad401\/7218310"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,7,1]]},"references-count":42,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad401","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.05.30.494026","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,7,1]]},"published":{"date-parts":[[2023,7,1]]},"article-number":"btad401"}}