{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T23:25:03Z","timestamp":1778541903244,"version":"3.51.4"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1013996","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T00:00:00Z","timestamp":1771977600000}}],"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"2","license":[{"start":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:00:00Z","timestamp":1771459200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003407","name":"Ministero dell\u2019Istruzione, dell\u2019Universit\u00e0 e della Ricerca","doi-asserted-by":"publisher","award":["2022TE5B7X"],"award-info":[{"award-number":["2022TE5B7X"]}],"id":[{"id":"10.13039\/501100003407","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Union NextGenerationEU","award":["PE00000013"],"award-info":[{"award-number":["PE00000013"]}]},{"DOI":"10.13039\/100018694","name":"HORIZON EUROPE Marie Sklodowska-Curie Actions","doi-asserted-by":"publisher","award":["101131463"],"award-info":[{"award-number":["101131463"]}],"id":[{"id":"10.13039\/100018694","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Swiss National Science Foundation","doi-asserted-by":"crossref","award":["IC00I0-227688"],"award-info":[{"award-number":["IC00I0-227688"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>We present FeatureDCA, a statistical framework for protein sequence modeling and generation that extends Direct Coupling Analysis (DCA) with biologically meaningful conditioning. The method can leverage different kinds of information, such as phylogeny, optimal growth temperature, enzymatic activity or, as in the case presented here, principal components derived from multiple sequence alignments, and use it to improve the learning process and consequently efficiently condition the generative process. FeatureDCA allows sampling to be guided toward specific regions of sequence space while maintaining the efficiency and interpretability of Potts-based inference. Across multiple protein families, our autoregressive implementation of FeatureDCA matches or surpasses the generative accuracy of established models in reproducing higher-order sequence statistics while preserving substantial sequence diversity. Structural validation with AlphaFold and ESMFold confirms that generated sequences adopt folds consistent with their intended wild-type targets. 
In a detailed case study of the Response Regulator family (PF00072), which comprises distinct structural subclasses linked to different DNA-binding domains, FeatureDCA accurately reproduces class-specific architectures when conditioned on subtype-specific principal components, highlighting its potential for fine-grained structural control. Predictions of experimental deep mutational scanning data show accuracy comparable to that of unconditioned autoregressive Potts models, indicating that FeatureDCA also captures local functional constraints. These results position FeatureDCA as a flexible and transparent approach for targeted sequence generation, bridging statistical fidelity, structural realism, and interpretability in protein design.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013996","type":"journal-article","created":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T18:46:06Z","timestamp":1771526766000},"page":"e1013996","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":1,"title":["Controllable protein design via autoregressive direct coupling analysis conditioned on principal components"],"prefix":"10.1371","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-2175-8084","authenticated-orcid":true,"given":"Francesco","family":"Caredda","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-6156-5804","authenticated-orcid":true,"given":"Lisa","family":"Gennai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paolo","family":"De Los 
Rios","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrea","family":"Pagnani","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2026,2,19]]},"reference":[{"issue":"1","key":"pcbi.1013996.ref001","first-page":"147","article-title":"A learning algorithm for Boltzmann machines","volume":"9","author":"D Ackley","year":"1985","journal-title":"Cognitive Science."},{"issue":"4","key":"pcbi.1013996.ref002","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1093\/molbev\/msy007","article-title":"How pairwise coevolutionary models capture the collective residue variability in proteins?","volume":"35","author":"M Figliuzzi","year":"2018","journal-title":"Mol Biol Evol."},{"issue":"1","key":"pcbi.1013996.ref003","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1186\/s12859-021-04441-9","article-title":"adabmDCA: adaptive Boltzmann machine learning for biological sequences","volume":"22","author":"AP Muntoni","year":"2021","journal-title":"BMC Bioinformatics."},{"key":"pcbi.1013996.ref004","doi-asserted-by":"crossref","first-page":"024407","DOI":"10.1103\/PhysRevE.104.024407","article-title":"Sparse generative modeling via parameter reduction of Boltzmann machines: application to protein-sequence families","volume":"104","author":"P Barrat-Charlaix","year":"2021","journal-title":"Phys Rev E."},{"issue":"1","key":"pcbi.1013996.ref005","doi-asserted-by":"crossref","first-page":"5800","DOI":"10.1038\/s41467-021-25756-4","article-title":"Efficient generative modeling of protein sequences using simple autoregressive models","volume":"12","author":"J Trinquier","year":"2021","journal-title":"Nat Commun."},{"issue":"11","key":"pcbi.1013996.ref006","article-title":"GENERALIST: a latent space based generative model for protein sequence families","volume":"19","author":"H Akl","year":"2023","journal-title":"PLoS Comput 
Biol."},{"issue":"6502","key":"pcbi.1013996.ref007","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1126\/science.aba3304","article-title":"An evolution-based model for designing chorismate mutase enzymes","volume":"369","author":"WP Russ","year":"2020","journal-title":"Science."},{"issue":"10","key":"pcbi.1013996.ref008","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1038\/s41592-018-0138-4","article-title":"Deep generative models of genetic variation capture the effects of mutations","volume":"15","author":"AJ Riesselman","year":"2018","journal-title":"Nat Methods."},{"issue":"7","key":"pcbi.1013996.ref009","doi-asserted-by":"crossref","first-page":"2689","DOI":"10.1021\/acs.jctc.3c01057","article-title":"Protein ensemble generation through variational autoencoder latent space sampling","volume":"20","author":"S Mansoor","year":"2024","journal-title":"J Chem Theory Comput."},{"issue":"4","key":"pcbi.1013996.ref010","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1038\/s42256-021-00310-5","article-title":"Expanding functional protein sequence spaces using generative adversarial networks","volume":"3","author":"D Repecka","year":"2021","journal-title":"Nat Mach Intell."},{"key":"pcbi.1013996.ref011","doi-asserted-by":"crossref","unstructured":"Madani A, McCann B, Naik N, Keskar NS, Anand N, Eguchi RR. ProGen: language modeling for protein generation. arXiv preprint 2020. https:\/\/arxiv.org\/abs\/2004.03497","DOI":"10.1101\/2020.03.07.982272"},{"issue":"15","key":"pcbi.1013996.ref012","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"A Rives","year":"2021","journal-title":"Proc Natl Acad Sci U S A."},{"key":"pcbi.1013996.ref013","doi-asserted-by":"crossref","unstructured":"Nambiar A, Liu S, Hopkins M, Heflin M, Maslov S, Ritz A. 
Transforming the language of life: transformer neural networks for protein prediction tasks. openRxiv. 2020. https:\/\/doi.org\/10.1101\/2020.06.15.153643","DOI":"10.1101\/2020.06.15.153643"},{"key":"pcbi.1013996.ref014","doi-asserted-by":"crossref","first-page":"1750","DOI":"10.1016\/j.csbj.2021.03.022","article-title":"The language of proteins: NLP, machine learning & protein sequences","volume":"19","author":"D Ofer","year":"2021","journal-title":"Comput Struct Biotechnol J."},{"key":"pcbi.1013996.ref015","doi-asserted-by":"crossref","unstructured":"Rao RM, Liu J, Verkuil R, Meier J, Canny J, Abbeel P, et al. MSA transformer. In: Proceedings of the 38th International Conference on Machine Learning. PMLR; 2021. p. 8844\u201356.","DOI":"10.1101\/2021.02.12.430858"},{"key":"pcbi.1013996.ref016","doi-asserted-by":"crossref","unstructured":"Rao R, Meier J, Sercu T, Ovchinnikov S, Rives A. Transformer protein language models are unsupervised structure learners. openRxiv. 2020. https:\/\/doi.org\/10.1101\/2020.12.15.422761","DOI":"10.1101\/2020.12.15.422761"},{"key":"pcbi.1013996.ref017","doi-asserted-by":"crossref","unstructured":"Bhattacharya N, Thomas N, Rao R, Dauparas J, Koo PK, Baker D, et al. Single layers of attention suffice to predict protein contacts. openRxiv. 2020. https:\/\/doi.org\/10.1101\/2020.12.21.423882","DOI":"10.1101\/2020.12.21.423882"},{"issue":"1","key":"pcbi.1013996.ref018","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/s12859-025-06062-y","article-title":"Direct coupling analysis and the attention mechanism","volume":"26","author":"F Caredda","year":"2025","journal-title":"BMC Bioinformatics."},{"key":"pcbi.1013996.ref019","doi-asserted-by":"crossref","unstructured":"Alamdari S, Thakkar N, van den Berg R, Tenenholtz N, Strome R, Moses AM, et al. Protein generation with evolutionary diffusion: sequence is all you need. openRxiv. 2023. 
https:\/\/doi.org\/10.1101\/2023.09.11.556673","DOI":"10.1101\/2023.09.11.556673"},{"issue":"6637","key":"pcbi.1013996.ref020","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1126\/science.ade2574","article-title":"Evolutionary-scale prediction of atomic-level protein structure with a language model","volume":"379","author":"Z Lin","year":"2023","journal-title":"Science."},{"issue":"6736","key":"pcbi.1013996.ref021","doi-asserted-by":"crossref","first-page":"850","DOI":"10.1126\/science.ads0018","article-title":"Simulating 500 million years of evolution with a language model","volume":"387","author":"T Hayes","year":"2025","journal-title":"Science."},{"key":"pcbi.1013996.ref022","unstructured":"Notin P, Dias M, Frazer J, Marchena-Hurtado J, Gomez A, Marks DS, et al. Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval. arXiv preprint 2022. https:\/\/arxiv.org\/abs\/2205.13760"},{"key":"pcbi.1013996.ref023","unstructured":"Truong TF, Bepler T. Understanding protein function with a multimodal retrieval-augmented foundation model. arXiv preprint 2025. 
https:\/\/arxiv.org\/abs\/2508.04724"},{"issue":"7996","key":"pcbi.1013996.ref024","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1038\/s41586-023-06832-9","article-title":"Predicting multiple conformations via sequence clustering and AlphaFold2","volume":"625","author":"HK Wayment-Steele","year":"2024","journal-title":"Nature."},{"issue":"7873","key":"pcbi.1013996.ref025","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"J Jumper","year":"2021","journal-title":"Nature."},{"issue":"8016","key":"pcbi.1013996.ref026","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1038\/s41586-024-07487-w","article-title":"Accurate structure prediction of biomolecular interactions with AlphaFold 3","volume":"630","author":"J Abramson","year":"2024","journal-title":"Nature."},{"issue":"49","key":"pcbi.1013996.ref027","doi-asserted-by":"crossref","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"F Morcos","year":"2011","journal-title":"Proc Natl Acad Sci U S A."},{"issue":"3","key":"pcbi.1013996.ref028","doi-asserted-by":"crossref","first-page":"032601","DOI":"10.1088\/1361-6633\/aa9965","article-title":"Inverse statistical physics of protein sequences: a key issues review","volume":"81","author":"S Cocco","year":"2018","journal-title":"Rep Prog Phys."},{"key":"pcbi.1013996.ref029","unstructured":"Feydy J, S\u00e9journ\u00e9 T, Vialard FX, Amari S-i, Trouv\u00e9 A, Peyr\u00e9 G. Interpolating between Optimal Transport and MMD Using Sinkhorn Divergences. arXiv preprint 2018. 
https:\/\/arxiv.org\/abs\/1810.08278"},{"issue":"1","key":"pcbi.1013996.ref030","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1002\/prot.340090107","article-title":"Database of homology-derived protein structures and the structural meaning of sequence alignment","volume":"9","author":"C Sander","year":"1991","journal-title":"Proteins."},{"issue":"1","key":"pcbi.1013996.ref031","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The Protein Data Bank","volume":"28","author":"HM Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"issue":"19","key":"pcbi.1013996.ref032","doi-asserted-by":"crossref","first-page":"3752","DOI":"10.1016\/j.jmb.2016.08.003","article-title":"Molecular mechanisms of two-component signal transduction","volume":"428","author":"CP Zschiedrich","year":"2016","journal-title":"J Mol Biol."},{"key":"pcbi.1013996.ref033","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1146\/annurev.micro.091208.073214","article-title":"Biological insights from structures of two-component proteins","volume":"63","author":"R Gao","year":"2009","journal-title":"Annu Rev Microbiol."},{"issue":"10","key":"pcbi.1013996.ref034","doi-asserted-by":"crossref","first-page":"1612","DOI":"10.1016\/j.jmb.2013.02.003","article-title":"Comprehensive analysis of OmpR phosphorylation, dimerization, and DNA binding supports a canonical model for activation","volume":"425","author":"CM Barbieri","year":"2013","journal-title":"J Mol Biol."},{"issue":"2","key":"pcbi.1013996.ref035","doi-asserted-by":"crossref","DOI":"10.1002\/wcms.1298","article-title":"Using PyMOL as a platform for computational drug design","volume":"7","author":"S Yuan","year":"2017","journal-title":"WIREs Comput Mol Sci."},{"key":"pcbi.1013996.ref036","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"J Mistry","year":"2021","journal-title":"Nucleic 
Acids Res."},{"key":"pcbi.1013996.ref037","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkac993","article-title":"InterPro in 2022","volume":"51","author":"T Paysan-Lafosse","year":"2023","journal-title":"Nucleic Acids Res."},{"issue":"1","key":"pcbi.1013996.ref038","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msv211","article-title":"Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1","volume":"33","author":"M Figliuzzi","year":"2016","journal-title":"Mol Biol Evol."},{"issue":"32","key":"pcbi.1013996.ref039","doi-asserted-by":"crossref","first-page":"13067","DOI":"10.1073\/pnas.1215206110","article-title":"Capturing the mutational landscape of the beta-lactamase TEM-1","volume":"110","author":"H Jacquier","year":"2013","journal-title":"Proc Natl Acad Sci U S A."},{"issue":"12","key":"pcbi.1013996.ref040","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1016\/j.jmb.2019.04.030","article-title":"Fitness effects of single amino acid insertions and deletions in TEM-1 \u03b2-lactamase","volume":"431","author":"CE Gonzalez","year":"2019","journal-title":"J Mol Biol."},{"key":"pcbi.1013996.ref041","doi-asserted-by":"crossref","unstructured":"Rosset L, Weigt M, Zamponi F. Data augmentation enables label-specific generation of homologous protein sequences. openRxiv. 2025. 
https:\/\/doi.org\/10.1101\/2025.07.22.665933","DOI":"10.1101\/2025.07.22.665933"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1013996","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T00:00:00Z","timestamp":1771977600000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013996","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T18:48:47Z","timestamp":1772045327000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013996"}},"subtitle":[],"editor":[{"given":"Fei","family":"Guo","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2026,2,19]]},"references-count":41,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2,19]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013996","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,19]]}}}