{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,19]],"date-time":"2026-04-19T10:06:57Z","timestamp":1776593217701,"version":"3.51.2"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"22","license":[{"start":{"date-parts":[[2021,6,12]],"date-time":"2021-06-12T00:00:00Z","timestamp":1623456000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-17-CE30-0021"],"award-info":[{"award-number":["ANR-17-CE30-0021"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-19-CE30-0021"],"award-info":[{"award-number":["ANR-19-CE30-0021"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Centre de Recherche Interdisciplinary"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Modeling of protein family sequence distribution from homologous sequence data recently received considerable attention, in particular for structure and function predictions, as well as for protein design. In particular, direct coupling analysis, a method to infer effective pairwise interactions between residues, was shown to capture important structural constraints and to successfully generate functional protein sequences. 
Building on this and other graphical models, we introduce a new framework to assess the quality of the secondary structures of the generated sequences with respect to reference structures for the family.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce two scoring functions characterizing the likeliness of the secondary structure of a protein sequence to match a reference structure, called Dot Product and Pattern Matching. We test these scores on published experimental protein mutagenesis and design datasets, and show improvement in the detection of nonfunctional sequences. We also show that use of these scores helps reject nonfunctional sequences generated by graphical models (Restricted Boltzmann Machines) learned from homologous sequence alignments.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Data and code are available at https:\/\/github.com\/CyrilMa\/ssqa<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab442","type":"journal-article","created":{"date-parts":[[2021,6,11]],"date-time":"2021-06-11T15:22:03Z","timestamp":1623424923000},"page":"4083-4090","source":"Crossref","is-referenced-by-count":10,"title":["Improving sequence-based modeling of protein families using secondary-structure quality assessment"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4545-0801","authenticated-orcid":false,"given":"Cyril","family":"Malbranke","sequence":"first","affiliation":[{"name":"Laboratory of Physics of the Ecole Normale 
Superieure, PSL Research, CNRS UMR 8023, Sorbonne Universit\u00e9, Universit\u00e9 de Paris , Paris, France"},{"name":"Synthetic Biology, Microbiology Department, Institut Pasteur , Paris, France"}]},{"given":"David","family":"Bikard","sequence":"additional","affiliation":[{"name":"Synthetic Biology, Microbiology Department, Institut Pasteur , Paris, France"}]},{"given":"Simona","family":"Cocco","sequence":"additional","affiliation":[{"name":"Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Universit\u00e9, Universit\u00e9 de Paris , Paris, France"}]},{"given":"R\u00e9mi","family":"Monasson","sequence":"additional","affiliation":[{"name":"Laboratory of Physics of the Ecole Normale Superieure, PSL Research, CNRS UMR 8023, Sorbonne Universit\u00e9, Universit\u00e9 de Paris , Paris, France"}]}],"member":"286","published-online":{"date-parts":[[2021,6,12]]},"reference":[{"key":"2023051607120481200_btab442-B1","doi-asserted-by":"crossref","first-page":"1315","DOI":"10.1038\/s41592-019-0598-1","article-title":"Unified rational protein engineering with sequence-based deep representation learning","volume":"16","author":"Alley","year":"2019","journal-title":"Nat. 
Methods"},{"key":"2023051607120481200_btab442-B2","first-page":"705426","volume-title":"DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences","author":"Asgari","year":"2019"},{"key":"2023051607120481200_btab442-B3","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1093\/bioinformatics\/btaa714","article-title":"GraphQA: protein model quality assessment using graph convolutional networks","volume":"37","author":"Baldassarre","year":"2021","journal-title":"Bioinformatics"},{"key":"2023051607120481200_btab442-B4","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1093\/nar\/30.1.276","article-title":"The Pfam protein families database","volume":"30","author":"Bateman","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023051607120481200_btab442-B5","doi-asserted-by":"crossref","first-page":"12180","DOI":"10.1073\/pnas.1606762113","article-title":"Inferring interaction partners from protein sequences","volume":"113","author":"Bitbol","year":"2016","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023051607120481200_btab442-B6","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.cels.2020.11.005","article-title":"RBM-MHC: a semi-supervised machine-learning method for sample-specific prediction of antigen presentation by HLA-I alleles","volume":"12","author":"Bravi","year":"2020","journal-title":"Cell Syst"},{"key":"2023051607120481200_btab442-B7","doi-asserted-by":"crossref","first-page":"D464","DOI":"10.1093\/nar\/gky1004","article-title":"RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy","volume":"47","author":"Burley","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051607120481200_btab442-B8","doi-asserted-by":"crossref","first-page":"032601","DOI":"10.1088\/1361-6633\/aa9965","article-title":"Inverse statistical physics of protein sequences: a key issues review","volume":"81","author":"Cocco","year":"2018","journal-title":"Rep. Prog. Phys"},{"key":"2023051607120481200_btab442-B9","doi-asserted-by":"crossref","first-page":"508","DOI":"10.1002\/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4","article-title":"Evaluation and improvement of multiple sequence methods for protein secondary structure prediction","volume":"34","author":"Cuff","year":"1999","journal-title":"Proteins: Struc. Funct. 
Bioinform"},{"key":"2023051607120481200_btab442-B10","doi-asserted-by":"crossref","first-page":"4046","DOI":"10.1093\/bioinformatics\/bty494","article-title":"Deep convolutional networks for quality assessment of protein folds","volume":"34","author":"Derevyanko","year":"2018","journal-title":"Bioinformatics"},{"key":"2023051607120481200_btab442-B11","doi-asserted-by":"crossref","first-page":"W389","DOI":"10.1093\/nar\/gkv332","article-title":"JPred4: a protein secondary structure prediction server","volume":"43","author":"Drozdetskiy","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051607120481200_btab442-B12","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msv211","article-title":"Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1","volume":"33","author":"Figliuzzi","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023051607120481200_btab442-B13","doi-asserted-by":"crossref","DOI":"10.1101\/2020.04.07.029264","volume-title":"Generating Functional Protein Variants With Variational Autoencoders","author":"Hawkins-Hooker","year":"2020"},{"key":"2023051607120481200_btab442-B14","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023051607120481200_btab442-B15","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation effects predicted from sequence co-variation","volume":"35","author":"Hopf","year":"2017","journal-title":"Nat. 
Biotechnol"},{"key":"2023051607120481200_btab442-B16","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023051607120481200_btab442-B17","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1002\/prot.25674","article-title":"NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning","volume":"87","author":"Klausen","year":"2018","journal-title":"Proteins"},{"key":"2023051607120481200_btab442-B18","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1109\/18.910572","article-title":"Factor graphs and the sum-product algorithm","volume":"47","author":"Kschischang","year":"2001","journal-title":"IEEE Trans. Inf. Theory"},{"key":"2023051607120481200_btab442-B19","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1186\/s12859-018-2367-z","article-title":"Biotite: a unifying open source computational biology framework in Python","volume":"19","author":"Kunzmann","year":"2018","journal-title":"BMC Bioinform"},{"key":"2023051607120481200_btab442-B20","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1078\/1438-4221-00198","article-title":"Molecular analysis of beta-lactamase structure and function","volume":"292","author":"Majiduddin","year":"2002","journal-title":"Int. J. Med. Microbiol"},{"key":"2023051607120481200_btab442-B21","doi-asserted-by":"crossref","first-page":"e1004262","DOI":"10.1371\/journal.pcbi.1004262","article-title":"Large-scale conformational transitions and dimerization are encoded in the amino-acid sequences of hsp70 chaperones","volume":"11","author":"Malinverni","year":"2015","journal-title":"PLoS Comput. 
Biol"},{"key":"2023051607120481200_btab442-B22","first-page":"8026","article-title":"PyTorch: an imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst"},{"key":"2023051607120481200_btab442-B23","first-page":"2825","article-title":"scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023051607120481200_btab442-B24","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1038\/s42256-021-00310-5","article-title":"Expanding functional protein sequence space using generative adversarial networks","volume":"3","author":"Repecka","year":"2019","journal-title":"Nat. Mach. Intell"},{"key":"2023051607120481200_btab442-B25","doi-asserted-by":"crossref","first-page":"e2016239118","DOI":"10.1073\/pnas.2016239118","article-title":"Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences","volume":"118","author":"Rives","year":"2019","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023051607120481200_btab442-B26","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1038\/nature03990","article-title":"Natural-like function in artificial WW domains","volume":"437","author":"Russ","year":"2005","journal-title":"Nature"},{"key":"2023051607120481200_btab442-B27","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1126\/science.aba3304","article-title":"An evolution-based model for designing chorismate mutase enzymes","volume":"369","author":"Russ","year":"2020","journal-title":"Science"},{"key":"2023051607120481200_btab442-B28","first-page":"21","article-title":"Learning and evaluating Boltzmann machines","author":"Salakhutdinov","year":"2008"},{"key":"2023051607120481200_btab442-B29","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinform"},{"key":"2023051607120481200_btab442-B30","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: a worldwide hub of protein knowledge","volume":"47","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051607120481200_btab442-B31","first-page":"1064","author":"Tieleman","year":"2008"},{"key":"2023051607120481200_btab442-B32","doi-asserted-by":"crossref","first-page":"e39397","DOI":"10.7554\/eLife.39397","article-title":"Learning protein constitutive motifs from sequence data","volume":"8","author":"Tubiana","year":"2019","journal-title":"eLife"},{"key":"2023051607120481200_btab442-B33","doi-asserted-by":"crossref","first-page":"18962","DOI":"10.1038\/srep18962","article-title":"Protein secondary structure prediction using deep convolutional neural fields","volume":"6","author":"Wang","year":"2016","journal-title":"Sci. 
Rep"},{"key":"2023051607120481200_btab442-B34","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1073\/pnas.0805923106","article-title":"Identification of direct residue contacts in protein\u2013protein interaction by message passing","volume":"106","author":"Weigt","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051607120481200_btab442-B35","first-page":"482","article-title":"Sixty-five years of the long march in protein secondary structure prediction: the final stretch?","volume":"19","author":"Yang","year":"2018","journal-title":"Brief. Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab442\/39309821\/btab442.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/22\/4083\/50335208\/btab442.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/22\/4083\/50335208\/btab442.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T03:28:58Z","timestamp":1684207738000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/22\/4083\/6297391"}},"subtitle":[],"editor":[{"given":"Lenore","family":"Cowen","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,6,12]]},"references-count":35,"journal-issue":{"issue":"22","published-print":{"date-parts":[[2021,11,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab442","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.01.31.428964","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1
367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11,15]]},"published":{"date-parts":[[2021,6,12]]}}}