{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T09:08:10Z","timestamp":1781773690712,"version":"3.54.5"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,11,30]],"date-time":"2022-11-30T00:00:00Z","timestamp":1669766400000},"content-version":"vor","delay-in-days":20,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["S10OD028632-01"],"award-info":[{"award-number":["S10OD028632-01"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"FAS Division of Science, Research Computing Group at Harvard University"},{"name":"NSF-Simons Center for Mathematical and Statistical Analysis of Biology at Harvard","award":["#1764269"],"award-info":[{"award-number":["#1764269"]}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R35-GM134922"],"award-info":[{"award-number":["R35-GM134922"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Exascale Computing Project","award":["17-SC-20-SC"],"award-info":[{"award-number":["17-SC-20-SC"]}]},{"name":"Department of Energy Office of Science"},{"DOI":"10.13039\/100006168","name":"National Nuclear Security Administration","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006168","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory"},{"name":"Developmental Funds from the Cancer Center 
Support","award":["5P30CA045508"],"award-info":[{"award-number":["5P30CA045508"]}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["DP5OD026389"],"award-info":[{"award-number":["DP5OD026389"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["MCB2032259"],"award-info":[{"award-number":["MCB2032259"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Moore\u2013Simons Project on the Origin of the Eukaryotic Cell, Simons Foundation","award":["735929LPI"],"award-info":[{"award-number":["735929LPI"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Multiple sequence alignments (MSAs) of homologous sequences contain information on structural and functional constraints and their evolutionary histories. Despite their importance for many downstream tasks, such as structure prediction, MSA generation is often treated as a separate pre-processing step, without any guidance from the application it will be used for.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Here, we implement a smooth and differentiable version of the Smith\u2013Waterman pairwise alignment algorithm that enables jointly learning an MSA and a downstream machine learning system in an end-to-end fashion. To demonstrate its utility, we introduce SMURF (Smooth Markov Unaligned Random Field), a new method that jointly learns an alignment and the parameters of a Markov Random Field for unsupervised contact prediction. 
We find that SMURF learns MSAs that mildly improve contact prediction on a diverse set of protein and RNA families. As a proof of concept, we demonstrate that by connecting our differentiable alignment module to AlphaFold2 and maximizing predicted confidence, we can learn MSAs that improve structure predictions over the initial MSAs. Interestingly, the alignments that improve AlphaFold predictions are self-inconsistent and can be viewed as adversarial. This work highlights the potential of differentiable dynamic programming to improve neural network pipelines that rely on an alignment and the potential dangers of optimizing predictions of protein sequences with methods that are not fully understood.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Our code and examples are available at: https:\/\/github.com\/spetti\/SMURF.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac724","type":"journal-article","created":{"date-parts":[[2022,11,8]],"date-time":"2022-11-08T17:20:47Z","timestamp":1667928047000},"source":"Crossref","is-referenced-by-count":40,"title":["End-to-end learning of multiple sequence alignments with differentiable Smith\u2013Waterman"],"prefix":"10.1093","volume":"39","author":[{"given":"Samantha","family":"Petti","sequence":"first","affiliation":[{"name":"NSF-Simons Center for the Mathematical and Statistical Analysis of Biology, Harvard University , Cambridge, MA 02138, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nicholas","family":"Bhattacharya","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of 
California Berkeley , Berkeley, CA 94720, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roshan","family":"Rao","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Sciences, University of California Berkeley , Berkeley, CA 94720, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Justas","family":"Dauparas","sequence":"additional","affiliation":[{"name":"Institute for Protein Design, University of Washington , Seattle, WA 98195, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Neil","family":"Thomas","sequence":"additional","affiliation":[{"name":"Electrical Engineering and Computer Sciences, University of California Berkeley , Berkeley, CA 94720, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Juannan","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Biology, University of Florida , Gainesville, FL 32611, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alexander M","family":"Rush","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Cornell Tech , New York, NY 10044, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8722-0038","authenticated-orcid":false,"given":"Peter","family":"Koo","sequence":"additional","affiliation":[{"name":"Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory , Cold Spring Harbor, NY 11724, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2774-2744","authenticated-orcid":false,"given":"Sergey","family":"Ovchinnikov","sequence":"additional","affiliation":[{"name":"John Harvard Distinguished Science Fellowship, Harvard University , Cambridge, MA 02138, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2022,11,10]]},"reference":[{"key":"2023010107520516600_btac724-B1","author":"Abadi","year":"2015"},{"key":"2023010107520516600_btac724-B2","first-page":"1","author":"Akiyama","year":"2021"},{"key":"2023010107520516600_btac724-B3","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023010107520516600_btac724-B4","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1126\/science.abj8754","article-title":"Accurate prediction of protein structures and interactions using a three-track neural network","volume":"373","author":"Baek","year":"2021","journal-title":"Science"},{"key":"2023010107520516600_btac724-B5","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1002\/prot.22934","article-title":"Learning generative models for protein fold families","volume":"79","author":"Balakrishnan","year":"2011","journal-title":"Proteins"},{"key":"2023010107520516600_btac724-B6","author":"Bepler","year":"2018"},{"key":"2023010107520516600_btac724-B7","article-title":"Learning with differentiable perturbed optimizers","volume":"33","author":"Berthet","year":"2020"},{"key":"2023010107520516600_btac724-B8","first-page":"34","author":"Bhattacharya"},{"key":"2023010107520516600_btac724-B9","author":"Bradbury","year":"2018"},{"key":"2023010107520516600_btac724-B10","article-title":"DTWNet: a dynamic time warping network","author":"Cai","year":"2019","journal-title":"In: Advances in Neural Information Processing Systems, Vancouver, BC, Canada"},{"key":"2023010107520516600_btac724-B11","volume-title":"Atlas of Protein Sequence and 
Structure","author":"Dayhoff","year":"1972"},{"key":"2023010107520516600_btac724-B12","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023010107520516600_btac724-B13","first-page":"302","author":"Durrett","year":"2015"},{"key":"2023010107520516600_btac724-B14","doi-asserted-by":"crossref","first-page":"012707","DOI":"10.1103\/PhysRevE.87.012707","article-title":"Improved contact prediction in proteins: using pseudolikelihoods to infer potts models","volume":"87","author":"Ekeberg","year":"2013","journal-title":"Phys. Rev. E"},{"key":"2023010107520516600_btac724-B15","volume-title":"Inferring Phylogenies","author":"Felsenstein","year":"2004"},{"key":"2023010107520516600_btac724-B16","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msv211","article-title":"Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase tem-1","volume":"33","author":"Figliuzzi","year":"2016","journal-title":"Mol. Biol. Evol"},{"key":"2023010107520516600_btac724-B17","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1038\/s41586-021-04043-8","article-title":"Disease variant prediction with deep generative models of evolutionary data","volume":"599","author":"Frazer","year":"2021","journal-title":"Nature"},{"key":"2023010107520516600_btac724-B18","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1016\/j.molcel.2016.06.012","article-title":"Automated structure-and sequence-based design of proteins for high bacterial expression and stability","volume":"63","author":"Goldenzweig","year":"2016","journal-title":"Mol. 
Cell"},{"key":"2023010107520516600_btac724-B19","author":"Gu","year":"2015"},{"key":"2023010107520516600_btac724-B20","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023010107520516600_btac724-B21","article-title":"Fooling neural network interpretations via adversarial model manipulation","author":"Heo","year":"2019"},{"key":"2023010107520516600_btac724-B22","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation effects predicted from sequence co-variation","volume":"35","author":"Hopf","year":"2017","journal-title":"Nat. Biotechnol"},{"key":"2023010107520516600_btac724-B23","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1093\/bioinformatics\/btr638","article-title":"PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments","volume":"28","author":"Jones","year":"2012","journal-title":"Bioinformatics"},{"key":"2023010107520516600_btac724-B24","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023010107520516600_btac724-B25","doi-asserted-by":"crossref","first-page":"D192","DOI":"10.1093\/nar\/gkaa1047","article-title":"Rfam 14: expanded coverage of metagenomic, viral and microRNA families","volume":"49","author":"Kalvari","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023010107520516600_btac724-B26","doi-asserted-by":"crossref","first-page":"15674","DOI":"10.1073\/pnas.1314045110","article-title":"Assessing the utility of coevolution-based residue\u2013residue contact predictions in a sequence-and structure-rich 
era","volume":"110","author":"Kamisetty","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023010107520516600_btac724-B27","first-page":"2369","author":"Kim","year":"2019"},{"key":"2023010107520516600_btac724-B28","doi-asserted-by":"crossref","first-page":"45","DOI":"10.2142\/biophysico.13.0_45","article-title":"A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions","volume":"13","author":"Kinjo","year":"2016","journal-title":"Biophys. Physicobiol"},{"key":"2023010107520516600_btac724-B29","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1016\/j.jmb.2003.08.015","article-title":"Sequence alignments and pair hidden Markov models using evolutionary history","volume":"333","author":"Knudsen","year":"2003","journal-title":"J. Mol. Biol"},{"key":"2023010107520516600_btac724-B30","author":"Llinares-L\u00f3pez","year":""},{"key":"2023010107520516600_btac724-B31","doi-asserted-by":"crossref","first-page":"1407","DOI":"10.1007\/s11434-016-1103-1","article-title":"New insights into substrate folding preference of plant OSCs","volume":"61","author":"Ma","year":"2016","journal-title":"Science Bulletin"},{"key":"2023010107520516600_btac724-B32","first-page":"3462","author":"Mensch","year":"2018"},{"key":"2023010107520516600_btac724-B33","author":"Mirdita","year":"2022"},{"key":"2023010107520516600_btac724-B34","first-page":"141","article-title":"Protein sequence-structure alignment based on site-alignment probabilities","volume":"11","author":"Miyazawa","year":"2000","journal-title":"Genome Inform. Ser. Workshop Genome Inform"},{"key":"2023010107520516600_btac724-B35","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"Morcos","year":"2011","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023010107520516600_btac724-B36","author":"Mordvintsev","year":"2015"},{"key":"2023010107520516600_btac724-B37","author":"Morton","year":"2020"},{"key":"2023010107520516600_btac724-B38","doi-asserted-by":"crossref","first-page":"062409","DOI":"10.1103\/PhysRevE.102.062409","article-title":"Aligning biological sequences by exploiting residue conservation and coevolution","volume":"102","author":"Muntoni","year":"2020","journal-title":"Phys. Rev. E"},{"key":"2023010107520516600_btac724-B39","doi-asserted-by":"crossref","first-page":"2933","DOI":"10.1093\/bioinformatics\/btt509","article-title":"Infernal 1.1: 100-fold faster RNA homology searches","volume":"29","author":"Nawrocki","year":"2013","journal-title":"Bioinformatics"},{"key":"2023010107520516600_btac724-B40","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","article-title":"A general method applicable to the search for similarities in the amino acid sequence of two proteins","volume":"48","author":"Needleman","year":"1970","journal-title":"J. Mol. 
Biol"},{"key":"2023010107520516600_btac724-B41","first-page":"427","author":"Nguyen","year":"2015"},{"key":"2023010107520516600_btac724-B42","doi-asserted-by":"crossref","first-page":"e02030","DOI":"10.7554\/eLife.02030","article-title":"Robust and accurate prediction of residue\u2013residue interactions across protein interfaces using evolutionary information","volume":"3","author":"Ovchinnikov","year":"2014","journal-title":"Elife"},{"key":"2023010107520516600_btac724-B43","author":"Paszke","year":"2019"},{"key":"2023010107520516600_btac724-B44","author":"Rush","year":"2020"},{"key":"2023010107520516600_btac724-B45","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1126\/science.aba3304","article-title":"An evolution-based model for designing chorismate mutase enzymes","volume":"369","author":"Russ","year":"2020","journal-title":"Science"},{"key":"2023010107520516600_btac724-B46","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1186\/1471-2105-7-246","article-title":"Optimizing amino acid substitution matrices with a local alignment kernel","volume":"7","author":"Saigo","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023010107520516600_btac724-B47","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/0471250953.bi0313s48","article-title":"Clustal omega","volume":"48","author":"Sievers","year":"2014","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023010107520516600_btac724-B48","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. 
Biol"},{"key":"2023010107520516600_btac724-B49","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023010107520516600_btac724-B50","author":"Stock","year":"2021"},{"key":"2023010107520516600_btac724-B51","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1038\/s41588-018-0167-z","article-title":"Predicting the clinical impact of human mutation with deep neural networks","volume":"50","author":"Sundaram","year":"2018","journal-title":"Nat. Genet"},{"key":"2023010107520516600_btac724-B52","author":"Szegedy","year":"2014"},{"key":"2023010107520516600_btac724-B53","doi-asserted-by":"crossref","first-page":"5674","DOI":"10.1002\/anie.201713220","article-title":"Co-evolutionary fitness landscapes for sequence design","volume":"57","author":"Tian","year":"2018","journal-title":"Angew. Chem. Int. Ed. Engl"},{"key":"2023010107520516600_btac724-B54","author":"Vlastelica","year":"2020"},{"key":"2023010107520516600_btac724-B55","doi-asserted-by":"crossref","first-page":"e1008085","DOI":"10.1371\/journal.pcbi.1008085","article-title":"Remote homology search with hidden Potts models","volume":"16","author":"Wilburn","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023010107520516600_btac724-B56","first-page":"145","article-title":"Using video-oriented instructions to speed up sequence comparison","volume":"13","author":"Wozniak","year":"1997","journal-title":"Comput. Appl. 
Biosci"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac724\/47464959\/btac724.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac724\/48448739\/btac724.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac724\/48448739\/btac724.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T05:09:47Z","timestamp":1672549787000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac724\/6820925"}},"subtitle":[],"editor":[{"given":"Karsten","family":"Borgwardt","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2022,11,10]]},"references-count":56,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,11,10]]},"published-print":{"date-parts":[[2023,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac724","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.10.23.465204","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,11,10]]},"article-number":"btac724"}}