{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T20:02:51Z","timestamp":1760731371880,"version":"3.37.3"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,9,21]],"date-time":"2020-09-21T00:00:00Z","timestamp":1600646400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Division of General Medical Sciences"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R35-118039"],"award-info":[{"award-number":["R35-118039"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>From evolutionary interference, function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the \u2018twilight zone\u2019 of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent \u2018d\u2019). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure \u03b1-helical proteins successfully recognizes pairs of structurally related pure \u03b2-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is \u223c150% better than HHsearch for generating pairwise alignments and \u223c50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Datasets and source codes of SAdLSA are available free of charge for academic users at http:\/\/sites.gatech.edu\/cssb\/sadlsa\/.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa810","type":"journal-article","created":{"date-parts":[[2020,9,8]],"date-time":"2020-09-08T19:13:50Z","timestamp":1599592430000},"page":"490-496","source":"Crossref","is-referenced-by-count":17,"title":["A novel sequence alignment algorithm based on deep learning of the protein folding code"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0378-3704","authenticated-orcid":false,"given":"Mu","family":"Gao","sequence":"first","affiliation":[{"name":"Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology , Atlanta, GA 30332, USA"}]},{"given":"Jeffrey","family":"Skolnick","sequence":"additional","affiliation":[{"name":"Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology , Atlanta, GA 30332, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,9,21]]},"reference":[{"key":"2023051706075401600_btaa810-B1","first-page":"265","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation","author":"Abadi","year":"2016"},{"key":"2023051706075401600_btaa810-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"volume-title":"Pattern Recognition and Machine Learning","year":"2006","author":"Bishop","key":"2023051706075401600_btaa810-B3"},{"key":"2023051706075401600_btaa810-B4","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/j.1460-2075.1986.tb04288.x","article-title":"The relation between the divergence of sequence and structure in proteins","volume":"5","author":"Chothia","year":"1986","journal-title":"EMBO J"},{"key":"2023051706075401600_btaa810-B5","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1089\/cmb.1995.2.9","article-title":"Maximum discrimination hidden Markov models of sequence consensus","volume":"2","author":"Eddy","year":"1995","journal-title":"J. Comput. Biol"},{"key":"2023051706075401600_btaa810-B6","doi-asserted-by":"crossref","first-page":"D304","DOI":"10.1093\/nar\/gkt1240","article-title":"SCOPe: Structural Classification of Proteins\u2014extended, integrating SCOP and ASTRAL data and classification of new structures","volume":"42","author":"Fox","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051706075401600_btaa810-B7","doi-asserted-by":"crossref","first-page":"597","DOI":"10.1093\/bioinformatics\/btt024","article-title":"APoc: large-scale identification of similar protein pockets","volume":"29","author":"Gao","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051706075401600_btaa810-B8","doi-asserted-by":"crossref","first-page":"3514","DOI":"10.1038\/s41598-019-40314-1","article-title":"DESTINI: a deep-learning approach to contact-driven protein structure prediction","volume":"9","author":"Gao","year":"2019","journal-title":"Sci. Rep"},{"key":"2023051706075401600_btaa810-B9","first-page":"770","article-title":"Deep residual learning for image recognition","author":"He","year":"2016","journal-title":"Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit"},{"key":"2023051706075401600_btaa810-B10","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051706075401600_btaa810-B11","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1126\/science.273.5275.595","article-title":"Mapping the protein universe","volume":"273","author":"Holm","year":"1996","journal-title":"Science"},{"key":"2023051706075401600_btaa810-B12","doi-asserted-by":"crossref","first-page":"2577","DOI":"10.1002\/bip.360221211","article-title":"Dictionary of protein secondary structure-pattern-recognition of hydrogen-bonded and geometrical features","volume":"22","author":"Kabsch","year":"1983","journal-title":"Biopolymers"},{"key":"2023051706075401600_btaa810-B13","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1006\/jmbi.1999.3233","article-title":"Benchmarking PSI-BLAST in genome annotation","volume":"293","author":"Muller","year":"1999","journal-title":"J.\u00a0Mol. Biol"},{"key":"2023051706075401600_btaa810-B14","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng"},{"key":"2023051706075401600_btaa810-B15","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1016\/S0022-2836(02)01371-2","article-title":"COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance","volume":"326","author":"Sadreyev","year":"2003","journal-title":"J. Mol. Biol"},{"key":"2023051706075401600_btaa810-B16","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1038\/s41586-019-1923-7","article-title":"Improved protein structure prediction using potentials from deep learning","volume":"577","author":"Senior","year":"2020","journal-title":"Nature"},{"key":"2023051706075401600_btaa810-B17","doi-asserted-by":"crossref","first-page":"502","DOI":"10.1002\/prot.20106","article-title":"Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm","volume":"56","author":"Skolnick","year":"2004","journal-title":"Proteins: Struct. Funct. Bioinform"},{"key":"2023051706075401600_btaa810-B18","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023051706075401600_btaa810-B19","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"Soding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023051706075401600_btaa810-B20","doi-asserted-by":"crossref","first-page":"404","DOI":"10.1016\/j.sbi.2011.03.005","article-title":"Protein sequence comparison and fold recognition: progress and good-practice benchmarking","volume":"21","author":"Soding","year":"2011","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023051706075401600_btaa810-B21","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1186\/s12859-019-3019-7","article-title":"HH-suite3 for fast remote homology detection and deep protein annotation","volume":"20","author":"Steinegger","year":"2019","journal-title":"BMC Bioinform"},{"key":"2023051706075401600_btaa810-B22","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The Universal Protein Resource (UniProt): an expanding universe of protein information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023051706075401600_btaa810-B23","doi-asserted-by":"crossref","first-page":"16856","DOI":"10.1073\/pnas.1821309116","article-title":"Distance-based protein folding powered by deep learning","volume":"116","author":"Xu","year":"2019","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051706075401600_btaa810-B24","doi-asserted-by":"crossref","first-page":"1257","DOI":"10.1006\/jmbi.2001.5293","article-title":"Within the twilight zone: a sensitive profile-profile comparison tool based on information theory","volume":"315","author":"Yona","year":"2002","journal-title":"J. Mol. Biol"},{"key":"2023051706075401600_btaa810-B25","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Zhang","year":"2004","journal-title":"Proteins: Struct. Funct. Bioinform"},{"key":"2023051706075401600_btaa810-B26","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023051706075401600_btaa810-B27","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1073\/pnas.0509379103","article-title":"On the origin and highly likely completeness of single-domain protein structures","volume":"103","author":"Zhang","year":"2006","journal-title":"Proc. Natl. Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa810\/34774669\/btaa810.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/4\/490\/50359861\/btaa810.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/4\/490\/50359861\/btaa810.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T06:16:34Z","timestamp":1684304194000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/4\/490\/5909984"}},"subtitle":[],"editor":[{"given":"Arne","family":"Elofsson","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,9,21]]},"references-count":27,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa810","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,2,15]]},"published":{"date-parts":[[2020,9,21]]}}}