{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T20:42:05Z","timestamp":1776026525888,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the \u2018daylight\u2019, \u2018twilight\u2019 and \u2018midnight\u2019 zones for interpreting residue\u2013residue correspondences from sequence information alone.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac247","type":"journal-article","created":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T11:10:15Z","timestamp":1649934615000},"page":"i255-i263","source":"Crossref","is-referenced-by-count":11,"title":["On the reliability and the limits of inference of amino acid sequence alignments"],"prefix":"10.1093","volume":"38","author":[{"given":"Sandun","family":"Rajapaksa","sequence":"first","affiliation":[{"name":"Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University , Clayton, VIC 3800, Australia"}]},{"given":"Dinithi","family":"Sumanaweera","sequence":"additional","affiliation":[{"name":"Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University , Clayton, VIC 3800, Australia"}]},{"given":"Arthur M","family":"Lesk","sequence":"additional","affiliation":[{"name":"Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University , University Park, PA 16802, USA"}]},{"given":"Lloyd","family":"Allison","sequence":"additional","affiliation":[{"name":"Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University , Clayton, VIC 3800, Australia"}]},{"given":"Peter J","family":"Stuckey","sequence":"additional","affiliation":[{"name":"Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University , Clayton, VIC 3800, Australia"}]},{"given":"Maria","family":"Garcia de la Banda","sequence":"additional","affiliation":[{"name":"Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University , Clayton, VIC 3800, Australia"}]},{"given":"David","family":"Abramson","sequence":"additional","affiliation":[{"name":"Research Computing Center, University of Queensland , St Lucia, QLD 4067, Australia"}]},{"given":"Arun S","family":"Konagurthu","sequence":"additional","affiliation":[{"name":"Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University , Clayton, VIC 3800, Australia"}]}],"member":"286","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"2023041407555321000_","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-76433-7","volume-title":"Coding Ockham\u2019s Razor","author":"Allison","year":"2018"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1093\/protein\/1.2.89","article-title":"Evaluation and improvements in the automatic alignment of protein sequences","volume":"1","author":"Barton","year":"1987","journal-title":"Protein Eng"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1006\/jmbi.2001.4495","article-title":"Pairwise sequence alignment below the twilight zone","volume":"307","author":"Blake","year":"2001","journal-title":"J. Mol. Biol"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"327","DOI":"10.2174\/1389203033487072","article-title":"Crystallographic and bioinformatic studies on restriction endonucleases: inference of evolutionary relationships in the \u201cmidnight zone\u201d of homology","volume":"4","author":"Bujnicki","year":"2003","journal-title":"Curr. Protein Pept. Sci"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"e1003926","DOI":"10.1371\/journal.pcbi.1003926","article-title":"ECOD: an evolutionary classification of protein domains","volume":"10","author":"Cheng","year":"2014","journal-title":"PLoS Comput. Biol"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/j.1460-2075.1986.tb04288.x","article-title":"The relation between the divergence of sequence and structure in proteins","volume":"5","author":"Chothia","year":"1986","journal-title":"EMBO J"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1016\/S0969-2126(96)00119-0","article-title":"A structural explanation for the twilight zone of protein sequence homology","volume":"4","author":"Chung","year":"1996","journal-title":"Structure"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"1005","DOI":"10.1093\/bioinformatics\/btw757","article-title":"Statistical inference of protein structural alignments using information and compression","volume":"33","author":"Collier","year":"2017","journal-title":"Bioinformatics"},{"key":"2023041407555321000_","first-page":"345","article-title":"A model of evolutionary change in proteins","volume":"5","author":"Dayhoff","year":"1978","journal-title":"Atlas Protein Seq. Struct"},{"key":"2023041407555321000_","first-page":"160","volume-title":"Annual International Conference on Research in Computational Molecular Biology","author":"Do","year":"2006"},{"key":"2023041407555321000_","volume-title":"Of URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences","author":"Doolittle","year":"1986"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1002\/pro.5560010201","article-title":"Reconstructing history with amino acid sequences 1","volume":"1","author":"Doolittle","year":"1992","journal-title":"Protein Sci"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"1382","DOI":"10.1073\/pnas.80.5.1382","article-title":"Optimal sequence alignments","volume":"80","author":"Fitch","year":"1983","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1016\/S0968-0004(00)89105-7","article-title":"Dali: a network tool for protein structure comparison","volume":"20","author":"Holm","year":"1995","journal-title":"Trends Biochem. Sci"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"1487","DOI":"10.1110\/ps.9.8.1487","article-title":"Improving the quality of twilight-zone alignments","volume":"9","author":"Jaroszewski","year":"2000","journal-title":"Protein Sci"},{"key":"2023041407555321000_","first-page":"559","article-title":"Mustang: a multiple structural alignment algorithm","volume":"64","author":"Konagurthu","year":"2006","journal-title":"Proteins Bioinform"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","DOI":"10.1093\/hesc\/9780198716846.001.0001","volume-title":"Introduction to Protein Science: Architecture, Function, and Genomics","author":"Lesk","year":"2016"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"674","DOI":"10.1093\/bioinformatics\/btu697","article-title":"Context similarity scoring improves protein sequence alignments in the midnight zone","volume":"31","author":"Meier","year":"2015","journal-title":"Bioinformatics"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1093\/oxfordjournals.molbev.a003985","article-title":"Estimating amino acid substitution models: a comparison of Dayhoff\u2019s estimator, the resolvent approach and a maximum likelihood method","volume":"19","author":"M\u00fcller","year":"2002","journal-title":"Mol. Biol. Evol"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"Cath\u2014a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023041407555321000_","first-page":"133","article-title":"Comparison of the structures of globins and phycocyanins: evidence for evolutionary relationship","volume":"8","author":"Pastore","year":"1990","journal-title":"Proteins Bioinform"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"2353","DOI":"10.1093\/bioinformatics\/btm355","article-title":"Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone","volume":"23","author":"Reid","year":"2007","journal-title":"Bioinformatics"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1186\/s12859-015-0832-5","article-title":"Parameterizing sequence alignment with an explicit evolutionary model","volume":"16","author":"Rivas","year":"2015","journal-title":"BMC Bioinform"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1093\/protein\/12.2.85","article-title":"Twilight zone of protein sequence alignments","volume":"12","author":"Rost","year":"1999","journal-title":"Protein Eng"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"i360","DOI":"10.1093\/bioinformatics\/btz368","article-title":"Statistical compression of protein sequences and inference of marginal probability landscapes over competing alignments using finite state models and Dirichlet priors","volume":"35","author":"Sumanaweera","year":"2019","journal-title":"Bioinformatics"},{"key":"2023041407555321000_","first-page":"i229","article-title":"Bridging the gaps in statistical models of protein alignment","volume-title":"Bioinformatics","author":"Sumanaweera","year":"2022"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0022-2836(05)80006-3","article-title":"Sequence alignment and penalty choice: review of concepts, case studies and implications","volume":"235","author":"Vingron","year":"1994","journal-title":"J. Mol. Biol"},{"key":"2023041407555321000_","volume-title":"Statistical and Inductive Inference Using Minimum Message Length. Information Science and Statistics","author":"Wallace","year":"2005"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1093\/comjnl\/11.2.185","article-title":"An information measure for classification","volume":"11","author":"Wallace","year":"1968","journal-title":"Comput. J"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023041407555321000_","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/B978-1-4832-2734-4.50017-6","volume-title":"Evolving Genes and Proteins","author":"Zuckerkandl","year":"1965"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i255\/49886410\/btac247.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_1\/i255\/49886410\/btac247.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,22]],"date-time":"2024-09-22T06:55:43Z","timestamp":1726988143000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_1\/i255\/6617521"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,24]]},"references-count":34,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2022,6,24]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac247","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,1]]},"published":{"date-parts":[[2022,6,24]]}}}