{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T00:33:35Z","timestamp":1774485215912,"version":"3.50.1"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2019,7,22]],"date-time":"2019-07-22T00:00:00Z","timestamp":1563753600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,1,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Sequence alignment remains fundamental in bioinformatics. Pair-wise alignment is traditionally based on ad hoc scores for substitutions, insertions and deletions, but can also be based on probability models (pair hidden Markov models: PHMMs). PHMMs enable us to: fit the parameters to each kind of data, calculate the reliability of alignment parts and measure sequence similarity integrated over possible alignments.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>This study shows how multiple models correspond to one set of scores. Scores can be converted to probabilities by partition functions with a \u2018temperature\u2019 parameter: for any temperature, this corresponds to some PHMM. There is a special class of models with balanced length probability, i.e. no bias toward either longer or shorter alignments. The best way to score alignments and assess their significance depends on the aim: judging whether whole sequences are related versus finding related parts. 
This clarifies the statistical basis of sequence alignment.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz576","type":"journal-article","created":{"date-parts":[[2019,7,19]],"date-time":"2019-07-19T07:23:46Z","timestamp":1563521026000},"page":"408-415","source":"Crossref","is-referenced-by-count":19,"title":["How sequence alignment scores correspond to probability models"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0998-2859","authenticated-orcid":false,"given":"Martin C","family":"Frith","sequence":"first","affiliation":[{"name":"Artificial Intelligence Research Center , AIST, Tokyo 135-0064, Japan"},{"name":"Graduate School of Frontier Sciences , University of Tokyo, Chiba 277-8568, Japan"},{"name":"AIST-Waseda University CBBD-OIL , AIST, Tokyo 169-8555, Japan"}]}],"member":"286","published-online":{"date-parts":[[2019,7,22]]},"reference":[{"key":"2023013112075082100_btz576-B1","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1006\/jtbi.1993.1054","article-title":"Normalization of affine gap costs used in optimal sequence alignment","volume":"161","author":"Allison","year":"1993","journal-title":"J. Theor. Biol"},{"key":"2023013112075082100_btz576-B2","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1007\/BF00160262","article-title":"Finite-state models in the alignment of macromolecules","volume":"35","author":"Allison","year":"1992","journal-title":"J. Mol. 
Evol"},{"key":"2023013112075082100_btz576-B3","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023013112075082100_btz576-B4","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1093\/nar\/29.2.351","article-title":"The estimation of statistical parameters for local alignment score distributions","volume":"29","author":"Altschul","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023013112075082100_btz576-B5","doi-asserted-by":"crossref","first-page":"116","DOI":"10.1109\/TCBB.2004.32","article-title":"Improved gapped alignment in BLAST","volume":"1","author":"Cameron","year":"2004","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023013112075082100_btz576-B6","first-page":"115","article-title":"Scoring pairwise genomic sequence alignments","volume":"7","author":"Chiaromonte","year":"2002","journal-title":"Pac. Symp. Biocomput"},{"key":"2023013112075082100_btz576-B7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023013112075082100_btz576-B8","doi-asserted-by":"crossref","first-page":"e1000069.","DOI":"10.1371\/journal.pcbi.1000069","article-title":"A probabilistic model of local sequence alignment that simplifies statistical significance estimation","volume":"4","author":"Eddy","year":"2008","journal-title":"PLoS Comput. 
Biol"},{"key":"2023013112075082100_btz576-B9","first-page":"205","article-title":"A new generation of homology search tools based on probabilistic inference","volume":"23","author":"Eddy","year":"2009","journal-title":"Genome Inform"},{"key":"2023013112075082100_btz576-B10","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1093\/bioinformatics\/btl582","article-title":"Striped Smith-Waterman speeds database searches six times over other SIMD implementations","volume":"23","author":"Farrar","year":"2007","journal-title":"Bioinformatics"},{"key":"2023013112075082100_btz576-B11","doi-asserted-by":"crossref","first-page":"e23","DOI":"10.1093\/nar\/gkq1212","article-title":"A new repeat-masking method enables specific detection of homologous sequences","volume":"39","author":"Frith","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023013112075082100_btz576-B12","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1186\/s13059-015-0670-9","article-title":"Split-alignment of genomes finds orthologies more accurately","volume":"16","author":"Frith","year":"2015","journal-title":"Genome Biol"},{"key":"2023013112075082100_btz576-B13","doi-asserted-by":"crossref","first-page":"1661","DOI":"10.1093\/nar\/gkx1266","article-title":"A survey of localized sequence rearrangements in human DNA","volume":"46","author":"Frith","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023013112075082100_btz576-B14","doi-asserted-by":"crossref","first-page":"e100.","DOI":"10.1093\/nar\/gkq010","article-title":"Incorporating sequence quality data into alignment improves DNA read mapping","volume":"38","author":"Frith","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023013112075082100_btz576-B15","doi-asserted-by":"crossref","first-page":"e100.","DOI":"10.1093\/nar\/gks275","article-title":"A mostly traditional approach improves alignment of bisulfite-converted DNA","volume":"40","author":"Frith","year":"2012","journal-title":"Nucleic Acids 
Res"},{"key":"2023013112075082100_btz576-B16","doi-asserted-by":"crossref","first-page":"705","DOI":"10.1016\/0022-2836(82)90398-9","article-title":"An improved algorithm for matching biological sequences","volume":"162","author":"Gotoh","year":"1982","journal-title":"J. Mol. Biol"},{"key":"2023013112075082100_btz576-B17","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023013112075082100_btz576-B18","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1093\/protein\/8.10.999","article-title":"A reliable sequence alignment method based on probabilities of residue correspondences","volume":"8","author":"Miyazawa","year":"1995","journal-title":"Protein Eng"},{"key":"2023013112075082100_btz576-B19","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1098\/rsta.1933.0009","article-title":"On the problem of the most efficient tests of statistical hypotheses","volume":"231","author":"Neyman","year":"1933","journal-title":"Phil. Trans. R. Soc. Lond. A"},{"key":"2023013112075082100_btz576-B20","doi-asserted-by":"crossref","first-page":"3697","DOI":"10.1214\/08-AOS663","article-title":"Estimating the Gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times","volume":"37","author":"Park","year":"2009","journal-title":"Ann. 
Statist"},{"key":"2023013112075082100_btz576-B21","doi-asserted-by":"crossref","first-page":"221.","DOI":"10.1186\/1471-2105-12-221","article-title":"Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation","volume":"12","author":"Rognes","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023013112075082100_btz576-B22","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023013112075082100_btz576-B23","doi-asserted-by":"crossref","first-page":"45.","DOI":"10.1186\/s12859-018-2014-8","article-title":"Introducing difference recurrence relations for faster semi-global alignment of long sequences","volume":"19 (Suppl. 1)","author":"Suzuki","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023013112075082100_btz576-B24","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1093\/bioinformatics\/bti070","article-title":"The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions","volume":"21","author":"Yu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013112075082100_btz576-B25","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1089\/10665270152530845","article-title":"Statistical significance of probabilistic sequence alignment and related local hidden Markov models","volume":"8","author":"Yu","year":"2001","journal-title":"J. Comput. Biol"},{"key":"2023013112075082100_btz576-B26","doi-asserted-by":"crossref","first-page":"15688","DOI":"10.1073\/pnas.2533904100","article-title":"The compositional adjustment of amino acid substitution matrices","volume":"100","author":"Yu","year":"2003","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023013112075082100_btz576-B27","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1006\/jtbi.1995.0085","article-title":"Alignment of molecular sequences seen as random path analysis","volume":"174","author":"Zhang","year":"1995","journal-title":"J. Theor. Biol"},{"key":"2023013112075082100_btz576-B28","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1089\/cmb.1997.4.339","article-title":"Aligning a DNA sequence with a protein sequence","volume":"4","author":"Zhang","year":"1997","journal-title":"J. Comput. Biol"},{"key":"2023013112075082100_btz576-B29","doi-asserted-by":"crossref","first-page":"1012","DOI":"10.1093\/bioinformatics\/15.12.1012","article-title":"Post-processing long pairwise alignments","volume":"15","author":"Zhang","year":"1999","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz576\/29097431\/btz576.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/2\/408\/48991217\/btz576.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/2\/408\/48991217\/btz576.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T16:26:06Z","timestamp":1675182366000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/2\/408\/5536873"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,7,22]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,1,15]]
}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz576","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/580951","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,1,15]]},"published":{"date-parts":[[2019,7,22]]}}}