{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:45:29Z","timestamp":1740185129505,"version":"3.37.3"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2017,10,12]],"date-time":"2017-10-12T00:00:00Z","timestamp":1507766400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001700","name":"Ministry of Education, Culture, Sports, Science and Technology","doi-asserted-by":"publisher","award":["JP16H05879","JP24680031","JP25240044"],"award-info":[{"award-number":["JP16H05879","JP24680031","JP25240044"]}],"id":[{"id":"10.13039\/501100001700","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We developed a novel method to select superior models (including the number of hidden states) for PHMM. Our method selects models with the highest posterior probability using Factorized Information Criterion, which is widely utilized in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The software is available at https:\/\/github.com\/bigsea-t\/fab-phmm.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx643","type":"journal-article","created":{"date-parts":[[2017,10,10]],"date-time":"2017-10-10T11:10:27Z","timestamp":1507633827000},"page":"576-584","source":"Crossref","is-referenced-by-count":0,"title":["Beyond similarity assessment: selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm"],"prefix":"10.1093","volume":"34","author":[{"given":"Taikai","family":"Takeda","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9466-1034","authenticated-orcid":false,"given":"Michiaki","family":"Hamada","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, Tokyo, Japan"},{"name":"Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan"},{"name":"Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan"},{"name":"Institute for Medical-Oriented Structural Biology, Waseda University, Tokyo, Japan"},{"name":"Graduate School of Medicine, Nippon Medical School, Tokyo, Japan"}]}],"member":"286","published-online":{"date-parts":[[2017,10,12]]},"reference":[{"key":"2023012712334282500_btx643-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"first-page":"1","year":"2003","author":"Beal","key":"2023012712334282500_btx643-B2"},{"first-page":"859","year":"2016","author":"Blei","key":"2023012712334282500_btx643-B3"},{"key":"2023012712334282500_btx643-B4","doi-asserted-by":"crossref","first-page":"e1000392","DOI":"10.1371\/journal.pcbi.1000392","article-title":"Fast statistical alignment","volume":"5","author":"Bradley","year":"2009","journal-title":"PLoS Comput. Biol"},{"key":"2023012712334282500_btx643-B5","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1093\/molbev\/msn275","article-title":"Problems and solutions for estimating indel rates and length distributions","volume":"26","author":"Cartwright","year":"2009","journal-title":"Mol. Biol. Evol"},{"key":"2023012712334282500_btx643-B6","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023012712334282500_btx643-B7","doi-asserted-by":"crossref","first-page":"106.","DOI":"10.1186\/s13059-015-0670-9","article-title":"Split-alignment of genomes finds orthologies more accurately","volume":"16","author":"Frith","year":"2015","journal-title":"Genome Biol"},{"key":"2023012712334282500_btx643-B8","doi-asserted-by":"crossref","first-page":"80.","DOI":"10.1186\/1471-2105-11-80","article-title":"Parameters for accurate genome alignment","volume":"11","author":"Frith","year":"2010","journal-title":"BMC Bioinformatics"},{"year":"2012","author":"Fujimaki","key":"2023012712334282500_btx643-B9"},{"year":"2012","author":"Fujimaki","key":"2023012712334282500_btx643-B10"},{"key":"2023012712334282500_btx643-B11","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1093\/bioinformatics\/btw742","article-title":"Training alignment parameters for arbitrary sequencers with LAST-TRAIN","volume":"33","author":"Hamada","year":"2017","journal-title":"Bioinformatics"},{"key":"2023012712334282500_btx643-B12","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1093\/biomet\/57.1.97","article-title":"Monte Carlo sampling methods using Markov chains and their applications","volume":"57","author":"Hastings","year":"1970","journal-title":"Biometrika"},{"year":"2015","author":"Hayashi","key":"2023012712334282500_btx643-B13"},{"key":"2023012712334282500_btx643-B14","first-page":"1303","article-title":"Stochastic variational inference","volume":"14","author":"Hoffman","year":"2013","journal-title":"J. Mach. Learn. Res"},{"key":"2023012712334282500_btx643-B15","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1023\/A:1007665907178","article-title":"Introduction to variational methods for graphical models","volume":"37","author":"Jordan","year":"1999","journal-title":"Mach. Learn"},{"key":"2023012712334282500_btx643-B16","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1093\/bib\/bbq015","article-title":"A survey of sequence alignment algorithms for next-generation sequencing","volume":"11","author":"Li","year":"2010","journal-title":"Brief. Bioinformatics"},{"year":"2015","author":"Liu","key":"2023012712334282500_btx643-B17"},{"key":"2023012712334282500_btx643-B18","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1101\/gr.6725608","article-title":"Uncertainty in homology inferences: assessing and improving genomic sequence alignment","volume":"18","author":"Lunter","year":"2008","journal-title":"Genome Res"},{"key":"2023012712334282500_btx643-B19","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1089\/10665270252935520","article-title":"Applications of generalized pair hidden Markov models to alignment and gene finding problems","volume":"9","author":"Pachter","year":"2002","journal-title":"J. Comput. Biol"},{"key":"2023012712334282500_btx643-B20","doi-asserted-by":"crossref","first-page":"1814","DOI":"10.1101\/gr.076554.108","article-title":"Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs","volume":"18","author":"Paten","year":"2008","journal-title":"Genome Res"},{"key":"2023012712334282500_btx643-B21","doi-asserted-by":"crossref","first-page":"406.","DOI":"10.1186\/s12859-015-0832-5","article-title":"Parameterizing sequence alignment with an explicit evolutionary model","volume":"16","author":"Rivas","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023012712334282500_btx643-B22","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann. Math. Stat"},{"key":"2023012712334282500_btx643-B23","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023012712334282500_btx643-B24","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511800474","volume-title":"Algebraic Geometry and Statistical Learning Theory","author":"Watanabe","year":"2009"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/576\/48914350\/bioinformatics_34_4_576.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/4\/576\/48914350\/bioinformatics_34_4_576.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,26]],"date-time":"2023-08-26T22:44:26Z","timestamp":1693089866000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/4\/576\/4470357"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,10,12]]},"references-count":24,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx643","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,2,15]]},"published":{"date-parts":[[2017,10,12]]}}}