{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T20:46:53Z","timestamp":1761598013520},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,11,16]],"date-time":"2018-11-16T00:00:00Z","timestamp":1542326400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Hidden Markov Models (HMMs) are probabilistic models widely used in applications in computational sequence analysis. HMMs are basically unsupervised models. However, in the most important applications, they are trained in a supervised manner. Training examples accompanied by labels corresponding to different classes are given as input and the set of parameters that maximize the joint probability of sequences and labels is estimated. A main problem with this approach is that, in the majority of the cases, labels are hard to find and thus the amount of training data is limited. On the other hand, there are plenty of unclassified (unlabeled) sequences deposited in the public databases that could potentially contribute to the training procedure. This approach is called semi-supervised learning and could be very helpful in many applications.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We propose here, a method for semi-supervised learning of HMMs that can incorporate labeled, unlabeled and partially labeled data in a straightforward manner. 
The algorithm is based on a variant of the Expectation-Maximization (EM) algorithm, where the missing labels of the unlabeled or partially labeled data are considered as the missing data. We apply the algorithm to several biological problems, namely, for the prediction of transmembrane protein topology for alpha-helical and beta-barrel membrane proteins and for the prediction of archaeal signal peptides. The results are very promising, since the algorithms presented here can significantly improve the prediction performance of even the top-scoring classifiers.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty910","type":"journal-article","created":{"date-parts":[[2018,11,15]],"date-time":"2018-11-15T21:50:13Z","timestamp":1542318613000},"page":"2208-2215","source":"Crossref","is-referenced-by-count":23,"title":["Semi-supervised learning of Hidden Markov Models for biological sequence analysis"],"prefix":"10.1093","volume":"35","author":[{"given":"Ioannis A","family":"Tamposis","sequence":"first","affiliation":[{"name":"Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece"}]},{"given":"Konstantinos D","family":"Tsirigos","sequence":"additional","affiliation":[{"name":"Department of Bio and Health Informatics, Technical University of Denmark, Kgs Lyngby, Denmark"}]},{"given":"Margarita C","family":"Theodoropoulou","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece"}]},{"given":"Panagiota I","family":"Kontou","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece"}]},{"given":"Pantelis G","family":"Bagos","sequence":"additional","affiliation":[{"name":"Department of Computer Science and 
Biomedical Informatics, University of Thessaly, Lamia, Greece"}]}],"member":"286","published-online":{"date-parts":[[2018,11,16]]},"reference":[{"key":"2023051612072541000_bty910-B1","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1162\/0891201041850876","article-title":"Understanding the Yarowsky Algorithm","volume":"30","author":"Abney","year":"2004","journal-title":"Comput. Linguist."},{"key":"2023051612072541000_bty910-B2","first-page":"141","article-title":"Prediction of protein secondary structure by the hidden Markov model","volume":"9","author":"Asai","year":"1993","journal-title":"Comput. Appl. Biosci."},{"key":"2023051612072541000_bty910-B3","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1007\/978-3-540-30195-0_5","article-title":"Faster gradient descent conditional maximum likelihood training of Hidden Markov Models, using individual learning rate adaptation","volume-title":"Grammatical Inference: Algorithms and Applications","author":"Bagos","year":"2004"},{"key":"2023051612072541000_bty910-B4","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1186\/1471-2105-5-29","article-title":"A Hidden Markov Model method, capable of predicting and discriminating beta-barrel outer membrane proteins","volume":"5","author":"Bagos","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023051612072541000_bty910-B5","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1186\/1471-2105-6-7","article-title":"Evaluation of methods for predicting the topology of beta-barrel outer membrane proteins and a consensus prediction method","volume":"6","author":"Bagos","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023051612072541000_bty910-B6","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1186\/1471-2105-7-189","article-title":"Algorithms for incorporating prior topological information in HMMs: application to transmembrane proteins","volume":"7","author":"Bagos","year":"2006","journal-title":"BMC 
Bioinformatics"},{"key":"2023051612072541000_bty910-B7","doi-asserted-by":"crossref","first-page":"5082","DOI":"10.1021\/pr800162c","article-title":"Prediction of lipoprotein signal peptides in Gram-positive bacteria with a Hidden Markov Model","volume":"7","author":"Bagos","year":"2008","journal-title":"J. Proteome Res."},{"key":"2023051612072541000_bty910-B8","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1016\/S1672-0229(08)60041-8","article-title":"How many 3D structures do we need to train a predictor?","volume":"7","author":"Bagos","year":"2009","journal-title":"Genomics Proteomics Bioinf."},{"key":"2023051612072541000_bty910-B9","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/protein\/gzn064","article-title":"Prediction of signal peptides in archaea","volume":"22","author":"Bagos","year":"2009","journal-title":"Protein Eng. Des. Sel."},{"key":"2023051612072541000_bty910-B10","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1162\/neco.1994.6.2.307","article-title":"Smooth on-line learning algorithms for Hidden Markov Models","volume":"6","author":"Baldi","year":"1994","journal-title":"Neural Comput."},{"key":"2023051612072541000_bty910-B11","first-page":"1","article-title":"An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes","volume":"3","author":"Baum","year":"1972","journal-title":"Inequalities"},{"key":"2023051612072541000_bty910-B12","volume-title":"Semi-Supervised Learning. Adaptive Computation and Machine Learning","author":"Chapelle","year":"2006"},{"key":"2023051612072541000_bty910-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. 
B"},{"key":"2023051612072541000_bty910-B14","doi-asserted-by":"crossref","first-page":"W408","DOI":"10.1093\/nar\/gkv451","article-title":"CCTOP: a Consensus Constrained TOPology prediction web server","volume":"43","author":"Dobson","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"2023051612072541000_bty910-B15","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis, Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023051612072541000_bty910-B16","first-page":"114","article-title":"Multiple alignment using hidden Markov models","volume":"3","author":"Eddy","year":"1995","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023051612072541000_bty910-B17","doi-asserted-by":"crossref","first-page":"2967","DOI":"10.1002\/pmic.201600249","article-title":"PlasmoSEP: predicting surface-exposed proteins on the malaria parasite using semisupervised self-training and expert-annotated data","volume":"16","author":"El-Manzalawy","year":"2016","journal-title":"Proteomics"},{"key":"2023051612072541000_bty910-B18","doi-asserted-by":"crossref","first-page":"e132","DOI":"10.1093\/bioinformatics\/btl219","article-title":"Semi-supervised LC\/MS alignment for differential proteomics","volume":"22","author":"Fischer","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B19","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1186\/s13059-017-1316-x","article-title":"McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes","volume":"18","author":"Hafez","year":"2017","journal-title":"Genome Biol."},{"key":"2023051612072541000_bty910-B20","doi-asserted-by":"crossref","first-page":"1570","DOI":"10.1109\/TPAMI.2003.1251150","article-title":"Exploitation of unlabeled sequences in Hidden Markov Models","volume":"25","author":"Inoue","year":"2003","journal-title":"IEEE Trans. Pattern Anal. 
Mach. Intell."},{"key":"2023051612072541000_bty910-B21","first-page":"275","article-title":"Semisupervised learning of hidden Markov models via a homotopy method","volume":"31","author":"Ji","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023051612072541000_bty910-B22","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1109\/29.60082","article-title":"The segmental K-means algorithm for estimating parameters of Hidden Markov Models","volume":"38","author":"Juang","year":"1990","journal-title":"IEEE Trans. Acoustics Speech Signal Process."},{"key":"2023051612072541000_bty910-B23","doi-asserted-by":"crossref","first-page":"1652","DOI":"10.1110\/ps.0303703","article-title":"Prediction of lipoprotein signal peptides in Gram-negative bacteria","volume":"12","author":"Juncker","year":"2003","journal-title":"Protein Sci."},{"key":"2023051612072541000_bty910-B24","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1038\/nmeth1113","article-title":"Semi-supervised learning for peptide identification from shotgun proteomics datasets","volume":"4","author":"Kall","year":"2007","journal-title":"Nat. Methods"},{"key":"2023051612072541000_bty910-B25","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1016\/j.jmb.2004.03.016","article-title":"A combined transmembrane topology and signal peptide prediction method","volume":"338","author":"Kall","year":"2004","journal-title":"J Mol. 
Biol."},{"key":"2023051612072541000_bty910-B26","doi-asserted-by":"crossref","first-page":"i251","DOI":"10.1093\/bioinformatics\/bti1014","article-title":"An HMM posterior decoder for sequence feature prediction that includes homology information","volume":"21","author":"Kall","year":"2005","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B27","first-page":"140","article-title":"Hidden Markov models for labelled sequences","volume-title":"Proceedings of the 12th IAPR International Conference on Pattern Recognition","author":"Krogh","year":"1994"},{"key":"2023051612072541000_bty910-B28","first-page":"179","article-title":"Two methods for improving performance of an HMM and their application for gene finding","volume":"5","author":"Krogh","year":"1997","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"2023051612072541000_bty910-B29","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1006\/jmbi.2000.4315","article-title":"Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes","volume":"305","author":"Krogh","year":"2001","journal-title":"J. Mol. Biol."},{"key":"2023051612072541000_bty910-B30","doi-asserted-by":"crossref","first-page":"4768","DOI":"10.1093\/nar\/22.22.4768","article-title":"A hidden Markov model that finds genes in E. 
coli DNA","volume":"22","author":"Krogh","year":"1994","journal-title":"Nucleic Acids Res."},{"key":"2023051612072541000_bty910-B31","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/089976699300016764","article-title":"Hidden neural networks","volume":"11","author":"Krogh","year":"1999","journal-title":"Neural Comput."},{"key":"2023051612072541000_bty910-B32","doi-asserted-by":"crossref","first-page":"387","DOI":"10.1142\/S0219720008003382","article-title":"Prediction of cell wall sorting signals in gram-positive bacteria with a hidden markov model: application to complete genomes","volume":"06","author":"Litou","year":"2008","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2023051612072541000_bty910-B33","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1016\/S0022-2836(03)00182-7","article-title":"Reliability measures for membrane protein topology prediction algorithms","volume":"327","author":"Melen","year":"2003","journal-title":"J. Mol. Biol."},{"key":"2023051612072541000_bty910-B34","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1093\/biomet\/80.2.267","article-title":"Maximum likelihood estimation via the ECM algorithm: a general framework","volume":"80","author":"Meng","year":"1993","journal-title":"Biometrika"},{"key":"2023051612072541000_bty910-B35","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1093\/bioinformatics\/17.7.646","article-title":"Evaluation of methods for the prediction of membrane spanning regions","volume":"17","author":"Moller","year":"2001","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B36","first-page":"122","article-title":"Prediction of signal peptides and signal anchors by a hidden Markov model","volume":"6","author":"Nielsen","year":"1998","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. 
Biol."},{"key":"2023051612072541000_bty910-B37","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1023\/A:1007692713085","article-title":"Text classification from labeled and unlabeled documents using EM","volume":"39","author":"Nigam","year":"2000","journal-title":"Mach. Learn."},{"key":"2023051612072541000_bty910-B38","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"Rabiner","year":"1989","journal-title":"Proc. IEEE"},{"key":"2023051612072541000_bty910-B39","first-page":"309","article-title":"Active Hidden Markov Models for information extraction","volume-title":"IDA 2001","author":"Scheffer","year":"2001"},{"key":"2023051612072541000_bty910-B40","doi-asserted-by":"crossref","first-page":"783","DOI":"10.1093\/bioinformatics\/btn028","article-title":"SVM-HUSTLE\u2014an iterative semi-supervised machine learning approach for pairwise protein remote homology detection","volume":"24","author":"Shah","year":"2008","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B41","doi-asserted-by":"crossref","first-page":"18500191","DOI":"10.1142\/S0219720018500191","article-title":"Extending Hidden Markov Models to allow conditioning on previous observations","volume":"16","author":"Tamposis","year":"2018","journal-title":"J. Bioinf. Comput. 
Biol"},{"key":"2023051612072541000_bty910-B42","doi-asserted-by":"crossref","first-page":"2490","DOI":"10.1093\/bioinformatics\/btq362","article-title":"ExTopoDB: a database of experimentally derived topological models of transmembrane proteins","volume":"26","author":"Tsaousis","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B43","doi-asserted-by":"crossref","first-page":"D324","DOI":"10.1093\/nar\/gkq863","article-title":"OMPdb: a database of \u03b2-barrel outer membrane proteins from Gram-negative bacteria","volume":"39","author":"Tsirigos","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023051612072541000_bty910-B44","doi-asserted-by":"crossref","first-page":"i665","DOI":"10.1093\/bioinformatics\/btw444","article-title":"PRED-TMBB2: improved topology prediction and detection of beta-barrel outer membrane proteins","volume":"32","author":"Tsirigos","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B45","doi-asserted-by":"crossref","first-page":"W401","DOI":"10.1093\/nar\/gkv485","article-title":"The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides","volume":"43","author":"Tsirigos","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"2023051612072541000_bty910-B46","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1093\/bioinformatics\/17.9.849","article-title":"The HMMTOP transmembrane topology prediction server","volume":"17","author":"Tusnady","year":"2001","journal-title":"Bioinformatics"},{"key":"2023051612072541000_bty910-B47","doi-asserted-by":"crossref","first-page":"1908","DOI":"10.1110\/ps.04625404","article-title":"Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information","volume":"13","author":"Viklund","year":"2004","journal-title":"Protein 
Sci."},{"key":"2023051612072541000_bty910-B48","doi-asserted-by":"crossref","first-page":"189","DOI":"10.3115\/981658.981684","article-title":"Unsupervised word sense disambiguation rivaling supervised methods","volume-title":"Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics","author":"Yarowsky","year":"1995"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/13\/2208\/50340477\/bioinformatics_35_13_2208.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/13\/2208\/50340477\/bioinformatics_35_13_2208.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T12:23:59Z","timestamp":1720787039000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/13\/2208\/5184961"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,11,16]]},"references-count":48,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2019,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty910","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,7,1]]},"published":{"date-parts":[[2018,11,16]]}}}