{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T02:25:30Z","timestamp":1771813530013,"version":"3.50.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T00:00:00Z","timestamp":1708992000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T00:00:00Z","timestamp":1708992000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int. J. Mach. Learn. &amp; Cyber."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Sequence data mining has become an increasingly popular research topic as the availability of data has grown rapidly over the past decades. Sequence clustering is a type of method within this field that is in high demand in the industry, but the sequence clustering problem is non-trivial and, as opposed to static cluster analysis, interpreting clusters of sequences is often difficult. Using Hidden Markov Models (HMMs), we propose the Discrete Bayesian HMM Clustering (DBHC) algorithm, an approach to clustering discrete sequences by extending a proven method for continuous sequences. The proposed algorithm is completely self-contained as it incorporates both the search for the number of clusters and the search for the number of hidden states in each cluster model in the parameter inference. We provide a working example and a simulation study to explain and showcase the capabilities of the DBHC algorithm. A case study illustrates how the hidden states in a mixture of HMMs can aid the interpretation task of a sequence cluster analysis. We conclude that the algorithm works well as it provides well-interpretable clusters for the considered application.<\/jats:p>","DOI":"10.1007\/s13042-024-02102-w","type":"journal-article","created":{"date-parts":[[2024,2,27]],"date-time":"2024-02-27T14:02:07Z","timestamp":1709042527000},"page":"3439-3454","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["DBHC: Discrete Bayesian HMM Clustering"],"prefix":"10.1007","volume":"15","author":[{"given":"Gabriel","family":"Budel","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8031-758X","authenticated-orcid":false,"given":"Flavius","family":"Frasincar","sequence":"additional","affiliation":[]},{"given":"David","family":"Boekestijn","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,2,27]]},"reference":[{"key":"2102_CR1","unstructured":"Budel G, Frasincar F (2022) DBHC: sequence clustering with Discrete-Output HMMs. https:\/\/CRAN.R-project.org\/web\/packages\/DBHC, R package version 0.0.3"},{"issue":"11","key":"2102_CR2","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1101\/gr.9.11.1135","volume":"9","author":"J Burke","year":"1999","unstructured":"Burke J, Davison D, Hide W (1999) d2_cluster: a validated method for clustering EST and full-length cDNA sequences. Genome Res 9(11):1135\u20131142","journal-title":"Genome Res"},{"issue":"4","key":"2102_CR3","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1023\/A:1024992613384","volume":"7","author":"I Cadez","year":"2003","unstructured":"Cadez I, Heckerman D, Meek C, Smyth P, White S (2003) Model-based clustering and visualization of navigation patterns on a web site. Data Min Knowl Discov 7(4):399\u2013424","journal-title":"Data Min Knowl Discov"},{"issue":"1","key":"2102_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","volume":"39","author":"AP Dempster","year":"1977","unstructured":"Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1\u201322","journal-title":"J R Stat Soc Ser B (Methodol)"},{"key":"2102_CR5","volume-title":"Sequence data mining","author":"G Dong","year":"2007","unstructured":"Dong G, Pei J (2007) Sequence data mining. Springer Science & Business Media, Berlin"},{"key":"2102_CR6","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological sequence analysis: probabilistic models of proteins and nucleic acids","author":"R Durbin","year":"1998","unstructured":"Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge"},{"issue":"10","key":"2102_CR7","doi-asserted-by":"publisher","first-page":"3019","DOI":"10.1007\/s13042-022-01579-7","volume":"13","author":"W Fan","year":"2022","unstructured":"Fan W, Hou W (2022) Unsupervised modeling and feature selection of sequential spherical data through nonparametric hidden Markov models. Int J Mach Learn Cybern 13(10):3019\u20133029","journal-title":"Int J Mach Learn Cybern"},{"issue":"4","key":"2102_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v040.i04","volume":"40","author":"A Gabadinho","year":"2011","unstructured":"Gabadinho A, Ritschard G, Mueller NS, Studer M (2011) Analyzing and visualizing state sequences in R with TraMineR. J Stat Softw 40(4):1\u201337","journal-title":"J Stat Softw"},{"issue":"6","key":"2102_CR9","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1109\/TPAMI.1984.4767596","volume":"PAMI\u20136","author":"S Geman","year":"1984","unstructured":"Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell PAMI\u20136(6):721\u2013741","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"3","key":"2102_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v088.i03","volume":"88","author":"J Helske","year":"2019","unstructured":"Helske J, Helske S (2019) Mixture hidden Markov models for sequence data: the seqHMM Package in R. J Stat Softw 88(3):1\u201332","journal-title":"J Stat Softw"},{"issue":"23","key":"2102_CR11","doi-asserted-by":"publisher","first-page":"4116","DOI":"10.1002\/sim.6220","volume":"33","author":"F Lagona","year":"2014","unstructured":"Lagona F, Jdanov D, Shkolnikova M (2014) Latent time-varying factors in longitudinal analysis: a linear mixed hidden Markov model for heart rates. Stat Med 33(23):4116\u20134134","journal-title":"Stat Med"},{"key":"2102_CR12","unstructured":"Li C, Biswas G (2000) A Bayesian approach to temporal data clustering using hidden Markov models. In: Proceedings of the 17th international conference on machine learning (ICML 2000). Morgan Kaufmann Publishers Inc., pp 543\u2013550"},{"issue":"11","key":"2102_CR13","doi-asserted-by":"publisher","first-page":"1857","DOI":"10.1016\/j.patcog.2005.01.025","volume":"38","author":"TW Liao","year":"2005","unstructured":"Liao TW (2005) Clustering of time series data\u2014a survey. Pattern Recognit 38(11):1857\u20131874","journal-title":"Pattern Recognit"},{"issue":"4","key":"2102_CR14","doi-asserted-by":"publisher","first-page":"573","DOI":"10.2307\/3316097","volume":"30","author":"RJ MacKay","year":"2002","unstructured":"MacKay RJ (2002) Estimating the order of a Hidden Markov model. Can J Stat 30(4):573\u2013589","journal-title":"Can J Stat"},{"key":"2102_CR15","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4613-0457-9","volume-title":"Mathematical classification and clustering","author":"B Mirkin","year":"1996","unstructured":"Mirkin B (1996) Mathematical classification and clustering. Kluwer Academic Publishers, Norwell"},{"key":"2102_CR16","unstructured":"R Core Team (2017) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https:\/\/www.R-project.org\/"},{"issue":"2","key":"2102_CR17","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/5.18626","volume":"77","author":"LR Rabiner","year":"1989","unstructured":"Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257\u2013286","journal-title":"Proc IEEE"},{"key":"2102_CR18","doi-asserted-by":"crossref","unstructured":"Rabiner LR, Lee CH, Juang BH, Wilpon JG (1989) HMM clustering for connected word recognition. In: Proceedings of the international conference on acoustics, speech, and signal processing (ICASSP 1989). IEEE, pp 405\u2013408","DOI":"10.1109\/ICASSP.1989.266451"},{"issue":"1","key":"2102_CR19","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/MASSP.1986.1165342","volume":"3","author":"LR Rabiner","year":"1986","unstructured":"Rabiner LR, Juang B (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4\u201316","journal-title":"IEEE ASSP Mag"},{"key":"2102_CR20","unstructured":"Smyth P (1996) Clustering sequences with hidden Markov models. In: Proceedings of the 10th international conference on neural information processing systems (NIPS 1996). MIT Press, pp 648\u2013654"},{"key":"2102_CR21","unstructured":"Stolcke A, Omohundro SM (1994) Best-first model merging for hidden Markov model induction. ICSI Technical Report TR-94-003"},{"key":"2102_CR22","doi-asserted-by":"crossref","unstructured":"Taghva K, Coombs JS, Pereda R, Nartker TA (2005) Address extraction using hidden Markov models. In: Proceedings of the 12th document recognition and retrieval conference (DRR 2005). SPIE, pp 119\u2013126","DOI":"10.1117\/12.587799"},{"issue":"2","key":"2102_CR23","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1007\/s40745-015-0040-1","volume":"2","author":"D Xu","year":"2015","unstructured":"Xu D, Tian Y (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165\u2013193","journal-title":"Ann Data Sci"}],"container-title":["International Journal of Machine Learning and Cybernetics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-024-02102-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13042-024-02102-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-024-02102-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,3]],"date-time":"2024-07-03T07:31:26Z","timestamp":1719991886000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13042-024-02102-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,27]]},"references-count":23,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["2102"],"URL":"https:\/\/doi.org\/10.1007\/s13042-024-02102-w","relation":{},"ISSN":["1868-8071","1868-808X"],"issn-type":[{"value":"1868-8071","type":"print"},{"value":"1868-808X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,27]]},"assertion":[{"value":"23 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}