{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:39:00Z","timestamp":1761007140437,"version":"build-2065373602"},"reference-count":37,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2013,1,24]],"date-time":"2013-01-24T00:00:00Z","timestamp":1358985600000},"content-version":"vor","delay-in-days":389,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc of Assoc for Info"],"published-print":{"date-parts":[[2012,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Probabilistic topic models have been proven very useful for many text mining tasks. Although many variants of topic models have been proposed, most existing works are based on the bag\u2010of\u2010words representation of text in which word combination and order are generally ignored, resulting in inaccurate semantic representation of text. In this paper, we propose a general way to go beyond the bag\u2010of\u2010words representation for topic modeling by applying frequent pattern mining to discover frequent word patterns that can capture semantic associations between words and then using them as additional supplementary semantic units to augment the conventional bag\u2010of\u2010words representation. By viewing a topic model as a generative model for such augmented text data, we can go beyond the bag\u2010of\u2010words assumption to potentially capture more semantic associations between words. Since efficient algorithms for mining frequent word patterns are available, this general strategy for improving topic models can be applied to improve any topic models without substantially increasing the computational complexity of the model. Experiment results show that such a frequent pattern\u2010based data enrichment approach can improve over two representative existing probabilistic topic models for the classification task. We also studied variations of frequent pattern usage in topic modeling and found that using compressed and closed patterns performs best.<\/jats:p>","DOI":"10.1002\/meet.14504901209","type":"journal-article","created":{"date-parts":[[2013,1,24]],"date-time":"2013-01-24T10:49:23Z","timestamp":1359024563000},"page":"1-10","source":"Crossref","is-referenced-by-count":20,"title":["Enriching text representation with frequent pattern mining for probabilistic topic modeling"],"prefix":"10.1002","volume":"49","author":[{"given":"Hyun Duk","family":"Kim","sequence":"first","affiliation":[]},{"given":"Dae Hoon","family":"Park","sequence":"additional","affiliation":[]},{"given":"Yue","family":"Lu","sequence":"additional","affiliation":[]},{"given":"ChengXiang","family":"Zhai","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2013,1,24]]},"reference":[{"key":"e_1_2_9_2_1","unstructured":"Agrawal R.andSrikant R.(1994).Fast algorithms for mining association rules in large databases. InVLDB '94: Proceedings of the 1994 International Conference on Very Large Data Bases pages487\u2013499 San Francisco CA USA. Morgan Kaufmann Publishers Inc."},{"key":"e_1_2_9_3_1","doi-asserted-by":"crossref","unstructured":"B\u00edr\u00f3 I. Szab\u00f3 J. andBencz\u00far A. A.(2008).Latent dirichlet allocation in web spam filtering. InAIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web pages29\u201332 New York NY USA. ACM.","DOI":"10.1145\/1451983.1451991"},{"key":"e_1_2_9_4_1","first-page":"147","article-title":"Correlated Topic Models","volume":"18","author":"Blei D.","year":"2006","journal-title":"NIPS '06: Advances in Neural Information Processing Systems"},{"key":"e_1_2_9_5_1","unstructured":"Blei D. M. Griffiths T. L. Jordan M. I. andTenenbaum J. B.(2004).Hierarchical topic models and the nested chinese restaurant process. In NIPS '04: Advances in Neural Information Processing Systems."},{"key":"e_1_2_9_6_1","unstructured":"Blei D. M.andMcauliffe J. D.(2007).Supervised topic models."},{"key":"e_1_2_9_7_1","doi-asserted-by":"publisher","DOI":"10.1162\/jmlr.2003.3.4-5.993"},{"key":"e_1_2_9_8_1","unstructured":"Boyd\u2010Graber J. Blei D. M. andZhu X.(2007).A topic model for word sense disambiguation. In EMNLP '07: Proceedings of the 2007 conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_2_9_9_1","doi-asserted-by":"crossref","unstructured":"Brin S. Motwani R. Ullman J. D. andTsur S.(1997).Dynamic itemset counting and implication rules for market basket data. InSIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data pages255\u2013264 New York NY USA. ACM.","DOI":"10.1145\/253260.253325"},{"key":"e_1_2_9_10_1","doi-asserted-by":"crossref","unstructured":"Cheng H. Yan X. Han J. andwei Hsu C.(2007).Discriminative frequent pattern analysis for effective classification. InIn ICDE pages716\u2013725.","DOI":"10.1109\/ICDE.2007.367917"},{"key":"e_1_2_9_11_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster A. P.","year":"1977","journal-title":"Journal of Royal Statist. Soc. B"},{"key":"e_1_2_9_12_1","doi-asserted-by":"crossref","unstructured":"Ding B. Lo D. Han J. andKhoo S.\u2010C.(2009).Efficient mining of closed repetitive gapped subsequences from a sequence database. InProceedings of the 2009 IEEE International Conference on Data Engineering ICDE '09 pages 1024\u20131035 Washington DC USA. IEEE Computer Society.","DOI":"10.1109\/ICDE.2009.104"},{"key":"e_1_2_9_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.1984.4767596"},{"key":"e_1_2_9_14_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0307752101"},{"key":"e_1_2_9_15_1","doi-asserted-by":"crossref","unstructured":"Han J. Pei J. andYin Y.(2000).Mining frequent patterns without candidate generation. InSIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data pages1\u201312 New York NY USA. ACM.","DOI":"10.1145\/342009.335372"},{"key":"e_1_2_9_16_1","doi-asserted-by":"crossref","unstructured":"Hofmann T.(1999).Probabilistic latent semantic indexing. InSIGIR '99: Proceedings of the 1999 international ACM SIGIR conference on research and development in Information Retrieval pages50\u201357 New York NY USA. ACM.","DOI":"10.1145\/312624.312649"},{"key":"e_1_2_9_17_1","doi-asserted-by":"crossref","unstructured":"H\u00f6rster E. Lienhart R. andSlaney M.(2007).Image retrieval on large\u2010scale image databases. InCIVR '07: Proceedings of the 2007 ACM international conference on Image and video retrieval pages17\u201324 New York NY USA. ACM.","DOI":"10.1145\/1282280.1282283"},{"key":"e_1_2_9_18_1","doi-asserted-by":"crossref","unstructured":"Lin C.andHe Y.(2009).Joint sentiment\/topic model for sentiment analysis. InCIKM '09: Proceedings of the 2009 ACM international Conference on Information and Knowledge Management pages375\u2013384 New York NY USA. ACM.","DOI":"10.1145\/1645953.1646003"},{"key":"e_1_2_9_19_1","doi-asserted-by":"crossref","unstructured":"Liu Y. Niculescu\u2010Mizil A. andGryc W.(2009).Topic\u2010link lda: joint models of topic and author community. InICML '09: Proceedings of the 2009 annual International Conference on Machine Learning pages665\u2013672 New York NY USA. ACM.","DOI":"10.1145\/1553374.1553460"},{"key":"e_1_2_9_20_1","doi-asserted-by":"crossref","unstructured":"Mei Q. Ling X. Wondra M. Su H. andZhai C.(2007).Topic sentiment mixture: modeling facets and opinions in weblogs. In WWW '07: Proceedings of the 2007 international conference on World Wide Web pages 171\u2010180 New York NY USA. ACM.","DOI":"10.1145\/1242572.1242596"},{"key":"e_1_2_9_21_1","doi-asserted-by":"crossref","unstructured":"Mimno D. Wallach H. M. Naradowsky J. Smith D. A. andMccallum A.(2009).Polylingual topic models. InEMNLP '09: Proceedings of the 2009 conference on Empirical Methods in Natural Language Processing.","DOI":"10.3115\/1699571.1699627"},{"key":"e_1_2_9_22_1","unstructured":"Minka T. P.andLafferty J. D.(2002).Expectation\u2010propogation for the generative aspect model. InUAI pages352\u2013359."},{"key":"e_1_2_9_23_1","doi-asserted-by":"crossref","unstructured":"Park J. S. Chen M.\u2010S. andYu P. S.(1995).An effective hash\u2010based algorithm for mining association rules. InSIGMOD '95: Proceedings of the 1995 ACM SIGMOD international conference on Management of data pages175\u2013186 New York NY USA. ACM.","DOI":"10.1145\/223784.223813"},{"key":"e_1_2_9_24_1","doi-asserted-by":"crossref","unstructured":"Pasquier N. Bastide Y. Taouil R. andLakhal L.(1999).Discovering frequent closed itemsets for association rules. InICDT '99: Proceedings of the 7th International Conference on Database Theory pages398\u2013416 London UK. Springer\u2010Verlag.","DOI":"10.1007\/3-540-49257-7_25"},{"key":"e_1_2_9_25_1","unstructured":"Pei J. Han J. andMao R.(2000).Closet: An efficient algorithm for mining frequent closed itemsets. InDMKD '00: Proceedings of the 2000 ACM SIGMOD workshop on research issues in Data Mining and Knowledge Discovery pages21\u201330 New York NY USA. ACM."},{"key":"e_1_2_9_26_1","unstructured":"Pei J. Han J. Mortazavi\u2010asl B. Pinto H. Chen Q. Dayal U. andchun Hsu M.(2001).Prefixspan: Mining sequential patterns efficiently by prefix\u2010projected pattern growth. InICDE '01: Proceedings of the 2001 International Conference on Data Engineering page215 Washington DC USA. IEEE Computer Society."},{"key":"e_1_2_9_27_1","doi-asserted-by":"crossref","unstructured":"Phan X.\u2010H. Nguyen L.\u2010M. andHoriguchi S.(2008).Learning to classify short and sparse text & web with hidden topics from large\u2010scale data collections. InWWW '08: Proceeding of the 17th international conference on World Wide Web pages91\u2013100 New York NY USA. ACM.","DOI":"10.1145\/1367497.1367510"},{"key":"e_1_2_9_28_1","doi-asserted-by":"crossref","unstructured":"Srikant R.andAgrawal R.(1996).Mining sequential patterns: Generalizations and performance improvements. InEDBT '96: Proceedings of the 1996 international conference on Extending Database Technology pages3\u201317. Springer\u2010Verlag.","DOI":"10.1007\/BFb0014140"},{"key":"e_1_2_9_29_1","unstructured":"Teh Y. W.andG\u00f6r\u00fcr D.(2009).Indian buffet processes with power\u2010law behavior. In Advances in Neural Information Processing Systems."},{"key":"e_1_2_9_30_1","unstructured":"Titov I.andMcDonald R.(2008a).A joint model of text and aspect ratings for sentiment summarization. InACL '08: Proceedings of the 2008 annual meeting on Association for Computational Linguistics pages308\u2013316 Columbus Ohio. Association for Computational Linguistics."},{"key":"e_1_2_9_31_1","doi-asserted-by":"crossref","unstructured":"Titov I.andMcDonald R.(2008b).Modeling online reviews with multi\u2010grain topic models. InWWW '08: Proceedings of the 2008 international conference on World Wide Web pages111\u2013120 New York NY USA. ACM.","DOI":"10.1145\/1367497.1367513"},{"key":"e_1_2_9_32_1","doi-asserted-by":"crossref","unstructured":"Wallach H. M.(2006).Topic modeling: beyond bag\u2010of\u2010words. InICML '06: Proceedings of the 2006 annual International Conference on Machine Learning pages977\u2013984 New York NY USA. ACM.","DOI":"10.1145\/1143844.1143967"},{"key":"e_1_2_9_33_1","doi-asserted-by":"crossref","unstructured":"Wang X. McCallum A. andWei X.(2007).Topical n\u2010grams: Phrase and topic discovery with an application to information retrieval. InICDM '07: Proceedings of the 2007 IEEE International Conference on Data Mining pages697\u2013702 Washington DC USA. IEEE Computer Society.","DOI":"10.1109\/ICDM.2007.86"},{"key":"e_1_2_9_34_1","doi-asserted-by":"crossref","unstructured":"Wei X.andCroft W. B.(2006).Lda\u2010based document models for ad\u2010hoc retrieval. InSIGIR '06: Proceedings of the 2006 international ACM SIGIR conference on research and development in Information Retrieval pages178\u2013185 New York NY USA. ACM.","DOI":"10.1145\/1148170.1148204"},{"key":"e_1_2_9_35_1","unstructured":"Xin D. Han J. Yan X. andCheng H.(2005).Mining compressed frequent\u2010pattern sets. InVLDB '05: Proceedings of the 31st international conference on Very large data bases pages709\u2013720. VLDB Endowment."},{"key":"e_1_2_9_36_1","doi-asserted-by":"crossref","unstructured":"Yan X. Han J. andAfshar R.(2003).Clospan: Mining closed sequential patterns in large datasets. InSDM'03: Proceedings of the 2003 SIAM international conference on Data Mining pages166\u2013177. SIAM.","DOI":"10.1137\/1.9781611972733.15"},{"key":"e_1_2_9_37_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007652502315"},{"key":"e_1_2_9_38_1","doi-asserted-by":"crossref","unstructured":"Zaki M. J.andjui Hsiao C.(2002).Charm: An efficient algorithm for closed itemset mining. InSDM'02: Proceedings of the 2002 SIAM international conference on Data Mining pages457\u2013473 Arlington VA USA. SIAM.","DOI":"10.1137\/1.9781611972726.27"}],"container-title":["Proceedings of the American Society for Information Science and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.14504901209","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fmeet.14504901209","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/meet.14504901209","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T11:36:14Z","timestamp":1760960174000},"score":1,"resource":{"primary":{"URL":"https:\/\/asistdl.onlinelibrary.wiley.com\/doi\/10.1002\/meet.14504901209"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,1]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,1]]}},"alternative-id":["10.1002\/meet.14504901209"],"URL":"https:\/\/doi.org\/10.1002\/meet.14504901209","archive":["Portico"],"relation":{},"ISSN":["0044-7870","1550-8390"],"issn-type":[{"type":"print","value":"0044-7870"},{"type":"electronic","value":"1550-8390"}],"subject":[],"published":{"date-parts":[[2012,1]]}}}