{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T11:54:19Z","timestamp":1776772459878,"version":"3.51.2"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,3,4]],"date-time":"2020-03-04T00:00:00Z","timestamp":1583280000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1707498"],"award-info":[{"award-number":["IIS-1707498"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,4,30]]},"abstract":"<jats:p>Probabilistic topic models, which can discover hidden patterns in documents, have been extensively studied. However, rather than learning from a single document collection, numerous real-world applications demand a comprehensive understanding of the relationships among various document sets. To address such needs, this article proposes a new model that can identify the common and discriminative aspects of multiple datasets. Specifically, our proposed method is a Bayesian approach that represents each document as a combination of common topics (shared across all document sets) and distinctive topics (distributions over words that are exclusive to a particular dataset). Through extensive experiments, we demonstrate the effectiveness of our method compared with state-of-the-art models. 
The proposed model can be useful for \u201ccomparative thinking\u201d analysis in real-world document collections.<\/jats:p>","DOI":"10.1145\/3369873","type":"journal-article","created":{"date-parts":[[2020,3,4]],"date-time":"2020-03-04T11:12:20Z","timestamp":1583320340000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":24,"title":["Probabilistic Topic Modeling for Comparative Analysis of Document Collections"],"prefix":"10.1145","volume":"14","author":[{"given":"Ting","family":"Hua","sequence":"first","affiliation":[{"name":"Virginia Tech"}]},{"given":"Chang-Tien","family":"Lu","sequence":"additional","affiliation":[{"name":"Virginia Tech"}]},{"given":"Jaegul","family":"Choo","sequence":"additional","affiliation":[{"name":"Korea University, South Korea"}]},{"given":"Chandan K.","family":"Reddy","sequence":"additional","affiliation":[{"name":"Virginia Tech"}]}],"member":"320","published-online":{"date-parts":[[2020,3,4]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201906)","author":"David","unstructured":"David M. Blei and John D. Lafferty. 2006. Dynamic topic models . In Proceedings of the International Conference on Machine Learning (ICML\u201906) . ACM, 113--120. David M. Blei and John D. Lafferty. 2006. Dynamic topic models. In Proceedings of the International Conference on Machine Learning (ICML\u201906). ACM, 113--120."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02293801"},{"key":"e_1_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Jordan Boyd-Graber Yuening Hu David Mimno etal 2017. Applications of topic models. Foundations and Trends\u00ae in Information Retrieval 11 2--3 60--62.  Jordan Boyd-Graber Yuening Hu David Mimno et al. 2017. Applications of topic models. 
Foundations and Trends\u00ae in Information Retrieval 11 2--3 60--62.","DOI":"10.1561\/1500000030"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.57"},{"key":"e_1_2_1_6_1","volume-title":"George","author":"Casella George","year":"1992","unstructured":"George Casella and Edward I. George. 1992. Explaining the Gibbs sampler. In The American Statistician, Vol. 46. Taylor & Francis, 167--174."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of Neural Information Processing Systems (NIPS\u201906)","volume":"19","author":"Chemudugunta Chaitanya","year":"2006","unstructured":"Chaitanya Chemudugunta, Padhraic Smyth, and Mark Steyvers. 2006. Modeling general and specific aspects of documents with a probabilistic topic model. In Proceedings of Neural Information Processing Systems (NIPS\u201906), Vol. 19. 241--248."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/tvcg.2013.212"},{"key":"e_1_2_1_9_1","volume-title":"Pattern recognition and machine learning","author":"Christopher Bishop","unstructured":"Christopher Bishop. 2007. Pattern recognition and machine learning. Springer, 93--94."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201901)","author":"David","unstructured":"David A. Cohn and Thomas Hofmann. 2001. The missing link-a probabilistic model of document content and hypertext connectivity . 
In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201901) . 430--436. David A. Cohn and Thomas Hofmann. 2001. The missing link-a probabilistic model of document content and hypertext connectivity. In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201901). 430--436."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1056"},{"key":"e_1_2_1_12_1","volume-title":"Griffiths and Mark Steyvers","author":"Thomas","year":"2004","unstructured":"Thomas L. Griffiths and Mark Steyvers . 2004 . Finding scientific topics. In Proceedings of the National Academy of Sciences (PNAS\u2019 04), Vol. 101 . NAS, 5228--5235. Thomas L. Griffiths and Mark Steyvers. 2004. Finding scientific topics. In Proceedings of the National Academy of Sciences (PNAS\u201904), Vol. 101. NAS, 5228--5235."},{"key":"e_1_2_1_13_1","first-page":"3","article-title":"Crowdstory: Fine-grained event storyline generation by fusion of multi-modal crowdsourced data. In Proceedings of the ACM on Interactive","volume":"1","author":"Guo Bin","year":"2017","unstructured":"Bin Guo , Yi Ouyang , Cheng Zhang , Jiafan Zhang , Zhiwen Yu , Di Wu , and Yu Wang . 2017 . Crowdstory: Fine-grained event storyline generation by fusion of multi-modal crowdsourced data. In Proceedings of the ACM on Interactive , Mobile, Wearable and Ubiquitous Technologies 1 , 3 , 55. Bin Guo, Yi Ouyang, Cheng Zhang, Jiafan Zhang, Zhiwen Yu, Di Wu, and Yu Wang. 2017. Crowdstory: Fine-grained event storyline generation by fusion of multi-modal crowdsourced data. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3, 55.","journal-title":"Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201910)","author":"Hoffman Matthew","unstructured":"Matthew Hoffman , Francis R. Bach , and David M. Blei . 2010. 
Online learning for latent dirichlet allocation . In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201910) . 856--864. Matthew Hoffman, Francis R. Bach, and David M. Blei. 2010. Online learning for latent dirichlet allocation. In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201910). 856--864."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/2567709.2502622"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/312624.312649"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP\u201913)","author":"Huang Lifu","year":"2013","unstructured":"Lifu Huang and Lian\u2019en Huang . 2013 . Optimized event storyline generation based on mixture-event-aspect model . In Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP\u201913) . 726--735. Lifu Huang and Lian\u2019en Huang. 2013. Optimized event storyline generation based on mixture-event-aspect model. In Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP\u201913). 726--735."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783338"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Gang Kou Yanqun Lu Yi Peng and Yong Shi. 2012. Evaluation of classification algorithms using MCDM and rank correlation. International Journal of Information Technology 8 Decision Making 11 01 197--225.  Gang Kou Yanqun Lu Yi Peng and Yong Shi. 2012. Evaluation of classification algorithms using MCDM and rank correlation. 
International Journal of Information Technology & Decision Making 11 01 197--225.","DOI":"10.1142\/S0219622012500095"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.02.137"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800020109"},{"key":"e_1_2_1_23_1","volume-title":"Jordan","author":"Lacoste-Julien Simon","year":"2009","unstructured":"Simon Lacoste-Julien, Fei Sha, and Michael I. Jordan. 2009. DiscLDA: Discriminative learning for dimensionality reduction and classification. In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201909). 897--904."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313617"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201901)","author":"Lee Daniel D.","unstructured":"Daniel D. Lee and H. Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201901). 556--562."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914)","author":"Lee Moontae","year":"2017","unstructured":"Moontae Lee and David Mimno . 2017 . Low-dimensional embeddings for interpretable anchor-based topic inference . In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914) . 1319--1328. Moontae Lee and David Mimno. 2017. 
Low-dimensional embeddings for interpretable anchor-based topic inference. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914). 1319--1328."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.48"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI\u201916)","author":"Liu Pengfei","year":"2016","unstructured":"Pengfei Liu , Xipeng Qiu , and Xuanjing Huang . 2016 . Recurrent neural network for text classification with multi-task learning . In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI\u201916) . 2873--2879. Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. 2016. Recurrent neural network for text classification with multi-task learning. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI\u201916). 2873--2879."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/2145432.2145462"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2009916.2010006"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488467"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the 12th International AAAI Conference on Web and Social Media.","author":"Momeni Elaheh","year":"2018","unstructured":"Elaheh Momeni , Shanika Karunasekera , Palash Goyal , and Kristina Lerman . 2018 . Modeling evolution of topics in large-scale temporal text corpora . In Proceedings of the 12th International AAAI Conference on Web and Social Media. Elaheh Momeni, Shanika Karunasekera, Palash Goyal, and Kristina Lerman. 2018. Modeling evolution of topics in large-scale temporal text corpora. 
In Proceedings of the 12th International AAAI Conference on Web and Social Media."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/2390524.2390572"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2318728"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI\u201910)","volume":"51","author":"Paul Michael","year":"2010","unstructured":"Michael Paul and Roxana Girju . 2010 . A two-dimensional topic-aspect model for discovering multi-faceted topics . In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI\u201910) , Vol. 51 . 36. Michael Paul and Roxana Girju. 2010. A two-dimensional topic-aspect model for discovering multi-faceted topics. In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI\u201910), Vol. 51. 36."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401960"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2008.927706"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of International Conference on Machine Learning (ICML\u201914)","author":"Rabinovich Maxim","unstructured":"Maxim Rabinovich and David M. Blei . 2014. The inverse regression topic model . In Proceedings of International Conference on Machine Learning (ICML\u201914) . IEEE, 199--207. Maxim Rabinovich and David M. Blei. 2014. The inverse regression topic model. In Proceedings of International Conference on Machine Learning (ICML\u201914). IEEE, 199--207."},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP\u201909)","author":"Ramage Daniel","unstructured":"Daniel Ramage , David Hall , Ramesh Nallapati , and Christopher D. Manning . 2009. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora . 
In Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP\u201909) . ACL, 248--256. Daniel Ramage, David Hall, Ramesh Nallapati, and Christopher D. Manning. 2009. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP\u201909). ACL, 248--256."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020481"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.69"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018690"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2648786"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1036843.1036902"},{"key":"e_1_2_1_45_1","volume-title":"A survey on automatic Twitter event summarization.Journal of Information Processing Systems 14, 1","author":"Rudrapal Dwijen","year":"2018","unstructured":"Dwijen Rudrapal , Amitava Das , and Baby Bhattacharya . 2018. A survey on automatic Twitter event summarization.Journal of Information Processing Systems 14, 1 ( 2018 ), 79--100. Dwijen Rudrapal, Amitava Das, and Baby Bhattacharya. 2018. A survey on automatic Twitter event summarization.Journal of Information Processing Systems 14, 1 (2018), 79--100."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 15th Annual Conference of the International Speech Communication Association.","author":"Sak Ha\u015fim","year":"2014","unstructured":"Ha\u015fim Sak , Andrew Senior , and Fran\u00e7oise Beaufays . 2014 . Long short-term memory recurrent neural network architectures for large scale acoustic modeling . In Proceedings of the 15th Annual Conference of the International Speech Communication Association. Ha\u015fim Sak, Andrew Senior, and Fran\u00e7oise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. 
In Proceedings of the 15th Annual Conference of the International Speech Communication Association."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3018661.3018728"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the 2018 World Wide Web Conference. International World Wide Web Conferences Steering Committee, 1105--1114","author":"Shi Tian","unstructured":"Tian Shi, Kyeongpil Kang, Jaegul Choo, and Chandan K. Reddy. 2018. Short-text topic modeling via non-negative matrix factorization enriched with local word-context correlations. In Proceedings of the 2018 World Wide Web Conference. International World Wide Web Conferences Steering Committee, 1105--1114."},{"key":"e_1_2_1_49_1","volume-title":"Compare & contrast: Teaching comparative thinking to strengthen student learning","author":"Silver Harvey F.","unstructured":"Harvey F. Silver. 2010. Compare & contrast: Teaching comparative thinking to strengthen student learning. Association for Supervision & Curriculum Development, 1--2."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing (EMNLP\u201912) and Computational Natural Language Learning (CoNLL\u201912)","author":"Stevens Keith","year":"2012","unstructured":"Keith Stevens , Philip Kegelmeyer , David Andrzejewski , and David Buttler . 2012 . Exploring topic coherence over many models and many topics . 
In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing (EMNLP\u201912) and Computational Natural Language Learning (CoNLL\u201912) . ACL, 952--961. Keith Stevens, Philip Kegelmeyer, David Andrzejewski, and David Buttler. 2012. Exploring topic coherence over many models and many topics. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing (EMNLP\u201912) and Computational Natural Language Learning (CoNLL\u201912). ACL, 952--961."},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201905)","author":"Teh Yee W.","unstructured":"Yee W. Teh , Michael I. Jordan , Matthew J. Beal , and David M. Blei . 2005. Sharing clusters among related groups: Hierarchical Dirichlet processes . In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201905) . 1385--1392. Yee W. Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2005. Sharing clusters among related groups: Hierarchical Dirichlet processes. In Proceedings of the Advances in Neural Information Processing Systems (NIPS\u201905). 1385--1392."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2015.112"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150450"},{"key":"e_1_2_1_54_1","volume-title":"Wright and Jorge Nocedal","author":"Stephen","year":"1999","unstructured":"Stephen J. Wright and Jorge Nocedal . 1999 . Numerical optimization, Vol. 35 . Springer Science . Stephen J. Wright and Jorge Nocedal. 1999. Numerical optimization, Vol. 35. Springer Science."},{"key":"e_1_2_1_55_1","volume-title":"Benjamin Van Durme, and Jordan L. Ying","author":"Yuan Michelle","year":"2018","unstructured":"Michelle Yuan , Benjamin Van Durme, and Jordan L. Ying . 2018 . Multilingual anchoring: Interactive topic modeling and alignment across languages. In Proceedings of the Advances in Neural Information Processing Systems . 
8667--8677. Michelle Yuan, Benjamin Van Durme, and Jordan L. Ying. 2018. Multilingual anchoring: Interactive topic modeling and alignment across languages. In Proceedings of the Advances in Neural Information Processing Systems. 8667--8677."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369873","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3369873","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3369873","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:27Z","timestamp":1750203867000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369873"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,4]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,4,30]]}},"alternative-id":["10.1145\/3369873"],"URL":"https:\/\/doi.org\/10.1145\/3369873","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,4]]},"assertion":[{"value":"2018-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-03-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}