{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T10:07:51Z","timestamp":1760609271875,"version":"3.37.3"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2020,5,15]],"date-time":"2020-05-15T00:00:00Z","timestamp":1589500800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,5,15]],"date-time":"2020-05-15T00:00:00Z","timestamp":1589500800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004270","name":"Kungliga Tekniska H\u00f6gskolan","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004270","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Prog Artif Intell"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Dynamic Topic Modeling (DTM) is the ultimate solution for extracting topics from short texts generated in Online Social Networks (OSNs) like Twitter. It must be scalable and able to account for the sparsity and dynamicity of short texts. Current solutions combine probabilistic mixture models like Dirichlet Multinomial or Pitman-Yor Process with approximate inference approaches like Gibbs Sampling and Stochastic Variational Inference to, respectively, account for the dynamicity and scalability of DTM. However, these methods rely on weak probabilistic language models, which do not account for sparsity in short texts. In addition, their inference is based on iterative optimization, which scales poorly for DTM. We present GDTM, a single-pass graph-based DTM algorithm, to solve the problem. 
GDTM combines a context-rich and incremental feature representation method with graph partitioning to address scalability and dynamicity and uses a rich language model to account for sparsity. We run multiple experiments over a large-scale Twitter dataset to analyze the accuracy and scalability of GDTM and compare the results with four state-of-the-art models. The results show that GDTM outperforms the best model by <jats:inline-formula><jats:alternatives><jats:tex-math>$$11\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n<mml:mrow>\n<mml:mn>11<\/mml:mn>\n<mml:mo>%<\/mml:mo>\n<\/mml:mrow>\n<\/mml:math><\/jats:alternatives><\/jats:inline-formula> on accuracy and runs an order of magnitude faster while producing four times better topic quality on standard evaluation metrics.<\/jats:p>","DOI":"10.1007\/s13748-020-00206-2","type":"journal-article","created":{"date-parts":[[2020,5,15]],"date-time":"2020-05-15T12:04:01Z","timestamp":1589544241000},"page":"195-207","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["GDTM: Graph-based Dynamic Topic Models"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1007-8533","authenticated-orcid":false,"given":"Kambiz","family":"Ghoorchian","sequence":"first","affiliation":[]},{"given":"Magnus","family":"Sahlgren","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,5,15]]},"reference":[{"key":"206_CR1","unstructured":"Allan, J., Carbonell, J., Doddington, G., Yamron, J., Yang, Y., Umass, J.A., Cmu, B.A., Cmu, D.B., Cmu, A.B., Cmu, R.B., Dragon, I.C., Darpa, G.D., Cmu, A.H., Cmu, J.L., Umass, V.L., Cmu, X.L., Dragon, S.L., Dragon, P.V.M. Umass, R.P., Cmu, T.P., Umass, J.P., Umass, M.S.: Topic detection and tracking pilot study final report. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 
194\u2013218, (1998)"},{"key":"206_CR2","first-page":"129","volume-title":"Dimensionality Reduction and Topic Modeling: From Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond Mining Text Data","author":"SP Crain","year":"2012","unstructured":"Crain, S.P., Zhou, K., Yang, S.-H., Zha, H.: Dimensionality Reduction and Topic Modeling: From Latent Semantic Indexing to Latent Dirichlet Allocation and Beyond Mining Text Data, pp. 129\u2013161. Springer, Berlin (2012)"},{"key":"206_CR3","first-page":"993","volume":"3","author":"DM Blei","year":"2003","unstructured":"Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993\u20131022 (2003)","journal-title":"J. Mach. Learn. Res."},{"key":"206_CR4","doi-asserted-by":"crossref","unstructured":"Yan, X., Guo, J., Lan, Y., Cheng, X.: A Biterm topic model for short texts. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1445\u20131456, (2013)","DOI":"10.1145\/2488388.2488514"},{"key":"206_CR5","doi-asserted-by":"crossref","unstructured":"Blei, D. M., Lafferty, J. D.: Dynamic topic models. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 113\u2013120, (2006)","DOI":"10.1145\/1143844.1143859"},{"key":"206_CR6","unstructured":"Wang, C., Blei, D., Heckerman, D.: Continuous time dynamic topic models. In: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, pp. 579\u2013586, (2008)"},{"key":"206_CR7","doi-asserted-by":"crossref","unstructured":"Liang, S., Yilmaz, E., Kanoulas, E.: Dynamic clustering of streaming short documents. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 995\u20131004, (2016)","DOI":"10.1145\/2939672.2939748"},{"key":"206_CR8","doi-asserted-by":"crossref","unstructured":"Yin, J., Wang, J.: A dirichlet multinomial mixture model-based approach for short text clustering. 
In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 233\u2013242, (2014)","DOI":"10.1145\/2623330.2623715"},{"key":"206_CR9","doi-asserted-by":"crossref","unstructured":"Yin, J., Wang, J.: A Text clustering algorithm using an online clustering scheme for initialization. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1995\u20132004, (2016)","DOI":"10.1145\/2939672.2939841"},{"key":"206_CR10","doi-asserted-by":"publisher","first-page":"1802","DOI":"10.1007\/s10489-017-1055-4","volume":"48","author":"Q Jipeng","year":"2018","unstructured":"Jipeng, Q., Yun, L., Yunhao, Y., Xindong, W.: Short Text clustering based on Pitman\u2013Yor process mixture model. Appl. Intell. 48, 1802\u20131812 (2018)","journal-title":"Appl. Intell."},{"key":"206_CR11","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1023\/A:1007692713085","volume":"39","author":"K Nigam","year":"2000","unstructured":"Nigam, K., Mccallum, A.K., Thrun, S., Mitchell, T.: Text classification from labeled and unlabeled documents using EM. Mach. Learn. 39, 103\u2013134 (2000)","journal-title":"Mach. Learn."},{"key":"206_CR12","first-page":"721","volume":"6","author":"S Geman","year":"1984","unstructured":"Geman, S., Geman, D.: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. 6, 721\u2013741 (1984)","journal-title":"IEEE Trans."},{"key":"206_CR13","first-page":"1303","volume":"14","author":"MD Hoffman","year":"2013","unstructured":"Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14, 1303\u20131347 (2013)","journal-title":"J. Mach. Learn. Res."},{"key":"206_CR14","unstructured":"Sahlgren, M.: An introduction to random indexing. 
In: Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, Litera, (2005)"},{"key":"206_CR15","unstructured":"Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space, CoRR, arXiv:1301.3781 (2013)"},{"key":"206_CR16","doi-asserted-by":"crossref","unstructured":"Bagga, A., Baldwin, B.: Entity-based Cross-document Coreferencing using the vector space model. In: Proceedings of the 17th International Conference on Computational Linguistics, vol. 1, pp. 79\u201385, (1998)","DOI":"10.3115\/980451.980859"},{"key":"206_CR17","unstructured":"Mimno, D., Wallach, H.M., Talley, E., Leenders, M., McCallum, A.: Optimizing semantic coherence in topic models. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 262\u2013272, (2011)"},{"key":"206_CR18","doi-asserted-by":"crossref","unstructured":"Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50\u201357, (1999)","DOI":"10.1145\/312624.312649"},{"issue":"1","key":"206_CR19","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1115\/1.3662552","volume":"82","author":"RE K\u00e1lm\u00e1n","year":"1960","unstructured":"K\u00e1lm\u00e1n, R.E.: A New Approach to Linear Filtering and Prediction Problems. J Basic Eng 82(1), 35\u201345 (1960)","journal-title":"J Basic Eng"},{"key":"206_CR20","doi-asserted-by":"crossref","unstructured":"Sato, I., Nakagawa, H.: Topic models with power-law using Pitman\u2013Yor process. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
673\u2013682, (2010)","DOI":"10.1145\/1835804.1835890"},{"key":"206_CR21","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1007\/BF02764938","volume":"54","author":"WB Johnson","year":"1986","unstructured":"Johnson, W.B., Lindenstrauss, J., Schechtman, G.: Extensions of lipschitz maps into banach spaces. Israel J. Math. 54, 129\u2013138 (1986)","journal-title":"Israel J. Math."},{"key":"206_CR22","volume-title":"Studies in Linguistic Analysis","author":"J Firth","year":"1957","unstructured":"Firth, J.: Studies in Linguistic Analysis. Philological Society, Oxford (1957)"},{"key":"206_CR23","first-page":"3","volume-title":"Distributional Structure","author":"S Harris Zellig","year":"1981","unstructured":"Harris Zellig, S.: Distributional Structure, pp. 3\u201322. Springer, Berlin (1981)"},{"key":"206_CR24","doi-asserted-by":"crossref","unstructured":"Mitchell, J., Lapata, M.: Language models based on semantic composition. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 1, pp. 430\u2013439, (2009)","DOI":"10.3115\/1699510.1699567"},{"key":"206_CR25","unstructured":"Guthrie, D., Allison, B., Liu, W., Guthrie, L., Wilks, Y.: A closer look at skip-gram modelling. In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC\u201906), European Language Resources Association (ELRA), (2006)"},{"key":"206_CR26","doi-asserted-by":"crossref","unstructured":"Ghoorchian, K., Girdzijauskas, S., Rahimian, F.: DeGPar: Large scale topic detection using node-cut partitioning on dense weighted graphs. In: IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 1.0, pp. 
775\u2013785, (2017)","DOI":"10.1109\/ICDCS.2017.19"}],"container-title":["Progress in Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13748-020-00206-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13748-020-00206-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13748-020-00206-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,14]],"date-time":"2021-05-14T23:12:12Z","timestamp":1621033932000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13748-020-00206-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,15]]},"references-count":26,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["206"],"URL":"https:\/\/doi.org\/10.1007\/s13748-020-00206-2","relation":{},"ISSN":["2192-6352","2192-6360"],"issn-type":[{"type":"print","value":"2192-6352"},{"type":"electronic","value":"2192-6360"}],"subject":[],"published":{"date-parts":[[2020,5,15]]},"assertion":[{"value":"21 August 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 April 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 May 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}