{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T22:14:37Z","timestamp":1761948877504,"version":"3.44.0"},"reference-count":37,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T00:00:00Z","timestamp":1742342400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Intelligent Decision Technologies"],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>Topic modelling is an important technique for extracting meaningful insights from large volumes of unstructured text data. The paper presents a federated technique for topic modelling that is based on a novel approach of the Latent Dirichlet Allocation (LDA) method for topic model generation. The proposed approach enables a topic model to be developed in a distributed environment using data continually generated from multiple sources without the need for sharing actual data. The first iteration of the topic modelling uses unsupervised LDA at each device generating the data. The results of each device are aggregated at a central server to generate a set of seed words that are used for guided LDA by the subsequent iterations of topic modelling. The proposed work, Federated LDA (F-LDA) has been evaluated using two datasets: a text dataset of dialogues between patients and doctors based on factual conversations and another comprising tweets related to depression. Comparing the performance of F-LDA with that of a centralized LDA, it was observed that F-LDA results in improved coherence score as well as diversity score in comparison to centralized LDA. This indicates that F-LDA achieves better interpretable topics covering a wide range of themes without redundancy.<\/jats:p>","DOI":"10.1177\/18724981251319629","type":"journal-article","created":{"date-parts":[[2025,3,19]],"date-time":"2025-03-19T12:06:23Z","timestamp":1742385983000},"page":"2738-2759","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":7,"title":["A federated LDA-based approach for topic modelling"],"prefix":"10.1177","volume":"19","author":[{"given":"Richa","family":"Kushwaha","sequence":"first","affiliation":[{"name":"Jaypee Institute of Information Technology, Noida, India"}]},{"given":"Parmeet","family":"Kaur","sequence":"additional","affiliation":[{"name":"Jaypee Institute of Information Technology, Noida, India"}]}],"member":"179","published-online":{"date-parts":[[2025,3,19]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"1","article-title":"A survey of machine learning for big data processing","volume":"2016","author":"Qiu J","year":"2016","unstructured":"Qiu J, Wu Q, Ding G, et\u00a0al. A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016; 2016: 1\u201316.","journal-title":"EURASIP J Adv Signal Process"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2626379"},{"key":"e_1_3_2_4_2","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei D","year":"2003","unstructured":"Blei D, Ng A, Jordan M. Latent dirichlet allocation. J. Mach Learn Res 2003; 3: 993\u20131022.","journal-title":"J. Mach Learn Res"},{"volume-title":"A comparison of different topic modelling methods through a real case study of Italian customer care","author":"Papadia G","key":"e_1_3_2_5_2","unstructured":"Papadia G, Pacella M, Perrone M, et\u00a0al. A comparison of different topic modelling methods through a real case study of Italian customer care. MDPI, 8 February 2023."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-018-6894-4"},{"key":"e_1_3_2_7_2","unstructured":"Angelov D. Top2vec: Distributed representations of topics. arXiv preprint arXiv:2008.09470. (2020)."},{"key":"e_1_3_2_8_2","first-page":"74","volume-title":"Automated disease detection based on clinical text using topic modeling","author":"Yochum P","unstructured":"Yochum P, Nisamaneewong T. Automated disease detection based on clinical text using topic modeling. December 2022, pp.74\u201379: ICIT."},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","unstructured":"Jelodar H Wang Y Yuan C. Latent Dirichlet allocation (LDA) and topic modeling: models applications a survey. (13 November 2018).","DOI":"10.1007\/s11042-018-6894-4"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3150329"},{"key":"e_1_3_2_11_2","first-page":"153","volume-title":"2022 1st International Conference on Informatics (ICI)","author":"Kushwaha R","unstructured":"Kushwaha R, Kaur P. Depression detection on social Media. In: 2022 1st International Conference on Informatics (ICI), 2022, pp.153\u2013158."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3109425"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3056185"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2020.3032021"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3013541"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2021.108122"},{"key":"e_1_3_2_17_2","first-page":"1539","article-title":"Accelerating federated learning over reliability-agnostic clients in mobile edge computing systems","volume":"32","author":"Wu W","year":"2020","unstructured":"Wu W, He L, Lin W, et\u00a0al. Accelerating federated learning over reliability-agnostic clients in mobile edge computing systems. IEEE Trans Parallel Distrib Syst 2020; 32: 1539\u20131155.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377454"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s41666-020-00082-4"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-021-00489-2"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2021.103735"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comcom.2021.02.014"},{"key":"e_1_3_2_23_2","unstructured":"Ge S Wu F Wu C et\u00a0al. Fedner: Privacy-preserving medical named entity recognition with federated learning. arXiv preprint arXiv:2003.09288. (2020)."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40012-022-00351-0"},{"volume-title":"Word sense disambiguation using cosine similarity collaborates withWord2vec and WordNet","author":"Orkphol K","key":"e_1_3_2_25_2","unstructured":"Orkphol K. Word sense disambiguation using cosine similarity collaborates withWord2vec and WordNet. MDPI, 12 May 2019."},{"volume-title":"Word sense disambiguation using cosine similarity collaborates withWord2vec and WordNet","author":"Orkphol K","key":"e_1_3_2_26_2","unstructured":"Orkphol K, Yang W. Word sense disambiguation using cosine similarity collaborates withWord2vec and WordNet. MDPI, 12 May 2019."},{"key":"e_1_3_2_27_2","first-page":"138","article-title":"Aggregated topic models for increasing social media topic coherence","author":"Blair SJ","year":"2022","unstructured":"Blair SJ, Bi Y. Aggregated topic models for increasing social media topic coherence. Appl Intell 2022: 138\u2013156.","journal-title":"Appl Intell"},{"key":"e_1_3_2_28_2","unstructured":"McCallum AK. Mallet: A machine learning for language toolkit http:\/\/mallet.cs.umass.edu (2002)."},{"key":"e_1_3_2_29_2","unstructured":"A Semisupervised form of LDA: https:\/\/guidedlda.readthedocs.io\/en\/latest."},{"key":"e_1_3_2_30_2","unstructured":"https:\/\/www.kaggle.com\/datasets\/dsxavier\/diagnoise-me (accessed 25 July 2022)."},{"key":"e_1_3_2_31_2","unstructured":"https:\/\/www.kaggle.com\/datasets\/ferno2\/training1600000processednoemoticoncsv (accessed 01 January 2025)."},{"key":"e_1_3_2_32_2","first-page":"225","volume-title":"A comparison study between coherence and perplexity for determining the number of topics in practitioners\u2019 interviews analysis\u201d","author":"Mediano JM","unstructured":"Mediano JM, Quintero JAC. A comparison study between coherence and perplexity for determining the number of topics in practitioners\u2019 interviews analysis\u201d. 13 April 2022, pp.225\u2013234."},{"key":"e_1_3_2_33_2","first-page":"165","volume-title":"2017 IEEE International conference on data science and advanced analytics (DSAA)","author":"Syed S","unstructured":"Syed S, Spruit M. Full-text or abstract? Examining topic coherence scores using latent dirichlet allocation. In: 2017 IEEE International conference on data science and advanced analytics (DSAA), 2017, pp.165\u2013174: IEEE."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40501-3_30"},{"key":"e_1_3_2_35_2","first-page":"65","article-title":"Automated topic labeling with top-n keywords","volume":"23","author":"Zhang S","year":"2020","unstructured":"Zhang S, Wang H, Han J. Automated topic labeling with top-n keywords. Inform Retr J 2020; 23: 65\u201389.","journal-title":"Inform Retr J"},{"key":"e_1_3_2_36_2","first-page":"1227","volume-title":"2009 Ninth International Conference on Intelligent Systems Design and Applications","author":"Magatti D","unstructured":"Magatti D, Calegari S, Ciucci D, et\u00a0al. Automatic labeling of topics. In: 2009 Ninth International Conference on Intelligent Systems Design and Applications, Pisa, Italy, 2009, pp.1227\u20131232."},{"key":"e_1_3_2_37_2","first-page":"1536","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics","author":"Lau JH","unstructured":"Lau JH, Grieser K, Newman D, et\u00a0al. Automatic labelling of topic models. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19-24, 2011, pp.1536\u20131545."},{"key":"e_1_3_2_38_2","first-page":"1096","article-title":"An ontology-based labeling of influential topics using topic network analysis","volume":"15","author":"Kim HH","year":"2019","unstructured":"Kim HH, Rhee HY. An ontology-based labeling of influential topics using topic network analysis. J Inform Proces Syst 2019; 15: 1096\u20131107.","journal-title":"J Inform Proces Syst"}],"container-title":["Intelligent Decision Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/18724981251319629","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/18724981251319629","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/18724981251319629","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,18]],"date-time":"2025-08-18T13:39:34Z","timestamp":1755524374000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/18724981251319629"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,19]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.1177\/18724981251319629"],"URL":"https:\/\/doi.org\/10.1177\/18724981251319629","relation":{},"ISSN":["1872-4981","1875-8843"],"issn-type":[{"type":"print","value":"1872-4981"},{"type":"electronic","value":"1875-8843"}],"subject":[],"published":{"date-parts":[[2025,3,19]]}}}