{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T18:49:14Z","timestamp":1776106154713,"version":"3.50.1"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T00:00:00Z","timestamp":1729728000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>Topic modeling enables the discovery of concealed themes and patterns in extensive text collections. It facilitates a thorough examination of the messages present in religious texts. Topic modeling for Quranic verses is a trending study area, with various translations already explored including Bahasa, English, and Arabic. Yet, there is a need for further research, particularly in Urdu translations of the Quran. In this study, we propose applying the BERTopic framework to Urdu translations of the Holy Quran. By leveraging the BERTopic approach, which incorporates a fine-tuned BERT model, we aim to capture the contextual nuances and linguistic complexities unique to the Quran. In this study, we utilized existing Urdu translations of the Quran from eight different translators sourced from Tanzil, a renowned resource for Quranic text and translations. We assessed the performance of our proposed BERTopic model compared to traditional techniques like LDA and NMF, using coherence and diversity metrics. The results indicate that our BERT-based approach outperforms these conventional methods, achieving an average coherence improvement of 0.03 and a diversity score of 0.83. These findings highlight the effectiveness of BERTopic in extracting meaningful topics from Urdu translations of the Holy Quran and contribute to the computational analysis of religious texts, supporting scholarly endeavors in comparative studies of Quranic translations in Urdu.<\/jats:p>","DOI":"10.1145\/3694967","type":"journal-article","created":{"date-parts":[[2024,9,9]],"date-time":"2024-09-09T11:01:02Z","timestamp":1725879662000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Transformer-Based Topic Modeling for Urdu Translations of the Holy Quran"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7270-7238","authenticated-orcid":false,"given":"Amna","family":"Zafar","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Engineering and Technology, Lahore, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9248-5540","authenticated-orcid":false,"given":"Muhammad","family":"Wasim","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Management and Technology (Sialkot Campus), Lahore, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1956-6922","authenticated-orcid":false,"given":"Shaista","family":"Zulfiqar","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Management and Technology (Sialkot Campus), Lahore, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8350-5914","authenticated-orcid":false,"given":"Talha","family":"Waheed","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Engineering and Technology, Lahore, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-4207-8061","authenticated-orcid":false,"given":"AbuBakar","family":"Siddique","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Engineering and Technology, Lahore, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,10,24]]},"reference":[{"key":"e_1_3_2_2_1","first-page":"61","volume-title":"International Conference on Model and Data Engineering (MEDI\u201922)","author":"Abdelrazek Aly","year":"2022","unstructured":"Aly Abdelrazek, Walaa Medhat, Eman Gawish, and Ahmed Hassan. 2022. Topic modeling on Arabic language dataset: Comparative study. International Conference on Model and Data Engineering (MEDI\u201922). Springer, 61\u201371. DOI:10.1007\/978-3-031-23119-3_5"},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2021.05.096"},{"key":"e_1_3_2_4_1","unstructured":"Sania Aftar Luca Gagliardelli Amina El Ganadi Federico Ruozzi and Sonia Bergamaschi. 2024. A novel methodology for topic identification in Hadith. In Proceedings of the 20th Conference on Information and Research Science Connecting to Digital and Library Science (IRCDL\u201924)."},{"key":"e_1_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.22937\/IJCSNS.2022.22.4.15"},{"key":"e_1_3_2_6_1","first-page":"630","volume-title":"Science and Information Conference (SAI\u201922)","author":"Qudah Islam Al","year":"2022","unstructured":"Islam Al Qudah, Ibrahim Hashem, Abdelaziz Soufyane, Weisi Chen, and Tarek Merabtene. 2022. Applying latent Dirichlet allocation technique to classify topics on sustainability using Arabic text. Science and Information Conference (SAI\u201922). Springer, 630\u2013638. DOI:10.1007\/978-3-031-10461-9_43"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2022.0130199"},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.14569\/IJACSA.2015.061238"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2852648"},{"key":"e_1_3_2_10_1","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1007\/978-3-030-51935-3_34","volume-title":"International Conference on Image and Signal Processing (ICISPC\u201920)","author":"Allaoui Mebarka","year":"2020","unstructured":"Mebarka Allaoui, Mohammed Lamine Kherfi, and Abdelhakim Cheriet. 2020. Considerably improving clustering algorithms using UMAP dimensionality reduction technique: A comparative study. International Conference on Image and Signal Processing (ICISPC\u201920). Springer, 317\u2013325. DOI:10.1007\/978-3-030-51935-3_34"},{"key":"e_1_3_2_11_1","first-page":"185","volume-title":"Proceedings of the 6th Arabic Natural Language Processing Workshop","author":"Alsaleh Abdullah N.","year":"2021","unstructured":"Abdullah N. Alsaleh, Eric Atwell, and Abdulrahman Altahhan. 2021. Quranic verses semantic relatedness using AraBert. In Proceedings of the 6th Arabic Natural Language Processing Workshop. Leeds, 185\u2013190."},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2021.05.104"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-55187-2_19"},{"key":"e_1_3_2_14_1","volume-title":"International Journal on Islamic Applications in Computer Science and Technology (IJASAT)","author":"Alshammeri Menwa","year":"2021","unstructured":"Menwa Alshammeri, Eric Atwell, and Mhd Ammar Alsalka. 2021c. A Siamese transformer-based architecture for detecting semantic similarity in the Quran. International Journal on Islamic Applications in Computer Science and Technology (IJASAT) 9 (2021), 1\u201311."},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3039548"},{"key":"e_1_3_2_16_1","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei David M.","year":"2003","unstructured":"David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3 (Jan. 2003), 993\u20131022.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_17_1","first-page":"31","article-title":"Normalized (pointwise) mutual information in collocation extraction","volume":"30","author":"Bouma Gerlof","year":"2009","unstructured":"Gerlof Bouma. 2009. Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL 30 (2009), 31\u201340.","journal-title":"Proceedings of GSCL"},{"issue":"131","key":"e_1_3_2_18_1","first-page":"1","article-title":"Decoupling sparsity and smoothness in the Dirichlet variational autoencoder topic model","volume":"20","author":"Burkhardt Sophie","year":"2019","unstructured":"Sophie Burkhardt and Stefan Kramer. 2019. Decoupling sparsity and smoothness in the Dirichlet variational autoencoder topic model. Journal of Machine Learning Research 20, 131 (2019), 1\u201327. http:\/\/jmlr.org\/papers\/v20\/18-569.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-016-9482-x"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00325"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.3389\/fsoc.2022.886498"},{"key":"e_1_3_2_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41870-023-01268-w"},{"key":"e_1_3_2_24_1","article-title":"BERTopic: Neural topic modeling with a class-based TF-IDF procedure","author":"Grootendorst Maarten","year":"2022","unstructured":"Maarten Grootendorst. 2022. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794 (2022).","journal-title":"arXiv preprint arXiv:2203.05794"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","unstructured":"Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201999). 50\u201357. DOI:10.1145\/312624.312649","DOI":"10.1145\/312624.312649"},{"key":"e_1_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.31449\/inf.v46i8.4336"},{"key":"e_1_3_2_27_1","first-page":"1322","volume-title":"2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 15th International Conference on Pervasive Intelligence and Computing, 3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC\/PiCom\/DataCom\/CyberSciTech \u201917)","author":"Khalid Komal","year":"2017","unstructured":"Komal Khalid, Hammad Afzal, Faiza Moqaddas, Naima Iltaf, Ahmed Muqeem Sheri, and Raheel Nawaz. 2017. Extension of semantic based Urdu linguistic resources using natural language processing. In 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 15th International Conference on Pervasive Intelligence and Computing, 3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC\/PiCom\/DataCom\/CyberSciTech \u201917). IEEE, 1322\u20131325. DOI:10.1109\/DASC-PICom-DataCom-CyberSciTec.2017.214"},{"key":"e_1_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1080\/01638539809545028"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","unstructured":"Jey Han Lau David Newman and Timothy Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL\u201914). 530\u2013539. DOI:10.3115\/v1\/E14-1056","DOI":"10.3115\/v1\/E14-1056"},{"key":"e_1_3_2_30_1","article-title":"Algorithms for non-negative matrix factorization","volume":"13","author":"Lee Daniel","year":"2000","unstructured":"Daniel Lee and H. Sebastian Seung. 2000. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13 (2000), 205.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00205"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.21105\/joss.00861"},{"key":"e_1_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.17485\/ijst\/2019\/v12i45\/145722"},{"key":"e_1_3_2_34_1","first-page":"68","volume-title":"ATAIT","author":"Mustafa Mubashar","year":"2021","unstructured":"Mubashar Mustafa, Feng Zeng, Hussain Ghulam, and Wenjia Li. 2021. Discovering coherent topics from Urdu text. In ATAIT. 68\u201380."},{"key":"e_1_3_2_35_1","first-page":"127","volume-title":"2023 6th International Conference on Information and Computer Technologies (ICICT \u201923)","author":"Mustafa Mubashar","year":"2023","unstructured":"Mubashar Mustafa, Feng Zeng, Usama Manzoor, and Lin Meng. 2023. Discovering coherent topics from Urdu text: A comparative study of statistical models, clustering techniques and word embedding. In 2023 6th International Conference on Information and Computer Technologies (ICICT \u201923). IEEE, 127\u2013131. DOI:10.1109\/ICICT58900.2023.00028"},{"key":"e_1_3_2_36_1","first-page":"6345","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Nan Feng","year":"2019","unstructured":"Feng Nan, Ran Ding, Ramesh Nallapati, and Bing Xiang. 2019. Topic modeling with Wasserstein autoencoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 6345\u20136381. DOI:10.18653\/v1\/P19-1640"},{"key":"e_1_3_2_37_1","doi-asserted-by":"publisher","DOI":"10.1080\/14786440109462720"},{"key":"e_1_3_2_38_1","doi-asserted-by":"publisher","DOI":"10.11591\/ijeecs.v12.i1.pp406-411"},{"key":"e_1_3_2_39_1","first-page":"1","volume-title":"2018 24th International Conference on Automation and Computing (ICAC \u201918)","author":"Rehman Anwar Ur","year":"2018","unstructured":"Anwar Ur Rehman, Zobia Rehman, Junaid Akram, Waqar Ali, Munam Ali Shah, and Muhammad Salman. 2018. Statistical topic modeling for Urdu text articles. In 2018 24th International Conference on Automation and Computing (ICAC \u201918). IEEE, 1\u20136. DOI:10.23919\/IConAC.2018.8748975"},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/d19-1410"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","unstructured":"Michael R\u00f6der Andreas Both and Alexander Hinneburg. 2015. Exploring the space of topic coherence measures. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (WSDM\u201915). 399\u2013408. DOI:10.1145\/2684822.2685324","DOI":"10.1145\/2684822.2685324"},{"key":"e_1_3_2_42_1","volume-title":"Proceedings of the 2nd International Conference on Quran and Hadith Studies Information Technology and Media in Conjunction with the 1st International Conference on Islam, Science and Technology (ICONQUHAS & ICONIST \u201920)","author":"Rolliawati Dwi","year":"2020","unstructured":"Dwi Rolliawati, Indri Rozas, and Khalid Khalid. 2020. Text mining approach for topic modeling of corpus Al Qur\u2019an in Indonesian translation. In Proceedings of the 2nd International Conference on Quran and Hadith Studies Information Technology and Media in Conjunction with the 1st International Conference on Islam, Science and Technology (ICONQUHAS & ICONIST \u201920). DOI:10.4108\/eai.2-10-2018.2295559"},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3431728"},{"key":"e_1_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324921000425"},{"key":"e_1_3_2_45_1","first-page":"117","volume-title":"2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC \u201918)","author":"Shakeel Khadija","year":"2018","unstructured":"Khadija Shakeel, Ghulam Rasool Tahir, Irsha Tehseen, and Mubashir Ali. 2018. A framework of Urdu topic modeling using latent dirichlet allocation (LDA). In 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC \u201918). IEEE, 117\u2013123. DOI:10.1109\/CCWC.2018.8301655"},{"key":"e_1_3_2_46_1","doi-asserted-by":"publisher","unstructured":"Muazzam Ahmed Siddiqui Syed Muhammad Faraz and Sohail Abdul Sattar. 2013. Discovering the thematic structure of the Quran using probabilistic topic model. 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences (NOORIC\u201913). IEEE 234\u2013239. DOI:10.1109\/NOORIC.2013.55","DOI":"10.1109\/NOORIC.2013.55"},{"key":"e_1_3_2_47_1","first-page":"234","volume-title":"Proceedings of the 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences (NOORIC \u201913)","author":"Siddiqui M. A.","year":"2015","unstructured":"M. A. Siddiqui, S. M. Faraz, and S. A. Sattar. 2015. Discovering the thematic structure of the Quran using probabilistic topic model. In Proceedings of the 2013 Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences (NOORIC \u201913). Institute of Electrical and Electronics Engineers Inc., 234\u2013239. DOI:10.1109\/NOORIC.2013.55"},{"key":"e_1_3_2_48_1","volume-title":"Proceedings of the 1st International Workshop RELATED\u2014Relations in the Legal Domain 2021","author":"Silveira R.","year":"2021","unstructured":"R. Silveira, C. G. O. Fernandes, J. A. Monteiro Neto, V. Furtado, and J. E. P. Filho. 2021. Topic modelling of legal documents via LEGAL-BERT. In Proceedings of the 1st International Workshop RELATED\u2014Relations in the Legal Domain 2021. DOI:10.2139\/ssrn.4539091"},{"key":"e_1_3_2_49_1","doi-asserted-by":"publisher","unstructured":"Silvia Terragni Elisabetta Fersini and Enza Messina. 2021. Word embedding-based topic similarity measures. International Conference on Applications of Natural Language to Information Systems (NLDB\u201921). 33\u201345. DOI:10.1007\/978-3-030-80599-9_4","DOI":"10.1007\/978-3-030-80599-9_4"},{"issue":"86","key":"e_1_3_2_50_1","first-page":"2579","article-title":"Visualizing data using t-SNE.","volume":"9","author":"Maaten Laurens Van der","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579\u20132605. http:\/\/jmlr.org\/papers\/v9\/vandermaaten08a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2020.101582"},{"key":"e_1_3_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/1852102.1852106"},{"key":"e_1_3_2_53_1","doi-asserted-by":"publisher","DOI":"10.3390\/app122111220"},{"key":"e_1_3_2_54_1","doi-asserted-by":"publisher","DOI":"10.1287\/isre.2022.1124"},{"key":"e_1_3_2_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3112620"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3694967","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3694967","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:07Z","timestamp":1750295887000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3694967"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,24]]},"references-count":54,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3694967"],"URL":"https:\/\/doi.org\/10.1145\/3694967","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,24]]},"assertion":[{"value":"2023-10-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-29","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}