{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T02:58:06Z","timestamp":1764212286562,"version":"build-2065373602"},"reference-count":69,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,3,1]],"date-time":"2023-03-01T00:00:00Z","timestamp":1677628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"UEFISCDI Romania and MCI through BEIA projects AutoDecS, SOLID-B5G, T4ME2, DISAVIT, PIMEO-AI, AISTOR, MULTI-AI, ADRIATIC, Hydro3D, PREVENTION, DAFCC, EREMI, ADCATER, MUSEION, FinSESCo, iPREMAS, IPSUS, U-GARDEN, CREATE","award":["101037866","PN 19 11","19PFE\/30.12.2021"],"award-info":[{"award-number":["101037866","PN 19 11","19PFE\/30.12.2021"]}]},{"name":"European Union\u2019s Horizon Europe research and innovation program","award":["101037866","PN 19 11","19PFE\/30.12.2021"],"award-info":[{"award-number":["101037866","PN 19 11","19PFE\/30.12.2021"]}]},{"name":"Ministry of Research, Innovation, Digitization from Romania by the National Plan of R & D","award":["101037866","PN 19 11","19PFE\/30.12.2021"],"award-info":[{"award-number":["101037866","PN 19 11","19PFE\/30.12.2021"]}]},{"name":"Institutional performance-Projects to finance excellence in RDI","award":["101037866","PN 19 11","19PFE\/30.12.2021"],"award-info":[{"award-number":["101037866","PN 19 11","19PFE\/30.12.2021"]}]},{"name":"the National Center for Hydrogen and Fuel Cells (CNHPC)\u2014Installations and Special Objectives of National Interest (IOSIN)","award":["101037866","PN 19 11","19PFE\/30.12.2021"],"award-info":[{"award-number":["101037866","PN 19 11","19PFE\/30.12.2021"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Topic modeling is a machine learning algorithm based on statistics that follows unsupervised machine learning techniques 
for mapping a high-dimensional corpus to a low-dimensional topical subspace, although the quality of the resulting topics can still be improved. A topic model\u2019s topic is expected to be interpretable as a concept, i.e., to correspond to a human understanding of a topic occurring in texts. While discovering corpus themes, inference constantly relies on the vocabulary, whose size affects topic quality. The corpus contains many inflectional forms of the same words. Since words that frequently appear in the same sentence are likely to share a latent topic, practically all topic models rely on co-occurrence signals between terms in the corpus. The topics become weaker because of the abundance of distinct tokens in languages with extensive inflectional morphology. Lemmatization is often used to preempt this problem. Gujarati is a morphologically rich language, as a word may have several inflectional forms. This paper proposes a deterministic finite automaton (DFA)-based lemmatization technique for the Gujarati language that transforms inflected words into their root words (lemmas). The set of topics is then inferred from this lemmatized corpus of Gujarati text. We employ statistical divergence measurements to identify semantically less coherent (overly general) topics. The results show that a model trained on the lemmatized Gujarati corpus learns more interpretable and meaningful topics than one trained on unlemmatized text. 
Finally, the results show that lemmatization reduces the vocabulary size by 16% and improves the semantic coherence for all three measurements (Log Conditional Probability, Pointwise Mutual Information, and Normalized Pointwise Mutual Information) from \u22129.39 to \u22127.49, \u22126.79 to \u22125.18, and \u22120.23 to \u22120.17, respectively.<\/jats:p>","DOI":"10.3390\/s23052708","type":"journal-article","created":{"date-parts":[[2023,3,2]],"date-time":"2023-03-02T02:10:59Z","timestamp":1677723059000},"page":"2708","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Modeling Topics in DFA-Based Lemmatized Gujarati Text"],"prefix":"10.3390","volume":"23","author":[{"given":"Uttam","family":"Chauhan","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Vishwakarma Government Engineering College, Chandkheda, Ahmedabad 382424, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-2267-5768","authenticated-orcid":false,"given":"Shrusti","family":"Shah","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Vishwakarma Government Engineering College, Chandkheda, Ahmedabad 382424, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dharati","family":"Shiroya","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Vishwakarma Government Engineering College, Chandkheda, Ahmedabad 382424, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dipti","family":"Solanki","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Vishwakarma Government Engineering College, Chandkheda, Ahmedabad 382424, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zeel","family":"Patel","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Vishwakarma Government 
Engineering College, Chandkheda, Ahmedabad 382424, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2375-5057","authenticated-orcid":false,"given":"Jitendra","family":"Bhatia","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad 382481, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1776-4651","authenticated-orcid":false,"given":"Sudeep","family":"Tanwar","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad 382481, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ravi","family":"Sharma","sequence":"additional","affiliation":[{"name":"Centre for Inter-Disciplinary Research and Innovation, University of Petroleum and Energy Studies, Dehradun 248001, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Verdes","family":"Marina","sequence":"additional","affiliation":[{"name":"Faculty of Civil Engineering and Building Services, Department of Building Services, Technical University of Gheorghe Asachi, 700050 Iasi, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7277-4377","authenticated-orcid":false,"given":"Maria Simona","family":"Raboaca","sequence":"additional","affiliation":[{"name":"Doctoral School, University Politehnica of Bucharest, Splaiul Independentei Street No. 313, 060042 Bucharest, Romania"},{"name":"National Research and Development Institute for Cryogenic and Isotopic Technologies\u2014ICSI Rm. V\u00e2lcea, Uzinei Street, No. 4, P.O. 
Box 7 R\u00e2ureni, 240050 R\u00e2mnicu V\u00e2lcea, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,3,1]]},"reference":[{"key":"ref_1","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","article-title":"Indexing by latent semantic analysis","volume":"41","author":"Deerwester","year":"1990","journal-title":"J. Am. Soc. Inf. Sci."},{"key":"ref_3","unstructured":"Hofmann, T. (August, January 30). Probabilistic latent semantic analysis. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"5228","DOI":"10.1073\/pnas.0307752101","article-title":"Finding scientific topics","volume":"101","author":"Griffiths","year":"2004","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1007\/s11192-014-1321-8","article-title":"Clustering scientific documents with topic modeling","volume":"100","author":"Yau","year":"2014","journal-title":"Scientometrics"},{"key":"ref_6","unstructured":"Rosen-Zvi, M., Griffiths, T., Steyvers, M., and Smyth, P. (2004, January 7\u201311). The author-topic model for authors and documents. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, Banff, AB, Canada."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Steyvers, M., Smyth, P., Rosen-Zvi, M., and Griffiths, T. (2004, January 22\u201325). Probabilistic author-topic models for information discovery. 
Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014087"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.jbi.2016.02.003","article-title":"Modeling healthcare data using multiple-channel latent Dirichlet allocation","volume":"60","author":"Lu","year":"2016","journal-title":"J. Biomed. Inform."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Paul, M.J., and Dredze, M. (2014). Discovering health topics in social media using topic models. PLoS ONE, 9.","DOI":"10.1371\/journal.pone.0103408"},{"key":"ref_10","unstructured":"Kayi, E.S., Yadav, K., Chamberlain, J.M., and Choi, H.A. (2017). Topic Modeling for Classification of Clinical Reports. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1016\/j.jbi.2015.10.012","article-title":"Discovering treatment pattern in Traditional Chinese Medicine clinical cases by exploiting supervised topic model and domain knowledge","volume":"58","author":"Yao","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Asuncion, H.U., Asuncion, A.U., and Taylor, R.N. (2010, January 2\u20138). Software traceability with topic modeling. Proceedings of the 2010 ACM\/IEEE 32nd International Conference on Software Engineering, Cape Town, South Africa.","DOI":"10.1145\/1806799.1806817"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.jss.2016.05.015","article-title":"Topic-based software defect explanation","volume":"129","author":"Chen","year":"2017","journal-title":"J. Syst. Softw."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1068","DOI":"10.1109\/TSE.2018.2874960","article-title":"Changeset-based topic modeling of software repositories","volume":"46","author":"Corley","year":"2018","journal-title":"IEEE Trans. Softw. 
Eng."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"972","DOI":"10.1016\/j.infsof.2010.04.002","article-title":"Bug localization using latent dirichlet allocation","volume":"52","author":"Lukins","year":"2010","journal-title":"Inf. Softw. Technol."},{"key":"ref_16","unstructured":"\u0158eh\u016f\u0159ek, R., and Sojka, P. (2010). Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, ELRA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.infsof.2015.05.003","article-title":"Msr4sm: Using topic models to effectively mining software repositories for software maintenance tasks","volume":"66","author":"Sun","year":"2015","journal-title":"Inf. Softw. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1016\/j.scico.2012.08.003","article-title":"Studying software evolution using topic models","volume":"80","author":"Thomas","year":"2014","journal-title":"Sci. Comput. Program."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tian, K., Revelle, M., and Poshyvanyk, D. (2009, January 16\u201317). Using latent dirichlet allocation for automatic categorization of software. Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories, Vancouver, BC, Canada.","DOI":"10.1109\/MSR.2009.5069496"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"2489","DOI":"10.1016\/j.patcog.2011.12.022","article-title":"Video fingerprinting using Latent Dirichlet Allocation and facial images","volume":"45","author":"Vretos","year":"2012","journal-title":"Pattern Recognit."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.imavis.2015.02.003","article-title":"Incremental probabilistic Latent Semantic Analysis for video retrieval","volume":"38","author":"Pla","year":"2015","journal-title":"Image Vis. 
Comput."},{"key":"ref_22","first-page":"25","article-title":"Discovering Latent Topics by Gaussian Latent Dirichlet Allocation and Spectral Clustering","volume":"15","author":"Yuan","year":"2019","journal-title":"ACM Trans. Multimed. Comput. Commun. Appl. (TOMM)"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1016\/j.patcog.2013.06.010","article-title":"Latent topic model for audio retrieval","volume":"47","author":"Hu","year":"2014","journal-title":"Pattern Recognit."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Gao, N., Gao, L., He, Y., Wang, H., and Sun, Q. (2013, January 13\u201315). Topic detection based on group average hierarchical clustering. Proceedings of the 2013 International Conference on Advanced Cloud and Big Data, Nanjing, China.","DOI":"10.1109\/CBD.2013.38"},{"key":"ref_25","unstructured":"Kim, D., and Oh, A. (2014, January 22\u201324). Hierarchical Dirichlet scaling process. Proceedings of the International Conference on Machine Learning, Beijing, China."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.1109\/TKDE.2017.2786727","article-title":"Supervised Topic Modeling Using Hierarchical Dirichlet Process-Based Inverse Regression: Experiments on E-Commerce Applications","volume":"30","author":"Li","year":"2017","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_27","unstructured":"Teh, Y.W., Jordan, M.I., Beal, M.J., and Blei, D.M. (2005, January 5\u20138). Sharing clusters among related groups: Hierarchical Dirichlet processes. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yang, S., Yuan, C., Hu, W., and Ding, X. (2014, January 24\u201328). A hierarchical model based on latent dirichlet allocation for action recognition. 
Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden.","DOI":"10.1109\/ICPR.2014.451"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.neucom.2010.11.038","article-title":"A hierarchical latent topic model based on sparse coding","volume":"76","author":"Zhu","year":"2012","journal-title":"Neurocomputing"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Fang, A., Macdonald, C., Ounis, I., and Habel, P. (2016, January 20). Topics in tweets: A user study of topic coherence metrics for Twitter data. Proceedings of the European Conference on Information Retrieval, Padua, Italy.","DOI":"10.1007\/978-3-319-30671-1_36"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Weng, J., Lim, E.P., Jiang, J., and He, Q. (2010, January 3\u20136). Twitterrank: Finding topic-sensitive influential twitterers. Proceedings of the Third ACM International Conference on Web Search and Data Mining, New York, NY, USA.","DOI":"10.1145\/1718487.1718520"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Bhattacharya, P., Zafar, M.B., Ganguly, N., Ghosh, S., and Gummadi, K.P. (2014, January 6\u201310). Inferring user interests in the twitter social network. Proceedings of the 8th ACM Conference on Recommender Systems, Foster City, CA, USA.","DOI":"10.1145\/2645710.2645765"},{"key":"ref_33","unstructured":"Cordeiro, M. (2012). Proceedings of the Doctoral Symposium on Informatics Engineering, Faculdade de Engenharia da Universidade do Porto."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.is.2013.11.003","article-title":"TWILITE: A recommendation system for Twitter using a probabilistic model based on latent Dirichlet allocation","volume":"42","author":"Kim","year":"2014","journal-title":"Inf. 
Syst."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.compenvurbsys.2016.04.002","article-title":"The geography of Twitter topics in London","volume":"58","author":"Lansley","year":"2016","journal-title":"Comput. Environ. Urban Syst."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.ins.2016.06.040","article-title":"A topic-enhanced word embedding for twitter sentiment classification","volume":"369","author":"Ren","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_37","first-page":"304","article-title":"An LDA and synonym lexicon based approach to product feature extraction from online consumer product reviews","volume":"14","author":"Ma","year":"2013","journal-title":"J. Electron. Commer. Res."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.jbi.2016.06.001","article-title":"Topic detection using paragraph vectors to support active learning in systematic reviews","volume":"62","author":"Hashimoto","year":"2016","journal-title":"J. Biomed. Inform."},{"key":"ref_39","first-page":"526","article-title":"A hierarchical aspect-sentiment model for online reviews","volume":"27","author":"Kim","year":"2013","journal-title":"Proc. Aaai Conf. Artif. Intell."},{"key":"ref_40","first-page":"432","article-title":"Pulling out the stops: Rethinking stopword removal for topic models","volume":"Volume 2","author":"Schofield","year":"2017","journal-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Short Papers"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1007\/s10791-011-9171-y","article-title":"Arabic texts analysis for topic modeling evaluation","volume":"15","author":"Brahmi","year":"2012","journal-title":"Inf. 
Retr."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"653","DOI":"10.1016\/j.ipm.2017.01.003","article-title":"Vocabulary size and its effect on topic representation","volume":"53","author":"Lu","year":"2017","journal-title":"Inf. Process. Manag."},{"key":"ref_43","unstructured":"Paul, S., Tandon, M., Joshi, N., and Mathur, I. (2013). Proceedings of Third International Workshop on Artificial Intelligence, Soft Computing and Applications, Chennai, India, 27 July 2013, AIRCC Publishing Corporation."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2835494","article-title":"Benlem (A bengali lemmatizer) and its role in WSD","volume":"15","author":"Chakrabarty","year":"2016","journal-title":"ACM Trans. Asian-Low-Resour. Lang. Inf. Process. (TALLIP)"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Kumar, A.M., and Soman, K. (2014, January 5\u20137). AMRITA_CEN@ FIRE-2014: Morpheme Extraction and Lemmatization for Tamil using Machine Learning. Proceedings of the Forum for Information Retrieval Evaluation, Bangalore, India.","DOI":"10.1145\/2824864.2824883"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Al-Shammari, E., and Lin, J. (2008, January 24). A novel Arabic lemmatization algorithm. Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, Association for Computing Machinery, New York, NY, USA.","DOI":"10.1145\/1390749.1390767"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Al-Shammari, E.T., and Lin, J. (2008, January 30). Towards an error-free Arabic stemming. Proceedings of the 2nd ACM Workshop on Improving non English Web Searching, Napa Valley, CA, USA.","DOI":"10.1145\/1460027.1460030"},{"key":"ref_48","unstructured":"Roth, R., Rambow, O., Habash, N., Diab, M., and Rudin, C. (2008). 
Proceedings of the ACL-08: HLT, Short Papers, Association for Computational Linguistics."},{"key":"ref_49","unstructured":"Seddah, D., Chrupa\u0142a, G., \u00c7etino\u011flu, \u00d6., Van Genabith, J., and Candito, M. (2010). Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, Association for Computational Linguistics."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Piskorski, J., Sydow, M., and Kup\u015b\u0107, A. (2007, January 29). Lemmatization of Polish person names. Proceedings of the Workshop on Balto-Slavonic Natural Language Processing, Prague, Czech Republic.","DOI":"10.3115\/1567545.1567551"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Korenius, T., Laurikkala, J., J\u00e4rvelin, K., and Juhola, M. (2004, January 8\u201313). Stemming and lemmatization in the clustering of finnish text documents. Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, Washington, DC, USA.","DOI":"10.1145\/1031171.1031285"},{"key":"ref_52","unstructured":"Ku\u010dera, K., and Stluka, M. (2014, January 19\u201320). Data processing and lemmatization in digitized 19th-century Czech texts. Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, Madrid, Spain."},{"key":"ref_53","unstructured":"Eger, S., Gleim, R., and Mehler, A. (2016, January 23\u201328). Lemmatization and morphological tagging in German and Latin: A comparison and a survey of the state-of-the-art. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916), Portoro\u017e, Slovenia."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Lazarinis, F. (2007, January 14\u201317). Lemmatization and stopword elimination in Greek Web searching. 
Proceedings of the 2007 Euro American conference on Telematics and Information Systems, Faro, Portugal.","DOI":"10.1145\/1352694.1352757"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Rakhimova, D., and Turganbayeva, A. (2019, January 6\u20138). Lemmatization of big data in the Kazakh language. Proceedings of the 5th International Conference on Engineering and MIS, Astana, Kazakhstan.","DOI":"10.1145\/3330431.3330447"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Ozturkmenoglu, O., and Alpkocak, A. (2012, January 2\u20134). Comparison of different lemmatization approaches for information retrieval on Turkish text collection. Proceedings of the 2012 International Symposium on Innovations in Intelligent Systems and Applications, Trabzon, Turkey.","DOI":"10.1109\/INISTA.2012.6246934"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Toporkov, O., and Agerri, R. (2023). On the Role of Morphological Information for Contextual Lemmatization. arXiv.","DOI":"10.1162\/coli_a_00497"},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Hafeez, R., Anwar, M.W., Jamal, M.H., Fatima, T., Espinosa, J.C.M., L\u00f3pez, L.A.D., Thompson, E.B., and Ashraf, I. (2023). Contextual Urdu Lemmatization Using Recurrent Neural Network Models. Mathematics, 11.","DOI":"10.3390\/math11020435"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3502157","article-title":"A Lemmatizer for Low-resource Languages: WSD and Its Role in the Assamese Language","volume":"21","author":"Gogoi","year":"2022","journal-title":"Trans. Asian-Low-Resour. Lang. Inf. Process."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/j.procs.2018.10.468","article-title":"Towards an optimal solution to lemmatization in Arabic","volume":"142","author":"Freihat","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_61","unstructured":"Porter, M. (2022, September 09). 
The Porter Stemming Algorithm (1980). Available online: http:\/\/tartarus.org\/martin\/PorterStemmer."},{"key":"ref_62","unstructured":"Wikipedia Contributors (2021, December 04). Gujarati Language\u2014Wikipedia, the Free Encyclopedia. Available online: https:\/\/en.wikipedia.org\/wiki\/Gujarati_language."},{"key":"ref_63","unstructured":"Suba, K., Jiandani, D., and Bhattacharyya, P. (2011, January 8\u201313). Hybrid inflectional stemmer and rule-based derivational stemmer for gujarati. Proceedings of the 2nd Workshop on South Southeast Asian Natural Language Processing (WSSANLP), Chiang Mai, Thailand."},{"key":"ref_64","unstructured":"Ameta, J., Joshi, N., and Mathur, I. (2012). A lightweight stemmer for Gujarati. arXiv."},{"key":"ref_65","unstructured":"Aswani, N., and Gaizauskas, R.J. (2010, January 17\u201323). Developing Morphological Analysers for South Asian Languages: Experimenting with the Hindi and Gujarati Languages. Proceedings of the LREC, Valletta, Malta."},{"key":"ref_66","unstructured":"Popat, P.P.K., and Bhattacharyya, P. (2010, January 23\u201327). Hybrid stemmer for gujarati. Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Wallach, H.M., Murray, I., Salakhutdinov, R., and Mimno, D. (2009, January 14\u201318). Evaluation methods for topic models. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553515"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Lau, J.H., Newman, D., and Baldwin, T. (2014). Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality. EACL, 530\u2013539.","DOI":"10.3115\/v1\/E14-1056"},{"key":"ref_69","unstructured":"Aletras, N., and Stevenson, M. (2013). 
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013)\u2013Long Papers, Association for Computational Linguistics."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2708\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:45:28Z","timestamp":1760121928000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/5\/2708"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,1]]},"references-count":69,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["s23052708"],"URL":"https:\/\/doi.org\/10.3390\/s23052708","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,3,1]]}}}