{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T07:37:50Z","timestamp":1771486670430,"version":"3.50.1"},"reference-count":43,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2019,5,5]],"date-time":"2019-05-05T00:00:00Z","timestamp":1557014400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest because understanding it can offer insight into how human beings comprehend meaning and make associations between words. However, when the problem is viewed from the standpoint of machine understanding, particularly for under-resourced languages, it poses a different challenge altogether. In this paper, semantic similarity is explored in Bangla, a less-resourced language. To ameliorate the situation in such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented using cross-lingual resources in English, with markedly better results. Two semantic similarity approaches, the path-based and the distributional model, were explored in Bangla, and their cross-lingual counterparts were synthesized using the English WordNet and English corpora. The proposed methods were evaluated on a dataset of 162 Bangla word pairs annotated by five expert raters. 
The correlation scores between the four metrics and the human evaluation scores demonstrate the marked improvement that the cross-lingual approach brings to semantic similarity calculation for Bangla.<\/jats:p>","DOI":"10.3390\/informatics6020019","type":"journal-article","created":{"date-parts":[[2019,5,9]],"date-time":"2019-05-09T11:22:35Z","timestamp":1557400955000},"page":"19","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla\u2014A Low Resourced Language"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6981-0066","authenticated-orcid":false,"given":"Rajat","family":"Pandit","sequence":"first","affiliation":[{"name":"Department of Computer Science, West Bengal State University, Kolkata 700126, India"}]},{"given":"Saptarshi","family":"Sengupta","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Minnesota Duluth, Duluth, MN 55812, USA"}]},{"given":"Sudip Kumar","family":"Naskar","sequence":"additional","affiliation":[{"name":"Department of Computer Science & Engineering, Jadavpur University, Kolkata 700032, India"}]},{"given":"Niladri Sekhar","family":"Dash","sequence":"additional","affiliation":[{"name":"Linguistic Research Unit, Indian Statistical Institute, Kolkata 700108, India"}]},{"given":"Mohini Mohan","family":"Sardar","sequence":"additional","affiliation":[{"name":"Department of Bengali, West Bengal State University, Kolkata 700126, India"}]}],"member":"1968","published-online":{"date-parts":[[2019,5,5]]},"reference":[{"key":"ref_1","first-page":"13","article-title":"Evaluating WordNet-based measures of semantic distance","volume":"32","author":"Budan","year":"2006","journal-title":"Comput. 
Linguist."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/TSMCC.2004.829322","article-title":"Toward agency and ontology for web-based information retrieval","volume":"34","author":"Sim","year":"2004","journal-title":"IEEE Trans. Syst. Man Cybern. C Appl. Rev."},{"key":"ref_3","unstructured":"Nguyen, H.A., and Al-Mubaid, H. (2006, May 10\u201312). New ontology-based semantic similarity measure for the biomedical domain. Proceedings of the 2006 IEEE International Conference on Granular Computing, Atlanta, GA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating Semantic Similarity Measures across the Gene Ontology: The Relationship between Sequence and Annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"ref_5","unstructured":"Patwardhan, S. (2003). Incorporating Dictionary and Corpus Information into a Context Vector Measure of Semantic Relatedness. [Master\u2019s Thesis, University of Minnesota]."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gracia, J., and Mena, E. (2008). Web-Based Measure of Semantic Relatedness. Lecture Notes in Computer Science, Proceedings of the International Conference on Web Information Systems Engineering, Auckland, New Zealand, 1\u20134 September 2008, Springer.","DOI":"10.1007\/978-3-540-85481-4_12"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Dash, N.S., Bhattacharyya, P., and Pawar, J.D. (2017). The WordNet in Indian Languages, Springer.","DOI":"10.1007\/978-981-10-1909-8"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Speer, R., Chin, J., and Havasi, C. (2017, February 4\u20139). ConceptNet 5.5: An open multilingual graph of general knowledge. 
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11164"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Poli, R., Healy, M., and Kameas, A. (2010). Theory and Applications of Ontology: Computer Applications, Springer.","DOI":"10.1007\/978-90-481-8847-5"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1023\/B:BTTJ.0000047600.45421.6d","article-title":"ConceptNet\u2014A practical commonsense reasoning tool-kit","volume":"22","author":"Liu","year":"2004","journal-title":"BT Technol. J."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1109\/21.24528","article-title":"Development and Application of a Metric on Semantic Nets","volume":"19","author":"Rada","year":"1989","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_12","unstructured":"Richardson, R., Smeaton, A., and Murphy, J. (1994). Using WordNet as a Knowledge Base for Measuring Semantic Similarity between Words, School of Computer Applications, Dublin City University. Technical Report Working Paper CA-1294."},{"key":"ref_13","unstructured":"Hirst, G., and St-Onge, D. (1995). Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms, The MIT Press."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wu, Z., and Palmer, M. (1994, June 27\u201330). Verb Semantics and Lexical Selection. Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, USA.","DOI":"10.3115\/981732.981751"},{"key":"ref_15","unstructured":"Slimani, T., Yaghlane, B.B., and Mellouli, K. (2019, February 17). A New Similarity Measure Based on Edge Counting, Proceedings of the World Academy of Science, Engineering and Technology. 
Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.307.1229&rep=rep1&type=pdf."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1109\/TKDE.2003.1209005","article-title":"An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources","volume":"15","author":"Li","year":"2003","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_17","unstructured":"Leacock, C. (1994). Filling in a Sparse Training Space for Word Sense Identification. [Ph.D. Thesis, Macquarie University]."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1613\/jair.514","article-title":"Semantic Similarity in Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language","volume":"11","author":"Resnik","year":"1999","journal-title":"J. Artif. Intell. Res."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lin, D. (1993, June 22\u201326). Principle-based parsing without overgeneration. Proceedings of the 31st Annual Meeting on Association for Computational Linguistics (ACL \u201993), Columbus, OH, USA.","DOI":"10.3115\/981574.981590"},{"key":"ref_20","unstructured":"Jiang, J.J., and Conrath, D.W. (1997, January 20). Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, Taipei, Taiwan."},{"key":"ref_21","unstructured":"Mikolov, T., Yih, W.T., and Zweig, G. (2013, June 9\u201314). Linguistic Regularities in Continuous Space Word Representations. Proceedings of the NAACL-HLT 2013, Atlanta, GA, USA."},{"key":"ref_22","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013, December 5\u201310). Distributed representations of words and phrases and their compositionality. 
Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_23","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, May 2\u20134). Efficient estimation of word representations in vector space. Proceedings of the Workshop at International Conference on Learning Representations (ICLR), Scottsdale, AZ, USA."},{"key":"ref_24","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Collobert, R., and Weston, J. (2008, July 5\u20139). A unified architecture for natural language processing: Deep neural networks with multi-task learning. Proceedings of the 25th International Conference on Machine Learning (ICML \u201908), Helsinki, Finland.","DOI":"10.1145\/1390156.1390177"},{"key":"ref_26","unstructured":"Arefyev, N., Panchenko, A., Lukanin, A., Lesota, O., and Romanov, P. (2015). Evaluating Three Corpus-Based Semantic Similarity Systems for Russian. Computational Linguistics and Intellectual Technologies, Dialog 28, HSE Publishing House."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Wieting, J., Bansal, M., Gimpel, K., and Livescu, K. (2016, November 1\u20135). CHARAGRAM: Embedding words and sentences via character n-grams. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1157"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Neelakantan, A., Shankar, J., Passos, A., and McCallum, A. (2014, October 25\u201329). 
Efficient nonparametric estimation of multiple embeddings per word in vector space. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1113"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Faruqui, M., and Dyer, C. (2014, April 26\u201330). Improving vector space word representations using multilingual correlation. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden.","DOI":"10.3115\/v1\/E14-1049"},{"key":"ref_31","unstructured":"Conneau, A., Lample, G., Ranzato, M.A., Denoyer, L., and J\u00e9gou, H. (2018, April 30\u2013May 3). Word translation without parallel data. Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1037\/0033-295X.84.4.327","article-title":"Features of similarity","volume":"84","author":"Tversky","year":"1977","journal-title":"Psychol. Rev."},{"key":"ref_33","first-page":"233","article-title":"X-Similarity: Computing Semantic Similarity between Concepts from Different Ontologies","volume":"4","author":"Petrakis","year":"2006","journal-title":"J. Digit. Inf. Manag."},{"key":"ref_34","unstructured":"Sinha, M., Jana, A., Dasgupta, T., and Basu, A. (2012, December 15). New Semantic Lexicon and Similarity Measure in Bangla. Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), Mumbai, India."},{"key":"ref_35","first-page":"8","article-title":"Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure","volume":"95","author":"Sinha","year":"2014","journal-title":"Int. J. Comput. Appl."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, October 25\u201329). GloVe: Global Vectors for Word Representation. 
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_37","unstructured":"Miller, G. (1998). WordNet: An Electronic Lexical Database, MIT Press."},{"key":"ref_38","unstructured":"Dash, N.S. (2010, January 25). Corpus Linguistics: A General Introduction. Proceedings of the National Workshop on Corpus Normalization of the Linguistic Data Consortium for the Indian Languages (LDC-IL), Mysore, India."},{"key":"ref_39","first-page":"7","article-title":"Some Corpus Access Tools for Bangla Corpus","volume":"42","author":"Dash","year":"2016","journal-title":"Indian J. Appl. Linguist."},{"key":"ref_40","unstructured":"Parker, R., Graff, D., Kong, J., Chen, K., and Maeda, K. (2011). English Gigaword Fifth Edition, Linguistic Data Consortium. LDC2011T07, DVD."},{"key":"ref_41","unstructured":"Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python, O\u2019Reilly Media Inc."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Dash, N.S. (2015). 
A Descriptive Study of Bangla Words, Cambridge University Press.","DOI":"10.1017\/CBO9781107585706"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"159","DOI":"10.2307\/2529310","article-title":"The Measurement of Observer Agreement for Categorical Data","volume":"33","author":"Landis","year":"1977","journal-title":"Biometrics"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/6\/2\/19\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:49:15Z","timestamp":1760186955000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/6\/2\/19"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,5]]},"references-count":43,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2019,6]]}},"alternative-id":["informatics6020019"],"URL":"https:\/\/doi.org\/10.3390\/informatics6020019","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,5,5]]}}}