{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,16]],"date-time":"2025-10-16T06:59:27Z","timestamp":1760597967740,"version":"3.41.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2020,6,18]],"date-time":"2020-06-18T00:00:00Z","timestamp":1592438400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"DST for the INSPIRE fellowship"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"<jats:p>Technology has evolved remarkably, leading to an exponential increase in the availability of digital text documents from disparate domains on the Internet. This makes information retrieval a time- and resource-consuming task. A system that categorizes such documents by domain can therefore help users obtain the required information with relative ease and also reduce the workload of search engines. This article presents a text categorization system (CESS) that categorizes text documents using newly proposed hybrid features that combine term frequency-inverse document frequency-inverse class frequency and modified chi-square methods. Experiments were performed on real-world Bangla documents from eight domains comprising 2,429,857 tokens, and the highest accuracy, 99.91%, was obtained with multilayer perceptron-based classification. 
To demonstrate the language-independent nature of the system, it was also tested on the Reuters-21578 and 20 Newsgroups datasets, obtaining accuracies of 97.29% and 94.67%, respectively.<\/jats:p>","DOI":"10.1145\/3398070","type":"journal-article","created":{"date-parts":[[2020,6,18]],"date-time":"2020-06-18T21:15:20Z","timestamp":1592514920000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["CESS-A System to Categorize Bangla Web Text Documents"],"prefix":"10.1145","volume":"19","author":[{"given":"Ankita","family":"Dhar","sequence":"first","affiliation":[{"name":"West Bengal State University, Barasat, Kolkata, West Bengal, India"}]},{"given":"Himadri","family":"Mukherjee","sequence":"additional","affiliation":[{"name":"West Bengal State University, Barasat, Kolkata, West Bengal, India"}]},{"given":"Niladri Sekhar","family":"Dash","sequence":"additional","affiliation":[{"name":"Indian Statistical Institute, Baranagar, Kolkata, West Bengal, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3360-7576","authenticated-orcid":false,"given":"Kaushik","family":"Roy","sequence":"additional","affiliation":[{"name":"West Bengal State University, Barasat, Kolkata, West Bengal, India"}]}],"member":"320","published-online":{"date-parts":[[2020,6,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812809"},{"volume-title":"Proceedings of the International Conference on Bangla Speech and Language Processing. 1--5.","year":"2018","author":"Alam Md Tanvir","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1838002.1838025"},{"key":"e_1_2_1_4_1","first-page":"343","article-title":"A novel approach on Tamil text classification using C-feature","volume":"02","author":"Aruna Devi K.","year":"2014","journal-title":"Int. J. Sci. Res. 
Dev."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s42452-019-1165-1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2017.10.007"},{"volume-title":"Rumelhart","year":"2013","author":"Chauvin Yves","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2010.08.100"},{"volume-title":"Proceedings of the International Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2475--2485","author":"Choi B. J.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248547.1248548"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-7566-7_6"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-13-1343-1_39"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/IoT-SIU.2018.8519866"},{"key":"e_1_2_1_14_1","first-page":"95","article-title":"Named entity recognition and transliteration in Bengali","volume":"30","author":"Ekbal Asif","year":"2007","journal-title":"Ling. Invest."},{"key":"e_1_2_1_15_1","unstructured":"Ethnologue. 2019. 
Retrieved from https:\/\/www.ethnologue.com\/language\/ben."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2018.03.003"},{"volume-title":"Proceedings of the COLING Workshop 2010 Workshop on South and Southeast Asian Natural Language Processing (WSSANLP\u201912)","year":"2012","author":"Gupta Nidhi","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"volume-title":"Neural Networks for Perception","author":"Hecht-Nielsen Robert","key":"e_1_2_1_19_1"},{"volume-title":"Proceedings of the International Conference on Engineering Research, Innovation and Education (ICERIE\u201917)","year":"2017","author":"Saiful Islam Md.","key":"e_1_2_1_20_1"},{"volume-title":"Proceedings of the International Conference on Electrical, Communication and Computer Engineering (ICECCE\u201917)","year":"2017","author":"Saiful Islam Md.","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-016-2401-x"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1080\/03772063.2015.1021385"},{"volume-title":"Proceedings of the International Conference on Cognitive Computing and Information Processing (CCIP\u201915)","author":"Kabir F.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2019.07.003"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2348283.2348453"},{"volume-title":"Proceedings of the International Conference on Information System and Data Mining (ICISDM\u201918)","author":"Kowsari K.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2017.12.014"},{"volume-title":"Proceedings of the National Information Technology Conference (NITC\u201917)","author":"Lakmali K. B. 
M.","key":"e_1_2_1_29_1"},{"key":"e_1_2_1_30_1","first-page":"1","article-title":"Incorporating multi-level user preference into document-level sentiment classification","volume":"18","author":"Li Junjie","year":"2018","journal-title":"ACM Trans. Asian Low-Resour. Lang. Inf. Process."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2010.04.004"},{"volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 3158--3167","author":"Mahabal A.","key":"e_1_2_1_32_1"},{"volume-title":"Proceedings of the International Conference on Advances in Social Networks Analysis and Mining (ASONAM\u201915)","author":"Malliaros F. D.","key":"e_1_2_1_33_1"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276473"},{"volume-title":"Distributed Computing and Artificial Intelligence","author":"Parvin Hamid","key":"e_1_2_1_35_1"},{"volume-title":"Proceedings of the International Conference on Energy Systems and Applications. IEEE, 689--694","author":"Patil J. J.","key":"e_1_2_1_36_1"},{"key":"e_1_2_1_37_1","first-page":"11","article-title":"Comparison of Marathi text classifiers","volume":"04","author":"Patil Meera","year":"2014","journal-title":"ACEEE Int. J. Inf. Technol."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324914000023"},{"volume-title":"Proceedings of the International Conference on Computational Collective Intelligence (ICCCI\u201916)","author":"Puri Shalini","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJKEDM.2015.074071"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2009.02.010"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.17485\/ijst\/2017\/v10i5\/103233"},{"volume-title":"Proceedings of the International Global Wordnet Conference. 
324--329","author":"Sarmah J.","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","unstructured":"Stopwords. 2019. Retrieved from https:\/\/www.isical.ac.in."},{"key":"e_1_2_1_45_1","first-page":"71","article-title":"Arabic text categorization using logistic regression","volume":"06","author":"Al-Tahrawi Mayy M.","year":"2015","journal-title":"Int. J. Intell. Syst. Appl."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2018.03.003"},{"volume-title":"Proceedings of the International Conference on Contemporary Computing. IEEE, 542--546","author":"Thakur S. K.","key":"e_1_2_1_47_1"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.proeng.2014.03.129"},{"volume-title":"Proceedings of the International Conference on Advanced Communications Technology (ICACT\u201914)","author":"Vispute Sushma R.","key":"e_1_2_1_49_1"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2018.04.024"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2012.02.068"},{"key":"e_1_2_1_52_1","unstructured":"wikipedia. 2019. 
Retrieved from https:\/\/en.wikipedia.org\/wiki\/Languages_used_on_the_Internet."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2016.10.003"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3398070","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3398070","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:53Z","timestamp":1750199933000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3398070"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,18]]},"references-count":53,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3398070"],"URL":"https:\/\/doi.org\/10.1145\/3398070","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2020,6,18]]},"assertion":[{"value":"2019-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-06-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}