{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:41:26Z","timestamp":1775745686247,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>Numerous natural language processing (NLP) applications exist today, especially for the most commonly spoken languages such as English, Chinese, and Spanish. Popular traditional methods such as rule-based methods, Naive Bayes classifiers, hidden Markov models, conditional random field-based classifiers, and other stochastic methods have contributed to this improvement in the past. Recently, deep learning has led to exciting breakthroughs in several areas of artificial intelligence, including image processing and natural language processing. It is important to label words as parts of speech to begin developing most NLP applications. A deep study in this area reveals that many popular approaches used for this purpose require massive training data. Therefore, these approaches have not been helpful for languages not rich in digital resources. Applying these methods with very little training data prompts the need for innovative problem-solving. This article describes our research, which examines the strengths and weaknesses of well-known approaches, such as conditional random fields and state-of-the-art deep learning models, when applied to part-of-speech tagging using minimal training data for Assamese and English. 
We also examine the factors affecting them. We discuss our deep learning architecture and the proposed activation function, which shows promise with little training data. The activation function categorizes words belonging to different classes with more confidence by using the outcomes of statistical methods with SM-Taylor SoftMax in our deep learning model. With minimal training, our deep learning architecture using the proposed modification of SM-Taylor SoftMax improves accuracy by up to 4% on our small dataset. This technique is a combination of SM-Taylor SoftMax and the statistical probability distribution of words over tags.<\/jats:p>","DOI":"10.1145\/3655023","type":"journal-article","created":{"date-parts":[[2024,3,30]],"date-time":"2024-03-30T09:24:01Z","timestamp":1711790641000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Part-of-speech Tagging for Low-resource Languages: Activation Function for Deep Learning Network to Work with Minimal Training Data"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1430-5142","authenticated-orcid":false,"given":"Diganta","family":"Baishya","sequence":"first","affiliation":[{"name":"Computer Science and Engineering, Assam Science and Technology University, Guwahati, India and Computer Science and Engineering, Jorhat Engineering College, Jorhat, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7236-5314","authenticated-orcid":false,"given":"Rupam","family":"Baruah","sequence":"additional","affiliation":[{"name":"Computer Science and Engineering, Assam Science and Technology University, Guwahati, India and Computer Science and Engineering, Jorhat Engineering College, Jorhat, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,5,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1002\/wics.195"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007673816718"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1023868723792"},{"key":"e_1_3_1_5_2","volume-title":"International Conference on Machine Learning","unstructured":"C\u00edcero Nogueira dos Santos and Bianca Zadrozny. 2014. Learning character-level representations for part-of-Speech Tagging. International Conference on Machine Learning."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-022-00561-y"},{"key":"e_1_3_1_7_2","unstructured":"Richard Socher Yoshua Bengio and Christopher D. Manning. 2012. Deep Learning for NLP (without Magic). In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts page 5 Jeju Island Korea. Association for Computational Linguistics."},{"key":"e_1_3_1_8_2","unstructured":"Tanya Dayanand. 2020. POS tagging using RNN. Towards Data Science. Retrieved from https:\/\/towardsdatascience.com\/pos-tagging-using-rnn-7f08a522f849"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSCEE.2018.8538416"},{"key":"e_1_3_1_10_2","volume-title":"Conference on Empirical Methods in Natural Language Processing","author":"Horsmann Tobias","year":"2017","unstructured":"Tobias Horsmann and Torsten Zesch. 2017. Do LSTMs really work so well for PoS tagging?\u2013A replication study. In Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","unstructured":"Barbara Plank Anders S\u00f8gaard and Yoav Goldberg. 2016. Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss. 
arXiv preprint arXiv:1604.05529 (2016).","DOI":"10.18653\/v1\/P16-2067"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCNT45670.2019.8944460"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2020.101235"},{"key":"e_1_3_1_14_2","volume-title":"In Conference on Empirical Methods in Natural Language Processing","author":"Zheng Xiaoqing","year":"2013","unstructured":"Xiaoqing Zheng, Hanyang Chen, and Tianyu Xu. 2013. Deep learning for Chinese word segmentation and POS tagging. In Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","unstructured":"D. Baishya and R. Baruah. 2021. Improving hidden markov model for very low resource languages: An analysis for Assamese parts of speech tagging. In 2021 11th International Conference on Cloud Computing Data Science & Engineering (Confluence). IEEE 142--146.","DOI":"10.1109\/Confluence51648.2021.9377146"},{"issue":"10","key":"e_1_3_1_16_2","article-title":"Highly efficient parts of speech tagging in low resource languages with improved hidden Markov model and deep learning","volume":"12","author":"Baishya Diganta","year":"2021","unstructured":"Diganta Baishya and Rupam Baruah. 2021. Highly efficient parts of speech tagging in low resource languages with improved hidden Markov model and deep learning. Int. J. Advanc. Comput. Sci. Applic. 12, 10 (2021).","journal-title":"Int. J. Advanc. Comput. Sci. Applic."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIMIA48430.2020.9074941"},{"key":"e_1_3_1_18_2","unstructured":"W. Nelson Francis and Henry Kucera. 2001. Brown Corpus Manual: Manual of Information to Accompany a Standard Corpus of Present-day Edited American English for Use with Digital Computers. Retrieved from http:\/\/icame.uib.no\/brown\/bcm.html"},{"key":"e_1_3_1_19_2","unstructured":"John Lafferty McCallum Andrew and Pereira Fernando. 2001. 
Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML 1 2 (2001) 3."},{"key":"e_1_3_1_20_2","unstructured":"Trevor A. Cohn. 2007. Scaling conditional random fields for natural language processing. PhD Thesis University of Melbourne Department of Computer Science and Software Engineering Faculty of Engineering."},{"key":"e_1_3_1_21_2","unstructured":"Charles A. Sutton. 2008. Efficient training methods for conditional random fields. Doctoral Dissertations Available from Proquest. AAI3315485. https:\/\/scholarworks.umass.edu\/dissertations\/AAI3315485"},{"key":"e_1_3_1_22_2","unstructured":"Zhiheng Huang Wei Xu and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2966303"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2897327"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2914168"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2022.3175201"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2020.101138"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2017.01.006"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10506-018-9225-1"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-020-09716-9"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42044-020-00063-1"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3380967"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3488381"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Kunal Banerjee Vishak C. Rishi Raj Gupta Kartik Vyas Anushree H. and Biswajit Mishra. Exploring alternatives to softmax function. 
ArXiv abs\/2011.11538 (2020).","DOI":"10.5220\/0010502000002996"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2852721"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2909737"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2018.2822810"},{"key":"e_1_3_1_38_2","unstructured":"Weiyang Liu Yandong Wen Zhiding Yu and Meng Yang. 2016. Large-Margin SoftMax Loss for Convolutional Neural Networks. ArXiv abs\/1612.02295 (2016)."},{"key":"e_1_3_1_39_2","article-title":"Efficient exact gradient update for training deep networks with very large sparse targets","volume":"28","author":"Vincent Pascal","year":"2015","unstructured":"Pascal Vincent, Alexandre De Br\u00e9bisson, and Xavier Bouthillier. 2015. Efficient exact gradient update for training deep networks with very large sparse targets. Adv. Neural Inf. Process. Syst. 28, (2015).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-70096-0_43"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078186"},{"key":"e_1_3_1_42_2","volume-title":"Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP","unstructured":"Katharina Kann, Johannes Bjerva, Isabelle Augenstein, Barbara Plank, and Anders S\u00f8gaard. 2018. Character-level supervision for low-resource POS tagging. In Proceedings of the Workshop on Deep Learning Approaches for Low-Resource NLP, Melbourne. Association for Computational Linguistics, 1--11."},{"key":"e_1_3_1_43_2","doi-asserted-by":"crossref","unstructured":"Lyan Verwimp Joris Pelemans Hugo Van hamme and Patrick Wambacq. 2017. Character-word LSTM language models. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1 Valencia Spain. 
Association for Computational Linguistics 417--427.","DOI":"10.18653\/v1\/E17-1040"},{"key":"e_1_3_1_44_2","first-page":"265","volume-title":"In 28th International Conference on Machine Learning (ICML\u201911)","author":"Le Quoc","year":"2011","unstructured":"Quoc Le, Jiquan Ngiam, Adam Coates, Ahbik Lahiri, Bobby Prochnow, and Andrew. Ng. 2011. On optimization methods for deep learning. In 28th International Conference on Machine Learning (ICML\u201911). 265\u2013272."},{"key":"e_1_3_1_45_2","article-title":"An exploration of SoftMax alternatives belonging to the spherical loss family","author":"Brebisson Alexandre de","unstructured":"Alexandre de Brebisson and Pascal Vincent. An exploration of SoftMax alternatives belonging to the spherical loss family. In International Conference on Learning Representations (ICLR\u201916).","journal-title":"International Conference on Learning Representations (ICLR\u201916)"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/AICCSA56895.2022.10017934"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/UKSim.2013.91"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.3115\/1667583.1667595"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3655023","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3655023","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:03:51Z","timestamp":1750291431000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3655023"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,10]]},"references-count":47,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3655023"],"URL":"https:\/\/doi.org\/10.1145\/3655023","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,10]]},"assertion":[{"value":"2023-01-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-03-24","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}