{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,5]],"date-time":"2026-04-05T20:37:44Z","timestamp":1775421464862,"version":"3.50.1"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>Tamil text segmentation is a long-standing test in language comprehension that entails separating a record into adjacent pieces based on its semantic design. Each segment is important in its own way. The segments are organised according to the purpose of the content examination as text groups, sentences, phrases, words, characters or any other data unit. That process has been portioned using rapid tangled neural organisation in this research, which presents content segmentation methods based on deep learning in natural language processing (NLP). This study proposes a bidirectional long short-term memory (Bi-LSTM) neural network prototype in which fast recurrent neural networks (FRNNs) are used to learn Tamil text group embedding and phrases are fragmented using text-oriented data. As a result, this prototype is capable of handling variable measured setting data and gives a vast new dataset for naturally segmenting text in Tamil. In addition, we develop a segmentation prototype and show how well it sums up to unnoticeable regular content using this dataset as a base. With Bi-LSTM, the segmentation precision of FRNN is superior to that of other segmentation approaches; however, it is still inferior to that of certain other techniques. Every content is scaled to the required size in the proposed framework, which is immediately accessible for the preparation. This means, each word in a scaled Tamil text is employed to prepare neural organisation as fragmented content. The results reveal that the proposed framework produces high rates of segmentation for manually authored material that are nearly equivalent to segmentation-based plans.<\/jats:p>","DOI":"10.1145\/3643808","type":"journal-article","created":{"date-parts":[[2024,2,7]],"date-time":"2024-02-07T11:58:52Z","timestamp":1707307132000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["<b>Fast Recurrent Neural Network with Bi-LSTM for Handwritten Tamil Text Segmentation in NLP<\/b>"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7929-2752","authenticated-orcid":false,"given":"Vinotheni","family":"C.","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Puducherry Technological University, Puducherry, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6019-7912","authenticated-orcid":false,"given":"S. Lakshmana","family":"Pandian","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Puducherry Technological University, Puducherry, India"}]}],"member":"320","published-online":{"date-parts":[[2024,5,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-45358-8"},{"issue":"2","key":"e_1_3_1_3_2","first-page":"179","article-title":"A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text","volume":"19","author":"Xiong Ying","year":"2019","unstructured":"Ying Xiong, Zhongmin Wang, Dehuan Jiang, Xiaolong Wang, Qingcai Chen, Hua Xu, Jun Yan, and Buzhou Tang. 2019. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text. BMC Medical Informatics and Decision-making 19, 2 (2019), 179\u2013184.","journal-title":"BMC Medical Informatics and Decision-making"},{"issue":"2","key":"e_1_3_1_4_2","first-page":"1","article-title":"Wasf-Vec: Topology-based word embedding for modern standard Arabic and Iraqi dialect ontology","volume":"19","author":"Abdulhameed Tiba Zaki","year":"2019","unstructured":"Tiba Zaki Abdulhameed, Imed Zitouni, and Ikhlas Abdel-Qader. 2019. Wasf-Vec: Topology-based word embedding for modern standard Arabic and Iraqi dialect ontology. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 19, 2 (2019), 1\u201327.","journal-title":"ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocx090"},{"key":"e_1_3_1_6_2","first-page":"1","article-title":"Text localization using standard deviation analysis of structure elements and support vector machines","author":"Zagoris Konstantinos","year":"2011","unstructured":"Konstantinos Zagoris, Savvas A. Chatzichristofis, and Nikos Papamarkos. 2011. Text localization using standard deviation analysis of structure elements and support vector machines. EURASIP Journal on Advances in Signal Processing (2011), 1\u201312.","journal-title":"EURASIP Journal on Advances in Signal Processing"},{"key":"e_1_3_1_7_2","first-page":"9","volume-title":"National Conference on Advanced Computing and Communications (NCACC\u201912)","author":"Kumar M. R.","year":"2012","unstructured":"M. R. Kumar, N. N. Shetty, and B. P. Pragathi. 2012. Tamil text line segmentation of handwritten documents using clustering method based on thresholding approach. In National Conference on Advanced Computing and Communications (NCACC\u201912). 9\u201312."},{"key":"e_1_3_1_8_2","first-page":"87","article-title":"Deep learning-based text segmentation in NLP using fast recurrent neural network with bi-LSTM","volume":"38","author":"Vinotheni C.","year":"2021","unstructured":"C. Vinotheni and S. Lakshmana Pandian. 2021. Deep learning-based text segmentation in NLP using fast recurrent neural network with bi-LSTM. Smart Intelligent Computing and Communication Technology 38 (2021), 87\u201393.","journal-title":"Smart Intelligent Computing and Communication Technology"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1134\/S1054661813010136"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2846095"},{"key":"e_1_3_1_11_2","volume-title":"Proceedings of the 1st Workshop on Speech and Language Technologies for Dravidian Languages","author":"Chakravarthi Bharathi Raja","year":"2021","unstructured":"Bharathi Raja Chakravarthi, Ruba Priyadharshini, Parameswari Krishnamurthy, and Elizabeth Sherly. 2021. Proceedings of the 1st Workshop on Speech and Language Technologies for Dravidian Languages. Association for Computational Linguistics."},{"key":"e_1_3_1_12_2","volume-title":"Proceedings of the 2nd Workshop on Speech and Language Technologies for Dravidian Languages","author":"Chakravarthi Bharathi Raja","year":"2022","unstructured":"Bharathi Raja Chakravarthi, Ruba Priyadharshini, Parameswari Krishnamurthy, Elizabeth Sherly, and Sinnathamby Mahesan. 2022. Proceedings of the 2nd Workshop on Speech and Language Technologies for Dravidian Languages. Association for Computational Linguistics."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503162.3503177"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cageo.2018.08.006"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-019-2617-8"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.4018\/IJTHI.2019070104"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.01.085"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46681-1_42"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cvi.2017.0468"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocx090"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cageo.2018.08.006"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2016.7841067"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2019.05.026"},{"key":"e_1_3_1_24_2","volume-title":"European Conference on Information Retrieval","author":"Badjatiya Pinkesh","year":"2018","unstructured":"Pinkesh Badjatiya et al. 2018. Attention-based neural Tamil text segmentation. In European Conference on Information Retrieval. Springer, Cham."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3375959.3375990"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-018-0304-3"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDARW.2019.50110"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-020-09624-9"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.17485\/IJST\/v14i7.2146"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.5220\/0010330206280634"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-ipr.2019.0208"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.112916"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-017-4745-3"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2020.100302"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3096823"},{"key":"e_1_3_1_36_2","unstructured":"HPLabs Isolated Handwritten Tamil Character Dataset. (June 2013). Retrieved from http:\/\/lipitk.sourceforge.net\/datasets\/tamilchardata.htm"},{"key":"e_1_3_1_37_2","volume-title":"International Conference on Speech and Language Technology and Oriental COCOSDA (ICSLT-COCOSDA\u201904)","author":"Agrawal Mudit","year":"2004","unstructured":"Mudit Agrawal, Ajay S. Bhaskarabhatla, and Sriganesh Madhvanath. 2004. Data collection for handwriting corpus creation in Indic scripts. In International Conference on Speech and Language Technology and Oriental COCOSDA (ICSLT-COCOSDA\u201904). Citeseer."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-017-4745-3"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-014-0222-y"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2013.08.009"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10044-009-0147-0"},{"issue":"8","key":"e_1_3_1_42_2","first-page":"0975","article-title":"A recognition of Tamil handwritten characters using Daubechies wavelet transforms and feed-forward backpropagation network","volume":"64","author":"Jose T. M.","year":"2013","unstructured":"T. M. Jose and A. Wahi. 2013. A recognition of Tamil handwritten characters using Daubechies wavelet transforms and feed-forward backpropagation network. International Journal of Computer Applications 64, 8 (2013), 0975\u20138887.","journal-title":"International Journal of Computer Applications"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4218-3_46"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2013.162"},{"key":"e_1_3_1_45_2","unstructured":"HPLabs Handwritten Tamil Word Dataset (2006). Retrieved from http:\/\/lipitk.sourceforge.net\/datasets\/tamilworddata.htm"},{"key":"e_1_3_1_46_2","first-page":"415","volume-title":"2010 12th International Conference on Frontiers in Handwriting Recognition","author":"Nethravathi B.","year":"2010","unstructured":"B. Nethravathi, C. P. Archana, K. Shashikiran, A. G. Ramakrishnan, and V. Kumar. 2010. Creation of a huge annotated database for Tamil and Kannada OHR. In 2010 12th International Conference on Frontiers in Handwriting Recognition. IEEE, 415\u2013420."},{"key":"e_1_3_1_47_2","unstructured":"Tamil Handwritten Documents Dataset in ResearchGate repository. Retrieved from https:\/\/www.researchgate.net\/publication\/362490821_Tamil_Handwritten_Documents_Dataset"},{"key":"e_1_3_1_48_2","first-page":"311","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing","author":"Zhang Longkai","year":"2013","unstructured":"Longkai Zhang, Houfeng Wang, Xu Sun, and Mairgup Mansur. 2013. Exploring representations from unlabeled data with cotraining for Chinese word segmentation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 311\u2013321."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643808","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3643808","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:57:46Z","timestamp":1750294666000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3643808"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,10]]},"references-count":47,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3643808"],"URL":"https:\/\/doi.org\/10.1145\/3643808","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,10]]},"assertion":[{"value":"2022-07-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-24","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-05-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}