{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T23:42:53Z","timestamp":1769557373178,"version":"3.49.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,8,16]],"date-time":"2019-08-16T00:00:00Z","timestamp":1565913600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,1,31]]},"abstract":"<jats:p>Sentiment analysis is an important sub-task of Natural Language Processing that aims to determine the polarity of a review. Most of the work done on sentiment analysis is for the resource-rich languages of the world, but very limited work has been done on resource-poor languages. In this work, we focus on developing a Sentiment Analysis System for Roman Urdu, which is a resource-poor language. To this end, a dataset of 11,000 reviews has been gathered from six different domains. Comprehensive annotation guidelines were defined and the dataset was annotated using the multi-annotator methodology. Using the annotated dataset, state-of-the-art algorithms were used to build a sentiment analysis system. To improve the results of these algorithms, four different studies were carried out based on: word-level features, character level features, and feature union. The best results showed that we could reduce the error rate by 12% from the baseline (80.07%). Also, to see if the improvements are statistically significant, we applied t-test and Confidence Interval on the obtained results and found that the best results of each study are statistically significant from the baseline.<\/jats:p>","DOI":"10.1145\/3329709","type":"journal-article","created":{"date-parts":[[2019,8,16]],"date-time":"2019-08-16T19:35:19Z","timestamp":1565984119000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":42,"title":["Sentiment Analysis for a Resource Poor Language\u2014Roman Urdu"],"prefix":"10.1145","volume":"19","author":[{"given":"Khawar","family":"Mehmood","sequence":"first","affiliation":[{"name":"University of New South Wales Australia, Campbell, Australia"}]},{"given":"Daryl","family":"Essam","sequence":"additional","affiliation":[{"name":"University of New South Wales Australia, Campbell, Australia"}]},{"given":"Kamran","family":"Shafi","sequence":"additional","affiliation":[{"name":"University of New South Wales Australia, Campbell, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1392-8866","authenticated-orcid":false,"given":"Muhammad Kamran","family":"Malik","sequence":"additional","affiliation":[{"name":"University of the Punjab, The Mall, Lahore, Pakistan"}]}],"member":"320","published-online":{"date-parts":[[2019,8,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"2014 5th International Conference on Information and Communication Systems (ICICS). IEEE, 1--6.","author":"Duwairi R. M.","unstructured":"R. M. Duwairi , R. Marji , N. Sha'ban , and S. Rushaidat . 2014. Sentiment analysis in Arabic tweets . In 2014 5th International Conference on Information and Communication Systems (ICICS). IEEE, 1--6. R. M. Duwairi, R. Marji, N. Sha'ban, and S. Rushaidat. 2014. Sentiment analysis in Arabic tweets. In 2014 5th International Conference on Information and Communication Systems (ICICS). IEEE, 1--6."},{"key":"e_1_2_1_2_1","first-page":"409","article-title":"Urdu-English code switching: The use of Urdu phrases and clauses in Pakistani English (A non-native variety)","volume":"3","author":"Anwar B.","year":"2009","unstructured":"B. Anwar . 2009 . Urdu-English code switching: The use of Urdu phrases and clauses in Pakistani English (A non-native variety) . Int. J. Lang. Stud. 3 , 4 (2009), 409 -- 424 . B. Anwar. 2009. Urdu-English code switching: The use of Urdu phrases and clauses in Pakistani English (A non-native variety). Int. J. Lang. Stud. 3, 4 (2009), 409--424.","journal-title":"Int. J. Lang. Stud."},{"key":"e_1_2_1_3_1","unstructured":"A. Pak and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In LREc (Vol. 10 No. 2010) 1320--1326.  A. Pak and P. Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In LREc (Vol. 10 No. 2010) 1320--1326."},{"key":"e_1_2_1_4_1","volume-title":"Ethnologue: Languages of the World","author":"Simons Gary F.","year":"2017","unstructured":"Gary F. Simons and Charles D. Fennig ( Eds .) . 2017 . Ethnologue: Languages of the World , 20 th edition. Dallas, Texas : SIL International. Retrieved from http:\/\/www.ethnologue.com. Gary F. Simons and Charles D. Fennig (Eds.). 2017. Ethnologue: Languages of the World, 20th edition. Dallas, Texas: SIL International. Retrieved from http:\/\/www.ethnologue.com.","edition":"20"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2436256.2436274"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118704"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asej.2014.04.011"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1361684.1361685"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/WI.2007.50"},{"key":"e_1_2_1_10_1","volume-title":"Science and Information Conference. Springer, Cham, 29--42","author":"Mehmood K.","unstructured":"K. Mehmood , D. Essam , and K. Shafi . 2018. Sentiment analysis system for Roman Urdu . In Science and Information Conference. Springer, Cham, 29--42 . K. Mehmood, D. Essam, and K. Shafi. 2018. Sentiment analysis system for Roman Urdu. In Science and Information Conference. Springer, Cham, 29--42."},{"key":"e_1_2_1_11_1","unstructured":"R. Socher D. Chen C. D. Manning and A. Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems. 926--934.   R. Socher D. Chen C. D. Manning and A. Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems. 926--934."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1672983.1673010"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2015.2444846"},{"key":"e_1_2_1_14_1","first-page":"199","article-title":"Detection of sentiment polarity of unstructured multi-language text from social media","volume":"9","author":"Ahmed S.","year":"2018","unstructured":"S. Ahmed , S. Hina , and R. Asif . 2018 . Detection of sentiment polarity of unstructured multi-language text from social media . Int. J. Adv. Comput. Sci. Appl. 9 , 7 (2018), 199 -- 203 . S. Ahmed, S. Hina, and R. Asif. 2018. Detection of sentiment polarity of unstructured multi-language text from social media. Int. J. Adv. Comput. Sci. Appl. 9, 7 (2018), 199--203.","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"e_1_2_1_15_1","unstructured":"M. Daud R. Khan and A. Daud. 2015. Roman Urdu opinion mining system (RUOMiS). arXiv preprint arXiv:1501.01386.  M. Daud R. Khan and A. Daud. 2015. Roman Urdu opinion mining system (RUOMiS). arXiv preprint arXiv:1501.01386."},{"key":"e_1_2_1_16_1","volume-title":"Mexican International Conference on Artificial Intelligence. Springer","author":"Syed A. Z.","unstructured":"A. Z. Syed , M. Aslam , and A. M. Martinez-Enriquez . 2010. Lexicon based sentiment analysis of Urdu text using SentiUnits . In Mexican International Conference on Artificial Intelligence. Springer , Berlin, 32--43. A. Z. Syed, M. Aslam, and A. M. Martinez-Enriquez. 2010. Lexicon based sentiment analysis of Urdu text using SentiUnits. In Mexican International Conference on Artificial Intelligence. Springer, Berlin, 32--43."},{"key":"e_1_2_1_17_1","unstructured":"N. Mukhtar M. A. Khan and N. Chiragh. 2017. Effective use of evaluation measures for the validation of best classifier in Urdu sentiment analysis. Cognitive Computation (2017) 1--11.  N. Mukhtar M. A. Khan and N. Chiragh. 2017. Effective use of evaluation measures for the validation of best classifier in Urdu sentiment analysis. Cognitive Computation (2017) 1--11."},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"N. Mukhtar and M. A. Khan. 2018. Urdu sentiment analysis using supervised machine learning approach. Int. J. Pattern Recognit. Artif. Intell. (2018) 32.  N. Mukhtar and M. A. Khan. 2018. Urdu sentiment analysis using supervised machine learning approach. Int. J. Pattern Recognit. Artif. Intell. (2018) 32.","DOI":"10.1142\/S0218001418510011"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the Second Workshop on Language in Social Media. ACL, 1--8","author":"Mukund S.","unstructured":"S. Mukund and R. K. Srihari . 2012. Analyzing Urdu social media for sentiments using transfer learning with controlled translations . In Proceedings of the Second Workshop on Language in Social Media. ACL, 1--8 S. Mukund and R. K. Srihari. 2012. Analyzing Urdu social media for sentiments using transfer learning with controlled translations. In Proceedings of the Second Workshop on Language in Social Media. ACL, 1--8"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1177\/001316446002000104"},{"key":"e_1_2_1_21_1","first-page":"3","article-title":"Supervised machine learning: A review of classification techniques","volume":"160","author":"Kotsiantis S. B.","year":"2007","unstructured":"S. B. Kotsiantis , I. Zaharakis , and P. Pintelas . 2007 . Supervised machine learning: A review of classification techniques . Emerging Artificial Intelligence Applications in Computer Engineering 160 (2007), 3 -- 24 . S. B. Kotsiantis, I. Zaharakis, and P. Pintelas. 2007. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering 160 (2007), 3--24.","journal-title":"Emerging Artificial Intelligence Applications in Computer Engineering"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"T. Hastie R. Tibshirani and J. Friedman. 2009. Overview of supervised learning. In The Elements of Statistical Learning. Springer New York 9--41.  T. Hastie R. Tibshirani and J. Friedman. 2009. Overview of supervised learning. In The Elements of Statistical Learning. Springer New York 9--41.","DOI":"10.1007\/978-0-387-84858-7_2"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.80230"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/130385.130401"},{"key":"e_1_2_1_25_1","first-page":"576","article-title":"Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error","volume":"2001","author":"Zenobi G.","year":"2001","unstructured":"G. Zenobi and P. Cunningham . 2001 . Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error . Machine Learning: ECML 2001 , 576 -- 587 . G. Zenobi and P. Cunningham. 2001. Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error. Machine Learning: ECML 2001, 576--587.","journal-title":"Machine Learning: ECML"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. ACL, 1046--1056","author":"Yessenalina A.","unstructured":"A. Yessenalina , Y. Yue , and C. Cardie . 2010. Multi-level structured models for document-level sentiment classification . In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. ACL, 1046--1056 . A. Yessenalina, Y. Yue, and C. Cardie. 2010. Multi-level structured models for document-level sentiment classification. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. ACL, 1046--1056."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.asej.2014.04.011"},{"key":"e_1_2_1_28_1","article-title":"Scikit-learn: Machine learning in Python","author":"Pedregosa F.","year":"2011","unstructured":"F. Pedregosa , G. Varoquaux , A. Gramfort , V. Michel , B. Thirion , O. Grisel , and J. Vanderplas . 2011 . Scikit-learn: Machine learning in Python . J. Mach. Learn. Res. 12 , ( Oct. 2011), 2825--2830. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, and J. Vanderplas. 2011. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, (Oct. 2011), 2825--2830.","journal-title":"J. Mach. Learn. Res. 12"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.02.088"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.991427"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2013.30"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of 23rd Pacific Asia Conference on Language, Information, and Computation.","volume":"2","author":"Oouchida K.","unstructured":"K. Oouchida , J. D. Kim , T. Takagi , and J. I. Tsujii . 2009. GuideLink: A corpus annotation system that integrates the management of annotation guidelines . In Proceedings of 23rd Pacific Asia Conference on Language, Information, and Computation. Vol. 2 . K. Oouchida, J. D. Kim, T. Takagi, and J. I. Tsujii. 2009. GuideLink: A corpus annotation system that integrates the management of annotation guidelines. In Proceedings of 23rd Pacific Asia Conference on Language, Information, and Computation. Vol. 2."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2015.11.003"},{"key":"e_1_2_1_35_1","first-page":"2267","article-title":"Recurrent convolutional neural networks for text classification","volume":"333","author":"Lai S.","year":"2015","unstructured":"S. Lai , L. Xu , K. Liu , and J. Zhao . 2015 . Recurrent convolutional neural networks for text classification . In AAAI , Vol. 333. 2267 -- 2273 . S. Lai, L. Xu, K. Liu, and J. Zhao. 2015. Recurrent convolutional neural networks for text classification. In AAAI, Vol. 333. 2267--2273.","journal-title":"AAAI"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00049"},{"key":"e_1_2_1_37_1","volume-title":"2014 5th International Conference on Information and Communication Systems (ICICS), IEEE. 1--6.","author":"Duwairi R. M.","unstructured":"R. M. Duwairi , R. Marji , N. Sha'ban , and S. Rushaidat . 2014. Sentiment analysis in Arabic tweets . In 2014 5th International Conference on Information and Communication Systems (ICICS), IEEE. 1--6. R. M. Duwairi, R. Marji, N. Sha'ban, and S. Rushaidat. 2014. Sentiment analysis in Arabic tweets. In 2014 5th International Conference on Information and Communication Systems (ICICS), IEEE. 1--6."},{"key":"e_1_2_1_38_1","first-page":"26","article-title":"Approaches, tools, and applications for sentiment analysis implementation","volume":"125","author":"Alessia D.","year":"2015","unstructured":"D. Alessia , F. Ferri , P. Grifoni , and T. Guzzo . 2015 . Approaches, tools, and applications for sentiment analysis implementation . Int. J. Comput. Appl. 125 , 3 (2015), 26 -- 33 . D. Alessia, F. Ferri, P. Grifoni, and T. Guzzo. 2015. Approaches, tools, and applications for sentiment analysis implementation. Int. J. Comput. Appl. 125, 3 (2015), 26--33.","journal-title":"Int. J. Comput. Appl."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3129290"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-0429"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001409007326"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of 2013 Conference on Empirical Methods in Natural Language Processing.","author":"Socher R.","unstructured":"R. Socher , A. Perelygin , J. Wu , J. Chuang , C. D. Manning , A. Ng , and C. Potts . 2013. Recursive deep models for semantic compositionality over a sentiment treebank . In Proceedings of 2013 Conference on Empirical Methods in Natural Language Processing. R. Socher, A. Perelygin, J. Wu, J. Chuang, C. D. Manning, A. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of 2013 Conference on Empirical Methods in Natural Language Processing."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the Joint BioLink and 9th Bio-ontologies Meeting. 89--92","author":"Lu Z.","unstructured":"Z. Lu , M. Bada , P. V. Ogren , K. B. Cohen , and L. Hunter . 2006. Improving biomedical corpus annotation guidelines . In Proceedings of the Joint BioLink and 9th Bio-ontologies Meeting. 89--92 . Z. Lu, M. Bada, P. V. Ogren, K. B. Cohen, and L. Hunter. 2006. Improving biomedical corpus annotation guidelines. In Proceedings of the Joint BioLink and 9th Bio-ontologies Meeting. 89--92."},{"key":"e_1_2_1_44_1","first-page":"141","article-title":"Performing natural language processing on roman urdu datasets","volume":"18","author":"Sharf Z.","year":"2018","unstructured":"Z. Sharf and S. U. Rahman . 2018 . Performing natural language processing on roman urdu datasets . Int. J. Comput. Sci. Network Secur. 18 , 1 (2018), 141 -- 148 . Z. Sharf and S. U. Rahman. 2018. Performing natural language processing on roman urdu datasets. Int. J. Comput. Sci. Network Secur. 18, 1 (2018), 141--148.","journal-title":"Int. J. Comput. Sci. Network Secur."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2015.06.015"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3329709","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3329709","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:18Z","timestamp":1750204458000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3329709"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,16]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,31]]}},"alternative-id":["10.1145\/3329709"],"URL":"https:\/\/doi.org\/10.1145\/3329709","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,8,16]]},"assertion":[{"value":"2018-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}