{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T19:49:06Z","timestamp":1759693746589,"version":"3.41.0"},"reference-count":87,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,4,11]],"date-time":"2020-04-11T00:00:00Z","timestamp":1586563200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,7,31]]},"abstract":"<jats:p>User-generated text in social media communication (SMC) is mainly characterized by its non-standard form. It may contain code-switching (CS) text, a widespread phenomenon in SMC, in addition to noisy elements, especially in written conversations (abbreviations, symbols, emoticons) and misspelled words. All of these factors are obstacles for text mining applications. Common text mining tools are designed for the standard use of standard languages and cannot deal with other forms, especially text written in social media. To overcome these problems, in this work we present our solution for normalizing the non-standard use of standard and non-standard languages (dialects) in SMC text using existing resources and tools. The main processing in our solution normalizes CS text from multiple languages into one language using a machine-translation-like approach. This processing relies on a linguistic approach to CS, which aims to automatically identify the translation source and target languages (without human intervention). The remaining processing operations concern the normalization of SMC-specific expressions and the spelling correction of out-of-vocabulary words. 
To preserve the meaning of code-switched sentences across translation, we adopt a knowledge-based approach to word sense translation disambiguation, reinforced with a multilingual vertical context. All of these processes are embedded in what we refer to as the machine normalization system. Our solution can be used as a front end for text mining processing, enabling the analysis of noisy SMC text. The conducted experiments show that our system outperforms the considered baselines.<\/jats:p>","DOI":"10.1145\/3378414","type":"journal-article","created":{"date-parts":[[2020,4,11]],"date-time":"2020-04-11T22:49:22Z","timestamp":1586645362000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Machine Normalization"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3353-6704","authenticated-orcid":false,"given":"Randa","family":"Zarnoufi","sequence":"first","affiliation":[{"name":"Mohammed V University"}]},{"given":"Hamid","family":"Jaafar","sequence":"additional","affiliation":[{"name":"Cadi Ayyad University"}]},{"given":"Mounia","family":"Abik","sequence":"additional","affiliation":[{"name":"Mohammed V University"}]}],"member":"320","published-online":{"date-parts":[[2020,4,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00164"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1609067.1609070"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2016.05.001"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-1010"},{"volume-title":"Proceedings of the 6th Workshop on Syntax, Semantics, and Structure in Statistical Translation (SSST-6\u201912)","year":"2012","author":"Apidianaki Marianna","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","first-page":"15","volume-title":"Proceedings of the ACL 2015 Workshop on Noisy User-Generated Text. 
19--27","author":"Saloot Mohammad Arshi","year":"2015"},{"volume-title":"Retrieved","year":"2017","author":"Baldwin Timothy","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45715-1_11"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1630659.1630775"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3902"},{"volume-title":"Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers (COLING\u201914)","year":"2014","author":"Basile Pierpaolo","key":"e_1_2_1_11_1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00245"},{"volume-title":"The Syntax of Codeswitching: Analysing Moroccan Arabic\/Dutch Conversations","author":"Boumans Louis Patrick","key":"e_1_2_1_13_1"},{"volume-title":"Mercer","year":"1993","author":"Brown Peter F.","key":"e_1_2_1_14_1"},{"volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL\u201907)","year":"2007","author":"Carpuat Marine","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-5801"},{"volume-title":"Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics. 33--40","year":"2007","author":"Chan Ys","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-004-7692-5"},{"volume-title":"EAMT Conference Proceedings. 
79--86","year":"2005","author":"Corb\u00ed-Bellot Antonio M.","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2738045"},{"volume-title":"Fonollosa","year":"2015","author":"Costa-Juss\u00e0 Marta R.","key":"e_1_2_1_21_1"},{"volume-title":"Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra\u201914)","year":"2014","author":"Crego Josep Maria","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/363958.363994"},{"key":"e_1_2_1_24_1","first-page":"41","article-title":"Code-mixing in social media text: The last language identification frontier","volume":"54","author":"Das Amitava","year":"2013","journal-title":"Traitement Automatique des Langues"},{"volume-title":"Proceedings of the 1st Workshop on Linguistic Resources for Natural Language Processing. 131--140","year":"2018","author":"Dhar Mrinal","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5594\/J11060"},{"volume-title":"Proceedings of COLING 2012: Posters. 287--296","year":"2012","author":"Elfardy Heba","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Atefeh Farzindar, Diana Inkpen, and Graeme Hirst (Eds.). 2017. Natural Language Processing for Social Media (2nd ed.). Morgan & Claypool.","DOI":"10.2200\/S00809ED2V01Y201710HLT038"},{"volume-title":"WordNet: An electronic lexical database","author":"Fellbaum C.","key":"e_1_2_1_29_1"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324902002978"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-011-9090-0"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034732"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075527.1075579"},{"volume-title":"Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912)","year":"2012","author":"Goldhahn Dirk","key":"e_1_2_1_34_1"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1082"},{"volume-title":"Hamers and Michel Blanc","year":"1983","author":"Josiane","key":"e_1_2_1_36_1"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/2390948.2391000"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.2307\/410058"},{"volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 690--696","author":"Heafield Kenneth","key":"e_1_2_1_39_1"},{"volume-title":"Unsupervised creation of normalization dictionaries for micro-blogs in Arabic, French and English. Computacion y Sistemas 22, 3","year":"2018","author":"Htait Amal","key":"e_1_2_1_40_1"},{"volume-title":"Machine Translation: Past, Present, Future. Ellis Horwood","year":"1986","author":"Hutchins W. John","key":"e_1_2_1_41_1"},{"key":"e_1_2_1_42_1","first-page":"1","article-title":"Introduction to the special issue on word sense disambiguation: The state of the art","volume":"24","author":"Ide Nancy","year":"1998","journal-title":"Computational Linguistics"},{"volume-title":"Proceedings of the 10th Research on Computational Linguistics International Conference. 
19--33","author":"Jay","key":"e_1_2_1_44_1"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00065"},{"volume-title":"Processing of sentences with intrasentential code switching","author":"Joshi Aravind K.","key":"e_1_2_1_46_1"},{"volume-title":"Proceedings of the International Conference on Natural Language Processing. 1--7.","author":"Kaufmann Max","key":"e_1_2_1_47_1"},{"volume-title":"Proceedings of the 2nd Conference on Language Resources and Evaluation. 1239--1244","year":"2000","author":"Kilgarriff Adam","key":"e_1_2_1_48_1"},{"volume-title":"Proceedings of the Machine Translation Summit. 79--86","year":"2005","author":"Koehn Philipp","key":"e_1_2_1_49_1"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.3115\/1557769.1557821"},{"volume-title":"WordNet: An Electronic Lexical Database","author":"Leacock Claudia","key":"e_1_2_1_51_1"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118699"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/318723.318728"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00240"},{"volume-title":"Proceedings of the 15th International Conference on Machine Learning (ICML\u201998)","year":"1998","author":"Lin Dekang","key":"e_1_2_1_55_1"},{"volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 176--186","year":"2013","author":"Ling Wang","key":"e_1_2_1_56_1"},{"volume-title":"Proceedings of the IberSPEECH 2012 Workshop. 112--122","year":"2012","author":"Lude\u00f1a Veronica Lopez","key":"e_1_2_1_57_1"},{"volume-title":"Proceedings of the 5th Workshop on NLP for Similar Languages, Varieties, and Dialects. 
18--28","year":"2018","author":"Lusetti Massimo","key":"e_1_2_1_58_1"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23138-4_6"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.4.553"},{"key":"e_1_2_1_61_1","first-page":"94","article-title":"Language identification: A solved problem suitable for undergraduate instruction","volume":"20","author":"McNamee Paul","year":"2005","journal-title":"Journal of Computing Sciences in Colleges"},{"volume-title":"Proceedings of the 8th Conference on Computational Natural Language Learning (CoNLL\u201904)","year":"2004","author":"Mihalcea Rada","key":"e_1_2_1_62_1"},{"volume-title":"One Speaker","author":"Muysken Pieter","key":"e_1_2_1_63_1"},{"volume-title":"One Speaker","author":"Myers-Scotton Carol","key":"e_1_2_1_64_1"},{"volume-title":"Duelling Languages: Grammatical Structure in Codeswitching","year":"1997","author":"Myers-Scotton Carol","key":"e_1_2_1_65_1"},{"volume-title":"Two Languages: Bilingual Language Processing","year":"2001","author":"Myers-Scotton Carol","key":"e_1_2_1_66_1"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/1459352.1459355"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2012.07.001"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1515\/ling.1980.18.7-8.581"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.5555\/1625855.1625914"},{"volume-title":"Proceedings of the SaLTMiL Workshop on Free\/Open-Source Language Resources for the Machine Translation of Less-Resourced Languages (LREC\u201914)","year":"2014","author":"Rudnick Alex","key":"e_1_2_1_71_1"},{"volume-title":"Proceedings of the 13th Conference on Natural Language Processing (KONVENS\u201916)","year":"2016","author":"Scherrer Yves","key":"e_1_2_1_72_1"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-0604"},{"volume-title":"Proceedings of the 10th Machine Translation 
Summit. 149--156","author":"R. Mahesh","key":"e_1_2_1_74_1"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.5555\/1613715.1613852"},{"volume-title":"Proceedings of the 16th International Conference of the European Association for Machine Translation. 213--220","author":"Tyers Francis M.","key":"e_1_2_1_76_1"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.2478\/v10108-010-0015-5"},{"volume-title":"Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC\u201904)","year":"2004","author":"Vasilescu Florentina","key":"e_1_2_1_78_1"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220672"},{"volume-title":"Proceedings of the 9th International Conference on Language Resources and Evaluation. 188--199","year":"2014","author":"Voss Clare","key":"e_1_2_1_80_1"},{"key":"e_1_2_1_81_1","doi-asserted-by":"crossref","unstructured":"Li Wang, Masao Fuketa, Kazuhiro Morita, and Jun-Ichi Aoe. 2011. Context constraint disambiguation of word semantics by field association schemes. Information Processing & Management 47, 4 (2011), 560--574. DOI:https:\/\/doi.org\/10.1016\/j.ipm.2011.01.001","DOI":"10.1016\/j.ipm.2011.01.001"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324998001946"},{"volume-title":"Proceedings of the 4th Workshop on NLP for Similar Languages, Varieties, and Dialects. 
1--6.","author":"Williams Jennifer","key":"e_1_2_1_83_1"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075671.1075731"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.3115\/981658.981684"},{"volume-title":"Proceedings of the SocialNLP workshop at IJCAI 2016","year":"2016","author":"Samih Younes","key":"e_1_2_1_86_1"},{"volume-title":"Information Systems and Technologies to Support Learning. Smart Innovation, Systems and Technologies","author":"Zarnoufi Randa","key":"e_1_2_1_87_1"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2016.02.001"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3378414","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3378414","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:45:04Z","timestamp":1750203904000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3378414"}},"subtitle":["Bringing Social Media Text from Non-Standard to Standard Form"],"short-title":[],"issued":{"date-parts":[[2020,4,11]]},"references-count":87,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,7,31]]}},"alternative-id":["10.1145\/3378414"],"URL":"https:\/\/doi.org\/10.1145\/3378414","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2020,4,11]]},"assertion":[{"value":"2019-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-04-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}