{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T10:59:14Z","timestamp":1778497154545,"version":"3.51.4"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T00:00:00Z","timestamp":1675382400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T00:00:00Z","timestamp":1675382400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SN COMPUT. SCI."],"DOI":"10.1007\/s42979-022-01541-y","type":"journal-article","created":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T15:03:54Z","timestamp":1675436634000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["TunBERT: Pretraining BERT for Tunisian Dialect Understanding"],"prefix":"10.1007","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3599-7229","authenticated-orcid":false,"given":"Hatem","family":"Haddad","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ahmed 
Cheikh","family":"Rouhou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abir","family":"Messaoudi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abir","family":"Korched","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chayma","family":"Fourati","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amel","family":"Sellami","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Moez","family":"Ben HajHmida","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Faten","family":"Ghriss","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,2,3]]},"reference":[{"key":"1541_CR1","unstructured":"Abdul-Mageed Muhammad, Zhang Chiyu, Bouamor Houda, Habash Nizar. NADI 2020: The first nuanced Arabic dialect identification shared task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 97\u2013110, Barcelona, Spain (Online), December 2020. Association for Computational Linguistics. URL https:\/\/aclanthology.org\/2020.wanlp-1.9."},{"key":"1541_CR2","doi-asserted-by":"publisher","unstructured":"Abdul-Mageed Muhammad, Elmadany AbdelRahim, Nagoudi El\u00a0Moatez\u00a0Billah. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7088\u20137105, Online, August 2021. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/2021.acl-long.551. 
URL https:\/\/aclanthology.org\/2021.acl-long.551.","DOI":"10.18653\/v1\/2021.acl-long.551"},{"key":"1541_CR3","unstructured":"Abu\u00a0Farha Ibrahim, Magdy Walid. Benchmarking transformer-based language models for Arabic sentiment and sarcasm detection. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 21\u201331, Kyiv, Ukraine (Virtual), April 2021. Association for Computational Linguistics. URL https:\/\/aclanthology.org\/2021.wanlp-1.3."},{"key":"1541_CR4","first-page":"5","volume-title":"Alothaim Abdulrahman","author":"Alqahtani Ghadah","year":"2022","unstructured":"Ghadah Alqahtani, Abdulrahman Alothaim. Emotion analysis of Arabic tweets: Language models and available resources. Frontiers in Artificial Intelligence; 2022. p. 5."},{"key":"1541_CR5","unstructured":"Antoun Wissam, Baly Fady, Hajj Hazem. AraBERT: Transformer-based model for Arabic language understanding. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, 2020;9\u201315."},{"key":"1541_CR6","unstructured":"Bahdanau Dzmitry, Cho Kyunghyun, Bengio Yoshua. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations (ICLR), 7\u20139 May 2015."},{"key":"1541_CR7","unstructured":"Baimukan Nurpeiis, Habash N, Bouamor H. Hierarchical aggregation of dialectal data for arabic dialect identification. In Proceedings of the Language Resources and Evaluation Conference (LREC), Marseille, France. The European Language Resources Association, 2022."},{"key":"1541_CR8","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"Bojanowski Piotr","year":"2017","unstructured":"Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov. Enriching word vectors with subword information. 
Transactions of the Association for Computational Linguistics. 2017;5:135\u201346. https:\/\/doi.org\/10.1162\/tacl_a_00051.","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"1541_CR9","unstructured":"Bouamor Houda, Habash Nizar, Salameh Mohammad, Zaghouani Wajdi, Rambow Owen, Abdulrahim Dana, Obeid Ossama, Khalifa Salam, Eryani Fadhl, Erdmann Alexander, Oflazer Kemal. The madar arabic dialect corpus and lexicon. In The International Conference on Language Resources and Evaluation, 2018."},{"key":"1541_CR10","doi-asserted-by":"crossref","unstructured":"Bouamor Houda, Hassan Sabit, Habash Nizar. The MADAR shared task on Arabic fine-grained dialect identification. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, 2019;199\u2013207.","DOI":"10.18653\/v1\/W19-4622"},{"key":"1541_CR11","unstructured":"Canete Jos\u00e9, Chaperon Gabriel, Fuentes Rodrigo, P\u00e9rez Jorge. Spanish pre-trained bert model and evaluation data. In Proceedings of the Practical ML for Developing Countries Workshop at The International Conference on Language Resources and Evaluation, 2020."},{"key":"1541_CR12","doi-asserted-by":"publisher","first-page":"832","DOI":"10.3390\/electronics8080832","volume":"8","author":"V Carvalho Diogo","year":"2019","unstructured":"Carvalho Diogo V, Pereira Eduardo M, Cardoso Jaime S. Machine learning interpretability: A survey on methods and metrics. Electronics. 2019;8:832.","journal-title":"Electronics"},{"key":"1541_CR13","doi-asserted-by":"crossref","unstructured":"Chen Danqi, Fisch A, Weston J, Bordes Antoine. Reading wikipedia to answer open-domain questions. ArXiv, abs\/1704.00051, 2017.","DOI":"10.18653\/v1\/P17-1171"},{"key":"1541_CR14","unstructured":"Conneau Alexis, Lample Guillaume. Cross-lingual language model pretraining. 
In Advances in Neural Information Processing Systems, 2019;7059\u20137069."},{"key":"1541_CR15","unstructured":"Liu Yinhan, Ott Myle, Goyal Naman, Du Jingfei, Joshi Mandar, Chen Danqi, Levy Omer, Lewis Mike, Zettlemoyer Luke, Stoyanov Veselin. RoBERTa: A robustly optimized BERT pretraining approach. Computing Research Repository, arXiv:1907.11692, 2019. URL https:\/\/arxiv.org\/abs\/1907.11692. version 1."},{"key":"1541_CR16","doi-asserted-by":"crossref","unstructured":"Delobelle Pieter, Winters Thomas, Berendt Bettina. RobBERT: a Dutch RoBERTa-based language model. Computing Research Repository, arXiv:2001.06286, 2020. URL https:\/\/arxiv.org\/abs\/2001.06286. version 2.","DOI":"10.18653\/v1\/2020.findings-emnlp.292"},{"key":"1541_CR17","first-page":"125","volume":"11","author":"Delobelle Pieter","year":"2022","unstructured":"Pieter Delobelle, Thomas Winters, Bettina Berendt. Robbertje: A distilled dutch bert model. Computational Linguistics in the Netherlands Journal. 2022;11:125\u201340.","journal-title":"Computational Linguistics in the Netherlands Journal"},{"key":"1541_CR18","unstructured":"Devlin Jacob, Chang Ming-Wei, Lee Kenton, Toutanova Kristina. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019;4171\u20134186."},{"key":"1541_CR19","unstructured":"El-Haj Mahmoud, Rayson Paul, Aboelezz Mariam. Arabic dialect identification in the context of bivalency and code-switching. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2018;3622\u20133627."},{"key":"1541_CR20","unstructured":"Fourati Chayma, Messaoudi Abir, Haddad Hatem. Tunizi: a tunisian arabizi sentiment analysis dataset. In AfricaNLP Workshop, Putting Africa on the NLP Map. 
ICLR 2020, Virtual Event, volume arXiv:3091079, 2020. URL https:\/\/arxiv.org\/submit\/3091079."},{"key":"1541_CR21","unstructured":"Harrat Salima, Meftouh Karima, Sma\u00efli Kamel. Maghrebi arabic dialect processing: an overview. Journal of International Science and General Applications, 1, 2018."},{"key":"1541_CR22","doi-asserted-by":"publisher","unstructured":"Horesh Uri. Languages of the Middle East and North Africa. The SAGE encyclopedia of human communication sciences and disorders, 2019;1:1058\u20131061. https:\/\/doi.org\/10.4135\/9781483380810.n349.","DOI":"10.4135\/9781483380810.n349"},{"key":"1541_CR23","doi-asserted-by":"publisher","unstructured":"Howard Jeremy, Ruder Sebastian. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 328\u2013339, Melbourne, Australia, July 2018. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/P18-1031. URL https:\/\/www.aclweb.org\/anthology\/P18-1031.","DOI":"10.18653\/v1\/P18-1031"},{"key":"1541_CR24","unstructured":"Lan Zhenzhong, Chen Mingda, Goodman Sebastian, Gimpel Kevin, Sharma Piyush, Soricut Radu. ALBERT: A lite BERT for self-supervised learning of language representations. In Proceedings of the 8th International Conference on Learning Representations (ICLR), 2020."},{"key":"1541_CR25","unstructured":"Le Hang, Vial Lo\u00efc, Frej Jibril, Segonne Vincent, Coavoux Maximin, Lecouteux Benjamin, Allauzen Alexandre, Crabb\u00e9 Benoit, Besacier Laurent, Schwab Didier. FlauBERT: Unsupervised language model pre-training for French. 
In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), 2020;2479\u20132490."},{"key":"1541_CR26","doi-asserted-by":"crossref","unstructured":"Martin Louis, Muller Benjamin, Ortiz\u00a0Su\u00e1rez Pedro\u00a0Javier, Dupont Yoann, Romary Laurent, de\u00a0la Clergerie \u00c9ric, Seddah Djam\u00e9, Sagot Beno\u00eet. CamemBERT: a tasty French language model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020;7203\u20137219.","DOI":"10.18653\/v1\/2020.acl-main.645"},{"key":"1541_CR27","doi-asserted-by":"crossref","unstructured":"Medhaffar Salima, Bougares Fethi, Est\u00e8ve Yannick, Hadrich-Belguith Lamia. Sentiment analysis of Tunisian dialects: Linguistic ressources and experiments. In Proceedings of the Third Arabic Natural Language Processing Workshop, 2017;55\u201361.","DOI":"10.18653\/v1\/W17-1307"},{"key":"1541_CR28","doi-asserted-by":"crossref","unstructured":"Messaoudi Abir, Cheikhrouhou Ahmed, Haddad Hatem, Ferchichi Nourchene, BenHajhmida Moez, Korched Abir, Naski Malek, Ghriss Faten, Kerkeni Amine. Tunbert: Pretrained contextualized text representation for tunisian dialect. In Akram Bennour, Tolga Ensari, Yousri Kessentini, and Sean Eom, editors, Intelligent Systems and Pattern Recognition, pages 278\u2013290, Cham, 2022. Springer International Publishing. ISBN 978-3-031-08277-1.","DOI":"10.1007\/978-3-031-08277-1_23"},{"key":"1541_CR29","unstructured":"Mikolov Tomas, Chen Kai, Corrado Greg, Dean Jeffrey. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, Workshop Track Proceedings, 2013."},{"key":"1541_CR30","doi-asserted-by":"publisher","unstructured":"Mozannar Hussein, Maamary Elie, El\u00a0Hajal Karl, Hajj Hazem. Neural Arabic question answering. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 108\u2013118, Florence, Italy, August 2019. 
Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/W19-4612. URL https:\/\/www.aclweb.org\/anthology\/W19-4612.","DOI":"10.18653\/v1\/W19-4612"},{"key":"1541_CR31","doi-asserted-by":"crossref","unstructured":"Mulki Hala, Haddad Hatem, Ali Chedi\u00a0Bechikh, Babao\u011flu Ismail. Tunisian dialect sentiment analysis: a natural language processing-based approach. Computaci\u00f3n y Sistemas, 2018a;22(4):1223\u20131232.","DOI":"10.13053\/cys-22-4-3009"},{"issue":"3","key":"1541_CR32","first-page":"15","volume":"58","author":"Mulki Hala","year":"2018","unstructured":"Hala Mulki, Hatem Haddad, Ismail Babao\u011flu. Modern trends in arabic sentiment analysis: A survey. Traitement Automatique des Langues. 2018;58(3):15\u201339.","journal-title":"Traitement Automatique des Langues"},{"key":"1541_CR33","doi-asserted-by":"publisher","unstructured":"Mulki Hala, Haddad Hatem, Gridach Mourad, Babao\u011flu Ismail. Syntax-ignorant n-gram embeddings for dialectal arabic sentiment analysis. Natural Language Engineering, 2020;1\u201324. https:\/\/doi.org\/10.1017\/S135132492000008X.","DOI":"10.1017\/S135132492000008X"},{"key":"1541_CR34","doi-asserted-by":"crossref","unstructured":"Naseem Usman, Razzak Imran, Khan Shah\u00a0Khalid, Prasad Mukesh. A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models. Transactions on Asian and Low-Resource Language Information Processing, 2021;20(5):1\u201335.","DOI":"10.1145\/3434237"},{"key":"1541_CR35","doi-asserted-by":"crossref","unstructured":"Pennington Jeffrey, Socher Richard, Manning Christopher. GloVe: Global vectors for word representation. 
In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014;1532\u20131543.","DOI":"10.3115\/v1\/D14-1162"},{"key":"1541_CR36","doi-asserted-by":"publisher","unstructured":"Peters Matthew, Neumann Mark, Iyyer Mohit, Gardner Matt, Clark Christopher, Lee Kenton, Zettlemoyer Luke. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227\u20132237, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/N18-1202. URL https:\/\/www.aclweb.org\/anthology\/N18-1202.","DOI":"10.18653\/v1\/N18-1202"},{"key":"1541_CR37","doi-asserted-by":"publisher","unstructured":"Pires Telmo, Schlinger Eva, Garrette Dan. How multilingual is multilingual BERT? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4996\u20135001, Florence, Italy, July 2019. Association for Computational Linguistics. https:\/\/doi.org\/10.18653\/v1\/P19-1493. URL https:\/\/www.aclweb.org\/anthology\/P19-1493.","DOI":"10.18653\/v1\/P19-1493"},{"key":"1541_CR38","unstructured":"Sayadi Karim, Liwicki Marcus, Ingold Rolf, Bui Marc. Tunisian dialect and modern standard arabic dataset for sentiment analysis: Tunisian election context. In Proceedings of The Second International Conference on Arabic Computational Linguistics, ACLING, 2016;35\u201353."},{"key":"1541_CR39","unstructured":"Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan\u00a0N, Kaiser \u0141ukasz, Polosukhin Illia. Attention is all you need. In I.\u00a0Guyon, U.\u00a0V. Luxburg, S.\u00a0Bengio, H.\u00a0Wallach, R.\u00a0Fergus, S.\u00a0Vishwanathan, and R.\u00a0Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998\u20136008. Curran Associates, Inc., 2017. 
URL http:\/\/papers.nips.cc\/paper\/7181-attention-is-all-you-need.pdf."},{"key":"1541_CR40","unstructured":"Virtanen Antti, Kanerva Jenna, Ilo Rami, Luomaa Jouni, Luotolahti Juhani, Salakoski Tapio, Ginter Filip, Pyysalo Sampo. Multilingual is not enough: BERT for Finnish. Computing Research Repository, arXiv:1912.07076, 2019. URL https:\/\/arxiv.org\/abs\/1912.07076. version 1."},{"key":"1541_CR41","unstructured":"Lan Wuwei, Chen Yang, Xu Wei, Ritter Alan. GigaBERT: Zero-shot transfer learning from English to Arabic. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020."},{"key":"1541_CR42","unstructured":"Zaidan Omar\u00a0F, Callison-Burch Chris. The Arabic online commentary dataset: an annotated dataset of informal Arabic with high dialectal content. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011;37\u201341."},{"key":"1541_CR43","unstructured":"Zhang Susan, Roller Stephen, Goyal Naman, Artetxe Mikel, Chen Moya, Chen Shuohui, Dewan Christopher, Diab Mona, Li Xian, Lin Xi\u00a0Victoria et\u00a0al. OPT: Open pre-trained transformer language models. 
arXiv preprint arXiv:2205.01068, 2022."}],"container-title":["SN Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-022-01541-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42979-022-01541-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-022-01541-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,3]],"date-time":"2023-02-03T15:32:54Z","timestamp":1675438374000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42979-022-01541-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,3]]},"references-count":43,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["1541"],"URL":"https:\/\/doi.org\/10.1007\/s42979-022-01541-y","relation":{},"ISSN":["2661-8907"],"issn-type":[{"value":"2661-8907","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,3]]},"assertion":[{"value":"21 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 February 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article 
does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}}],"article-number":"194"}}