{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T21:50:44Z","timestamp":1774129844718,"version":"3.50.1"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2024,12,21]],"date-time":"2024-12-21T00:00:00Z","timestamp":1734739200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2024,12,21]],"date-time":"2024-12-21T00:00:00Z","timestamp":1734739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","award":["311275\/2020-6"],"award-info":[{"award-number":["311275\/2020-6"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","award":["315750\/2021-9"],"award-info":[{"award-number":["315750\/2021-9"]}],"id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004586","name":"Funda\u00e7\u00e3o Carlos Chagas Filho de Amparo \u00e0 Pesquisa do Estado do Rio de Janeiro","doi-asserted-by":"publisher","award":["SEI-260003\/000614\/2023"],"award-info":[{"award-number":["SEI-260003\/000614\/2023"]}],"id":[{"id":"10.13039\/501100004586","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004586","name":"Funda\u00e7\u00e3o Carlos Chagas Filho de Amparo \u00e0 Pesquisa do Estado do Rio de Janeiro","doi-asserted-by":"publisher","award":["E-26\/201.139\/2022"],"award-info":[{"award-number":["E-26\/201.139\/2022"]}],"id":[{"id":"10.13039\/501100004586","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput &amp; Applic"],"published-print":{"date-parts":[[2025,2]]},"DOI":"10.1007\/s00521-024-10711-3","type":"journal-article","created":{"date-parts":[[2024,12,21]],"date-time":"2024-12-21T09:10:03Z","timestamp":1734772203000},"page":"4363-4385","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["BERTweet.BR: a pre-trained language model for tweets in Portuguese"],"prefix":"10.1007","volume":"37","author":[{"given":"Fernando","family":"Carneiro","sequence":"first","affiliation":[]},{"given":"Daniela","family":"Vianna","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0983-2308","authenticated-orcid":false,"given":"Jonnathan","family":"Carvalho","sequence":"additional","affiliation":[]},{"given":"Alexandre","family":"Plastino","sequence":"additional","affiliation":[]},{"given":"Aline","family":"Paes","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,12,21]]},"reference":[{"key":"10711_CR1","unstructured":"Abdelali A, Hassan S, Mubarak H, et\u00a0al (2021) Pre-training bert on arabic tweets: Practical considerations. 
arXiv preprint arXiv:2102.10684"},{"key":"10711_CR2","doi-asserted-by":"publisher","unstructured":"Barbieri F, Camacho-Collados J, Espinosa\u00a0Anke L, et\u00a0al (2020) TweetEval: Unified benchmark and comparative evaluation for tweet classification. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, pp 1644\u20131650, https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.148, https:\/\/aclanthology.org\/2020.findings-emnlp.148","DOI":"10.18653\/v1\/2020.findings-emnlp.148"},{"key":"10711_CR3","unstructured":"Barbieri F, Espinosa-Anke L, Camacho-Collados J (2022) XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond. In: Proceedings of LREC"},{"key":"10711_CR4","doi-asserted-by":"publisher","unstructured":"Beltagy I, Lo K, Cohan A (2019) SciBERT: A pretrained language model for scientific text. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Association for Computational Linguistics, Hong Kong, China, pp 3615\u20133620, https:\/\/doi.org\/10.18653\/v1\/D19-1371","DOI":"10.18653\/v1\/D19-1371"},{"key":"10711_CR5","doi-asserted-by":"crossref","unstructured":"Bird S (2006) NLTK: the Natural Language Toolkit. In: Proceedings of the COLING\/ACL 2006 Interactive Presentation Sessions, pp 69\u201372","DOI":"10.3115\/1225403.1225421"},{"key":"10711_CR6","unstructured":"Brown T, Mann B, Ryder N, et\u00a0al (2020) Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, et\u00a0al (eds) Advances in Neural Information Processing Systems, vol\u00a033. Curran Associates, Inc., pp 1877\u20131901, https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf"},{"key":"10711_CR7","unstructured":"Brum H, das Gra\u00e7as Volpe\u00a0Nunes M (2018) Building a Sentiment Corpus of Tweets in Brazilian Portuguese. In: Calzolari N (Conference chair), Choukri K, Cieri C, et\u00a0al (eds) Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan"},{"key":"10711_CR8","unstructured":"Ca\u00f1ete J, Chaperon G, Fuentes R, et\u00a0al (2020) Spanish pre-trained BERT model and evaluation data. In: PML4DC at ICLR 2020"},{"key":"10711_CR9","doi-asserted-by":"publisher","unstructured":"Chalkidis I, Fergadiotis M, Malakasiotis P, et\u00a0al (2020) LEGAL-BERT: The muppets straight out of law school. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, pp 2898\u20132904, https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.261, https:\/\/aclanthology.org\/2020.findings-emnlp.261","DOI":"10.18653\/v1\/2020.findings-emnlp.261"},{"key":"10711_CR10","doi-asserted-by":"publisher","unstructured":"Chan B, Schweter S, M\u00f6ller T (2020) German\u2019s next language model. In: Proceedings of the 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Barcelona, Spain (Online), pp 6788\u20136796, https:\/\/doi.org\/10.18653\/v1\/2020.coling-main.598, https:\/\/aclanthology.org\/2020.coling-main.598","DOI":"10.18653\/v1\/2020.coling-main.598"},{"key":"10711_CR11","doi-asserted-by":"publisher","unstructured":"Conneau A, Khandelwal K, Goyal N, et\u00a0al (2020) Unsupervised cross-lingual representation learning at scale. 
In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, pp 8440\u20138451, https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.747, https:\/\/aclanthology.org\/2020.acl-main.747","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"10711_CR12","unstructured":"Data Reportal (2021) Digital 2021: Local country headlines. https:\/\/datareportal.com\/reports\/digital-2021-local-country-headlines, accessed: 2021-10-30"},{"key":"10711_CR13","doi-asserted-by":"publisher","unstructured":"Devlin J, Chang M, Lee K, et\u00a0al (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers). Association for Computational Linguistics, pp 4171\u20134186, https:\/\/doi.org\/10.18653\/v1\/n19-1423","DOI":"10.18653\/v1\/n19-1423"},{"key":"10711_CR14","unstructured":"Eberhard DM, Simons GF, Fennig CD (2023) Ethnologue: Languages of the World, twenty-sixth edn. SIL International, Dallas, Texas, http:\/\/www.ethnologue.com"},{"key":"10711_CR15","doi-asserted-by":"publisher","unstructured":"\u00c1ngel Gonz\u00e1lez J, Hurtado LF, Pla F (2020) TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter. Neurocomputing. https:\/\/doi.org\/10.1016\/j.neucom.2020.09.078, http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0925231220316180","DOI":"10.1016\/j.neucom.2020.09.078"},{"key":"10711_CR16","doi-asserted-by":"publisher","unstructured":"Guo Y, Rennard V, Xypolopoulos C, et\u00a0al (2021) BERTweetFR: Domain adaptation of pre-trained language models for French tweets. In: Xu W, Ritter A, Baldwin T, et\u00a0al (eds) Proceedings of the Seventh Workshop on Noisy User-generated Text, W-NUT 2021, Online, November 11, 2021. Association for Computational Linguistics, pp 445\u2013450, https:\/\/doi.org\/10.18653\/v1\/2021.wnut-1.49","DOI":"10.18653\/v1\/2021.wnut-1.49"},{"key":"10711_CR17","doi-asserted-by":"publisher","unstructured":"Gururangan S, Marasovic A, Swayamdipta S, et\u00a0al (2020) Don\u2019t stop pretraining: Adapt language models to domains and tasks. In: Jurafsky D, Chai J, Schluter N, et\u00a0al (eds) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. Association for Computational Linguistics, pp 8342\u20138360, https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.740","DOI":"10.18653\/v1\/2020.acl-main.740"},{"key":"10711_CR18","unstructured":"Hong L, Convertino G, Chi EH (2011) Language matters in Twitter: A large scale study. In: Adamic LA, Baeza-Yates R, Counts S (eds) Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011. The AAAI Press, http:\/\/www.aaai.org\/ocs\/index.php\/ICWSM\/ICWSM11\/paper\/view\/2856"},{"key":"10711_CR19","doi-asserted-by":"publisher","unstructured":"Howard J, Ruder S (2018) Universal language model fine-tuning for text classification. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 
Association for Computational Linguistics, Melbourne, Australia, pp 328\u2013339, https:\/\/doi.org\/10.18653\/v1\/P18-1031, https:\/\/aclanthology.org\/P18-1031","DOI":"10.18653\/v1\/P18-1031"},{"key":"10711_CR20","doi-asserted-by":"crossref","unstructured":"Huertas-Tato J, Martin A, Camacho D (2022) BERTuit: Understanding Spanish language in Twitter through a native transformer. arXiv preprint arXiv:2204.03465","DOI":"10.1111\/exsy.13404"},{"key":"10711_CR21","unstructured":"Internet World Stats (2020) Internet world users by language. https:\/\/www.internetworldstats.com\/stats7.htm, accessed: 2021-04-07"},{"key":"10711_CR22","doi-asserted-by":"crossref","unstructured":"Koto F, Rahimi A, Lau JH, et\u00a0al (2020) IndoLEM and IndoBERT: A benchmark dataset and pre-trained language model for Indonesian NLP. In: Proceedings of the 28th COLING","DOI":"10.18653\/v1\/2020.coling-main.66"},{"key":"10711_CR23","doi-asserted-by":"publisher","unstructured":"Koto F, Lau JH, Baldwin T (2021) IndoBERTweet: A pretrained language model for Indonesian Twitter with effective domain-specific vocabulary initialization. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp 10660\u201310668, https:\/\/doi.org\/10.18653\/v1\/2021.emnlp-main.833, https:\/\/aclanthology.org\/2021.emnlp-main.833","DOI":"10.18653\/v1\/2021.emnlp-main.833"},{"key":"10711_CR24","unstructured":"Lan Z, Chen M, Goodman S, et\u00a0al (2020) ALBERT: A lite BERT for self-supervised learning of language representations. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, https:\/\/openreview.net\/forum?id=H1eA7AEtvS"},{"key":"10711_CR25","unstructured":"Le H, Vial L, Frej J, et\u00a0al (2020) FlauBERT: Unsupervised language model pre-training for French. In: Proceedings of The 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, pp 2479\u20132490, https:\/\/www.aclweb.org\/anthology\/2020.lrec-1.302"},{"issue":"4","key":"10711_CR26","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","volume":"36","author":"J Lee","year":"2020","unstructured":"Lee J, Yoon W, Kim S et al (2020) BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4):1234\u20131240","journal-title":"Bioinformatics"},{"key":"10711_CR27","doi-asserted-by":"publisher","unstructured":"Liu B (2020) Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, 2nd edn. Studies in Natural Language Processing, Cambridge University Press, https:\/\/doi.org\/10.1017\/9781108639286","DOI":"10.1017\/9781108639286"},{"key":"10711_CR28","unstructured":"Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: International Conference on Learning Representations, https:\/\/openreview.net\/forum?id=Bkg6RiCqY7"},{"key":"10711_CR29","doi-asserted-by":"crossref","unstructured":"Martin L, Muller B, Ortiz\u00a0Su\u00e1rez PJ, et\u00a0al (2020) CamemBERT: a tasty French language model. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","DOI":"10.18653\/v1\/2020.acl-main.645"},{"key":"10711_CR30","doi-asserted-by":"crossref","unstructured":"Martins RF, Pereira A, Benevenuto F (2015) An approach to sentiment analysis of web applications in Portuguese. 
In: Proceedings of the 21st Brazilian Symposium on Multimedia and the Web, pp 105\u2013112","DOI":"10.1145\/2820426.2820446"},{"key":"10711_CR31","unstructured":"Mikolov T, Grave E, Bojanowski P, et\u00a0al (2018) Advances in pre-training distributed word representations. In: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)"},{"key":"10711_CR32","doi-asserted-by":"crossref","unstructured":"Moraes SM, Santos AL, Redecker M, et\u00a0al (2016) Comparing approaches to subjectivity classification: A study on Portuguese tweets. In: Computational Processing of the Portuguese Language: 12th International Conference, PROPOR 2016, Tomar, Portugal, July 13-15, 2016, Proceedings 12, Springer, pp 86\u201394","DOI":"10.1007\/978-3-319-41552-9_8"},{"key":"10711_CR33","first-page":"1037","volume":"2020","author":"DQ Nguyen","year":"2020","unstructured":"Nguyen DQ, Nguyen AT (2020) PhoBERT: Pre-trained language models for Vietnamese. Find Assoc Comput Linguist: EMNLP 2020:1037\u20131042","journal-title":"Find Assoc Comput Linguist: EMNLP"},{"key":"10711_CR34","doi-asserted-by":"crossref","unstructured":"Nguyen DQ, Vu T, Nguyen AT (2020) BERTweet: A pre-trained language model for English Tweets. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp 9\u201314","DOI":"10.18653\/v1\/2020.emnlp-demos.2"},{"key":"10711_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/08839514.2019.1673037","volume":"34","author":"AE de Oliveira Carosia","year":"2020","unstructured":"de Oliveira Carosia AE, Coelho GP, da Silva AEA (2020) Analyzing the Brazilian financial market through Portuguese sentiment analysis in social media. Appl Artif Intell 34:1\u201319","journal-title":"Appl Artif Intell"},{"key":"10711_CR36","unstructured":"OpenAI (2023) GPT-4 technical report. arXiv preprint arXiv:2303.08774"},{"key":"10711_CR37","unstructured":"Paszke A, Gross S, Massa F, et\u00a0al (2019) PyTorch: An imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, et\u00a0al (eds) Advances in Neural Information Processing Systems 32. Curran Associates, Inc., pp 8024\u20138035, http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf"},{"key":"10711_CR38","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: Machine learning in Python. J Machine Learn Res 12:2825\u20132830","journal-title":"J Machine Learn Res"},{"key":"10711_CR39","doi-asserted-by":"publisher","unstructured":"Peters ME, Neumann M, Iyyer M, et\u00a0al (2018) Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, New Orleans, Louisiana, pp 2227\u20132237, https:\/\/doi.org\/10.18653\/v1\/N18-1202, https:\/\/aclanthology.org\/N18-1202","DOI":"10.18653\/v1\/N18-1202"},{"key":"10711_CR40","unstructured":"Polignano M, Basile P, De\u00a0Gemmis M, et\u00a0al (2019) AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets. 
In: 6th Italian Conference on Computational Linguistics, CLiC-it 2019, CEUR, pp 1\u20136"},{"key":"10711_CR41","unstructured":"Radford A, Narasimhan K, Salimans T, et\u00a0al (2018) Improving language understanding by generative pre-training"},{"key":"10711_CR42","doi-asserted-by":"crossref","unstructured":"Ruder S (2019) Neural transfer learning for natural language processing. PhD thesis, NUI Galway","DOI":"10.18653\/v1\/N19-5004"},{"key":"10711_CR43","unstructured":"Sanh V, Debut L, Chaumond J, et\u00a0al (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108"},{"key":"10711_CR44","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s13278-021-00813-4","volume":"11","author":"JS dos Santos","year":"2021","unstructured":"dos Santos JS, Bernardini FC, Paes A (2021) A survey on the use of data and opinion mining in social media to political electoral outcomes prediction. Social Network Analysis and Mining 11:1\u201339","journal-title":"Social Network Analysis and Mining"},{"key":"10711_CR45","doi-asserted-by":"publisher","unstructured":"Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Berlin, Germany, pp 1715\u20131725, https:\/\/doi.org\/10.18653\/v1\/P16-1162, https:\/\/aclanthology.org\/P16-1162","DOI":"10.18653\/v1\/P16-1162"},{"key":"10711_CR46","doi-asserted-by":"crossref","unstructured":"Souza F, Nogueira R, Lotufo R (2020) BERTimbau: pretrained BERT models for Brazilian Portuguese. In: 9th Brazilian Conference on Intelligent Systems, BRACIS, Rio Grande do Sul, Brazil, October 20-23","DOI":"10.1007\/978-3-030-61377-8_28"},{"issue":"2","key":"10711_CR47","doi-asserted-by":"publisher","first-page":"79","DOI":"10.5752\/P.2316-9451.2017v5n2p79","volume":"5","author":"KF de Souza","year":"2017","unstructured":"de Souza KF, Pereira MHR, Dalip DH (2017) Unilex: M\u00e9todo l\u00e9xico para an\u00e1lise de sentimentos textuais sobre conte\u00fado de tweets em portugu\u00eas brasileiro. Abak\u00f3s 5(2):79\u201396","journal-title":"Abak\u00f3s"},{"key":"10711_CR48","unstructured":"Statista (2021) Leading countries based on number of Twitter users as of July 2021. https:\/\/www.statista.com\/statistics\/242606\/number-of-active-twitter-users-in-selected-countries, accessed: 2021-10-30"},{"key":"10711_CR49","unstructured":"Touvron H, Lavril T, Izacard G, et\u00a0al (2023) LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971"},{"issue":"1","key":"10711_CR50","first-page":"3221","volume":"15","author":"L Van Der Maaten","year":"2014","unstructured":"Van Der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221\u20133245","journal-title":"J Mach Learn Res"},{"key":"10711_CR51","unstructured":"Vaswani A, Shazeer N, Parmar N, et\u00a0al (2017) Attention is all you need. In: Guyon I, von Luxburg U, Bengio S, et\u00a0al (eds) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp 5998\u20136008, https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"10711_CR52","unstructured":"Wagner\u00a0Filho JA, Wilkens R, Idiart M, et\u00a0al (2018) The brWaC corpus: A new open resource for Brazilian Portuguese. 
In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan, https:\/\/aclanthology.org\/L18-1686"},{"key":"10711_CR53","doi-asserted-by":"publisher","unstructured":"Wang A, Singh A, Michael J, et\u00a0al (2018) GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. Association for Computational Linguistics, Brussels, Belgium, pp 353\u2013355, https:\/\/doi.org\/10.18653\/v1\/W18-5446, https:\/\/aclanthology.org\/W18-5446","DOI":"10.18653\/v1\/W18-5446"},{"key":"10711_CR54","unstructured":"Wang A, Pruksachatkun Y, Nangia N, et\u00a0al (2019) SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In: Wallach H, Larochelle H, Beygelzimer A, et\u00a0al (eds) Advances in Neural Information Processing Systems, vol\u00a032. Curran Associates, Inc., https:\/\/proceedings.neurips.cc\/paper\/2019\/file\/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf"},{"key":"10711_CR55","doi-asserted-by":"crossref","unstructured":"Wolf T, Debut L, Sanh V, et\u00a0al (2020) Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, pp 38\u201345, https:\/\/www.aclweb.org\/anthology\/2020.emnlp-demos.6","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"10711_CR56","unstructured":"BigScience Workshop, Le\u00a0Scao T, et\u00a0al (2023) BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100"},{"key":"10711_CR57","doi-asserted-by":"publisher","unstructured":"Zhu Y, Kiros R, Zemel R, et\u00a0al (2015) Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In: 2015 IEEE International Conference on Computer Vision (ICCV). IEEE Computer Society, Los Alamitos, CA, USA, pp 19\u201327, https:\/\/doi.org\/10.1109\/ICCV.2015.11, https:\/\/doi.ieeecomputersociety.org\/10.1109\/ICCV.2015.11","DOI":"10.1109\/ICCV.2015.11"},{"key":"10711_CR58","unstructured":"Zhuang L, Wayne L, Ya S, et\u00a0al (2021) A robustly optimized BERT pre-training approach with post-training. In: Proceedings of the 20th Chinese National Conference on Computational Linguistics. 
Chinese Information Processing Society of China, Huhhot, China, pp 1218\u20131227, https:\/\/aclanthology.org\/2021.ccl-1.108"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10711-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-024-10711-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-024-10711-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,8]],"date-time":"2025-02-08T19:34:50Z","timestamp":1739043290000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-024-10711-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,21]]},"references-count":58,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["10711"],"URL":"https:\/\/doi.org\/10.1007\/s00521-024-10711-3","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,21]]},"assertion":[{"value":"30 January 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 September 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 December 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflict of interest to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"All datasets considered in this manuscript were gathered from previous work that made them publicly available. Although we have not directly collected any tweets, we are aware that using data collected from the Twitter platform raises ethical questions. Even though Twitter users assume their posts are not private, they are usually not explicitly informed that what they write can be used for scientific \u2013 our case \u2013 or commercial \u2013 not our case \u2013 purposes. Moreover, they may assume that their tweets are ephemeral when, in fact, they can be collected and stored by anyone, anywhere. We tried our best not to include sensitive content in our examples and not to disclose the identity of their authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Datasets"}},{"value":"Given that this work strongly relies on large-scale language models and datasets composed of social media texts, despite the best intentions, we anticipate possible ethical and social risks, such as perpetuating social biases and providing false or misleading information. In the case of language models, these risks usually spring from the corpora chosen to pre-train such large models. If you intend to use our pre-trained model or a fine-tuned version in production, please be aware that, while BERTweet.BR, like many other models, is a powerful tool, it comes with limitations. 
To enable pre-training on large amounts of data, we scraped all the content we could find on Twitter up to the year 2020, capturing both the best and the worst of what was available on the platform.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Language model"}}]}}
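
The record above is a standard Crossref REST API "work" message. As a minimal sketch of how such a record can be retrieved and parsed (assuming network access to the public api.crossref.org endpoint; the mailto address in the User-Agent is a placeholder to replace with your own), the following Python script, using only the standard library, fetches the same DOI and prints a few of the fields shown above.

import json
from urllib.request import Request, urlopen

# DOI taken from the record above.
DOI = "10.1007/s00521-024-10711-3"

# Crossref asks polite clients to identify themselves; the mailto is a placeholder.
req = Request(
    f"https://api.crossref.org/works/{DOI}",
    headers={"User-Agent": "crossref-example/0.1 (mailto:you@example.org)"},
)

with urlopen(req) as resp:
    payload = json.load(resp)

# The envelope mirrors the document above: status and message-type wrap the work itself.
assert payload["status"] == "ok" and payload["message-type"] == "work"
work = payload["message"]

print(work["title"][0])          # BERTweet.BR: a pre-trained language model for tweets in Portuguese
print(work["DOI"])               # 10.1007/s00521-024-10711-3
print(work["reference-count"])   # 58
for author in work["author"]:    # given/family pairs, as in the author array above
    print(author.get("given", ""), author["family"])

Note that fields such as "is-referenced-by-count" and "indexed" change over time as Crossref re-indexes the record, so a live response may differ from the snapshot above in those fields.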