{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,14]],"date-time":"2026-07-14T18:56:14Z","timestamp":1784055374065,"version":"3.55.0"},"reference-count":55,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,2,4]],"date-time":"2024-02-04T00:00:00Z","timestamp":1707004800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,4]],"date-time":"2024-02-04T00:00:00Z","timestamp":1707004800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100008095","name":"Carleton University","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100008095","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The transformer model is a famous natural language processing model proposed by Google in 2017. Now, with the extensive development of deep learning, many natural language processing tasks can be solved by deep learning methods. After the BERT model was proposed, many pre-trained models such as the XLNet model, the RoBERTa model, and the ALBERT model were also proposed\u00a0in the research community. These models perform very well in various natural language processing tasks. In this paper, we describe and compare these well-known\u00a0models. In addition, we\u00a0also apply several types of existing and well-known\u00a0models which are the BERT model, the XLNet model, the RoBERTa model, the GPT2 model, and the ALBERT model to different existing and\u00a0well-known\u00a0natural language processing tasks, and analyze each model based on their performance. There are a\u00a0few papers that comprehensively compare various transformer models. In our paper, we use six\u00a0types of well-known\u00a0tasks, such as\u00a0sentiment analysis, question answering, text generation, text summarization, name entity recognition, and topic modeling tasks to compare the performance of\u00a0various transformer models. In addition, using the existing models, we also propose\u00a0ensemble learning models\u00a0for the\u00a0different natural language processing tasks. The results show that our ensemble learning models\u00a0 perform better than a single classifier\u00a0on specific tasks.<\/jats:p><jats:p><jats:bold>Graphical Abstract<\/jats:bold><\/jats:p>","DOI":"10.1186\/s40537-023-00842-0","type":"journal-article","created":{"date-parts":[[2024,2,4]],"date-time":"2024-02-04T14:01:58Z","timestamp":1707055318000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":120,"title":["Survey of transformers and towards ensemble learning using transformers for natural language processing"],"prefix":"10.1186","volume":"11","author":[{"given":"Hongzhi","family":"Zhang","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"M. Omair","family":"Shafiq","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,2,4]]},"reference":[{"key":"842_CR1","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I. Attention is all you need. Adv neural Inf Process Syst. 2017;30."},{"key":"842_CR2","unstructured":"Vajjala S, Majumder B, Gupta A, Surana H. Practical natural language processing: a comprehensive guide to building real-world NLP systems. O'Reilly Media; 2020."},{"key":"842_CR3","unstructured":"Devlin J, Chang M, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. North American Chapter of the Association for Computational Linguistics; 2019."},{"issue":"8","key":"842_CR4","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford A, et al. Language models are unsupervised multitask learners. OpenAI blog. 2019;1(8):9.","journal-title":"OpenAI blog."},{"key":"842_CR5","unstructured":"Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV. Xlnet: Generalized autoregressive pretraining for language understanding. Adv neural Inf Process Syst. 2019;32."},{"key":"842_CR6","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. RoBERTa: a robustly optimized BERT pretraining approach. CoRR; 2019. arXiv:1907.11692."},{"key":"842_CR7","unstructured":"Lan Z et al. ALBERT: a lite bert for self-supervised learning of language representations; 2019. arXiv preprint arXiv:1909.11942."},{"key":"842_CR8","doi-asserted-by":"publisher","first-page":"131662","DOI":"10.1109\/ACCESS.2020.3009626","volume":"8","author":"K Mishev","year":"2020","unstructured":"Mishev K, Gjorgjevikj A, Vodenska I, Chitkushev LT, Trajanov D. Evaluation of sentiment analysis in finance: from Lexicons to transformers. IEEE Access. 2020;8:131662\u201382.","journal-title":"IEEE Access"},{"key":"842_CR9","doi-asserted-by":"crossref","unstructured":"Kaliyar RK. A multi-layer bidirectional transformer encoder for pre-trained word embedding: a survey of BERT. In: 2020 10th international conference on cloud computing, data science & engineering (confluence). IEEE; 2020.","DOI":"10.1109\/Confluence47617.2020.9058044"},{"key":"842_CR10","doi-asserted-by":"crossref","unstructured":"Sun S, Cheng Y, Gan Z, Liu J. Patient knowledge distillation for BERT model compression. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing. 2019; pp. 4323\u201332.","DOI":"10.18653\/v1\/D19-1441"},{"key":"842_CR11","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter; 2019. CoRR arXiv:1910.01108."},{"key":"842_CR12","doi-asserted-by":"crossref","unstructured":"Song X, Wang G, Wu Z, Huang Y, Su D, Yu D, Meng H. Speech-XLNet: unsupervised acoustic model pretraining for self-attention networks; 2019. arXiv:1910.10387","DOI":"10.21437\/Interspeech.2020-1511"},{"key":"842_CR13","doi-asserted-by":"crossref","unstructured":"Alshahrani A, Ghaffari M, Amirizirtol K, Liu X. Identifying optimism and pessimism in twitter messages using XLNet and deep consensus. In: 2020 international joint conference on neural networks; 2020. pp. 1\u20138.","DOI":"10.1109\/IJCNN48605.2020.9206948"},{"key":"842_CR14","doi-asserted-by":"crossref","unstructured":"Ethayarajh K. How contextual are contextualized word representations? comparing the geometry of BERT, ELMo, and GPT-2 embeddings. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing; 2019. p. 55\u201365.","DOI":"10.18653\/v1\/D19-1006"},{"key":"842_CR15","unstructured":"Klein T, Nabi M. Learning to answer by learning to ask: getting the best of GPT-2 and BERT worlds. CoRR; 2019. arXiv:1911.02365."},{"key":"842_CR16","doi-asserted-by":"crossref","unstructured":"Delobelle P, Winters T, Berendt B. RobBERT: a Dutch RoBERTa-based language model. In: Findings of the association for computational linguistics: the 2020 conference on empirical methods in natural language processing; 2020. pp. 3255\u20133265.","DOI":"10.18653\/v1\/2020.findings-emnlp.292"},{"key":"842_CR17","doi-asserted-by":"crossref","unstructured":"Chernyavskiy A, Ilvovsky D, Nakov P. Aschern at SemEval-2020 Task 11: It takes three to tango: RoBERTa, CRF, and transfer learning. In: Proceedings of the fourteenth workshop on semantic evaluation; 2020. p. 1462\u20131468.","DOI":"10.18653\/v1\/2020.semeval-1.191"},{"key":"842_CR18","unstructured":"Polignano M, Basile P, De Gemmis M, Semeraro G, Basile V. Alberto: Italian BERT language understanding model for NLP challenging tasks based on tweets. In: CEUR workshop proceedings. Vol. 2481; 2019. p. 1\u20136."},{"key":"842_CR19","unstructured":"Moradshahi M, Palangi H, Lam MS, Smolensky P, Gao J. HUBERT utangles BERT to improve transfer across NLP tasks. CoRR, 2019. arXiv:1910.12647."},{"key":"842_CR20","doi-asserted-by":"crossref","unstructured":"Wu Z, Zheng H, Wang J, Su W, Fong J. Bnu-hkbu uic nlp team 2 at semeval-2019 task 6: detecting offensive language using BERT model. In: Proceedings of the 13th international workshop on semantic evaluation; 2019. p. 551\u2013555.","DOI":"10.18653\/v1\/S19-2099"},{"key":"842_CR21","doi-asserted-by":"publisher","first-page":"154290","DOI":"10.1109\/ACCESS.2019.2946594","volume":"7","author":"Z Gao","year":"2019","unstructured":"Gao Z, Feng A, Song X, Xi W. Target-dependent sentiment classification with BERT. IEEE Access. 2019;7:154290\u20139.","journal-title":"IEEE Access"},{"key":"842_CR22","unstructured":"Gonz\u00e1lez-Carvajal S, Garrido-Merch\u00e1n EC. Comparing BERT against traditional machine learning text classification. CoRR; 2020. arXiv:2005.13012."},{"key":"842_CR23","unstructured":"Baruah A, Das K, Barbhuiya F, Dey K. Aggression identification in English, Hindi and Bangla text using BERT, RoBERTa and SVM. In: Proceedings of the second workshop on trolling, aggression and cyberbullying; 2020. p. 76\u201382."},{"key":"842_CR24","doi-asserted-by":"crossref","unstructured":"Lee S, Jang H, Baik Y, Park S, Shin H. KR-BERT: a small-scale Korean-specific language model. CoRR; 2020. arXiv:2008.03979.","DOI":"10.5626\/JOK.2020.47.7.682"},{"key":"842_CR25","doi-asserted-by":"crossref","unstructured":"Li H et al. Comparing BERT and XLNet from the perspective of computational characteristics. In: 2020 international conference on electronics, information, and communication (ICEIC). IEEE; 2020.","DOI":"10.1109\/ICEIC49074.2020.9051081"},{"key":"842_CR26","unstructured":"Banerjee S, Jayapal A, Thavareesan S. NUIG-Shubhanker@ Dravidian-CodeMix-FIRE2020: sentiment analysis of code-mixed dravidian text using XLNet. arXiv preprint; 2020. arXiv:2010.07773."},{"key":"842_CR27","doi-asserted-by":"crossref","unstructured":"Ekta S, Tannert S, Frassinelli D, Bulling A, Vu NT. Interpreting attention models with human visual attention in machine reading comprehension. CoNLL; 2020. p. 12\u201325.","DOI":"10.18653\/v1\/2020.conll-1.2"},{"key":"842_CR28","doi-asserted-by":"crossref","unstructured":"Iandola FN et al. SqueezeBERT: what can computer vision teach NLP about efficient neural networks?. arXiv preprint; 2020. arXiv:2006.11316.","DOI":"10.18653\/v1\/2020.sustainlp-1.17"},{"key":"842_CR29","doi-asserted-by":"crossref","unstructured":"Chalkidis I et al. LEGAL-BERT: the muppets straight out of law school. arXiv preprint; 2020. arXiv:2010.02559.","DOI":"10.18653\/v1\/2020.findings-emnlp.261"},{"key":"842_CR30","doi-asserted-by":"crossref","unstructured":"Lee LH et al. NCUEE at MEDIQA 2019: medical text inference using ensemble BERT-BiLSTM-attention model. In: Proceedings of the 18th BioNLP workshop and shared task; 2019.","DOI":"10.18653\/v1\/W19-5058"},{"key":"842_CR31","unstructured":"Bashmal L, AlZeer D. ArSarcasm shared task: an ensemble BERT model for SarcasmDetection in Arabic Tweets. In: Proceedings of the sixth Arabic natural language processing workshop; 2021."},{"key":"842_CR32","unstructured":"Nagarajan A, Sen S, Stevens J R, et al. Optimizing transformers with approximate computing for faster, smaller and more accurate NLP models. arXiv preprint; 2020. arXiv:2010.03688."},{"key":"842_CR33","unstructured":"Shen S, Yao Z, Gholami A et al. Powernorm: Rethinking batch normalization in transformers. In: International conference on machine learning. PMLR; 2020. p. 8741\u201351."},{"key":"842_CR34","doi-asserted-by":"crossref","unstructured":"Li R, Xiao W, Wang L, et al. T3-Vis: a visual analytic framework for training and fine-tuning transformers in NLP. arXiv preprint; 2021. arXiv:2108.13587.","DOI":"10.18653\/v1\/2021.emnlp-demo.26"},{"issue":"1","key":"842_CR35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41746-020-00373-5","volume":"4","author":"L Rasmy","year":"2021","unstructured":"Rasmy L, et al. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Dig Med. 2021;4(1):1\u201313.","journal-title":"NPJ Dig Med"},{"key":"842_CR36","doi-asserted-by":"crossref","unstructured":"Sch\u00fctz M et al. Automatic fake news detection with pre-trained transformer models. In: International conference on pattern recognition. Cham: Springer; 2021.","DOI":"10.1007\/978-3-030-68787-8_45"},{"issue":"23","key":"842_CR37","doi-asserted-by":"publisher","first-page":"17309","DOI":"10.1007\/s00521-020-05102-3","volume":"32","author":"RA Potamias","year":"2020","unstructured":"Potamias RA, Siolas G, Stafylopatis AG. A transformer-based approach to irony and sarcasm detection. Neural Comput Appl. 2020;32(23):17309\u201320.","journal-title":"Neural Comput Appl"},{"key":"842_CR38","doi-asserted-by":"crossref","unstructured":"Souza F, Nogueira R, Lotufo R. BERTimbau: pretrained BERT models for Brazilian Portuguese. In: Brazilian conference on intelligent systems. Cham: Springer; 2020.","DOI":"10.1007\/978-3-030-61377-8_28"},{"key":"842_CR39","unstructured":"Gonz\u00e1lez-Carvajal S, Garrido-Merch\u00e1n EC. Comparing BERT against traditional machine learning text classification. arXiv preprint; 2020. arXiv:2005.13012."},{"key":"842_CR40","doi-asserted-by":"crossref","unstructured":"Choi H et al. Evaluation of BERT and ALBERT sentence embedding performance on downstream NLP tasks. In: 2020 25th international conference on pattern recognition (ICPR). IEEE; 2021.","DOI":"10.1109\/ICPR48806.2021.9412102"},{"key":"842_CR41","doi-asserted-by":"crossref","unstructured":"Koutsikakis J et al. Greek-bert: the greeks visiting sesame street. In: 11th Hellenic conference on artificial intelligence; 2020.","DOI":"10.1145\/3411408.3411440"},{"key":"842_CR42","doi-asserted-by":"publisher","DOI":"10.1016\/j.health.2022.100078","volume":"2","author":"K Hall","year":"2022","unstructured":"Hall K, Chang V, Jayne C. A review on natural language processing models for COVID-19 research. Healthc Anal. 2022;2: 100078.","journal-title":"Healthc Anal"},{"key":"842_CR43","first-page":"100334","volume":"9","author":"S Casola","year":"2022","unstructured":"Casola S, Lauriola I, Lavelli A. Pre-trained transformers: an empirical comparison. Mach Learn Appl. 2022;9:100334.","journal-title":"Mach Learn Appl"},{"key":"842_CR44","unstructured":"Friedman S et al. From Unstructured Text to Causal Knowledge Graphs: A Transformer-Based Approach. arXiv preprint; 2022. arXiv:2202.11768."},{"key":"842_CR45","unstructured":"Troxler A, Schelldorfer J. Actuarial applications of natural language processing using transformers: case studies for using text features in an actuarial context. arXiv preprint; 2022. arXiv:2206.02014."},{"key":"842_CR46","doi-asserted-by":"publisher","first-page":"68675","DOI":"10.1109\/ACCESS.2021.3077350","volume":"9","author":"S Singh","year":"2021","unstructured":"Singh S, Mahmood A. The NLP cookbook: modern recipes for transformer based deep learning architectures. IEEE Access. 2021;9:68675\u2013702.","journal-title":"IEEE Access"},{"issue":"10","key":"842_CR47","doi-asserted-by":"publisher","first-page":"4301","DOI":"10.1021\/acsbiomaterials.2c00737","volume":"8","author":"E Khare","year":"2022","unstructured":"Khare E, et al. CollagenTransformer: end-to-end transformer model to predict thermal stability of collagen triple helices using an NLP approach. ACS Biomater Sci Eng. 2022;8(10):4301\u201310.","journal-title":"ACS Biomater Sci Eng"},{"key":"842_CR48","unstructured":"Dataset for sentiment analysis task. https:\/\/www.kaggle.com\/datatattle\/covid-19-nlp-text-classification"},{"key":"842_CR49","unstructured":"Dataset for question answering task. https:\/\/rajpurkar.github.io\/SQuAD-explorer\/."},{"key":"842_CR50","unstructured":"Dataset for NER task. https:\/\/www.kaggle.com\/shoumikgoswami\/annotated-gmb-corpus"},{"key":"842_CR51","unstructured":"Dataset for text summarization task. https:\/\/www.tensorflow.org\/datasets\/catalog\/cnn_dailymail"},{"key":"842_CR52","unstructured":"Dataset for topic modeling task. https:\/\/www.kaggle.com\/vbmokin\/nlp-with-disaster-tweets-cleaning-data"},{"key":"842_CR53","unstructured":"Dataset for text generation task. https:\/\/www.kaggle.com\/rishabh6377\/trump-2020-election-speech"},{"key":"842_CR54","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2019.105837","volume":"86","author":"MH Ribeiro","year":"2020","unstructured":"Ribeiro MH, dos Santos Coelho L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl Soft Comput. 2020;86:105837.","journal-title":"Appl Soft Comput."},{"key":"842_CR55","doi-asserted-by":"crossref","unstructured":"Kumar A, Mayank J. Ensemble learning for AI developers. BApress: Berkeley, CA, USA; 2020.","DOI":"10.1007\/978-1-4842-5940-5"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00842-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-023-00842-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00842-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,10]],"date-time":"2024-11-10T02:42:04Z","timestamp":1731206524000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-023-00842-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,4]]},"references-count":55,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["842"],"URL":"https:\/\/doi.org\/10.1186\/s40537-023-00842-0","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,2,4]]},"assertion":[{"value":"22 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 October 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 February 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"25"}}