{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T05:58:51Z","timestamp":1774418331089,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T00:00:00Z","timestamp":1717372800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T00:00:00Z","timestamp":1717372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Sentiment analysis has been used to study aspects of software engineering, such as issue resolution, toxicity, and self-admitted technical debt. To address the peculiarities of software engineering texts, sentiment analysis tools often consider the specific technical lingo practitioners use. To further improve the application of sentiment analysis, there have been two recommendations: Using pre-trained transformer models to classify sentiment and replacing non-natural language elements with meta-tokens. In this work, we benchmark five different sentiment analysis tools (two pre-trained transformer models and three machine learning tools) on 2 gold-standard sentiment analysis datasets. We find that pre-trained transformers outperform the best machine learning tool on only one of the two datasets, and that even on that dataset the performance difference is a few percentage points. 
Therefore, we recommend that software engineering researchers should not just consider predictive performance when selecting a sentiment analysis tool because the best-performing sentiment analysis tools perform very similarly to each other (within 4 percentage points). Meanwhile, we find that meta-tokenization does not improve the predictive performance of sentiment analysis tools. Both of our findings can be used by software engineering researchers who seek to apply sentiment analysis tools to software engineering data.<\/jats:p>","DOI":"10.1007\/s10664-024-10468-2","type":"journal-article","created":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T13:03:18Z","timestamp":1717419798000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Transformers and meta-tokenization in sentiment analysis for software engineering"],"prefix":"10.1007","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6511-918X","authenticated-orcid":false,"given":"Nathan","family":"Cassee","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrei","family":"Agaronian","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4242-2581","authenticated-orcid":false,"given":"Eleni","family":"Constantinou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1160-2608","authenticated-orcid":false,"given":"Nicole","family":"Novielli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1418-0095","authenticated-orcid":false,"given":"Alexander","family":"Serebrenik","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,6,3]]},"referenc
e":[{"key":"10468_CR1","doi-asserted-by":"publisher","unstructured":"Ahmed T, Bosu A, Iqbal A, Rahimi S (2017) SentiCR: A customized sentiment analysis tool for code review interactions. ASE 2017 - Proceedings of the 32nd IEEE\/ACM International Conference on Automated Software Engineering pp 106\u2013111, https:\/\/doi.org\/10.1109\/ASE.2017.8115623","DOI":"10.1109\/ASE.2017.8115623"},{"key":"10468_CR2","doi-asserted-by":"publisher","unstructured":"Bacchelli A, D\u2019Ambros M, Lanza M (2010) Extracting source code from emails. In: Proceedings of the 2010 IEEE 18th International Conference on Program Comprehension, IEEE Computer Society, USA, ICPC \u201910, p 24-33. URL https:\/\/doi.org\/10.1109\/ICPC.2010.47","DOI":"10.1109\/ICPC.2010.47"},{"issue":"1","key":"10468_CR3","doi-asserted-by":"publisher","first-page":"289","DOI":"10.2307\/2346101","volume":"57","author":"Y Benjamini","year":"1995","unstructured":"Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological) 57(1):289\u2013300. https:\/\/doi.org\/10.2307\/2346101","journal-title":"Journal of the Royal Statistical Society Series B (Methodological)"},{"key":"10468_CR4","unstructured":"Bird S, Klein E, Loper E (2009) Natural language processing with Python: analyzing text with the natural language toolkit. O\u2019Reilly Media, Inc"},{"key":"10468_CR5","doi-asserted-by":"publisher","unstructured":"Biswas E, Karabulut ME, Pollock L, Vijay-Shanker K (2020) Achieving Reliable Sentiment Analysis in the Software Engineering Domain using BERT. 
Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020 pp 162\u2013173, https:\/\/doi.org\/10.1109\/ICSME46990.2020.00025","DOI":"10.1109\/ICSME46990.2020.00025"},{"key":"10468_CR6","doi-asserted-by":"crossref","unstructured":"Bosu A, Greiler M, Bird C (2015) Characteristics of useful code reviews: An empirical study at microsoft. In: Proceedings of the International Conference on Mining Software Repositories, URL https:\/\/www.microsoft.com\/en-us\/research\/publication\/characteristics-of-useful-code-reviews-an-empirical-study-at-microsoft\/","DOI":"10.1109\/MSR.2015.21"},{"key":"10468_CR7","doi-asserted-by":"publisher","unstructured":"Calefato F, Lanubile F, Maiorano F, Novielli N (2018a) Sentiment Polarity Detection for Software Development. Empirical Software Engineering 23(3):1352\u20131382. https:\/\/doi.org\/10.1007\/s10664-017-9546-9","DOI":"10.1007\/s10664-017-9546-9"},{"key":"10468_CR8","doi-asserted-by":"publisher","unstructured":"Calefato F, Lanubile F, Novielli N (2018b) How to ask for technical help? Evidence-based guidelines for writing questions on Stack Overflow. Information and Software Technology 94(September 2017):186\u2013207.\u00a0https:\/\/doi.org\/10.1016\/j.infsof.2017.10.009","DOI":"10.1016\/j.infsof.2017.10.009"},{"key":"10468_CR9","doi-asserted-by":"publisher","unstructured":"Chen Z, Cao Y, Lu X, Mei Q, Liu X (2019) Sentimoji: An emoji-powered learning approach for sentiment analysis in software engineering. In: Proceedings of the 27th edition of ESEC\/FSE, Association for Computing Machinery, ESEC\/FSE 2019, p 841\u2013852. https:\/\/doi.org\/10.1145\/3338906.3338977","DOI":"10.1145\/3338906.3338977"},{"key":"10468_CR10","doi-asserted-by":"publisher","DOI":"10.1037\/h0026256","author":"J Cohen","year":"1968","unstructured":"Cohen J (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. 
https:\/\/doi.org\/10.1037\/h0026256","journal-title":"Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit."},{"issue":"3","key":"10468_CR11","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/BF00994018","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297","journal-title":"Mach Learn"},{"key":"10468_CR12","doi-asserted-by":"publisher","unstructured":"Ding J, Sun H, Wang X, Liu X (2018) Entity-level sentiment analysis of issue comments. Proceedings-International Conference on Software Engineering pp 7\u201313. https:\/\/doi.org\/10.1145\/3194932.3194935","DOI":"10.1145\/3194932.3194935"},{"key":"10468_CR13","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1177\/1536867X1501500117","volume":"15","author":"A Dinno","year":"2015","unstructured":"Dinno A (2015) Nonparametric pairwise multiple comparisons in independent groups using dunn\u2019s test. The Stata Journal: Promoting communications on statistics and Stata 15:292\u2013300. https:\/\/doi.org\/10.1177\/1536867X1501500117","journal-title":"The Stata Journal: Promoting communications on statistics and Stata"},{"key":"10468_CR14","doi-asserted-by":"crossref","unstructured":"Efstathiou V, Spinellis D (2018) Code review comments: Language matters. In: ICSE NIER, ACM, p 69\u201372","DOI":"10.1145\/3183399.3183411"},{"key":"10468_CR15","doi-asserted-by":"publisher","unstructured":"Fu W, Menzies T (2017) Easy over hard: a case study on deep learning. ACM, pp 49\u201360, https:\/\/doi.org\/10.1145\/3106237.3106256","DOI":"10.1145\/3106237.3106256"},{"key":"10468_CR16","doi-asserted-by":"publisher","unstructured":"Hastie T, Tibshirani R, Friedman J (2009) Model Assessment and Selection, Springer New York, New York, NY, pp 219\u2013259. 
https:\/\/doi.org\/10.1007\/978-0-387-84858-7_7","DOI":"10.1007\/978-0-387-84858-7_7"},{"key":"10468_CR17","doi-asserted-by":"publisher","unstructured":"Herrmann M, Klunder J (2021) From textual to verbal communication: Towards applying sentiment analysis to a software project meeting. IEEE, pp 371\u2013376. https:\/\/doi.org\/10.1109\/REW53955.2021.00065","DOI":"10.1109\/REW53955.2021.00065"},{"key":"10468_CR18","doi-asserted-by":"publisher","unstructured":"Islam MR, Zibran MF (2018) Sentistrength-se: Exploiting domain specificity for improved sentiment analysis in software engineering text. J Syst Softw 145:125\u2013146. https:\/\/doi.org\/10.1016\/j.jss.2018.08.030, URL https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0164121218301675","DOI":"10.1016\/j.jss.2018.08.030"},{"issue":"5","key":"10468_CR19","doi-asserted-by":"publisher","first-page":"2543","DOI":"10.1007\/s10664-016-9493-x","volume":"22","author":"R Jongeling","year":"2017","unstructured":"Jongeling R, Sarkar P, Datta S, Serebrenik A (2017) On negative results when using sentiment analysis tools for software engineering research. Empir Softw Eng 22(5):2543\u20132584. https:\/\/doi.org\/10.1007\/s10664-016-9493-x","journal-title":"Empir Softw Eng"},{"key":"10468_CR20","doi-asserted-by":"publisher","unstructured":"Kudo T, Richardson J (2018) Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. Association for Computational Linguistics, pp 66\u201371. https:\/\/doi.org\/10.18653\/v1\/D18-2012","DOI":"10.18653\/v1\/D18-2012"},{"issue":"5","key":"10468_CR21","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1109\/MS.2019.2922949","volume":"36","author":"MJ Lanovaz","year":"2019","unstructured":"Lanovaz MJ, Adams B (2019) Comparing the communication tone and responses of users and developers in two r mailing lists: Measuring positive and negative emails. IEEE Softw 36(5):46\u201350. 
https:\/\/doi.org\/10.1109\/MS.2019.2922949","journal-title":"IEEE Softw"},{"key":"10468_CR22","doi-asserted-by":"publisher","unstructured":"Lin B, Zampetti F, Bavota G, Di Penta M, Lanza M, Oliveto R (2018) Sentiment analysis for software engineering. ACM, pp 94\u2013104. https:\/\/doi.org\/10.1145\/3180155.3180195","DOI":"10.1145\/3180155.3180195"},{"key":"10468_CR23","doi-asserted-by":"crossref","unstructured":"Lin B, Cassee N, Serebrenik A, Bavota G, Novielli N, Lanza M (2022) Opinion mining for software development: A systematic literature review. ACM Transactions on Software Engineering and Methodology 31(3):38:1\u201338:41","DOI":"10.1145\/3490388"},{"key":"10468_CR24","doi-asserted-by":"publisher","unstructured":"Maalej W, Nabil H (2015) Bug report, feature request, or simply praise? on automatically classifying app reviews. In: 2015 IEEE 23rd International Requirements Engineering Conference (RE), pp 116\u2013125. https:\/\/doi.org\/10.1109\/RE.2015.7320414","DOI":"10.1109\/RE.2015.7320414"},{"key":"10468_CR25","doi-asserted-by":"crossref","unstructured":"Manning CD, Surdeanu M, Bauer J, Finkel JR, Bethard S, McClosky D (2014) The stanford corenlp natural language processing toolkit. In: ACL (System Demonstrations), The Association for Computer Linguistics, pp 55\u201360","DOI":"10.3115\/v1\/P14-5010"},{"key":"10468_CR26","doi-asserted-by":"publisher","unstructured":"M\u00e4ntyl\u00e4 M, Calefato F, Claes M (2018) Natural language or not (nlon) - a package for software engineering text analysis pipeline. In: 2018 IEEE\/ACM 15th International Conference on Mining Software Repositories (MSR), p 387\u2013391. https:\/\/doi.org\/10.1145\/3196398.3196444","DOI":"10.1145\/3196398.3196444"},{"key":"10468_CR27","doi-asserted-by":"publisher","unstructured":"McKnight PE, Najab J (2010) Mann-Whitney U Test, John Wiley & Sons, Ltd, pp 1\u20131. https:\/\/doi.org\/10.1002\/9780470479216.corpsy0524. 
https:\/\/onlinelibrary.wiley.com\/doi\/abs\/10.1002\/9780470479216.corpsy0524","DOI":"10.1002\/9780470479216.corpsy0524"},{"key":"10468_CR28","unstructured":"Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Bengio Y, LeCun Y (eds) 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, URL http:\/\/arxiv.org\/abs\/1301.3781"},{"key":"10468_CR29","doi-asserted-by":"publisher","unstructured":"Novielli N, Calefato F, Dongiovanni D, Girardi D, Lanubile F (2020) Can we use se-specific sentiment analysis tools in a cross-platform setting? In: Proceedings of the 17th International Conference on Mining Software Repositories, Association for Computing Machinery, New York, NY, USA, MSR \u201920, p 158\u2013168. URL https:\/\/doi.org\/10.1145\/3379597.3387446","DOI":"10.1145\/3379597.3387446"},{"issue":"4","key":"10468_CR30","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1007\/s10664-021-09960-w","volume":"26","author":"N Novielli","year":"2021","unstructured":"Novielli N, Calefato F, Lanubile F, Serebrenik A (2021) Assessment of off-the-shelf se-specific sentiment analysis tools: An extended replication study. Empir Softw Eng 26(4):77. https:\/\/doi.org\/10.1007\/s10664-021-09960-w","journal-title":"Empir Softw Eng"},{"key":"10468_CR31","doi-asserted-by":"publisher","unstructured":"Ortu M, Marchesi M, Tonelli R (2019) Empirical analysis of affect of merged issues on github. In: 2019 IEEE\/ACM 4th International Workshop on Emotion Awareness in Software Engineering (SEmotion), pp 46\u201348. 
https:\/\/doi.org\/10.1109\/SEmotion.2019.00017","DOI":"10.1109\/SEmotion.2019.00017"},{"key":"10468_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102360","volume":"57","author":"EW Pamungkas","year":"2020","unstructured":"Pamungkas EW, Basile V, Patti V (2020) Misogyny detection in twitter: a multilingual and cross-domain study. Inf Process Manag 57:102360. https:\/\/doi.org\/10.1016\/j.ipm.2020.102360","journal-title":"Inf Process Manag"},{"key":"10468_CR33","doi-asserted-by":"publisher","unstructured":"Paul R, Bosu A, Sultana KZ (2019) Expressions of sentiments during code reviews: Male vs. female. In: 2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER), pp 26\u201337. https:\/\/doi.org\/10.1109\/SANER.2019.8667987","DOI":"10.1109\/SANER.2019.8667987"},{"key":"10468_CR34","doi-asserted-by":"publisher","unstructured":"Pennacchiotti M, Popescu AM (2011) Democrats, republicans and starbucks afficionados: User classification in twitter. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, New York, NY, USA, KDD \u201911, p 430\u2013438. https:\/\/doi.org\/10.1145\/2020408.2020477","DOI":"10.1145\/2020408.2020477"},{"key":"10468_CR35","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/s10664-008-9102-8","volume":"14","author":"P Runeson","year":"2009","unstructured":"Runeson P, H\u00f6st M (2009) Guidelines for conducting and reporting case study research in software engineering. Empir Softw Eng 14:131\u2013164","journal-title":"Empir Softw Eng"},{"key":"10468_CR36","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.inffus.2021.11.011","volume":"81","author":"R Shwartz-Ziv","year":"2022","unstructured":"Shwartz-Ziv R, Armon A (2022) Tabular data: Deep learning is not all you need. Information Fusion 81:84\u201390. 
https:\/\/doi.org\/10.1016\/j.inffus.2021.11.011","journal-title":"Information Fusion"},{"key":"10468_CR37","doi-asserted-by":"publisher","unstructured":"Sidhu PK, Mussbacher G, McIntosh S (2019) Reuse (or lack thereof) in travis ci specifications: An empirical study of ci phases and commands. IEEE, pp 524\u2013533. https:\/\/doi.org\/10.1109\/SANER.2019.8668029","DOI":"10.1109\/SANER.2019.8668029"},{"key":"10468_CR38","doi-asserted-by":"publisher","DOI":"10.4018\/978-1-60566-650-1.ch001","author":"M Stanojevic","year":"2009","unstructured":"Stanojevic M, Vrane\u0161 S (2009). Semantic approach to knowledge representation and processing. https:\/\/doi.org\/10.4018\/978-1-60566-650-1.ch001","journal-title":"Semantic approach to knowledge representation and processing."},{"key":"10468_CR39","doi-asserted-by":"publisher","unstructured":"Uddin G, Khomh F (2017) Opiner: An opinion search and summarization engine for apis. IEEE, pp 978\u2013983. https:\/\/doi.org\/10.1109\/ASE.2017.8115715. URL http:\/\/ieeexplore.ieee.org\/document\/8115715\/","DOI":"10.1109\/ASE.2017.8115715"},{"issue":"3","key":"10468_CR40","doi-asserted-by":"publisher","first-page":"522","DOI":"10.1109\/TSE.2019.2900245","volume":"47","author":"G Uddin","year":"2021","unstructured":"Uddin G, Khomh F (2021) Automatic mining of opinions expressed about apis in stack overflow. IEEE Trans Softw Eng 47(3):522\u2013559. https:\/\/doi.org\/10.1109\/TSE.2019.2900245","journal-title":"IEEE Trans Softw Eng"},{"key":"10468_CR41","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3491211","volume":"31","author":"G Uddin","year":"2022","unstructured":"Uddin G, Gu\u00e9h\u00e9nuc YG, Khomh F, Roy CK (2022) An empirical study of the effectiveness of an ensemble of stand-alone sentiment detection tools for software engineering datasets. ACM Trans Softw Eng Methodol 31:1\u201338. 
https:\/\/doi.org\/10.1145\/3491211","journal-title":"ACM Trans Softw Eng Methodol"},{"key":"10468_CR42","doi-asserted-by":"crossref","unstructured":"Yedida R, Menzies T (2022) How to improve deep learning for software analytics (a case study with code smell detection). pp 156\u2013166","DOI":"10.1145\/3524842.3528458"},{"key":"10468_CR43","doi-asserted-by":"publisher","unstructured":"Zampetti F, Fucci G, Serebrenik A, Di Penta M (2021) Self-admitted technical debt practices: a comparison between industry and open-source. Empir Softw Eng 26:131. https:\/\/doi.org\/10.1007\/s10664-021-10031-3","DOI":"10.1007\/s10664-021-10031-3"},{"key":"10468_CR44","doi-asserted-by":"publisher","unstructured":"Zhang T, Xu B, Thung F, Haryono SA, Lo D, Jiang L (2020) Sentiment analysis for software engineering: How far can pre-trained transformer models go? In: 2020 ICSME, pp 70\u201380. https:\/\/doi.org\/10.1109\/ICSME46990.2020.00017","DOI":"10.1109\/ICSME46990.2020.00017"}],"container-title":["Empirical Software 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-024-10468-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-024-10468-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-024-10468-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T15:11:48Z","timestamp":1720192308000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-024-10468-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,3]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10468"],"URL":"https:\/\/doi.org\/10.1007\/s10664-024-10468-2","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,3]]},"assertion":[{"value":"22 February 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 June 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declared that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interests"}}],"article-number":"77"}}