{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T00:35:54Z","timestamp":1777336554693,"version":"3.51.4"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T00:00:00Z","timestamp":1600041600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T00:00:00Z","timestamp":1600041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["402774445"],"award-info":[{"award-number":["402774445"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2020,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Context<\/jats:title>\n<jats:p>Issue tracking systems are used to track and describe tasks in the development process, e.g., requested feature improvements or reported bugs. However, past research has shown that the reported issue types often do not match the description of the issue.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Objective<\/jats:title>\n<jats:p>We want to understand the overall maturity of the state of the art of issue type prediction with the goal to predict if issues are bugs and evaluate if we can improve existing models by incorporating manually specified knowledge about issues.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Method<\/jats:title>\n<jats:p>We train different models for the title and description of the issue to account for the difference in structure between these fields, e.g., the length. Moreover, we manually detect issues whose description contains a null pointer exception, as these are strong indicators that issues are bugs.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>Our approach performs best overall, but not significantly different from an approach from the literature based on the fastText classifier from Facebook AI Research. The small improvements in prediction performance are due to structural information about the issues we used. We found that using information about the content of issues in form of null pointer exceptions is not useful. We demonstrate the usefulness of issue type prediction through the example of labelling bugfixing commits.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>Issue type prediction can be a useful tool if the use case allows either for a certain amount of missed bug reports or the prediction of too many issues as bug is acceptable.<\/jats:p>\n<\/jats:sec>","DOI":"10.1007\/s10664-020-09885-w","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T15:03:46Z","timestamp":1600095826000},"page":"5333-5369","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":39,"title":["On the feasibility of automated prediction of bug and non-bug issues"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9765-2803","authenticated-orcid":false,"given":"Steffen","family":"Herbold","sequence":"first","affiliation":[]},{"given":"Alexander","family":"Trautsch","sequence":"additional","affiliation":[]},{"given":"Fabian","family":"Trautsch","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,9,14]]},"reference":[{"key":"9885_CR1","doi-asserted-by":"publisher","unstructured":"Antoniol G, Ayari K, Di Penta M, Khomh F, Gu\u00e9h\u00e9neuc YG (2008) Is it a bug or an enhancement?: A text-based approach to classify change requests. In: Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds, ACM, New York, NY, USA, CASCON \u201908, pp 23:304\u201323:318. https:\/\/doi.org\/10.1145\/1463788.1463819","DOI":"10.1145\/1463788.1463819"},{"issue":"901","key":"9885_CR2","first-page":"268","volume":"160","author":"MS Bartlett","year":"1937","unstructured":"Bartlett MS (1937) Properties of sufficiency and statistical tests. Proc R Soc London A Math Phys Sci 160(901):268\u2013282","journal-title":"Proc R Soc London A Math Phys Sci"},{"issue":"1","key":"9885_CR3","first-page":"2653","volume":"18","author":"A Benavoli","year":"2017","unstructured":"Benavoli A, Corani G, Dem\u0161ar J, Zaffalon M (2017) Time for a change: a tutorial for comparing multiple classifiers through bayesian analysis. J Mach Learn Res 18(1):2653\u20132688","journal-title":"J Mach Learn Res"},{"key":"9885_CR4","doi-asserted-by":"crossref","unstructured":"Chawla I, Singh SK (2015) An automated approach for bug categorization using fuzzy logic. In: Proceedings of the 8th india software engineering conference, ACM, pp 90\u201399","DOI":"10.1145\/2723742.2723751"},{"issue":"1","key":"9885_CR5","first-page":"177","volume":"18","author":"I Chawla","year":"2018","unstructured":"Chawla I, Singh SK (2018) Automated labeling of issue reports using semi supervised approach. J Comp Meth Sci Eng 18(1):177\u2013191","journal-title":"J Comp Meth Sci Eng"},{"key":"9885_CR6","unstructured":"Cohen J (1988) Statistical power analysis for the behavioral sciences. Lawrence Erlbaum Associates"},{"key":"9885_CR7","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"key":"9885_CR8","unstructured":"Devlin J, Chang M, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805"},{"issue":"293","key":"9885_CR9","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1080\/01621459.1961.10482090","volume":"56","author":"OJ Dunn","year":"1961","unstructured":"Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52\u201364. https:\/\/doi.org\/10.1080\/01621459.1961.10482090","journal-title":"J Am Stat Assoc"},{"key":"9885_CR10","unstructured":"Facebook AI Research (2019) fasttext - library for efficient text classification and representation learning. https:\/\/fasttext.cc\/, [accessed 14-November-2019]"},{"issue":"1","key":"9885_CR11","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1214\/aoms\/1177731944","volume":"11","author":"M Friedman","year":"1940","unstructured":"Friedman M (1940) A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat 11(1):86\u201392","journal-title":"Ann Math Stat"},{"issue":"6","key":"9885_CR12","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.1109\/TSE.2011.103","volume":"38","author":"T Hall","year":"2012","unstructured":"Hall T, Beecham S, Bowes D, Gray D, Counsell S (2012) A systematic literature review on fault prediction performance in software engineering. IEEE Trans Softw Eng 38(6):1276\u20131304. https:\/\/doi.org\/10.1109\/TSE.2011.103","journal-title":"IEEE Trans Softw Eng"},{"issue":"39","key":"9885_CR13","doi-asserted-by":"publisher","first-page":"313","DOI":"10.19101\/IJACR.2018.839013","volume":"8","author":"M Hammad","year":"2018","unstructured":"Hammad M, Alzyoudi R, Otoom AF (2018) Automatic clustering of bug reports. Int J Adv Comput Res 8(39):313\u2013323","journal-title":"Int J Adv Comput Res"},{"key":"9885_CR14","doi-asserted-by":"crossref","unstructured":"Han Z, Li X, Xing Z, Liu H, Feng Z (2017) Learning to predict severity of software vulnerability using only vulnerability description. In: 2017 IEEE International conference on software maintenance and evolution (ICSME), pp 125\u2013136","DOI":"10.1109\/ICSME.2017.52"},{"key":"9885_CR15","doi-asserted-by":"crossref","unstructured":"Herbold S (2020) With registered reports towards large scale data curation. In: Proceedings of the 2020 international conference software engineering - NIER track","DOI":"10.1145\/3377816.3381721"},{"issue":"9","key":"9885_CR16","doi-asserted-by":"publisher","first-page":"811","DOI":"10.1109\/TSE.2017.2724538","volume":"44","author":"S Herbold","year":"2018","unstructured":"Herbold S, Trautsch A, Grabowski J (2018) A comparative study to benchmark cross-project defect prediction approaches. IEEE Trans Softw Eng 44 (9):811\u2013833. https:\/\/doi.org\/10.1109\/TSE.2017.2724538","journal-title":"IEEE Trans Softw Eng"},{"key":"9885_CR17","unstructured":"Herbold S, Trautsch A, Trautsch F (2020) Issues with szz: an empirical assessment of the state of practice of defect prediction data collection. arXiv:1911.08938"},{"key":"9885_CR18","doi-asserted-by":"crossref","unstructured":"Herzig K, Just S, Zeller A (2013) It\u2019s not a bug, it\u2019s a feature: How misclassification impacts bug prediction. In: Proceedings of the international conference on software engineering, IEEE Press, Piscataway, NJ, USA, ICSE \u201913, pp 392\u2013401. http:\/\/dl.acm.org\/citation.cfm?id=2486788.2486840","DOI":"10.1109\/ICSE.2013.6606585"},{"issue":"99","key":"9885_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TSE.2017.2770124","volume":"PP","author":"S Hosseini","year":"2017","unstructured":"Hosseini S, Turhan B, Gunarathna D (2017) A systematic literature review and meta-analysis on cross project defect prediction. IEEE Trans Softw Eng PP(99):1\u20131. https:\/\/doi.org\/10.1109\/TSE.2017.2770124","journal-title":"IEEE Trans Softw Eng"},{"key":"9885_CR20","doi-asserted-by":"crossref","unstructured":"Just R, Jalali D, Ernst MD (2014) Defects4j: a database of existing faults to enable controlled testing studies for java programs. In: Proceedings of the 2014 international symposium on software testing and analysis (ISSTA), ACM","DOI":"10.1145\/2610384.2628055"},{"key":"9885_CR21","doi-asserted-by":"crossref","unstructured":"Kallis R, Di Sorbo A, Canfora G, Panichella S (2019) Ticket tagger: machine learning driven issue classification. In: IEEE International conference on software maintenance and evolution (ICSME), IEEE","DOI":"10.1109\/ICSME.2019.00070"},{"issue":"6","key":"9885_CR22","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1145\/293347.293351","volume":"45","author":"M Kearns","year":"1998","unstructured":"Kearns M (1998) Efficient noise-tolerant learning from statistical queries. J ACM 45(6):983\u20131006. https:\/\/doi.org\/10.1145\/293347.293351","journal-title":"J ACM"},{"key":"9885_CR23","doi-asserted-by":"crossref","unstructured":"Limsettho N, Hata H, Matsumoto Ki (2014a) Comparing hierarchical dirichlet process with latent dirichlet allocation in bug report multiclass classification. In: 15Th IEEE\/ACIS international conference on software engineering, artificial intelligence, networking and parallel\/distributed computing (SNPD), IEEE, pp 1\u20136","DOI":"10.1109\/SNPD.2014.6888695"},{"key":"9885_CR24","doi-asserted-by":"crossref","unstructured":"Limsettho N, Hata H, Monden A, Matsumoto K (2014b) Automatic unsupervised bug report categorization. In: 2014 6th International workshop on empirical software engineering in practice, IEEE, pp 7\u201312","DOI":"10.1109\/IWESEP.2014.8"},{"issue":"07","key":"9885_CR25","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1142\/S0218194016500352","volume":"26","author":"N Limsettho","year":"2016","unstructured":"Limsettho N, Hata H, Monden A, Matsumoto K (2016) Unsupervised bug report categorization using clustering and labeling algorithm. Int J Softw Eng Knowl Eng 26(07):1027\u20131053","journal-title":"Int J Softw Eng Knowl Eng"},{"key":"9885_CR26","doi-asserted-by":"publisher","unstructured":"Lukins SK, Kraft NA, Etzkorn LH (2008) Source code retrieval for bug localization using latent dirichlet allocation. In: Proceedings of the 2008 15th working conference on reverse engineering, IEEE Computer Society, USA, WCRE \u201908, p 155\u2013164 https:\/\/doi.org\/10.1109\/WCRE.2008.33,","DOI":"10.1109\/WCRE.2008.33"},{"key":"9885_CR27","doi-asserted-by":"crossref","unstructured":"Marcus A, Sergeyev A, Rajlich V, Maletic JI (2004) An information retrieval approach to concept location in source code. In: Proceedings of the 11th working conference on reverse engineering, IEEE Computer Society, USA, WCRE \u201904, pp 214\u2013223","DOI":"10.1109\/WCRE.2004.10"},{"key":"9885_CR28","doi-asserted-by":"crossref","unstructured":"Mills C, Pantiuchina J, Parra E, Bavota G, Haiduc S (2018) Are bug reports enough for text retrieval-based bug localization?. In: IEEE International conference on software maintenance and evolution (ICSME), pp 381\u2013392","DOI":"10.1109\/ICSME.2018.00046"},{"key":"9885_CR29","unstructured":"Nemenyi P (1963) Distribution-free multiple comparison. PhD thesis, Princeton University"},{"key":"9885_CR30","doi-asserted-by":"publisher","unstructured":"Ortu M, Destefanis G, Adams B, Murgia A, Marchesi M, Tonelli R (2015) The jira repository dataset: Understanding social aspects of software development. In: Proceedings of the 11th international conference on predictive models and data analytics in software engineering, association for computing machinery, New York, NY, USA, PROMISE \u201915,. https:\/\/doi.org\/10.1145\/2810146.2810147","DOI":"10.1145\/2810146.2810147"},{"key":"9885_CR31","doi-asserted-by":"crossref","unstructured":"Otoom AF, Al-jdaeh S, Hammad M (2019) Automated classification of software bug reports. In: Proceedings of the 9th international conference on information communication and management, ACM, pp 17\u201321","DOI":"10.1145\/3357419.3357424"},{"key":"9885_CR32","doi-asserted-by":"crossref","unstructured":"Palacio DN, McCrystal D, Moran K, Bernal-Cardenas\u0301 C, Poshyvanyk D, Shenefiel C (2019) Learning to identify security-related issues using convolutional neural networks. In: 2019 IEEE International conference on software maintenance and evolution (ICSME), pp 140\u2013144","DOI":"10.1109\/ICSME.2019.00024"},{"key":"9885_CR33","doi-asserted-by":"crossref","unstructured":"Pandey N, Hudait A, Sanyal DK, Sen A (2018) Automated classification of issue reports from a software issue tracker. In: Sa PK, Sahoo MN, Murugappan M, Wu Y, Majhi B (eds) Progress in intelligent computing techniques: theory, practice, and applications. Springer Singapore, Singapore, pp 423\u2013430","DOI":"10.1007\/978-981-10-3373-5_42"},{"key":"9885_CR34","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"9885_CR35","doi-asserted-by":"publisher","unstructured":"Pingclasai N, Hata H, Matsumoto K (2013) Classifying bug reports to bugs and other requests using topic modeling. In: 2013 20Th asia-pacific software engineering conference (APSEC), vol 2, pp 13\u201318. https:\/\/doi.org\/10.1109\/APSEC.2013.105","DOI":"10.1109\/APSEC.2013.105"},{"key":"9885_CR36","doi-asserted-by":"crossref","unstructured":"Qin H, Sun X (2018) Classifying bug reports into bugs and non-bugs using lstm. In: Proceedings of the tenth asia-pacific symposium on internetware, ACM, p 20","DOI":"10.1145\/3275219.3275239"},{"key":"9885_CR37","doi-asserted-by":"publisher","unstructured":"Rao S, Kak A (2011) Retrieval from software libraries for bug localization: a comparative study of generic and composite text models. In: Proceedings of the 8th working conference on mining software repositories, association for computing machinery, New York, NY, USA, MSR \u201911, p 43\u201352. https:\/\/doi.org\/10.1145\/1985441.1985451","DOI":"10.1145\/1985441.1985451"},{"issue":"3\/4","key":"9885_CR38","doi-asserted-by":"publisher","first-page":"591","DOI":"10.2307\/2333709","volume":"52","author":"SS Shapiro","year":"1965","unstructured":"Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52(3\/4):591\u2013611","journal-title":"Biometrika"},{"key":"9885_CR39","doi-asserted-by":"publisher","unstructured":"\u015aliwerski J, Zimmermann T, Zeller A (2005) When do changes induce fixes?. In: Proceedings of the 2005 international workshop on mining software repositories, ACM. https:\/\/doi.org\/10.1145\/1082983.1083147","DOI":"10.1145\/1082983.1083147"},{"key":"9885_CR40","doi-asserted-by":"publisher","unstructured":"Terdchanakul P, Hata H, Phannachitta P, Matsumoto K (2017) Bug or not? bug report classification using n-gram idf. In: IEEE International conference on software maintenance and evolution (ICSME), pp 534\u2013538. https:\/\/doi.org\/10.1109\/ICSME.2017.14","DOI":"10.1109\/ICSME.2017.14"},{"key":"9885_CR41","doi-asserted-by":"crossref","unstructured":"Trautsch A, Trautsch F, Herbold S, Ledel B, Grabowski J (2020) The smartshark ecosystem for software repository mining. In: Proceedings of the 2020 international conference software engineering - demonstrations track","DOI":"10.1145\/3377812.3382139"},{"issue":"2","key":"9885_CR42","doi-asserted-by":"publisher","first-page":"1036","DOI":"10.1007\/s10664-017-9537-x","volume":"23","author":"F Trautsch","year":"2018","unstructured":"Trautsch F, Herbold S, Makedonski P, Grabowski J (2018) Addressing problems with replicability and validity of repository mining studies through a smart data platform. Empirical Softw Engg 23(2):1036\u20131083. https:\/\/doi.org\/10.1007\/s10664-017-9537-x","journal-title":"Empirical Softw Engg"},{"key":"9885_CR43","doi-asserted-by":"crossref","unstructured":"Wohlin C, Runeson P, H\u00f6st M, Ohlsson MC, Regnell B, Wesslen A (2012) Experimentation in software engineering. Springer Publishing Company. Incorporated","DOI":"10.1007\/978-3-642-29044-2"},{"issue":"3","key":"9885_CR44","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1002\/smr.1770","volume":"28","author":"Y Zhou","year":"2016","unstructured":"Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. J Softw Evol Process 28(3):150\u2013176","journal-title":"J Softw Evol Process"},{"key":"9885_CR45","doi-asserted-by":"crossref","unstructured":"Zolkeply MS, Shao J (2019) Classifying software issue reports through association mining. In: Proceedings of the 34th ACM\/SIGAPP symposium on applied computing, ACM, pp 1860\u20131863","DOI":"10.1145\/3297280.3297608"}],"updated-by":[{"DOI":"10.1007\/s10664-020-09888-7","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2020,10,17]],"date-time":"2020-10-17T00:00:00Z","timestamp":1602892800000}}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-020-09885-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10664-020-09885-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-020-09885-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,14]],"date-time":"2021-09-14T00:59:58Z","timestamp":1631581198000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10664-020-09885-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,14]]},"references-count":45,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2020,11]]}},"alternative-id":["9885"],"URL":"https:\/\/doi.org\/10.1007\/s10664-020-09885-w","relation":{"correction":[{"id-type":"doi","id":"10.1007\/s10664-020-09888-7","asserted-by":"object"}]},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,9,14]]},"assertion":[{"value":"14 September 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 October 2020","order":2,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":3,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The original version of this article unfortunately contained mistakes. Figures 8, 9 and 10 were incorrectly captured. Somehow, the plots in Fig.\u00a08 were replaced with those from Fig.\u00a09 and the original Fig.\u00a08 was lost.","order":4,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}]}}