{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T13:01:30Z","timestamp":1774443690083,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,11,14]],"date-time":"2019-11-14T00:00:00Z","timestamp":1573689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,11,14]],"date-time":"2019-11-14T00:00:00Z","timestamp":1573689600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Empir Software Eng"],"published-print":{"date-parts":[[2020,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Software developers in big and medium-size companies are working with millions of lines of code in their codebases. Assuring the quality of this code has shifted from simple defect management to proactive assurance of internal code quality. Although static code analysis and code reviews have been at the forefront of research and practice in this area, code reviews are still an effort-intensive and interpretation-prone activity. The aim of this research is to support code reviews by automatically recognizing company-specific code guidelines violations in large-scale, industrial source code. In our action research project, we constructed a machine-learning-based tool for code analysis where software developers and architects in big and medium-sized companies can use a few examples of source code lines violating code\/design guidelines (up to 700 lines of code) to train decision-tree classifiers to find similar violations in their codebases (up to 3 million lines of code). Our action research project consisted of (i) understanding the challenges of two large software development companies, (ii) applying the machine-learning-based tool to detect violations of Sun\u2019s and Google\u2019s coding conventions in the code of three large open source projects implemented in Java, (iii) evaluating the tool on evolving industrial codebase, and (iv) finding the best learning strategies to reduce the cost of training the classifiers. We were able to achieve the average accuracy of over 99% and the average F-score of 0.80 for open source projects when using ca. 40K lines for training the tool. We obtained a similar average F-score of 0.78 for the industrial code but this time using only up to 700 lines of code as a training dataset. Finally, we observed the tool performed visibly better for the rules requiring to understand a single line of code or the context of a few lines (often allowing to reach the F-score of 0.90 or higher). Based on these results, we could observe that this approach can provide modern software development companies with the ability to use examples to teach an algorithm to recognize violations of code\/design guidelines and thus increase the number of reviews conducted before the product release. This, in turn, leads to the increased quality of the final software.<\/jats:p>","DOI":"10.1007\/s10664-019-09769-8","type":"journal-article","created":{"date-parts":[[2019,11,14]],"date-time":"2019-11-14T05:29:06Z","timestamp":1573709346000},"page":"220-265","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Recognizing lines of code violating company-specific coding guidelines using machine learning"],"prefix":"10.1007","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9103-717X","authenticated-orcid":false,"given":"Miroslaw","family":"Ochodek","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Regina","family":"Hebig","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wilhelm","family":"Meding","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gert","family":"Frost","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miroslaw","family":"Staron","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2019,11,14]]},"reference":[{"key":"9769_CR1","doi-asserted-by":"crossref","unstructured":"Allamanis M, Barr ET, Bird C, Sutton C (2014) Learning natural coding conventions. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, pp 281\u2013 293","DOI":"10.1145\/2635868.2635883"},{"key":"9769_CR2","unstructured":"Axelsson S, Baca D, Feldt R, Sidlauskas D, Kacan D (2009) Detecting defects with an interactive code review tool based on visualisation and machine learning. In: The 21st international conference on software engineering and knowledge engineering (SEKE)"},{"issue":"2","key":"9769_CR3","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1177\/026839629601100305","volume":"11","author":"R Baskerville","year":"1996","unstructured":"Baskerville R, Wood-Harper A T (1996) A critical perspective on action research as a method for information systems research. J Inf Technol 11(2):235\u2013246","journal-title":"J Inf Technol"},{"key":"9769_CR4","unstructured":"Brar H K, Kaur P J (2015) Static analysis tools for security: a comparative evaluation. International Journal 5(7):1085\u20131089"},{"key":"9769_CR5","doi-asserted-by":"crossref","unstructured":"Brun Y, Ernst MD (2004) Finding latent code errors via machine learning over program executions. In: Proceedings of the 26th International Conference on Software Engineering, ICSE \u201904. IEEE Computer Society, Washington, pp 480\u2013490. http:\/\/dl.acm.org\/citation.cfm?id=998675.999452","DOI":"10.1109\/ICSE.2004.1317470"},{"key":"9769_CR6","doi-asserted-by":"crossref","unstructured":"Chappelly T, Cifuentes C, Krishnan P, Gevay S (2017) Machine Learning for finding bugs: An initial report. In: IEEE Workshop on Machine learning techniques for software quality evaluation (maLTeSQue). IEEE, pp 21\u201326","DOI":"10.1109\/MALTESQUE.2017.7882012"},{"key":"9769_CR7","doi-asserted-by":"crossref","unstructured":"Dagan I, Engelson SP (1995) Committee-based sampling for training probabilistic classifiers. In: Machine Learning Proceedings 1995. Elsevier, pp 150\u2013157","DOI":"10.1016\/B978-1-55860-377-6.50027-X"},{"key":"9769_CR8","doi-asserted-by":"crossref","unstructured":"Di Nucci D, Palomba F, Tamburri D A, Serebrenik A, De Lucia A (2018) Detecting code smells using machine learning techniques: are we there yet?. In: 2018 IEEE 25Th international conference on software analysis, evolution and reengineering, SANER. IEEE, pp 612\u2013621","DOI":"10.1109\/SANER.2018.8330266"},{"key":"9769_CR9","doi-asserted-by":"crossref","unstructured":"Dyer R, Nguyen H A, Rajan H, Nguyen T N (2013) Boa: a language and infrastructure for analyzing ultra-large-scale software repositories. In: Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, pp 422\u2013431","DOI":"10.1109\/ICSE.2013.6606588"},{"key":"9769_CR10","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.entcs.2008.06.039","volume":"217","author":"P Emanuelsson","year":"2008","unstructured":"Emanuelsson P, Nilsson U (2008) A comparative study of industrial static analysis tools. Electron Notes Theor Comput Sci 217:5\u201321","journal-title":"Electron Notes Theor Comput Sci"},{"key":"9769_CR11","doi-asserted-by":"crossref","unstructured":"Fatima A, Bibi S, Hanif R (2018) Comparative study on static code analysis tools for c\/c++. In: 2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST). IEEE, pp 465\u2013469","DOI":"10.1109\/IBCAST.2018.8312265"},{"key":"9769_CR12","doi-asserted-by":"crossref","unstructured":"Fontana FA, Zanoni M, Marino A, Mantyla MV (2013) Code smell detection: Towards a machine learning-based approach. In: 2013 29th IEEE International Conference on Software Maintenance (ICSM). IEEE, pp 396\u2013399","DOI":"10.1109\/ICSM.2013.56"},{"issue":"3","key":"9769_CR13","doi-asserted-by":"publisher","first-page":"1143","DOI":"10.1007\/s10664-015-9378-4","volume":"21","author":"FA Fontana","year":"2016","unstructured":"Fontana F A, M\u00e4ntyl\u00e4 M V, Zanoni M, Marino A (2016) Comparing and experimenting machine learning techniques for code smell detection. Empir Softw Eng 21(3):1143\u20131191","journal-title":"Empir Softw Eng"},{"issue":"1","key":"9769_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2594473.2594475","volume":"15","author":"AA Freitas","year":"2014","unstructured":"Freitas A A (2014) Comprehensible classification models: a position paper. ACM SIGKDD Explor Newslett 15(1):1\u201310","journal-title":"ACM SIGKDD Explor Newslett"},{"issue":"2","key":"9769_CR15","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/s10115-012-0507-8","volume":"35","author":"Y Fu","year":"2013","unstructured":"Fu Y, Zhu X, Li B (2013) A survey on instance selection for active learning. Knowl Inf Syst 35(2):249\u2013283","journal-title":"Knowl Inf Syst"},{"key":"9769_CR16","unstructured":"Goodman PS, Bazerman M, Conlon E (1980) Institutionalization of planned organizational change. In: Research in Organizational Behavior, JAI Press,Greenwich, pp 215\u2013246"},{"issue":"3","key":"9769_CR17","first-page":"26","volume":"8","author":"W Irwin","year":"2001","unstructured":"Irwin W, Churcher N (2001) A generated parser of c++. NZ J Comput 8 (3):26\u201337","journal-title":"NZ J Comput"},{"key":"9769_CR18","doi-asserted-by":"crossref","unstructured":"Mantere M, Uusitalo I, Roning J (2009) Comparison of static code analysis tools. In: 2009. SECURWARE\u201909, Third International Conference on Emerging security information, systems and technologies. IEEE, pp 15\u201322","DOI":"10.1109\/SECURWARE.2009.10"},{"issue":"4","key":"9769_CR19","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1057\/ejis.2009.24","volume":"18","author":"LM Maruping","year":"2009","unstructured":"Maruping L M, Zhang X, Venkatesh V (2009) Role of collective ownership and coding standards in coordinating expertise in software project teams. Eur J Inf Syst 18 (4):355\u2013371","journal-title":"Eur J Inf Syst"},{"key":"9769_CR20","first-page":"2005","volume":"22","author":"J Masters","year":"1995","unstructured":"Masters J (1995) The history of action research. Action Res Electron Read 22:2005","journal-title":"Action Res Electron Read"},{"key":"9769_CR21","doi-asserted-by":"crossref","unstructured":"McIntosh S, Kamei Y, Adams B, Hassan AE (2014) The impact of code review coverage and code review participation on software quality: A case study of the qt, vtk, and itk projects. In: Proceedings of the 11th Working Conference on Mining Software Repositories. ACM, pp 192\u2013201","DOI":"10.1145\/2597073.2597076"},{"key":"9769_CR22","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.infsof.2018.07.006","volume":"104","author":"Q Mi","year":"2018","unstructured":"Mi Q, Keung J, Xiao Y, Mensah S, Gao Y (2018) Improving code readability classification using convolutional neural networks. Inf Softw Technol 104:60\u201371","journal-title":"Inf Softw Technol"},{"issue":"1","key":"9769_CR23","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1109\/TSE.2009.50","volume":"36","author":"N Moha","year":"2010","unstructured":"Moha N, Gueheneuc Y G, Duchien A F, et al. (2010) Decor: a method for the specification and detection of code and design smells. IEEE Trans Softw Eng (TSE) 36(1):20\u201336","journal-title":"IEEE Trans Softw Eng (TSE)"},{"key":"9769_CR24","unstructured":"Novak J, Krajnc A, Ontar R (2010) Taxonomy of static code analysis tools. In: MIPRO, 2010 Proceedings of the 33rd International Convention. IEEE, pp 418\u2013422"},{"key":"9769_CR25","doi-asserted-by":"crossref","unstructured":"Ochodek M, Staron M, Bargowski D, Meding W, Hebig R (2017) Using machine learning to design a flexible loc counter. In: IEEE Workshop on Machine learning techniques for software quality evaluation (maLTeSQue). IEEE, pp 14\u201320","DOI":"10.1109\/MALTESQUE.2017.7882011"},{"key":"9769_CR26","volume-title":"Real world research","author":"C Robson","year":"2016","unstructured":"Robson C, McCartan K (2016) Real world research. Wiley, New York"},{"key":"9769_CR27","doi-asserted-by":"crossref","unstructured":"Shaukat R, Shahoor A, Urooj A (2018) Probing into code analysis tools: A comparison of c# supporting static code analyzers. In: 2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST). IEEE, pp 455\u2013464","DOI":"10.1109\/IBCAST.2018.8312264"},{"key":"9769_CR28","doi-asserted-by":"crossref","unstructured":"Singh D, Sekar V R, Stolee K T, Johnson B (2017) Evaluating how static analysis tools can reduce code review effort. In: 2017 IEEE Symposium on Visual languages and human-centric computing (VL\/HCC). IEEE, pp 101\u2013105","DOI":"10.1109\/VLHCC.2017.8103456"},{"key":"9769_CR29","unstructured":"Smit M, Gergel B, Hoover HJ, Stroulia E (2011) Maintainability and source code conventions: An analysis of open source projects. University of Alberta, Department of Computing Science, Tech Rep TR11-06"},{"issue":"4","key":"9769_CR30","doi-asserted-by":"publisher","first-page":"582","DOI":"10.2307\/2392581","volume":"23","author":"G Susman","year":"1978","unstructured":"Susman G, Evered R (1978) An assessment of the scientific merits of action research. J Admin Sci Q 23(4):582\u2013603","journal-title":"J Admin Sci Q"},{"key":"9769_CR31","doi-asserted-by":"crossref","unstructured":"Torunski E, Shafiq M O, Whitehead A (2017) Code style analytics for the automatic setting of formatting rules in ides: a solution to the tabs vs. spaces debate. In: 2017 Twelfth International Conference on Digital information management (ICDIM). IEEE, pp 6\u201314","DOI":"10.1109\/ICDIM.2017.8244675"},{"key":"9769_CR32","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-4625-2","volume-title":"Experimentation in software engineering: an introduction","author":"C Wohlin","year":"2000","unstructured":"Wohlin C, Runeson P, Host M, Ohlsson M C, Regnell B, Wessl\u00e8n A (2000) Experimentation in software engineering: an introduction. Kluwer Academic Publisher, Boston"}],"container-title":["Empirical Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-019-09769-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10664-019-09769-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10664-019-09769-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,3]],"date-time":"2021-02-03T04:31:01Z","timestamp":1612326661000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10664-019-09769-8"}},"subtitle":["A Method and Its Evaluation"],"short-title":[],"issued":{"date-parts":[[2019,11,14]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1]]}},"alternative-id":["9769"],"URL":"https:\/\/doi.org\/10.1007\/s10664-019-09769-8","relation":{},"ISSN":["1382-3256","1573-7616"],"issn-type":[{"value":"1382-3256","type":"print"},{"value":"1573-7616","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,14]]},"assertion":[{"value":"14 November 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}