{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T20:35:54Z","timestamp":1773693354653,"version":"3.50.1"},"reference-count":36,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T00:00:00Z","timestamp":1673481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Code comments are considered an efficient way to document the functionality of a particular block of code. Code commenting is a common practice among developers to explain the purpose of the code in order to improve code comprehension and readability. Researchers investigated the effect of code comments on software development tasks and demonstrated the use of comments in several ways, including maintenance, reusability, bug detection, etc. Given the importance of code comments, it becomes vital for novice developers to brush up on their code commenting skills. In this study, we initially investigated what types of comments novice students document in their source code and further categorized those comments using a machine learning approach. The work involves the initial manual classification of code comments and then building a machine learning model to classify student code comments automatically. The findings of our study revealed that novice developers\/students\u2019 comments are mainly related to Literal (26.66%) and Insufficient (26.66%). Further, we proposed and extended the taxonomy of such source code comments by adding a few more categories, i.e., License (5.18%), Profile (4.80%), Irrelevant (4.80%), Commented Code (4.44%), Autogenerated (1.48%), and Improper (1.10%). Moreover, we assessed our approach with three different machine-learning classifiers. Our implementation of machine learning models found that Decision Tree resulted in the overall highest accuracy, i.e., 85%. This study helps in predicting the type of code comments for a novice developer using a machine learning approach that can be implemented to generate automated feedback for students, thus saving teachers time for manual one-on-one feedback, which is a time-consuming activity.<\/jats:p>","DOI":"10.3390\/a16010053","type":"journal-article","created":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T05:03:03Z","timestamp":1673499783000},"page":"53","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Investigating Novice Developers\u2019 Code Commenting Trends Using Machine Learning Techniques"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9459-7068","authenticated-orcid":false,"given":"Tahira","family":"Niazi","sequence":"first","affiliation":[{"name":"Department of Computer Science, Mohammad Ali Jinnah University, Karachi 75400, Pakistan"}]},{"given":"Teerath","family":"Das","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of Jyv\u00e4skyl\u00e4, 40014 Jyv\u00e4skyl\u00e4, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0077-9638","authenticated-orcid":false,"given":"Ghufran","family":"Ahmed","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Computer Emerging Sciences, Karachi 75400, Pakistan"}]},{"given":"Syed Muhammad","family":"Waqas","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Bahria University, Karachi 75260, Pakistan"}]},{"given":"Sumra","family":"Khan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Mohammad Ali Jinnah University, Karachi 75400, Pakistan"}]},{"given":"Suleman","family":"Khan","sequence":"additional","affiliation":[{"name":"School of Psychology and Computer Science, University of Central Lancashire, Preston PR1 2HE, UK"}]},{"given":"Ahmed Abdelaziz","family":"Abdelatif","sequence":"additional","affiliation":[{"name":"Khawarizmi International College, Al Bahya, Abu Dhabi 25669, United Arab Emirates"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3660-065X","authenticated-orcid":false,"given":"Shaukat","family":"Wasi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Mohammad Ali Jinnah University, Karachi 75400, Pakistan"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,12]]},"reference":[{"key":"ref_1","unstructured":"Smit, M., Gergel, B., Hoover, H.J., and Stroulia, E. (2011). Maintainability and source code conventions: An analysis of open source projects. Univ. Alta. Dep. Comput. Sci. Tech. Rep. TR11, 6."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"dos Santos, R.M., and Gerosa, M.A. (2018, January 27\u201328). Impacts of coding practices on readability. Proceedings of the 26th Conference on Program Comprehension, Gothenburg, Sweden.","DOI":"10.1145\/3196321.3196342"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1271","DOI":"10.1109\/32.6171","article-title":"Program readability: Procedures versus comments","volume":"14","author":"Tenny","year":"1988","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1145\/382208.382523","article-title":"Procedures and comments vs. the banker\u2019s algorithm","volume":"17","author":"Tenny","year":"1985","journal-title":"Acm Sigcse Bull."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Rubio-Gonz\u00e1lez, C., and Liblit, B. (2010, January 5\u20136). Expect the unexpected: Error code mismatches between documentation and the real world. Proceedings of the 9th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, Toronto, ON, Canada.","DOI":"10.1145\/1806672.1806687"},{"key":"ref_6","unstructured":"Subramanian, S., Inozemtseva, L., and Holmes, R. (June, January 31). Live API documentation. Proceedings of the 36th International Conference on Software Engineering, Hyderabad, India."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Goffi, A., Gorla, A., Ernst, M.D., and Pezz\u00e8, M. (2016, January 18\u201320). Automatic generation of oracles for exceptional behaviors. Proceedings of the 25th International Symposium on Software Testing and Analysis, Saarbr\u00fccken, Germany.","DOI":"10.1145\/2931037.2931061"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Pascarella, L., and Bacchelli, A. (2017, January 20\u201328). Classifying code comments in Java open-source software systems. Proceedings of the 2017 IEEE\/ACM 14th International Conference on Mining Software Repositories (MSR), Buenos Aires, Argentina.","DOI":"10.1109\/MSR.2017.63"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Mohammadi-Aragh, M.J., Beck, P.J., Barton, A.K., Reese, D., Jones, B.A., and Jankun-Kelly, M. (2018, January 23\u201327). Coding the coders: A qualitative investigation of students\u2019 commenting patterns. Proceedings of the 2018 ASEE Annual Conference Exposition, Salt Lake City, UT, USA.","DOI":"10.1145\/3159450.3162304"},{"key":"ref_10","unstructured":"Beck, P., Mohammadi-Aragh, M.J., and Archibald, C. (October, January 15). An Initial Exploration of Machine Learning Techniques to Classify Source Code Comments in Real-time. Proceedings of the 2019 ASEE Annual Conference & Exposition, Tampa, FL, USA."},{"key":"ref_11","unstructured":"Hartzman, C.S., and Austin, C.F. (1993, January 22\u201325). Maintenance productivity: Observations based on an experience in a large system environment. Proceedings of the 1993 Conference of the Centre for Advanced Studies on Collaborative Research: Software Engineering, Toronto, ON, Canada."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Jiang, Z.M., and Hassan, A.E. (2006, January 22\u201323). Examining the evolution of code comments in PostgreSQL. Proceedings of the 2006 International Workshop on Mining Software Repositories, Shanghai, China.","DOI":"10.1145\/1137983.1138030"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"de Souza, S.C.B., Anquetil, N., and de Oliveira, K.M. (2005, January 21\u201323). A study of the documentation essential to software maintenance. Proceedings of the 23rd Annual International Conference on Design of Communication: Documenting & Designing for Pervasive Information, Coventry, UK.","DOI":"10.1145\/1085313.1085331"},{"key":"ref_14","unstructured":"Oman, P., and Hagemeister, J. (1992, January 9\u201312). Metrics for assessing a software system\u2019s maintainability. Proceedings of the Conference on Software Maintenance 1992, IEEE Computer Society, Orlando, FL, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Garcia, M.J.B., and Granja-Alvarez, J.C. (1996, January 4\u20138). Maintainability as a key factor in maintenance productivity: A case study. Proceedings of the Icsm, Monterey, CA, USA.","DOI":"10.1109\/ICSM.1996.564992"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Khamis, N., Witte, R., and Rilling, J. (2010, January 23\u201325). Automatic quality assessment of source code comments: The JavadocMiner. Proceedings of the International Conference on Application of Natural Language to Information Systems, Cardiff, UK.","DOI":"10.1007\/978-3-642-13881-2_7"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Steidl, D., Hummel, B., and Juergens, E. (2013, January 20\u201321). Quality analysis of source code comments. Proceedings of the 2013 21st International Conference on Program Comprehension (icpc), San Francisco, CA, USA.","DOI":"10.1109\/ICPC.2013.6613836"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"981","DOI":"10.1142\/S0218194016500339","article-title":"Code comment quality analysis and improvement recommendation: An automated approach","volume":"26","author":"Sun","year":"2016","journal-title":"Int. J. Softw. Eng. Knowl. Eng."},{"key":"ref_19","unstructured":"Tan, L., Yuan, D., Krishna, G., and Zhou, Y. (2007, January 3\u20136). comment: Bugs or bad comments?. Proceedings of the ACM Symposium on Operating Systems Principles: Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, New York, NY, USA."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Ratol, I.K., and Robillard, M.P. (November, January 30). Detecting fragile comments. Proceedings of the 2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE), Urbana-Champaign, IL, USA.","DOI":"10.1109\/ASE.2017.8115624"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Das, T., Penta, M.D., and Malavolta, I. (2016, January 2\u20137). A Quantitative and Qualitative Investigation of Performance-Related Commits in Android Apps. Proceedings of the 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, IEEE Computer Society, Raleigh, NC, USA.","DOI":"10.1109\/ICSME.2016.49"},{"key":"ref_22","unstructured":"Allamanis, M., Peng, H., and Sutton, C. (2016, January 19\u201324). A convolutional attention network for extreme summarization of source code. Proceedings of the International Conference on Machine Learning, New York City, NY, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Hu, X., Li, G., Xia, X., Lo, D., and Jin, Z. (June, January 27). Deep code comment generation. Proceedings of the 2018 IEEE\/ACM 26th International Conference on Program Comprehension (ICPC), Gothenburg, Sweden.","DOI":"10.1145\/3196321.3196334"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Haiduc, S., Aponte, J., and Marcus, A. (2010, January 1\u20138). Supporting program comprehension with source code summarization. Proceedings of the 2010 ACM\/IEEE 32nd International Conference on Software Engineering, Cape Town, South Africa.","DOI":"10.1145\/1810295.1810335"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Haiduc, S., Aponte, J., Moreno, L., and Marcus, A. (2010, January 13\u201316). On the use of automated text summarization techniques for summarizing source code. Proceedings of the 2010 17th Working Conference on Reverse Engineering, Washington, DC, USA.","DOI":"10.1109\/WCRE.2010.13"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Huang, Y., Zheng, Q., Chen, X., Xiong, Y., Liu, Z., and Luo, X. (2017, January 9\u201310). Mining version control system for automatically generating commit comment. Proceedings of the 2017 ACM\/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), Toronto, ON, Canada.","DOI":"10.1109\/ESEM.2017.56"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lawrie, D.J., Feild, H., and Binkley, D. (2006, January 14\u201316). Leveraged quality assessment using information retrieval techniques. Proceedings of the 14th IEEE International Conference on Program Comprehension (ICPC\u201906), Athens, Greece.","DOI":"10.1109\/ICPC.2006.34"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Marcus, A., and Maletic, J.I. (2003, January 3\u201310). Recovering documentation-to-source-code traceability links using latent semantic indexing. Proceedings of the 25th International Conference on Software Engineering, Portland, OR, USA.","DOI":"10.1109\/ICSE.2003.1201194"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.jss.2019.03.010","article-title":"Automatically detecting the scopes of source code comments","volume":"153","author":"Chen","year":"2019","journal-title":"J. Syst. Softw."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hata, H., Treude, C., Kula, R.G., and Ishio, T. (2019, January 27). 9.6 million links in source code comments: Purpose, evolution, and decay. Proceedings of the 2019 IEEE\/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada.","DOI":"10.1109\/ICSE.2019.00123"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Alghamdi, M., Hayashi, S., Kobayashi, T., and Treude, C. (2021, January 17\u201319). Characterising the Knowledge about Primitive Variables in Java Code Comments. Proceedings of the 2021 IEEE\/ACM 18th International Conference on Mining Software Repositories (MSR), Madrid, Spain.","DOI":"10.1109\/MSR52588.2021.00058"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Haouari, D., Sahraoui, H., and Langlais, P. (2011, January 22\u201323). How good is your comment? A study of comments in java programs. Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement, Banff, AB, Canada.","DOI":"10.1109\/ESEM.2011.22"},{"key":"ref_33","unstructured":"Zhai, J., Xu, X., Shi, Y., Tao, G., Pan, M., Ma, S., Xu, L., Zhang, W., Tan, L., and Zhang, X. (July, January 27). CPC: Automatically classifying and propagating natural language comments via program analysis. Proceedings of the ACM\/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3058751","article-title":"Writing in-code comments to self-explain in computational science and engineering education","volume":"17","author":"Vieira","year":"2017","journal-title":"ACM Trans. Comput. Educ. (TOCE)"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Beck, P.J., Mohammadi-Aragh, M.J., Archibald, C., Jones, B.A., and Barton, A. (2018, January 13\u201316). Real-time metacognition feedback for introductory programming using machine learning. Proceedings of the 2018 IEEE Frontiers in Education Conference (FIE), Lincoln, NE, USA.","DOI":"10.1109\/FIE.2018.8658973"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Pascarella, L. (June, January 27). Classifying code comments in Java mobile applications. Proceedings of the 2018 IEEE\/ACM 5th International Conference on Mobile Software Engineering and Systems (MOBILESoft), Gothenburg, Sweden.","DOI":"10.1145\/3197231.3198444"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/53\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:04:14Z","timestamp":1760119454000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/53"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,12]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["a16010053"],"URL":"https:\/\/doi.org\/10.3390\/a16010053","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,12]]}}}