{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T18:53:33Z","timestamp":1771959213195,"version":"3.50.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,5,15]],"date-time":"2024-05-15T00:00:00Z","timestamp":1715731200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,15]],"date-time":"2024-05-15T00:00:00Z","timestamp":1715731200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006752","name":"Universidade do Porto","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006752","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Clustering of source code is a technique that can help improve feedback in automated program assessment. Grouping code submissions that contain similar mistakes can, for instance, facilitate the identification of students\u2019 difficulties to provide targeted feedback. Moreover, solutions with similar functionality but possibly different coding styles or progress levels can allow personalized feedback to students stuck at some point based on a more developed source code or even detect potential cases of plagiarism. However, existing clustering approaches for source code are mostly inadequate for automated feedback generation or assessment systems in programming education. They either give too much emphasis to syntactical program features, rely on expensive computations over pairs of programs, or require previously collected data. This paper introduces an online approach and implemented tool\u2014AsanasCluster\u2014to cluster source code submissions to programming assignments. The proposed approach relies on program attributes extracted from semantic graph representations of source code, including control and data flow features. The obtained feature vector values are fed into an incremental <jats:italic>k<\/jats:italic>-means model. Such a model aims to determine the closest cluster of solutions, as they enter the system, timely, considering clustering is an intermediate step for feedback generation in automated assessment. We have conducted a twofold evaluation of the tool to assess (1) its runtime performance and (2) its precision in separating different algorithmic strategies. To this end, we have applied our clustering approach on a public dataset of real submissions from undergraduate students to programming assignments, measuring the runtimes for the distinct tasks involved: building a model, identifying the closest cluster to a new observation, and recalculating partitions. As for the precision, we partition two groups of programs collected from GitHub. One group contains implementations of two searching algorithms, while the other has implementations of several sorting algorithms. AsanasCluster matches and, in some cases, improves the state-of-the-art clustering tools in terms of runtime performance and precision in identifying different algorithmic strategies. It does so without requiring the execution of the code. Moreover, it is able to start the clustering process from a dataset with only two submissions and continuously partition the observations as they enter the system.<\/jats:p>","DOI":"10.1007\/s41060-024-00554-5","type":"journal-article","created":{"date-parts":[[2024,5,15]],"date-time":"2024-05-15T02:01:40Z","timestamp":1715738500000},"page":"1581-1592","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Clustering source code from automated assessment of programming assignments"],"prefix":"10.1007","volume":"20","author":[{"given":"Jos\u00e9 Carlos","family":"Paiva","sequence":"first","affiliation":[]},{"given":"Jos\u00e9 Paulo","family":"Leal","sequence":"additional","affiliation":[]},{"given":"\u00c1lvaro","family":"Figueira","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,15]]},"reference":[{"issue":"2","key":"554_CR1","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1080\/08993400500150747","volume":"15","author":"KM Ala-Mutka","year":"2005","unstructured":"Ala-Mutka, K.M.: A survey of automated assessment approaches for programming assignments. Comput. Sci. Educ. 15(2), 83\u2013102 (2005). https:\/\/doi.org\/10.1080\/08993400500150747","journal-title":"Comput. Sci. Educ."},{"issue":"2","key":"554_CR2","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1145\/1272848.1272879","volume":"39","author":"J Bennedsen","year":"2007","unstructured":"Bennedsen, J., Caspersen, M.E.: Failure rates in introductory programming. SIGCSE Bull. 39(2), 32\u201336 (2007). https:\/\/doi.org\/10.1145\/1272848.1272879","journal-title":"SIGCSE Bull."},{"key":"554_CR3","unstructured":"Bottou, L., Bengio, Y.: Convergence properties of the k-means algorithms. In: Proceedings of the 7th International Conference on Neural Information Processing Systems, pp. 585\u2013592. MIT Press, Cambridge, MA, USA, NIPS\u201994 (1994)"},{"key":"554_CR4","doi-asserted-by":"publisher","unstructured":"Chae, D.K., Ha, J., Kim, S.W., et\u00a0al.: Software plagiarism detection: a graph-based approach. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 1577\u20131580. Association for Computing Machinery, New York, NY, USA, CIKM \u201913 (2013). https:\/\/doi.org\/10.1145\/2505515.2507848","DOI":"10.1145\/2505515.2507848"},{"key":"554_CR5","doi-asserted-by":"publisher","unstructured":"Chen, R., Hong, L., Lu, C., et\u00a0al.: Author identification of software source code with program dependence graphs. In: Proceedings of the 2010 IEEE 34th Annual Computer Software and Applications Conference Workshops, pp. 281\u2013286. IEEE Computer Society, USA, COMPSACW \u201910 (2010). https:\/\/doi.org\/10.1109\/COMPSACW.2010.56","DOI":"10.1109\/COMPSACW.2010.56"},{"key":"554_CR6","doi-asserted-by":"publisher","unstructured":"Chow, S., Yacef, K., Koprinska, I., et\u00a0al.: Automated data-driven hints for computer programming students. In: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 5\u201310. Association for Computing Machinery, New York, NY, USA, UMAP \u201917 (2017). https:\/\/doi.org\/10.1145\/3099023.3099065","DOI":"10.1145\/3099023.3099065"},{"issue":"3","key":"554_CR7","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1109\/TC.2011.223","volume":"61","author":"G Cosma","year":"2012","unstructured":"Cosma, G., Joy, M.: An approach to source-code plagiarism detection and investigation using latent semantic analysis. IEEE Trans. Comput. 61(3), 379\u2013394 (2012). https:\/\/doi.org\/10.1109\/TC.2011.223","journal-title":"IEEE Trans. Comput."},{"key":"554_CR8","doi-asserted-by":"publisher","unstructured":"Drummond, A., Lu, Y., Chaudhuri, S., et\u00a0al.: Learning to grade student programs in a massive open online course. In: Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 785\u2013790. IEEE Computer Society, USA, ICDM \u201914 (2014). https:\/\/doi.org\/10.1109\/ICDM.2014.142","DOI":"10.1109\/ICDM.2014.142"},{"issue":"1","key":"554_CR9","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1093\/comjnl\/bxs018","volume":"56","author":"Z Duri\u0107","year":"2012","unstructured":"Duri\u0107, Z., Ga\u0161evi\u0107, D.: A source code similarity system for plagiarism detection. Comput. J. 56(1), 70\u201386 (2012). https:\/\/doi.org\/10.1093\/comjnl\/bxs018","journal-title":"Comput. J."},{"key":"554_CR10","doi-asserted-by":"publisher","unstructured":"Elmaleh, J., Shankararaman, V.: Improving student learning in an introductory programming course using flipped classroom and competency framework. In: 2017 IEEE Global Engineering Education Conference (EDUCON), pp. 49\u201355. IEEE, Athens, Greece (2017). https:\/\/doi.org\/10.1109\/EDUCON.2017.7942823","DOI":"10.1109\/EDUCON.2017.7942823"},{"key":"554_CR11","doi-asserted-by":"publisher","unstructured":"Emerson, A., Smith, A., Rodriguez, F.J., et\u00a0al.: Cluster-based analysis of novice coding misconceptions in block-based programming. In: Proceedings of the 51st ACM Technical Symposium on Computer Science Education. Association for Computing Machinery, New York, NY, USA, SIGCSE \u201920, pp. 825\u2013831 (2020). https:\/\/doi.org\/10.1145\/3328778.3366924","DOI":"10.1145\/3328778.3366924"},{"issue":"1","key":"554_CR12","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/BF01407931","volume":"20","author":"P Feautrier","year":"1991","unstructured":"Feautrier, P.: Dataflow analysis of array and scalar references. Int. J. Parallel Prog. 20(1), 23\u201353 (1991). https:\/\/doi.org\/10.1007\/BF01407931","journal-title":"Int. J. Parallel Prog."},{"key":"554_CR13","unstructured":"Fraunhofer AISEC: Code Property Graph (2023). https:\/\/github.com\/Fraunhofer-AISEC\/cpg. Accessed 20 May 2023"},{"issue":"2","key":"554_CR14","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1145\/2699751","volume":"22","author":"EL Glassman","year":"2015","unstructured":"Glassman, E.L., Scott, J., Singh, R., et al.: Overcode: visualizing variation in student solutions to programming problems at scale. ACM Trans. Comput. Hum. Interact. 22(2), 25 (2015). https:\/\/doi.org\/10.1145\/2699751","journal-title":"ACM Trans. Comput. Hum. Interact."},{"key":"554_CR15","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1007\/978-3-642-30950-2_127","volume-title":"Intelligent Tutoring Systems","author":"S Gross","year":"2012","unstructured":"Gross, S., Zhu, X., Hammer, B., et al.: Cluster based feedback provision strategies in intelligent tutoring systems. In: Cerri, S.A., Clancey, W.J., Papadourakis, G., et al. (eds.) Intelligent Tutoring Systems, pp. 699\u2013700. Springer, Berlin (2012). https:\/\/doi.org\/10.1007\/978-3-642-30950-2_127"},{"key":"554_CR16","doi-asserted-by":"publisher","first-page":"644","DOI":"10.1007\/978-3-642-39112-5_79","volume-title":"Artificial Intelligence in Education","author":"S Gross","year":"2013","unstructured":"Gross, S., Mokbel, B., Hammer, B., et al.: Towards providing feedback to students in absence of formalized domain models. In: Lane, H.C., Yacef, K., Mostow, J., et al. (eds.) Artificial Intelligence in Education, pp. 644\u2013648. Springer, Berlin (2013). https:\/\/doi.org\/10.1007\/978-3-642-39112-5_79"},{"key":"554_CR17","doi-asserted-by":"publisher","unstructured":"Gulwani, S., Radi\u010dek, I., Zuleger, F.: Automated clustering and program repair for introductory programming assignments. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 465\u2013480. Association for Computing Machinery, New York, NY, USA, PLDI 2018 (2018). https:\/\/doi.org\/10.1145\/3192366.3192387","DOI":"10.1145\/3192366.3192387"},{"key":"554_CR18","doi-asserted-by":"publisher","unstructured":"Head, A., Glassman, E., Soares, G., et\u00a0al.: Writing reusable code feedback at scale with mixed-initiative program synthesis. In: Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 89\u201398. Association for Computing Machinery, New York, NY, USA, L@S \u201917 (2017). https:\/\/doi.org\/10.1145\/3051457.3051467","DOI":"10.1145\/3051457.3051467"},{"key":"554_CR19","unstructured":"Huang, J., Piech, C., Nguyen, A., et\u00a0al.: Syntactic and functional variability of a million code submissions in a machine learning MOOC. In: Walker, E., Looi, C. (eds.) Proceedings of the Workshops at the 16th International Conference on Artificial Intelligence in Education AIED 2013, CEUR Workshop Proceedings, vol 1009. CEUR-WS.org, Memphis, TN, USA, pp. 25\u201332 (2013). https:\/\/ceur-ws.org\/Vol-1009\/0105.pdf"},{"key":"554_CR20","doi-asserted-by":"publisher","unstructured":"Inoue, U., Wada, S.: Detecting plagiarisms in elementary programming courses. In: 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, Chongqing, China, pp. 2308\u20132312 (2012). https:\/\/doi.org\/10.1109\/FSKD.2012.6234186","DOI":"10.1109\/FSKD.2012.6234186"},{"key":"554_CR21","doi-asserted-by":"publisher","unstructured":"Jhi, Y.C., Wang, X., Jia, X., et\u00a0al.: Value-based program characterization and its application to software plagiarism detection. In: Proceedings of the 33rd International Conference on Software Engineering. Association for Computing Machinery, New York, NY, USA, ICSE \u201911, pp. 756\u2013765 (2011). https:\/\/doi.org\/10.1145\/1985793.1985899","DOI":"10.1145\/1985793.1985899"},{"key":"554_CR22","doi-asserted-by":"publisher","unstructured":"Kaleeswaran, S., Santhiar, A., Kanade, A., et\u00a0al.: Semi-supervised verified feedback generation. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, FSE 2016, pp. 739\u2013750 (2016). https:\/\/doi.org\/10.1145\/2950290.2950363","DOI":"10.1145\/2950290.2950363"},{"key":"554_CR23","first-page":"1090","volume-title":"Pearson\u2019s Correlation Coefficient","year":"2008","unstructured":"Kirch, W. (ed.): Pearson\u2019s Correlation Coefficient, pp. 1090\u20131091. Springer, Dordrecht (2008)"},{"key":"554_CR24","doi-asserted-by":"publisher","unstructured":"Koivisto, T., Hellas, A.: Evaluating CodeClusters for effectively providing feedback on code submissions. In: 2022 IEEE Frontiers in Education Conference (FIE). IEEE, pp. 1\u20139 (2022). https:\/\/doi.org\/10.1109\/FIE56618.2022.9962751","DOI":"10.1109\/FIE56618.2022.9962751"},{"issue":"6","key":"554_CR25","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1002\/spe.522","volume":"33","author":"JP Leal","year":"2003","unstructured":"Leal, J.P., Silva, F.: Mooshak: a web-based multi-site programming contest system. Softw. Pract. Exp. 33(6), 567\u2013581 (2003). https:\/\/doi.org\/10.1002\/spe.522","journal-title":"Softw. Pract. Exp."},{"issue":"2","key":"554_CR26","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"S Lloyd","year":"1982","unstructured":"Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theor. 28(2), 129\u2013137 (1982). https:\/\/doi.org\/10.1109\/TIT.1982.1056489","journal-title":"IEEE Trans. Inf. Theor."},{"key":"554_CR27","doi-asserted-by":"publisher","unstructured":"Luo, L., Zeng, Q.: Solminer: mining distinct solutions in programs. In: Proceedings of the 38th International Conference on Software Engineering Companion. Association for Computing Machinery, New York, NY, USA, ICSE \u201916, pp. 481\u2013490 (2016). https:\/\/doi.org\/10.1145\/2889160.2889202","DOI":"10.1145\/2889160.2889202"},{"key":"554_CR28","doi-asserted-by":"publisher","unstructured":"Luo, L., Ming, J., Wu, D., et\u00a0al.: Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. Association for Computing Machinery, New York, NY, USA, FSE 2014, pp. 389\u2013400 (2014). https:\/\/doi.org\/10.1145\/2635868.2635900","DOI":"10.1145\/2635868.2635900"},{"key":"554_CR29","doi-asserted-by":"publisher","unstructured":"Luxton-Reilly, A., Denny, P., Kirk, D., et\u00a0al.: On the differences between correct student solutions. In: Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education. Association for Computing Machinery, New York, NY, USA, ITiCSE \u201913, pp. 177\u2013182 (2013). https:\/\/doi.org\/10.1145\/2462476.2462505","DOI":"10.1145\/2462476.2462505"},{"key":"554_CR30","doi-asserted-by":"publisher","unstructured":"Luxton-Reilly, A., Simon Albluwi, I., et\u00a0al.: Introductory programming: a systematic literature review. In: Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education. Association for Computing Machinery, New York, NY, USA, ITiCSE 2018 Companion, pp. 55\u2013106 (2018). https:\/\/doi.org\/10.1145\/3293881.3295779","DOI":"10.1145\/3293881.3295779"},{"issue":"6","key":"554_CR31","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1093\/comjnl\/bxh119","volume":"48","author":"L Moussiades","year":"2005","unstructured":"Moussiades, L., Vakali, A.: PDetect: a clustering approach for detecting plagiarism in source code datasets. Comput. J. 48(6), 651\u2013661 (2005). https:\/\/doi.org\/10.1093\/comjnl\/bxh119","journal-title":"Comput. J."},{"key":"554_CR32","doi-asserted-by":"publisher","unstructured":"Nguyen, A., Piech, C., Huang, J., et\u00a0al.: Codewebs: scalable homework search for massive open online programming courses. In: Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, New York, NY, USA, WWW \u201914, pp. 491\u2013502 (2014). https:\/\/doi.org\/10.1145\/2566486.2568023","DOI":"10.1145\/2566486.2568023"},{"issue":"2","key":"554_CR33","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1007\/s10115-014-0742-2","volume":"43","author":"T Ohmann","year":"2014","unstructured":"Ohmann, T., Rahal, I.: Efficient clustering-based source code plagiarism detection using PIY. Knowl. Inf. Syst. 43(2), 445\u2013472 (2014). https:\/\/doi.org\/10.1007\/s10115-014-0742-2","journal-title":"Knowl. Inf. Syst."},{"issue":"2","key":"554_CR34","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1007\/s10115-014-0742-2","volume":"43","author":"T Ohmann","year":"2015","unstructured":"Ohmann, T., Rahal, I.: Efficient clustering-based source code plagiarism detection using PIY. Knowl. Inf. Syst. 43(2), 445\u2013472 (2015). https:\/\/doi.org\/10.1007\/s10115-014-0742-2","journal-title":"Knowl. Inf. Syst."},{"key":"554_CR35","doi-asserted-by":"publisher","DOI":"10.1145\/3513140","author":"JC Paiva","year":"2022","unstructured":"Paiva, J.C., Leal, J.P., Figueira, A.: Automated assessment in computer science education: a state-of-the-art review. ACM Trans. Comput. Educ. (2022). https:\/\/doi.org\/10.1145\/3513140","journal-title":"ACM Trans. Comput. Educ."},{"key":"554_CR36","doi-asserted-by":"publisher","first-page":"108887","DOI":"10.1016\/j.dib.2023.108887","volume":"46","author":"JC Paiva","year":"2023","unstructured":"Paiva, J.C., Leal, J.P., Figueira, \u00c1.: Progpedia: collection of source-code submitted to introductory programming assignments. Data Brief 46, 108887 (2023). https:\/\/doi.org\/10.1016\/j.dib.2023.108887","journal-title":"Data Brief"},{"key":"554_CR37","doi-asserted-by":"publisher","unstructured":"Perry, DM., Kim, D., Samanta, R., et\u00a0al.: Semcluster: clustering of imperative programming assignments based on quantitative semantic features. In: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery, New York, NY, USA, PLDI 2019, pp. 860\u2013873 (2019). https:\/\/doi.org\/10.1145\/3314221.3314629","DOI":"10.1145\/3314221.3314629"},{"key":"554_CR38","unstructured":"Piech, C., Huang, J., Nguyen, A., et\u00a0al.: Learning program embeddings to propagate feedback on student code. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning, vol. 37, pp. 1093\u20131102. JMLR.org, ICML\u201915 (2015)"},{"key":"554_CR39","doi-asserted-by":"publisher","unstructured":"Poon, J.Y., Sugiyama, K., Tan, Y.F., et\u00a0al.: Instructor-centric source code plagiarism detection and plagiarism corpus. In: Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education. Association for Computing Machinery, New York, NY, USA, ITiCSE \u201912, pp. 122\u2013127 (2012). https:\/\/doi.org\/10.1145\/2325296.2325328","DOI":"10.1145\/2325296.2325328"},{"key":"554_CR40","doi-asserted-by":"publisher","unstructured":"Pu, Y., Narasimhan, K., Solar-Lezama, A., et\u00a0al.: Sk_p: a neural program corrector for MOOCs. In: Companion Proceedings of the 2016 ACM SIGPLAN International Conference on Systems, Programming, Languages and Applications: Software for Humanity. Association for Computing Machinery, New York, NY, USA, SPLASH Companion 2016, pp. 39\u201340 (2016). https:\/\/doi.org\/10.1145\/2984043.2989222","DOI":"10.1145\/2984043.2989222"},{"key":"554_CR41","doi-asserted-by":"publisher","unstructured":"Rivers, K., Koedinger, K.R.: A canonicalizing model for building programming tutors. In: Cerri, S.A., Clancey, W.J., Papadourakis, G., et\u00a0al. (eds.) Intelligent Tutoring Systems. Springer, Berlin, pp. 591\u2013593 (2012). https:\/\/doi.org\/10.1007\/978-3-642-30950-2_80","DOI":"10.1007\/978-3-642-30950-2_80"},{"key":"554_CR42","unstructured":"Rivers, K., Koedinger, K.R.: Automatic generation of programming feedback: a data-driven approach. In: The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013), pp. 50\u201359. Memphis, USA (2013)"},{"key":"554_CR43","doi-asserted-by":"publisher","unstructured":"Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web. Association for Computing Machinery, New York, NY, USA, WWW \u201910, pp. 1177\u20131178 (2010). https:\/\/doi.org\/10.1145\/1772690.1772862","DOI":"10.1145\/1772690.1772862"},{"key":"554_CR44","doi-asserted-by":"publisher","unstructured":"Wang, K., Singh, R., Su, Z.: Dynamic neural program embedding for program repair (2018). https:\/\/doi.org\/10.48550\/arXiv.1711.07163","DOI":"10.48550\/arXiv.1711.07163"},{"key":"554_CR45","doi-asserted-by":"publisher","unstructured":"Wang, K., Singh, R., Su, Z.: Search, align, and repair: data-driven feedback generation for introductory programming exercises. In: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery, New York, NY, USA, PLDI 2018, pp. 481\u2013495 (2018). https:\/\/doi.org\/10.1145\/3192366.3192384","DOI":"10.1145\/3192366.3192384"},{"key":"554_CR46","doi-asserted-by":"publisher","unstructured":"Weiss, K., Banse, C.: A language-independent analysis platform for source code (2022). https:\/\/doi.org\/10.48550\/arXiv.2203.08424","DOI":"10.48550\/arXiv.2203.08424"},{"issue":"4","key":"554_CR47","doi-asserted-by":"publisher","first-page":"360","DOI":"10.1109\/TSE.2003.1191799","volume":"29","author":"S Xu","year":"2003","unstructured":"Xu, S., Chee, Y.S.: Transformation-based diagnosis of student programs for programming tutoring systems. IEEE Trans. Softw. Eng. 29(4), 360\u2013384 (2003). https:\/\/doi.org\/10.1109\/TSE.2003.1191799","journal-title":"IEEE Trans. Softw. Eng."},{"key":"554_CR48","doi-asserted-by":"publisher","unstructured":"Yamaguchi, F., Golde, N., Arp, D., et\u00a0al.: Modeling and discovering vulnerabilities with code property graphs. In: 2014 IEEE Symposium on Security and Privacy, pp. 590\u2013604. IEEE, Berkeley, CA, USA (2014). https:\/\/doi.org\/10.1109\/SP.2014.44","DOI":"10.1109\/SP.2014.44"},{"key":"554_CR49","doi-asserted-by":"publisher","unstructured":"Zhang, F., Wu, D., Liu, P., et\u00a0al.: Program logic based software plagiarism detection. In: 2014 IEEE 25th International Symposium on Software Reliability Engineering, pp. 66\u201377. IEEE, Naples, Italy (2014). https:\/\/doi.org\/10.1109\/ISSRE.2014.18","DOI":"10.1109\/ISSRE.2014.18"},{"key":"554_CR50","doi-asserted-by":"publisher","unstructured":"D\u030cura\u010d\u00edk, M., Kr\u0161\u00e1k, E., Hrk\u00fat, P.: Scalable source code plagiarism detection using source code vectors clustering. In: 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), pp. 499\u2013502. IEEE, Beijing, China (2018). https:\/\/doi.org\/10.1109\/ICSESS.2018.8663708","DOI":"10.1109\/ICSESS.2018.8663708"}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-024-00554-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-024-00554-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-024-00554-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T14:58:09Z","timestamp":1755701889000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-024-00554-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,15]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["554"],"URL":"https:\/\/doi.org\/10.1007\/s41060-024-00554-5","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,15]]},"assertion":[{"value":"3 January 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}