{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,15]],"date-time":"2025-12-15T19:12:42Z","timestamp":1765825962409,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T00:00:00Z","timestamp":1691452800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>This article presents a dataset containing messages from the Digital Teaching Assistant (DTA) system, which records the results from the automatic verification of students\u2019 solutions to unique programming exercises of 11 various types. These results are automatically generated by the system, which automates a massive Python programming course at MIREA\u2014Russian Technological University (RTU MIREA). The DTA system is trained to distinguish between approaches to solve programming exercises, as well as to identify correct and incorrect solutions, using intelligent algorithms responsible for analyzing the source code in the DTA system using vector representations of programs based on Markov chains, calculating pairwise Jensen\u2013Shannon distances for programs and using a hierarchical clustering algorithm to detect high-level approaches used by students in solving unique programming exercises. In the process of learning, each student must correctly solve 11 unique exercises in order to receive admission to the intermediate certification in the form of a test. In addition, a motivated student may try to find additional approaches to solve exercises they have already solved. At the same time, not all students are able or willing to solve the 11 unique exercises proposed to them; some will resort to outside help in solving all or part of the exercises. Since all information about the interactions of the students with the DTA system is recorded, it is possible to identify different types of students. First of all, the students can be classified into 2 classes: those who failed to solve 11 exercises and those who received admission to the intermediate certification in the form of a test, having solved the 11 unique exercises correctly. However, it is possible to identify classes of typical, motivated and suspicious students among the latter group based on the proposed dataset. The proposed dataset can be used to develop regression models that will predict outbursts of student activity when interacting with the DTA system, to solve clustering problems, to identify groups of students with a similar behavior model in the learning process and to develop intelligent data classifiers that predict the students\u2019 behavior model and draw appropriate conclusions, not only at the end of the learning process but also during the course of it in order to motivate all students, even those who are classified as suspicious, to visualize the results of the learning process using various tools.<\/jats:p>","DOI":"10.3390\/data8080129","type":"journal-article","created":{"date-parts":[[2023,8,8]],"date-time":"2023-08-08T12:26:39Z","timestamp":1691497599000},"page":"129","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Anomaly Detection in Student Activity in Solving Unique Programming Exercises: Motivated Students against Suspicious Ones"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4516-3746","authenticated-orcid":false,"given":"Liliya A.","family":"Demidova","sequence":"first","affiliation":[{"name":"Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA\u2014Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia"}]},{"given":"Peter N.","family":"Sovietov","sequence":"additional","affiliation":[{"name":"Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA\u2014Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6418-6797","authenticated-orcid":false,"given":"Elena G.","family":"Andrianova","sequence":"additional","affiliation":[{"name":"Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA\u2014Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia"}]},{"given":"Anna A.","family":"Demidova","sequence":"additional","affiliation":[{"name":"Institute of Information Technologies, Federal State Budget Educational Institution of Higher Education, MIREA\u2014Russian Technological University, 78, Vernadsky Avenue, 119454 Moscow, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Wang, C.L., Dai, J., and Xu, L.J. (2022, January 22\u201324). Big Data and Data Mining in Education: A Bibliometrics Study from 2010 to 2022. Proceedings of the 2022 7th International Conference on Cloud Computing and Big Data Analytics, ICCCBDA, Chengdu, China.","DOI":"10.1109\/ICCCBDA55098.2022.9778874"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Baashar, Y., Alkawsi, G., Mustafa, A., Alkahtani, A.A., Alsariera, Y.A., Ali, A.Q., Hashim, W., and Tiong, S.K. (2022). Toward Predicting Student\u2019s Academic Performance Using Artificial Neural Networks (ANNs). Appl. Sci., 12.","DOI":"10.3390\/app12031289"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1155\/2022\/4151487","article-title":"Assessment and Evaluation of Different Machine Learning Algorithms for Predicting Student Performance","volume":"2022","author":"Alsariera","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Ordo\u00f1ez-Avila, R., Salgado Reyes, N., Meza, J., and Ventura, S. (2023). Data mining techniques for predicting teacher evaluation in higher education: A systematic literature review. Heliyon, 9.","DOI":"10.1016\/j.heliyon.2023.e13939"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"7","DOI":"10.32362\/2500-316X-2023-11-1-7-17","article-title":"Developing the data management component of an academic discipline program for an educational management information system","volume":"11","author":"Starichkova","year":"2023","journal-title":"Russ. Technol. J."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Queir\u00f3s, R.A.P., and Leal, J.P. (2012, January 3\u20135). PETCHA: A Programming Exercises Teaching Assistant. Proceedings of the 17th ACM Annual Conference on Innovation and Technology in Computer Science Education, Haifa, Israel.","DOI":"10.1145\/2325296.2325344"},{"key":"ref_7","unstructured":"Sadigh, D., Seshia, S.A., and Gupta, M. (2013). Proceedings of the Workshop on Embedded and Cyber-Physical Systems Education\u2014WESE \u201912, New York, NY, USA, 17\u201318 October 2019, Association for Computing Machinery."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tiam-Lee, T.J., and Sumi, K. (2018, January 7\u201310). Procedural generation of programming exercises with guides based on the student\u2019s emotion. Proceedings of the 2018 IEEE International Conference on Systems Man and Cybernetics (SMC), Miyazaki, Japan.","DOI":"10.1109\/SMC.2018.00255"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"81154","DOI":"10.1109\/ACCESS.2020.2990980","article-title":"Building a Comprehensive Automated Programming Assessment System","volume":"8","year":"2020","journal-title":"IEEE Access"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"3","DOI":"10.3390\/software1010002","article-title":"Automated Code Assessment for Education: Review, Classification and Perspectives on Techniques and Tools","volume":"1","year":"2022","journal-title":"Software"},{"key":"ref_11","unstructured":"Baxter, I.D., Yahin, A., Moura, L., Sant\u2019Anna, M., and Bier, L. (1998, January 16\u201319). Clone detection using abstract syntax trees. Proceedings of the International Conference on Software Maintenance (Cat. No. 98CB36272), Bethesda, MD, USA."},{"key":"ref_12","unstructured":"Jiang, L., Misherghi, G., Su, Z., and Glondu, S. (2007). Proceedings of the 29th International Conference on Software Engineering (ICSE\u201907), Minneapolis, MN, USA, 20\u201326 May 2007, IEEE."},{"key":"ref_13","unstructured":"Kustanto, C., and Liem, I. (2009). Proceedings of the 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel\/Distributed Computing, Daegu, Republic of Korea, 27\u201329 May 2009, IEEE."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1093\/comjnl\/bxh119","article-title":"PDetect: A Clustering Approach for Detecting Plagiarism in Source Code Datasets","volume":"48","author":"Moussiades","year":"2005","journal-title":"Comput. J."},{"key":"ref_15","unstructured":"Yasaswi, J., Kailash, S., Chilupuri, A., Purini, S., and Jawahar, C.V. (2017). Proceedings of the 10th Innovations in Software Engineering Conference, Jaipur, India, 5\u20137 February 2017, Association for Computing Machinery."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sovietov, P.N., and Gorchakov, A.V. (2022, January 26\u201327). Digital Teaching Assistant for the Python Programming Course. Proceedings of the 2022 2nd International Conference on Technology Enhanced Learning in Higher Education (TELE), Lipetsk, Russia.","DOI":"10.1109\/TELE55498.2022.9801060"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"7","DOI":"10.32362\/2500-316X-2022-10-3-7-23","article-title":"Pedagogical Design of a Digital Teaching Assistant in Massive Professional Training for the Digital Economy","volume":"10","author":"Andrianova","year":"2022","journal-title":"Russ. Technol. J."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sovietov, P. (2021, January 24\u201325). Automatic Generation of Programming Exercises. Proceedings of the 2021 1st International Conference on Technology Enhanced Learning in Higher Education (TELE), Lipetsk, Russia.","DOI":"10.1109\/TELE52840.2021.9482762"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"51","DOI":"10.21667\/1995-4565-2022-81-51-64","article-title":"Clustering of Program Source Text Representations Based on Markov Chains","volume":"81","author":"Demidova","year":"2022","journal-title":"Vestn. Ryazan State Radio Eng. Univ."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Demidova, L.A., and Gorchakov, A.V. (2022). Classification of Program Texts Represented as Markov Chains with Biology-Inspired Algorithms-Enhanced Extreme Learning Machines. Algorithms, 15.","DOI":"10.3390\/a15090329"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1143","DOI":"10.1002\/cjce.5450820602","article-title":"Applications of Markov chains in particulate process engineering: A review","volume":"82","author":"Berthiaux","year":"2004","journal-title":"Can. J. Chem. Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1109\/18.61115","article-title":"Divergence Measures Based on the Shannon Entropy","volume":"37","author":"Lin","year":"1991","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Nielsen, F. (2019). On the Jensen\u2013Shannon Symmetrization of Distances Relying on Abstract Means. Entropy, 21.","DOI":"10.3390\/e21050485"},{"key":"ref_24","first-page":"130","article-title":"A Statistical Method for Evaluating Systematic Relationships","volume":"11","author":"Sokal","year":"1957","journal-title":"Evolution"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1080\/01621459.1963.10500845","article-title":"Hierarchical Grouping to Optimize an Objective Function","volume":"58","author":"Ward","year":"1963","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Demidova, L.A., Andrianova, E.G., Sovietov, P.N., and Gorchakov, A.V. (2023). Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Data, 8.","DOI":"10.3390\/data8060109"},{"key":"ref_27","first-page":"89","article-title":"Automated Program Text Analysis Using Representations Based on Markov Chains and Extreme Learning Machines","volume":"82","author":"Gorchakov","year":"2022","journal-title":"Vestn. Ryazan State Radio Eng. Univ."},{"key":"ref_28","unstructured":"(2023, June 28). Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant. Available online: https:\/\/zenodo.org\/record\/7799972."},{"key":"ref_29","unstructured":"(2023, June 28). PEP 8\u2014Style Guide for Python Code. Available online: https:\/\/peps.python.org\/pep-0008\/."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1109\/MS.2016.147","article-title":"Cyclomatic complexity","volume":"33","author":"Ebert","year":"2016","journal-title":"IEEE Softw."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv.","DOI":"10.21105\/joss.00861"},{"key":"ref_32","first-page":"9129","article-title":"Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization","volume":"22","author":"Wang","year":"2021","journal-title":"J. Mach. Learn. Res."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Demidova, L.A., and Gorchakov, A.V. (2022). Fuzzy Information Discrimination Measures and Their Application to Low Dimensional Embedding Construction in the UMAP Algorithm. J. Imaging, 8.","DOI":"10.3390\/jimaging8040113"},{"key":"ref_34","unstructured":"(2023, June 28). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Available online: https:\/\/umap-learn.readthedocs.io\/en\/latest\/_modules\/umap\/umap_html."},{"key":"ref_35","unstructured":"Jain, A.K., and Dubes, R.C. (1988). Algorithms for Clustering Data, Prentice-Hall."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1109\/TPAMI.2002.1017616","article-title":"An efficient k-means clustering algorithm: Analysis and implementation","volume":"24","author":"Kanungo","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"80716","DOI":"10.1109\/ACCESS.2020.2988796","article-title":"Unsupervised K-Means Clustering Algorithm","volume":"8","author":"Sinaga","year":"2020","journal-title":"IEEE Access"},{"key":"ref_38","first-page":"226","article-title":"A density-based algorithm for discovering clusters in large spatial databases with noise","volume":"96","author":"Ester","year":"1996","journal-title":"KDD"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Khan, K., Rehman, S.U., Aziz, K., Fong, S., and Sarasvady, S. (2014, January 17\u201319). DBSCAN: Past, present and future. Proceedings of the Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), Bangalore, India.","DOI":"10.1109\/ICADIWT.2014.6814687"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Deng, D. (2020, January 25\u201327). DBSCAN Clustering Algorithm Based on Density. Proceedings of the 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), Hefei, China.","DOI":"10.1109\/IFEEA51475.2020.00199"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shahapure, K.R., and Nicholas, C. (2020, January 6\u20139). Cluster Quality Analysis Using Silhouette Score. Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia.","DOI":"10.1109\/DSAA49011.2020.00096"},{"key":"ref_42","unstructured":"(2023, June 28). Scikit-Learn. Available online: https:\/\/scikit-learn.org\/stable\/."},{"key":"ref_43","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_44","unstructured":"(2023, June 28). Selecting the Number of Clusters with Silhouette Analysis on KMeans Clustering. Available online: https:\/\/scikit-learn.org\/stable\/auto_examples\/cluster\/plot_kmeans_silhouette_analysis.html#sphx-glr-auto-examples-cluster-plot-kmeans-silhouette-analysis-py."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/8\/129\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:27:58Z","timestamp":1760128078000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/8\/129"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,8]]},"references-count":44,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["data8080129"],"URL":"https:\/\/doi.org\/10.3390\/data8080129","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2023,8,8]]}}}