{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T04:16:36Z","timestamp":1771647396872,"version":"3.50.1"},"reference-count":51,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2023,8,31]],"date-time":"2023-08-31T00:00:00Z","timestamp":1693440000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union","award":["101084201"],"award-info":[{"award-number":["101084201"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>The availability of code snippets in online repositories like GitHub has led to an uptick in code reuse, this way further supporting an open-source component-based development paradigm. The likelihood of code reuse rises when the code components or snippets are of high quality, especially in terms of readability, making their integration and upkeep simpler. Toward this direction, we have developed a dataset of code snippets that takes into account both the functional and the quality characteristics of the snippets. The dataset is based on the CodeSearchNet corpus and comprises additional information, including static analysis metrics, code violations, readability assessments, and source code similarity metrics. Thus, using this dataset, both software researchers and practitioners can conveniently find and employ code snippets that satisfy diverse functional needs while also demonstrating excellent readability and maintainability.<\/jats:p>","DOI":"10.3390\/data8090140","type":"journal-article","created":{"date-parts":[[2023,9,1]],"date-time":"2023-09-01T08:19:39Z","timestamp":1693556379000},"page":"140","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Employing Source Code Quality Analytics for Enriching Code Snippets Data"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6117-8222","authenticated-orcid":false,"given":"Thomas","family":"Karanikiotis","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0520-7225","authenticated-orcid":false,"given":"Themistoklis","family":"Diamantopoulos","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0235-6046","authenticated-orcid":false,"given":"Andreas","family":"Symeonidis","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2023,8,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1016\/S0164-1212(01)00148-0","article-title":"Challenges of Component-Based Development","volume":"61","author":"Crnkovic","year":"2002","journal-title":"J. Syst. Softw."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Brandt, J., Guo, P.J., Lewenstein, J., and Klemmer, S.R. (2008, January 10\u201318). Opportunistic Programming: How Rapid Ideation and Prototyping Occur in Practice. Proceedings of the 4th International Workshop on End-User Software Engineering, New York, NY, USA.","DOI":"10.1145\/1370847.1370848"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Nguyen, T., Rigby, P.C., Nguyen, A.T., Karanfil, M., and Nguyen, T.N. (2016, January 13\u201318). T2API: Synthesizing API Code Usage Templates from English Texts with Statistical Translation. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, NY, USA.","DOI":"10.1145\/2950290.2983931"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/j.jss.2018.04.060","article-title":"MULAPI: Improving API method recommendation with API usage location","volume":"142","author":"Xu","year":"2018","journal-title":"J. Syst. Softw."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Nguyen, P.T., Di Rocco, J., Di Ruscio, D., Ochoa, L., Degueule, T., and Di Penta, M. (2019, January 25\u201331). FOCUS: A Recommender System for Mining API Function Calls and Usage Patterns. Proceedings of the 41st International Conference on Software Engineering, IEEE Press, Montr\u00e9al, QC, Canada.","DOI":"10.1109\/ICSE.2019.00109"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gu, X., Zhang, H., Zhang, D., and Kim, S. (2016, January 13\u201318). Deep API Learning. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, NY, USA.","DOI":"10.1145\/2950290.2950334"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Cai, L., Wang, H., Huang, Q., Xia, X., Xing, Z., and Lo, D. (2019, January 3\u20139). BIKER: A Tool for Bi-Information Source Based API Method Recommendation. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, New York, NY, USA.","DOI":"10.1145\/3338906.3341174"},{"key":"ref_8","first-page":"1","article-title":"Bridging Semantic Gaps between Natural Languages and APIs with Word Embedding","volume":"46","author":"Li","year":"2018","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"192103","DOI":"10.1007\/s11432-018-9821-9","article-title":"Generative API Usage Code Recommendation with Parameter Concretization","volume":"62","author":"Chen","year":"2019","journal-title":"Sci. China Inf. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ponzanelli, L., Bacchelli, A., and Lanza, M. (2013, January 18\u201326). Seahawk: Stack Overflow in the IDE. Proceedings of the 2013 International Conference on Software Engineering, Piscataway, NJ, USA.","DOI":"10.1109\/ICSE.2013.6606701"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Campbell, B.A., and Treude, C. (2017, January 17\u201324). NLP2Code: Code Snippet Content Assist via Natural Language Tasks. Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution, Los Alamitos, CA, USA.","DOI":"10.1109\/ICSME.2017.56"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Diamantopoulos, T., Oikonomou, N., and Symeonidis, A. (2020, January 25\u201330). Extracting Semantics from Question-Answering Services for Snippet Reuse. Proceedings of the 23rd International Conference on Fundamental Approaches to Software Engineering, Dublin, Ireland.","DOI":"10.26226\/morressier.604907f41a80aac83ca25d36"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Gu, X., Zhang, H., and Kim, S. (2018, January 26\u201327). Deep Code Search. Proceedings of the 40th International Conference on Software Engineering, New York, NY, USA.","DOI":"10.1145\/3180155.3180167"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Papathomas, E., Diamantopoulos, T., and Symeonidis, A. (2022, January 2\u20137). Semantic Code Search in Software Repositories using Neural Machine Translation. Proceedings of the 25th International Conference on Fundamental Approaches to Software Engineering, Munich, Germany.","DOI":"10.1007\/978-3-030-99429-7_13"},{"key":"ref_15","unstructured":"(2023, August 28). ISO\/IEC 25010:2011. Available online: https:\/\/www.iso.org\/obp\/ui\/#iso:std:iso-iec:25010:ed-1:v1:en."},{"key":"ref_16","unstructured":"Spinellis, D. (2006). Code Quality: The Open Source Perspective, Adobe Press."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sedano, T. (2016, January 5\u20136). Code Readability Testing, an Empirical Study. Proceedings of the 2016 IEEE 29th International Conference on Software Engineering Education and Training (CSEET).","DOI":"10.1109\/CSEET.2016.36"},{"key":"ref_18","unstructured":"Pfleeger, S.L., and Atlee, J.M. (1998). Software Engineering: Theory and Practice, Pearson Education India."},{"key":"ref_19","unstructured":"Husain, H., Wu, H.H., Gazit, T., Allamanis, M., and Brockschmidt, M. (2019). CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1109\/TSE.2002.1019480","article-title":"CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code","volume":"28","author":"Kamiya","year":"2002","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Jiang, L., Misherghi, G., Su, Z., and Glondu, S. (2007, January 19\u201327). DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones. Proceedings of the 29th International Conference on Software Engineering, Minneapolis, MN, USA.","DOI":"10.1109\/ICSE.2007.30"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"White, M., Tufano, M., Vendome, C., and Poshyvanyk, D. (2016, January 3\u20137). Deep Learning Code Fragments for Code Clone Detection. Proceedings of the 31st IEEE\/ACM International Conference on Automated Software Engineering, New York, NY, USA.","DOI":"10.1145\/2970276.2970326"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1142\/S0218194016500133","article-title":"Structural Code Clone Detection Methodology Using Software Metrics","volume":"26","author":"Aktas","year":"2016","journal-title":"Int. J. Softw. Eng. Knowl. Eng."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Terragni, V., Liu, Y., and Cheung, S.C. (2016, January 18\u201320). CSNIPPEX: Automated Synthesis of Compilable Code Snippets from Q&A Sites. Proceedings of the 25th International Symposium on Software Testing and Analysis, New York, NY, USA.","DOI":"10.1145\/2931037.2931058"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Raghothaman, M., Wei, Y., and Hamadi, Y. (2016, January 18\u201320). SWIM: Synthesizing What i Mean: Code Search and Idiomatic Snippet Synthesis. Proceedings of the 38th International Conference on Software Engineering, New York, NY, USA.","DOI":"10.1145\/2884781.2884808"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Haiduc, S., Aponte, J., and Marcus, A. (2010, January 1\u20138). Supporting Program Comprehension with Source Code Summarization. Proceedings of the 32nd ACM\/IEEE International Conference on Software Engineering\u2014Volume 2, New York, NY, USA.","DOI":"10.1145\/1810295.1810335"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Katirtzis, N., Diamantopoulos, T., and Sutton, C. (2018, January 14\u201321). Summarizing Software API Usage Examples using Clustering Techniques. Proceedings of the 21th International Conference on Fundamental Approaches to Software Engineering, Thessaloniki, Greece.","DOI":"10.1007\/978-3-319-89363-1_11"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Janjic, W., Hummel, O., Schumacher, M., and Atkinson, C. (2013, January 18\u201319). An Unabridged Source Code Dataset for Research in Software Reuse. Proceedings of the 10th Working Conference on Mining Software Repositories, San Francisco, CA, USA.","DOI":"10.1109\/MSR.2013.6624047"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"104712","DOI":"10.1016\/j.dib.2019.104712","article-title":"Source code analysis dataset","volume":"27","author":"Gelman","year":"2019","journal-title":"Data Brief"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"e1958","DOI":"10.1002\/smr.1958","article-title":"A Comprehensive Model for Code Readability","volume":"30","author":"Scalabrino","year":"2018","journal-title":"J. Softw. Evol. Process"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"546","DOI":"10.1109\/TSE.2009.70","article-title":"Learning a Metric for Code Readability","volume":"36","author":"Buse","year":"2010","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Posnett, D., Hindle, A., and Devanbu, P. (2011, January 21\u201322). A Simpler Model of Software Readability. Proceedings of the 8th Working Conference on Mining Software Repositories, New York, NY, USA.","DOI":"10.1145\/1985441.1985454"},{"key":"ref_33","unstructured":"Dorn, J. (2012). A General Software Readability Model. [Master\u2019s Thesis, The University of Virginia]."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1002\/spe.4380250705","article-title":"ANTLR: A Predicated-LL(k) Parser Generator","volume":"25","author":"Parr","year":"1995","journal-title":"Softw. Pract. Exper."},{"key":"ref_35","unstructured":"Donnelly, C., and Stallman, R. (2015). Bison: The Yacc-Compatible Parser Generator, Free Software Foundation."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1145\/322139.322143","article-title":"The Tree-to-Tree Correction Problem","volume":"26","author":"Tai","year":"1979","journal-title":"J. ACM"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1670243.1670247","article-title":"The Pq-Gram Distance between Ordered Labeled Trees","volume":"35","author":"Augsten","year":"2008","journal-title":"ACM Trans. Database Syst."},{"key":"ref_38","first-page":"277","article-title":"Localizing Software Bugs using the Edit Distance of Call Traces","volume":"7","author":"Diamantopoulos","year":"2014","journal-title":"Int. J. Adv. Softw."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Parker, Z., Poe, S., and Vrbsky, S. (2013, January 4\u20136). Comparing nosql mongodb to an sql db. Proceedings of the 51st ACM Southeast Conference, Savannah, Georgia.","DOI":"10.1145\/2498328.2500047"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.infsof.2018.07.006","article-title":"Improving code readability classification using convolutional neural networks","volume":"104","author":"Mi","year":"2018","journal-title":"Inf. Softw. Technol."},{"key":"ref_41","first-page":"221","article-title":"Metric and Tool Support for Instant Feedback of Source Code Readability","volume":"15","author":"Choi","year":"2020","journal-title":"Inf. Softw. Technol."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Karanikiotis, T., Papamichail, M.D., Gonidelis, I., Karatza, D., and Symeonidis, A.L. (2020, January 7\u20139). A Data-driven Methodology towards Interpreting Readability against Software Properties. Proceedings of the 15th International Conference on Software Technologies, Held Online.","DOI":"10.5220\/0009891000610072"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Fakhoury, S., Roy, D., Hassan, S.A., and Arnaoudova, V. (2019, January 25\u201326). Improving Source Code Readability: Theory and Practice. Proceedings of the 27th International Conference on Program Comprehension, Montreal, QC, Canada.","DOI":"10.1109\/ICPC.2019.00014"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Roy, D., Fakhoury, S., Lee, J., and Arnaoudova, V. (2020, January 13\u201315). A Model to Detect Readability Improvements in Incremental Changes. Proceedings of the 28th International Conference on Program Comprehension, New York, NY, USA.","DOI":"10.1145\/3387904.3389255"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Papoudakis, A., Karanikiotis, T., and Symeonidis, A. (2022, January 11\u201313). A Mechanism for Automatically Extracting Reusable and Maintainable Code Idioms from Software Repositories. Proceedings of the 17th International Conference on Software Technologies (ICSOFT), Lisbon, Portugal.","DOI":"10.5220\/0011279300003266"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Diamantopoulos, T., Thomopoulos, K., and Symeonidis, A.L. (2016, January 14\u201316). QualBoa: Reusability-aware Recommendations of Source Code Components. Proceedings of the IEEE\/ACM 13th Working Conference on Mining Software Repositories, Austin, TX, USA.","DOI":"10.1145\/2901739.2903492"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Michailoudis, A., Diamantopoulos, T., and Symeonidis, A. (2023, January 10\u201312). Towards Readability-aware Recommendations of Source Code Snippets. Proceedings of the 18th International Conference on Software Technologies (ICSOFT), Rome, Italy.","DOI":"10.5220\/0012145500003538"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"100905","DOI":"10.1016\/j.cola.2019.100905","article-title":"A Nano-Pattern Language for Java","volume":"54","author":"Gil","year":"2019","journal-title":"J. Comput. Lang."},{"key":"ref_49","unstructured":"Diamantopoulos, T., Karagiannopoulos, G., and Symeonidis, A. (June, January 27). CodeCatch: Extracting Source Code Snippets from Online Sources. Proceedings of the IEEE\/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE), Gothenburg, Sweden."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1016\/j.infsof.2006.10.017","article-title":"Semantic Clustering: Identifying Topics in Source Code","volume":"49","author":"Kuhn","year":"2007","journal-title":"Inf. Softw. Technol."},{"key":"ref_51","unstructured":"Sillito, J., Maurer, F., Nasehi, S.M., and Burns, C. (2012, January 23\u201328). What Makes a Good Code Example? A Study of Programming Q&A in StackOverflow. Proceedings of the 2012 IEEE International Conference on Software Maintenance (ICSM), Trento, Italy."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/9\/140\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:44:25Z","timestamp":1760129065000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/9\/140"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,31]]},"references-count":51,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2023,9]]}},"alternative-id":["data8090140"],"URL":"https:\/\/doi.org\/10.3390\/data8090140","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,31]]}}}