{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,16]],"date-time":"2026-05-16T07:10:07Z","timestamp":1778915407420,"version":"3.51.4"},"reference-count":64,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T00:00:00Z","timestamp":1640304000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-sa\/4.0\/"}],"funder":[{"name":"Luxembourg National Research Fund","award":["14591304 and 11693861"],"award-info":[{"award-number":["14591304 and 11693861"]}]},{"name":"Luxembourg Ministry of Foreign and European Affairs through their Digital4Development"},{"name":"European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme","award":["949014"],"award-info":[{"award-number":["949014"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2022,4,30]]},"abstract":"<jats:p>\n            Recent successes in training word embeddings for\n            <jats:bold>Natural Language Processing<\/jats:bold>\n            (\n            <jats:bold>NLP<\/jats:bold>\n            ) tasks have encouraged a wave of research on representation learning for source code, which builds on similar NLP methods. The overall objective is then to produce code embeddings that capture the maximum of program semantics. State-of-the-art approaches invariably rely on a syntactic representation (i.e., raw lexical tokens, abstract syntax trees, or intermediate representation tokens) to generate embeddings, which are criticized in the literature as non-robust or non-generalizable. In this work, we investigate a novel embedding approach based on the intuition that source code has visual patterns of semantics. 
We further use these patterns to address the outstanding challenge of identifying semantic code clones. We propose the\n            <jats:sc>\n              <jats:bold>WySiWiM<\/jats:bold>\n            <\/jats:sc>\n            \u00a0(\n            <jats:italic>\u2018<\/jats:italic>\n            <jats:bold>\u2018What You See Is What It Means<\/jats:bold>\n            <jats:italic>\u201d<\/jats:italic>\n            ) approach where visual representations of source code are fed into powerful pre-trained image classification neural networks from the field of computer vision to benefit from the practical advantages of transfer learning. We evaluate the proposed embedding approach on the task of vulnerable code prediction in source code and on two variations of the task of semantic code clone identification: code clone detection (a binary classification problem), and code classification (a multi-classification problem). We show with experiments on the BigCloneBench (Java), Open Judge (C) that although simple, our\n            <jats:sc>WySiWiM<\/jats:sc>\n            \u00a0approach performs as effectively as state-of-the-art approaches such as ASTNN or TBCNN. We also showed with data from NVD and SARD that\n            <jats:sc>WySiWiM<\/jats:sc>\n            \u00a0representation can be used to learn a vulnerable code detector with reasonable performance (accuracy \u223c90%). We further explore the influence of different steps in our approach, such as the choice of visual representations or the classification algorithm, to eventually discuss the promises and limitations of this research direction.\n          <\/jats:p>","DOI":"10.1145\/3485135","type":"journal-article","created":{"date-parts":[[2021,12,24]],"date-time":"2021-12-24T14:22:36Z","timestamp":1640355756000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["What You See is What it Means! 
Semantic Representation Learning of Code based on Visualization and Transfer Learning"],"prefix":"10.1145","volume":"31","author":[{"given":"Patrick","family":"Keller","sequence":"first","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3151-9433","authenticated-orcid":false,"given":"Abdoul Kader","family":"Kabor\u00e9","sequence":"additional","affiliation":[{"name":"University of Luxembourg, CITADEL at Universit\u00e9 Virtuelle du Burkina Faso, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Laura","family":"Plein","sequence":"additional","affiliation":[{"name":"Saarland University, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jacques","family":"Klein","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yves","family":"Le Traon","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tegawend\u00e9 F.","family":"Bissyand\u00e9","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,12,24]]},"reference":[{"key":"e_1_3_3_2_2","unstructured":"Miltiadis Allamanis. 2018. The Adverse Effects of Code Duplication in Machine Learning Models of Code. CoRR abs\/1812.06469 (2018). arXiv:1812.06469 http:\/\/arxiv.org\/abs\/1812.06469."},{"key":"e_1_3_3_3_2","unstructured":"Uri Alon Omer Levy and Eran Yahav. 2018. code2seq: Generating Sequences from Structured Representations of Code. CoRR abs\/1808.01400 (2018). 
arXiv:1808.01400 http:\/\/arxiv.org\/abs\/1808.01400."},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290353"},{"key":"e_1_3_3_5_2","unstructured":"Ambient Software Evolution Group. 2013. IJaDataset 2.0. Retrieved on 10 August 2020 from https:\/\/sites.google.com\/site\/asegsecold\/projects\/seclone."},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.73"},{"key":"e_1_3_3_7_2","first-page":"49","volume-title":"Computing Science and Statistics: Proceedings of the 24th Symposium on the Interface","volume":"24","author":"Baker B. S.","year":"1992","unstructured":"B. S. Baker. 1992. A program for identifying duplicated code. In Computing Science and Statistics: Proceedings of the 24th Symposium on the Interface, Vol. 24. 49\u201357."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.5555\/850947.853341"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2007.70725"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-009-5152-4"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.5555\/3327144.3327276"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380389"},{"key":"e_1_3_3_13_2","unstructured":"Zimin Chen and Martin Monperrus. 2019. A Literature Study of Embeddings on Source Code. CoRR abs\/1904.03061 (2019). arXiv:1904.03061 http:\/\/arxiv.org\/abs\/1904.03061"},{"issue":"101","key":"e_1_3_3_14_2","first-page":"102","article-title":"Dawnbench: An end-to-end deep learning benchmark and competition","volume":"100","author":"Coleman Cody","year":"2017","unstructured":"Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris R\u00e9, and Matei Zaharia. 2017. Dawnbench: An end-to-end deep learning benchmark and competition. 
Training 100, 101 (2017), 102.","journal-title":"Training"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022627411411"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1053964"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_3_18_2","unstructured":"FaCoY. 2017. https:\/\/github.com\/facoy\/facoy."},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2019.00025"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.5555\/541500"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_3_22_2","unstructured":"Mi-Young Huh Pulkit Agrawal and Alexei A. Efros. 2016. What makes ImageNet good for transfer learning? CoRR abs\/1608.08614 (2016). arXiv:1608.08614 http:\/\/arxiv.org\/abs\/1608.08614."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2017.156"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2007.30"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/1572272.1572283"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CSMR.2010.33"},{"issue":"1","key":"e_1_3_3_27_2","first-page":"1005","article-title":"A survey on image classification approaches and techniques","volume":"2","author":"Kamavisdar Pooja","year":"2013","unstructured":"Pooja Kamavisdar, Sonam Saluja, and Sonu Agrawal. 2013. A survey on image classification approaches and techniques. 
International Journal of Advanced Research in Computer and Communication Engineering 2, 1 (2013), 1005\u20131009.","journal-title":"International Journal of Advanced Research in Computer and Communication Engineering"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2002.1019480"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/1985793.1985835"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/832308.837142"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999134.2999257"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/WCRE.2013.6671332"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2017.46"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2889160.2889204"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.3390\/app10051692"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/1251254.1251274"},{"key":"e_1_3_3_38_2","unstructured":"Zhen Li Deqing Zou Shouhuai Xu Hai Jin Yawei Zhu Zhaoxuan Chen Sujuan Wang and Jialai Wang. 2018. SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities. CoRR abs\/1807.06756 (2018). arXiv:1807.06756 http:\/\/arxiv.org\/abs\/1807.06756."},{"key":"e_1_3_3_39_2","unstructured":"Zhen Li Deqing Zou Shouhuai Xu Xinyu Ou Hai Jin Sujuan Wang Zhijun Deng and Yuyi Zhong. 2018. VulDeePecker: A Deep Learning-Based System for Vulnerability Detection. CoRR abs\/1801.01681 (2018). 
arXiv:1801.01681 http:\/\/arxiv.org\/abs\/1801.01681."},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150522"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1080\/01431160600746456"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/872023.872542"},{"issue":"34","key":"e_1_3_3_43_2","first-page":"597","article-title":"Design principles and design patterns","volume":"1","author":"Martin Robert C.","year":"2000","unstructured":"Robert C. Martin. 2000. Design principles and design patterns. Object Mentor 1, 34 (2000), 597.","journal-title":"Object Mentor"},{"key":"e_1_3_3_44_2","unstructured":"Kris McGuffie and Alex Newhouse. 2020. The Radicalization Risks of GPT-3 and Advanced Neural Language Models. CoRR abs\/2009.06807 (2020). arXiv:2009.06807 https:\/\/arxiv.org\/abs\/2009.06807."},{"key":"e_1_3_3_45_2","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013). https:\/\/arxiv.org\/abs\/1301.3781."},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999959"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.5555\/3015812.3016002"},{"key":"e_1_3_3_48_2","article-title":"National Vulnerability Database","author":"Technology National Institute of Standards and","year":"2018","unstructured":"National Institute of Standards and Technology. 2018. National Vulnerability Database. Retrieved on 16 August, 2020 http:\/\/nvd.nist.gov\/.","journal-title":"http:\/\/nvd.nist.gov\/"},{"key":"e_1_3_3_49_2","article-title":"Software Assurance Reference Dataset","author":"Technology National Institute of Standards and","year":"2018","unstructured":"National Institute of Standards and Technology. 2018. Software Assurance Reference Dataset. 
Retrieved on 16 August, 2020 https:\/\/samate.nist.gov\/SRD\/index.php.","journal-title":"https:\/\/samate.nist.gov\/SRD\/index.php"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_3_3_51_2","article-title":"Pytorch","volume":"1","author":"Paszke Adam","year":"2017","unstructured":"Adam Paszke, Sam Gross, Soumith Chintala, and Gregory Chanan. 2017. Pytorch. Computer Software. Vers. 0.3 1 (2017).","journal-title":"Computer Software. Vers. 0.3"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSR.2019.00064"},{"key":"e_1_3_3_54_2","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1109\/IWSC.2018.8327318","volume-title":"2018 IEEE 12th International Workshop on Software Clones (IWSC\u201918)","author":"Ragkhitwetsagul Chaiyong","year":"2018","unstructured":"Chaiyong Ragkhitwetsagul, Jens Krinke, and Bruno Marnette. 2018. A picture is worth a thousand words: Code clone detection based on image similarity. In 2018 IEEE 12th International Workshop on Software Clones (IWSC\u201918). IEEE, 44\u201350."},{"key":"e_1_3_3_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.scico.2009.02.007"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236026"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950321"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950321"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2014.77"},{"key":"e_1_3_3_60_2","article-title":"Common Weakness Enumeration","author":"Corporation The MITRE","year":"2018","unstructured":"The MITRE Corporation. 2018. Common Weakness Enumeration. 
Retrieved on 16 August, 2020 https:\/\/cwe.mitre.org\/.","journal-title":"https:\/\/cwe.mitre.org\/"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3196398.3196431"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.5555\/3172077.3172312"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00086"},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045289"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2020.3004555"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485135","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3485135","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:18:35Z","timestamp":1750191515000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3485135"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,24]]},"references-count":64,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4,30]]}},"alternative-id":["10.1145\/3485135"],"URL":"https:\/\/doi.org\/10.1145\/3485135","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12,24]]},"assertion":[{"value":"2020-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-12-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}