{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T01:09:13Z","timestamp":1768784953026,"version":"3.49.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2024,11,22]],"date-time":"2024-11-22T00:00:00Z","timestamp":1732233600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,11,30]]},"abstract":"<jats:p>\n            Vulnerability detection is a critical problem in software security and attracts growing attention both from academia and industry. Traditionally, software security is safeguarded by designated rule-based detectors that heavily rely on empirical expertise, requiring tremendous effort from software experts to generate rule repositories for large code corpus. Recent advances in deep learning, especially Graph Neural Networks (GNN), have uncovered the feasibility of automatic detection of a wide range of software vulnerabilities. However, prior learning-based works only break programs down into a sequence of word tokens for extracting contextual features of codes, or apply GNN largely on homogeneous graph representation (e.g., AST) without discerning complex types of underlying program entities (e.g., methods, variables). In this work, we are one of the first to explore heterogeneous graph representation in the form of Code Property Graph and adapt a well-known heterogeneous graph network with a dual-supervisor structure for the corresponding graph learning task. Using the prototype built, we have conducted extensive experiments on both synthetic datasets and real-world projects. 
Compared with the state-of-the-art baselines, the results demonstrate superior performance in vulnerability detection (average F1 improvements over 10% in real-world projects) and language-agnostic transferability from C\/C\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\({+}{+}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            to other programming languages (average F1 improvements over 11%).\n          <\/jats:p>","DOI":"10.1145\/3674729","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T16:53:09Z","timestamp":1719593589000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["<i>DSHGT<\/i>\n            : Dual-Supervisors Heterogeneous Graph Transformer\u2014A Pioneer Study of Using Heterogeneous Graph Learning for Detecting Software Vulnerabilities"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7195-4472","authenticated-orcid":false,"given":"Tiehua","family":"Zhang","sequence":"first","affiliation":[{"name":"Tongji University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9739-8316","authenticated-orcid":false,"given":"Rui","family":"Xu","sequence":"additional","affiliation":[{"name":"Ping An Technology, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1104-1987","authenticated-orcid":false,"given":"Jianping","family":"Zhang","sequence":"additional","affiliation":[{"name":"Fudan University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6903-0294","authenticated-orcid":false,"given":"Yuze","family":"Liu","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-0860-6085","authenticated-orcid":false,"given":"Xin","family":"Chen","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-4351-0598","authenticated-orcid":false,"given":"Jun","family":"Yin","sequence":"additional","affiliation":[{"name":"Ant Group, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2572-2355","authenticated-orcid":false,"given":"Xi","family":"Zheng","sequence":"additional","affiliation":[{"name":"Macquarie University, Sydney, Australia"}]}],"member":"320","published-online":{"date-parts":[[2024,11,22]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.211"},{"key":"e_1_3_2_3_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Allamanis Miltiadis","year":"2018","unstructured":"Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. 2018. Learning to represent programs with graphs. In Proceedings of the International Conference on Learning Representations. 1\u201317."},{"key":"e_1_3_2_4_2","first-page":"1","volume-title":"Proceedings of the ACM on Programming Languages","volume":"3","author":"Bader Johannes","year":"2019","unstructured":"Johannes Bader, Andrew Scott, Michael Pradel, and Satish Chandra. 2019. Getafix: Learning to fix bugs automatically. Proceedings of the ACM on Programming Languages 3, OOPSLA (2019), 1\u201327."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-022-10375-2"},{"key":"e_1_3_2_6_2","first-page":"1456","volume-title":"Proceedings of the International Conference on Software Engineering","author":"Cao Sicong","year":"2022","unstructured":"Sicong Cao, Xiaobing Sun, Lili Bo, Rongxin Wu, Bin Li, and Chuanqi Tao. 2022. MVD: Memory-related vulnerability detection based on flow-sensitive graph neural networks. In Proceedings of the International Conference on Software Engineering. 
1456\u20131468."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5747"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3436877"},{"key":"e_1_3_2_9_2","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/3180445.3180453","volume-title":"Proceedings of the ACM International Workshop on Security and Privacy Analytics","author":"Chernis Boris","year":"2018","unstructured":"Boris Chernis and Rakesh Verma. 2018. Machine learning methods for software vulnerability detection. Proceedings of the ACM International Workshop on Security and Privacy Analytics. 31\u201339."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2010.06.003"},{"issue":"12","key":"e_1_3_2_11_2","first-page":"4818","article-title":"An empirical study on the usage of transformer models for code completion","volume":"48","author":"Ciniselli Matteo","year":"2021","unstructured":"Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Antonio Mastropaolo, Emad Aghajani, Denys Poshyvanyk, Massimiliano Di Penta, and Gabriele Bavota. 2021. An empirical study on the usage of transformer models for code completion. IEEE Transactions on Software Engineering 48, 12 (2021), 4818\u20134837.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"e_1_3_2_12_2","first-page":"1","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Daya Guo","year":"2021","unstructured":"Guo Daya, Ren Shuo, Lu Shuai, Feng Zhangyin, Tang Duyu, Liu Shujie, Zhou Long, Duan Nan, Svyatkovskiy Alexey, Fu Shengyu, Tufano Michele, Deng Shaokun, Clement Colin, Drain Dawn, Sundaresan Neel, Yin Jian, Jiang Daxin, and Zhou Ming. 2021. GraphCodeBert: Pre-training code representations with data flow. In Proceedings of the International Conference on Learning Representations. 
1\u201312."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098036"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/502059.502041"},{"key":"e_1_3_2_15_2","first-page":"1","volume-title":"Proceedings of the CLEF (Online Working Notes\/Labs\/Workshop)","author":"Ferschke Oliver","year":"2012","unstructured":"Oliver Ferschke, Iryna Gurevych, and Marc Rittberger. 2012. FlawFinder: A modular system for predicting quality flaws in Wikipedia. In Proceedings of the CLEF (Online Working Notes\/Labs\/Workshop). 1\u201310."},{"key":"e_1_3_2_16_2","first-page":"249","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics","author":"Glorot Xavier","year":"2010","unstructured":"Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 249\u2013256."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2021.103009"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3366423.3380027"},{"key":"e_1_3_2_19_2","unstructured":"Katikapalli S. Kalyan Ajit Rajasekharan and Sivanesan Sangeetha. 2021. AMMUS: A survey of transformer-based pretrained models in natural language processing. Retrieved from https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1532046421003117"},{"key":"e_1_3_2_20_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv:1412.6980 . Retrieved from https:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_21_2","unstructured":"Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 . Retrieved from https:\/\/arxiv.org\/pdf\/1609.02907"},{"key":"e_1_3_2_22_2","unstructured":"Ted Kremenek. 2008. Finding Software Bugs with the Clang Static Analyzer. Apple Inc. 2008\u201308. 
Retrieved from https:\/\/llvm.org\/devmtg\/2008-08\/Kremenek_StaticAnalyzer.pdf"},{"key":"e_1_3_2_23_2","first-page":"1188","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Le Quoc","year":"2014","unstructured":"Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning. 1188\u20131196."},{"key":"e_1_3_2_24_2","unstructured":"Yujia Li Daniel Tarlow Marc Brockschmidt and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv:1511.05493 . Retrieved from https:\/\/arxiv.org\/abs\/1511.05493"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2021.3076142"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2021.3051525"},{"key":"e_1_3_2_27_2","unstructured":"Zhen Li Deqing Zou Shouhuai Xu Xinyu Ou Hai Jin Sujuan Wang Zhijun Deng and Yuyi Zhong. 2018. Vuldeepecker: A deep learning-based system for vulnerability detection. arXiv:1801.01681 . Retrieved from https:\/\/arxiv.org\/abs\/1801.01681"},{"key":"e_1_3_2_28_2","first-page":"10","volume-title":"Proceedings of the Workshop on Financial Technology and Natural Language Processing","author":"Liang Rong","year":"2022","unstructured":"Rong Liang, Tiehua Zhang, Yujie Lu, Yuze Liu, Zhen Huang, and Xin Chen. 2022. AstBERT: Enabling language model for financial code understanding with abstract syntax trees. In Proceedings of the Workshop on Financial Technology and Natural Language Processing. 10\u201317."},{"key":"e_1_3_2_29_2","first-page":"1825","volume-title":"Proceedings of the IEEE","volume":"108","author":"Lin Guanjun","year":"2020","unstructured":"Guanjun Lin, Sheng Wen, Qing-Long Han, Jun Zhang, and Yang Xiang. 2020. Software vulnerability detection using deep neural networks: A survey. 
Proceedings of the IEEE 108, 10 (2020), 1825\u20131848."},{"key":"e_1_3_2_30_2","first-page":"2469","volume-title":"IEEE Transactions on Dependable and Secure Computing","volume":"18","author":"Lin Guanjun","year":"2019","unstructured":"Guanjun Lin, Jun Zhang, Wei Luo, Lei Pan, Olivier De Vel, Paul Montague, and Yang Xiang. 2019. Software vulnerability discovery via learning multi-domain knowledge bases. IEEE Transactions on Dependable and Secure Computing 18, 5 (2019), 2469\u20132485."},{"key":"e_1_3_2_31_2","unstructured":"Tomas Mikolov Kai Chen Greg Corrado and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781 . Retrieved from https:\/\/arxiv.org\/abs\/1301.3781"},{"key":"e_1_3_2_32_2","first-page":"1","volume-title":"Proceedings of the ACM on Programming Languages 2, OOPSLA","author":"Pradel Michael","year":"2018","unstructured":"Michael Pradel and Koushik Sen. 2018. Deepbugs: A learning approach to name-based bug detection. Proceedings of the ACM on Programming Languages 2, OOPSLA (2018), 1\u201325."},{"key":"e_1_3_2_33_2","volume-title":"International Conference on Software Engineering","author":"Sadowski Caitlin","year":"2015","unstructured":"Caitlin Sadowski, Jeffrey van Gogh, Ciera Jaspan, Emma Soederberg, and Collin Winter. 2015. Tricorder: Building a program analysis ecosystem. In International Conference on Software Engineering."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2010.81"},{"key":"e_1_3_2_35_2","first-page":"257","volume-title":"Proceedings of the Annual Computer Security Applications Conference","author":"Viega John","year":"2001","unstructured":"John Viega, J. T. Bloch, Yoshi Kohno, and Gary McGraw. 2001. A static vulnerability scanner for C and C++ code. In Proceedings of the Annual Computer Security Applications Conference. 
257\u2013269."},{"key":"e_1_3_2_36_2","first-page":"13","volume-title":"Proceedings of the IEEE\/ACM International Conference on Automated Software Engineering","author":"Wan Yao","year":"2019","unstructured":"Yao Wan, Jingdong Shu, Yulei Sui, Guandong Xu, Zhou Zhao, Jian Wu, and Philip Yu. 2019. Multi-modal attention network learning for semantic source code retrieval. In Proceedings of the IEEE\/ACM International Conference on Automated Software Engineering. 13\u201325."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2020.3044773"},{"key":"e_1_3_2_38_2","first-page":"2249","volume-title":"Proceedings of the International Conference on Software Engineering","author":"Wang Wenbo","year":"2023","unstructured":"Wenbo Wang, Tien N. Nguyen, Shaohua Wang, Yi Li, Jiyuan Zhang, and Aashish Yadavally. 2023. DeepVD: Toward class-separation features for neural network vulnerability detection. In Proceedings of the International Conference on Software Engineering. 2249\u20132261."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2022.3177455"},{"key":"e_1_3_2_40_2","first-page":"2275","volume-title":"Proceedings of the International Conference on Software Engineering","author":"Wen Xin-Cheng","year":"2023","unstructured":"Xin-Cheng Wen, Yupan Chen, Cuiyun Gao, Hongyu Zhang, Jie M. Zhang, and Qing Liao. 2023. Vulnerability detection with graph simplification and enhanced graph representation learning. In Proceedings of the International Conference on Software Engineering. 2275\u20132286."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2014.44"},{"key":"e_1_3_2_42_2","first-page":"1","volume-title":"Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining Workshop on Deep Learning on Graphs","author":"Zhang Tiehua","year":"2022","unstructured":"Tiehua Zhang, Yuze Liu, Xin Chen, Xiaowei Huang, Feng Zhu, and Xi Zheng. 2022. 
GPS: A policy-driven sampling approach for graph representation learning. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining Workshop on Deep Learning on Graphs. 1\u20137."},{"key":"e_1_3_2_43_2","first-page":"1536","volume-title":"Proceedings of the Empirical Methods in Natural Language Processing","author":"Zhangyin Feng","year":"2020","unstructured":"Feng Zhangyin, Guo Daya, Tang Duyu, Duan Nan, Feng Xiaocheng, Gong Ming, Shou Linjun, Qin Bing, Liu Ting, Jiang Daxin, and Zhou Ming. 2020. CodeBERT: A pre-trained model for programming and natural languages. In Proceedings of the Empirical Methods in Natural Language Processing. 1536\u20131547."},{"key":"e_1_3_2_44_2","first-page":"1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"32","author":"Zhou Yaqin","year":"2019","unstructured":"Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 32. 1\u201311."},{"key":"e_1_3_2_45_2","first-page":"2224","volume-title":"IEEE Transactions on Dependable and Secure Computing","volume":"18","author":"Zou Deqing","year":"2019","unstructured":"Deqing Zou, Sujuan Wang, Shouhuai Xu, Zhen Li, and Hai Jin. 2019. \\(\\mu\\) VulDeePecker: A deep learning-based system for multiclass vulnerability detection. 
IEEE Transactions on Dependable and Secure Computing 18, 5 (2019), 2224\u20132236."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674729","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3674729","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:57:50Z","timestamp":1750294670000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3674729"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":44,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,11,30]]}},"alternative-id":["10.1145\/3674729"],"URL":"https:\/\/doi.org\/10.1145\/3674729","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,22]]},"assertion":[{"value":"2023-07-27","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}