{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,12]],"date-time":"2026-07-12T14:08:19Z","timestamp":1783865299338,"version":"3.55.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA","license":[{"start":{"date-parts":[[2019,10,10]],"date-time":"2019-10-10T00:00:00Z","timestamp":1570665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2019,10,10]]},"abstract":"<jats:p>Bug detection has been shown to be an effective way to help developers detect bugs early, thus saving much effort and time in the software development process. Recently, deep learning-based bug detection approaches have outperformed traditional machine learning-based approaches, rule-based program analysis approaches, and mining-based approaches. However, they are still limited in detecting bugs that involve multiple methods, and they suffer from a high rate of false positives. In this paper, we propose a combined approach that uses contexts and an attention neural network to overcome those limitations. We propose to use the Program Dependence Graph (PDG) and Data Flow Graph (DFG) as the global context to connect the method under investigation with the other relevant methods that might contribute to the buggy code. The global context is complemented by the local context extracted from the paths on the AST built from the method\u2019s body. The use of PDG and DFG enables our model to reduce the false positive rate, while, to compensate for the potential reduction in recall, we use the attention neural network mechanism to put more weight on the buggy paths in the source code. 
That is, paths similar to the buggy paths are ranked higher, thus improving the recall of our model. We conducted several experiments to evaluate our approach on a very large dataset with more than 4.973M methods in 92 different project versions. The results show that our tool achieves a relative improvement of up to 160% in F-score over the state-of-the-art bug detection approaches. Our tool detects 48 true bugs in the list of top 100 reported bugs, 24 more than the baseline approaches. We also report that our representation is better suited for bug detection, with a relative improvement of up to 206% in accuracy over the other representations.<\/jats:p>","DOI":"10.1145\/3360588","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":141,"title":["Improving bug detection via context-based code representation learning and attention-based neural networks"],"prefix":"10.1145","volume":"3","author":[{"given":"Yi","family":"Li","sequence":"first","affiliation":[{"name":"New Jersey Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shaohua","family":"Wang","sequence":"additional","affiliation":[{"name":"New Jersey Institute of Technology, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tien N.","family":"Nguyen","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Son","family":"Van Nguyen","sequence":"additional","affiliation":[{"name":"University of Texas at Dallas, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2019,10,10]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2019. 
The GitHub Repository for This Study. (2019). https:\/\/github.com\/OOPSLA-2019-BugDetection\/OOPSLA-2019-BugDetection"},{"key":"e_1_2_2_2_1","volume-title":"Sutton","author":"Allamanis Miltiadis","year":"2016"},{"key":"e_1_2_2_3_1","volume-title":"code2vec: Learning Distributed Representations of Code. CoRR abs\/1803.09473","author":"Alon Uri","year":"2018"},{"key":"e_1_2_2_4_1","volume-title":"Reps","author":"Amodio Matthew","year":"2017"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1251535.1251536"},{"key":"e_1_2_2_6_1","volume-title":"Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks. CoRR abs\/1603.06129","author":"Bhatia Sahil","year":"2016"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236032"},{"key":"e_1_2_2_8_1","volume-title":"Proceedings of The 33rd International Conference on Machine Learning (Proceedings of Machine Learning Research), Maria Florina Balcan and Kilian Q. Weinberger (Eds.)","volume":"48","author":"Bielik Pavol","year":"2016"},{"key":"e_1_2_2_9_1","volume-title":"Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. 
CoRR abs\/1406.1078","author":"Cho Kyunghyun","year":"2014"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1176617.1176667"},{"key":"e_1_2_2_11_1","volume-title":"Hinton","author":"Cun Yann Le","year":"1989"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/502059.502041"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/24039.24041"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939754"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1831708.1831723"},{"key":"e_1_2_2_16_1","volume-title":"Reps","author":"Henkel Jordan","year":"2018"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2337223.2337322"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1251535.1251537"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2345156.2254075"},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/512927.512945"},{"key":"e_1_2_2_21_1","volume-title":"Deepcode: Feedback Codes via Deep Learning. CoRR abs\/1807.00801","author":"Kim Hyeji","year":"2018"},{"key":"e_1_2_2_22_1","unstructured":"Alex Krizhevsky Ilya Sutskever and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems. 1097\u20131105."},
{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME.2017.46"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095430.1081755"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884870"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1095430.1081754"},{"key":"e_1_2_2_27_1","volume-title":"Distributed Representations of Words and Phrases and their Compositionality. CoRR abs\/1310.4546","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_2_28_1","volume-title":"27th Annual Conference on Neural Information Processing Systems 2013 (NIPS\u201913)","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Audris Mockus and Lawrence G Votta. 2000. Identifying Reasons for Software Changes using Historic Databases. In icsm. 120\u2013130.","DOI":"10.1109\/ICSM.2000.883028"},{"key":"e_1_2_2_30_1","volume-title":"TBCNN: A Tree-Based Convolutional Neural Network for Programming Language Processing. CoRR abs\/1409.5718","author":"Mou Lili","year":"2014"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786805.2786814"},{"key":"e_1_2_2_32_1","volume-title":"Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen.","author":"Nguyen Hoan Anh","year":"2009"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1595696.1595767"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2813885.2737966"},{"key":"e_1_2_2_35_1","unstructured":"Jibesh Patra and Michael Pradel. 2016. Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic Generative Models of Input Data."},
{"key":"e_1_2_2_36_1","volume-title":"DeepBugs: A Learning Approach to Name-based Bug Detection. CoRR abs\/1805.11683","author":"Pradel Michael","year":"2018"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884848"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2635868.2635922"},{"key":"e_1_2_2_39_1","unstructured":"Randy Smith and Susan Horwitz. 2009. Detecting and Measuring Similarity in Code Clones."},{"key":"e_1_2_2_40_1","volume-title":"d.]. Soot Introduction. https:\/\/sable.github.io\/soot\/ . ([n. d.]). Last Accessed","year":"2019"},{"key":"e_1_2_2_41_1","volume-title":"Manning","author":"Tai Kai Sheng","year":"2015"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.SNAPL.2017.18"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3196398.3196431"},{"key":"e_1_2_2_44_1","volume-title":"CoRR abs\/1706.03762","author":"Vaswani Ashish","year":"2017"},{"key":"e_1_2_2_45_1","unstructured":"WALA. [n. d.]. WALA Documentation. http:\/\/wala.sourceforge.net\/wiki\/index.php\/Main_Page . ([n. d.]). Last Accessed July 11 2019."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2970276.2970341"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884804"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/1287624.1287632"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2970276.2970326"},{"key":"e_1_2_2_50_1","volume-title":"ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. 
CoRR abs\/1512.05193","author":"Yin Wenpeng","year":"2015"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/1499949.1499997"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236068"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3360588","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3360588","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:59Z","timestamp":1750202579000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3360588"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,10]]},"references-count":52,"journal-issue":{"issue":"OOPSLA","published-print":{"date-parts":[[2019,10,10]]}},"alternative-id":["10.1145\/3360588"],"URL":"https:\/\/doi.org\/10.1145\/3360588","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,10]]},"assertion":[{"value":"2019-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}