{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T04:28:32Z","timestamp":1775276912696,"version":"3.50.1"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA","license":[{"start":{"date-parts":[[2018,10,24]],"date-time":"2018-10-24T00:00:00Z","timestamp":1540339200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2018,10,24]]},"abstract":"<jats:p>Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. 
Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.<\/jats:p>","DOI":"10.1145\/3276517","type":"journal-article","created":{"date-parts":[[2018,10,24]],"date-time":"2018-10-24T11:57:18Z","timestamp":1540382238000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":248,"title":["DeepBugs: a learning approach to name-based bug detection"],"prefix":"10.1145","volume":"2","author":[{"given":"Michael","family":"Pradel","sequence":"first","affiliation":[{"name":"TU Darmstadt, Germany"}]},{"given":"Koushik","family":"Sen","sequence":"additional","affiliation":[{"name":"University of California at Berkeley, USA"}]}],"member":"320","published-online":{"date-parts":[[2018,10,24]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCAM.2012.28"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2635868.2635883"},{"key":"e_1_2_2_3_1","volume-title":"A Survey of Machine Learning for Big Code and Naturalness. arXiv:1709.06182","author":"Allamanis Miltiadis","year":"2017"},{"key":"e_1_2_2_4_1","volume-title":"SmartPaste: Learning to Adapt Source Code. CoRR abs\/1705.07867","author":"Allamanis Miltiadis","year":"2017"},{"key":"e_1_2_2_5_1","volume-title":"Learning to Represent Programs with Graphs. 
CoRR abs\/1711.00740","author":"Allamanis Miltiadis","year":"2017"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the 33rd International Conference on Machine Learning, ICML 2016","author":"Allamanis Miltiadis","year":"2016"},{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192412"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/503272.503275"},{"key":"e_1_2_2_9_1","unstructured":"M. Amodio S. Chaudhuri and T. Reps. 2017. Neural Attribute Machines for Program Generation. ArXiv e-prints (May 2017). arXiv: cs.AI\/1705.09231"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106739"},{"key":"e_1_2_2_11_1","volume-title":"Automated Correction for Syntax Errors in Programming Assignments using Recurrent Neural Networks. CoRR abs\/1603.06129","author":"Bhatia Sahil","year":"2016"},{"key":"e_1_2_2_12_1","volume-title":"Proceedings of the 33rd International Conference on Machine Learning, ICML 2016","author":"Bielik Pavol","year":"2016"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106237.3106280"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSMR.2010.27"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-17524-9_1"},{"key":"e_1_2_2_16_1","volume-title":"Path-Based Function Embedding and its Application to Specification Mining. CoRR abs\/1802.07779","author":"DeFreez Daniel","year":"2018"},{"key":"e_1_2_2_17_1","volume-title":"LAVA: Large-Scale Automated Vulnerability Addition. In IEEE Symposium on Security and Privacy, SP 2016","author":"Dolan-Gavitt Brendan","year":"2016"},{"key":"e_1_2_2_18_1","volume-title":"ECMAScript Language Specification, 5.1 Edition. 
(June","author":"ECMA.","year":"2011"},{"key":"e_1_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/502034.502041"},{"key":"e_1_2_2_20_1","volume-title":"Machine Learning for Input Fuzzing. CoRR abs\/1701.07232","author":"Godefroid Patrice","year":"2017"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786805.2786831"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950334"},{"key":"e_1_2_2_23_1","doi-asserted-by":"crossref","unstructured":"Rahul Gupta Soham Pal Aditya Kanade and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. In AAAI.","DOI":"10.1609\/aaai.v31i1.10742"},{"key":"e_1_2_2_24_1","volume-title":"Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. 297\u2013304","author":"Gutmann Michael","year":"2010"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238212"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950308"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/581339.581377"},{"key":"e_1_2_2_28_1","volume-title":"34th International Conference on Software Engineering, ICSE 2012","author":"Hindle Abram","year":"2012"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03013-0_14"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028664.1028717"},{"key":"e_1_2_2_31_1","volume-title":"End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks. 
CoRR abs\/1703.02458","author":"Choi Min","year":"2017"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03237-0_17"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2010.62"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2006.51"},{"key":"e_1_2_2_35_1","volume-title":"Xinyu Ou, Hai Jin, Sujuan Wang, Zhijun Deng, and Yuyi Zhong.","author":"Li Zhen","year":"2018"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884870"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884841"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2017.65"},{"key":"e_1_2_2_39_1","volume-title":"Efficient Estimation of Word Representations in Vector Space. CoRR abs\/1301.3781","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_2_40_1","volume-title":"Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_2_41_1","volume-title":"Detecting Missing Method Calls in Object-Oriented Software. In European Conference on Object-Oriented Programming (ECOOP) . Springer, 2\u201325","author":"Monperrus Martin","year":"2010"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/3015812.3016002"},{"key":"e_1_2_2_43_1","volume-title":"Finding Likely Errors with Bayesian Specifications. CoRR abs\/1703.01370","author":"Murali Vijayaraghavan","year":"2017"},{"key":"e_1_2_2_44_1","volume-title":"Graph-Based Statistical Language Model for Code. 
In 37th IEEE\/ACM International Conference on Software Engineering, ICSE 2015","volume":"1","author":"Nguyen Anh Tuan","year":"2015"},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2017.47"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2991079.2991103"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2001420.2001448"},{"key":"e_1_2_2_49_1","volume-title":"Statically Checking API Protocol Conformance with Mined Multi-Object Specifications. In International Conference on Software Engineering (ICSE). 925\u2013935","author":"Pradel Michael"},{"key":"e_1_2_2_50_1","volume-title":"TypeDevil: Dynamic Type Inconsistency Analysis for JavaScript. In International Conference on Software Engineering (ICSE) .","author":"Pradel Michael","year":"2015"},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884848"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983990.2984041"},{"key":"e_1_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2837614.2837671"},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/2676726.2677009"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594291.2594321"},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133928"},{"key":"e_1_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-31987-0_28"},{"key":"e_1_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3106237.3106289"},{"key":"e_1_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884804"},{"key":"e_1_2_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2009.30"},{"key":"e_1_2_2_61_1","volume-title":"Memory Networks. 
CoRR abs\/1410.3916","author":"Weston Jason","year":"2014"},{"key":"e_1_2_2_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/2970276.2970326"},{"key":"e_1_2_2_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2884781.2884862"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3276517","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3276517","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:03:39Z","timestamp":1750273419000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3276517"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,24]]},"references-count":62,"journal-issue":{"issue":"OOPSLA","published-print":{"date-parts":[[2018,10,24]]}},"alternative-id":["10.1145\/3276517"],"URL":"https:\/\/doi.org\/10.1145\/3276517","relation":{},"ISSN":["2475-1421"],"issn-type":[{"value":"2475-1421","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,10,24]]},"assertion":[{"value":"2018-10-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}