{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T16:23:42Z","timestamp":1779380622180,"version":"3.53.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T00:00:00Z","timestamp":1657584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Luxembourg National Research Funds","award":["C18\/IS\/ 12669767\/STELLAR\/LeTraon"],"award-info":[{"award-number":["C18\/IS\/ 12669767\/STELLAR\/LeTraon"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2022,10,31]]},"abstract":"<jats:p>Similar to traditional software that is constantly under evolution, deep neural networks need to evolve upon the rapid growth of test data for continuous enhancement (e.g., adapting to distribution shift in a new environment for deployment). However, it is labor intensive to manually label all of the collected test data. Test selection solves this problem by strategically choosing a small set to label. Via retraining with the selected set, deep neural networks will achieve competitive accuracy. Unfortunately, existing selection metrics involve three main limitations: (1) using different retraining processes, (2) ignoring data distribution shifts, and (3) being insufficiently evaluated. To fill this gap, we first conduct a systemically empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then based on our findings, we propose DAT, a novel distribution-aware test selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. None of the selection metrics perform the best under various data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to five times and 30.09% accuracy improvement for model enhancement on simulated and in-the-wild distribution shift scenarios, respectively.<\/jats:p>","DOI":"10.1145\/3511598","type":"journal-article","created":{"date-parts":[[2022,4,19]],"date-time":"2022-04-19T12:40:29Z","timestamp":1650372029000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":41,"title":["An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement"],"prefix":"10.1145","volume":"31","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8251-1669","authenticated-orcid":false,"given":"Qiang","family":"Hu","sequence":"first","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5535-2420","authenticated-orcid":false,"given":"Yuejun","family":"Guo","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8312-1358","authenticated-orcid":false,"given":"Maxime","family":"Cordy","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1288-6502","authenticated-orcid":false,"given":"Xiaofei","family":"Xie","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lei","family":"Ma","sequence":"additional","affiliation":[{"name":"University of Alberta, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mike","family":"Papadakis","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yves","family":"Le Traon","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,7,12]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416609"},{"key":"e_1_3_2_3_2","article-title":"End to end learning for self-driving cars","author":"Bojarski Mariusz","year":"2016","unstructured":"Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, et\u00a0al. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016).","journal-title":"arXiv preprint arXiv:1604.07316"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394112"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409759"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the IEEE\/ACM 43nd International Conference on Software Engineering (ICSE\u201921)","author":"Dola Swaroopa","year":"2021","unstructured":"Swaroopa Dola, Matthew B. Dwyer, and Mary Lou Soffa. 2021. Distribution-aware testing of neural networks using generative models. In Proceedings of the IEEE\/ACM 43nd International Conference on Software Engineering (ICSE\u201921). IEEE, Los Alamitos, CA."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338954"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397357"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISIT.2004.1365067"},{"key":"e_1_3_2_11_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Goodfellow Ian J.","year":"2015","unstructured":"Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1412.6572."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00080"},{"key":"e_1_3_2_13_2","article-title":"Robust active learning: Sample-efficient training of robust deep learning models","author":"Guo Yuejun","year":"2021","unstructured":"Yuejun Guo, Qiang Hu, Maxime Cordy, Mike Papadakis, and Yves Le Traon. 2021. Robust active learning: Sample-efficient training of robust deep learning models. arXiv preprint arXiv:2112.02542 (2021).","journal-title":"arXiv preprint arXiv:2112.02542"},{"key":"e_1_3_2_14_2","article-title":"A baseline for detecting misclassified and out-of-distribution examples in neural networks","author":"Hendrycks Dan","year":"2016","unstructured":"Dan Hendrycks and Kevin Gimpel. 2016. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016).","journal-title":"arXiv preprint arXiv:1610.02136"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Hendrycks Dan","year":"2019","unstructured":"Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. 2019. Deep anomaly detection with outlier exposure. In Proceedings of the International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=HyxCxhRcY7."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678672"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00108"},{"key":"e_1_3_2_18_2","article-title":"Wilds: A benchmark of in-the-wild distribution shifts","author":"Koh Pang Wei","year":"2020","unstructured":"Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, et\u00a0al. 2020. Wilds: A benchmark of in-the-wild distribution shifts. arXiv preprint arXiv:2012.07421 (2020).","journal-title":"arXiv preprint arXiv:2012.07421"},{"key":"e_1_3_2_19_2","article-title":"Active testing: Sample-efficient model evaluation","author":"Kossen Jannik","year":"2021","unstructured":"Jannik Kossen, Sebastian Farquhar, Yarin Gal, and Tom Rainforth. 2021. Active testing: Sample-efficient model evaluation. arXiv preprint arXiv:2103.05331 (2021).","journal-title":"arXiv preprint arXiv:2103.05331"},{"key":"e_1_3_2_20_2","volume-title":"Learning Multiple Layers of Features from Tiny Images","author":"Krizhevsky Alex","year":"2009","unstructured":"Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report. University of Toronto."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50048-7"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_23_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Lee Kimin","year":"2018","unstructured":"Kimin Lee, Honglak Lee, Kibok Lee, and Jinwoo Shin. 2018. Training confidence-calibrated classifiers for detecting out-of-distribution samples. In Proceedings of the International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=ryiAv2xAZ."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.116"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338930"},{"key":"e_1_3_2_26_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Liang Shiyu","year":"2018","unstructured":"Shiyu Liang, Yixuan Li, and R. Srikant. 2018. Enhancing the reliability of out-of-distribution image detection in neural networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238202"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3417330"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002491"},{"key":"e_1_3_2_30_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Madry Aleksander","year":"2018","unstructured":"Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-5660-4"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.5555\/983238"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132747.3132785"},{"key":"e_1_3_2_34_2","article-title":"Likelihood ratios for out-of-distribution detection","author":"Ren Jie","year":"2019","unstructured":"Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, and Balaji Lakshminarayanan. 2019. Likelihood ratios for out-of-distribution detection. arXiv preprint arXiv:1906.02845 (2019).","journal-title":"arXiv preprint arXiv:1906.02845"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2019.12.012"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1142\/5847"},{"key":"e_1_3_2_37_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Serr\u00e0 Joan","year":"2020","unstructured":"Joan Serr\u00e0, David \u00c1lvarez, Vicen\u00e7 G\u00f3mez, Olga Slizovskaia, Jos\u00e9 F. N\u00fa\u00f1ez, and Jordi Luque. 2020. Input complexity and out-of-distribution detection with likelihood-based generative models. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416621"},{"key":"e_1_3_2_40_2","article-title":"DeepID3: Face recognition with very deep neural networks.","volume":"1502","author":"Sun Yi","year":"2015","unstructured":"Yi Sun, Ding Liang, Xiaogang Wang, and Xiaoou Tang. 2015. DeepID3: Face recognition with very deep neural networks.CoRR abs\/1502.00873 (2015). http:\/\/dblp.uni-trier.de\/db\/journals\/corr\/corr1502.html#SunLWT15.","journal-title":"CoRR"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238172"},{"key":"e_1_3_2_42_2","volume-title":"Computer Vision: Algorithms and Applications.","author":"Szeliski Richard","year":"2010","unstructured":"Richard Szeliski. 2010. Computer Vision: Algorithms and Applications.Springer-Verlag, Berlin, Germany."},{"key":"e_1_3_2_43_2","article-title":"Measuring robustness to natural distribution shifts in image classification","author":"Taori Rohan","year":"2020","unstructured":"Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, and Ludwig Schmidt. 2020. Measuring robustness to natural distribution shifts in image classification. arXiv preprint arXiv:2007.00644 (2020).","journal-title":"arXiv preprint arXiv:2007.00644"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180220"},{"key":"e_1_3_2_45_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-019-1724-z"},{"key":"e_1_3_2_46_2","unstructured":"Daisuke Wakabayashi. 2018. Self-driving Uber car kills pedestrian in Arizona where robots roam. New York Times. Retrieved April 25 2022 from https:\/\/www.nytimes.com\/2018\/03\/19\/technology\/uber-driverless-fatality.html."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2014.6889457"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00038"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00046"},{"key":"e_1_3_2_50_2","article-title":"Understanding concept drift","volume":"1704","author":"Webb Geoffrey I.","year":"2017","unstructured":"Geoffrey I. Webb, Loong Kuan Lee, Fran\u00e7ois Petitjean, and Bart Goethals. 2017. Understanding concept drift. CoRR abs\/1704.00362 (2017). http:\/\/arxiv.org\/abs\/1704.00362.","journal-title":"CoRR"},{"key":"e_1_3_2_51_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wong Eric","year":"2019","unstructured":"Eric Wong, Leslie Rice, and J. Zico Kolter. 2019. Fast is better than free: Revisiting adversarial training. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_52_2","volume-title":"Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms","author":"Xiao Han","year":"2017","unstructured":"Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:cs.LG\/1708.07747 [cs.LG] (2017)."},{"key":"e_1_3_2_53_2","article-title":"Achieving human parity in conversational speech recognition","author":"Xiong Wayne","year":"2016","unstructured":"Wayne Xiong, Jasha Droppo, Xuedong Huang, Frank Seide, Mike Seltzer, Andreas Stolcke, Dong Yu, and Geoffrey Zweig. 2016. Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256 (2016).","journal-title":"arXiv preprint arXiv:1610.05256"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE.2019.00020"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380368"},{"key":"e_1_3_2_56_2","first-page":"2808","article-title":"MetaSelection: Metaheuristic sub-structure selection for neural network pruning using evolutionary algorithm","volume":"325","author":"Zhang Zixun","year":"2020","unstructured":"Zixun Zhang, Zhen Li, Lin Lin, Na Lei, Guanbin Li, and Shuguang Cui. 2020. MetaSelection: Metaheuristic sub-structure selection for neural network pruning using evolutionary algorithm. Frontiers in Artificial Intelligence and Applications 325 (2020), 2808\u20132815.","journal-title":"Frontiers in Artificial Intelligence and Applications"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511598","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3511598","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:04Z","timestamp":1750182664000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511598"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,12]]},"references-count":55,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,10,31]]}},"alternative-id":["10.1145\/3511598"],"URL":"https:\/\/doi.org\/10.1145\/3511598","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,12]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-07-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}