{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T12:53:42Z","timestamp":1773147222612,"version":"3.50.1"},"reference-count":59,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T00:00:00Z","timestamp":1700784000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Luxembourg National Research Funds","award":["C18\/IS\/12669767\/STELLAR\/LeTraon"],"award-info":[{"award-number":["C18\/IS\/12669767\/STELLAR\/LeTraon"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2024,1,31]]},"abstract":"<jats:p>Applying deep learning (DL) to science is a new trend in recent years, which leads DL engineering to become an important problem. Although training data preparation, model architecture design, and model training are the normal processes to build DL models, all of them are complex and costly. Therefore, reusing the open-sourced pre-trained model is a practical way to bypass this hurdle for developers. Given a specific task, developers can collect massive pre-trained deep neural networks from public sources for reusing. However, testing the performance (e.g., accuracy and robustness) of multiple deep neural networks (DNNs) and recommending which model should be used is challenging regarding the scarcity of labeled data and the demand for domain expertise. In this article, we propose a labeling-free (LaF) model selection approach to overcome the limitations of labeling efforts for automated model reusing. The main idea is to statistically learn a Bayesian model to infer the models\u2019 specialty only based on predicted labels. We evaluate LaF using nine benchmark datasets, including image, text, and source code, and 165 DNNs, considering both the accuracy and robustness of models. The experimental results demonstrate that LaF outperforms the baseline methods by up to 0.74 and 0.53 on Spearman\u2019s correlation and Kendall\u2019s \u03c4, respectively.<\/jats:p>","DOI":"10.1145\/3611666","type":"journal-article","created":{"date-parts":[[2023,7,31]],"date-time":"2023-07-31T12:10:27Z","timestamp":1690805427000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["LaF: Labeling-free Model Selection for Automated Deep Neural Network Reusing"],"prefix":"10.1145","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8251-1669","authenticated-orcid":false,"given":"Qiang","family":"Hu","sequence":"first","affiliation":[{"name":"University of Luxembourg, Luxembourg"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5535-2420","authenticated-orcid":false,"given":"Yuejun","family":"Guo","sequence":"additional","affiliation":[{"name":"Luxembourg Institute of Science and Technology, Luxembourg"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1288-6502","authenticated-orcid":false,"given":"Xiaofei","family":"Xie","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8312-1358","authenticated-orcid":false,"given":"Maxime","family":"Cordy","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1852-2547","authenticated-orcid":false,"given":"Mike","family":"Papadakis","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1045-4861","authenticated-orcid":false,"given":"Yves","family":"Le Traon","sequence":"additional","affiliation":[{"name":"University of Luxembourg, Luxembourg"}]}],"member":"320","published-online":{"date-parts":[[2023,11,24]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"AOJ: Online Programming Challenge. 2018. AIZU online judge. Retrieved from https:\/\/judge.u-aizu.ac.jp\/onlinejudge\/. Accessed 10 January 2021."},{"key":"e_1_3_2_3_2","unstructured":"Retrieved from github.com\/Testing-Multiple-DL-Models\/SDS\/tree\/ main\/models 2021 DNN models for Fashion-MNIST"},{"key":"e_1_3_2_4_2","unstructured":"LaF project site. 2021. Project website of ranking multiple DNNs. Retrieved from https:\/\/sites.google.com\/view\/ranking-of-multiple-DNNs"},{"key":"e_1_3_2_5_2","unstructured":"Retrieved from 2022"},{"key":"e_1_3_2_6_2","unstructured":"MLOps. 2022. Machine Learning Model Operationalization Management . Retrieved from https:\/\/ml-ops.org\/"},{"key":"e_1_3_2_7_2","unstructured":"SciPy. 2022. Retrieved from https:\/\/scipy.org\/. Accessed 27 January 2022."},{"key":"e_1_3_2_8_2","first-page":"265","volume-title":"Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)","volume":"16","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard et\u00a0al. 2016. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916), Vol. 16, 265\u2013283."},{"key":"e_1_3_2_9_2","volume-title":"Safety Assurance Objectives for Autonomous Systems","author":"Alexander Rob","year":"2020","unstructured":"Rob Alexander, Rob Ashmore, Andrew Banks, Ben Bradshaw, John Bragg, John Clegg, Christopher Harper, Catherine Menon, Roger Rivett, Philippa Ryan, Nick Tudor, Stuart Tushingham, John Birch, Lavinia Burski, Timothy Coley, Neil Lewis, Ken Neal, Ashley Price, Stuart Reid, and Rod Steel. 2020. Safety Assurance Objectives for Autonomous Systems."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3290353"},{"key":"e_1_3_2_11_2","article-title":"The iWildCam 2020 competition dataset","author":"Beery Sara","year":"2020","unstructured":"Sara Beery, Elijah Cole, and Arvi Gjoka. 2020. The iWildCam 2020 competition dataset. Retrieved from https:\/\/arXiv:2004.10340","journal-title":"Retrieved from"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416609"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394112"},{"key":"e_1_3_2_14_2","volume-title":"Applied Nonparametric Statistics","author":"Daniel W. Wayne","year":"1990","unstructured":"W. Wayne Daniel. 1990. Applied Nonparametric Statistics. PWS-KENT Pub.89009463 Retrieved from https:\/\/books.google.lu\/books?id=0hPvAAAAMAAJ"},{"issue":"1","key":"e_1_3_2_15_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the em algorithm","volume":"39","author":"Dempster A. P.","year":"1977","unstructured":"A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the em algorithm. J. Roy. Stat. Soc. Ser. B (Methodol.) 39, 1 (1977), 1\u201338. Retrieved from http:\/\/www.jstor.org\/stable\/2984875","journal-title":"J. Roy. Stat. Soc. Ser. B (Methodol.)"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00032"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1177\/001316445401400215"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397357"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510232"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3264835"},{"key":"e_1_3_2_21_2","article-title":"MUTEN: Boosting gradient-based adversarial attacks via mutant-based ensembles","author":"Guo Yuejun","year":"2021","unstructured":"Yuejun Guo, Qiang Hu, Maxime Cordy, Michail Papadakis, and Yves Le Traon. 2021. MUTEN: Boosting gradient-based adversarial attacks via mutant-based ensembles. Retrieved from https:\/\/arXiv:2109.12838","journal-title":"Retrieved from"},{"key":"e_1_3_2_22_2","article-title":"Benchmarking neural network robustness to common corruptions and perturbations","author":"Hendrycks Dan","year":"2019","unstructured":"Dan Hendrycks and Thomas Dietterich. 2019. Benchmarking neural network robustness to common corruptions and perturbations. Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=HJz6tiCqYm","journal-title":"Proceedings of the International Conference on Learning Representations"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00152"},{"key":"e_1_3_2_24_2","article-title":"Understanding and testing generalization of deep networks on out-of-distribution data","author":"Hu Rui","year":"2021","unstructured":"Rui Hu, Jitao Sang, Jinqiang Wang, and Chaoquan Jiang. 2021. Understanding and testing generalization of deep networks on out-of-distribution data. Retrieved from https:\/\/arXiv:2111.09190","journal-title":"Retrieved from"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00141"},{"key":"e_1_3_2_26_2","article-title":"Software testing methods and techniques","volume":"30","author":"Jovanovi\u0107 Irena","year":"2006","unstructured":"Irena Jovanovi\u0107. 2006. Software testing methods and techniques. IPSI BgD Trans. Internet Res. 30 (2006).","journal-title":"IPSI BgD Trans. Internet Res."},{"key":"e_1_3_2_27_2","volume-title":"Proceedings of the International Conference on Information and Image Processing (ICIIP\u201914)","author":"Kavitha S.","year":"2014","unstructured":"S. Kavitha and D. Jeevitha. 2014. Software testing methods and techniques. In Proceedings of the International Conference on Information and Image Processing (ICIIP\u201914)."},{"key":"e_1_3_2_28_2","volume-title":"Proceedings of the International Conference on Machine Learning (ICML\u201921)","author":"Koh Pang Wei","year":"2021","unstructured":"Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang. 2021. WILDS: A benchmark of in-the-wild distribution shifts. In Proceedings of the International Conference on Machine Learning (ICML\u201921)."},{"key":"e_1_3_2_29_2","volume-title":"Learning Multiple Layers of Features From Tiny Images","author":"Krizhevsky Alex","year":"2009","unstructured":"Alex Krizhevsky. 2009. Learning Multiple Layers of Features From Tiny Images. Technical Report. University of Toronto, Toronto."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_2_31_2","first-page":"20874","article-title":"TestRank: Bringing order into unlabeled test instances for deep learning tasks","volume":"34","author":"Li Yu","year":"2021","unstructured":"Yu Li, Min Li, Qiuxia Lai, Yannan Liu, and Qiang Xu. 2021. TestRank: Bringing order into unlabeled test instances for deep learning tasks. Adv. Neural Inf. Process. Syst. 34 (2021), 20874\u201320886.","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338930"},{"key":"e_1_3_2_33_2","first-page":"499","volume-title":"Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering","author":"Li Zenan","year":"2019","unstructured":"Zenan Li, Xiaoxing Ma, Chang Xu, Chun Cao, Jingwei Xu, and Jian L\u00fc. 2019. Boosting operational DNN testing efficiency through conditioning. In Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 499\u2013509."},{"key":"e_1_3_2_34_2","first-page":"6471","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Li Zhong","year":"2021","unstructured":"Zhong Li, Minxue Pan, Tian Zhang, and Xuandong Li. 2021. Testing DNN-based autonomous driving systems under critical environmental conditions. In Proceedings of the International Conference on Machine Learning. PMLR, 6471\u20136482."},{"key":"e_1_3_2_35_2","first-page":"375","volume-title":"Proceedings of the 3rd International Conference on Pattern Recognition and Machine Learning (PRML\u201922)","author":"Liu Bin","year":"2022","unstructured":"Bin Liu. 2022. Consistent relative confidence and label-free model selection for convolutional neural networks. In Proceedings of the 3rd International Conference on Pattern Recognition and Machine Learning (PRML\u201922). IEEE, 375\u2013379."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3238147.3238202"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3417330"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00045"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3439726"},{"key":"e_1_3_2_40_2","unstructured":"Norman Mu and Justin Gilmer. 2019. MNIST-C: A robustness benchmark for computer vision. Retrieved from https:\/\/arXiv:1906.02337"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1018"},{"key":"e_1_3_2_42_2","first-page":"4901","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Odena Augustus","year":"2019","unstructured":"Augustus Odena, Catherine Olsson, David Andersen, and Ian Goodfellow. 2019. Tensorfuzz: Debugging neural networks with coverage-guided fuzzing. In Proceedings of the International Conference on Machine Learning. PMLR, 4901\u20134911."},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-17795-9_10"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3132747.3132785"},{"key":"e_1_3_2_45_2","unstructured":"Ruchir Puri David S. Kung Geert Janssen Wei Zhang Giacomo Domeniconi Vladimir Zolotov Julian Dolby Jie Chen Mihir Choudhury Lindsey Decker Veronika Thost Luca Buratti Saurabh Pujar Shyam Ramji Ulrich Finkler Susan Malaika and Frederick Reiss. 2021. CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks. Retrieved from https:\/\/arxiv:2105.12655"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1249"},{"key":"e_1_3_2_47_2","first-page":"980","article-title":"Software testing techniques and strategies","volume":"2","author":"Sawant Abhijit","year":"2012","unstructured":"Abhijit Sawant, Pranit Bari, and Pramila Chawan. 2012. Software testing techniques and strategies. Int. J. Eng. Res. Appl. 2 (062012), 980\u2013986.","journal-title":"Int. J. Eng. Res. Appl."},{"issue":"1","key":"e_1_3_2_48_2","article-title":"Different software testing strategies and techniques","volume":"2","author":"Selvapriya P. B.","year":"2013","unstructured":"P. B. Selvapriya. 2013. Different software testing strategies and techniques. Int. J. Sci. Modern Eng. 2, 1 (2013).","journal-title":"Int. J. Sci. Modern Eng."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115627"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180220"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00046"},{"key":"e_1_3_2_52_2","first-page":"2035","volume-title":"Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS\u201909)","author":"Whitehill Jacob","year":"2009","unstructured":"Jacob Whitehill, Paul Ruvolo, Tingfan Wu, Jacob Bergsma, and Javier Movellan. 2009. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Proceedings of the 22nd International Conference on Neural Information Processing Systems (NIPS\u201909). Curran Associates, Red Hook, NY, 2035\u20132043. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2009\/file\/f899139df5e1059396431415e770c6dd-Paper.pdf."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"e_1_3_2_54_2","volume-title":"Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms","author":"Xiao Han","year":"2017","unstructured":"Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. Retrieved from https:\/\/arXiv:cs.LG\/1708.07747"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3293882.3330579"},{"key":"e_1_3_2_56_2","unstructured":"Jie M. Zhang Mark Harman Lei Ma and Yang Liu. 2019. Machine learning testing: Survey landscapes and horizons. Retrieved from http:\/\/arxiv.org\/abs\/1906.10742"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00074"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE43902.2021.00048"},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00077"},{"key":"e_1_3_2_60_2","article-title":"Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks","volume":"32","author":"Zhou Yaqin","year":"2019","unstructured":"Yaqin Zhou, Shangqing Liu, Jingkai Siow, Xiaoning Du, and Yang Liu. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. Adv. Neural Inf. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Inf. Process. Syst."}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3611666","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3611666","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:08Z","timestamp":1750178228000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3611666"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,24]]},"references-count":59,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,1,31]]}},"alternative-id":["10.1145\/3611666"],"URL":"https:\/\/doi.org\/10.1145\/3611666","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"value":"1049-331X","type":"print"},{"value":"1557-7392","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,24]]},"assertion":[{"value":"2022-12-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}