{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,9]],"date-time":"2025-09-09T22:10:26Z","timestamp":1757455826005,"version":"3.41.2"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"ISSTA","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U24A20337, 62372228"],"award-info":[{"award-number":["U24A20337, 62372228"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Shenzhen-Hong Kong-Macau Technology Research Programme","award":["SGDX20230821091559018"],"award-info":[{"award-number":["SGDX20230821091559018"]}]},{"name":"Open Project of State Key Laboratory for Novel Software Technology at Nanjing University","award":["KFKT2024B21"],"award-info":[{"award-number":["KFKT2024B21"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["14380029"],"award-info":[{"award-number":["14380029"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,22]]},"abstract":"<jats:p>Deep learning (DL) frameworks are essential to DL-based software systems, and framework bugs may lead to substantial disasters, thus requiring effective testing. Researchers adopt DL models or single interfaces as test inputs and analyze their execution results to detect bugs. However, floating-point errors, inherent randomness, and the complexity of test inputs make it challenging to analyze execution results effectively, leading to existing methods suffering from a lack of suitable test oracles. Some researchers utilize metamorphic testing to tackle this challenge. 
They design Metamorphic Relations (MRs) based on input data and parameter settings of a single framework interface to generate equivalent test inputs, ensuring consistent execution results between original and generated test inputs. Despite their promising effectiveness, they still face certain limitations. (1) Existing MRs overlook structural complexity, limiting test input diversity. (2) Existing MRs focus on limited interfaces, which limits generalization and necessitates additional adaptations. (3) Their detected bugs are related to the result consistency of single interfaces and far from those exposed in multi-interface combinations and runtime metrics (e.g., resource usage). To address these limitations, we propose ModelMeta, a model-level metamorphic testing method for DL frameworks with four MRs focused on the structural characteristics of DL models. ModelMeta augments seed models with diverse interface combinations to generate test inputs with consistent outputs, guided by the QR-DQN strategy. It then detects bugs through fine-grained analysis of training loss\/gradients, memory\/GPU usage, and execution time. We evaluate the effectiveness of ModelMeta on three popular DL frameworks (i.e., MindSpore, PyTorch, and ONNX) with 17 DL models from ten real-world tasks ranging from image classification to object detection. Results demonstrate that ModelMeta outperforms state-of-the-art baselines from the perspective of test coverage and diversity of generated test inputs. Regarding bug detection, ModelMeta has identified 31 new bugs, of which 27 have been confirmed, and 11 have been fixed. Among them, seven are bugs that existing methods cannot detect, i.e., five wrong resource usage bugs and two low-efficiency bugs. 
These results demonstrate the practicality of our method.<\/jats:p>","DOI":"10.1145\/3728972","type":"journal-article","created":{"date-parts":[[2025,6,22]],"date-time":"2025-06-22T10:52:56Z","timestamp":1750589576000},"page":"2158-2180","source":"Crossref","is-referenced-by-count":1,"title":["Improving Deep Learning Framework Testing with Model-Level Metamorphic Testing"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1816-2246","authenticated-orcid":false,"given":"Yanzhou","family":"Mu","sequence":"first","affiliation":[{"name":"Nanjing University, Nanjing, China"},{"name":"Shenzhen Research Institute of Nanjing University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5017-8016","authenticated-orcid":false,"given":"Juan","family":"Zhai","sequence":"additional","affiliation":[{"name":"University of Massachusetts at Amherst, Amherst, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9930-7111","authenticated-orcid":false,"given":"Chunrong","family":"Fang","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"},{"name":"Shenzhen Research Institute of Nanjing University, Shenzhen, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1180-3891","authenticated-orcid":false,"given":"Xiang","family":"Chen","sequence":"additional","affiliation":[{"name":"Nantong University, Nantong, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-6810-7608","authenticated-orcid":false,"given":"Zhixiang","family":"Cao","sequence":"additional","affiliation":[{"name":"Nantong University, Nantong, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-8242-9543","authenticated-orcid":false,"given":"Peiran","family":"Yang","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7793-2507","authenticated-orcid":false,"given":"Kexin","family":"Zhao","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8661-6133","authenticated-orcid":false,"given":"An","family":"Guo","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9592-7022","authenticated-orcid":false,"given":"Zhenyu","family":"Chen","sequence":"additional","affiliation":[{"name":"Nanjing University, Nanjing, China"},{"name":"Shenzhen Research Institute of Nanjing University, Shenzhen, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2021. GCOV. https:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Gcov.html."},{"key":"e_1_2_1_2_1","unstructured":"2025. LCOV - A graphical front-end for gcov. http:\/\/ltp.sourceforge.net\/coverage\/lcov.php"},{"key":"e_1_2_1_3_1","unstructured":"Accessed: 2023. Accuracy bug case.. https:\/\/github.com\/pytorch\/pytorch\/issues\/106604"},{"key":"e_1_2_1_4_1","unstructured":"Accessed: 2023. Bug Link in the motivating example.. https:\/\/gitee.com\/mindspore\/mindspore\/issues\/I7S7KP"},{"key":"e_1_2_1_5_1","unstructured":"Accessed: 2023. Crash bug case.. https:\/\/gitee.com\/mindspore\/mindspore\/issues\/I7N3UY"},{"key":"e_1_2_1_6_1","unstructured":"Accessed: 2023. Resource bug case.. https:\/\/e.gitee.com\/mind_spore\/dashboard?issue=I7VVFN"},{"key":"e_1_2_1_7_1","volume-title":"Arnaud Doucet, and Michael I Jordan.","author":"Andrieu Christophe","year":"2003","unstructured":"Christophe Andrieu, Nando De Freitas, Arnaud Doucet, and Michael I Jordan. 2003. An introduction to MCMC for machine learning. Machine learning, 50 (2003), 5\u201343."},{"key":"e_1_2_1_8_1","volume-title":"The oracle problem in software testing: A survey","author":"Barr Earl T","year":"2014","unstructured":"Earl T Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo. 2014. The oracle problem in software testing: A survey. 
IEEE transactions on software engineering, 41, 5 (2014), 507\u2013525."},{"volume-title":"Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng (Eds.)","key":"e_1_2_1_9_1","unstructured":"2011. Handbook of Markov Chain Monte Carlo, Steve Brooks, Andrew Gelman, Galin Jones, and Xiao-Li Meng (Eds.). Chapman and Hall\/CRC."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1145\/2070336.2070341","article-title":"Do-178c: the next avionics safety standard","volume":"31","author":"Brosgol Benjamin","year":"2011","unstructured":"Benjamin Brosgol. 2011. Do-178c: the next avionics safety standard. ACM SIGAda Ada Letters, 31, 3 (2011), 5\u20136.","journal-title":"ACM SIGAda Ada Letters"},{"key":"e_1_2_1_11_1","unstructured":"Accessed: 2022. A TensorFlow bug related to multiple interfaces. https:\/\/github.com\/tensorflow\/tensorflow\/issues\/55840"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.312"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00042"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the ACM on Software Engineering, 1, FSE (2024)","author":"Chen Jinyin","year":"2024","unstructured":"Jinyin Chen, Chengyu Jia, Yunjie Yan, Jie Ge, Haibin Zheng, and Yao Cheng. 2024. A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators. Proceedings of the ACM on Software Engineering, 1, FSE (2024), 2005\u20132027."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3587155"},{"key":"e_1_2_1_16_1","unstructured":"Tsong Yueh Chen Shing Chi Cheung and Shiu Ming Yiu. 1998. Metamorphic testing: a new approach for generating next test cases. Department of Computer Science Hong Kong University of Science and Technology. Tech. Rep. HKUST-CS98-01."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE51524.2021.9678746"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the AAAI conference on artificial intelligence. 
32","author":"Dabney Will","year":"2018","unstructured":"Will Dabney, Mark Rowland, Marc Bellemare, and R\u00e9mi Munos. 2018. Distributional reinforcement learning with quantile regression. In Proceedings of the AAAI conference on artificial intelligence. 32."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549085"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00234-2"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MET.2017.2"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5539\u20135544","author":"Gao Fei","year":"2019","unstructured":"Fei Gao, Jinhua Zhu, Lijun Wu, Yingce Xia, Tao Qin, Xueqi Cheng, Wengang Zhou, and Tie-Yan Liu. 2019. Soft contextual data augmentation for neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 5539\u20135544."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510092"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416571"},{"key":"e_1_2_1_25_1","volume-title":"Advances in neural information processing systems, 23","author":"Hasselt Hado","year":"2010","unstructured":"Hado Hasselt. 2010. Double Q-learning. Advances in neural information processing systems, 23 (2010)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_27_1","unstructured":"Intel Corporation. 2021. oneDNN: Deep Neural Network Library. https:\/\/github.com\/oneapi-src\/oneDNN Version 2.6"},{"key":"e_1_2_1_28_1","volume-title":"oneMKL: Math Kernel Library. https:\/\/github.com\/oneapi-src\/oneMKL Version","author":"Intel Corporation","year":"2021","unstructured":"Intel Corporation. 2021. oneMKL: Math Kernel Library. 
https:\/\/github.com\/oneapi-src\/oneMKL Version 2021.4"},{"key":"e_1_2_1_29_1","volume-title":"International Conference on Machine Learning. 3050\u20133059","author":"Jay Nathan","year":"2019","unstructured":"Nathan Jay, Noga Rotman, Brighten Godfrey, Michael Schapira, and Aviv Tamar. 2019. A deep reinforcement learning perspective on internet congestion control. In International Conference on Machine Learning. 3050\u20133059."},{"key":"e_1_2_1_30_1","volume-title":"2021 IEEE international conference on software maintenance and evolution (ICSME). 47\u201357","author":"Jia Li","year":"2021","unstructured":"Li Jia, Hao Zhong, and Linpeng Huang. 2021. The unit test quality of deep learning libraries: A mutation analysis. In 2021 IEEE international conference on software maintenance and evolution (ICSME). 47\u201357."},{"key":"e_1_2_1_31_1","volume-title":"Continuous Univariate Distributions","volume":"2","author":"Johnson Norman L.","unstructured":"Norman L. Johnson, Samuel Kotz, and N. Balakrishnan. 1995. Continuous Univariate Distributions, Volume 2 (2nd ed.). John Wiley & Sons. isbn:9780471584951"},{"key":"e_1_2_1_32_1","unstructured":"Accessed: 2020. Keras. http:\/\/keras.io\/"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","DOI":"10.1142\/5720","volume-title":"Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. World Scientific.","author":"Kotz Samuel","year":"2004","unstructured":"Samuel Kotz and Johan Rene van Dorp. 2004. Beyond Beta: Other Continuous Families of Distributions with Bounded Support and Applications. 
World Scientific."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.19"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2950290.2950361"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597207"},{"key":"e_1_2_1_37_1","first-page":"1","article-title":"Generation-based Differential Fuzzing for Deep Learning Libraries","volume":"33","author":"Liu Jiawei","year":"2023","unstructured":"Jiawei Liu, Yuheng Huang, Zhijie Wang, Lei Ma, Chunrong Fang, Mingzheng Gu, Xufan Zhang, and Zhenyu Chen. 2023. Generation-based Differential Fuzzing for Deep Learning Libraries. ACM Transactions on Software Engineering and Methodology, 33, 2 (2023), 1\u201328.","journal-title":"ACM Transactions on Software Engineering and Methodology"},{"key":"e_1_2_1_38_1","unstructured":"Accessed: 2020. MindSpore. https:\/\/www.mindspore.cn\/"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML). 28, 1928\u20131937","author":"Mnih V.","year":"2016","unstructured":"V. Mnih, A. P. Badia, and M. Mirza. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML). 28, 1928\u20131937."},{"key":"e_1_2_1_40_1","unstructured":"Volodymyr Mnih Koray Kavukcuoglu David Silver Alex Graves Ioannis Antonoglou Daan Wierstra and Martin Riedmiller. 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602."},{"key":"e_1_2_1_41_1","volume-title":"Human-level control through deep reinforcement learning. nature, 518, 7540","author":"Mnih Volodymyr","year":"2015","unstructured":"Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, and Georg Ostrovski. 2015. Human-level control through deep reinforcement learning. 
nature, 518, 7540 (2015), 529\u2013533."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.935097"},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering. 1533\u20131544","author":"Mu Yanzhou","year":"2024","unstructured":"Yanzhou Mu, Juan Zhai, Chunrong Fang, Xiang Chen, Zhixiang Cao, Peiran Yang, Yinglong Zou, Tao Zheng, and Zhenyu Chen. 2024. DevMuT: Testing Deep Learning Framework via Developer Expertise-Based Mutation. In Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering. 1533\u20131544."},{"key":"e_1_2_1_44_1","volume-title":"2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 785\u2013796","author":"Nejadgholi Mahdi","year":"2019","unstructured":"Mahdi Nejadgholi and Jinqiu Yang. 2019. A study of oracle approximations in testing deep learning libraries. In 2019 34th IEEE\/ACM International Conference on Automated Software Engineering (ASE). 785\u2013796."},{"key":"e_1_2_1_45_1","volume-title":"Predicting the future\u2014big data, machine learning, and clinical medicine. The New England journal of medicine, 375, 13","author":"Obermeyer Ziad","year":"2016","unstructured":"Ziad Obermeyer and Ezekiel J Emanuel. 2016. Predicting the future\u2014big data, machine learning, and clinical medicine. The New England journal of medicine, 375, 13 (2016), 1216."},{"key":"e_1_2_1_46_1","unstructured":"Accessed: 2019. ONNX. https:\/\/onnx.ai\/"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00107"},{"key":"e_1_2_1_48_1","unstructured":"Accessed: 2019. Pytorch. https:\/\/pytorch.org\/"},{"key":"e_1_2_1_49_1","unstructured":"Alec Radford Jeffrey Wu Rewon Child David Luan Dario Amodei and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. 
OpenAI Blog https:\/\/cdn.openai.com\/better-language-models\/language_models_are_unsupervised_multitask_learners.pdf"},{"key":"e_1_2_1_50_1","unstructured":"Accessed: 2024. Data Available. https:\/\/github.com\/anonymous-tai\/ModelMeta"},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Qingchao Shen Yongqiang Tian Haoyang Ma Junjie Chen Lili Huang Ruifeng Fu Shing-Chi Cheung and Zan Wang. 2024. A Tale of Two DL Cities: When Library Tests Meet Compiler. arXiv preprint arXiv:2407.16626.","DOI":"10.1109\/ICSE55347.2025.00025"},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"D. Silver A. Huang and C. J. Maddison. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529 7587 (2016) 484\u2013489.","DOI":"10.1038\/nature16961"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-022-10228-y"},{"key":"e_1_2_1_54_1","unstructured":"Accessed: 2019. Tensorflow. https:\/\/www.tensorflow.org\/"},{"key":"e_1_2_1_55_1","unstructured":"Accessed: 2023. teslanews. https:\/\/www.tesladeaths.com\/"},{"key":"e_1_2_1_56_1","unstructured":"Accessed: 2024. TF2ONNX. https:\/\/pypi.org\/project\/tf2onnx\/"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/25.3-4.285"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510165"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409761"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510041"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3508035"},{"key":"e_1_2_1_62_1","volume-title":"Lin Tan, Xiangyu Zhang, and Michael Godfrey.","author":"Xie Danning","year":"2021","unstructured":"Danning Xie, Yitong Li, Mijung Kim, Hung Viet Pham, Lin Tan, Xiangyu Zhang, and Michael Godfrey. 2021. Leveraging documentation to test deep learning library functions. 
arXiv preprint arXiv:2109.01002."},{"key":"e_1_2_1_63_1","volume-title":"CEDAR: Continuous Testing of Deep Learning Libraries. In International Conference on Software Analysis, Evolution, and Reengineering.","author":"Xie Danning","year":"2024","unstructured":"Danning Xie, Jiannan Wang, Hung Viet Pham, Lin Tan, Yu Guo, Adnan Aziz, and Erik Meijer. 2024. CEDAR: Continuous Testing of Deep Learning Libraries. In International Conference on Software Analysis, Evolution, and Reengineering."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2022.107004"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180198"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2021.3107165"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3460319.3464843"},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the ACM on Programming Languages, 8, OOPSLA2","author":"Zhou Chijin","year":"2024","unstructured":"Chijin Zhou, Bingzhou Qian, Gwihwan Go, Quan Zhang, Shanshan Li, and Yu Jiang. 2024. PolyJuice: Detecting Mis-compilation Bugs in Tensor Compilers with Equality Saturation Based Rewriting. 
Proceedings of the ACM on Programming Languages, 8, OOPSLA2 (2024), 1309\u20131335."}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3728972","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,16]],"date-time":"2025-07-16T16:45:59Z","timestamp":1752684359000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3728972"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,22]]},"references-count":68,"journal-issue":{"issue":"ISSTA","published-print":{"date-parts":[[2025,6,22]]}},"alternative-id":["10.1145\/3728972"],"URL":"https:\/\/doi.org\/10.1145\/3728972","relation":{},"ISSN":["2994-970X"],"issn-type":[{"type":"electronic","value":"2994-970X"}],"subject":[],"published":{"date-parts":[[2025,6,22]]}}}