{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T13:19:51Z","timestamp":1773839991146,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","license":[{"start":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:00:00Z","timestamp":1720742400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2024,7,12]]},"abstract":"<jats:p>\n                    Deep learning (DL) is a critical tool for real-world applications, and comprehensive testing of DL models is vital to ensure their quality before deployment. However, recent studies have shown that even subtle deviations in DL operators can result in catastrophic consequences, underscoring the importance of rigorous testing of these components. Unlike testing other DL system components, operator analysis poses unique challenges due to complex inputs and uncertain outputs. The existing DL operator testing approach has limitations in terms of testing efficiency and error localization. In this paper, we propose\n                    <jats:italic toggle=\"yes\">Meta<\/jats:italic>\n                    , a novel operator testing framework based on metamorphic testing that automatically tests and assists bug location based on metamorphic relations (MRs). Meta distinguishes itself in three key ways: (1) it considers both parameters and input tensors to detect operator errors, enabling it to identify both implementation and precision errors; (2) it uses MRs to guide the generation of more effective inputs (i.e., tensors and parameters) in less time; (3) it assists the precision error localization by tracing the error to the input level of the operator based on MR violations. 
We designed 18 MRs for testing 10 widely used DL operators. To assess the effectiveness of Meta, we conducted experiments on 13 released versions of 5 popular DL libraries. Our results revealed that Meta successfully detected 41 errors, including 14 new ones that were reported to the respective platforms, 8 of which have been confirmed\/fixed. Additionally, Meta demonstrated high efficiency, outperforming the baseline by detecting\n                    <jats:inline-formula>\n                      <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" display=\"inline\">\n                        <mml:mrow>\n                          <mml:mo>\u223c<\/mml:mo>\n                          <mml:mn>2<\/mml:mn>\n                        <\/mml:mrow>\n                      <\/mml:math>\n                    <\/jats:inline-formula>\n                    times more errors. Meta is open-source and available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/TDY-raedae\/Medi-Test\">https:\/\/github.com\/TDY-raedae\/Medi-Test<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1145\/3660796","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T10:22:09Z","timestamp":1720779729000},"page":"2005-2027","source":"Crossref","is-referenced-by-count":7,"title":["A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7153-2755","authenticated-orcid":false,"given":"Jinyin","family":"Chen","sequence":"first","affiliation":[{"name":"Zhejiang University of Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6960-0047","authenticated-orcid":false,"given":"Chengyu","family":"Jia","sequence":"additional","affiliation":[{"name":"Zhejiang University of Technology, Hangzhou, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8058-6383","authenticated-orcid":false,"given":"Yunjie","family":"Yan","sequence":"additional","affiliation":[{"name":"Zhejiang University of Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-5163-8765","authenticated-orcid":false,"given":"Jie","family":"Ge","sequence":"additional","affiliation":[{"name":"Zhejiang University of Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8997-5343","authenticated-orcid":false,"given":"Haibin","family":"Zheng","sequence":"additional","affiliation":[{"name":"Zhejiang University of Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5781-5185","authenticated-orcid":false,"given":"Yao","family":"Cheng","sequence":"additional","affiliation":[{"name":"T\u00dcV S\u00dcD Asia Pacific, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2024,7,12]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEEESTD.2020.9091348"},{"key":"e_1_3_1_3_2","first-page":"265","volume-title":"OSDI","author":"Abadi Martin","year":"2016","unstructured":"Martin Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. USENIX Association, 265\u2013283."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00042"},{"key":"e_1_3_1_5_2","unstructured":"Tianqi Chen Mu Li Yutian Li Min Lin Naiyan Wang Minjie Wang Tianjun Xiao Bing Xu Chiyuan Zhang and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In NeurIPS. 
1\u20136."},{"key":"e_1_3_1_6_2","first-page":"430","volume-title":"ASE","author":"Chen Zhuangbin","year":"2021","unstructured":"Zhuangbin Chen, Jinyang Liu, Yuxin Su, Hongyu Zhang, Xuemin Wen, Xiao Ling, Yongqiang Yang, and Michael R. Lyu. 2021. Graph-based Incident Aggregation for Large-Scale Online Service Systems. In ASE. IEEE, 430\u2013442."},{"key":"e_1_3_1_7_2","unstructured":"Francois Chollet. 2015. Keras: Deep learning library for theano and tensorflow. https:\/\/github.com\/keras-team\/keras."},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Yinlin Deng Chunqiu Steven Xia Haoran Peng Chenyuan Yang and Lingming Zhang. 2023. Large language models are zero-shot fuzzers: Fuzzing deep-learning libraries via large language models. In Proceedings of the 32nd ACM SIGSOFT international symposium on software testing and analysis. 423\u2013435.","DOI":"10.1145\/3597926.3598067"},{"key":"e_1_3_1_9_2","doi-asserted-by":"crossref","unstructured":"Yinlin Deng Chunqiu Steven Xia Chenyuan Yang Shizhuo Dylan Zhang Shujing Yang and Lingming Zhang. 2024. Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In Proceedings of the 46th IEEE\/ACM International Conference on Software Engineering. 1\u201313.","DOI":"10.1145\/3597503.3623343"},{"key":"e_1_3_1_10_2","first-page":"44","volume-title":"ESEC\/SIGSOFT FSE","author":"Deng Yinlin","year":"2022","unstructured":"Yinlin Deng, Chenyuan Yang, Anjiang Wei, and Lingming Zhang. 2022. Fuzzing deep-learning libraries via automated relational API inference. In ESEC\/SIGSOFT FSE. 
ACM, 44\u201356."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MET.2019.00008"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/3213846.3213858"},{"key":"e_1_3_1_13_2","first-page":"5539","volume-title":"ACL (1)","author":"Gao Fei","year":"2019","unstructured":"Fei Gao, Jinhua Zhu, Lijun Wu, Yingce Xia, Tao Qin, Xueqi Cheng, Wengang Zhou, and Tie-Yan Liu. 2019. Soft Contextual Data Augmentation for Neural Machine Translation. In ACL (1). Association for Computational Linguistics, 5539\u20135544."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510092"},{"key":"e_1_3_1_15_2","first-page":"71","volume-title":"ICSE (SEIP)","author":"Gulzar Muhammad Ali","year":"2019","unstructured":"Muhammad Ali Gulzar, Yongkang Zhu, and Xiaofeng Han. 2019. Perception and practices of differential testing. In ICSE (SEIP). IEEE \/ ACM, 71\u201380."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3324884.3416571"},{"key":"e_1_3_1_17_2","unstructured":"Huawei. 2020. MindSpore. https:\/\/gitee.com\/mindspore\/mindspore."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3548606.3560578"},{"key":"e_1_3_1_19_2","first-page":"510","volume-title":"ESEC\/SIGSOFT FSE","author":"Islam Johirul","year":"2019","unstructured":"Md Johirul Islam, Giang Nguyen, Rangeet Pan, and Hridesh Rajan. 2019. A comprehensive study on deep learning bug characteristics. In ESEC\/SIGSOFT FSE. ACM, 510\u2013520."},{"key":"e_1_3_1_20_2","first-page":"1135","volume-title":"ICSE","author":"Islam Johirul","year":"2020","unstructured":"Md Johirul Islam, Rangeet Pan, Giang Nguyen, and Hridesh Rajan. 2020. Repairing deep neural networks: fix patterns and challenges. In ICSE. ACM, 1135\u20131146."},{"key":"e_1_3_1_21_2","first-page":"604","volume-title":"DASFAA (1) (Lecture Notes in Computer Science","author":"Jia Li","year":"2020","unstructured":"Li Jia, Hao Zhong, Xiaoyin Wang, Linpeng Huang, and Xuansheng Lu. 
2020. An Empirical Study on Bugs Inside TensorFlow. In DASFAA (1) (Lecture Notes in Computer Science, Vol. 12112). Springer, 604\u2013620."},{"key":"e_1_3_1_22_2","first-page":"1","volume-title":"Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2-4, 2020","author":"Jiang Xiaotang","year":"2020","unstructured":"Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lyu, and Zhihua Wu. 2020. MNN: A Universal and Efficient Inference Engine. In Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2-4, 2020, Inderjit S. Dhillon, Dimitris S. Papailiopoulos, and Vivienne Sze (Eds.), Vol. 2. mlsys.org, 1\u201313."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2208.01508"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3524846.3527341"},{"key":"e_1_3_1_25_2","unstructured":"Christian Murphy and Gail E Kaiser. 2010. Empirical evaluation of approaches to testing applications without test oracles. (2010)."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2896880"},{"key":"e_1_3_1_27_2","unstructured":"Adam Paszke Sam Gross Francisco Massa Adam Lerer James Bradbury Gregory Chanan Trevor Killeen Zeming Lin Natalia Gimelshein Luca Antiga Alban Desmaison Andreas Kopf Edward Z. Yang Zachary DeVito Martin Raison Alykhan Tejani Sasank Chilamkurthy Benoit Steiner Lu Fang Junjie Bai and Soumith Chintala. 2019. PyTorch: An Imperative Style High-Performance Deep Learning Library. In NeurIPS. 8024\u20138035."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3361566"},{"key":"e_1_3_1_29_2","first-page":"1027","volume-title":"ICSE","author":"Pham Hung Viet","year":"2019","unstructured":"Hung Viet Pham, Thibaud Lutellier, Weizhen Qi, and Lin Tan. 2019. CRADLE: cross-backend validation to detect and localize bugs in deep learning libraries. In ICSE. 
IEEE \/ ACM, 1027\u20131038."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510454.3516835"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2016.2532875"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2301.08653"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1504\/IJWGS.2020.110945"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Jiannan Wang Thibaud Lutellier Shangshu Qian Hung Viet Pham and Lin Tan. 2022. EAGLE: creating equivalent graphs to test deep learning libraries. In Proceedings of the 44th International Conference on Software Engineering. 798\u2013810.","DOI":"10.1145\/3510003.3510165"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2020.07.042"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409761"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510041"},{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1145\/3591251","article-title":"Optimal Reads-From Consistency Checking for C11-Style Memory Models","volume":"7","author":"Windsor Matt","year":"2023","unstructured":"Matt Windsor, Alastair F. Donaldson, and John Wickerson. 2023. Optimal Reads-From Consistency Checking for C11-Style Memory Models. Proceedings of the ACM on Programming Languages 7, PLDI (2023), 761\u2013785.","journal-title":"Proceedings of the ACM on Programming Languages"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534220"},{"key":"e_1_3_1_40_2","first-page":"135","volume-title":"QSIC","author":"Xie Xiaoyuan","year":"2009","unstructured":"Xiaoyuan Xie, Joshua Wing Kei Ho, Christian Murphy, Gail E. Kaiser, Baowen Xu, and Tsong Yueh Chen. 2009. Application of Metamorphic Testing to Supervised Classifiers. In QSIC. 
IEEE Computer Society, 135\u2013144."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","unstructured":"Xiaoyuan Xie Joshua Wing Kei Ho Christian Murphy Gail E. Kaiser Baowen Xu and Tsong Yueh Chen. 2011. Testing and validating machine learning classifiers by metamorphic testing. J. Syst. Softw. 84 4 (2011) 544\u2013558. https:\/\/doi.org\/10.1016\/j.jss.2010.11.920","DOI":"10.1016\/j.jss.2010.11.920"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468612"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2302.04351"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2021.3107165"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3460319.3464843"},{"key":"e_1_3_1_46_2","first-page":"129","volume-title":"ISSTA","author":"Zhang Yuhao","year":"2018","unstructured":"Yuhao Zhang, Yifan Chen, Shing-Chi Cheung, Yingfei Xiong, and Lu Zhang. 2018. An empirical study on TensorFlow program bugs. In ISSTA. 
ACM, 129\u2013140."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409720"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/MET52542.2021.00010"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3533767.3534409"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660796","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3660796","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T07:55:02Z","timestamp":1770191702000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3660796"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,12]]},"references-count":48,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2024,7,12]]}},"alternative-id":["10.1145\/3660796"],"URL":"https:\/\/doi.org\/10.1145\/3660796","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,7,12]]}}}