{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T11:00:37Z","timestamp":1766401237148,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"NSF","award":["#2313054"],"award-info":[{"award-number":["#2313054"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>\n            Machine learning (ML) applications have become an integral part of our lives. ML applications extensively use floating-point computation and involve very large\/small numbers; thus, maintaining the numerical stability of such complex computations remains an important challenge. Numerical bugs can lead to system crashes, incorrect output, and wasted computing resources. In this paper, we introduce a novel idea, namely\n            <jats:italic toggle=\"yes\">soft assertions (SA)<\/jats:italic>\n            , to encode safety\/error conditions for the places where numerical instability can occur. A soft assertion is an ML model automatically trained using the dataset obtained during unit testing of unstable functions. Given the values at the unstable function in an ML application, a soft assertion reports how to change these values in order to trigger the instability. We then use the output of soft assertions as signals to effectively mutate inputs to trigger numerical instability in ML applications. In the evaluation, we used the GRIST benchmark, a total of 79 programs, as well as 15 real-world ML applications from GitHub. We compared our tool with 5 state-of-the-art (SOTA) fuzzers. We found all the GRIST bugs and outperformed the baselines. We found 13 numerical bugs in real-world code, one of which had already been confirmed by the GitHub developers. 
While the baselines mostly found the bugs that report NaN and INF, our tool found numerical bugs with incorrect output. We showed one case where the\n            <jats:italic toggle=\"yes\">Tumor Detection Model<\/jats:italic>\n            , trained on Brain MRI images, should have predicted \u201ctumor\u201d, but instead, it incorrectly predicted \u201cno tumor\u201d due to the numerical bugs. Our replication package is located at https:\/\/figshare.com\/s\/6528d21ccd28bea94c32.\n          <\/jats:p>","DOI":"10.1145\/3729394","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:16:02Z","timestamp":1750346162000},"page":"2806-2827","source":"Crossref","is-referenced-by-count":1,"title":["Automatically Detecting Numerical Instability in Machine Learning Applications via Soft Assertions"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-0854-5404","authenticated-orcid":false,"given":"Shaila","family":"Sharmin","sequence":"first","affiliation":[{"name":"Iowa State University, Ames, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7014-112X","authenticated-orcid":false,"given":"Anwar Hossain","family":"Zahid","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5976-705X","authenticated-orcid":false,"given":"Subhankar","family":"Bhattacharjee","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7096-6753","authenticated-orcid":false,"given":"Chiamaka","family":"Igwilo","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3802-1512","authenticated-orcid":false,"given":"Miryung","family":"Kim","sequence":"additional","affiliation":[{"name":"University of California at Los Angeles, Los Angeles, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6797-0648","authenticated-orcid":false,"given":"Wei","family":"Le","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"BESBES Ahmed. 2021. focal_loss.py. GitHub. Available at: https:\/\/github.com\/ahmedbesbes\/character-based-cnn\/blob\/master\/src\/focal_loss.py"},{"key":"e_1_2_1_2_1","first-page":"1","article-title":"Automatic differentiation in machine learning: a survey","volume":"18","author":"Baydin Atilim Gunes","year":"2018","unstructured":"Atilim Gunes Baydin, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2018. Automatic differentiation in machine learning: a survey. Journal of machine learning research, 18, 153 (2018), 1\u201343.","journal-title":"Journal of machine learning research"},{"key":"e_1_2_1_3_1","unstructured":"Richard L Burden and J Douglas Faires. 1997. Numerical analysis. Brooks Cole."},{"key":"e_1_2_1_4_1","unstructured":"Navoneel Chakrabarty. 2024. Brain MRI Images for Brain Tumor Detection. GitHub. Available at: https:\/\/www.kaggle.com\/datasets\/navoneel\/brain-mri-images-for-brain-tumor-detection\/data"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3385412.3386004"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468585"},{"key":"e_1_2_1_7_1","unstructured":"codespecs. 2024. Daikon Documentation. GitHub. Available at: https:\/\/plse.cs.washington.edu\/daikon\/pubs\/"},{"key":"e_1_2_1_8_1","unstructured":"Rodrigo P\u00e9rez Dattari. 2023. ranking_losses.py. GitHub. Available at: https:\/\/github.com\/rperezdattari\/Stable-Motion-Primitives-via-Imitation-and-Contrastive-Learning\/blob\/main\/src\/agent\/utils\/ranking_losses.py"},{"key":"e_1_2_1_9_1","unstructured":"Bruce Dawson. 2008. Comparing floating point numbers. 
Retrieved from Comparing floating point numbers: http:\/\/www.cygnus-software.com\/papers\/comparingfloats\/comparingfloats.htm."},{"key":"e_1_2_1_10_1","unstructured":"DeCLaRe. 2022. OT_solver.py. GitHub. Available at: https:\/\/github.com\/declare-lab\/MM-Align\/blob\/main\/src\/modules\/OT_solver.py"},{"key":"e_1_2_1_11_1","volume-title":"2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE). 509\u2013519","author":"Franco Anthony Di","year":"2017","unstructured":"Anthony Di Franco, Hui Guo, and Cindy Rubio-Gonz\u00e1lez. 2017. A comprehensive study of real-world numerical bug characteristics. In 2017 32nd IEEE\/ACM International Conference on Automated Software Engineering (ASE). 509\u2013519."},{"key":"e_1_2_1_12_1","unstructured":"Yanjia Li Ethan. 2023. loss.py. GitHub. Available at: https:\/\/github.com\/ethanyanjiali\/minChatGPT\/blob\/main\/src\/loss.py"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jacr.2017.12.028"},{"key":"e_1_2_1_14_1","unstructured":"Ethan Gilmore. 2024. MicroMLP. GitHub. Available at: https:\/\/github.com\/ethangilmore\/MicroMLP"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2843948"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3264835"},{"key":"e_1_2_1_17_1","unstructured":"Nikita Gushchin. 2023. cunet. GitHub. Available at: https:\/\/github.com\/ngushchin\/EntropicNeuralOptimalTransport\/tree\/e66cf61b37f06f0714165161618f7a944af189c9"},{"key":"e_1_2_1_18_1","unstructured":"Fabrice Harel-Canada. 2022. A basic python-based fuzzing tool used to support miscellaneous research objectives. GitHub. Available at: https:\/\/github.com\/fabriceyhc\/pyfuzz"},{"key":"e_1_2_1_19_1","unstructured":"Richard Hawkins Colin Paterson Chiara Picardi Yan Jia Radu Calinescu and Ibrahim Habli. 2021. Guidance on the assurance of machine learning in autonomous systems (AMLAS). 
arXiv preprint arXiv:2102.01564."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_2_1_21_1","volume-title":"CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries. https:\/\/www.cs.purdue.edu\/homes\/lintan\/publications\/cradle-icse19.pdf Accessed: 2024-01-03","author":"Hu Xinyu","year":"2019","unstructured":"Xinyu Hu, Lingming Zhang, Gongxuan Zhang, Xiaoning Du, Qianyu Guo, Xin Xia, Shanping Li, and Lin Tan. 2019. CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries. https:\/\/www.cs.purdue.edu\/homes\/lintan\/publications\/cradle-icse19.pdf Accessed: 2024-01-03"},{"key":"e_1_2_1_22_1","volume-title":"Database Systems for Advanced Applications: 25th International Conference, DASFAA 2020, Jeju, South Korea, September 24\u201327, 2020, Proceedings, Part I 25","author":"Jia Li","year":"2020","unstructured":"Li Jia, Hao Zhong, Xiaoyin Wang, Linpeng Huang, and Xuansheng Lu. 2020. An empirical study on bugs inside tensorflow. In Database Systems for Advanced Applications: 25th International Conference, DASFAA 2020, Jeju, South Korea, September 24\u201327, 2020, Proceedings, Part I 25. 604\u2013620."},{"key":"e_1_2_1_23_1","unstructured":"Jinwoo Kim. 2023. SO.py. GitHub. Available at: https:\/\/github.com\/jw9730\/lps\/blob\/b6ba12faa9ec74ba95d95299895a21da944dd4af\/src\/symmetry\/groups\/SO.py"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 44th International Conference on Software Engineering. 586\u2013597","author":"Kloberdanz Eliska","year":"2022","unstructured":"Eliska Kloberdanz, Kyle G Kloberdanz, and Wei Le. 2022. DeepStability: A study of unstable numerical methods and their solutions in deep learning. In Proceedings of the 44th International Conference on Software Engineering. 586\u2013597."},{"key":"e_1_2_1_25_1","unstructured":"Linyi Li Yuhao Zhang Luyao Ren Yingfei Xiong and Tao Xie. 2023. 
Reliability Assurance for Deep Neural Network Architectures Against Numerical Defects. arxiv:2302.06086"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575707"},{"key":"e_1_2_1_27_1","unstructured":"Weitang Liu. 2023. triplet_loss.py. GitHub. Available at: https:\/\/github.com\/lonePatient\/TorchBlocks\/blob\/445140f8190a037b590d3344cdbd6cae5b850da0\/src\/torchblocks\/losses\/triplet_loss.py"},{"key":"e_1_2_1_28_1","volume-title":"Atheris: Coverage-Guided Python Fuzzing Engine. https:\/\/pypi.org\/project\/atheris\/ Accessed: 2023-12-03","author":"Google","year":"2023","unstructured":"Google LLC. 2023. Atheris: Coverage-Guided Python Fuzzing Engine. https:\/\/pypi.org\/project\/atheris\/ Accessed: 2023-12-03"},{"key":"e_1_2_1_29_1","unstructured":"longyahui. 2021. crf.py. GitHub. Available at: https:\/\/github.com\/longyahui\/GCNMDA\/blob\/master\/src\/crf.py"},{"key":"e_1_2_1_30_1","volume-title":"Mohayeminul Islam, Diego Elias Costa, Emad Shihab, Foutse Khomh, Sarah Nadi, and Muhammad Raza.","author":"Majdinasab Vahid","year":"2023","unstructured":"Vahid Majdinasab, Sharon Chee Yin Ho, Mohayeminul Islam, Diego Elias Costa, Emad Shihab, Foutse Khomh, Sarah Nadi, and Muhammad Raza. 2023. An empirical study on bugs inside PyTorch: A replication study. arXiv e-prints, arXiv\u20132307."},{"key":"e_1_2_1_31_1","volume-title":"International Conference on Machine Learning. 4901\u20134911","author":"Odena Augustus","year":"2019","unstructured":"Augustus Odena, Catherine Olsson, David Andersen, and Ian Goodfellow. 2019. Tensorfuzz: Debugging neural networks with coverage-guided fuzzing. In International Conference on Machine Learning. 4901\u20134911."},{"key":"e_1_2_1_32_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. 
Automatic differentiation in pytorch."},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Kexin Pei Yinzhi Cao Junfeng Yang and Suman Jana. 2017. DeepXplore: Automated Whitebox Testing of Deep Learning Systems. arxiv:1705.06640","DOI":"10.1145\/3132747.3132785"},{"key":"e_1_2_1_34_1","unstructured":"Robin Scheibler. 2023. diffusion-separation. GitHub. Available at: https:\/\/github.com\/fakufaku\/diffusion-separation\/blob\/main\/utils\/linalg.py"},{"key":"e_1_2_1_35_1","unstructured":"Mohit Shaharwale. 2024. TumorScope. GitHub. Available at: https:\/\/github.com\/mohits2806\/TumorScope"},{"key":"e_1_2_1_36_1","unstructured":"Simon. 2024. Value Error. GitHub. Available at: https:\/\/github.com\/scikit-learn\/scikit-learn\/issues\/29678"},{"key":"e_1_2_1_37_1","unstructured":"Strath.AI. 2023. SatelliteCloudGenerator. GitHub. Available at: https:\/\/github.com\/strath-ai\/SatelliteCloudGenerator\/blob\/d120eaf787d379527d7aeb4f7cb3f635b00b50d5\/src\/noise.py"},{"key":"e_1_2_1_38_1","unstructured":"Zeye Sun. 2023. loss.py. GitHub. Available at: https:\/\/github.com\/sunzeyeah\/RLHF\/blob\/cd1a6d54971eb0513f38974aa6dcca53aa2f3174\/src\/models\/loss.py"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-023-10389-6"},{"key":"e_1_2_1_40_1","volume-title":"Hypothesis: A new approach to property-based testing. https:\/\/hypothesis.readthedocs.io\/en\/latest\/ Accessed: 2024-01-26","author":"Team Hypothesis Development","year":"2023","unstructured":"Hypothesis Development Team. 2023. Hypothesis: A new approach to property-based testing. 
https:\/\/hypothesis.readthedocs.io\/en\/latest\/ Accessed: 2024-01-26"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3395363.3397380"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3551349.3559561"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11634-018-0318-1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377811.3380429"},{"key":"e_1_2_1_45_1","volume-title":"Reassert: Deep learning for assert generation. arXiv preprint arXiv:2011.09784.","author":"White Robert","year":"2020","unstructured":"Robert White and Jens Krinke. 2020. Reassert: Deep learning for assert generation. arXiv preprint arXiv:2011.09784."},{"key":"e_1_2_1_46_1","unstructured":"Li Xingxuan. 2023. prior_wd_optim.py. GitHub. Available at: https:\/\/github.com\/DAMO-NLP-SG\/MVCR\/blob\/main\/src\/prior_wd_optim.py"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3468264.3468612"},{"key":"e_1_2_1_48_1","volume-title":"2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1174\u20131186","author":"Yang Chenyuan","year":"2023","unstructured":"Chenyuan Yang, Yinlin Deng, Jiayi Yao, Yuxing Tu, Hanchi Li, and Lingming Zhang. 2023. Fuzzing automatic differentiation in deep-learning libraries. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). 1174\u20131186."},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Hanmo You Zan Wang Junjie Chen Shuang Liu and Shuochuan Li. 2023. DRFuzz: A Regression Fuzzing Framework for Deep Learning Systems. 
https:\/\/dl.acm.org\/doi\/abs\/10.1109\/ICSE48619.2023.00019","DOI":"10.1109\/ICSE48619.2023.00019"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3368089.3409720"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729394","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:18:19Z","timestamp":1750346299000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729394"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":50,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729394"],"URL":"https:\/\/doi.org\/10.1145\/3729394","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}