{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,17]],"date-time":"2026-02-17T03:39:40Z","timestamp":1771299580575,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":34,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T00:00:00Z","timestamp":1691107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,8,6]]},"DOI":"10.1145\/3580305.3599501","type":"proceedings-article","created":{"date-parts":[[2023,8,4]],"date-time":"2023-08-04T18:13:58Z","timestamp":1691172838000},"page":"3185-3194","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1430-7519","authenticated-orcid":false,"given":"Yun","family":"Yue","sequence":"first","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5998-0037","authenticated-orcid":false,"given":"Jiadi","family":"Jiang","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-1229-5876","authenticated-orcid":false,"given":"Zhiling","family":"Ye","sequence":"additional","affiliation":[{"name":"Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4458-2132","authenticated-orcid":false,"given":"Ning","family":"Gao","sequence":"additional","affiliation":[{"name":"Ant Group, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3440-9675","authenticated-orcid":false,"given":"Yongchao","family":"Liu","sequence":"additional","affiliation":[{"name":"Ant Group, 
Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-6685-1293","authenticated-orcid":false,"given":"Ke","family":"Zhang","sequence":"additional","affiliation":[{"name":"Ant Group, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2023,8,4]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning. In International Conference on Machine Learning, ICML 2022","volume":"32","author":"Abbas Momin","year":"2022","unstructured":"Momin Abbas , Quan Xiao , Lisha Chen , Pin-Yu Chen , and Tianyi Chen . 2022 . Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning. In International Conference on Machine Learning, ICML 2022 , 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research , Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.). PMLR, 10-- 32 . https:\/\/proceedings.mlr.press\/v162\/abbas22b.html Momin Abbas, Quan Xiao, Lisha Chen, Pin-Yu Chen, and Tianyi Chen. 2022. Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning. In International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1ri, Gang Niu, and Sivan Sabato (Eds.). PMLR, 10--32. https:\/\/proceedings.mlr.press\/v162\/abbas22b.html"},{"key":"e_1_3_2_2_2_1","volume-title":"Proceedings of the 36th International Conference on Machine Learning, ICML 2019","volume":"321","author":"Arazo Eric","year":"2019","unstructured":"Eric Arazo , Diego Ortego , Paul Albert , Noel E. O'Connor , and Kevin McGuinness . 2019 . Unsupervised Label Noise Modeling and Loss Correction . In Proceedings of the 36th International Conference on Machine Learning, ICML 2019 , 9-15 June 2019, Long Beach, California, USA (Proceedings of Machine Learning Research , Vol. 
97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 312-- 321 . http:\/\/proceedings.mlr.press\/v97\/arazo19a.html Eric Arazo, Diego Ortego, Paul Albert, Noel E. O'Connor, and Kevin McGuinness. 2019. Unsupervised Label Noise Modeling and Loss Correction. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA (Proceedings of Machine Learning Research, Vol. 97), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.). PMLR, 312--321. http:\/\/proceedings.mlr.press\/v97\/arazo19a.html"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.508"},{"key":"e_1_3_2_2_4_1","volume-title":"Christian Borgs, Jennifer T. Chayes, Levent Sagun, and Riccardo Zecchina.","author":"Chaudhari Pratik","year":"2017","unstructured":"Pratik Chaudhari , Anna Choromanska , Stefano Soatto , Yann LeCun , Carlo Baldassi , Christian Borgs, Jennifer T. Chayes, Levent Sagun, and Riccardo Zecchina. 2017 . Entropy-SGD: Biasing Gradient Descent Into Wide Valleys. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview .net. https:\/\/openreview.net\/forum?id=B1YfAfcgl Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Levent Sagun, and Riccardo Zecchina. 2017. Entropy-SGD: Biasing Gradient Descent Into Wide Valleys. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https:\/\/openreview.net\/forum?id=B1YfAfcgl"},{"key":"e_1_3_2_2_5_1","volume-title":"When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations. CoRR","author":"Chen Xiangning","year":"2022","unstructured":"Xiangning Chen , Cho-Jui Hsieh , and Boqing Gong . 2022. 
When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations. CoRR , Vol. abs\/ 2106 .01548 ( 2022 ). [arXiv]2106.01548 https:\/\/arxiv.org\/abs\/2106.01548 Xiangning Chen, Cho-Jui Hsieh, and Boqing Gong. 2022. When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations. CoRR, Vol. abs\/2106.01548 (2022). [arXiv]2106.01548 https:\/\/arxiv.org\/abs\/2106.01548"},{"key":"e_1_3_2_2_6_1","volume-title":"Le","author":"Cubuk Ekin Dogus","year":"2018","unstructured":"Ekin Dogus Cubuk , Barret Zoph , Dandelion Man\u00e9 , Vijay Vasudevan , and Quoc V . Le . 2018 . AutoAugment: Learning Augmentation Policies from Data. CoRR , Vol. abs\/ 1805 .09501 (2018). [arXiv]1805.09501 http:\/\/arxiv.org\/abs\/1805.09501 Ekin Dogus Cubuk, Barret Zoph, Dandelion Man\u00e9, Vijay Vasudevan, and Quoc V. Le. 2018. AutoAugment: Learning Augmentation Policies from Data. CoRR, Vol. abs\/1805.09501 (2018). [arXiv]1805.09501 http:\/\/arxiv.org\/abs\/1805.09501"},{"key":"e_1_3_2_2_7_1","volume-title":"Taylor","author":"Devries Terrance","year":"2017","unstructured":"Terrance Devries and Graham W . Taylor . 2017 . Improved Regularization of Convolutional Neural Networks with Cutout. CoRR , Vol. abs\/ 1708 .04552 (2017). [arXiv]1708.04552 http:\/\/arxiv.org\/abs\/1708.04552 Terrance Devries and Graham W. Taylor. 2017. Improved Regularization of Convolutional Neural Networks with Cutout. CoRR, Vol. abs\/1708.04552 (2017). [arXiv]1708.04552 http:\/\/arxiv.org\/abs\/1708.04552"},{"key":"e_1_3_2_2_8_1","volume-title":"Efficient Sharpness-aware Minimization for Improved Training of Neural Networks. In The Tenth International Conference on Learning Representations, ICLR 2022","author":"Du Jiawei","year":"2022","unstructured":"Jiawei Du , Hanshu Yan , Jiashi Feng , Joey Tianyi Zhou , Liangli Zhen , Rick Siow Mong Goh , and Vincent Y. F. Tan . 2022 . Efficient Sharpness-aware Minimization for Improved Training of Neural Networks. 
In The Tenth International Conference on Learning Representations, ICLR 2022 , Virtual Event , April 25-29, 2022 . OpenReview.net. https:\/\/openreview.net\/forum?id=n0OeTdNRG0Q Jiawei Du, Hanshu Yan, Jiashi Feng, Joey Tianyi Zhou, Liangli Zhen, Rick Siow Mong Goh, and Vincent Y. F. Tan. 2022. Efficient Sharpness-aware Minimization for Improved Training of Neural Networks. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https:\/\/openreview.net\/forum?id=n0OeTdNRG0Q"},{"key":"e_1_3_2_2_9_1","volume-title":"Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017","author":"Dziugaite Gintare Karolina","year":"2017","unstructured":"Gintare Karolina Dziugaite and Daniel M. Roy . 2017a. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data . In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017 , Sydney, Australia, August 11--15 , 2017 , Gal Elidan, Kristian Kersting, and Alexander T. Ihler (Eds.). AUAI Press. http:\/\/auai.org\/uai2017\/proceedings\/papers\/173.pdf Gintare Karolina Dziugaite and Daniel M. Roy. 2017a. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11--15, 2017, Gal Elidan, Kristian Kersting, and Alexander T. Ihler (Eds.). AUAI Press. http:\/\/auai.org\/uai2017\/proceedings\/papers\/173.pdf"},{"key":"e_1_3_2_2_10_1","volume-title":"Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017","author":"Dziugaite Gintare Karolina","year":"2017","unstructured":"Gintare Karolina Dziugaite and Daniel M. Roy . 2017b. 
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data . In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017 , Sydney, Australia , August 11-15, 2017 , Gal Elidan, Kristian Kersting, and Alexander T. Ihler (Eds.). AUAI Press. http:\/\/auai.org\/uai2017\/proceedings\/papers\/173.pdf Gintare Karolina Dziugaite and Daniel M. Roy. 2017b. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017, Sydney, Australia, August 11-15, 2017, Gal Elidan, Kristian Kersting, and Alexander T. Ihler (Eds.). AUAI Press. http:\/\/auai.org\/uai2017\/proceedings\/papers\/173.pdf"},{"key":"e_1_3_2_2_11_1","volume-title":"Sharpness-aware Minimization for Efficiently Improving Generalization. In 9th International Conference on Learning Representations, ICLR 2021","author":"Foret Pierre","year":"2021","unstructured":"Pierre Foret , Ariel Kleiner , Hossein Mobahi , and Behnam Neyshabur . 2021 . Sharpness-aware Minimization for Efficiently Improving Generalization. In 9th International Conference on Learning Representations, ICLR 2021 , Virtual Event, Austria , May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=6Tm1mposlrM Pierre Foret, Ariel Kleiner, Hossein Mobahi, and Behnam Neyshabur. 2021. Sharpness-aware Minimization for Efficiently Improving Generalization. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https:\/\/openreview.net\/forum?id=6Tm1mposlrM"},{"key":"e_1_3_2_2_12_1","unstructured":"Noah Golmant Zhewei Yao Amir Gholami Michael Mahoney and Joseph Gonzalez. 2018. 
pytorch-hessian-eigenthings: efficient PyTorch Hessian eigendecomposition https:\/\/github.com\/noahgolmant\/pytorch-hessian-eigenthings  Noah Golmant Zhewei Yao Amir Gholami Michael Mahoney and Joseph Gonzalez. 2018. pytorch-hessian-eigenthings: efficient PyTorch Hessian eigendecomposition https:\/\/github.com\/noahgolmant\/pytorch-hessian-eigenthings"},{"key":"e_1_3_2_2_13_1","volume-title":"Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016 . Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016 , Las Vegas, NV, USA , June 27-30, 2016. IEEE Computer Society, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90 10.1109\/CVPR.2016.90 Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, 770--778. https:\/\/doi.org\/10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.1.1"},{"key":"e_1_3_2_2_15_1","volume-title":"Virtual","volume":"97","author":"Huang W. Ronny","year":"2020","unstructured":"W. Ronny Huang , Zeyad Emam , Micah Goldblum , Liam Fowl , Justin K. Terry , Furong Huang , and Tom Goldstein . 2020 . Understanding Generalization Through Visualizations. In \"I Can't Believe It's Not Better!\" at NeurIPS Workshops , Virtual , December 12, 2020 (Proceedings of Machine Learning Research , Vol. 137), Jessica Zosa Forde, Francisco J. R. Ruiz, Melanie F. Pradier, and Aaron Schein (Eds.). PMLR, 87-- 97 . https:\/\/proceedings.mlr.press\/v137\/huang20a.html W. Ronny Huang, Zeyad Emam, Micah Goldblum, Liam Fowl, Justin K. Terry, Furong Huang, and Tom Goldstein. 2020. 
Understanding Generalization Through Visualizations. In \"I Can't Believe It's Not Better!\" at NeurIPS Workshops, Virtual, December 12, 2020 (Proceedings of Machine Learning Research, Vol. 137), Jessica Zosa Forde, Francisco J. R. Ruiz, Melanie F. Pradier, and Aaron Schein (Eds.). PMLR, 87--97. https:\/\/proceedings.mlr.press\/v137\/huang20a.html"},{"key":"e_1_3_2_2_16_1","volume-title":"Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018","author":"Izmailov Pavel","year":"2018","unstructured":"Pavel Izmailov , Dmitrii Podoprikhin , Timur Garipov , Dmitry P. Vetrov , and Andrew Gordon Wilson . 2018 . Averaging Weights Leads to Wider Optima and Better Generalization . In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018 , Monterey, California, USA , August 6-10, 2019, Amir Globerson and Ricardo Silva (Eds.). AUAI Press, 876--885. http:\/\/auai.org\/uai2018\/proceedings\/papers\/313.pdf Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. 2018. Averaging Weights Leads to Wider Optima and Better Generalization. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, UAI 2018, Monterey, California, USA, August 6-10, 2019, Amir Globerson and Ricardo Silva (Eds.). AUAI Press, 876--885. http:\/\/auai.org\/uai2018\/proceedings\/papers\/313.pdf"},{"key":"e_1_3_2_2_17_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020","volume":"4815","author":"Jiang Lu","year":"2020","unstructured":"Lu Jiang , Di Huang , Mason Liu , and Weilong Yang . 2020 . Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels . In Proceedings of the 37th International Conference on Machine Learning, ICML 2020 , 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research , Vol. 119). PMLR, 4804-- 4815 . 
http:\/\/proceedings.mlr.press\/v119\/jiang20c.html Lu Jiang, Di Huang, Mason Liu, and Weilong Yang. 2020. Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event (Proceedings of Machine Learning Research, Vol. 119). PMLR, 4804--4815. http:\/\/proceedings.mlr.press\/v119\/jiang20c.html"},{"key":"e_1_3_2_2_18_1","volume-title":"Kusner","author":"Kaddour Jean","year":"2022","unstructured":"Jean Kaddour , Linqing Liu , Ricardo Silva , and Matt J . Kusner . 2022 . When Do Flat Minima Optimizers Work?. In NeurIPS. http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/69b5534586d6c035a96b49c86dbeece8-Abstract-Conference.html Jean Kaddour, Linqing Liu, Ricardo Silva, and Matt J. Kusner. 2022. When Do Flat Minima Optimizers Work?. In NeurIPS. http:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/69b5534586d6c035a96b49c86dbeece8-Abstract-Conference.html"},{"key":"e_1_3_2_2_19_1","volume-title":"5th International Conference on Learning Representations, ICLR","author":"Keskar Nitish Shirish","year":"2017","unstructured":"Nitish Shirish Keskar , Dheevatsa Mudigere , Jorge Nocedal , Mikhail Smelyanskiy , and Ping Tak Peter Tang . 2017. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima . In 5th International Conference on Learning Representations, ICLR 2017 , Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview .net. https:\/\/openreview.net\/forum?id=H1oyRlYgg Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. 2017. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. 
https:\/\/openreview.net\/forum?id=H1oyRlYgg"},{"key":"e_1_3_2_2_20_1","volume-title":"Fisher SAM: Information Geometry and Sharpness Aware Minimisation. In International Conference on Machine Learning, ICML 2022","volume":"11161","author":"Kim Minyoung","year":"2022","unstructured":"Minyoung Kim , Da Li , Shell Xu Hu , and Timothy M. Hospedales . 2022 . Fisher SAM: Information Geometry and Sharpness Aware Minimisation. In International Conference on Machine Learning, ICML 2022 , 17-23 July 2022 , Baltimore, Maryland, USA (Proceedings of Machine Learning Research , Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1 ri, Gang Niu, and Sivan Sabato (Eds.). PMLR, 11148-- 11161 . https:\/\/proceedings.mlr.press\/v162\/kim22f.html Minyoung Kim, Da Li, Shell Xu Hu, and Timothy M. Hospedales. 2022. Fisher SAM: Information Geometry and Sharpness Aware Minimisation. In International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA (Proceedings of Machine Learning Research, Vol. 162), Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesv\u00e1 ri, Gang Niu, and Sivan Sabato (Eds.). PMLR, 11148--11161. https:\/\/proceedings.mlr.press\/v162\/kim22f.html"},{"key":"e_1_3_2_2_21_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba . 2015 . Adam : A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds .). http:\/\/arxiv.org\/abs\/1412.6980 Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). 
http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_2_22_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning, ICML 2021","volume":"5914","author":"Kwon Jungmin","year":"2021","unstructured":"Jungmin Kwon , Jeongseop Kim , Hyunseo Park , and In Kwon Choi . 2021 . ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks . In Proceedings of the 38th International Conference on Machine Learning, ICML 2021 , 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research , Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 5905-- 5914 . http:\/\/proceedings.mlr.press\/v139\/kwon21b.html Jungmin Kwon, Jeongseop Kim, Hyunseo Park, and In Kwon Choi. 2021. ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 5905--5914. http:\/\/proceedings.mlr.press\/v139\/kwon21b.html"},{"key":"e_1_3_2_2_23_1","volume-title":"Visualizing the Loss Landscape of Neural Nets. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018","author":"Li Hao","year":"2018","unstructured":"Hao Li , Zheng Xu , Gavin Taylor , Christoph Studer , and Tom Goldstein . 2018 . Visualizing the Loss Landscape of Neural Nets. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 , NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 6391--6401. https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/a41b3bb3e6b050b6c9067c67f663b915-Abstract.html Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. 2018. 
Visualizing the Loss Landscape of Neural Nets. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicol\u00f2 Cesa-Bianchi, and Roman Garnett (Eds.). 6391--6401. https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/a41b3bb3e6b050b6c9067c67f663b915-Abstract.html"},{"key":"e_1_3_2_2_24_1","volume-title":"Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019","author":"Loshchilov Ilya","year":"2019","unstructured":"Ilya Loshchilov and Frank Hutter . 2019 . Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019 , New Orleans, LA, USA , May 6-9, 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=Bkg6RiCqY7 Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https:\/\/openreview.net\/forum?id=Bkg6RiCqY7"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/307400.307435"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/0041-5553(64)90137-5"},{"key":"e_1_3_2_2_27_1","volume-title":"A stochastic approximation method. The annals of mathematical statistics","author":"Robbins Herbert","year":"1951","unstructured":"Herbert Robbins and Sutton Monro . 1951. A stochastic approximation method. The annals of mathematical statistics ( 1951 ), 400--407. Herbert Robbins and Sutton Monro. 1951. A stochastic approximation method. 
The annals of mathematical statistics (1951), 400--407."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_2_29_1","volume-title":"Understanding Machine Learning - From Theory to Algorithms","author":"Shalev-Shwartz Shai","unstructured":"Shai Shalev-Shwartz and Shai Ben-David . 2014. Understanding Machine Learning - From Theory to Algorithms . Cambridge University Press . http:\/\/www.cambridge.org\/de\/academic\/subjects\/computer-science\/pattern-recognition-and-machine-learning\/understanding-machine-learning-theory-algorithms Shai Shalev-Shwartz and Shai Ben-David. 2014. Understanding Machine Learning - From Theory to Algorithms. Cambridge University Press. http:\/\/www.cambridge.org\/de\/academic\/subjects\/computer-science\/pattern-recognition-and-machine-learning\/understanding-machine-learning-theory-algorithms"},{"key":"e_1_3_2_2_30_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning, ICML 2021","volume":"10357","author":"Touvron Hugo","year":"2021","unstructured":"Hugo Touvron , Matthieu Cord , Matthijs Douze , Francisco Massa , Alexandre Sablayrolles , and Herv\u00e9 J\u00e9 gou. 2021 . Training data-efficient image transformers & distillation through attention . In Proceedings of the 38th International Conference on Machine Learning, ICML 2021 , 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research , Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 10347-- 10357 . http:\/\/proceedings.mlr.press\/v139\/touvron21a.html Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Herv\u00e9 J\u00e9 gou. 2021. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event (Proceedings of Machine Learning Research, Vol. 139), Marina Meila and Tong Zhang (Eds.). PMLR, 10347--10357. 
http:\/\/proceedings.mlr.press\/v139\/touvron21a.html"},{"key":"e_1_3_2_2_31_1","volume-title":"Chervonenkis","author":"Vapnik V. N.","year":"1971","unstructured":"V. N. Vapnik and A. Ya . Chervonenkis . 1971 . On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability and its Applications , Vol. 16 , 2 (1971), 264--280. https:\/\/doi.org\/10.1137\/1116025 10.1137\/1116025 V. N. Vapnik and A. Ya. Chervonenkis. 1971. On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability and its Applications, Vol. 16, 2 (1971), 264--280. https:\/\/doi.org\/10.1137\/1116025"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.30.87"},{"key":"e_1_3_2_2_33_1","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https:\/\/openreview.net\/forum?id=Sy8gdB9xx","author":"Zhang Chiyuan","year":"2017","unstructured":"Chiyuan Zhang , Samy Bengio , Moritz Hardt , Benjamin Recht , and Oriol Vinyals . 2017 . Understanding deep learning requires rethinking generalization . In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https:\/\/openreview.net\/forum?id=Sy8gdB9xx Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. Understanding deep learning requires rethinking generalization. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net. https:\/\/openreview.net\/forum?id=Sy8gdB9xx"},{"key":"e_1_3_2_2_34_1","volume-title":"Surrogate Gap Minimization Improves Sharpness-Aware Training. 
In The Tenth International Conference on Learning Representations, ICLR 2022","author":"Zhuang Juntang","year":"2022","unstructured":"Juntang Zhuang , Boqing Gong , Liangzhe Yuan , Yin Cui , Hartwig Adam , Nicha C. Dvornek , Sekhar Tatikonda , James S. Duncan , and Ting Liu . 2022 . Surrogate Gap Minimization Improves Sharpness-Aware Training. In The Tenth International Conference on Learning Representations, ICLR 2022 , Virtual Event , April 25-29, 2022. OpenReview.net. https:\/\/openreview.net\/forum?id=edONMAnhLu- Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha C. Dvornek, Sekhar Tatikonda, James S. Duncan, and Ting Liu. 2022. Surrogate Gap Minimization Improves Sharpness-Aware Training. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https:\/\/openreview.net\/forum?id=edONMAnhLu-"}],"event":{"name":"KDD '23: The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","location":"Long Beach CA USA","acronym":"KDD '23","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGKDD ACM Special Interest Group on Knowledge Discovery in Data"]},"container-title":["Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data 
Mining"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599501","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580305.3599501","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:52Z","timestamp":1750178272000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580305.3599501"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,4]]},"references-count":34,"alternative-id":["10.1145\/3580305.3599501","10.1145\/3580305"],"URL":"https:\/\/doi.org\/10.1145\/3580305.3599501","relation":{},"subject":[],"published":{"date-parts":[[2023,8,4]]},"assertion":[{"value":"2023-08-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}