{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T03:06:05Z","timestamp":1773803165842,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"28","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>AdamW has become one of the most effective optimizers for training large-scale models. We have also observed its effectiveness in the context of federated learning (FL). However, directly applying AdamW in federated learning settings poses significant challenges: (1) due to data heterogeneity, AdamW often yields high variance in the second-moment estimate v; (2) the local overfitting of AdamW may cause client drift; and (3) reinitializing the moment estimates (v, m) at each round slows down convergence. To address these challenges, we propose the first Federated AdamW algorithm, called FedAdamW, for training and fine-tuning various large models. FedAdamW aligns local updates with the global update using both a local correction mechanism and decoupled weight decay to mitigate local overfitting. FedAdamW efficiently aggregates the mean of the second-moment estimates to reduce their variance and to reinitialize them. Theoretically, we prove that FedAdamW achieves a linear speedup convergence rate of O(\u221a(L\u2206\u03c3\u2097\u00b2\/(SKR\u03b5\u00b2)) + L\u2206\/R) without a heterogeneity assumption, where S is the number of participating clients per round, K is the number of local iterations, and R is the total number of communication rounds. We also employ PAC-Bayesian generalization analysis to explain the effectiveness of decoupled weight decay in local training. Empirically, we validate the effectiveness of FedAdamW on language and vision Transformer models. Compared to several baselines, FedAdamW significantly reduces communication rounds and improves test accuracy.<\/jats:p>","DOI":"10.1609\/aaai.v40i28.39549","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:42:17Z","timestamp":1773798137000},"page":"23748-23756","source":"Crossref","is-referenced-by-count":0,"title":["FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models"],"prefix":"10.1609","volume":"40","author":[{"given":"Junkang","family":"Liu","sequence":"first","affiliation":[]},{"given":"Fanhua","family":"Shang","sequence":"additional","affiliation":[]},{"given":"Hongying","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Yuxuan","family":"Tian","sequence":"additional","affiliation":[]},{"given":"Yuanyuan","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Jin","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Kewen","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Zhouchen","family":"Lin","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/39549\/43510","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/39549\/43510","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T01:42:17Z","timestamp":1773798137000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/39549"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"28","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i28.39549","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}