{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T07:23:01Z","timestamp":1763018581893,"version":"build-2065373602"},"reference-count":42,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2022,11,9]],"date-time":"2022-11-09T00:00:00Z","timestamp":1667952000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Multi-task learning (MTL) is a paradigm for learning multiple tasks simultaneously through a shared network, in which a distinct head network is further fine-tuned for each task. Personalized federated learning (PFL) can be achieved through MTL in the context of federated learning (FL), where tasks are distributed across clients; this setting is referred to as personalized federated MTL (PF-MTL). Statistical heterogeneity, caused by differences in task complexity across clients and the non-independent and identically distributed (non-i.i.d.) characteristics of local datasets, degrades system performance. To overcome this degradation, we propose FedGradNorm, a distributed dynamic weighting algorithm that balances learning speeds across tasks by normalizing the corresponding gradient norms in PF-MTL. We prove an exponential convergence rate for FedGradNorm. Further, we propose HOTA-FedGradNorm, which combines over-the-air (OTA) aggregation with FedGradNorm in a hierarchical FL (HFL) setting. HOTA-FedGradNorm is designed for efficient communication between the parameter server (PS) and clients in the power- and bandwidth-limited regime. We conduct experiments with both FedGradNorm and HOTA-FedGradNorm using the multi-task facial landmark (MTFL) and wireless communication system (RadComDynamic) datasets. The results indicate that both frameworks achieve faster training than equal-weighting strategies. In addition, FedGradNorm and HOTA-FedGradNorm compensate for imbalanced datasets across clients and adverse channel effects.<\/jats:p>","DOI":"10.3390\/a15110421","type":"journal-article","created":{"date-parts":[[2022,11,10]],"date-time":"2022-11-10T02:07:48Z","timestamp":1668046068000},"page":"421","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Personalized Federated Multi-Task Learning over Wireless Fading Channels"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1945-5534","authenticated-orcid":false,"given":"Matin","family":"Mortaheb","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA"}]},{"given":"Cemil","family":"Vahapoglu","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA"}]},{"given":"Sennur","family":"Ulukus","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,11,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1023\/A:1007379606734","article-title":"Multitask learning","volume":"28","author":"Caruana","year":"1997","journal-title":"Mach. Learn."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, Y., and Yang, Q. (2017). A survey on multi-task learning. arXiv.","DOI":"10.1093\/nsr\/nwx105"},{"key":"ref_3","unstructured":"McMahan, B., Moore, E., Ramage, D., Hampson, S., and Aguera y Arcas, B. (2017, January 20\u201322). Communication-efficient learning of deep networks from decentralized data. 
Proceedings of the AISTATS, Fort Lauderdale, FL, USA."},{"key":"ref_4","unstructured":"Li, T., Sahu, A.K., Zaheer, M., Sanjabi, M., Talwalkar, A., and Smith, V. (2020, January 2\u20134). Federated optimization in heterogeneous networks. Proceedings of the MLSys, Austin, TX, USA."},{"key":"ref_5","unstructured":"Karimireddy, S.P., Kale, S., Mohri, M., Reddi, S., Stich, S., and Suresh, A.T. (2020, January 13\u201318). SCAFFOLD: Stochastic controlled averaging for federated learning. Proceedings of the ICML, Virtual."},{"key":"ref_6","unstructured":"Fifty, C., Amid, E., Zhao, Z., Yu, T., Anil, R., and Finn, C. (2020). Measuring and harnessing transference in multi-task learning. arXiv."},{"key":"ref_7","unstructured":"Collins, L., Hassani, H., Mokhtari, A., and Shakkottai, S. (2021, January 18\u201324). Exploiting shared representations for personalized federated learning. Proceedings of the ICML, Virtual."},{"key":"ref_8","unstructured":"Arivazhagan, M.G., Aggarwal, V., Singh, A.K., and Choudhary, S. (2019). Federated learning with personalization layers. arXiv."},{"key":"ref_9","unstructured":"Deng, Y., Kamani, M., and Mahdavi, M. (2020). Adaptive personalized federated learning. arXiv."},{"key":"ref_10","unstructured":"Fallah, A., Mokhtari, A., and Ozdaglar, A. (2020, January 6\u201312). Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. Proceedings of the NeurIPS, Virtual."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1007\/s10107-017-1173-0","article-title":"An optimal randomized incremental gradient method","volume":"171","author":"Lan","year":"2018","journal-title":"Math. Program."},{"key":"ref_12","unstructured":"Smith, V., Chiang, C.K., Sanjabi, M., and Talwalkar, A.S. (2017, January 4\u20139). Federated multi-task learning. Proceedings of the NeurIPS, Long Beach, CA, USA."},{"key":"ref_13","unstructured":"Hanzely, F., and Richt\u00e1rik, P. (2020). 
Federated learning of a mixture of global and local models. arXiv."},{"key":"ref_14","unstructured":"Liang, P.P., Liu, T., Ziyin, L., Allen, N.B., Auerbach, R.P., Brent, D., Salakhutdinov, R., and Morency, L.P. (2020). Think locally, act globally: Federated learning with local and global representations. arXiv."},{"key":"ref_15","unstructured":"Agarwal, A., Langford, J., and Wei, C.Y. (2020). Federated residual learning. arXiv."},{"key":"ref_16","unstructured":"Hanzely, F., Zhao, B., and Kolar, M. (2021). Personalized federated learning: A unified framework and universal optimization techniques. arXiv."},{"key":"ref_17","unstructured":"Kendall, A., Gal, Y., and Cipolla, R. (2018, January 18\u201322). Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. Proceedings of the CVPR, Salt Lake City, UT, USA."},{"key":"ref_18","unstructured":"Qian, W., Chen, B., Zhang, Y., Wen, G., and Gechter, F. (2020). Multi-task variational information bottleneck. arXiv."},{"key":"ref_19","unstructured":"Chen, Z., Badrinarayanan, V., Lee, C., and Rabinovich, A. (2018, January 10\u201315). GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. Proceedings of the ICML, Stockholm, Sweden."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mortaheb, M., Vahapoglu, C., and Ulukus, S. (2022, January 4\u20136). FedGradNorm: Personalized federated gradient-normalized multi-task learning. Proceedings of the IEEE SPAWC, Oulu, Finland.","DOI":"10.1109\/SPAWC51304.2022.9833969"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Amiri, M.M., and G\u00fcnd\u00fcz, D. (2019, January 7\u201312). Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air. Proceedings of the IEEE ISIT, Paris, France.","DOI":"10.1109\/ISIT.2019.8849334"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Amiri, M.M., and G\u00fcnd\u00fcz, D. (2019, January 2\u20135). 
Over-the-air machine learning at the wireless edge. Proceedings of the IEEE SPAWC, Cannes, France.","DOI":"10.1109\/SPAWC.2019.8815402"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Vahapoglu, C., Mortaheb, M., and Ulukus, S. (2022, January 1\u20134). Hierarchical over-the-air FedGradNorm. Proceedings of the IEEE Asilomar, Pacific Grove, CA, USA.","DOI":"10.1109\/IEEECONF56349.2022.10052054"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Abad, M.S.H., Ozfatura, E., G\u00fcnd\u00fcz, D., and Er\u00e7etin, \u00d6. (2020, January 4\u20138). Hierarchical federated learning across heterogeneous cellular networks. Proceedings of the IEEE ICASSP, Virtual.","DOI":"10.1109\/ICASSP40776.2020.9054634"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, L., Zhang, J., Song, S.H., and Letaief, K.B. (2020, January 7\u201311). Client-edge-cloud hierarchical federated learning. Proceedings of the IEEE ICC, Virtual.","DOI":"10.1109\/ICC40277.2020.9148862"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"6535","DOI":"10.1109\/TWC.2020.3003744","article-title":"HFEL: Joint edge association and resource allocation for cost-efficient hierarchical federated edge learning","volume":"19","author":"Luo","year":"2020","journal-title":"IEEE Trans. Wirel. Commun."},{"key":"ref_27","unstructured":"Wang, J., Wang, S., Chen, R.R., and Ji, M. (2020). Demystifying why local aggregation helps: Convergence analysis of hierarchical SGD. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Luo, P., Loy, C., and Tang, X. (2014, January 6\u201312). Facial landmark detection by deep multi-task learning. Proceedings of the ECCV, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10599-4_7"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Jagannath, A., and Jagannath, J. (2021, January 7\u201311). Multi-task learning approach for automatic modulation and wireless signal classification. 
Proceedings of the IEEE ICC, Virtual.","DOI":"10.36227\/techrxiv.15156978.v1"},{"key":"ref_30","unstructured":"Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konecn\u00fd, J., Mazzocchi, S., and McMahan, H. (April, January 31). Towards federated learning at scale: System design. Proceedings of the MLSys, Stanford, CA, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1109\/TEVC.2017.2712906","article-title":"A review on bilevel optimization: From classical to evolutionary approaches and applications","volume":"22","author":"Sinha","year":"2017","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1194","DOI":"10.1137\/0913069","article-title":"New branch-and-bound rules for linear bilevel programming","volume":"13","author":"Hansen","year":"1992","journal-title":"SIAM J. Sci. Comput."},{"key":"ref_33","first-page":"51","article-title":"An extended kuhn-tucker approach for linear bilevel programming","volume":"162","author":"Shi","year":"2005","journal-title":"Appl. Math. Comput."},{"key":"ref_34","unstructured":"Bennett, K.P., and Moore, G.M. (2010, January 9). Bilevel programming algorithms for machine learning model selection. Proceedings of the Rensselaer Polytechnic Institute."},{"key":"ref_35","unstructured":"Domke, J. (2012, January 21\u201323). Generic methods for optimization-based modeling. Proceedings of the AISTATS, La Palma, Canary Islands."},{"key":"ref_36","unstructured":"Ghadimi, S., and Wang, M. (2018). Approximation methods for bilevel programming. arXiv."},{"key":"ref_37","unstructured":"Grazzi, R., Franceschi, L., Pontil, M., and Salzo, S. (2020, January 13\u201318). On the iteration complexity of hypergradient computation. Proceedings of the ICML, Virtual."},{"key":"ref_38","unstructured":"Shaban, A., Cheng, C.A., Hatch, N., and Boots, B. (2019, January 16\u201318). Truncated back-propagation for bilevel optimization. 
Proceedings of the AISTATS, Naha, Okinawa, Japan."},{"key":"ref_39","unstructured":"Maclaurin, D., Duvenaud, D., and Adams, R. (2015, January 6\u201311). Gradient-based hyperparameter optimization through reversible learning. Proceedings of the ICML, Lille, France."},{"key":"ref_40","unstructured":"Ji, K., Yang, J., and Liang, Y. (2021, January 18\u201324). Bilevel optimization: Convergence analysis and enhanced design. Proceedings of the ICML, Virtual."},{"key":"ref_41","unstructured":"Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G.R., Gibbons, P.B., and Mutlu, O. (2017, January 27\u201329). Gaia: Geo-distributed machine learning approaching LAN speeds. Proceedings of the NSDI, Boston, MA, USA."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.eng.2021.12.002","article-title":"Federated learning for 6G: Applications, challenges, and opportunities","volume":"8","author":"Yang","year":"2022","journal-title":"Engineering"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/11\/421\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:13:02Z","timestamp":1760145182000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/11\/421"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,9]]},"references-count":42,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["a15110421"],"URL":"https:\/\/doi.org\/10.3390\/a15110421","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,11,9]]}}}