{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T00:21:08Z","timestamp":1758846068701,"version":"3.44.0"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"Guangdong Provincial Key Lab of Integrated Communication, Sensing and Computation for Ubiquitous Internet of Things","award":["2023B1212010007"],"award-info":[{"award-number":["2023B1212010007"]}]},{"name":"China NSFC Grant","award":["62472366"],"award-info":[{"award-number":["62472366"]}]},{"name":"the Project of DEGP","award":["2024GCZX003, 2023KCXTD042"],"award-info":[{"award-number":["2024GCZX003, 2023KCXTD042"]}]},{"DOI":"10.13039\/501100018925","name":"111 Center","doi-asserted-by":"crossref","award":["D25008"],"award-info":[{"award-number":["D25008"]}],"id":[{"id":"10.13039\/501100018925","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shenzhen Science and Technology Foundation","award":["ZDSYS20190902092853047"],"award-info":[{"award-number":["ZDSYS20190902092853047"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2025,9,3]]},"abstract":"<jats:p>Foundation models have achieved remarkable success across various domains by learning general representations from raw data, offering a promising paradigm for diverse applications. This concept holds great potential for advancing human activity recognition (HAR), particularly in overcoming challenges associated with collecting large-scale labeled datasets. However, the dynamic nature of HAR tasks, characterized by diverse sensing devices and activity types, results in fragmented datasets that question the feasibility of applying foundation model to this domain. 
In this work, we propose a novel foundation model training framework that effectively leverages heterogeneous datasets through a two-stage training strategy: (1) self-supervised learning to extract cross-domain sensor patterns, followed by (2) multi-task learning to align representations with semantic contexts. The effectiveness of the trained foundation model is demonstrated through extensive downstream experiments, with the superior fine-tuning performance across various modalities and input configurations---achieving the highest performance metric in 10 out of 12 settings---further validating the robustness and adaptability. While our model shows a performance gap compared to foundation models pre-trained on large-scale or high-quality data in zero- and few-shot scenarios, its competitive results with a more flexible architecture demonstrate the efficiency and potential of our training strategy for HAR foundation models.<\/jats:p>","DOI":"10.1145\/3749479","type":"journal-article","created":{"date-parts":[[2025,9,3]],"date-time":"2025-09-03T17:15:45Z","timestamp":1756919745000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Towards Customizable Foundation Models for Human Activity Recognition with Wearable Devices"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8632-8282","authenticated-orcid":false,"given":"Minghui","family":"Qiu","sequence":"first","affiliation":[{"name":"DSA, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-7184-4616","authenticated-orcid":false,"given":"Cekai","family":"Weng","sequence":"additional","affiliation":[{"name":"DSA, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0356-4712","authenticated-orcid":false,"given":"Mingming","family":"Fan","sequence":"additional","affiliation":[{"name":"CMA &amp; IoT, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2216-0737","authenticated-orcid":false,"given":"Kaishun","family":"Wu","sequence":"additional","affiliation":[{"name":"DSA &amp; IoT, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Large-scale training of foundation models for wearable biosignals. arXiv preprint arXiv:2312.05409","author":"Abbaspourazad Salar","year":"2023","unstructured":"Salar Abbaspourazad, Oussama Elachqar, Andrew C Miller, Saba Emrani, Udhyakumar Nallasamy, and Ian Shapiro. 2023. Large-scale training of foundation models for wearable biosignals. arXiv preprint arXiv:2312.05409 (2023)."},{"key":"e_1_2_1_2_1","first-page":"3","article-title":"A public domain dataset for human activity recognition using smartphones","volume":"3","author":"Anguita Davide","year":"2013","unstructured":"Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra, Jorge Luis Reyes-Ortiz, et al. 2013. A public domain dataset for human activity recognition using smartphones.. In Esann, Vol. 3. 3.","journal-title":"Esann"},{"key":"e_1_2_1_3_1","volume-title":"wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems 33","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. 
Advances in neural information processing systems 33 (2020), 12449--12460."},{"volume-title":"mHealthDroid: A Novel Framework for Agile Development of Mobile Health Applications","author":"Ba\u00f1os Oresti","key":"e_1_2_1_4_1","unstructured":"Oresti Ba\u00f1os, Rafael Garcia, Juan A. Holgado-Terriza, Miguel Damas, Hector Pomares, Ignacio Rojas, Alejandro Saez, and Claudia Villalonga. 2014. mHealthDroid: A Novel Framework for Agile Development of Mobile Health Applications. In Ambient Assisted Living and Daily Activities, Leandro Pecchia, Liming Luke Chen, Chris Nugent, and Jos\u00e9 Bravo (Eds.). Springer International Publishing, Cham, 91--98."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370216.2370437"},{"key":"e_1_2_1_6_1","volume-title":"Tempo: Prompt-based generative pre-trained transformer for time series forecasting. arXiv preprint arXiv:2310.04948","author":"Cao Defu","year":"2023","unstructured":"Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, and Yan Liu. 2023. Tempo: Prompt-based generative pre-trained transformer for time series forecasting. arXiv preprint arXiv:2310.04948 (2023)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-024-03960-3"},{"key":"e_1_2_1_8_1","volume-title":"Llm4ts: Two-stage fine-tuning for time-series forecasting with pre-trained llms. arXiv preprint arXiv:2308.08469","author":"Chang Ching","year":"2023","unstructured":"Ching Chang, Wen-Chih Peng, and Tien-Fu Chen. 2023. Llm4ts: Two-stage fine-tuning for time-series forecasting with pre-trained llms. 
arXiv preprint arXiv:2308.08469 (2023)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2012.12.014"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU51503.2021.9688253"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-023-02679-x"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-07743-4"},{"key":"e_1_2_1_13_1","volume-title":"arXiv preprint arXiv:2310.03589","author":"Garza Azul","year":"2023","unstructured":"Azul Garza and Max Mergenthaler-Canseco. 2023. TimeGPT-1. arXiv preprint arXiv:2310.03589 (2023)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3659597"},{"key":"e_1_2_1_15_1","unstructured":"Eric Jang Shixiang Gu and Ben Poole. 2017. Categorical Reparameterization with Gumbel-Softmax. arXiv:1611.01144 [stat.ML] https:\/\/arxiv.org\/abs\/1611.01144"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241548"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806333"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2018.2837758"},{"key":"e_1_2_1_19_1","volume-title":"Time-llm: Time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728","author":"Jin Ming","year":"2023","unstructured":"Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, et al. 2023. Time-llm: Time series forecasting by reprogramming large language models. 
arXiv preprint arXiv:2310.01728 (2023)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.57"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2994551.2994569"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3675095.3676618"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560905.3568548"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432208"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589334.3645434"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307334.3326109"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302505.3310068"},{"key":"e_1_2_1_28_1","volume-title":"Conference on Health, Inference, and Learning. PMLR, 191--206","author":"Merrill Mika A","year":"2023","unstructured":"Mika A Merrill and Tim Althoff. 2023. Self-supervised pretraining and transfer learning enable flu and covid-19 predictions in small mobile sensing datasets. In Conference on Health, Inference, and Learning. PMLR, 191--206."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3390\/app7101101"},{"key":"e_1_2_1_30_1","unstructured":"Girish Narayanswamy Xin Liu Kumar Ayush Yuzhe Yang Xuhai Xu Shun Liao Jake Garrison Shyam Tailor Jake Sunshine Yun Liu et al. 2024. Scaling Wearable Foundation Models. arXiv preprint arXiv:2410.13638 (2024)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-021-00817-w"},{"key":"e_1_2_1_32_1","volume-title":"A survey on vision-based human action recognition. Image and vision computing 28, 6","author":"Poppe Ronald","year":"2010","unstructured":"Ronald Poppe. 2010. A survey on vision-based human action recognition. 
Image and vision computing 28, 6 (2010), 976--990."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599360"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2019.2900862"},{"key":"e_1_2_1_35_1","volume-title":"Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Bilo\u0161, Hena Ghonia, Nadhir Hassen, Anderson Schneider, et al.","author":"Rasul Kashif","year":"2023","unstructured":"Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Bilo\u0161, Hena Ghonia, Nadhir Hassen, Anderson Schneider, et al. 2023. Lag-llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2413097.2413148"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISWC.2012.13"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2015.07.085"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/INSS.2010.5573462"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3328932"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.3390\/s140610146"},{"key":"e_1_2_1_42_1","volume-title":"MPNet: Masked and Permuted Pre-training for Language Understanding. CoRR abs\/2004.09297","author":"Song Kaitao","year":"2020","unstructured":"Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2020. MPNet: Masked and Permuted Pre-training for Language Understanding. CoRR abs\/2004.09297 (2020). arXiv:2004.09297 https:\/\/arxiv.org\/abs\/2004.09297"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2809695.2809718"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i7.16763"},{"key":"e_1_2_1_45_1","volume-title":"Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, and James Zou. [n. d.]. 
Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals. 5","author":"Thapa Rahul","year":"2024","unstructured":"Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, and James Zou. [n. d.]. Sleepfm: Multi-modal representation learning for sleep across brain activity, ecg and respiratory signals. 5 2024. arXiv preprint arXiv:2405.17766 ([n. d.])."},{"key":"e_1_2_1_46_1","first-page":"2579","article-title":"Visualizing Data using t-SNE","volume":"9","author":"van der Maaten Laurens","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 86 (2008), 2579--2605. http:\/\/jmlr.org\/papers\/v9\/vandermaaten08a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1101\/2020.11.10.20227769"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3367329"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790093"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966039"},{"key":"e_1_2_1_52_1","unstructured":"Gary Weiss. 2019. 
WISDM Smartphone and Smartwatch Activity and Biometrics Dataset."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-018-26174-1"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3631445"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3407899"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3570361.3613299"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485730.3485937"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3380901"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052577"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4400--4404","author":"Michael Yeh Chin-Chia","year":"2023","unstructured":"Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Audrey Der, Vivian Lai, Zhongfang Zhuang, Junpeng Wang, Liang Wang, et al. 2023. Toward a foundation model for time series data. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4400--4404."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3356250.3360045"},{"key":"e_1_2_1_62_1","volume-title":"Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ digital medicine 7, 1","author":"Yuan Hang","year":"2024","unstructured":"Hang Yuan, Shing Chan, Andrew P Creagh, Catherine Tong, Aidan Acquah, David A Clifton, and Aiden Doherty. 2024. Self-supervised learning for human activity recognition using 700,000 person-days of wearable data. NPJ digital medicine 7, 1 (2024), 91."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3636534.3649361"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370216.2370438"},{"key":"e_1_2_1_65_1","volume-title":"Shuheng Li, Dezhi Hong, Rajesh K. 
Gupta, and Jingbo Shang.","author":"Zhang Xiyuan","year":"2024","unstructured":"Xiyuan Zhang, Diyan Teng, Ranak Roy Chowdhury, Shuheng Li, Dezhi Hong, Rajesh K. Gupta, and Jingbo Shang. 2024. UniMTS: Unified Pre-training for Motion Time Series. arXiv:2410.19818 [eess.SP] https:\/\/arxiv.org\/abs\/2410.19818"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458750"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307334.3326081"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369836"},{"key":"e_1_2_1_69_1","unstructured":"Tian Zhou Peisong Niu Liang Sun Rong Jin et al. 2023. One fits all: Power general time series analysis by pretrained lm. Advances in neural information processing systems 36 (2023) 43322--43355."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3675095.3676624"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544794.3558467"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749479","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T16:26:09Z","timestamp":1758817569000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3749479"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,3]]},"references-count":71,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,3]]}},"alternative-id":["10.1145\/3749479"],"URL":"https:\/\/doi.org\/10.1145\/3749479","relation":{},"ISSN":["2474-9567"],"issn-type":[{"type":"electronic","value":"2474-9567"}],"subject":[],"published":{"date-parts":[[2025,9,3]]},"assertion":[{"value":"2025-09-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication 
History"}}]}}