{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T15:56:33Z","timestamp":1782834993558,"version":"3.54.5"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China A3 Foresight Program","award":["62061146001"],"award-info":[{"award-number":["62061146001"]}]},{"DOI":"10.13039\/501100006374","name":"National Science Fund for Distinguished Young Scholars","doi-asserted-by":"publisher","award":["62025205"],"award-info":[{"award-number":["62025205"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006374","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62302017, 62102317, 62032020, 61972008"],"award-info":[{"award-number":["62302017, 62102317, 62032020, 61972008"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006374","name":"China Postdoctoral Science Foundation","doi-asserted-by":"publisher","award":["2023M730058"],"award-info":[{"award-number":["2023M730058"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"name":"PKU-NTU Collaboration Project"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,12,19]]},"abstract":"<jats:p>In recent years, considerable endeavors have been devoted to exploring Wi-Fi-based sensing technologies by modeling the intricate mapping between received signals and corresponding human activities. However, the inherent complexity of Wi-Fi signals poses significant challenges for practical applications due to their pronounced susceptibility to deployment environments. To address this challenge, we delve into the distinctive characteristics of Wi-Fi signals and distill three pivotal factors that can be leveraged to enhance generalization capabilities of deep learning-based Wi-Fi sensing models: 1) effectively capture valuable input to mitigate the adverse impact of noisy measurements; 2) adaptively fuse complementary information from multiple Wi-Fi devices to boost the distinguishability of signal patterns associated with different activities; 3) extract generalizable features that can overcome the inconsistent representations of activities under different environmental conditions (e.g., locations, orientations). Leveraging these insights, we design a novel and unified sensing framework based on Wi-Fi signals, dubbed UniFi, and use gesture recognition as an application to demonstrate its effectiveness. UniFi achieves robust and generalizable gesture recognition in real-world scenarios by extracting discriminative and consistent features unrelated to environmental factors from pre-denoised signals collected by multiple transceivers. To achieve this, we first introduce an effective signal preprocessing approach that captures the applicable input data from noisy received signals for the deep learning model. Second, we propose a multi-view deep network based on spatio-temporal cross-view attention that integrates multi-carrier and multi-device signals to extract distinguishable information. Finally, we present the mutual information maximization as a regularizer to learn environment-invariant representations via contrastive loss without requiring access to any signals from unseen environments for practical adaptation. Extensive experiments on the Widar 3.0 dataset demonstrate that our proposed framework significantly outperforms state-of-the-art approaches in different settings (99% and 90%-98% accuracy for in-domain and cross-domain recognition without additional data collection and model training).<\/jats:p>","DOI":"10.1145\/3631429","type":"journal-article","created":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T12:52:04Z","timestamp":1705063924000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":50,"title":["UniFi"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0907-7840","authenticated-orcid":false,"given":"Yan","family":"Liu","sequence":"first","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-7092-7502","authenticated-orcid":false,"given":"Anlan","family":"Yu","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7627-8485","authenticated-orcid":false,"given":"Leye","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6097-2467","authenticated-orcid":false,"given":"Bin","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University, Xi'an, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0340-1462","authenticated-orcid":false,"given":"Yang","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8513-5016","authenticated-orcid":false,"given":"Enze","family":"Yi","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6608-1267","authenticated-orcid":false,"given":"Daqing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (Ministry of Education), School of Computer Science, Peking University, Beijing, China, Telecom SudParis and Institut Polytechnique de Paris, Evry, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,1,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2015.7218525"},{"key":"e_1_2_1_2_1","volume-title":"Mine: mutual information neural estimation. arXiv preprint arXiv:1801.04062","author":"Belghazi Mohamed Ishmael","year":"2018","unstructured":"Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, and R Devon Hjelm. 2018. Mine: mutual information neural estimation. arXiv preprint arXiv:1801.04062 (2018)."},{"key":"e_1_2_1_3_1","volume-title":"International conference on machine learning. PMLR, 1597--1607","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PMLR, 1597--1607."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i6.20584"},{"key":"e_1_2_1_5_1","volume-title":"Learning non-linear combinations of kernels. Advances in neural information processing systems 22","author":"Cortes Corinna","year":"2009","unstructured":"Corinna Cortes, Mehryar Mohri, and Afshin Rostamizadeh. 2009. Learning non-linear combinations of kernels. Advances in neural information processing systems 22 (2009)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430735"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550318"},{"key":"e_1_2_1_8_1","volume-title":"International Workshop on Independent Component Analysis and Blind Signal Separation","author":"Fyfe Colin","year":"2001","unstructured":"Colin Fyfe. 2001. ICA using kernel canonical correlation analysis. In International Workshop on Independent Component Analysis and Blind Signal Separation, 2001."},{"key":"e_1_2_1_9_1","volume-title":"International conference on machine learning. PMLR, 1180--1189","author":"Ganin Yaroslav","year":"2015","unstructured":"Yaroslav Ganin and Victor Lempitsky. 2015. Unsupervised domain adaptation by backpropagation. In International conference on machine learning. PMLR, 1180--1189."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3463504"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Jiuxiang Gu Zhenhua Wang Jason Kuen Lianyang Ma Amir Shahroudy Bing Shuai Ting Liu Xingxing Wang Gang Wang Jianfei Cai et al. 2018. Recent advances in convolutional neural networks. Pattern recognition 77 (2018) 354--377.","DOI":"10.1016\/j.patcog.2017.10.013"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2023.3261325"},{"key":"e_1_2_1_13_1","volume-title":"Wife: Wifi and vision based intelligent facial-gesture emotion recognition. arXiv preprint arXiv:2004.09889","author":"Gu Yu","year":"2020","unstructured":"Yu Gu, Xiang Zhang, Zhi Liu, and Fuji Ren. 2020. Wife: Wifi and vision based intelligent facial-gesture emotion recognition. arXiv preprint arXiv:2004.09889 (2020)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/THMS.2022.3163189"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241548"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the 28th international conference on machine learning (ICML-11)","author":"Kumar Abhishek","year":"2011","unstructured":"Abhishek Kumar and Hal Daum\u00e9. 2011. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th international conference on machine learning (ICML-11). 393--400."},{"key":"e_1_2_1_18_1","volume-title":"Deep learning. nature 521, 7553","author":"LeCun Yann","year":"2015","unstructured":"Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature 521, 7553 (2015), 436--444."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i1.16103"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2020.3009561"},{"key":"e_1_2_1_21_1","first-page":"1","article-title":"IndoTrack: Device-free indoor human tracking with commodity Wi-Fi","volume":"1","author":"Li Xiang","year":"2017","unstructured":"Xiang Li, Daqing Zhang, Qin Lv, Jie Xiong, Shengjie Li, Yue Zhang, and Hong Mei. 2017. IndoTrack: Device-free indoor human tracking with commodity Wi-Fi. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017), 1--22.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3191755"},{"key":"e_1_2_1_23_1","volume-title":"Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748","author":"van den Oord Aaron","year":"2018","unstructured":"Aaron van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)."},{"key":"e_1_2_1_24_1","first-page":"20210","article-title":"Model-based domain generalization","volume":"34","author":"Robey Alexander","year":"2021","unstructured":"Alexander Robey, George J Pappas, and Hamed Hassani. 2021. Model-based domain generalization. Advances in Neural Information Processing Systems 34 (2021), 20210--20229.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790129"},{"key":"e_1_2_1_27_1","volume-title":"A survey of multi-view machine learning. Neural computing and applications 23","author":"Sun Shiliang","year":"2013","unstructured":"Shiliang Sun. 2013. A survey of multi-view machine learning. Neural computing and applications 23 (2013), 2031--2038."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2942358.2942393"},{"key":"e_1_2_1_29_1","volume-title":"Accelerating t-SNE using tree-based algorithms. The journal of machine learning research 15, 1","author":"Der Maaten Laurens Van","year":"2014","unstructured":"Laurens Van Der Maaten. 2014. Accelerating t-SNE using tree-based algorithms. The journal of machine learning research 15, 1 (2014), 3221--3245."},{"key":"e_1_2_1_30_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081340"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3080401"},{"key":"e_1_2_1_33_1","volume-title":"AirFi: Empowering WiFi-based Passive Human Gesture Recognition to Unseen Environment via Domain Generalization","author":"Wang Dazhuo","year":"2022","unstructured":"Dazhuo Wang, Jianfei Yang, Wei Cui, Lihua Xie, and Sumei Sun. 2022. AirFi: Empowering WiFi-based Passive Human Gesture Recognition to Unseen Environment via Domain Generalization. IEEE Transactions on Mobile Computing (2022)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2971648.2971744"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2016.2557795"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00631"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790093"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411822"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2021.3133114"},{"key":"e_1_2_1_40_1","volume-title":"WiTraj: robust indoor motion tracking with WiFi signals","author":"Wu Dan","year":"2021","unstructured":"Dan Wu, Youwei Zeng, Ruiyang Gao, Shengjie Li, Yang Li, Rahul C Shah, Hong Lu, and Daqing Zhang. 2021. WiTraj: robust indoor motion tracking with WiFi signals. IEEE Transactions on Mobile Computing (2021)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2971648.2971658"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485730.3485936"},{"key":"e_1_2_1_43_1","volume-title":"A survey on multi-view learning. arXiv preprint arXiv:1304.5634","author":"Xu Chang","year":"2013","unstructured":"Chang Xu, Dacheng Tao, and Chao Xu. 2013. A survey on multi-view learning. arXiv preprint arXiv:1304.5634 (2013)."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3380980"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00696"},{"key":"e_1_2_1_46_1","first-page":"1","article-title":"QGesture: Quantifying gesture distance and direction with WiFi signals","volume":"2","author":"Yu Nan","year":"2018","unstructured":"Nan Yu, Wei Wang, Alex X Liu, and Lingtao Kong. 2018. QGesture: Quantifying gesture distance and direction with WiFi signals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 1 (2018), 1--23.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351279"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351279"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2017.7"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3569482"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2022.3170157"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3114309"},{"key":"e_1_2_1_53_1","first-page":"8671","article-title":"Widar3. 0: Zero-effort cross-domain gesture recognition with Wi-Fi","volume":"44","author":"Zhang Yi","year":"2021","unstructured":"Yi Zhang, Yue Zheng, Kun Qian, Guidong Zhang, Yunhao Liu, Chenshu Wu, and Zheng Yang. 2021. Widar3. 0: Zero-effort cross-domain gesture recognition with Wi-Fi. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 11 (2021), 8671--8688.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307334.3326081"},{"key":"e_1_2_1_55_1","volume-title":"Domain generalization in vision: A survey. arXiv preprint arXiv:2103.02503","author":"Zhou Kaiyang","year":"2021","unstructured":"Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. 2021. Domain generalization in vision: A survey. arXiv preprint arXiv:2103.02503 (2021)."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCN.2018.8487345"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631429","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3631429","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T17:00:20Z","timestamp":1756314020000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631429"}},"subtitle":["A Unified Framework for Generalizable Gesture Recognition with Wi-Fi Signals Using Consistency-guided Multi-View Networks"],"short-title":[],"issued":{"date-parts":[[2023,12,19]]},"references-count":56,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,19]]}},"alternative-id":["10.1145\/3631429"],"URL":"https:\/\/doi.org\/10.1145\/3631429","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,19]]},"assertion":[{"value":"2024-01-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}