{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,2]],"date-time":"2026-07-02T16:21:55Z","timestamp":1783009315323,"version":"3.54.5"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T00:00:00Z","timestamp":1608163200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2020,12,17]]},"abstract":"<jats:p>Fitness tracking devices have risen in popularity in recent years, but limitations in terms of their accuracy and failure to track many common exercises presents a need for improved fitness tracking solutions. This work proposes a multimodal deep learning approach to leverage multiple data sources for robust and accurate activity segmentation, exercise recognition and repetition counting. For this, we introduce the MM-Fit dataset; a substantial collection of inertial sensor data from smartphones, smartwatches and earbuds worn by participants while performing full-body workouts, and time-synchronised multi-viewpoint RGB-D video, with 2D and 3D pose estimates. We establish a strong baseline for activity segmentation and exercise recognition on the MM-Fit dataset, and demonstrate the effectiveness of our CNN-based architecture at extracting modality-specific spatial temporal features from inertial sensor and skeleton sequence data. We compare the performance of unimodal and multimodal models for activity recognition across a number of sensing devices and modalities. 
Furthermore, we demonstrate the effectiveness of multimodal deep learning at learning cross-modal representations for activity recognition, which achieves 96% accuracy across all sensing modalities on unseen subjects in the MM-Fit dataset; 94% using data from the smartwatch only; 85% from the smartphone only; and 82% on data from the earbud device. We strengthen single-device performance by using the zeroing-out training strategy, which phases out the other sensing modalities. Finally, we implement and evaluate a strong repetition counting baseline on our MM-Fit dataset. Collectively, these tasks contribute to recognising, segmenting and timing exercise and non-exercise activities for automatic exercise logging.<\/jats:p>","DOI":"10.1145\/3432701","type":"journal-article","created":{"date-parts":[[2020,12,18]],"date-time":"2020-12-18T15:39:14Z","timestamp":1608305954000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":67,"title":["MM-Fit"],"prefix":"10.1145","volume":"4","author":[{"given":"David","family":"Str\u00f6mb\u00e4ck","sequence":"first","affiliation":[{"name":"University of Edinburgh, Edinburgh, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sangxia","family":"Huang","sequence":"additional","affiliation":[{"name":"R&amp;D Center Lund Laboratory, Sony Europe, Lund, Sweden"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Valentin","family":"Radu","sequence":"additional","affiliation":[{"name":"University of Edinburgh, University of Sheffield, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,12,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2014.04.011"},{"key":"e_1_2_1_2_1","volume-title":"International Journal of Computer Science and Network Security 17 (04","author":"Almaslukh 
B","year":"2017"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3397323","article-title":"Adversarial Multi-view Networks for Activity Recognition","volume":"4","author":"Bai Lei","year":"2020","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3390\/s140406474"},{"key":"e_1_2_1_5_1","volume-title":"2017 IEEE Sensors Applications Symposium (SAS). 1--6.","author":"Bender C. G."},{"key":"e_1_2_1_6_1","volume-title":"A tutorial on human activity recognition using body-worn inertial sensors. ACM Comput. Surv. 46","author":"Bulling Andreas","year":"2014"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Zhe Cao Gines Hidalgo Tomas Simon Shih-En Wei and Yaser Sheikh. 2018. OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields. In arXiv preprint arXiv:1812.08008.","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_1_8_1","volume-title":"Latent Structured Models for Human Pose Estimation. In International Conference on Computer Vision.","author":"Catalin Ionescu Cristian Sminchisescu","year":"2011"},{"key":"e_1_2_1_9_1","first-page":"6","article-title":"Sensor-Based Activity","volume":"42","author":"Chen Liming","year":"2012","journal-title":"Recognition. Trans. Sys. Man Cyber Part C"},{"key":"e_1_2_1_10_1","volume-title":"2013 IEEE International Conference on Consumer Electronics (ICCE). 436--437","author":"Choi K. 
S."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-013-0665-3"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACPR.2015.7486569"},{"key":"e_1_2_1_13_1","volume-title":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Du Yong","year":"2015"},{"key":"e_1_2_1_14_1","volume-title":"Skeletal Quads: Human Action Recognition Using Joint Quadruples. 2014 22nd International Conference on Pattern Recognition","author":"Evangelidis Georgios Dimitrios","year":"2014"},{"key":"e_1_2_1_15_1","unstructured":"Hristijan Gjoreski Jani Bizjak Martin Gjoreski and Matjaz Gams. 2016. Comparing Deep and Classical Machine Learning Methods for Human Activity Recognition using Wrist Accelerometer."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (Proceedings of Machine Learning Research), Geoffrey Gordon, David Dunson, and Miroslav Dud\u00edk (Eds.)","volume":"15","author":"Glorot Xavier","year":"2011"},{"key":"e_1_2_1_17_1","first-page":"1","article-title":"Device-free personalized fitness assistant using WiFi","volume":"2","author":"Guo Xiaonan","year":"2018","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_18_1","volume-title":"Multi-modal Convolutional Neural Networks for Activity Recognition. In 2015 IEEE International Conference on Systems, Man, and Cybernetics. 3017--3022","author":"Ha S."},{"key":"e_1_2_1_19_1","unstructured":"Nils Y. Hammerla Shane Halloran and Thomas Pl\u00f6tz. 2016. 
Deep Convolutional and Recurrent Models for Human Activity Recognition using Wearables. In IJCAI."},{"key":"e_1_2_1_20_1","volume-title":"Canny","author":"Chang Keng","year":"2007"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2017.01.010"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3214269"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MFI.2017.8170441"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.486"},{"key":"e_1_2_1_26_1","volume-title":"Detecting, Recognizing and Tracking Simultaneous Exercises in Unconstrained Scenes. IMWUT 2","author":"Khurana Rushil","year":"2018"},{"key":"e_1_2_1_27_1","volume-title":"International Conference on Learning Representations (ICLR).","author":"Kingma Diederick P","year":"2015"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/SURV.2012.110112.00192"},{"key":"e_1_2_1_29_1","volume-title":"Deep learning. Nature 521, 7553 (27 5","author":"LeCun Yann","year":"2015"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_31_1","volume-title":"Live Repetition Counting. In 2015 IEEE International Conference on Computer Vision (ICCV). 3020--3028","author":"Levy O."},{"key":"e_1_2_1_32_1","volume-title":"2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)","author":"Li Chao","year":"2017"},{"key":"e_1_2_1_33_1","unstructured":"Maosen Li Siheng Chen Xu Chen Ya Zhang Yanfeng Wang and Qi Tian. 2019. Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition. In CVPR."},{"key":"e_1_2_1_34_1","unstructured":"Shanhong Liu. 2018. Fitness & Activity Tracker. 
https:\/\/www.statista.com\/study\/35598\/fitness-and-activity-tracker\/."},{"key":"e_1_2_1_35_1","volume-title":"Little","author":"Martinez Julieta","year":"2017"},{"key":"e_1_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Dan Morris T. Scott Saponas Andrew Guillory and Ilya Kelner. 2014. RecoFit: using a wearable sensor to find recognize and count repetitive exercises. In CHI.","DOI":"10.1145\/2556288.2557116"},{"key":"e_1_2_1_37_1","volume-title":"Determining the Single Best Axis for Exercise Repetition Recognition and Counting on SmartWatches. 2014 11th International Conference on Wearable and Implantable Body Sensor Networks","author":"Mortazavi Bobak","year":"2014"},{"key":"e_1_2_1_38_1","volume-title":"Ng","author":"Ngiam Jiquan","year":"2011"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2009.11.014"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2968219.2971461"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161174"},{"key":"e_1_2_1_42_1","volume-title":"DensePose: Dense Human Pose Estimation In The Wild. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Riza Alp Guler Iasonas Kokkinos","year":"2018"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00939"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.3390\/s18092967"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1021\/ac60214a047"},{"key":"e_1_2_1_46_1","volume-title":"Recognition and Repetition Counting for Complex Physical Exercises with Deep Learning. 
Sensors 19, 3","author":"Soro Andrea","year":"2019"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2809695.2809718"},{"key":"e_1_2_1_48_1","volume-title":"2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)","author":"Um Terry Taewoong","year":"2016"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Eduardo Velloso Andreas Bulling Hans-Werner Gellersen Wallace Ugulino and Hugo Fuks. 2013. Qualitative activity recognition of weight lifting exercises. In AH.","DOI":"10.1145\/2459236.2459256"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.82"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2018.02.010"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.198"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Sijie Yan Yuanjun Xiong and Dahua Lin. 2018. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. 
In AAAI.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"e_1_2_1_54_1","volume-title":"Phyo Phyo San, Xiaoli Li, and Shonali Krishnaswamy.","author":"Yang Jianbo","year":"2015"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.4108\/icst.mobicase.2014.257786"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3432701","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3432701","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:11Z","timestamp":1750193231000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3432701"}},"subtitle":["Multimodal Deep Learning for Automatic Exercise Logging across Sensing Devices"],"short-title":[],"issued":{"date-parts":[[2020,12,17]]},"references-count":55,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12,17]]}},"alternative-id":["10.1145\/3432701"],"URL":"https:\/\/doi.org\/10.1145\/3432701","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,17]]},"assertion":[{"value":"2020-12-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}