{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T11:25:56Z","timestamp":1780053956876,"version":"3.54.0"},"reference-count":26,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2018,8,11]],"date-time":"2018-08-11T00:00:00Z","timestamp":1533945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Working with multimodal datasets is a challenging task as it requires annotations which often are time consuming and difficult to acquire. This includes in particular video recordings which often need to be watched as a whole before they can be labeled. Additionally, other modalities like acceleration data are often recorded alongside a video. For that purpose, we created an annotation tool that enables to annotate datasets of video and inertial sensor data. In contrast to most existing approaches, we focus on semi-supervised labeling support to infer labels for the whole dataset. This means, after labeling a small set of instances our system is able to provide labeling recommendations. We aim to rely on the acceleration data of a wrist-worn sensor to support the labeling of a video recording. For that purpose, we apply template matching to identify time intervals of certain activities. We test our approach on three datasets, one containing warehouse picking activities, one consisting of activities of daily living and one about meal preparations. Our results show that the presented method is able to give hints to annotators about possible label candidates.<\/jats:p>","DOI":"10.3390\/s18082639","type":"journal-article","created":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T11:27:13Z","timestamp":1534159633000},"page":"2639","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Exploring Semi-Supervised Methods for Labeling Support in Multimodal Datasets"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8541-5517","authenticated-orcid":false,"given":"Alexander","family":"Diete","sequence":"first","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, 68131 Mannheim, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Timo","family":"Sztyler","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, 68131 Mannheim, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Heiner","family":"Stuckenschmidt","sequence":"additional","affiliation":[{"name":"Data and Web Science Group, University of Mannheim, 68131 Mannheim, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2018,8,11]]},"reference":[{"key":"ref_1","unstructured":"De la Torre Frade, F., Hodgins, J.K., Bargteil, A.W., Martin Artal, X., Macey, J.C., Collado, I., Castells, A., and Beltran, J. (2008). Guide to the Carnegie Mellon University Multimodal Activity (CMU-MMAC) Database, Robotics Institute. Technical Report CMU-RI-TR-08-22."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1145\/2629633","article-title":"Wearables: Has the Age of Smartwatches Finally Arrived?","volume":"58","author":"Rawassizadeh","year":"2014","journal-title":"Commun. ACM"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1111\/2041-210X.12584","article-title":"BORIS: A free, versatile open-source event-logging software for video\/audio coding and live observations","volume":"7","author":"Friard","year":"2016","journal-title":"Methods Ecol. Evolut."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kipp, M. (2001, January 3\u20137). ANVIL\u2014A generic annotation tool for multimodal dialogue. Proceedings of the Seventh European Conference on Speech Communication and Technology, ISCA, Aalborg, Denmark.","DOI":"10.21437\/Eurospeech.2001-354"},{"key":"ref_5","first-page":"122","article-title":"The OpenCV Library","volume":"120","author":"Bradski","year":"2000","journal-title":"Dr. Dobb\u2019s J. Softw. Tools"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Diete, A., Weiland, L., Sztyler, T., and Stuckenschmidt, H. (2016, January 12\u201316). Exploring a multi-sensor picking process in the future warehouse. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, Heidelberg, Germany.","DOI":"10.1145\/2968219.2968270"},{"key":"ref_7","first-page":"788","article-title":"User-Independent Recognition of Sports Activities From a Single Wrist-Worn Accelerometer: A Template-Matching-Based Approach","volume":"63","author":"Margarito","year":"2016","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Spriggs, E.H., Torre, F.D.L., and Hebert, M. (2009, January 20\u201325). Temporal segmentation and activity classification from first-person sensing. Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5204354"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Diete, A., Sztyler, T., and Stuckenschmidt, H. (2017, January 13\u201317). A smart data annotation tool for multi-sensor activity recognition. Proceedings of the 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Kona, HI, USA.","DOI":"10.1109\/PERCOMW.2017.7917542"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"161","DOI":"10.3233\/THC-2009-0546","article-title":"Annotating smart environment sensor data for activity learning","volume":"17","author":"Szewcyzk","year":"2009","journal-title":"Technol. Health Care"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Liu, C., Freeman, W.T., Adelson, E.H., and Weiss, Y. (2008, January 23\u201328). Human-assisted motion annotation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA.","DOI":"10.1109\/CVPR.2008.4587845"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Del Fabro, M., M\u00fcnzer, B., and B\u00f6sz\u00f6rmenyi, L. (2013, January 7\u20139). Smart video browsing with augmented navigation bars. Proceedings of the International Conference on Multimedia Modeling, Huangshan, China.","DOI":"10.1007\/978-3-642-35728-2_9"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Del Fabro, M., and B\u00f6sz\u00f6rmenyi, L. (2012, January 4\u20136). AAU Video browser: Non-sequential hierarchical video browsing without content analysis. Proceedings of the International Conference on Multimedia Modeling, Klagenfurt, Austria.","DOI":"10.1007\/978-3-642-27355-1_63"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Ishihara, T., Kitani, K.M., Ma, W.C., Takagi, H., and Asakawa, C. (2015, January 27\u201330). Recognizing hand-object interactions in wearable camera videos. Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada.","DOI":"10.1109\/ICIP.2015.7351020"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"D\u2019Orazio, T., Leo, M., Mosca, N., Spagnolo, P., and Mazzeo, P.L. (2009, January 2\u20134). A semi-automatic system for ground truth generation of soccer video sequences. Proceedings of the Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, Genova, Italy.","DOI":"10.1109\/AVSS.2009.69"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Martindale, C.F., Hoenig, F., Strohrmann, C., and Eskofier, B.M. (2017). Smart Annotation of Cyclic Data Using Hierarchical Hidden Markov Models. Sensors, 17.","DOI":"10.3390\/s17102328"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1169","DOI":"10.1016\/j.proeng.2012.07.297","article-title":"A Smart Watch with Embedded Sensors to Recognize Objects, Grasps and Forearm Gestures","volume":"41","author":"Morganti","year":"2012","journal-title":"Procedia Eng."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Palotai, Z., L\u00e1ng, M., S\u00e1rk\u00e1ny, A., T\u0151s\u00e9r, Z., Sonntag, D., Toyama, T., and L\u0151rincz, A. (2014, January 18\u201320). LabelMovie: Semi-supervised machine annotation tool with quality assurance and crowd-sourcing options for videos. Proceedings of the 12th International Workshop on Content-Based Multimedia Indexing, Klagenfurt, Austria.","DOI":"10.1109\/CBMI.2014.6849850"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Barz, M., Moniri, M.M., Weber, M., and Sonntag, D. (2016, January 12\u201316). Multimodal Multisensor Activity Annotation Tool. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct, Heidelberg, Germany.","DOI":"10.1145\/2968219.2971459"},{"key":"ref_20","unstructured":"Muda, L., Begam, M., and Elamvazuthi, I. (arXiv, 2010). Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques, arXiv."},{"key":"ref_21","unstructured":"Celebi, S., Aydin, A.S., Temiz, T.T., and Arici, T. (2013, January 21\u201324). Gesture recognition using skeleton data with weighted dynamic time warping. Proceedings of the 8th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISAPP (1), Barcelona, Spain."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1016\/j.ejor.2006.07.009","article-title":"Design and control of warehouse order picking: A literature review","volume":"182","author":"Roodbergen","year":"2007","journal-title":"Eur. J. Oper. Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1093\/geront\/9.3_Part_1.179","article-title":"Assessment of older people: Self-maintaining and instrumental activities of daily living","volume":"9","author":"Lawton","year":"1969","journal-title":"Gerontologist"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sztyler, T., and Stuckenschmidt, H. (2016, January 14\u201319). On-body Localization of Wearable Devices: An Investigation of Position-Aware Activity Recognition. Proceedings of the 2016 IEEE International Conference on Pervasive Computing and Communications, Sydney, NSW, Australia.","DOI":"10.1109\/PERCOM.2016.7456521"},{"key":"ref_25","unstructured":"Berndt, D.J., and Clifford, J. (August, January 31). Using Dynamic Time Warping to Find Patterns in Time Series. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"M\u00fcller, M. (2015). Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications, Springer.","DOI":"10.1007\/978-3-319-21945-5"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/8\/2639\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:18:13Z","timestamp":1760195893000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/18\/8\/2639"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,11]]},"references-count":26,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2018,8]]}},"alternative-id":["s18082639"],"URL":"https:\/\/doi.org\/10.3390\/s18082639","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,8,11]]}}}