{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,20]],"date-time":"2025-10-20T10:25:19Z","timestamp":1760955919579,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2019,1,31]],"date-time":"2019-01-31T00:00:00Z","timestamp":1548892800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Agency for Science, Technology and Research (A*STAR), Singapore","award":["ARAP program"],"award-info":[{"award-number":["ARAP program"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,1,31]]},"abstract":"<jats:p>In this article, we deal with the problem of understanding human-to-human interactions as a fundamental component of social events analysis. Inspired by the recent success of multi-modal visual data in many recognition tasks, we propose a novel approach to model dyadic interaction by means of features extracted from synchronized 3D skeleton coordinates, depth, and Red Green Blue (RGB) sequences. From skeleton data, we extract new view-invariant proxemic features, named Unified Proxemic Descriptor (UProD), which is able to incorporate intrinsic and extrinsic distances between two interacting subjects. A novel key frame selection method is introduced to identify salient instants of the interaction sequence based on the joints\u2019 energy. From Red Green Blue Depth (RGBD) videos, more holistic CNN features are extracted by applying an adaptive pre-trained Convolutional Neural Networks (CNNs) on optical flow frames. 
To better understand the dynamics of interactions, we expand the boundaries of dyadic interaction analysis by proposing a fundamentally new model for a previously untreated problem: discerning the active interactor from the passive one. Extensive experiments have been carried out on four multi-modal, multi-view interaction datasets. The experimental results demonstrate the superiority of our proposed techniques over state-of-the-art approaches.<\/jats:p>","DOI":"10.1145\/3300937","type":"journal-article","created":{"date-parts":[[2019,2,19]],"date-time":"2019-02-19T20:54:15Z","timestamp":1550609655000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Understanding the Dynamics of Social Interactions"],"prefix":"10.1145","volume":"15","author":[{"given":"Rim","family":"Trabelsi","sequence":"first","affiliation":[{"name":"Advanced Digital Sciences Center, Singapore; LISTIC, Polytech Annecy-Chamb\u00e9ry, Universit\u00e9 Savoie Mont Blanc, France; Hatem Bettaher IResCoMath Research Unit, National Engineering School of Gabes, University of Gabes, Tunisia"}]},{"given":"Jagannadan","family":"Varadarajan","sequence":"additional","affiliation":[{"name":"Grab, Singapore"}]},{"given":"Le","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute for Infocomm Research, Agency for Science, Technology and Research, Singapore"}]},{"given":"Issam","family":"Jabri","sequence":"additional","affiliation":[{"name":"College of Engineering and Architecture, Al Yamamah University, KSA"}]},{"given":"Yong","family":"Pei","sequence":"additional","affiliation":[{"name":"SAP Asia Pte Ltd, Singapore"}]},{"given":"Fethi","family":"Smach","sequence":"additional","affiliation":[{"name":"Groupe cr\u00e9dit agricole, France"}]},{"given":"Ammar","family":"Bouallegue","sequence":"additional","affiliation":[{"name":"SysCom Laboratory, National Engineering School of Tunis, University of Tunis 
El Manar, Tunisia"}]},{"given":"Pierre","family":"Moulin","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, IL"}]}],"member":"320","published-online":{"date-parts":[[2019,2,17]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1922649.1922653"},{"volume-title":"Interpersonal adaptation: Dyadic interaction patterns","author":"Burgoon Judee K.","key":"e_1_2_1_2_1","unstructured":"Judee K. Burgoon , Lesa A. Stern , and Leesa Dillman . 2007. Interpersonal adaptation: Dyadic interaction patterns . Cambridge University Press . Judee K. Burgoon, Lesa A. Stern, and Leesa Dillman. 2007. Interpersonal adaptation: Dyadic interaction patterns. Cambridge University Press."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2564404"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2013.02.006"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Claudio Coppola Serhan Cosar Diego R. Faria Nicola Bellotto and others. 2017. Automatic detection of human interactions from RGB-D data for social activity classification. (2017).  Claudio Coppola Serhan Cosar Diego R. Faria Nicola Bellotto and others. 2017. Automatic detection of human interactions from RGB-D data for social activity classification. (2017).","DOI":"10.1109\/ROMAN.2017.8172405"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759742"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2015.7353446"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2014.772"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1390681.1442794"},{"volume-title":"Authority Ranking, Equality Matching, Market Pricing","author":"Fiske Alan Page","key":"e_1_2_1_10_1","unstructured":"Alan Page Fiske . 1991. 
Structures of Social Life: The Four Elementary Forms of Human Relations: Communal Sharing , Authority Ranking, Equality Matching, Market Pricing . Free Press . Alan Page Fiske. 1991. Structures of Social Life: The Four Elementary Forms of Human Relations: Communal Sharing, Authority Ranking, Equality Matching, Market Pricing. Free Press."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.99.4.689"},{"key":"e_1_2_1_12_1","volume-title":"Interactive phrases: Semantic descriptions for human interaction recognition","author":"Fu Yun","year":"2014","unstructured":"Yun Fu , Yunde Jia , and Yu Kong . 2014. Interactive phrases: Semantic descriptions for human interaction recognition . IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) ( 2014 ). Yun Fu, Yunde Jia, and Yu Kong. 2014. Interactive phrases: Semantic descriptions for human interaction recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) (2014)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1525\/aa.1963.65.5.02a00020"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299172"},{"key":"e_1_2_1_15_1","volume-title":"Kitani","author":"Huang De-An","year":"2014","unstructured":"De-An Huang and Kris M . Kitani . 2014 . Action-reaction : Forecasting the dynamics of human interaction. In European Conference on Computer Vision. Springer , 489--504. De-An Huang and Kris M. Kitani. 2014. Action-reaction: Forecasting the dynamics of human interaction. In European Conference on Computer Vision. 
Springer, 489--504."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2388676.2388683"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33718-5_22"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2015.06.009"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2017.8273574"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.391"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.209"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2464152"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1452392.1452426"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2013.76"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACII.2013.76"},{"key":"e_1_2_1_26_1","volume-title":"BMVC","volume":"1","author":"Patron-Perez Alonso","unstructured":"Alonso Patron-Perez , Marcin Marszalek , Andrew Zisserman , and Ian D. Reid . 2010. High five: Recognising human interactions in TV shows . In BMVC , Vol. 1 . Citeseer, 2. Alonso Patron-Perez, Marcin Marszalek, Andrew Zisserman, and Ian D. Reid. 2010. High five: Recognising human interactions in TV shows. In BMVC, Vol. 1. Citeseer, 2."},{"key":"e_1_2_1_27_1","unstructured":"Eric Postma and Marie Nilsenova. 2016. Measuring the causal dynamics of facial interaction. (2016).  Eric Postma and Marie Nilsenova. 2016. Measuring the causal dynamics of facial interaction. (2016)."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126349"},{"volume-title":"IEEE International Conference on Computer Vision (ICCV).","author":"Ryoo M. S.","key":"e_1_2_1_29_1","unstructured":"M. S. Ryoo and J. K. Aggarwal . 2009. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities . In IEEE International Conference on Computer Vision (ICCV). M. 
S. Ryoo and J. K. Aggarwal. 2009. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In IEEE International Conference on Computer Vision (ICCV)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.115"},{"key":"e_1_2_1_31_1","volume-title":"Neural Information Processing Systems Conference (NIPS).","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014 . Two-stream convolutional networks for action recognition in videos . In Neural Information Processing Systems Conference (NIPS). Karen Simonyan and Andrew Zisserman. 2014. Two-stream convolutional networks for action recognition in videos. In Neural Information Processing Systems Conference (NIPS)."},{"key":"e_1_2_1_32_1","unstructured":"K. Soomro A. Roshan Zamir and M. Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. In CRCV-TR-12-01.  K. Soomro A. Roshan Zamir and M. Shah. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. In CRCV-TR-12-01."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.82"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12559-015-9326-z"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_2_1_36_1","volume-title":"Towards good practices for very deep two-stream ConvNets. Arxiv Preprint Arxiv:1507.02159","author":"Wang Limin","year":"2015","unstructured":"Limin Wang , Yuanjun Xiong , Zhe Wang , and Yu Qiao . 2015. Towards good practices for very deep two-stream ConvNets. Arxiv Preprint Arxiv:1507.02159 ( 2015 ). Limin Wang, Yuanjun Xiong, Zhe Wang, and Yu Qiao. 2015. Towards good practices for very deep two-stream ConvNets. 
Arxiv Preprint Arxiv:1507.02159 (2015)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/THMS.2015.2504550"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2014.06.014"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806315"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.108"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2565479"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.288"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2012.6239234"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1771530.1771554"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Maryam Ziaeefard Robert Bergevin and Louis-Philippe Morency. 2015. Time-slice prediction of dyadic human activities. In BMVC. 167--1.  Maryam Ziaeefard Robert Bergevin and Louis-Philippe Morency. 2015. Time-slice prediction of dyadic human activities. In BMVC. 
167--1.","DOI":"10.5244\/C.29.167"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3300937","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3300937","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:25:23Z","timestamp":1750206323000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3300937"}},"subtitle":["A Multi-Modal Multi-View Approach"],"short-title":[],"issued":{"date-parts":[[2019,1,31]]},"references-count":45,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2019,1,31]]}},"alternative-id":["10.1145\/3300937"],"URL":"https:\/\/doi.org\/10.1145\/3300937","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2019,1,31]]},"assertion":[{"value":"2017-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-02-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}