{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T05:37:40Z","timestamp":1773207460407,"version":"3.50.1"},"reference-count":126,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T00:00:00Z","timestamp":1701993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"DARPA Explainable Artificial Intelligence","award":["N66001-17-2-4032"],"award-info":[{"award-number":["N66001-17-2-4032"]}]},{"name":"National Science Foundation","award":["IIS-1652835, IIS-1528037, and IIS-1762268"],"award-info":[{"award-number":["IIS-1652835, IIS-1528037, and IIS-1762268"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Interact. Intell. Syst."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>We consider the following video activity recognition (VAR) task: given a video, infer the set of activities being performed in the video and assign each frame to an activity. Although VAR can be solved accurately using existing deep learning techniques, deep networks are neither interpretable nor explainable, and as a result their use is problematic in high-stakes decision-making applications (in healthcare, experimental biology, aviation, law, etc.). In such applications, failure may lead to disastrous consequences, and it is therefore necessary that the user be able either to understand the inner workings of the model or to probe it to understand its reasoning patterns for a given decision. 
We address these limitations of deep networks by proposing a new approach that feeds the output of a deep model into a tractable, interpretable probabilistic model called a dynamic conditional cutset network, which is defined over the explanatory and output variables, and then performs joint inference over the combined model. The two key benefits of using cutset networks are: (a) they explicitly model the relationship between the output and explanatory variables and, as a result, the combined model is likely to be more accurate than the vanilla deep model; and (b) they can answer reasoning queries in polynomial time and, as a result, can derive meaningful explanations by efficiently answering explanation queries. We demonstrate the efficacy of our approach on two datasets, Textually Annotated Cooking Scenes (TACoS) and wet lab, using conventional evaluation measures such as the Jaccard Index and Hamming Loss, as well as a human-subjects study.<\/jats:p>","DOI":"10.1145\/3626961","type":"journal-article","created":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T14:53:47Z","timestamp":1697122427000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Explainable Activity Recognition in Videos using Deep Learning and Tractable Probabilistic Models"],"prefix":"10.1145","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5565-1247","authenticated-orcid":false,"given":"Chiradeep","family":"Roy","sequence":"first","affiliation":[{"name":"The University of Texas at Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8823-9635","authenticated-orcid":false,"given":"Mahsan","family":"Nourani","sequence":"additional","affiliation":[{"name":"University of Florida, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9727-2533","authenticated-orcid":false,"given":"Shivvrat","family":"Arya","sequence":"additional","affiliation":[{"name":"The University of Texas at Dallas, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0492-3016","authenticated-orcid":false,"given":"Mahesh","family":"Shanbhag","sequence":"additional","affiliation":[{"name":"The University of Texas at Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9720-5015","authenticated-orcid":false,"given":"Tahrima","family":"Rahman","sequence":"additional","affiliation":[{"name":"The University of Texas at Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7192-3457","authenticated-orcid":false,"given":"Eric D.","family":"Ragan","sequence":"additional","affiliation":[{"name":"University of Florida, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4262-2698","authenticated-orcid":false,"given":"Nicholas","family":"Ruozzi","sequence":"additional","affiliation":[{"name":"The University of Texas at Dallas, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6459-7358","authenticated-orcid":false,"given":"Vibhav","family":"Gogate","sequence":"additional","affiliation":[{"name":"The University of Texas at Dallas, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,12,8]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2870052"},{"issue":"3","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1922649.1922653","article-title":"Human activity analysis: A review","volume":"43","author":"Aggarwal Jake K.","year":"2011","unstructured":"Jake K. Aggarwal and Michael S. Ryoo. 2011. Human activity analysis: A review. ACM Computing Surveys 43, 3 (2011), 1\u201343.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_2_4_2","first-page":"1","volume-title":"Proceedings of the 2018 IEEE International Conference on Future IoT Technologies (Future IoT)","author":"Atzmueller Martin","year":"2018","unstructured":"Martin Atzmueller, Naveed Hayat, Matthias Trojahn, and Dennis Kroll. 2018. Explicative human activity recognition using adaptive association rule-based classification. 
In Proceedings of the 2018 IEEE International Conference on Future IoT Technologies (Future IoT). IEEE, 1\u20136."},{"key":"e_1_3_2_5_2","first-page":"569","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Bach Francis R.","year":"2002","unstructured":"Francis R. Bach and Michael I. Jordan. 2002. Thin junction trees. In Proceedings of the Advances in Neural Information Processing Systems. MIT, 569\u2013576."},{"key":"e_1_3_2_6_2","first-page":"1440","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Bargal Sarah Adel","year":"2018","unstructured":"Sarah Adel Bargal, Andrea Zunino, Donghyun Kim, Jianming Zhang, Vittorio Murino, and Stan Sclaroff. 2018. Excitation backprop for RNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1440\u20131449."},{"key":"e_1_3_2_7_2","first-page":"2242","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Bekker Jessa","year":"2015","unstructured":"Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, and Guy Van den Broeck. 2015. Tractable learning for complex probability queries. In Proceedings of the Advances in Neural Information Processing Systems. Curran Associates, Inc., 2242\u20132250."},{"key":"e_1_3_2_8_2","first-page":"32","volume-title":"Proceedings of the 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops)","author":"Bettini Claudio","year":"2021","unstructured":"Claudio Bettini, Gabriele Civitarese, and Michele Fiori. 2021. Explainable activity recognition over interpretable models. In Proceedings of the 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and Other Affiliated Events (PerCom Workshops). 
IEEE, 32\u201337."},{"key":"e_1_3_2_9_2","first-page":"1395","volume-title":"Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV\u201905)","author":"Blank Moshe","year":"2005","unstructured":"Moshe Blank, Lena Gorelick, Eli Shechtman, Michal Irani, and Ronen Basri. 2005. Actions as space-time shapes. In Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV\u201905). IEEE, 1395\u20131402."},{"issue":"3","key":"e_1_3_2_10_2","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/34.910878","article-title":"The recognition of human movement using temporal templates","volume":"23","author":"Bobick Aaron F.","year":"2001","unstructured":"Aaron F. Bobick and James W. Davis. 2001. The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 3 (2001), 257\u2013267.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"12","key":"e_1_3_2_11_2","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1109\/34.643892","article-title":"A state-based approach to the representation and recognition of gesture","volume":"19","author":"Bobick Aaron F.","year":"1997","unstructured":"Aaron F. Bobick and Andrew D. Wilson. 1997. A state-based approach to the representation and recognition of gesture. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 12 (1997), 1325\u20131337.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_12_2","first-page":"3329","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911)","author":"Brendel William","year":"2011","unstructured":"William Brendel, Alan Fern, and Sinisa Todorovic. 2011. Probabilistic event logic for interval-based event recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911). 
3329\u20133336."},{"issue":"1","key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/0004-3702(95)00041-0","article-title":"Visual surveillance in a dynamic and uncertain world","volume":"78","author":"Buxton Hilary","year":"1995","unstructured":"Hilary Buxton and Shaogang Gong. 1995. Visual surveillance in a dynamic and uncertain world. Artificial Intelligence 78, 1\u20132 (1995), 431\u2013459.","journal-title":"Artificial Intelligence"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","first-page":"624","DOI":"10.1109\/ICCV.1995.466880","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Campbell Lee W.","year":"1995","unstructured":"Lee W. Campbell and Aaron F. Bobick. 1995. Recognition of human body motion using phase space constraints. In Proceedings of the IEEE International Conference on Computer Vision. 624\u2013630."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1007\/978-3-030-67832-6_24","volume-title":"Proceedings of the International Conference on Multimedia Modeling","author":"Chen Xuanwei","year":"2021","unstructured":"Xuanwei Chen, Rui Liu, Xiaomeng Song, and Yahong Han. 2021. Locating visual explanations for video question answering. In Proceedings of the International Conference on Multimedia Modeling. Springer, 290\u2013302."},{"key":"e_1_3_2_16_2","first-page":"104","volume-title":"Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149)","author":"Chomat Olivier","year":"1999","unstructured":"Olivier Chomat and James L. Crowley. 1999. Probabilistic recognition of activity using local appearance. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). 104\u2013109."},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"C. Chow and C. Liu. 1968. 
Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14 3 (1968) 462\u2013467. DOI:10.1109\/TIT.1968.1054142","DOI":"10.1109\/TIT.1968.1054142"},{"key":"e_1_3_2_18_2","first-page":"123","volume-title":"Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence","author":"Darwiche Adnan","year":"2000","unstructured":"Adnan Darwiche. 2000. A differential approach to inference in Bayesian networks. In Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence. 123\u2013132."},{"key":"e_1_3_2_19_2","first-page":"147","volume-title":"Proceedings of the Conference on Probabilistic Graphical Models","author":"Mauro Nicola Di","year":"2016","unstructured":"Nicola Di Mauro, Antonio Vergari, and Floriana Esposito. 2016. Multi-label classification with cutset networks. In Proceedings of the Conference on Probabilistic Graphical Models. 147\u2013158."},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1109\/VSPETS.2005.1570899","volume-title":"Proceedings of the 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance","author":"Doll\u00e1r Piotr","year":"2005","unstructured":"Piotr Doll\u00e1r, Vincent Rabaud, Garrison Cottrell, and Serge Belongie. 2005. Behavior recognition via sparse spatio-temporal features. In Proceedings of the 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 65\u201372."},{"key":"e_1_3_2_21_2","first-page":"2625","volume-title":"Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition","author":"Donahue Jeffrey","year":"2015","unstructured":"Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. 
In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. 2625\u20132634."},{"key":"e_1_3_2_22_2","first-page":"176","volume-title":"Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence","author":"Doucet Arnaud","year":"2000","unstructured":"Arnaud Doucet, Nando De Freitas, Kevin Murphy, and Stuart Russell. 2000. Rao-blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. 176\u2013183."},{"key":"e_1_3_2_23_2","first-page":"3059","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Duan Xuguang","year":"2018","unstructured":"Xuguang Duan, Wenbing Huang, Chuang Gan, Jingdong Wang, Wenwu Zhu, and Junzhou Huang. 2018. Weakly supervised dense event captioning in videos. In Proceedings of the Advances in Neural Information Processing Systems. 3059\u20133069."},{"key":"e_1_3_2_24_2","unstructured":"Vincent Dumoulin and Francesco Visin. 2016. A guide to convolution arithmetic for deep learning. ArXiv e-prints 1603.07285 (2016)."},{"key":"e_1_3_2_25_2","first-page":"1300","volume-title":"Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI)","author":"Friedman Nir","year":"1999","unstructured":"Nir Friedman, Lise Getoor, Daphne Koller, and Avi Pfeffer. 1999. Learning probabilistic relational models. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI). 1300\u20131309."},{"key":"e_1_3_2_26_2","first-page":"5277","article-title":"TALL: Temporal activity localization via language query","author":"Gao Jiyang","year":"2017","unstructured":"Jiyang Gao, Chen Sun, Zhenheng Yang, and Ramakant Nevatia. 2017. TALL: Temporal activity localization via language query. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV\u201917). 
5277\u20135285.","journal-title":"Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV\u201917)."},{"key":"e_1_3_2_27_2","first-page":"1503","volume-title":"Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV\u201921)","author":"Gao Junyu","year":"2021","unstructured":"Junyu Gao and Changsheng Xu. 2021. Fast video moment retrieval. In Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV\u201921). 1503\u20131512."},{"issue":"3","key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"1646","DOI":"10.1109\/TCSVT.2021.3075470","article-title":"Learning video moment retrieval without a single annotated video","volume":"32","author":"Gao Junyu","year":"2022","unstructured":"Junyu Gao and Changsheng Xu. 2022. Learning video moment retrieval without a single annotated video. IEEE Transactions on Circuits and Systems for Video Technology 32, 3 (2022), 1646\u20131657.","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"issue":"1","key":"e_1_3_2_29_2","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1136\/amiajnl-2011-000089","article-title":"Automation bias: A systematic review of frequency, effect mediators, and mitigators","volume":"19","author":"Goddard Kate","year":"2011","unstructured":"Kate Goddard, Abdul Roudsari, and Jeremy C. Wyatt. 2011. Automation bias: A systematic review of frequency, effect mediators, and mitigators. Journal of the American Medical Informatics Association 19, 1 (2011), 121\u2013127.","journal-title":"Journal of the American Medical Informatics Association"},{"key":"e_1_3_2_30_2","first-page":"742","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201903)","author":"Gong Shaogang","year":"2003","unstructured":"Shaogang Gong and Tao Xiang. 2003. Recognition of group activities using dynamic probabilistic networks. 
In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201903). 742\u2013749."},{"issue":"2","key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1609\/aimag.v40i2.2850","article-title":"DARPA\u2019s explainable artificial intelligence (XAI) program","volume":"40","author":"Gunning David","year":"2019","unstructured":"David Gunning and David Aha. 2019. DARPA\u2019s explainable artificial intelligence (XAI) program. AI Magazine 40, 2 (Jun.2019), 44\u201358.","journal-title":"AI Magazine"},{"key":"e_1_3_2_32_2","doi-asserted-by":"crossref","first-page":"1380","DOI":"10.18653\/v1\/D18-1168","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Hendricks Lisa Anne","year":"2018","unstructured":"Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, and Bryan Russell. 2018. Localizing moments in video with temporal language. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, 1380\u20131390."},{"issue":"3","key":"e_1_3_2_33_2","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1177\/0018720814547570","article-title":"Trust in automation: Integrating empirical evidence on factors that influence trust","volume":"57","author":"Hoff Kevin Anthony","year":"2015","unstructured":"Kevin Anthony Hoff and Masooda Bashir. 2015. Trust in automation: Integrating empirical evidence on factors that influence trust. Human Factors 57, 3 (2015), 407\u2013434.","journal-title":"Human Factors"},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1201\/9781315572529-8","article-title":"A taxonomy of emergent trusting in the human\u2013machine relationship","author":"Hoffman Robert R.","year":"2017","unstructured":"Robert R. Hoffman. 2017. A taxonomy of emergent trusting in the human\u2013machine relationship. 
Cognitive Systems Engineering: The Future for a Changing World (2017), 137\u2013163. https:\/\/www.taylorfrancis.com\/chapters\/edit\/10.1201\/9781315572529-8\/taxonomy-emergent-trusting-human%E2%80%93machine-relationship-robert-hoffman","journal-title":"Cognitive Systems Engineering: The Future for a Changing World"},{"key":"e_1_3_2_35_2","unstructured":"Robert R. Hoffman Shane T. Mueller Gary Klein and Jordan Litman. 2018. Metrics for explainable AI: challenges and prospects. CoRR abs\/1812.04608 (2018)."},{"key":"e_1_3_2_36_2","first-page":"966","volume-title":"Proceedings of the 12th AAAI Conference on Artificial Intelligence","author":"Huang Timothy","year":"1994","unstructured":"Timothy Huang, Daphne Koller, Jitendra Malik, G. Ogasawara, Bobby S. Rao, Stuart J. Russell, and Joseph Weber. 1994. Automatic symbolic traffic scene analysis using belief networks. In Proceedings of the 12th AAAI Conference on Artificial Intelligence. 966\u2013972."},{"issue":"8","key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1109\/34.868686","article-title":"Recognition of visual activities and interactions by stochastic parsing","volume":"22","author":"Ivanov Yuri A.","year":"2000","unstructured":"Yuri A. Ivanov and Aaron F. Bobick. 2000. Recognition of visual activities and interactions by stochastic parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 852\u2013872.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1145\/3323873.3325019","volume-title":"Proceedings of the 2019 on International Conference on Multimedia Retrieval (ICMR\u201919)","author":"Jiang Bin","year":"2019","unstructured":"Bin Jiang, Xin Huang, Chao Yang, and Junsong Yuan. 2019. Cross-modal video moment retrieval with spatial and language-temporal attention. 
In Proceedings of the 2019 on International Conference on Multimedia Retrieval (ICMR\u201919). Association for Computing Machinery, 217\u2013225."},{"key":"e_1_3_2_39_2","unstructured":"Matthew J. Johnson David K. Duvenaud Alex Wiltschko Ryan P. Adams and Sandeep R. Datta. 2016. Composing graphical models with neural networks for structured representations and fast inference. In Advances in Neural Information Processing Systems Vol. 29 Curran Associates Inc."},{"key":"e_1_3_2_40_2","first-page":"2897","volume-title":"Proceedings of the IEEE International Conference on Image Processing (ICIP\u201906)","author":"Joo Seong-wook","year":"2006","unstructured":"Seong-wook Joo and Rama Chellappa. 2006. Recognition of multi-object events using attribute grammars. In Proceedings of the IEEE International Conference on Image Processing (ICIP\u201906). 2897\u20132900."},{"key":"e_1_3_2_41_2","article-title":"Deep variational Bayes filters: Unsupervised learning of state space models from raw data","author":"Karl Maximilian","year":"2016","unstructured":"Maximilian Karl, Maximilian S\u00f6lch, Justin Bayer, and Patrick van der Smagt. 2016. Deep variational Bayes filters: Unsupervised learning of state space models from raw data. In Proceedings of the International Conference on Learning Representations.","journal-title":"Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1109\/CVPR.2014.223","volume-title":"Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition","author":"Karpathy Andrej","year":"2014","unstructured":"Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition. 
1725\u20131732."},{"key":"e_1_3_2_43_2","first-page":"1","volume-title":"Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition","author":"Ke Yan","year":"2007","unstructured":"Yan Ke, Rahul Sukthankar, and Martial Hebert. 2007. Spatio-temporal shape and flow correlation for action recognition. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1\u20138."},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Zafar A. Khan and Won Sohn. 2011. Abnormal human activity recognition system based on R-transform and kernel discriminant technique for elderly home care. IEEE Transactions on Consumer Electronics 57 4 (2011) 1843\u20131850. DOI:10.1109\/TCE.2011.6131162","DOI":"10.1109\/TCE.2011.6131162"},{"key":"e_1_3_2_45_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015. Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_2_46_2","volume-title":"Probabilistic Graphical Models - Principles and Techniques","author":"Koller Daphne","year":"2009","unstructured":"Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models - Principles and Techniques. MIT Press."},{"key":"e_1_3_2_47_2","first-page":"706","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Krishna Ranjay","year":"2017","unstructured":"Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, and Juan Carlos Niebles. 2017. Dense-captioning events in videos. In Proceedings of the IEEE International Conference on Computer Vision. 706\u2013715."},{"key":"e_1_3_2_48_2","unstructured":"Rahul G. Krishnan Uri Shalit and David Sontag. 2015. 
Deep Kalman Filters."},{"key":"e_1_3_2_49_2","first-page":"1097","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems. 1097\u20131105."},{"key":"e_1_3_2_50_2","first-page":"139","volume-title":"Proceedings of the 3rd International Conference on Innovations in Bio-Inspired Computing and Applications","author":"Lai Tai Yu","year":"2012","unstructured":"Tai Yu Lai, Jong Yih Kuo, Yong-Yi Fanjiang, Shang-Pin Ma, and Yi Han Liao. 2012. Robust little flame detection on real-time video surveillance system. In Proceedings of the 3rd International Conference on Innovations in Bio-Inspired Computing and Applications. 139\u2013143."},{"key":"e_1_3_2_51_2","first-page":"432","article-title":"Space-time interest points","author":"Laptev Ivan","year":"2003","unstructured":"Ivan Laptev and Tony Lindeberg. 2003. Space-time interest points. In Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV\u201903), 432\u2013439.","journal-title":"Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV\u201903)"},{"key":"e_1_3_2_52_2","first-page":"1","volume-title":"Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition","author":"Laptev Ivan","year":"2008","unstructured":"Ivan Laptev, Marcin Marszalek, Cordelia Schmid, and Benjamin Rozenfeld. 2008. Learning realistic human actions from movies. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1\u20138."},{"key":"e_1_3_2_53_2","first-page":"36","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Lea Colin","year":"2016","unstructured":"Colin Lea, Austin Reiter, Ren\u00e9 Vidal, and Gregory D. Hager. 2016. 
Segmental spatiotemporal cnns for fine-grained action segmentation. In Proceedings of the European Conference on Computer Vision. Springer, 36\u201352."},{"issue":"11","key":"e_1_3_2_54_2","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"LeCun Yann","year":"1998","unstructured":"Yann LeCun, L\u00e9on Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of IEEE 86, 11 (1998), 2278\u20132324.","journal-title":"Proceedings of IEEE"},{"issue":"1","key":"e_1_3_2_55_2","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1518\/hfes.46.1.50.30392","article-title":"Trust in automation: Designing for appropriate reliance","volume":"46","author":"Lee John D.","year":"2004","unstructured":"John D. Lee and Katrina A. See. 2004. Trust in automation: Designing for appropriate reliance. Human Factors 46, 1 (2004), 50\u201380.","journal-title":"Human Factors"},{"key":"e_1_3_2_56_2","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9781139924801","volume-title":"Mining of Massive Datasets (2nd. ed.)","author":"Leskovec Jurij","year":"2014","unstructured":"Jurij Leskovec, Anand Rajaraman, and Jeffrey D. Ullman. 2014. Mining of Massive Datasets (2nd. ed.). Cambridge University Press, Cambridge. HF5415.125.L46 2014"},{"key":"e_1_3_2_57_2","first-page":"1120","volume-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision","author":"Li Zhenqiang","year":"2021","unstructured":"Zhenqiang Li, Weimin Wang, Zuoyue Li, Yifei Huang, and Yoichi Sato. 2021. Towards visually explaining video understanding networks with perturbation. In Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision. 
1120\u20131129."},{"issue":"8","key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"1893","DOI":"10.1109\/TPAMI.2018.2890628","article-title":"Focal visual-text attention for memex question answering","volume":"41","author":"Liang Junwei","year":"2019","unstructured":"Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, and Alexander G. Hauptmann. 2019. Focal visual-text attention for memex question answering. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 8 (2019), 1893\u20131908.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_59_2","first-page":"100","volume-title":"Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence","author":"Liang Yitao","year":"2017","unstructured":"Yitao Liang, Jessa Bekker, and Guy Van den Broeck. 2017. Learning the structure of probabilistic sentential decision diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence. 100\u2013109."},{"key":"e_1_3_2_60_2","first-page":"11539","article-title":"Weakly-supervised video moment retrieval via semantic completion network","author":"Lin Zhijie","year":"2020","unstructured":"Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, and Huasheng Liu. 2020. Weakly-supervised video moment retrieval via semantic completion network. In Proceedings of the AAAI Conference on Artificial Intelligence. 11539\u201311546.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence."},{"issue":"443","key":"e_1_3_2_61_2","doi-asserted-by":"crossref","first-page":"1032","DOI":"10.1080\/01621459.1998.10473765","article-title":"Sequential Monte Carlo methods for dynamic systems","volume":"93","author":"Liu Jun S.","year":"1998","unstructured":"Jun S. Liu and Rong Chen. 1998. Sequential Monte Carlo methods for dynamic systems. 
Journal of the American Statistical Association 93, 443 (1998), 1032\u20131044.","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_3_2_62_2","first-page":"15","volume-title":"Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201918)","author":"Liu Meng","year":"2018","unstructured":"Meng Liu, Xiang Wang, Liqiang Nie, Xiangnan He, Baoquan Chen, and Tat-Seng Chua. 2018. Attentive moment retrieval in videos. In Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201918). Association for Computing Machinery, 15\u201324."},{"key":"e_1_3_2_63_2","first-page":"383","volume-title":"Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence","author":"Lowd Daniel","year":"2008","unstructured":"Daniel Lowd and Pedro Domingos. 2008. Learning arithmetic circuits. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence. 383\u2013392."},{"key":"e_1_3_2_64_2","first-page":"1","volume-title":"Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition","author":"Lv Fengjun","year":"2007","unstructured":"Fengjun Lv and Ramakant Nevatia. 2007. Single view human action recognition using key pose matching and viterbi path searching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1\u20138."},{"key":"e_1_3_2_65_2","first-page":"230","volume-title":"Proceedings of the 19th International Joint Conference on Artificial Intelligence","author":"Mateescu Robert","year":"2005","unstructured":"Robert Mateescu and Rina Dechter. 2005. AND\/OR cutset conditioning. In Proceedings of the 19th International Joint Conference on Artificial Intelligence. 
Morgan Kaufmann Publishers Inc., 230\u2013235."},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1007\/978-3-319-25252-0_13","volume-title":"Proceedings of the Foundations of Intelligent Systems - 22nd International Symposium","author":"Mauro Nicola Di","year":"2015","unstructured":"Nicola Di Mauro, Antonio Vergari, and Teresa Maria Altomare Basile. 2015. Learning Bayesian random cutset forests. In Proceedings of the Foundations of Intelligent Systems - 22nd International Symposium. 122\u2013132."},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","unstructured":"David Miller Mishel Johns Brian Mok Nikhil Gowda David Sirkin Key Lee and Wendy Ju. 2016. Behavioral measurement of trust in automation: The trust fall. Proceedings of the Human Factors and Ergonomics Society Annual Meeting 60 1 (2016) 1849\u20131853. DOI:10.1177\/1541931213601422","DOI":"10.1177\/1541931213601422"},{"key":"e_1_3_2_68_2","first-page":"11592","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Mithun Niluthpol Chowdhury","year":"2019","unstructured":"Niluthpol Chowdhury Mithun, Sujoy Paul, and Amit K. Roy-Chowdhury. 2019. Weakly supervised video moment retrieval from text queries. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 11592\u201311601."},{"key":"e_1_3_2_69_2","first-page":"3289","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911)","author":"Morariu Vlad I.","year":"2011","unstructured":"Vlad I. Morariu and Larry S. Davis. 2011. Multi-agent event recognition in structured scenarios. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201911). 3289\u20133296."},{"issue":"11","key":"e_1_3_2_70_2","doi-asserted-by":"crossref","first-page":"1905","DOI":"10.1080\/00140139408964957","article-title":"Trust in automation: Part I. 
Theoretical issues in the study of trust and human intervention in automated systems","volume":"37","author":"Muir Bonnie M.","year":"1994","unstructured":"Bonnie M. Muir. 1994. Trust in automation: Part I. Theoretical issues in the study of trust and human intervention in automated systems. Ergonomics 37, 11 (1994), 1905\u20131922.","journal-title":"Ergonomics"},{"issue":"3","key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1080\/00140139608964474","article-title":"Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation","volume":"39","author":"Muir Bonnie M.","year":"1996","unstructured":"Bonnie M. Muir and Neville Moray. 1996. Trust in automation. Part II. Experimental studies of trust and human intervention in a process control simulation. Ergonomics 39, 3 (1996), 429\u2013460.","journal-title":"Ergonomics"},{"key":"e_1_3_2_72_2","volume-title":"Dynamic Bayesian Networks: Representation, Inference and Learning","author":"Murphy Kevin P.","year":"2002","unstructured":"Kevin P. Murphy. 2002. Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. Dissertation. University of California, Berkeley."},{"issue":"1","key":"e_1_3_2_73_2","first-page":"1558","article-title":"Unsupervised alignment of natural language instructions with video segments","volume":"28","author":"Naim Iftekhar","year":"2014","unstructured":"Iftekhar Naim, Young Song, Qiguang Liu, Henry Kautz, Jiebo Luo, and Daniel Gildea. 2014. Unsupervised alignment of natural language instructions with video segments. 
Proceedings of the AAAI Conference on Artificial Intelligence 28, 1 (2014), 1558\u20131564.","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"e_1_3_2_74_2","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1109\/WMVC.2007.12","volume-title":"Proceedings of the 2007 IEEE Workshop on Motion and Video Computing (WMVC\u201907)","author":"Natarajan Pradeep","year":"2007","unstructured":"Pradeep Natarajan and Ramakant Nevatia. 2007. Coupled hidden semi Markov models for activity recognition. In Proceedings of the 2007 IEEE Workshop on Motion and Video Computing (WMVC\u201907). IEEE, 10\u201310."},{"key":"e_1_3_2_75_2","first-page":"4694","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Ng Joe Yue-Hei","year":"2015","unstructured":"Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. 2015. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915). 4694\u20134702."},{"key":"e_1_3_2_76_2","first-page":"1020","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ni Bingbing","year":"2016","unstructured":"Bingbing Ni, Xiaokang Yang, and Shenghua Gao. 2016. Progressively parsing interactional objects for fine grained action detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1020\u20131028."},{"key":"e_1_3_2_77_2","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1145\/1102351.1102430","volume-title":"Proceedings of the 22nd International Conference on Machine Learning","author":"Niculescu-Mizil Alexandru","year":"2005","unstructured":"Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting good probabilities with supervised learning. 
In Proceedings of the 22nd International Conference on Machine Learning. 625\u2013632."},{"key":"e_1_3_2_78_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0122-4"},{"issue":"8","key":"e_1_3_2_79_2","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1109\/34.868684","article-title":"A Bayesian computer vision system for modeling human interactions","volume":"22","author":"Oliver Nuria","year":"2000","unstructured":"Nuria Oliver, Barbara Rosario, and Alex Pentland. 2000. A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 8 (2000), 831\u2013843.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_80_2","unstructured":"Mayu Otani Yuta Nakashima Esa Rahtu and Janne Heikkil\u00e4. 2020. Uncovering Hidden Challenges in Query-Based Video Moment Retrieval. CoRR abs\/2009.00325 (2020)."},{"key":"e_1_3_2_81_2","first-page":"8779","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Park Dong Huk","year":"2018","unstructured":"Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. 2018. Multimodal explanations: Justifying decisions and pointing to the evidence. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8779\u20138788."},{"key":"e_1_3_2_82_2","doi-asserted-by":"crossref","unstructured":"J. D. Park and A. Darwiche. 2004. Complexity results and approximation strategies for MAP explanations. Journal of Artificial Intelligence Research 21 (2004) 101--133. DOI:10.1613\/jair.1236","DOI":"10.1613\/jair.1236"},{"key":"e_1_3_2_83_2","volume-title":"Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference","author":"Pearl Judea","year":"1988","unstructured":"Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 
Morgan Kaufmann."},{"key":"e_1_3_2_84_2","first-page":"487","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201911)","author":"Pei Mingtao","year":"2011","unstructured":"Mingtao Pei, Yunde Jia, and Song-Chun Zhu. 2011. Parsing video events with goal inference and intent prediction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201911). 487\u2013494."},{"key":"e_1_3_2_85_2","first-page":"337","volume-title":"Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence","author":"Poon Hoifung","year":"2011","unstructured":"Hoifung Poon and Pedro Domingos. 2011. Sum-product networks: A new deep architecture. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. AUAI, 337\u2013346."},{"issue":"2","key":"e_1_3_2_86_2","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/5.18626","article-title":"A tutorial on hidden Markov models and selected applications in speech recognition","volume":"77","author":"Rabiner Lawrence R.","year":"1989","unstructured":"Lawrence R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 2 (1989), 257\u2013286.","journal-title":"Proceedings of the IEEE"},{"key":"e_1_3_2_87_2","first-page":"3301","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Rahman Tahrima","year":"2016","unstructured":"Tahrima Rahman and Vibhav Gogate. 2016. Learning ensembles of cutset networks. In Proceedings of the AAAI Conference on Artificial Intelligence. Dale Schuurmans and Michael P. Wellman (Eds.), AAAI, 3301\u20133307."},{"key":"e_1_3_2_88_2","first-page":"617","volume-title":"Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence","author":"Rahman Tahrima","year":"2016","unstructured":"Tahrima Rahman and Vibhav Gogate. 2016. Merging strategies for sum-product networks: From trees to graphs. 
In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence. 617\u2013626."},{"key":"e_1_3_2_89_2","first-page":"5751","volume-title":"Proceedings of the 28th International Joint Conference on Artificial Intelligence","author":"Rahman Tahrima","year":"2019","unstructured":"Tahrima Rahman, Shasha Jin, and Vibhav Gogate. 2019. Cutset Bayesian networks: A new representation for learning rao-blackwellised graphical models. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. 5751\u20135757."},{"key":"e_1_3_2_90_2","first-page":"5751","volume-title":"Proceedings of the 28th International Joint Conference on Artificial Intelligence","author":"Rahman Tahrima","year":"2019","unstructured":"Tahrima Rahman, Shasha Jin, and Vibhav Gogate. 2019. Cutset Bayesian networks: A new representation for learning rao-blackwellised graphical models. In Proceedings of the 28th International Joint Conference on Artificial Intelligence. Sarit Kraus (Ed.), 5751\u20135757."},{"key":"e_1_3_2_91_2","doi-asserted-by":"crossref","first-page":"630","DOI":"10.1007\/978-3-662-44851-9_40","volume-title":"Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases","author":"Rahman Tahrima","year":"2014","unstructured":"Tahrima Rahman, Prasanna Kothalkar, and Vibhav Gogate. 2014. Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 630\u2013645."},{"key":"e_1_3_2_92_2","first-page":"1882","volume-title":"Proceedings of the 19th IEEE International Conference on Intelligent Transportation Systems","author":"Rangesh Akshay","year":"2016","unstructured":"Akshay Rangesh, Eshed Ohn-Bar, Kevan Yuen, and Mohan M. Trivedi. 2016. Pedestrians and their phones-detecting phone-based activities of pedestrians for autonomous vehicles. 
In Proceedings of the 19th IEEE International Conference on Intelligent Transportation Systems. 1882\u20131887."},{"key":"e_1_3_2_93_2","first-page":"316","volume-title":"Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001)","author":"Rao Cen","year":"2001","unstructured":"Cen Rao and Mubarak Shah. 2001. View-invariance in action recognition. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001). IEEE Computer Society, 316\u2013322."},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","unstructured":"Michaela Regneri Marcus Rohrbach Dominikus Wetzel Stefan Thater Bernt Schiele and Manfred Pinkal. 2013. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics 1 (2013) 25--36. DOI:10.1162\/tacl_a_00207","DOI":"10.1162\/tacl_a_00207"},{"key":"e_1_3_2_95_2","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1145\/2939672.2939778","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Ribeiro Marco Tulio","year":"2016","unstructured":"Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. \u201cWhy should I trust you?\u201d Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1135\u20131144."},{"key":"e_1_3_2_96_2","first-page":"1","volume-title":"Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition","author":"Rodriguez Mikel D.","year":"2008","unstructured":"Mikel D. Rodriguez, Javed Ahmed, and Mubarak Shah. 2008. Action mach a spatio-temporal maximum average correlation height filter for action recognition. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. 
IEEE, 1\u20138."},{"key":"e_1_3_2_97_2","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1007\/978-3-319-11752-2_15","volume-title":"Proceedings of the German Conference on Pattern Recognition","author":"Rohrbach Anna","year":"2014","unstructured":"Anna Rohrbach, Marcus Rohrbach, Wei Qiu, Annemarie Friedrich, Manfred Pinkal, and Bernt Schiele. 2014. Coherent multi-sentence video description with variable level of detail. In Proceedings of the German Conference on Pattern Recognition. 184\u2013195."},{"key":"e_1_3_2_98_2","first-page":"710","volume-title":"Proceedings of the 31st International Conference on Machine Learning","author":"Rooshenas Amirmohammad","year":"2014","unstructured":"Amirmohammad Rooshenas and Daniel Lowd. 2014. Learning sum-product networks with direct and indirect variable interactions. In Proceedings of the 31st International Conference on Machine Learning. 710\u2013718."},{"issue":"1","key":"e_1_3_2_99_2","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1016\/0004-3702(94)00092-1","article-title":"On the hardness of approximate reasoning","volume":"82","author":"Roth Dan","year":"1996","unstructured":"Dan Roth. 1996. On the hardness of approximate reasoning. Artificial Intelligence 82, 1\u20132 (1996), 273\u2013302.","journal-title":"Artificial Intelligence"},{"issue":"4","key":"e_1_3_2_100_2","doi-asserted-by":"crossref","first-page":"e59","DOI":"10.1002\/ail2.59","article-title":"Explainable activity recognition in videos: Lessons learned","volume":"2","author":"Roy Chiradeep","year":"2021","unstructured":"Chiradeep Roy, Mahsan Nourani, Donald R. Honeycutt, Jeremy E. Block, Tahrima Rahman, Eric D. Ragan, Nicholas Ruozzi, and Vibhav Gogate. 2021. Explainable activity recognition in videos: Lessons learned. 
Applied AI Letters 2, 4 (2021), e59.","journal-title":"Applied AI Letters"},{"key":"e_1_3_2_101_2","first-page":"3106","volume-title":"Proceedings of the 24th International Conference on Artificial Intelligence and Statistics","author":"Roy Chiradeep","year":"2021","unstructured":"Chiradeep Roy, Tahrima Rahman, Hailiang Dong, Nicholas Ruozzi, and Vibhav Gogate. 2021. Dynamic cutset networks. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. 3106\u20133114."},{"key":"e_1_3_2_102_2","volume-title":"Proceedings of the 3rd Workshop on Tractable Probabilistic Models","author":"Roy Chiradeep","year":"2019","unstructured":"Chiradeep Roy, Mahesh Shanbhag, Mahsan Nourani, Tahrima Rahman, Samia Kabir, Vibhav Gogate, Nicholas Ruozzi, and Eric D. Ragan. 2019. Explainable activity recognition in videos. In Proceedings of the 3rd Workshop on Tractable Probabilistic Models."},{"issue":"5","key":"e_1_3_2_103_2","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","article-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead","volume":"1","author":"Rudin Cynthia","year":"2019","unstructured":"Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206\u2013215.","journal-title":"Nature Machine Intelligence"},{"issue":"3","key":"e_1_3_2_104_2","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","year":"2015","unstructured":"Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. 2015. Imagenet large scale visual recognition challenge. 
International Journal of Computer Vision 115, 3 (2015), 211\u2013252.","journal-title":"International Journal of Computer Vision"},{"key":"e_1_3_2_105_2","first-page":"1709","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201906)","author":"Ryoo Michael S.","year":"2006","unstructured":"Michael S. Ryoo and Jake K. Aggarwal. 2006. Recognition of composite human activities through context-free grammar-based representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201906). 1709\u20131718."},{"key":"e_1_3_2_106_2","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1109\/ICPR.2004.1334462","volume-title":"Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004","author":"Schuldt Christian","year":"2004","unstructured":"Christian Schuldt, Ivan Laptev, and Barbara Caputo. 2004. Recognizing human actions: A local SVM approach. In Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. IEEE, 32\u201336."},{"key":"e_1_3_2_107_2","first-page":"405","volume-title":"Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition","author":"Shechtman Eli","year":"2005","unstructured":"Eli Shechtman and Michal Irani. 2005. Space-time behavior-based correlation. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. 405\u2013412."},{"key":"e_1_3_2_108_2","first-page":"144","volume-title":"Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV\u201905)","author":"Sheikh Yaser","year":"2005","unstructured":"Yaser Sheikh, Mumtaz Sheikh, and Mubarak Shah. 2005. Exploring the space of a human action. In Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV\u201905). 
IEEE, 144\u2013149."},{"key":"e_1_3_2_109_2","first-page":"1961","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Singh Bharat","year":"2016","unstructured":"Bharat Singh, Tim K. Marks, Michael Jones, Oncel Tuzel, and Ming Shao. 2016. A multi-stream bi-directional recurrent neural network for fine-grained action detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1961\u20131970."},{"key":"e_1_3_2_110_2","first-page":"2025","volume-title":"Proceedings of the 25th International Joint Conference on Artificial Intelligence","author":"Song Young Chol","year":"2016","unstructured":"Young Chol Song, Iftekhar Naim, Abdullah Al Mamun, Kaustubh Kulkarni, Parag Singla, Jiebo Luo, Daniel Gildea, and Henry A. Kautz. 2016. Unsupervised alignment of actions in video with text descriptions. In Proceedings of the 25th International Joint Conference on Artificial Intelligence. 2025\u20132031."},{"key":"e_1_3_2_111_2","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1109\/ISCV.1995.477012","volume-title":"Proceedings of the International Symposium on Computer Vision","author":"Starner Thad","year":"1995","unstructured":"Thad Starner and Alex Pentland. 1995. Real-time American sign language recognition from video using hidden Markov models. In Proceedings of the International Symposium on Computer Vision. 265\u2013270."},{"key":"e_1_3_2_112_2","first-page":"1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition","author":"Szegedy Christian","year":"2015","unstructured":"Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. 
1\u20139."},{"key":"e_1_3_2_113_2","first-page":"4489","volume-title":"Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201915)","author":"Tran Du","year":"2015","unstructured":"Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. 2015. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV\u201915). 4489\u20134497."},{"issue":"2","key":"e_1_3_2_114_2","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1007\/s10462-017-9545-7","article-title":"Suspicious human activity recognition: A review","volume":"50","author":"Tripathi Rajesh Kumar","year":"2018","unstructured":"Rajesh Kumar Tripathi, Anand Singh Jalal, and Subhash Chand Agrawal. 2018. Suspicious human activity recognition: A review. Artificial Intelligence Review 50, 2 (2018), 283\u2013339.","journal-title":"Artificial Intelligence Review"},{"issue":"3","key":"e_1_3_2_115_2","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1089\/big.2016.0051","article-title":"On the safety of machine learning: Cyber-physical systems, decision sciences, and data products","volume":"5","author":"Varshney Kush R.","year":"2017","unstructured":"Kush R. Varshney and Homa Alemzadeh. 2017. On the safety of machine learning: Cyber-physical systems, decision sciences, and data products. Big Data 5, 3 (2017), 246\u2013255.","journal-title":"Big Data"},{"key":"e_1_3_2_116_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 
30, Curran Associates, Inc."},{"key":"e_1_3_2_117_2","first-page":"1","volume-title":"Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI\u201918)","author":"Veale Michael","year":"2018","unstructured":"Michael Veale, Max Van Kleek, and Reuben Binns. 2018. Fairness and accountability design needs for algorithmic support in high-stakes public sector decision-making. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI\u201918). Association for Computing Machinery, 1\u201314."},{"key":"e_1_3_2_118_2","article-title":"When a Computer Program Keeps You in Jail","author":"Wexler Rebecca","year":"2017","unstructured":"Rebecca Wexler. 2017. When a Computer Program Keeps You in Jail. New York Times.","journal-title":"New York Times"},{"key":"e_1_3_2_119_2","first-page":"1","volume-title":"Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition","author":"Wong Shu-Fai","year":"2007","unstructured":"Shu-Fai Wong, Tae-Kyun Kim, and Roberto Cipolla. 2007. Learning motion categories using both semantic and structural information. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1\u20136."},{"key":"e_1_3_2_120_2","first-page":"1","volume-title":"Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME\u201921)","author":"Wu Ziyue","year":"2021","unstructured":"Ziyue Wu, Junyu Gao, Shucheng Huang, and Changsheng Xu. 2021. Diving into the relations: Leveraging semantic and visual structures for video moment retrieval. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME\u201921). 1\u20136."},{"key":"e_1_3_2_121_2","first-page":"461","volume-title":"Proceedings of the 23rd ACM International Conference on Multimedia","author":"Wu Zuxuan","year":"2015","unstructured":"Zuxuan Wu, Xi Wang, Yu-Gang Jiang, Hao Ye, and Xiangyang Xue. 2015. 
Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In Proceedings of the 23rd ACM International Conference on Multimedia. 461\u2013470."},{"key":"e_1_3_2_122_2","first-page":"379","volume-title":"Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition","author":"Yamato Junji","year":"1992","unstructured":"Junji Yamato, Jun Ohya, and Kenichiro Ishii. 1992. Recognizing human action in time-sequential images using hidden Markov model. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. 379\u2013385."},{"key":"e_1_3_2_123_2","first-page":"984","volume-title":"Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905)","author":"Yilmaz Alper","year":"2005","unstructured":"Alper Yilmaz and Mubarak Shah. 2005. Actions sketch: A novel action representation. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905). IEEE, 984\u2013989."},{"key":"e_1_3_2_124_2","first-page":"4694","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ng Joe Yue-Hei","year":"2015","unstructured":"Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. 2015. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4694\u20134702."},{"key":"e_1_3_2_125_2","first-page":"II\u2013II","volume-title":"Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001","author":"Zelnik-Manor Lihi","year":"2001","unstructured":"Lihi Zelnik-Manor and Michal Irani. 2001. Event-based analysis of video. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001. 
IEEE, II\u2013II."},{"issue":"2","key":"e_1_3_2_126_2","doi-asserted-by":"crossref","first-page":"240","DOI":"10.1109\/TPAMI.2010.60","article-title":"An extended grammar system for learning and recognizing complex visual events","volume":"33","author":"Zhang Zhang","year":"2011","unstructured":"Zhang Zhang, Tieniu Tan, and Kaiqi Huang. 2011. An extended grammar system for learning and recognizing complex visual events. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 2 (2011), 240\u2013255.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_2_127_2","doi-asserted-by":"crossref","first-page":"521","DOI":"10.1145\/3343031.3351040","volume-title":"Proceedings of the 27th ACM International Conference on Multimedia","author":"Zhuo Tao","year":"2019","unstructured":"Tao Zhuo, Zhiyong Cheng, Peng Zhang, Yongkang Wong, and Mohan Kankanhalli. 2019. Explainable video action reasoning via prior knowledge and state transitions. In Proceedings of the 27th ACM International Conference on Multimedia. 
521\u2013529."}],"container-title":["ACM Transactions on Interactive Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626961","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3626961","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:44:16Z","timestamp":1750290256000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626961"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,8]]},"references-count":126,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3626961"],"URL":"https:\/\/doi.org\/10.1145\/3626961","relation":{},"ISSN":["2160-6455","2160-6463"],"issn-type":[{"value":"2160-6455","type":"print"},{"value":"2160-6463","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,8]]},"assertion":[{"value":"2021-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-21","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}