{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T14:18:00Z","timestamp":1780064280570,"version":"3.54.0"},"publisher-location":"New York, NY, USA","reference-count":50,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3394171.3413778","type":"proceedings-article","created":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T13:10:18Z","timestamp":1602508218000},"page":"691-699","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":31,"title":["LIGHTEN"],"prefix":"10.1145","author":[{"given":"Sai Praneeth Reddy","family":"Sunkesula","sequence":"first","affiliation":[{"name":"Indian Institute of Technology, Bombay, Mumbai, India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rishabh","family":"Dabral","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, Bombay, Mumbai, India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ganesh","family":"Ramakrishnan","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology, Bombay, Mumbai, India"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","volume":"201","author":"Chao Y.W.","journal-title":"J. Deng."},{"key":"e_1_3_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Oishik Chatterjee Ganesh Ramakrishnan and Sunita Sarawagi. 2020. Robust Data Programming with Precision-guided Labeling Functions. In AAAI.  
","DOI":"10.1609\/aaai.v34i04.5742"},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"crossref","unstructured":"R. Dabral N. B. Gundavarapu R. Mitra A. Sharma G. Ramakrishnan and A. Jain. 2019. Multi-Person 3D Human Pose Estimation from Monocular Images. In 3DV.","DOI":"10.1109\/3DV.2019.00052"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Rishabh Dabral Anurag Mundhada Uday Kusupati Safeer Afaque Abhishek Sharma and Arjun Jain. 2018. Learning 3D Human Pose from Structure and Motion. In ECCV.","DOI":"10.1007\/978-3-030-01240-3_41"},{"key":"e_1_3_2_2_5_1","unstructured":"Navneet Dalal and Bill Triggs. 2005. Histograms of Oriented Gradients for Human Detection. In CVPR."},{"key":"e_1_3_2_2_6_1","unstructured":"V. Delaitre J. Sivic and I. Laptev. 2011. Learning person-object interactions for action recognition in still images. In NIPS."},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"crossref","unstructured":"Georgia Gkioxari Ross Girshick Piotr Dollar and Kaiming He. 2018. Detecting and recognizing human-object interactions. 
In CVPR.","DOI":"10.1109\/CVPR.2018.00872"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Riza Alp G\u00fcler Natalia Neverova and Iasonas Kokkinos. 2018. DensePose: Dense Human Pose Estimation In The Wild. In CVPR.","DOI":"10.1109\/CVPR.2018.00762"},{"key":"e_1_3_2_2_9_1","unstructured":"Saurabh Gupta and Jitendra Malik. 2015. Visual Semantic Role Labeling. In arXiv preprint arXiv:1505.04474."},{"key":"e_1_3_2_2_10_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR."},{"key":"e_1_3_2_2_11_1","unstructured":"J.F. Hu W.S. Zheng J. Lai S. Gong and T. Xiang. 2013. Recognising human-object interaction via exemplar based modelling. In ICCV."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"A. Jain A.R. Zamir S. Savarese and A Saxena. 2016. Structural-RNN: Deep learning on spatio-temporal graphs. In CVPR.","DOI":"10.1109\/CVPR.2016.573"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"crossref","unstructured":"Angjoo Kanazawa Michael J. Black David W. Jacobs and Jitendra Malik. 2018. End-to-end Recovery of Human Shape and Pose. In CVPR.","DOI":"10.1109\/CVPR.2018.00744"},{"key":"e_1_3_2_2_14_1","unstructured":"Diederik P. 
Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"crossref","unstructured":"H.S. Koppula R. Gupta and A. Saxena. 2013. Learning human activities and object affordances from RGB-D videos. In The International Journal of Robotics Research.","DOI":"10.1177\/0278364913478446"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"crossref","unstructured":"H.S. Koppula and A. Saxena. 2016. Anticipating human activities using object affordances for reactive robotic response. In TPAMI.","DOI":"10.1109\/TPAMI.2015.2430335"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2888451.2888454"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Ashish Kulkarni Narasimha Raju Uppalapati Pankaj Singh and Ganesh Ramakrishnan. 2018. An Interactive Multi-Label Consensus Labeling Model for Multiple Labeler Judgments. In AAAI Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press.","DOI":"10.1609\/aaai.v32i1.11494"},{"key":"e_1_3_2_2_19_1","unstructured":"Sayali Kulkarni Amit Singh Ganesh Ramakrishnan and Soumen Chakrabarti. 2009. ACM SIGKDD John F. 
Elder IV Fran\u00e7oise Fogelman-Souli\u00e9 Peter A. Flach and Mohammed Javeed Zaki (Eds.)."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"crossref","unstructured":"Verica Lazova Eldar Insafutdinov and Gerard Pons-Moll. 2019. 360-Degree Textures of People in Clothing from a Single Image. In 3DV.","DOI":"10.1109\/3DV.2019.00076"},{"key":"e_1_3_2_2_21_1","unstructured":"Yong-Lu Li Siyuan Zhou Xijie Huang Liang Xu Ze Ma Hao-Shu Fang Yan-Feng Wang and Cewu Lu. 2019. Transferable interactiveness prior for human-object interaction detection. In CVPR."},{"key":"e_1_3_2_2_22_1","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie James Hays Pietro Perona Deva Ramanan Piotr Doll\u00e1r and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV."},{"key":"e_1_3_2_2_23_1","unstructured":"Chenchen Liu Yang Jin Kehan Xu Guoqiang Gong and Yadong Mu. 2020a. Beyond Short-Term Snippet: Video Relation Detection with Spatio-Temporal Global Context. In CVPR."},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"crossref","unstructured":"Ziyu Liu Hongwen Zhang Zhenghao Chen Zhiyong Wang and Wanli Ouyang. 2020b. 
Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition. In CVPR.","DOI":"10.1109\/CVPR42600.2020.00022"},{"key":"e_1_3_2_2_25_1","unstructured":"Dushyant Mehta Oleksandr Sotnychenko Franziska Mueller Weipeng Xu Mohamed Elgharib Pascal Fua Hans-Peter Seidel Helge Rhodin Gerard Pons-Moll and Christian Theobalt. 2020. XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera. ACM Transactions on Graphics."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Trisha Mittal Pooja Guhan Uttaran Bhattacharya Rohan Chandra Aniket Bera and Dinesh Manocha. 2020. EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege's Principle. In CVPR.","DOI":"10.1109\/CVPR42600.2020.01424"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Arsha Nagrani Chen Sun David Ross Rahul Sukthankar Cordelia Schmid and Andrew Zisserman. 2020. Speech2Action: Cross-Modal Supervision for Action Recognition. In CVPR.","DOI":"10.1109\/CVPR42600.2020.01033"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"crossref","unstructured":"Alejandro Newell Kaiyu Yang and Jia Deng. 2016. Stacked hourglass networks for human pose estimation. 
In ECCV.","DOI":"10.1007\/978-3-319-46484-8_29"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Mohamed Omran Christoph Lassner Gerard Pons-Moll Peter Gehler and Bernt Schiele. 2018. Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation. In 3DV.","DOI":"10.1109\/3DV.2018.00062"},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"crossref","unstructured":"Chaitanya Patel Zhouyingcheng Liao and Gerard Pons-Moll. 2020. TailorNet: Predicting Clothing in 3D as a Function of Human Pose Shape and Garment Style. In CVPR.","DOI":"10.1109\/CVPR42600.2020.00739"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"crossref","unstructured":"O. Pele and M. Werman. 2008. A linear time histogram metric for improved sift matching. In ECCV.","DOI":"10.1007\/978-3-540-88690-7_37"},{"key":"e_1_3_2_2_32_1","unstructured":"Siyuan Qi Wenguan Wang Baoxiong Jia Jianbing Shen and Song-Chun Zhu. 2018. Learning Human-Object Interactions by Graph Parsing Neural Networks. In ECCV."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Xufeng Qian Yueting Zhuang Yimeng Li Shaoning Xiao Shiliang Pu and Jun Xiao. 2019. Video Relation Detection with Spatio-Temporal Graph. 
In ACM MM.","DOI":"10.1145\/3343031.3351058"},{"key":"e_1_3_2_2_34_1","volume-title":"EVA: Generating Emotional Behavior of Virtual Agents Using Expressive Features of Gait and Gaze. In ACM Symposium on Applied Perception.","author":"Randhavane Tanmay","year":"2019"},{"key":"e_1_3_2_2_35_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NeurIPS."},{"key":"e_1_3_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Xindi Shang Donglin Di Junbin Xiao Yu Cao Xun Yang and Tat-Seng Chua. 2019. Annotating Objects and Relations in User-Generated Videos. In ICMR.","DOI":"10.1145\/3323873.3325056"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Xindi Shang Tongwei Ren Jingfan Guo Hanwang Zhang and Tat-Seng Chua. 2017. Video Visual Relation Detection. In ACM MM.","DOI":"10.1145\/3123266.3123380"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"crossref","unstructured":"Lei Shi Yifan Zhang Jian Cheng and Hanqing Lu. 2019. Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. In CVPR.","DOI":"10.1109\/CVPR.2019.01230"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"crossref","unstructured":"Xu Sun Tongwei Ren Yuan Zi and Gangshan Wu. 2019. 
Video Visual Relation Detection via Multi-modal Feature Fusion. In ACM MM.","DOI":"10.1145\/3343031.3356076"},{"key":"e_1_3_2_2_40_1","unstructured":"Yao-Hung Hubert Tsai Santosh Divvala Louis-Philippe Morency Ruslan Salakhutdinov and Ali Farhadi. 2019. Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph. In CVPR."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"crossref","unstructured":"Gul Varol Javier Romero Xavier Martin Naureen Mahmood Michael J. Black Ivan Laptev and Cordelia Schmid. 2017. Learning From Synthetic Humans. In CVPR.","DOI":"10.1109\/CVPR.2017.492"},{"key":"e_1_3_2_2_42_1","doi-asserted-by":"crossref","unstructured":"Bo Wan Desen Zhou Yongfei Liu Rongjie Li and Xuming He. 2019. Pose-aware Multi-level Feature Network for Human Object Interaction Detection. In ICCV.","DOI":"10.1109\/ICCV.2019.00956"},{"key":"e_1_3_2_2_43_1","volume-title":"Repulsion Loss: Detecting Pedestrians in a Crowd. In CVPR.","author":"Wang Xinlong","year":"2018"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"crossref","unstructured":"Yunyang Xiong Hyunwoo J. Kim and Vikas Singh. 2019. Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation. In CVPR.","DOI":"10.1109\/CVPR.2019.00793"},{"key":"e_1_3_2_2_45_1","unstructured":"Bingjie Xu Junnan Li Yongkang Wong Mohan S Kankanhalli and Qi Zhao. 2018. 
Interact as you intend: Intention driven human-object interaction detection. In arXiv preprint arXiv:1808.09796."},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"crossref","unstructured":"B. Xu Y. Wong J. Li Q. Zhao and M. S. Kankanhalli. 2019. Learning to Detect Human-Object Interactions With Knowledge. In CVPR.","DOI":"10.1109\/CVPR.2019.00212"},{"key":"e_1_3_2_2_47_1","volume-title":"Grouplet: A structured image representation for recognizing human and object interactions. In CVPR.","author":"Yao B.","year":"2010"},{"key":"e_1_3_2_2_48_1","unstructured":"Bangpeng Yao Xiaoye Jiang Aditya Khosla Andy Lai Lin Leonidas Guibas and Li Fei-Fei. 2011. Human Action Recognition by Learning Bases of Action Attributes and Parts. In ICCV."},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"crossref","unstructured":"Pengfei Zhang Cuiling Lan Wenjun Zeng Junliang Xing Jianru Xue and Nanning Zheng. 2020. Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition. In CVPR.","DOI":"10.1109\/CVPR42600.2020.00119"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"crossref","unstructured":"Shifeng Zhang Longyin Wen Xiao Bian Zhen Lei and Stan Z. Li. 2018. 
Occlusion-aware R-CNN: Detecting Pedestrians in a Crowd. In ECCV.","DOI":"10.1007\/978-3-030-01219-9_39"}],"event":{"name":"MM '20: The 28th ACM International Conference on Multimedia","location":"Seattle WA USA","acronym":"MM '20","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 28th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413778","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394171.3413778","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:01:17Z","timestamp":1750197677000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394171.3413778"}},"subtitle":["Learning Interactions with Graph and Hierarchical TEmporal Networks for HOI in videos"],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":50,"alternative-id":["10.1145\/3394171.3413778","10.1145\/3394171"],"URL":"https:\/\/doi.org\/10.1145\/3394171.3413778","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}