{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:27:04Z","timestamp":1775744824075,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,10,15]],"date-time":"2018-10-15T00:00:00Z","timestamp":1539561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"China Scholarship Council (CSC)","award":["201503170248"],"award-info":[{"award-number":["201503170248"]}]},{"name":"ECRA","award":["R-263-000-C87-133"],"award-info":[{"award-number":["R-263-000-C87-133"]}]},{"name":"NUS IDS","award":["R-263-000-C67-646"],"award-info":[{"award-number":["R-263-000-C67-646"]}]},{"name":"MOE Tier-II","award":["R-263-000-D17-112"],"award-info":[{"award-number":["R-263-000-D17-112"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"DOI":"10.1145\/3240508.3240509","type":"proceedings-article","created":{"date-parts":[[2018,10,18]],"date-time":"2018-10-18T13:52:08Z","timestamp":1539870728000},"page":"792-800","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":172,"title":["Understanding Humans in Crowded Scenes"],"prefix":"10.1145","author":[{"given":"Jian","family":"Zhao","sequence":"first","affiliation":[{"name":"National University of Singapore & National University of Defense Technology, Singapore, Singapore"}]},{"given":"Jianshu","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"given":"Yu","family":"Cheng","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, 
Singapore"}]},{"given":"Terence","family":"Sim","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]},{"given":"Shuicheng","family":"Yan","sequence":"additional","affiliation":[{"name":"National University of Singapore & Qihoo 360 AI Institute, Singapore, Singapore"}]},{"given":"Jiashi","family":"Feng","sequence":"additional","affiliation":[{"name":"National University of Singapore, Singapore, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2018,10,15]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.161"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Liang-Chieh Chen Yi Yang Jiang Wang Wei Xu and Alan L Yuille. 2016. Attention to scale: Scale-aware semantic image segmentation. In CVPR. 3640--3649.","DOI":"10.1109\/CVPR.2016.396"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.254"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.383"},{"key":"e_1_3_2_1_5_1","unstructured":"Robert T. Collins Alan J. Lipton Takeo Kanade Hironobu Fujiyoshi David Duggins Yanghai Tsin David Tolliver Nobuyoshi Enomoto Osamu Hasegawa Peter Burt et al. 2000. A system for video surveillance and monitoring. VSAM final report (2000) 1--68."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Marius Cordts Mohamed Omran Sebastian Ramos Timo Rehfeld Markus Enzweiler Rodrigo Benenson Uwe Franke Stefan Roth and Bernt Schiele. 2016. The cityscapes dataset for semantic urban scene understanding. 
In CVPR. 3213--3223.","DOI":"10.1109\/CVPR.2016.350"},{"key":"e_1_3_2_1_7_1","volume-title":"Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551","author":"Brabandere Bert De","year":"2017","unstructured":"Bert De Brabandere, Davy Neven, and Luc Van Gool. 2017. Semantic instance segmentation with a discriminative loss function. arXiv preprint arXiv:1708.02551 (2017)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.155"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0733-5"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-009-0275-4"},{"key":"e_1_3_2_1_11_1","volume-title":"Hauptmann","author":"Gan Chuang","year":"2016","unstructured":"Chuang Gan, Ming Lin, Yi Yang, Gerard de Melo, and Alexander G. Hauptmann. 2016. Concepts Not Alone: Exploring Pairwise Relationships for Zero-Shot Video Activity Recognition. In AAAI. 3487."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_2_1_13_1","volume-title":"Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. arXiv preprint arXiv:1703.05446","author":"Gong Ke","year":"2017","unstructured":"Ke Gong, Xiaodan Liang, Xiaohui Shen, and Liang Lin. 2017. 
Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. arXiv preprint arXiv:1703.05446 (2017)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Kaiming He Georgia Gkioxari Piotr Doll\u00e1r and Ross Girshick. 2017. Mask r-cnn. In ICCV. 2980--2988.","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770--778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_16_1","volume-title":"Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly. arXiv preprint arXiv:1604.03880","author":"Jiang Hao","year":"2016","unstructured":"Hao Jiang and Kristen Grauman. 2016. Detangling People: Individuating Multiple Close People and Their Body Parts via Region Assembly. arXiv preprint arXiv:1604.03880 (2016)."},{"key":"e_1_3_2_1_17_1","volume-title":"Jordan Cheney, Kristen Allen, Patrick Grother, Alan Mah, and Anil K. Jain.","author":"Klare Brendan F.","year":"2015","unstructured":"Brendan F. Klare, Ben Klein, Emma Taborsky, Austin Blanton, Jordan Cheney, Kristen Allen, Patrick Grother, Alan Mah, and Anil K. Jain. 2015. Pushing the frontiers of unconstrained face detection and recognition: IARPA Janus Benchmark A. 
In CVPR. 1931--1939."},{"key":"e_1_3_2_1_18_1","volume-title":"Advances in Neural Information Processing Systems 24","author":"Kr\u00e4henb\u00fchl Philipp","unstructured":"Philipp Kr\u00e4henb\u00fchl and Vladlen Koltun. 2011. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 109--117. http:\/\/papers.nips.cc\/paper\/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials.pdf"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Guanbin Li Yuan Xie Liang Lin and Yizhou Yu. 2017. Instance-level salient object segmentation. In CVPR. 247--256.","DOI":"10.1109\/CVPR.2017.34"},{"key":"e_1_3_2_1_20_1","volume-title":"Multi-Human Parsing in the Wild. arXiv preprint arXiv:1705.07206","author":"Li Jianshu","year":"2017","unstructured":"
Jianshu Li, Jian Zhao, Yunchao Wei, Congyan Lang, Yidong Li, Terence Sim, Shuicheng Yan, and Jiashi Feng. 2017. Multi-Human Parsing in the Wild. arXiv preprint arXiv:1705.07206 (2017)."},{"key":"e_1_3_2_1_21_1","volume-title":"Torr","author":"Li Qizhu","year":"2017","unstructured":"Qizhu Li, Anurag Arnab, and Philip H. S. Torr. 2017. Holistic, Instance-level Human Parsing. arXiv preprint arXiv:1709.03612 (2017)."},{"key":"e_1_3_2_1_22_1","volume-title":"Proposal-free network for instance-level object segmentation. arXiv preprint arXiv:1509.02636","author":"Liang Xiaodan","year":"2015","unstructured":"Xiaodan Liang, Yunchao Wei, Xiaohui Shen, Jianchao Yang, Liang Lin, and Shuicheng Yan. 2015. Proposal-free network for instance-level object segmentation. arXiv preprint arXiv:1509.02636 (2015)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.163"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2992138.2992144"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie James Hays Pietro Perona Deva Ramanan Piotr Doll\u00e1r and C. Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV. 740--755.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"crossref","unstructured":"Si Liu Changhu Wang Ruihe Qian Han Yu Renda Bao and Yao Sun. 2017. 
Surveillance video parsing with single frame supervision. In CVPRW. 1--9.","DOI":"10.1109\/CVPR.2017.114"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Jonathan Long Evan Shelhamer and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In CVPR. 3431--3440.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_2_1_28_1","unstructured":"Andrew Y. Ng Michael I. Jordan and Yair Weiss. 2002. On spectral clustering: Analysis and an algorithm. In NIPS. 849--856."},{"key":"e_1_3_2_1_29_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS. 91--99."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.471"},{"key":"e_1_3_2_1_31_1","volume-title":"Electronic commerce: A managerial perspective","author":"Turban Efraim","year":"2002","unstructured":"Efraim Turban, David King, Jae Lee, and Dennis Viehland. 2002. Electronic commerce: A managerial perspective 2002. Prentice Hall: ISBN 0 Vol. 13, 975285 (2002), 4."},{"key":"e_1_3_2_1_32_1","volume-title":"Torr","author":"Vineet Vibhav","year":"2011","unstructured":"Vibhav Vineet, Jonathan Warrell, Lubor Ladicky, and Philip H. S. Torr. 
2011. Human Instance Segmentation from Video using Detector-based Conditional Random Fields. In BMVC, Vol. 2. 12--15."},{"key":"e_1_3_2_1_33_1","volume-title":"Wider or deeper: Revisiting the resnet model for visual recognition. arXiv preprint arXiv:1611.10080","author":"Wu Zifeng","year":"2016","unstructured":"Zifeng Wu, Chunhua Shen, and Anton van den Hengel. 2016. Wider or deeper: Revisiting the resnet model for visual recognition. arXiv preprint arXiv:1611.10080 (2016)."},{"key":"e_1_3_2_1_34_1","volume-title":"Huang","author":"Xu Ning","year":"2016","unstructured":"Ning Xu, Brian Price, Scott Cohen, Jimei Yang, and Thomas S. Huang. 2016. Deep interactive object selection. In CVPR. 373--381."},{"key":"e_1_3_2_1_35_1","volume-title":"Berg","author":"Yamaguchi Kota","year":"2012","unstructured":"Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, and Tamara L. Berg. 2012. Parsing clothing in fashion photographs. In CVPR. 3570--3577."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Ning Zhang Manohar Paluri Yaniv Taigman Rob Fergus and Lubomir Bourdev. 2015. Beyond frontal faces: Improving person recognition using multiple cues. In CVPR. 
4804--4813.","DOI":"10.1109\/CVPR.2015.7299113"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-017-1055-1"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Jian Zhao Jianshu Li Xuecheng Nie Fang Zhao Yunpeng Chen Zhecan Wang Jiashi Feng and Shuicheng Yan. 2017. Self-Supervised Neural Aggregation Networks for Human Parsing. In CVPRW. 7--15.","DOI":"10.1109\/CVPRW.2017.204"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.460"}],"event":{"name":"MM '18: ACM Multimedia Conference","location":"Seoul Republic of Korea","acronym":"MM '18","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 26th ACM international conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240509","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3240508.3240509","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T20:40:21Z","timestamp":1775248821000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3240508.3240509"}},"subtitle":["Deep Nested Adversarial Learning and A New Benchmark for Multi-Human 
Parsing"],"short-title":[],"issued":{"date-parts":[[2018,10,15]]},"references-count":39,"alternative-id":["10.1145\/3240508.3240509","10.1145\/3240508"],"URL":"https:\/\/doi.org\/10.1145\/3240508.3240509","relation":{},"subject":[],"published":{"date-parts":[[2018,10,15]]},"assertion":[{"value":"2018-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}