{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T04:54:38Z","timestamp":1781585678223,"version":"3.54.5"},"reference-count":149,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"DOI":"10.13039\/501100001809","name":"Chinese National Natural Science Foundation Projects","doi-asserted-by":"crossref","award":["62206280, U23B2054 and 62276254, 62376265"],"award-info":[{"award-number":["62206280, U23B2054 and 62276254, 62376265"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012401","name":"Beijing Science and Technology Plan Project","doi-asserted-by":"crossref","award":["Z231100005923033"],"award-info":[{"award-number":["Z231100005923033"]}],"id":[{"id":"10.13039\/501100012401","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Beijing Natural Science Foundation","award":["L242092, L221013"],"award-info":[{"award-number":["L242092, L221013"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Comput. Surv."],"published-print":{"date-parts":[[2026,1,31]]},"abstract":"<jats:p>Human parsing has recently attracted increasing attention due to its wide applications in many areas, such as surveillance analysis, human-robot interaction, and person search. Many methods now focus on developing human parsing algorithms based on deep learning. To stimulate future research, we present a comprehensive review of recent advances in this field. This survey analyzes state-of-the-art methods, covering a broad spectrum of pioneering works for human parsing. 
This work introduces five insightful categories: (1) structure-driven architectures that exploit the relationships among different parts and the inherent hierarchical structure of the human body, (2) graph-based networks that perform part-relation reasoning to achieve effective human body analysis, (3) context-aware networks that use multiple types of contextual information to classify pixels accurately, (4) LSTM-based methods that combine short-distance and long-distance spatial dependencies to leverage local and global contexts, and (5) approaches that combine auxiliary information from related tasks to improve performance. We also discuss the advantages and disadvantages of each category and the relationships between different methods. Additionally, we present quantitative performance comparisons of the reviewed methods on benchmark datasets. Finally, we introduce some common applications and suggest new directions for future study.<\/jats:p>","DOI":"10.1145\/3748717","type":"journal-article","created":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T11:21:29Z","timestamp":1752751289000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Deep Learning for Human Parsing: A Survey"],"prefix":"10.1145","volume":"58","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0221-6978","authenticated-orcid":false,"given":"Xiaomei","family":"Zhang","sequence":"first","affiliation":[{"name":"State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA)","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4636-9677","authenticated-orcid":false,"given":"Xiangyu","family":"Zhu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences 
(CASIA)","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4976-3095","authenticated-orcid":false,"given":"Ming","family":"Tang","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA)","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0791-189X","authenticated-orcid":false,"given":"Zhen","family":"Lei","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA)","place":["Beijing, China"]},{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS)","place":["Beijing, China"]},{"name":"Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science and Innovation, Chinese Academy of Sciences","place":["Beijing, China"]},{"name":"School of Computer Science and Engineering, the Faculty of Innovation Engineering, Macau University of Science and Technology","place":["Beijing, China"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2025,9]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00127"},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1109\/3DV.2018.00022","volume-title":"Proceedings of the 2018 International Conference on 3D Vision","author":"Alldieck Thiemo","year":"2018","unstructured":"Thiemo Alldieck, Marcus Magnor, Weipeng Xu, Christian Theobalt, and Gerard Pons-Moll. 2018. Detailed human avatars from monocular video. In Proceedings of the 2018 International Conference on 3D Vision. 
IEEE, 98\u2013109."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00875"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"1014","DOI":"10.1109\/CVPR.2009.5206754","volume-title":"Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition","author":"Andriluka Mykhaylo","year":"2009","unstructured":"Mykhaylo Andriluka, Stefan Roth, and Bernt Schiele. 2009. Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 1014\u20131021."},{"key":"e_1_3_1_6_2","doi-asserted-by":"crossref","unstructured":"Vijay Badrinarayanan Alex Kendall and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for scene segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39 12 (2017) 2481\u20132495.","DOI":"10.1109\/TPAMI.2016.2644615"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00552"},{"key":"e_1_3_1_8_2","volume-title":"Kinesics and Context: Essays on Body Motion Communication","author":"Birdwhistell Ray L.","year":"2010","unstructured":"Ray L. Birdwhistell. 2010. Kinesics and Context: Essays on Body Motion Communication. University of Pennsylvania Press."},{"key":"e_1_3_1_9_2","first-page":"1365","volume-title":"Proceedings of the 2009 IEEE 12th International Conference on Computer Vision","author":"Bourdev Lubomir","year":"2009","unstructured":"Lubomir Bourdev and Jitendra Malik. 2009. Poselets: Body part detectors trained using 3D human pose annotations. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision. IEEE, 1365\u20131372."},{"key":"e_1_3_1_10_2","unstructured":"Joan Bruna Wojciech Zaremba Arthur Szlam and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. 
International Conference on Learning Representations 18 (2014) 256\u2013270."},{"key":"e_1_3_1_11_2","doi-asserted-by":"crossref","unstructured":"John Canny. 1986. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8 6 (1986) 679\u2013698.","DOI":"10.1109\/TPAMI.1986.4767851"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01505"},{"key":"e_1_3_1_13_2","first-page":"8699","volume-title":"Proceedings of the Advances in Neural Information Processing Systems.","author":"Chen Liang-Chieh","year":"2018","unstructured":"Liang-Chieh Chen, Maxwell Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, and Jon Shlens. 2018. Searching for efficient multi-scale architectures for dense image prediction. In Proceedings of the Advances in Neural Information Processing Systems. 8699\u20138710."},{"key":"e_1_3_1_14_2","doi-asserted-by":"crossref","unstructured":"Liang-Chieh Chen George Papandreou Iasonas Kokkinos Kevin Murphy and Alan L. Yuille. 2018. Semantic image segmentation with deep convolutional nets and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 4 (2018) 834\u2013848.","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"Liang-Chieh Chen George Papandreou Florian Schroff and Hartwig Adam. 2018. Rethinking atrous convolution for semantic image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 4 (2018) 834\u2013848.","DOI":"10.1109\/TPAMI.2017.2699184"},{"key":"e_1_3_1_17_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Chen Liang-Chieh","year":"2016","unstructured":"Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, and Alan L. Yuille. 2016. 
Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_18_2","first-page":"801","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Chen Liang-Chieh","year":"2018","unstructured":"Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision. 801\u2013818."},{"key":"e_1_3_1_19_2","first-page":"1979","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Chen Xianjie","year":"2014","unstructured":"Xianjie Chen, Roozbeh Mottaghi, Xiaobai Liu, Sanja Fidler, Raquel Urtasun, and Alan Yuille. 2014. Detect what you can: Detecting and representing objects using holistic models and body parts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1979\u20131986."},{"key":"e_1_3_1_20_2","first-page":"5218","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Cheng Bowen","year":"2019","unstructured":"Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, Jinjun Xiong, Thomas S. Huang, Wen-Mei Hwu, and Honghui Shi. 2019. SPGNet: Semantic prediction guidance for scene parsing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 5218\u20135228."},{"key":"e_1_3_1_21_2","first-page":"17840","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Ci Yuanzheng","year":"2023","unstructured":"Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, and Wanli Ouyang. 2023. UniHCP: A unified model for human-centric perceptions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
17840\u201317852."},{"key":"e_1_3_1_22_2","first-page":"3408","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Dong Jian","year":"2014","unstructured":"Jian Dong, Qiang Chen, Wei Xia, Zhongyang Huang, and Shuicheng Yan. 2014. A deformable mixture parsing model with parselets. In Proceedings of the IEEE International Conference on Computer Vision. 3408\u20133415."},{"issue":"4","key":"e_1_3_1_23_2","first-page":"1","article-title":"Deep learning on monocular object pose detection and tracking: A comprehensive overview","volume":"55","author":"Fan Zhaoxin","year":"2022","unstructured":"Zhaoxin Fan, Yazhi Zhu, Yulin He, Qi Sun, Hongyan Liu, and Jun He. 2022. Deep learning on monocular object pose detection and tracking: A comprehensive overview. Computing Surveys 55, 4 (2022), 1\u201340.","journal-title":"Computing Surveys"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Hao-Shu Fang Guansong Lu Xiaolin Fang Jianwen Xie Yu-Wing Tai and Cewu Lu. 2018. Weakly and semi supervised human body part parsing via pose-guided knowledge transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2018.00015"},{"key":"e_1_3_1_25_2","first-page":"2360","volume-title":"Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.","author":"Farenzena M.","year":"2010","unstructured":"M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani. 2010. Person re-identification by symmetry-driven accumulation of local features. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2360\u20132367."},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.2981604"},{"key":"e_1_3_1_27_2","first-page":"519","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Ghiasi Golnaz","year":"2016","unstructured":"Golnaz Ghiasi and Charless C. 
Fowlkes. 2016. Laplacian pyramid reconstruction and refinement for semantic segmentation. In Proceedings of the European Conference on Computer Vision. Springer, 519\u2013534."},{"key":"e_1_3_1_28_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","author":"Gong Ke","year":"2019","unstructured":"Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, and Liang Lin. 2019. Graphonomy: Universal human parsing via graph transfer learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_29_2","first-page":"770","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Gong Ke","year":"2018","unstructured":"Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, and Liang Lin. 2018. Instance-level human parsing via part grouping network. In Proceedings of the European Conference on Computer Vision. 770\u2013785."},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"Ke Gong Xiaodan Liang Dongyu Zhang Xiaohui Shen and Liang Lin. 2017. Look into person: Self-supervised structure-sensitive learning and a new benchmark for human parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","DOI":"10.1109\/CVPR.2017.715"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00762"},{"key":"e_1_3_1_32_2","first-page":"1522","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"He Haoyu","year":"2021","unstructured":"Haoyu He, Jing Zhang, Bhavani Thuraisingham, and Dacheng Tao. 2021. Progressive one-shot human parsing. In Proceedings of the AAAI Conference on Artificial Intelligence. 
1522\u20131530."},{"key":"e_1_3_1_33_2","first-page":"10949","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"He Haoyu","year":"2020","unstructured":"Haoyu He, Jing Zhang, Qiming Zhang, and Dacheng Tao. 2020. Grapy-ML: Graph pyramid mutual learning for cross-dataset human parsing. In Proceedings of the AAAI Conference on Artificial Intelligence. 10949\u201310956."},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","unstructured":"Haoyu He Jing Zhang Bohan Zhuang Jianfei Cai and Dacheng Tao. 2023. End-to-end one-shot human parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 12 (2023) 14481\u201314496.","DOI":"10.1109\/TPAMI.2023.3301672"},{"key":"e_1_3_1_35_2","volume-title":"Proceedings of the 30th AAAI Conference on Artificial Intelligence","author":"He Pan","year":"2016","unstructured":"Pan He, Weilin Huang, Yu Qiao, Chen Change Loy, and Xiaoou Tang. 2016. Reading scene text in deep convolutional sequences. In Proceedings of the 30th AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_1_36_2","unstructured":"Christopher J. Holder and Muhammad Shafique. 2022. On efficient real-time semantic segmentation: A survey. ACM Computing Surveys 55 1 (2022) 1\u201337."},{"key":"e_1_3_1_37_2","volume-title":"Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition","author":"Hong C.","year":"2006","unstructured":"C. Hong, J. X. Zi, Q. L. Zi, and S. C. Zhu. 2006. Composite templates for cloth modeling and sketching. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_38_2","first-page":"336","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Humayun Ahmad","year":"2014","unstructured":"Ahmad Humayun, Fuxin Li, and James M. Rehg. 2014. RIGOR: Reusing inference in graph cuts for generating object regions. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 336\u2013343."},{"key":"e_1_3_1_39_2","unstructured":"Allan Jabri Andrew Owens and Alexei Efros. 2020. Space-time correspondence as a contrastive random walk. Advances in Neural Information Processing Systems 33 (2020) 19545\u201319560."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00109"},{"key":"e_1_3_1_41_2","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Ji Ruyi","year":"2020","unstructured":"Ruyi Ji, Dawei Du, Libo Zhang, Longyin Wen, Yanjun Wu, Chen Zhao, Feiyue Huang, and Siwei Lyu. 2020. Learning semantic neural tree for human parsing. In Proceedings of the European Conference on Computer Vision."},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00710"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00868"},{"key":"e_1_3_1_44_2","first-page":"38","volume-title":"Proceedings of the Second International Conference on Automatic Face and Gesture Recognition","author":"Ju Shanon X.","year":"1996","unstructured":"Shanon X. Ju, Michael J. Black, and Yaser Yacoob. 1996. Cardboard people: A parameterized model of articulated image motion. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition. IEEE, 38\u201344."},{"key":"e_1_3_1_45_2","first-page":"980","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","year":"1994","unstructured":"Kakadiaris, Metaxas, and Bajcsy. 1994. Active part-decomposition, shape and motion estimation of articulated objects: A physics-based approach. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 
IEEE, 980\u2013984."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00117"},{"key":"e_1_3_1_47_2","doi-asserted-by":"crossref","unstructured":"Rawal Khirodkar Timur Bagautdinov Julieta Martinez Su Zhaoen Austin James Peter Selednik Stuart Anderson and Shunsuke Saito. 2024. Sapiens: Foundation for human vision models. In Proceedings of the European Conference on Computer Vision. 206\u2013228.","DOI":"10.1007\/978-3-031-73235-5_12"},{"key":"e_1_3_1_48_2","first-page":"331","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Kiefel Martin","year":"2014","unstructured":"Martin Kiefel and Peter Vincent Gehler. 2014. Human pose estimation with fields of parts. In Proceedings of the European Conference on Computer Vision. Springer, 331\u2013346."},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_1_50_2","unstructured":"Jianshu Li Jian Zhao Yunchao Wei Congyan Lang Yidong Li Terence Sim Shuicheng Yan and Jiashi Feng. 2017. Multiple-human parsing in the wild. ACM Transactions on Multimedia Computing Communications and Applications 13 3 (2017) 1\u201322."},{"key":"e_1_3_1_51_2","first-page":"4122","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Li Liulei","year":"2023","unstructured":"Liulei Li, Wenguan Wang, and Yi Yang. 2023. Logicseg: Parsing visual semantics with neural logic learning and reasoning. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 4122\u20134133."},{"key":"e_1_3_1_52_2","first-page":"1246","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Liulei","year":"2022","unstructured":"Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, and Yi Yang. 2022. Deep hierarchical semantic segmentation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
1246\u20131257."},{"key":"e_1_3_1_53_2","first-page":"8719","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Liulei","year":"2022","unstructured":"Liulei Li, Tianfei Zhou, Wenguan Wang, Lu Yang, Jianwu Li, and Yi Yang. 2022. Locality-aware inter- and intra-video reconstruction for self-supervised correspondence learning. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 8719\u20138730."},{"key":"e_1_3_1_54_2","unstructured":"Qizhu Li Anurag Arnab and Philip H. S. Torr. 2017. Holistic instance-level human parsing. In Proceedings of the British Machine Vision Conference. 25.1\u201325.13."},{"key":"e_1_3_1_55_2","first-page":"9263","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Tao","year":"2020","unstructured":"Tao Li, Zhiyuan Liang, Sanyuan Zhao, Jiahao Gong, and Jianbing Shen. 2020. Self-learning with rectification strategy for human parsing. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9263\u20139272."},{"key":"e_1_3_1_56_2","unstructured":"Xueting Li Sifei Liu Shalini De Mello Xiaolong Wang Jan Kautz and Ming-Hsuan Yang. 2019. Joint-task self-supervised learning for temporal correspondence. Advances in Neural Information Processing Systems 32 (2019) 317\u2013327."},{"key":"e_1_3_1_57_2","doi-asserted-by":"crossref","unstructured":"Zhiyong Li Jingyi Lv Ying Chen and Jin Yuan. 2021. Person re-identification with part prediction alignment. Computer Vision and Image Understanding 205 4 (2021) 103172.","DOI":"10.1016\/j.cviu.2021.103172"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2014.2306177"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","unstructured":"Xiaodan Liang Ke Gong Xiaohui Shen and Liang Lin. 2018. Look into person: Joint body parsing & pose estimation network and a new benchmark. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 41 4 (2018) 871\u2013885.","DOI":"10.1109\/TPAMI.2018.2820063"},{"key":"e_1_3_1_60_2","first-page":"2175","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Liang Xiaodan","year":"2017","unstructured":"Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, and Eric P. Xing. 2017. Interpretable structure-evolving LSTM. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2175\u20132184."},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2542983"},{"key":"e_1_3_1_62_2","first-page":"125","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Liang Xiaodan","year":"2016","unstructured":"Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, and Shuicheng Yan. 2016. Semantic object parsing with graph LSTM. In Proceedings of the European Conference on Computer Vision. 125\u2013143."},{"key":"e_1_3_1_63_2","first-page":"3185","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Liang Xiaodan","year":"2016","unstructured":"Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin, and Shuicheng Yan. 2016. Semantic object parsing with local-global long short-term memory. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3185\u20133193."},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.163"},{"issue":"1","key":"e_1_3_1_65_2","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1109\/TPAMI.2016.2537339","article-title":"Human parsing with contextualized convolutional neural network","volume":"39","author":"Liang Xiaodan","year":"2017","unstructured":"Xiaodan Liang, Chunyan Xu, Xiaohui Shen, Jianchao Yang, Si Liu, Jinhui Tang, Liang Lin, and Shuicheng Yan. 2017. Human parsing with contextualized convolutional neural network. 
IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 1 (2017), 115\u2013127.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_66_2","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Lin Di","year":"2018","unstructured":"Di Lin, Yuanfeng Ji, Dani Lischinski, Daniel Cohen-Or, and Hui Huang. 2018. Multi-scale context intertwining for semantic segmentation. In Proceedings of the European Conference on Computer Vision. 603\u2013619."},{"key":"e_1_3_1_67_2","first-page":"1925","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Lin Guosheng","year":"2017","unstructured":"Guosheng Lin, Anton Milan, Chunhua Shen, and Ian Reid. 2017. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1925\u20131934."},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00580"},{"key":"e_1_3_1_69_2","doi-asserted-by":"crossref","unstructured":"Kevin Lin Lijuan Wang Kun Luo Yinpeng Chen Zicheng Liu and Ming-Ting Sun. 2020. Cross-domain complementary learning using pose for multi-person part segmentation. IEEE Transactions on Circuits and Systems for Video Technology 31 3 (2020) 1066\u20131078.","DOI":"10.1109\/TCSVT.2020.2995122"},{"key":"e_1_3_1_70_2","unstructured":"Liang Lin Yiming Gao Ke Gong Meng Wang and Xiaodan Liang. 2020. Graphonomy: Universal image parsing via graph reasoning and transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 5 (2020) 2504\u20132518."},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_3_1_72_2","first-page":"4473","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Liu Kunliang","year":"2022","unstructured":"Kunliang Liu, Ouk Choi, Jianming Wang, and Wonjun Hwang. 2022. CDGNet: Class distribution guided network for human parsing. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 4473\u20134482."},{"issue":"1","key":"e_1_3_1_73_2","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1109\/TMM.2013.2285526","article-title":"Fashion parsing with weak color-category labels","volume":"16","author":"Liu Si","year":"2013","unstructured":"Si Liu, Jiashi Feng, Csaba Domokos, Hui Xu, Junshi Huang, Zhenzhen Hu, and Shuicheng Yan. 2013. Fashion parsing with weak color-category labels. IEEE Transactions on Multimedia 16, 1 (2013), 253\u2013265.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654932"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298748"},{"key":"e_1_3_1_76_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Liu Si","year":"2018","unstructured":"Si Liu, Yao Sun, Defa Zhu, Guanghui Ren, Yu Chen, Jiashi Feng, and Jizhong Han. 2018. Cross-domain human parsing via adversarial feature and label adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00825"},{"key":"e_1_3_1_78_2","doi-asserted-by":"crossref","unstructured":"Ting Liu Tao Ruan Zilong Huang Yunchao Wei Shikui Wei Yao Zhao and Thomas Huang. 2019. Devil in the Details: Towards accurate single and multiple human parsing. In Proceedings of the AAAI Conference on Artificial Intelligence 33 1 (2019) 4814\u20134821.","DOI":"10.1609\/aaai.v33i01.33014814"},{"issue":"4","key":"e_1_3_1_79_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3524497","article-title":"Recent advances of monocular 2D and 3D human pose estimation: A deep learning perspective","volume":"55","author":"Liu Wu","year":"2022","unstructured":"Wu Liu, Qian Bao, Yu Sun, and Tao Mei. 2022. Recent advances of monocular 2D and 3D human pose estimation: A deep learning perspective. 
Computing Surveys 55, 4 (2022), 1\u201341.","journal-title":"Computing Surveys"},{"key":"e_1_3_1_80_2","first-page":"338","volume-title":"Proceedings of the 27th ACM International Conference on Multimedia","author":"Liu Xinchen","year":"2019","unstructured":"Xinchen Liu, Meng Zhang, Wu Liu, Jingkuan Song, and Tao Mei. 2019. BraidNet: Braiding semantics and details for accurate human parsing. In Proceedings of the 27th ACM International Conference on Multimedia. 338\u2013346."},{"key":"e_1_3_1_81_2","first-page":"2207","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Liu Yunan","year":"2021","unstructured":"Yunan Liu, Shanshan Zhang, Jian Yang, and PongChi Yuen. 2021. Hierarchical information passing based noise-tolerant hybrid learning for semi-supervised human parsing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 2207\u20132215."},{"key":"e_1_3_1_82_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413831"},{"key":"e_1_3_1_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_1_84_2","first-page":"2648","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Luo Ping","year":"2013","unstructured":"Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2013. Pedestrian parsing via deep decompositional network. In Proceedings of the IEEE International Conference on Computer Vision. 2648\u20132655."},{"key":"e_1_3_1_85_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240634"},{"key":"e_1_3_1_86_2","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Luo Yawei","year":"2018","unstructured":"Yawei Luo, Zhedong Zheng, Liang Zheng, Tao Guan, Junqing Yu, and Yi Yang. 2018. Macro-micro adversarial network for human parsing. 
In Proceedings of the European Conference on Computer Vision."},{"key":"e_1_3_1_87_2","volume-title":"Human Anatomy","author":"Martini Frederic","year":"2006","unstructured":"Frederic Martini, Michael J. Timmons, Robert B. Tallitsch, William C. Ober, Claire W. Garrison, Kathleen B. Welch, and Ralph T. Hutchings. 2006. Human Anatomy. Pearson\/Benjamin Cummings, San Francisco, CA."},{"key":"e_1_3_1_88_2","volume-title":"New atlas of human anatomy","author":"McCracken Thomas","year":"2000","unstructured":"Thomas McCracken. 2000. New atlas of human anatomy. Barnes & Noble Publishing."},{"key":"e_1_3_1_89_2","unstructured":"Daniel McKee Zitong Zhan Bing Shuai Davide Modolo Joseph Tighe and Svetlana Lazebnik. 2022. Transfer of representations to video label propagation: Implementation factors matter. Advances in Neural Information Processing Systems 35 (2022) 12345\u201312357."},{"key":"e_1_3_1_90_2","doi-asserted-by":"publisher","DOI":"10.1109\/34.216727"},{"key":"e_1_3_1_91_2","unstructured":"Shervin Minaee Yuri Y. Boykov Fatih Porikli Antonio J. Plaza Nasser Kehtarnavaz and Demetri Terzopoulos. 2021. Image segmentation using deep learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 7 (2021) 3523\u20133542."},{"key":"e_1_3_1_92_2","first-page":"502","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Nie Xuecheng","year":"2018","unstructured":"Xuecheng Nie, Jiashi Feng, and Shuicheng Yan. 2018. Mutual learning to adapt for joint human parsing and pose estimation. In Proceedings of the European Conference on Computer Vision. 502\u2013517."},{"issue":"7","key":"e_1_3_1_93_2","doi-asserted-by":"crossref","first-page":"1555","DOI":"10.1109\/TPAMI.2017.2731842","article-title":"Attribute and-or grammar for joint parsing of human pose, parts and attributes","volume":"40","author":"Park Seyoung","year":"2018","unstructured":"Seyoung Park, Bruce Xiaohan Nie, and Song-Chun Zhu. 2018. 
Attribute and-or grammar for joint parsing of human pose, parts and attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 7 (2018), 1555\u20131569.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_94_2","unstructured":"Adam Paszke Abhishek Chaurasia Sangpil Kim and Eugenio Culurciello. 2016. Enet: A deep neural network architecture for real-time semantic segmentation. arXiv:1606.02147. Retrieved from https:\/\/arxiv.org\/abs\/1606.02147"},{"key":"e_1_3_1_95_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_1_96_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.413"},{"key":"e_1_3_1_97_2","first-page":"640","volume-title":"Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops","author":"Rothrock Brandon","year":"2011","unstructured":"Brandon Rothrock and Song-Chun Zhu. 2011. Human parsing using stochastic and-or grammars and rich appearances. In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops. IEEE, 640\u2013647."},{"key":"e_1_3_1_98_2","first-page":"3674","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Sapp Ben","year":"2013","unstructured":"Ben Sapp and Ben Taskar. 2013. Modec: Multimodal decomposable models for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3674\u20133681."},{"key":"e_1_3_1_99_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15552-9_30"},{"key":"e_1_3_1_100_2","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556. 
Retrieved from https:\/\/arxiv.org\/abs\/1409.1556"},{"key":"e_1_3_1_101_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00129"},{"key":"e_1_3_1_102_2","first-page":"1649","volume-title":"Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition","author":"Stavens David","year":"2010","unstructured":"David Stavens and Sebastian Thrun. 2010. Unsupervised learning of invariant features using video. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1649\u20131656."},{"key":"e_1_3_1_103_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00584"},{"key":"e_1_3_1_104_2","unstructured":"Ke Sun Yang Zhao Borui Jiang Tianheng Cheng Bin Xiao Dong Liu Yadong Mu Xinggang Wang Wenyu Liu and Jingdong Wang. 2019. High-resolution representations for labeling pixels and regions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 9584\u20139593."},{"key":"e_1_3_1_105_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_1_106_2","unstructured":"Jonathan J. Tompson Arjun Jain Yann LeCun and Christoph Bregler. 2014. Joint training of a convolutional network and a graphical model for human pose estimation. Advances in Neural Information Processing Systems 27 (2014) 1799\u20131807."},{"key":"e_1_3_1_107_2","doi-asserted-by":"crossref","unstructured":"Irem Ulku and Erdem Akag\u00fcnd\u00fcz. 2022. A survey on deep learning-based architectures for semantic segmentation on 2d images. Applied Artificial Intelligence 36 1 (2022) 1\u201345.","DOI":"10.1080\/08839514.2022.2032924"},{"key":"e_1_3_1_108_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.83"},{"key":"e_1_3_1_109_2","first-page":"178","volume-title":"Proceedings of the International MICCAI Brainlesion Workshop","author":"Wang Guotai","year":"2017","unstructured":"Guotai Wang, Wenqi Li, S\u00e9bastien Ourselin, and Tom Vercauteren. 2017. 
Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In Proceedings of the International MICCAI Brainlesion Workshop. Springer, 178\u2013190."},{"key":"e_1_3_1_110_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Wang Wenguan","year":"2018","unstructured":"Wenguan Wang, Yuanlu Xu, Jianbing Shen, and Songchun Zhu. 2018. Attentive fashion grammar network for fashion landmark detection and clothing category classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_111_2","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision.","author":"Wang Wenguan","year":"2019","unstructured":"Wenguan Wang, Zhijie Zhang, Siyuan Qi, Jianbing Shen, Yanwei Pang, and Ling Shao. 2019. Learning compositional neural information fusion for human parsing. In Proceedings of the IEEE\/CVF International Conference on Computer Vision."},{"key":"e_1_3_1_112_2","unstructured":"Wenguan Wang Tianfei Zhou Siyuan Qi Jianbing Shen and Song-Chun Zhu. 2021. Hierarchical human semantic parsing with comprehensive part-relation modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 7 (2021) 3508\u20133522."},{"key":"e_1_3_1_113_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00895"},{"key":"e_1_3_1_114_2","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition.","author":"Wang X.","year":"2019","unstructured":"X. Wang, A. Jabri, and Alexei A. Efros. 2019. Learning correspondence from the cycle-consistency of time. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_115_2","doi-asserted-by":"publisher","DOI":"10.5555\/2503308.2503340"},{"key":"e_1_3_1_116_2","doi-asserted-by":"crossref","unstructured":"Yizhou Wang Yixuan Wu Shixiang Tang Weizhen He Xun Guo Feng Zhu Lei Bai Rui Zhao Jian Wu Tong He et\u00a0al. 2025. Hulk: A universal knowledge translator for human-centric tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 47 7 (2025) 5672\u20135689.","DOI":"10.1109\/TPAMI.2025.3552604"},{"issue":"12","key":"e_1_3_1_117_2","doi-asserted-by":"crossref","first-page":"2402","DOI":"10.1109\/TPAMI.2015.2408360","article-title":"Deep human parsing with active template regression.","volume":"37","author":"X. Liang","year":"2015","unstructured":"Liang X., Liu S., Shen X., Yang J., Liu L., Dong J., Lin L, and Yan S.. 2015. Deep human parsing with active template regression. In IEEE TPAMI 37, 12 (2015), 2402\u20132414.","journal-title":"In IEEE TPAMI"},{"key":"e_1_3_1_118_2","first-page":"648","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Xia Fangting","year":"2015","unstructured":"Fangting Xia, Peng Wang, Liang Chieh Chen, and Alan L. Yuille. 2015. Zoom better to see clearer: Human and object parsing with hierarchical auto-zoom net. In Proceedings of the European Conference on Computer Vision.648\u2013663."},{"key":"e_1_3_1_119_2","first-page":"6080","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Xia Fangting","year":"2017","unstructured":"Fangting Xia, Peng Wang, Xianjie Chen, and Alan Yuille. 2017. Joint multi-person pose estimation and semantic part segmentation. 
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.6080\u20136089."},{"key":"e_1_3_1_120_2","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Xia Fangting","year":"2016","unstructured":"Fangting Xia, Jun Zhu, Peng Wang, and Alan Yuille. 2016. Pose-guided human parsing by an and\/or graph using pose-context features. In Proceedings of the AAAI Conference on Artificial Intelligence."},{"key":"e_1_3_1_121_2","first-page":"3570","volume-title":"Proceedings of the CVPR","author":"Yamaguchi Kota","year":"2012","unstructured":"Kota Yamaguchi. 2012. Parsing clothing in fashion photographs. In Proceedings of the CVPR. 3570\u20133577."},{"key":"e_1_3_1_122_2","first-page":"3519","volume-title":"Proceedings of the IEEE International Conference on Computer Vision.","author":"Yamaguchi Kota","year":"2013","unstructured":"Kota Yamaguchi, M. Hadi Kiapour, and Tamara L. Berg. 2013. Paper doll parsing: Retrieving similar styles to parse clothing items. In Proceedings of the IEEE International Conference on Computer Vision.3519\u20133526."},{"key":"e_1_3_1_123_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00787"},{"key":"e_1_3_1_124_2","doi-asserted-by":"crossref","unstructured":"Lu Yang Wenhe Jia Shan Li and Qing Song. 2024. Deep learning technique for human parsing: A survey and outlook. International Journal of Computer Vision 132 8 (2024) 3270\u20133301.","DOI":"10.1007\/s11263-024-02031-9"},{"key":"e_1_3_1_125_2","first-page":"364","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yang Lu","year":"2019","unstructured":"Lu Yang, Qing Song, Zhihui Wang, and Ming Jiang. 2019. Parsing r-cnn for instance-level human analysis. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 
364\u2013373."},{"key":"e_1_3_1_126_2","doi-asserted-by":"crossref","unstructured":"Lu Yang Qing Song Zhihui Wang Zhiwei Liu Songcen Xu and Zhihao Li. 2022. Quality-aware network for human parsing. IEEE Transactions on Multimedia 25 3 (2022) 7128\u20137138.","DOI":"10.1109\/TMM.2022.3217413"},{"key":"e_1_3_1_127_2","first-page":"3182","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Yang Wei","year":"2014","unstructured":"Wei Yang, Ping Luo, and Liang Lin. 2014. Clothing co-parsing by joint image segmentation and labeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3182\u20133189."},{"key":"e_1_3_1_128_2","doi-asserted-by":"crossref","first-page":"1385","DOI":"10.1109\/CVPR.2011.5995741","volume-title":"Proceedings of the CVPR 2011","author":"Yang Yi","year":"2011","unstructured":"Yi Yang and Deva Ramanan. 2011. Articulated pose estimation with flexible mixtures-of-parts. In Proceedings of the CVPR 2011. IEEE, 1385\u20131392."},{"key":"e_1_3_1_129_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.261"},{"key":"e_1_3_1_130_2","first-page":"11385","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zeng Dan","year":"2021","unstructured":"Dan Zeng, Yuhang Huang, Qian Bao, Junjie Zhang, Chi Su, and Wu Liu. 2021. Neural architecture search for joint human parsing and pose estimation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 11385\u201311394."},{"key":"e_1_3_1_131_2","doi-asserted-by":"crossref","unstructured":"Hongwen Zhang Jie Cao Guo Lu Wanli Ouyang and Zhenan Sun. 2020. Learning 3d human shape and pose from dense body parts. IEEE Transactions on Pattern Analysis and Machine Intelligence 44 5 (2020) 2610\u20132627.","DOI":"10.1109\/TPAMI.2020.3042341"},{"key":"e_1_3_1_132_2","doi-asserted-by":"crossref","unstructured":"Sanyi Zhang Xiaochun Cao Guo-Jun Qi Zhanjie Song and Jie Zhou. 
2022. AIParsing: Anchor-free instance-level human parsing. IEEE Transactions on Image Processing 31 5 (2022) 5599\u20135612.","DOI":"10.1109\/TIP.2022.3192989"},{"key":"e_1_3_1_133_2","doi-asserted-by":"crossref","unstructured":"Xiaomei Zhang Yingying Chen Ming Tang Zhen Lei and Jinqiao Wang. 2022. Grammar-induced wavelet network for human parsing. IEEE Transactions on Image Processing 31 4 (2022) 4502\u20134514.","DOI":"10.1109\/TIP.2022.3181486"},{"key":"e_1_3_1_134_2","doi-asserted-by":"crossref","unstructured":"Xiaomei Zhang Yingying Chen Ming Tang Jinqiao Wang Xiangyu Zhu and Zhen Lei. 2022. Human parsing with part-aware relation modeling. IEEE Transactions on Multimedia 25 3 (2022) 2601\u20132612.","DOI":"10.1109\/TMM.2022.3148595"},{"key":"e_1_3_1_135_2","first-page":"189","volume-title":"Proceedings of the European Conference on Computer Vision.","author":"Zhang Xiaomei","year":"2020","unstructured":"Xiaomei Zhang, Yingying Chen, Bingke Zhu, Jinqiao Wang, and Ming Tang. 2020. Blended grammar network for human parsing. In Proceedings of the European Conference on Computer Vision.189\u2013205."},{"key":"e_1_3_1_136_2","first-page":"8971","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Zhang Xiaomei","year":"2020","unstructured":"Xiaomei Zhang, Yingying Chen, Bingke Zhu, Jinqiao Wang, and Ming Tang. 2020. Part-aware context network for human parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.8971\u20138980."},{"key":"e_1_3_1_137_2","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1007\/978-3-031-20233-9_23","volume-title":"Proceedings of the Chinese Conference on Biometric Recognition","author":"Zhang Xiaomei","year":"2022","unstructured":"Xiaomei Zhang, Feng Pan, Ke Xiang, Xiangyu Zhu, Chang Yu, Zidu Wang, and Zhen Lei. 2022. Contrastive and consistent learning for unsupervised human parsing. 
In Proceedings of the Chinese Conference on Biometric Recognition. Springer, 226\u2013236."},{"key":"e_1_3_1_138_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Zhang Z.","year":"2020","unstructured":"Z. Zhang, C. Su, L. Zheng, and X. Xie. 2020. Correlating edge, pose with parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_139_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.660"},{"key":"e_1_3_1_140_2","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1145\/3240508.3240509","volume-title":"Proceedings of the 26th ACM International Conference on Multimedia","author":"Zhao Jian","year":"2018","unstructured":"Jian Zhao, Jianshu Li, Yu Cheng, Terence Sim, Shuicheng Yan, and Jiashi Feng. 2018. Understanding humans in crowded scenes: Deep nested adversarial learning and a new benchmark for multi-human parsing. In Proceedings of the 26th ACM International Conference on Multimedia. 792\u2013800."},{"key":"e_1_3_1_141_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-019-01181-5"},{"key":"e_1_3_1_142_2","first-page":"9177","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zhao Yifan","year":"2019","unstructured":"Yifan Zhao, Jia Li, Yu Zhang, and Yonghong Tian. 2019. Multi-class part parsing with joint boundary-semantic awareness. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 9177\u20139186."},{"key":"e_1_3_1_143_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00681"},{"key":"e_1_3_1_144_2","first-page":"1591","volume-title":"Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium.","author":"Zhong Zilong","year":"2016","unstructured":"Zilong Zhong, Jonathan Li, Weihong Cui, and Han Jiang. 2016. Fully convolutional networks for building and road extraction: Preliminary results. 
In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 1591\u20131594."},{"key":"e_1_3_1_145_2","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240660"},{"key":"e_1_3_1_146_2","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Zhou Tianfei","year":"2021","unstructured":"Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, and Luc Van Gool. 2021. Differentiable multi-granularity human representation learning for instance-aware human semantic parsing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_3_1_147_2","unstructured":"Tianfei Zhou Yi Yang and Wenguan Wang. 2023. Differentiable multi-granularity human parsing. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 7 (2023) 8296\u20138310."},{"key":"e_1_3_1_148_2","doi-asserted-by":"publisher","DOI":"10.1093\/nsr\/nwx106"},{"key":"e_1_3_1_149_2","doi-asserted-by":"crossref","unstructured":"Bingke Zhu Yingying Chen Ming Tang and JinqiaoWang. 2018. Progressive cognitive human parsing. In Proceedings of the AAAI 32 1 (2018) 182\u2013190.","DOI":"10.1609\/aaai.v32i1.12336"},{"key":"e_1_3_1_150_2","first-page":"346","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Zhu Kuan","year":"2020","unstructured":"Kuan Zhu, Haiyun Guo, Zhiwei Liu, Ming Tang, and Jinqiao Wang. 2020. Identity-guided human semantic parsing for person re-identification. In Proceedings of the European Conference on Computer Vision. 
Springer, 346\u2013363."}],"container-title":["ACM Computing Surveys"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,1]],"date-time":"2025-09-01T12:52:54Z","timestamp":1756731174000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748717"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9]]},"references-count":149,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1,31]]}},"alternative-id":["10.1145\/3748717"],"URL":"https:\/\/doi.org\/10.1145\/3748717","relation":{},"ISSN":["0360-0300","1557-7341"],"issn-type":[{"value":"0360-0300","type":"print"},{"value":"1557-7341","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9]]},"assertion":[{"value":"2023-02-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}