{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T12:03:19Z","timestamp":1775217799124,"version":"3.50.1"},"reference-count":48,"publisher":"Institution of Engineering and Technology (IET)","issue":"1","license":[{"start":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T00:00:00Z","timestamp":1762128000000},"content-version":"vor","delay-in-days":306,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62262036"],"award-info":[{"award-number":["62262036"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62362043"],"award-info":[{"award-number":["62362043"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Image Processing"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:title>ABSTRACT<\/jats:title>\n                  <jats:p>Multi\u2010human parsing faces critical challenges in handling small\u2010scale accessories, distinguishing overlapping instances, and maintaining accuracy across diverse poses. We propose a unified framework with three fundamental innovations. First, unlike standard FPN, our hierarchical multi\u2010scale feature builder employs six sequential deformable encoders with specialized encoding sets (body parts, keypoints, accessories) for adaptive multi\u2010scale aggregation. 
Second, while UniParser lacks pose guidance, we pioneer the integration of whole\u2010body pose estimation (133 keypoints) in single\u2010stage parsing, providing structural constraints for accurate instance disambiguation. Third, instead of convolution\u2010based fusion, we introduce direct indexing association in cosine space, eliminating learned parameters while achieving 22% faster inference. Extensive experiments demonstrate substantial improvements: our ResNet\u2010101 model achieves 68.4%  on CIHP, outperforming UniParser by 10.5% in , and 52.0%  on MHP v2.0. Remarkably, even our ResNet\u201050 variant achieves 48.3%  on MHP v2.0, surpassing several ResNet\u2010101\u2010based approaches and validating our fundamental architectural\u00a0advances.<\/jats:p>","DOI":"10.1049\/ipr2.70240","type":"journal-article","created":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T20:17:02Z","timestamp":1762201022000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Instance\u2010Category Feature Representation and Association Learning for Multi\u2010Human Parsing"],"prefix":"10.1049","volume":"19","author":[{"given":"Lanqing","family":"Ye","sequence":"first","affiliation":[{"name":"Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9685-6599","authenticated-orcid":false,"given":"Li","family":"Liu","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"},{"name":"Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"}]},{"given":"Xiaodong","family":"Fu","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation Kunming University of 
Science and Technology Yunnan China"},{"name":"Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"}]},{"given":"Lijun","family":"Liu","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"},{"name":"Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"}]},{"given":"Wei","family":"Peng","sequence":"additional","affiliation":[{"name":"Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"},{"name":"Computer Technology Application Key Lab of Yunnan Province, Faculty of Information Engineering and Automation Kunming University of Science and Technology Yunnan China"}]}],"member":"265","published-online":{"date-parts":[[2025,11,3]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33014814"},{"key":"e_1_2_10_3_1","doi-asserted-by":"crossref","unstructured":"J.Long E.Shelhamer andT.Darrell \u201cFully Convolutional Networks for Semantic Segmentation \u201d inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition(IEEE 2015):3431\u20133440.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_2_10_4_1","doi-asserted-by":"publisher","DOI":"10.1049\/cvi2.12222"},{"key":"e_1_2_10_5_1","doi-asserted-by":"crossref","unstructured":"Q.Li A.Arnab andP. 
H.Torr \u201cHolistic Instance\u2010Level Human Parsing \u201darXiv preprint arXiv:1709.03612(2017).","DOI":"10.5244\/C.31.25"},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3192989"},{"key":"e_1_2_10_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3260631"},{"key":"e_1_2_10_8_1","doi-asserted-by":"crossref","unstructured":"X.Chen R.Mottaghi X.Liu S.Fidler R.Urtasun andA.Yuille \u201cDetect What you Can: Detecting and Representing Objects Using Holistic Models and Body Parts \u201d inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition(IEEE 2014) 1971\u20131978.","DOI":"10.1109\/CVPR.2014.254"},{"key":"e_1_2_10_9_1","doi-asserted-by":"crossref","unstructured":"J.Chu L.Jin J.Xing andJ.Zhao \u201cUniparser: Multi\u2010Human Parsing With Unified Correlation Representation Learning \u201darXiv preprint arXiv:2310.08984(2023).","DOI":"10.1109\/TIP.2024.3456004"},{"key":"e_1_2_10_10_1","doi-asserted-by":"crossref","unstructured":"K.Gong X.Liang Y.Li Y.Chen M.Yang andL.Lin \u201cInstance\u2010Level Human Parsing via Part Grouping Network \u201d inProceedings of the European Conference on Computer Vision (ECCV)(Springer 2018) 770\u2013785.","DOI":"10.1007\/978-3-030-01225-0_47"},{"key":"e_1_2_10_11_1","doi-asserted-by":"crossref","unstructured":"J.Chu L.Jin X.Fan et\u00a0al. 
\u201cSingle\u2010Stage Multi\u2010Human Parsing via Point Sets and Center\u2010Based Offsets \u201d inProceedings of the 31st ACM International Conference on Multimedia(ACM 2023) 1863\u20131873.","DOI":"10.1145\/3581783.3611993"},{"key":"e_1_2_10_12_1","doi-asserted-by":"crossref","unstructured":"J.Zhao J.Li Y.Cheng T.Sim S.Yan andJ.Feng \u201cUnderstanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and a New Benchmark for Multi\u2010Human Parsing \u201d inProceedings of the 26th ACM International Conference on Multimedia(ACM 2018) 792\u2013800.","DOI":"10.1145\/3240508.3240509"},{"key":"e_1_2_10_13_1","doi-asserted-by":"crossref","unstructured":"K.Sun B.Xiao D.Liu andJ.Wang \u201cDeep High\u2010Resolution Representation Learning for Human Pose Estimation \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(IEEE 2019) 5693\u20135703.","DOI":"10.1109\/CVPR.2019.00584"},{"key":"e_1_2_10_14_1","doi-asserted-by":"crossref","unstructured":"T.\u2010Y. Lin P. Doll\u00e1r R.Girshick K.He B.Hariharan andS.Belongie \u201cFeature Pyramid Networks for Object Detection \u201d inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition(IEEE 2017) 2117\u20132125.","DOI":"10.1109\/CVPR.2017.106"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"K.He G.Gkioxari P. Doll\u00e1r andR.Girshick \u201cMask r\u2010cnn \u201d inProceedings of the IEEE International Conference on Computer Vision(IEEE 2017) 2961\u20132969.","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_2_10_17_1","doi-asserted-by":"crossref","unstructured":"L.Yang Q.Song Z.Wang et\u00a0al. 
\u201cRenovating Parsing r\u2010cnn for Accurate Multiple Human Parsing \u201d inEuropean Conference on computer vision(Springer 2020) 421\u2013437.","DOI":"10.1007\/978-3-030-58610-2_25"},{"key":"e_1_2_10_18_1","doi-asserted-by":"crossref","unstructured":"R.Ji D.Du L.Zhang et\u00a0al. \u201cLearning Semantic Neural Tree for Human Parsing \u201d inComputer Vision\u2013ECCV 2020: 16th European Conference Proceedings Part XIII 16(Springer 2020) 205\u2013221.","DOI":"10.1007\/978-3-030-58601-0_13"},{"key":"e_1_2_10_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00138-023-01392-4"},{"key":"e_1_2_10_20_1","doi-asserted-by":"crossref","unstructured":"Z.Wang J.Zhao C.Lu et\u00a0al. \u201cLearning to Detect Head Movement in Unconstrained Remote Gaze Estimation in the Wild \u201d inProceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision(IEEE 2020) 3443\u20133452.","DOI":"10.1109\/WACV45572.2020.9093476"},{"key":"e_1_2_10_21_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics12040944"},{"key":"e_1_2_10_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2021.104145"},{"key":"e_1_2_10_23_1","doi-asserted-by":"crossref","unstructured":"Z.Cao T.Simon S.\u2010E. 
Wei andY.Sheikh \u201cRealtime Multi\u2010Person 2D Pose Estimation Using Part Affinity Fields \u201d inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition(IEEE 2017):7291\u20137299.","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_10_24_1","doi-asserted-by":"crossref","unstructured":"T.Zhou W.Wang S.Liu Y.Yang andL.vanGool \u201cDifferentiable Multi\u2010Granularity Human Representation Learning for Instance\u2010Aware Human Semantic Parsing \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(IEEE 2021):1622\u20131631.","DOI":"10.1109\/CVPR46437.2021.00167"},{"key":"e_1_2_10_25_1","unstructured":"A.Newell Z.Huang andJ.Deng \u201cAssociative Embedding: End\u2010to\u2010end Learning for Joint Detection and Grouping \u201d inAdvances in Neural Information Processing Systems Vol.30(Curran Associates Inc. 2017)."},{"key":"e_1_2_10_26_1","doi-asserted-by":"crossref","unstructured":"B.Cheng B.Xiao J.Wang H.Shi T. S.Huang andL.Zhang \u201cHigherhrnet: Scale\u2010Aware Representation Learning for Bottom\u2010up Human Pose Estimation \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(IEEE 2020):5386\u20135395.","DOI":"10.1109\/CVPR42600.2020.00543"},{"key":"e_1_2_10_27_1","doi-asserted-by":"crossref","unstructured":"L.\u2010C. 
Chen Y.Zhu G.Papandreou F.Schroff andH.Adam \u201cEncoder\u2010Decoder With Atrous Separable Convolution for Semantic Image Segmentation \u201d inProceedings of the European Conference on Computer Vision (ECCV)(Springer 2018):801\u2013818.","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"e_1_2_10_28_1","doi-asserted-by":"crossref","unstructured":"Y.Luo Z.Zheng L.Zheng T.Guan J.Yu andY.Yang \u201cMacro\u2010Micro Adversarial Network for Human Parsing \u201d inProceedings of the European Conference on Computer Vision (ECCV)(Springer 2018):418\u2013434.","DOI":"10.1007\/978-3-030-01240-3_26"},{"key":"e_1_2_10_29_1","doi-asserted-by":"crossref","unstructured":"K.Gong Y.Gao X.Liang X.Shen M.Wang andL.Lin \u201cGraphonomy: Universal Human Parsing via Graph Transfer Learning \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(IEEE 2019):7450\u20137459.","DOI":"10.1109\/CVPR.2019.00763"},{"key":"e_1_2_10_30_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6728"},{"key":"e_1_2_10_31_1","doi-asserted-by":"publisher","DOI":"10.1049\/ipr2.13176"},{"key":"e_1_2_10_32_1","doi-asserted-by":"crossref","unstructured":"W.Wang Z.Zhang S.Qi J.Shen Y.Pang andL.Shao \u201cLearning Compositional Neural Information Fusion for Human Parsing \u201d inProceedings of the IEEE\/CVF International Conference on Computer Vision(IEEE 2019):5703\u20135713.","DOI":"10.1109\/ICCV.2019.00580"},{"key":"e_1_2_10_33_1","doi-asserted-by":"crossref","unstructured":"P.Chen X.Yu X.Han et\u00a0al. \u201cPoint\u2010to\u2010box Network for Accurate Object Detection via Single Point Supervision \u201d inEuropean Conference on Computer Vision(Springer 2022):51\u201367.","DOI":"10.1007\/978-3-031-20077-9_4"},{"key":"e_1_2_10_34_1","doi-asserted-by":"crossref","unstructured":"M.KiefelandP. 
V.Gehler \u201cHuman Pose Estimation With Fields of Parts \u201d inComputer Vision\u2013ECCV 2014: 13th European Conference Proceedings Part V 13(Springer 2014):331\u2013346.","DOI":"10.1007\/978-3-319-10602-1_22"},{"key":"e_1_2_10_35_1","unstructured":"H.Qin W.Hong W.\u2010C.Hung Y.\u2010H.Tsai andM.\u2010H.Yang \u201cA Top\u2010Down Unified Framework for Instance\u2010Level Human Parsing \u201d inBritish Machine Vision Conference BMVC 2019(The British Machine Vision Association 2019)."},{"key":"e_1_2_10_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3181486"},{"key":"e_1_2_10_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2021.3107544"},{"key":"e_1_2_10_38_1","unstructured":"X.Chen X.Wang L.Gao andJ.Song \u201cRepparser: End\u2010to\u2010end Multiple Human Parsing With Representative Parts \u201darXiv preprint arXiv:2208.12908(2022)."},{"key":"e_1_2_10_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3281070"},{"key":"e_1_2_10_40_1","doi-asserted-by":"crossref","unstructured":"K.He X.Zhang S.Ren andJ.Sun \u201cDeep Residual Learning for Image Recognition \u201d inProceedings of the IEEE Conference on Computer Vision and Pattern Recognition(IEEE 2016):770\u2013778.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_10_41_1","unstructured":"X.Zhu W.Su L.Lu B.Li X.Wang andJ.Dai \u201cDeformable Detr: Deformable Transformers for end\u2010to\u2010end Object Detection \u201darXiv preprint arXiv:2010.04159(2020)."},{"key":"e_1_2_10_42_1","doi-asserted-by":"crossref","unstructured":"T.\u2010Y. 
Lin P.Goyal R.Girshick K.He andP.Doll\u00e1r \u201cFocal Loss for Dense Object Detection \u201d inProceedings of the IEEE International Conference on Computer Vision(IEEE 2017):2980\u20132988.","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_2_10_43_1","doi-asserted-by":"crossref","unstructured":"F.Milletari N.Navab andS.\u2010A.Ahmadi \u201cV\u2010net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation \u201d in2016 Fourth International Conference on 3D Vision (3DV)(IEEE 2016):565\u2013571.","DOI":"10.1109\/3DV.2016.79"},{"key":"e_1_2_10_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3426974"},{"key":"e_1_2_10_45_1","doi-asserted-by":"crossref","unstructured":"L.Yang Q.Song Z.Wang andM.Jiang \u201cParsing r\u2010cnn for Instance\u2010Level Human Analysis \u201d inProceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition(IEEE 2019):364\u2013373.","DOI":"10.1109\/CVPR.2019.00045"},{"key":"e_1_2_10_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3048039"},{"key":"e_1_2_10_47_1","unstructured":"K.Chen J.Wang J.Pang et\u00a0al. 
\u201cMmdetection: Open Mmlab Detection Toolbox and Benchmark \u201darXiv preprint arXiv:1906.07155(2019)."},{"key":"e_1_2_10_48_1","doi-asserted-by":"crossref","unstructured":"X.Wang T.Kong C.Shen Y.Jiang andL.Li \u201cSolo: Segmenting Objects by Locations \u201d inComputer Vision\u2013ECCV 2020: 16th European Conference Proceedings Part XVIII 16(Springer 2020):649\u2013665.","DOI":"10.1007\/978-3-030-58523-5_38"},{"key":"e_1_2_10_49_1","unstructured":"X.Wang R.Zhang T.Kong L.Li andC.Shen \u201cSolov2: Dynamic and Fast Instance Segmentation \u201d inAdvances in Neural Information Processing Systems Vol.33(Curran Associates 2020):17721\u201317732."}],"container-title":["IET Image Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ipr2.70240","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/full-xml\/10.1049\/ipr2.70240","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/ipr2.70240","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T11:25:02Z","timestamp":1775215502000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/ipr2.70240"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["10.1049\/ipr2.70240"],"URL":"https:\/\/doi.org\/10.1049\/ipr2.70240","archive":["Portico"],"relation":{},"ISSN":["1751-9659","1751-9667"],"issn-type":[{"value":"1751-9659","type":"print"},{"value":"1751-9667","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value
":"2024-08-09","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-03","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"e70240"}}