{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T23:30:18Z","timestamp":1773876618228,"version":"3.50.1"},"reference-count":32,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T00:00:00Z","timestamp":1738886400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Multi-label attribute recognition is a critical task in computer vision, with applications ranging across diverse fields. This problem often involves detecting objects with multiple attributes, necessitating sophisticated models capable of both high-level differentiation and fine-grained feature extraction. The integration of object detection and attribute recognition typically relies on approaches such as dual-stage networks, where accurate predictions depend on advanced feature extraction techniques, such as Region of Interest (RoI) pooling. To meet these demands, an efficient method that achieves both reliable detection and attribute classification in a unified framework is essential. This study introduces an innovative MTL framework designed to incorporate Multi-Person Attribute Recognition (MPAR) within a single-model architecture. Named MPAR-RCNN, this framework unifies object detection and attribute recognition tasks through a spatially aware, shared backbone, facilitating efficient and accurate multi-label prediction. Unlike the traditional Fast Region-based Convolutional Neural Network (R-CNN), which separately manages person detection and attribute classification with a dual-stage network, the MPAR-RCNN architecture optimizes both tasks within a single structure. 
Validated on the WIDER (Web Image Dataset for Event Recognition) dataset, the proposed model demonstrates an improvement over current state-of-the-art (SOTA) architectures, showcasing its potential in advancing multi-label attribute recognition.<\/jats:p>","DOI":"10.3389\/frai.2025.1454488","type":"journal-article","created":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T06:52:11Z","timestamp":1738911131000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["MPAR-RCNN: a multi-task network for multiple person detection with attribute recognition"],"prefix":"10.3389","volume":"8","author":[{"given":"S.","family":"Raghavendra","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"S. K.","family":"Abhilash","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Venu Madhav","family":"Nookala","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jayashree","family":"Shetty","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Praveen Gurunath","family":"Bharathi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,2,7]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2004.10934","article-title":"Yolov4: Optimal speed and accuracy of object detection","author":"Bochkovskiy","year":"2020","journal-title":"arXiv"},{"key":"B2","article-title":"\u201cGradnorm: Gradient normalization for adaptive loss balancing in deep multitask networks,\u201d","volume-title":"International Conference on Machine Learning","author":"Chen","year":"2018"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2009.09796","article-title":"Multi-task learning with deep neural networks: a 
survey","author":"Crawshaw","year":"2020","journal-title":"arXiv"},{"key":"B4","first-page":"1851","article-title":"\u201cClass rectification hard mining for imbalanced deep learning,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Dong","year":"2017"},{"key":"B5","doi-asserted-by":"publisher","first-page":"35","DOI":"10.2478\/ijcss-2024-0010","article-title":"Advancing artistic swimming officiating and performance assessment: a computer vision study using mediapipe","volume":"23","author":"Edriss","year":"2024","journal-title":"Int. J. Comp. Sci. Sport"},{"key":"B6","first-page":"1440","article-title":"\u201cFast R-CNN,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Girshick","year":"2015"},{"key":"B7","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81","article-title":"\u201cRich feature hierarchies for accurate object detection and semantic segmentation,\u201d","author":"Girshick","year":"2014","journal-title":"Proceedings of the IEEE conference on computer vision and pattern recognition"},{"key":"B8","first-page":"1080","article-title":"\u201cContextual action recognition with R* CNN,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"Gkioxari","year":"2015"},{"key":"B9","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1016\/j.patrec.2017.05.012","article-title":"Human attribute recognition by refining attention heat map","volume":"94","author":"Guo","year":"2017","journal-title":"Patt. Recogn. 
Letters"},{"key":"B10","first-page":"2961","article-title":"\u201cMask R-CNN,\u201d","volume-title":"Proceedings of the IEEE International Conference on Computer Vision","author":"He","year":"2017"},{"key":"B11","first-page":"770","article-title":"\u201cDeep residual learning for image recognition,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"He","year":"2016"},{"key":"B12","first-page":"997","article-title":"\u201cMultitask-centernet (MCN): efficient and diverse multitask learning using an anchor free approach,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Heuer","year":"2021"},{"key":"B13","author":"Jia","year":"2024","journal-title":"Wider Dataset: A Dataset for Object Detection"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1620","DOI":"10.1109\/TPAMI.2019.2956039","article-title":"On symbiosis of attribute prediction and semantic segmentation","volume":"43","author":"Kalayeh","year":"2019","journal-title":"IEEE Trans. Pattern Analy. Mach. 
Intellig"},{"key":"B15","first-page":"6129","article-title":"\u201cUbernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Kokkinos","year":"2017"},{"key":"B16","doi-asserted-by":"crossref","first-page":"684","DOI":"10.1007\/978-3-319-46466-4_41","article-title":"\u201cHuman attribute recognition by deep hierarchical contexts,\u201d","volume-title":"Computer Vision-ECCV 2016: 14th European Conference","author":"Li","year":"2016"},{"key":"B17","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1007\/978-3-319-46448-0_2","article-title":"\u201cSSD: Single shot multibox detector,\u201d","volume-title":"Computer Vision-ECCV 2016: 14th European Conference","author":"Liu","year":"2016"},{"key":"B18","first-page":"3994","article-title":"\u201cCross-stitch networks for multi-task learning,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Misra","year":"2016"},{"key":"B19","doi-asserted-by":"publisher","first-page":"2281","DOI":"10.1049\/ipr2.12195","article-title":"Jointly human semantic parsing and attribute recognition with feature pyramid structure in efficientnets","volume":"15","author":"Moghaddam","year":"2021","journal-title":"IET Image Proc"},{"key":"B20","doi-asserted-by":"publisher","first-page":"11823","DOI":"10.1109\/TITS.2021.3107587","article-title":"Detecting 32 pedestrian attributes for autonomous vehicles","volume":"23","author":"Mordan","year":"2021","journal-title":"IEEE Trans. Intellig. Transp. Syst"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1808990","DOI":"10.1155\/2022\/1808990","article-title":"Hit har: human image threshing machine for human activity recognition using deep learning models","volume":"2022","author":"Poulose","year":"2022","journal-title":"Comp. Intellig. 
Neurosci"},{"key":"B22","doi-asserted-by":"publisher","first-page":"900","DOI":"10.1109\/TITS.2019.2901817","article-title":"Autonomous vehicles that interact with pedestrians: A survey of theory and practice","volume":"21","author":"Rasouli","year":"2019","journal-title":"IEEE Trans. Intellig. Transp. Syst"},{"key":"B23","article-title":"\u201cFaster R-CNN: Towards real-time object detection with region proposal networks,\u201d","author":"Ren","year":"2015","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B24","first-page":"680","article-title":"\u201cDeep imbalanced attribute classification using visual attention aggregation,\u201d","author":"Sarafianos","year":"2018","journal-title":"Proceedings of the European Conference on Computer Vision (ECCV)"},{"key":"B25","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1707.06089","article-title":"Deep view-sensitive pedestrian attribute inference in an end-to-end model","author":"Sarfraz","year":"2017","journal-title":"arXiv"},{"key":"B26","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1109\/ISBI.2019.8759531","article-title":"\u201cGraph convolutional neural networks for alzheimer's disease classification,\u201d","volume-title":"2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)","author":"Song","year":"2019"},{"key":"B27","first-page":"2818","article-title":"\u201cRethinking the inception architecture for computer vision,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Szegedy","year":"2016"},{"key":"B28","first-page":"6769","article-title":"\u201cJoint multi-person pose estimation and semantic part segmentation,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Xia","year":"2017"},{"key":"B29","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1109\/TIP.2020.3029901","article-title":"HIER R-CNN: Instance-level human parts 
detection and a new benchmark","volume":"30","author":"Yang","year":"","journal-title":"IEEE Trans. Image Proc"},{"key":"B30","first-page":"421","article-title":"\u201cRenovating parsing R-CNN for accurate multiple human parsing,\u201d","volume-title":"Computer Vision-ECCV 2020: 16th European Conference","author":"Yang","year":""},{"key":"B31","first-page":"364","article-title":"\u201cParsing R-CNN for instance-level human analysis,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yang","year":"2019"},{"key":"B32","first-page":"184","article-title":"\u201cResidual attention: a simple but effective method for multi-label recognition,\u201d","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Zhu","year":"2021"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1454488\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T06:52:22Z","timestamp":1738911142000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1454488\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,7]]},"references-count":32,"alternative-id":["10.3389\/frai.2025.1454488"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1454488","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,7]]},"article-number":"1454488"}}