{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T10:20:19Z","timestamp":1773656419443,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,2,9]],"date-time":"2022-02-09T00:00:00Z","timestamp":1644364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Video-based person re-identification has become quite attractive due to its importance in many vision surveillance problems. It is a challenging topic due to the inter\/intra changes, occlusion, and pose variations involved. In this paper, we propose a pyramid-attentive framework that relies on multi-part features and multiple attention to aggregate features of multi-levels and learns attention-based representations of persons through various aspects. Self-attention is used to strengthen the most discriminative features in the spatial and channel domains and hence capture robust global information. We propose the use of part-relation attention between different multi-granularities of features\u2019 representation to focus on learning appropriate local features. Temporal attention is used to aggregate temporal features. We integrate the most robust features in the global and multi-level views to build an effective convolution neural network (CNN) model. The proposed model outperforms the previous state-of-the art models on three datasets. Notably, using the proposed model enables the achievement of 98.9% (a relative improvement of 2.7% on the GRL) top1 accuracy and 99.3% mAP on the PRID2011, and 92.8% (a relative improvement of 2.4% relative to GRL) top1 accuracy on iLIDS-vid. 
We also explore the generalization ability of our model on a cross dataset.<\/jats:p>","DOI":"10.3390\/bdcc6010020","type":"journal-article","created":{"date-parts":[[2022,2,9]],"date-time":"2022-02-09T21:19:06Z","timestamp":1644441546000},"page":"20","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Person Re-Identification via Pyramid Multipart Features and Multi-Attention Framework"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9830-8630","authenticated-orcid":false,"given":"Randa Mohamed","family":"Bayoumi","sequence":"first","affiliation":[{"name":"Informatics Research Department, Electronics Research Institute, Giza 12622, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5421-7948","authenticated-orcid":false,"given":"Elsayed E.","family":"Hemayed","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, Faculty of Engineering, Cairo University, Giza 12613, Egypt"},{"name":"Zewail City of Science and Technology, University of Science and Technology, Giza 12578, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5659-2506","authenticated-orcid":false,"given":"Mohammad Ehab","family":"Ragab","sequence":"additional","affiliation":[{"name":"Informatics Research Department, Electronics Research Institute, Giza 12622, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Magda B.","family":"Fayek","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, Faculty of Engineering, Cairo University, Giza 12613, Egypt"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,2,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.patcog.2019.06.006","article-title":"Improving person 
re-identification by attribute and identity learning","volume":"95","author":"Lin","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1109\/TPAMI.2019.2928294","article-title":"Leader-based multi-scale attention deep architecture for person re-identification","volume":"42","author":"Qian","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhou, K., Yang, Y., Cavallaro, A., and Xiang, T. (2019, January 27\u201328). Omni-scale feature learning for person re-identification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00380"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2860","DOI":"10.1109\/TIP.2019.2891888","article-title":"Deep representation learning with part loss for person re-identification","volume":"28","author":"Yao","year":"2019","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1109\/TPAMI.2019.2938523","article-title":"Learning part-based convolutional features for person re-identification","volume":"43","author":"Sun","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"4500","DOI":"10.1109\/TIP.2019.2910414","article-title":"Pose-invariant embedding for deep person re-identification","volume":"28","author":"Zheng","year":"2019","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"107016","DOI":"10.1016\/j.patcog.2019.107016","article-title":"Attributes-aided part detection and refinement for person re-identification","volume":"97","author":"Li","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chen, D., Li, H., Xiao, T., Yi, S., and Wang, X. 
(2018, January 18\u201322). Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00128"},{"key":"ref_9","first-page":"701","article-title":"Person re-identification via recurrent feature aggregation","volume":"Volume 9910","author":"Yan","year":"2016","journal-title":"Proceedings of the European Conference on Computer Vision"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4870","DOI":"10.1109\/TIP.2019.2911488","article-title":"SCAN: Self-and-collaborative attention network for video person re-identification","volume":"28","author":"Zhang","year":"2019","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"McLaughlin, N., Del Rincon, J.M., and Miller, P. (2016, January 27\u201330). Recurrent convolutional network for video-based person re-identification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.148"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Li, M., Xu, H., Wang, J., Li, W., and Sun, Y. (2020, January 1\u20135). Temporal aggregation with clip-level attention for video-based person re-identification. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093413"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhou, Z., Huang, Y., Wang, W., Wang, L., and Tan, T. (2017, January 22\u201329). See the forest for the trees: Joint spatial and temporal recurrent neural networks for video-based person re-identification. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/CVPR.2017.717"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chung, D., Tahboub, K., and Delp, E.J. (2017, January 22\u201329). A two stream siamese convolutional neural network for person re-identification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.218"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"4192","DOI":"10.1109\/TIP.2019.2908062","article-title":"Spatial-temporal attention-aware learning for video-based person re-identification","volume":"28","author":"Chen","year":"2019","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Xu, S., Cheng, Y., Gu, K., Yang, Y., Chang, S., and Zhou, P. (2017, January 22\u201329). Jointly attentive spatial-temporal pooling networks for video-based person re-identification. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.507"},{"key":"ref_17","unstructured":"Wu, L., Shen, C., and Hengel, A.V.D. (2016). Convolutional LSTM networks for video-based person re-identification. arXiv."},{"key":"ref_18","unstructured":"Wu, L., Shen, C., and Hengel, A.V.D. (2016). Deep recurrent convolutional networks for video-based person re-identification: An end-to-end approach. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1366","DOI":"10.1109\/TIP.2018.2878505","article-title":"Video person re-identification by temporal residual learning","volume":"28","author":"Dai","year":"2019","journal-title":"IEEE Trans. Image Processing"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Jiang, X., Gong, Y., Guo, X., Yang, Q., Huang, F., Zheng, W.S., Zheng, F., and Sun, X. (2020, January 7\u201312). Rethinking temporal fusion for video-based person re-identification on semantic and time aspect. 
Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6770"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Matiyali, N., and Sharma, G. (2020, January 14\u201319). Video person re-identification using learned clip similarity aggregation. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seattle, WA, USA.","DOI":"10.1109\/WACV45572.2020.9093510"},{"key":"ref_22","unstructured":"Li, D., Zhang, Z., Chen, X., Ling, H., and Huang, K. (2016). A richly annotated dataset for pedestrian attribute recognition. arXiv."},{"key":"ref_23","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1109\/TPAMI.2002.1017623","article-title":"Multiresolution gray-scale and rotation invariant texture classification with local binary patterns","volume":"24","author":"Ojala","year":"2002","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kl\u00e4ser, A., Marsza\u0142ek, M., and Schmid, C. (2008, January 1\u20134). A spatio-temporal descriptor based on 3D-gradients. Proceedings of the BMVC 2008\u201419th British Machine Vision Conference, Leeds, UK.","DOI":"10.5244\/C.22.99"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Liao, S., Hu, Y., Zhu, X., and Li, S.Z. (2015, January 7\u201313). Person re-identification by local maximal occurrence representation and metric learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298832"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Farenzena, M., Bazzani, L., Perina, A., Murino, V., and Cristani, M. (2010, January 13\u201318). Person re-identification by symmetry-driven accumulation of local features. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539926"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Gray, D., and Tao, H. (2008). Viewpoint Invariant Pedestrian Recognition with an Ensemble of Localized Features, Springer.","DOI":"10.1007\/978-3-540-88682-2_21"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wu, S., Chen, Y.C., Li, X., Wu, A.C., You, J.J., and Zheng, W.S. (2016, January 7\u201310). An enhanced deep feature representation for person re-identification. Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA.","DOI":"10.1109\/WACV.2016.7477681"},{"key":"ref_30","unstructured":"Gao, J., and Nevatia, R. (2018). Revisiting temporal modeling for video-based person ReID. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Shen, X., Jin, Z., Lu, H., and Hua, X.S. (2019, January 15\u201320). Attribute-driven feature disentangling and temporal aggregation for video person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00505"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Subramaniam, A., Nambiar, A., and Mittal, A. (2019, January 15\u201320). Co-segmentation inspired attention networks for video-based person re-identification. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00065"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Andriluka, M., Pishchulin, L., Gehler, P., and Schiele, B. (2014, January 23\u201328). 2D human pose estimation: New benchmark and state of the art analysis. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.471"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Li, X., Zhou, W., Zhou, Y., and Li, H. (2020, January 7\u201312). Relation-guided spatial attention and temporal refinement for video-based person re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6807"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Yan, Y., Qin, J., Chen, J., Liu, L., Zhu, F., Tai, Y., and Shao, L. (2020, January 14\u201319). Learning multi-granular hypergraphs for video-based person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00297"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, Z., Lan, C., Zeng, W., and Chen, Z. (2020, January 14\u201319). Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01042"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Wang, T., Gong, S., Zhu, X., and Wang, S. 
(2014, January 6\u201312). Person re-identification by video ranking. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10593-2_45"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Hirzer, M., Beleznai, C., Roth, P.M., and Bischof, H. (2011). Person re-identification by descriptive and discriminative classification. Scandinavian Conference on Image Analysis, Springer.","DOI":"10.1007\/978-3-642-21227-7_9"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wu, Y., Lin, Y., Dong, X., Yan, Y., Ouyang, W., and Yang, Y. (2018, January 18\u201322). Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00543"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zheng, L., Bie, Z., Sun, Y., Wang, J., Su, C., Wang, S., and Tian, Q. (2016, January 8\u201316). Mars: A video benchmark for large-scale person re-identification. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_52"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1109\/TPAMI.2009.167","article-title":"Object detection with discriminatively trained part-based models","volume":"32","author":"Felzenszwalb","year":"2010","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Dehghan, A., Assari, S.M., and Shah, M. (2015, January 7\u201312). GMMCP tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299036"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhong, Z., Zheng, L., Cao, D., and Li, S. 
(2017, January 22\u201329). Re-ranking person re-identification with k-reciprocal encoding. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/CVPR.2017.389"},{"key":"ref_45","first-page":"8026","article-title":"PyTorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke","year":"2019","journal-title":"Adv. Neural Inf. Processing Syst."},{"key":"ref_46","unstructured":"Kingma, D.P. (2014). Adam: A Method for Stochastic Optimization. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Liu, X., Zhang, P., Yu, C., Lu, H., and Yang, X. (2021, January 20\u201325). Watching you: Global-guided reciprocal learning for video-based person re-identification. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01313"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Hou, R., Chang, H., Ma, B., Huang, R., and Shan, S. (2021, January 20\u201325). BiCnet-TKS: Learning efficient spatial-temporal representation for video person re-identification. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00205"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/1\/20\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:16:44Z","timestamp":1760134604000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/1\/20"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,9]]},"references-count":48,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["bdcc6010020"],"URL":"https:\/\/doi.org\/10.3390\/bdcc6010020","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,9]]}}}