{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T23:30:17Z","timestamp":1773876617711,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2020,2,3]],"date-time":"2020-02-03T00:00:00Z","timestamp":1580688000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"China Education and Research Computer Network Center","award":["NGII20180617"],"award-info":[{"award-number":["NGII20180617"]}]},{"name":"Shanghai Jianqiao University","award":["SJQ19010"],"award-info":[{"award-number":["SJQ19010"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>As a result of its important role in video surveillance, pedestrian attribute recognition has become an attractive facet of computer vision research. Because of the changes in viewpoints, illumination, resolution and occlusion, the task is very challenging. In order to resolve the issue of unsatisfactory performance of existing pedestrian attribute recognition methods resulting from ignoring the correlation between pedestrian attributes and spatial information, in this paper, the task is regarded as a spatiotemporal, sequential, multi-label image classification problem. An attention-based neural network consisting of convolutional neural networks (CNN), channel attention (CAtt) and convolutional long short-term memory (ConvLSTM) is proposed (CNN-CAtt-ConvLSTM). Firstly, the salient and correlated visual features of pedestrian attributes are extracted by pre-trained CNN and CAtt. Then, ConvLSTM is used to further extract spatial information and correlations from pedestrian attributes. Finally, pedestrian attributes are predicted with optimized sequences based on attribute image area size and importance. Extensive experiments are carried out on two common pedestrian attribute datasets, PEdesTrian Attribute (PETA) dataset and Richly Annotated Pedestrian (RAP) dataset, and higher performance than other state-of-the-art (SOTA) methods is achieved, which proves the superiority and validity of our method.<\/jats:p>","DOI":"10.3390\/s20030811","type":"journal-article","created":{"date-parts":[[2020,2,5]],"date-time":"2020-02-05T03:18:48Z","timestamp":1580872728000},"page":"811","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":30,"title":["Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4915-613X","authenticated-orcid":false,"given":"Yang","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China"},{"name":"School of Information Technology, Shanghai Jianqiao University, Shanghai 201306, China"}]},{"given":"Huahu","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China"},{"name":"Information Office, Shanghai University, Shanghai 200444, China"}]},{"given":"Minjie","family":"Bian","sequence":"additional","affiliation":[{"name":"Information Office, Shanghai University, Shanghai 200444, China"}]},{"given":"Junsheng","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,2,3]]},"reference":[{"key":"ref_1","unstructured":"Wang, X., Zheng, S., Yang, R., Luo, B., and Tang, J. (2019). Pedestrian Attribute Recognition: A Survey. arXiv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liu, Z., Luo, P., Wang, X., and Tang, X. (2015, January 7\u201313). Deep learning face attributes in the wild. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.425"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Su, C., Zhang, S., Xing, J., Gao, W., and Tian, Q. (2016, January 8\u201316). Deep attributes driven multi-camera person re-identification. Proceedings of the European conference on computer vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46475-6_30"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1016\/j.patcog.2019.06.006","article-title":"Improving person re-identification by attribute and identity learning","volume":"95","author":"Lin","year":"2019","journal-title":"Patt. Recognit."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Feris, R., Bobbitt, R., Brown, L., and Pankanti, S. (2014, January 1\u20134). Attribute-based people search: Lessons learnt from a practical surveillance system. Proceedings of the ACM International Conference on Multimedia Retrieval, Glasgow, Scotland.","DOI":"10.1145\/2578726.2578732"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2035","DOI":"10.1109\/TMM.2013.2279658","article-title":"Personal clothing retrieval on photo collections by color and attributes","volume":"15","author":"Wang","year":"2013","journal-title":"IEEE Trans. Multimed."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1216","DOI":"10.1109\/TPAMI.2013.219","article-title":"Soft biometrics; human identification using comparative descriptions","volume":"36","author":"Reid","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, D., Chen, X., and Huang, K. (2015, January 3\u20136). Multi-attribute learning for pedestrian attribute recognition in surveillance scenarios. Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia.","DOI":"10.1109\/ACPR.2015.7486476"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1625","DOI":"10.1109\/TPAMI.2017.2723882","article-title":"Joint semantic and latent attribute modelling for cross-class transfer learning","volume":"40","author":"Peng","year":"2018","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1016\/j.imavis.2016.07.004","article-title":"Multi-label convolutional neural network based pedestrian attribute classification","volume":"58","author":"Zhu","year":"2016","journal-title":"Image Vis. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.artint.2017.05.002","article-title":"Discovering visual concept structure with sparse and incomplete tags","volume":"250","author":"Wang","year":"2017","journal-title":"Artif. Intell."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, X., Zhao, H., Tian, M., Sheng, L., Shao, J., Yi, S., Yan, J., and Wang, X. (2017, January 22\u201329). Hydraplus-net: Attentive deep features for pedestrian analysis. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.46"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhou, Y., Yu, K., Leng, B., Zhang, Z., Li, D., and Huang, K. (2017, January 4\u20137). Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization. Proceedings of the British Machine Vision Conference (BMVC), London, UK.","DOI":"10.5244\/C.31.69"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, Y., Huang, C., Loy, C., and Tang, X. (2016, January 11\u201314). Human Attribute Recognition by Deep Hierarchical Contexts. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46466-4_41"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, Y., Lin, G., Zhuang, B., Liu, L., Shen, C., and Hengel, A. (2017, January 21\u201326). Sequential person recognition in photo albums with a recurrent network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.600"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Deng, Y., Luo, P., Loy, C.C., and Tang, X. (2014, January 3\u20137). Pedestrian attribute recognition at far distance. Proceedings of the 22nd ACM international conference on Multimedia, Orlando, Florida, USA.","DOI":"10.1145\/2647868.2654966"},{"key":"ref_17","unstructured":"Li, D., Zhang, Z., Chen, X., Ling, H., and Huang, K. (2016). A richly annotated dataset for pedestrian attribute recognition. arXiv."},{"key":"ref_18","unstructured":"Jaha, E.S., and Nixon, M.S. (October, January 29). Soft biometrics for subject identification using clothing attributes. Proceedings of the IEEE International Joint Conference on Biometrics, Clearwater, FL, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, H., Gallagher, A., and Girod, B. (2012, January 7\u201313). Describing clothing by semantic attributes. Proceedings of the European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33712-3_44"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Shi, Z., Hospedales, T.M., and Xiang, T. (2015, January 7\u201312). Transferring a semantic representation for person re-identification and search. Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299046"},{"key":"ref_21","unstructured":"Deng, Y., Luo, P., Loy, C.C., and Tang, X. (2015). Learning to recognize pedestrian attribute. arXiv."},{"key":"ref_22","first-page":"1080","article-title":"Contextual action recognition with R*CNN","volume":"40","author":"Gkioxari","year":"2015","journal-title":"Int. J. Cancer"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhang, N., Paluri, M., Ranzato, M., Darrell, T., and Bourdev, L. (2014, January 24\u201327). Panda: Pose aligned networks for deep attribute modeling. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH.","DOI":"10.1109\/CVPR.2014.212"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Zhu, J., Liao, S., Yi, D., Lei, Z., and Li, S.Z. (2015, January 19\u201322). Multi-label cnn based pedestrian attribute learning for soft biometrics. Proceedings of the 2015 International Conference on Biometrics (ICB), Phuket, Thailand.","DOI":"10.1109\/ICB.2015.7139070"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Fabbri, M., Calderara, S., and Cucchiara, R. (September, January 29). Generative adversarial models for people attribute recognition in surveillance. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Lecce, Italy.","DOI":"10.1109\/AVSS.2017.8078521"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, J., Zhu, X., Gong, S., and Li, W. (2017, January 22\u201329). Attribute recognition by joint recurrent learning of context and correlation. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.65"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sudowe, P., Spitzer, H., and Leibe, B. (2015, January 7\u201313). Person attribute recognition with a jointly-trained holistic CNN model. Proceedings of the IEEE International Conference on Computer Vision Workshops, Santiago, Chile.","DOI":"10.1109\/ICCVW.2015.51"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, D., Chen, X., Zhang, Z., and Huang, K. (2018, January 23\u201327). Pose Guided Deep Model for Pedestrian Attribute Recognition in Surveillance Scenarios. Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA.","DOI":"10.1109\/ICME.2018.8486604"},{"key":"ref_29","unstructured":"Zhao, X., Sang, L., Ding, G., Han, J., Di, N., and Yan, C. (February, January 27). Recurrent attention model for pedestrian attribute recognition. Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_30","unstructured":"Liu, H., Wu, J., Jiang, J., Qi, M., and Bo, R. (2018). Sequence-based Person Attribute Recognition with Joint CTC-Attention Model. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short\u2013Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_32","unstructured":"Xingjian, S., Chen, Z., Wang, H., Yeung, D., Wong, W., and Woo, W. (2015, January 7\u201310). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., Sun, G., and Wu, E. (2019). Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell., 41.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_34","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (July, January 26). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Liu, F., Xiang, T., Hospedales, T.M., Yang, W., and Sun, C. (2017, January 21\u201326). Semantic regularisation for recurrent image annotation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.443"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/3\/811\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T08:54:03Z","timestamp":1760172843000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/3\/811"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,3]]},"references-count":35,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,2]]}},"alternative-id":["s20030811"],"URL":"https:\/\/doi.org\/10.3390\/s20030811","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,3]]}}}