{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T02:26:30Z","timestamp":1769048790515,"version":"3.49.0"},"reference-count":60,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2019,1,24]],"date-time":"2019-01-24T00:00:00Z","timestamp":1548288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key R&D Program of China","award":["2017YFB1300201"],"award-info":[{"award-number":["2017YFB1300201"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61622211, 61472392, 61620106009, and 61525206"],"award-info":[{"award-number":["61622211, 61472392, 61620106009, and 61525206"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,1,31]]},"abstract":"<jats:p>Person re-identification aims at identifying a certain pedestrian across non-overlapping multi-camera networks at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in images; however, little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames present pedestrian appearance with different body poses and from different viewpoints, providing valuable information toward addressing the challenges of pose variation, occlusion, and viewpoint change. 
In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. The D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion-estimation module. Moreover, we formulate a loss function consisting of an identification loss and a center loss to simultaneously minimize intra-class variance and maximize inter-class variance, toward addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, i.e., MARS and iLIDS-VID, have shown the effectiveness of the proposed approach.<\/jats:p>","DOI":"10.1145\/3231741","type":"journal-article","created":{"date-parts":[[2019,1,28]],"date-time":"2019-01-28T14:01:39Z","timestamp":1548684099000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":50,"title":["Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos"],"prefix":"10.1145","volume":"15","author":[{"given":"Jiawei","family":"Liu","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]},{"given":"Zheng-Jun","family":"Zha","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]},{"given":"Xuejin","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]},{"given":"Zilei","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, 
Anhui, China"}]},{"given":"Yongdong","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]}],"member":"320","published-online":{"date-parts":[[2019,1,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00902"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.142"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298764"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.145"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273523"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5539926"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","volume":"1","author":"Huang Gao","unstructured":"Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. 2017. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1. 3."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2815998"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.513"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.22.99"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354748"},{"key":"e_1_2_1_13_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.782"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.461"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2839916"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298832"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038916"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2715499"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967209"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.434"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.499"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:VISI.0000029664.99615.94"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2016.11.018"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.38"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.152"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.148"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2424056"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299046"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2679002"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_1_32_1","volume-title":"Multi-target tracking in multiple non-overlapping cameras using constrained dominant sets. arXiv preprint arXiv:1706.06196","author":"Tesfaye Yonatan Tariku","year":"2017","unstructured":"Yonatan Tariku Tesfaye, Eyasu Zemene, Andrea Prati, Marcello Pelillo, and Mubarak Shah. 2017. Multi-target tracking in multiple non-overlapping cameras using constrained dominant sets. arXiv preprint arXiv:1706.06196 (2017)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2531280"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.144"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2522418"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00159"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00016"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/1577069.1577078"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46478-7_31"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.140"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10584-0_1"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.507"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_42"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3089249"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10590-1_35"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.150"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.139"},{"key":"e_1_2_1_49_1","volume-title":"Learning compact appearance representation for video-based person re-identification. arXiv preprint arXiv:1702.06294","author":"Zhang Wei","year":"2017","unstructured":"Wei Zhang, Shengnan Hu, and Kan Liu. 2017. Learning compact appearance representation for video-based person re-identification. arXiv preprint arXiv:1702.06294 (2017)."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.349"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2544310"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46466-4_52"},{"key":"e_1_2_1_53_1","volume-title":"Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984","author":"Zheng Liang","year":"2016","unstructured":"Liang Zheng, Yi Yang, and Alexander G Hauptmann. 2016. Person re-identification: Past, present and future. arXiv preprint arXiv:1610.02984 (2016)."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995598"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2453984"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159171"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.389"},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5028--5037","author":"Zhou Sanping","year":"2017","unstructured":"Sanping Zhou, Jinjun Wang, Jiayun Wang, Yihong Gong, and Nanning Zheng. 2017. Point to set similarity based deep feature learning for person reidentification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5028--5037."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.717"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 2242--2251","author":"Zhu Jun Yan","unstructured":"Jun Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2242--2251."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3231741","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3231741","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:08:16Z","timestamp":1750208896000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3231741"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,24]]},"references-count":60,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2019,1,31]]}},"alternative-id":["10.1145\/3231741"],"URL":"https:\/\/doi.org\/10.1145\/3231741","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,1,24]]},"assertion":[{"value":"2017-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-01-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}