{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T13:07:23Z","timestamp":1770815243409,"version":"3.50.1"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61722204, 61732007 and 61632007"],"award-info":[{"award-number":["61722204, 61732007 and 61632007"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,11,30]]},"abstract":"<jats:p>\n            Despite the fact that remarkable progress has been made in recent years, Content-based Video Retrieval (CBVR) is still an appealing research topic due to increasing search demands in the Internet era of big data. This article aims to explore an efficient CBVR system by discriminately hashing videos into short binary codes. Existing video hashing methods usually encounter two weaknesses originating from the following sources: (1) Most works adopt the separated stages method or the frame-pooling based end-to-end architecture. However, the spatial-temporal properties of videos cannot be fully explored or kept well in the follow-up hashing step. (2) Discriminative learning based on pairwise or triplet constraints often suffers from slow convergence and poor local optimization, mainly because of the limited samples for each update. To alleviate these problems, we propose an end-to-end video retrieval framework called the Similarity-Preserving Deep Temporal Hashing (SPDTH) network. Specifically, we equip the model with the ability to capture spatial-temporal properties of videos and to generate binary codes by stacked Gated Recurrent Units (GRUs). It unifies video temporal modeling and learning to hash into one step to allow for maximum retention of information. We also introduce a deep metric learning objective called \u2113\n            <jats:sub>2<\/jats:sub>\n            <jats:italic>All<\/jats:italic>\n            _\n            <jats:italic>loss<\/jats:italic>\n            for network training by preserving intra-class similarity and inter-class separability, and a quantization loss between the real-valued outputs and the binary codes is minimized. Extensive experiments on several challenging datasets demonstrate that SPDTH can consistently outperform state-of-the-art methods.\n          <\/jats:p>","DOI":"10.1145\/3356316","type":"journal-article","created":{"date-parts":[[2019,12,16]],"date-time":"2019-12-16T13:12:30Z","timestamp":1576501950000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["Video Retrieval with Similarity-Preserving Deep Temporal Hashing"],"prefix":"10.1145","volume":"15","author":[{"given":"Ling","family":"Shen","sequence":"first","affiliation":[{"name":"Hefei University of Technology, Hefei, Anhui, China"}]},{"given":"Richang","family":"Hong","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, Anhui, China"}]},{"given":"Haoran","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, Anhui, China"}]},{"given":"Xinmei","family":"Tian","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Hefei, Anhui, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3094-7735","authenticated-orcid":false,"given":"Meng","family":"Wang","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Hefei, Anhui, China"}]}],"member":"320","published-online":{"date-parts":[[2019,12,16]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2393347.2393393"},{"key":"e_1_2_1_2_1","volume-title":"30th AAAI Conference on Artificial Intelligence. AAAI, 3457--3463","author":"Cao Yue","year":"2016","unstructured":"Yue Cao , Mingsheng Long , Jianmin Wang , Han Zhu , and Qingfu Wen . 2016 . Deep quantization network for efficient image retrieval . In 30th AAAI Conference on Artificial Intelligence. AAAI, 3457--3463 . Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu, and Qingfu Wen. 2016. Deep quantization network for efficient image retrieval. In 30th AAAI Conference on Artificial Intelligence. AAAI, 3457--3463."},{"key":"e_1_2_1_3_1","volume-title":"Yu","author":"Cao Zhangjie","year":"2017","unstructured":"Zhangjie Cao , Mingsheng Long , Jianmin Wang , and Philip S . Yu . 2017 . Hashnet : Deep learning to hash by continuation. Arxiv Preprint Arxiv :1702.00758 (2017). Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Philip S. Yu. 2017. Hashnet: Deep learning to hash by continuation. Arxiv Preprint Arxiv:1702.00758 (2017)."},{"key":"e_1_2_1_4_1","volume-title":"Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio.","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho , Bart Van Merrienboer , Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014 . Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. Kyunghyun Cho, Bart Van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078."},{"key":"e_1_2_1_5_1","volume-title":"International Conference on Very Large Data Bases (VLDB'99)","author":"Gionis Aristides","year":"1999","unstructured":"Aristides Gionis , Piotr Indyk , and Rajeev Motwani . 1999 . Similarity search in high dimensions via hashing . In International Conference on Very Large Data Bases (VLDB'99) . 518--529. Aristides Gionis, Piotr Indyk, and Rajeev Motwani. 1999. Similarity search in high dimensions via hashing. In International Conference on Very Large Data Bases (VLDB'99). 518--529."},{"key":"e_1_2_1_6_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 817--824","author":"Gong Yunchao","unstructured":"Yunchao Gong and S. Lazebnik . 2011. Iterative quantization: A procrustean approach to learning binary codes . In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 817--824 . Yunchao Gong and S. Lazebnik. 2011. Iterative quantization: A procrustean approach to learning binary codes. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 817--824."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/253769.253798"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2737329"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2610324"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.378"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1631144.1631154"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874033"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2011.2109710"},{"key":"e_1_2_1_15_1","volume-title":"IEEE International Conference on Computer Vision. IEEE, 3192--3199","author":"Jhuang Hueihan","unstructured":"Hueihan Jhuang , Juergen Gall , Silvia Zuffi , Cordelia Schmid , and Michael J. Black . 2013. Towards understanding action recognition . In IEEE International Conference on Computer Vision. IEEE, 3192--3199 . Hueihan Jhuang, Juergen Gall, Silvia Zuffi, Cordelia Schmid, and Michael J. Black. 2013. Towards understanding action recognition. In IEEE International Conference on Computer Vision. IEEE, 3192--3199."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 1st ACM International Conference on Multimedia Retrieval. ACM, 29","author":"Jiang Yu-Gang","unstructured":"Yu-Gang Jiang , Guangnan Ye , Shih-Fu Chang , Daniel Ellis , and Alexander C. Loui . 2011. Consumer video understanding: A benchmark database and an evaluation of human and machine performance . In Proceedings of the 1st ACM International Conference on Multimedia Retrieval. ACM, 29 . Yu-Gang Jiang, Guangnan Ye, Shih-Fu Chang, Daniel Ellis, and Alexander C. Loui. 2011. Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval. ACM, 29."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2013.2283497"},{"key":"e_1_2_1_18_1","volume-title":"International Conference on Neural Information Processing Systems","volume":"25","author":"Krizhevsky Alex","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey E. Hinton . 2012. ImageNet classification with deep convolutional neural networks . In International Conference on Neural Information Processing Systems , Vol. 25 . Curran Associates Inc., 1097--1105. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, Vol. 25. Curran Associates Inc., 1097--1105."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1126004.1126005"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2012.2199970"},{"key":"e_1_2_1_21_1","first-page":"2482","article-title":"Deep supervised discrete hashing. In Advances in Neural Information Processing Systems","volume":"99","author":"Li Qi","year":"2017","unstructured":"Qi Li , Zhenan Sun , Ran He , and Tieniu Tan . 2017 . Deep supervised discrete hashing. In Advances in Neural Information Processing Systems . PP 99 , 2482 -- 2491 . Qi Li, Zhenan Sun, Ran He, and Tieniu Tan. 2017. Deep supervised discrete hashing. In Advances in Neural Information Processing Systems. PP 99, 2482--2491.","journal-title":"PP"},{"key":"e_1_2_1_22_1","volume-title":"Feature learning based deep supervised hashing with pairwise labels. Arxiv Preprint Arxiv:1511.03855","author":"Li Wu-Jun","year":"2015","unstructured":"Wu-Jun Li , Sheng Wang , and Wang-Cheng Kang . 2015. Feature learning based deep supervised hashing with pairwise labels. Arxiv Preprint Arxiv:1511.03855 ( 2015 ). Wu-Jun Li, Sheng Wang, and Wang-Cheng Kang. 2015. Feature learning based deep supervised hashing with pairwise labels. Arxiv Preprint Arxiv:1511.03855 (2015)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.253"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2645404"},{"key":"e_1_2_1_25_1","volume-title":"Computer Vision and Pattern Recognition (CVPR'15)","author":"Liong Venice Erin","unstructured":"Venice Erin Liong , Jiwen Lu , Gang Wang , Pierre Moulin , and Jie Zhou . 2015. Deep hashing for compact binary codes learning . In Computer Vision and Pattern Recognition (CVPR'15) . IEEE , 2475--2483. Venice Erin Liong, Jiwen Lu, Gang Wang, Pierre Moulin, and Jie Zhou. 2015. Deep hashing for compact binary codes learning. In Computer Vision and Pattern Recognition (CVPR'15). IEEE, 2475--2483."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.227"},{"key":"e_1_2_1_27_1","volume-title":"International Conference on Neural Information Processing Systems (NIPS'14)","author":"Liu Wei","year":"2014","unstructured":"Wei Liu , Sanjiv Kumar , Sanjiv Kumar , and Shih Fu Chang . 2014 . Discrete graph hashing . In International Conference on Neural Information Processing Systems (NIPS'14) . 3419--3427. Wei Liu, Sanjiv Kumar, Sanjiv Kumar, and Shih Fu Chang. 2014. Discrete graph hashing. In International Conference on Neural Information Processing Systems (NIPS'14). 3419--3427."},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Wei Liu Jun Wang Rongrong Ji and Yu Gang Jiang. 2012. Supervised hashing with kernels. In Computer Vision and Pattern Recognition. 2074--2081.  Wei Liu Jun Wang Rongrong Ji and Yu Gang Jiang. 2012. Supervised hashing with kernels. In Computer Vision and Pattern Recognition. 2074--2081.","DOI":"10.1109\/CVPR.2012.6247912"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 28th International Conference on International Conference on Machine Learning. Omnipress, 1--8.","author":"Liu Wei","year":"2011","unstructured":"Wei Liu , Jun Wang , Sanjiv Kumar , and Shih Fu Chang . 2011 . Hashing with Graphs . In Proceedings of the 28th International Conference on International Conference on Machine Learning. Omnipress, 1--8. Wei Liu, Jun Wang, Sanjiv Kumar, and Shih Fu Chang. 2011. Hashing with Graphs. In Proceedings of the 28th International Conference on International Conference on Machine Learning. Omnipress, 1--8."},{"key":"e_1_2_1_30_1","volume-title":"IEEE International Conference on Multimedia and Expo (ICME\u201916)","author":"Nguyen Viet-Anh","unstructured":"Viet-Anh Nguyen and Minh N. Do . 2016. Deep learning based supervised hashing for efficient image retrieval . In IEEE International Conference on Multimedia and Expo (ICME\u201916) . IEEE, 1--6. Viet-Anh Nguyen and Minh N. Do. 2016. Deep learning based supervised hashing for efficient image retrieval. In IEEE International Conference on Multimedia and Expo (ICME\u201916). IEEE, 1--6."},{"key":"e_1_2_1_31_1","first-page":"1061","article-title":"Hamming distance metric learning","volume":"2","author":"Norouzi Mohammad","year":"2012","unstructured":"Mohammad Norouzi , David J. Fleet , and Ruslan Salakhutdinov . 2012 . Hamming distance metric learning . Advances in Neural Information Processing Systems 2 , 1061 -- 1069 . Mohammad Norouzi, David J. Fleet, and Ruslan Salakhutdinov. 2012. Hamming distance metric learning. Advances in Neural Information Processing Systems 2, 1061--1069.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.434"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1561\/1500000014"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2072298.2072354"},{"key":"e_1_2_1_35_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro , Amir Roshan Zamir, and Mubarak Shah . 2012 . UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild . arXiv preprint arXiv:1212.0402. Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv preprint arXiv:1212.0402."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2487976"},{"key":"e_1_2_1_37_1","volume-title":"Kitani","author":"Wang Xiaofang","year":"2016","unstructured":"Xiaofang Wang , Yi Shi , and Kris M . Kitani . 2016 . Deep supervised hashing with triplet labels. In Asian Conference on Computer Vision. Springer , 70--84. Xiaofang Wang, Yi Shi, and Kris M. Kitani. 2016. Deep supervised hashing with triplet labels. In Asian Conference on Computer Vision. Springer, 70--84."},{"key":"e_1_2_1_38_1","first-page":"207","article-title":"Distance metric learning for large margin nearest neighbor classification","volume":"10","author":"Weinberger Kilian Q.","year":"2006","unstructured":"Kilian Q. Weinberger and Lawrence K. Saul . 2006 . Distance metric learning for large margin nearest neighbor classification . Journal of Machine Learning Research 10 , 1, 207 -- 244 . Kilian Q. Weinberger and Lawrence K. Saul. 2006. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research 10, 1, 207--244.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_39_1","article-title":"Person reidentification via structural deep metric learning","author":"Yang Xun","year":"2018","unstructured":"Xun Yang , Peicheng Zhou , and Meng Wang . 2018 . Person reidentification via structural deep metric learning . IEEE Transactions on Neural Networks and Learning Systems. 1--12. Xun Yang, Peicheng Zhou, and Meng Wang. 2018. Person reidentification via structural deep metric learning. IEEE Transactions on Neural Networks and Learning Systems. 1--12.","journal-title":"IEEE Transactions on Neural Networks and Learning Systems. 1--12."},{"key":"e_1_2_1_40_1","volume-title":"IEEE International Conference on Computer Vision. IEEE, 2272--2279","author":"Ye Guangnan","year":"2014","unstructured":"Guangnan Ye , Dong Liu , Jun Wang , and Shih Fu Chang . 2014 . Large-scale video hashing via structure learning . In IEEE International Conference on Computer Vision. IEEE, 2272--2279 . Guangnan Ye, Dong Liu, Jun Wang, and Shih Fu Chang. 2014. Large-scale video hashing via structure learning. In IEEE International Conference on Computer Vision. IEEE, 2272--2279."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964308"},{"key":"e_1_2_1_42_1","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Zhao Fang","year":"2015","unstructured":"Fang Zhao , Yongzhen Huang , Liang Wang , and Tieniu Tan . 2015 . Deep semantic ranking based hashing for multi-label image retrieval . In IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915) . IEEE, 1556--1564. Fang Zhao, Yongzhen Huang, Liang Wang, and Tieniu Tan. 2015. Deep semantic ranking based hashing for multi-label image retrieval. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915). IEEE, 1556--1564."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 2415--2421","author":"Zhu Han","year":"2016","unstructured":"Han Zhu , Mingsheng Long , Jianmin Wang , and Yue Cao . 2016 . Deep hashing network for efficient similarity retrieval . In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 2415--2421 . Han Zhu, Mingsheng Long, Jianmin Wang, and Yue Cao. 2016. Deep hashing network for efficient similarity retrieval. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence. AAAI Press, 2415--2421."}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356316","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356316","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:51Z","timestamp":1750203891000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356316"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,11,30]]}},"alternative-id":["10.1145\/3356316"],"URL":"https:\/\/doi.org\/10.1145\/3356316","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"2018-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}