{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:47:08Z","timestamp":1750308428501,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2022,12,10]],"date-time":"2022-12-10T00:00:00Z","timestamp":1670630400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2018YFA0701500"],"award-info":[{"award-number":["2018YFA0701500"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61972242"],"award-info":[{"award-number":["61972242"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2023,1,31]]},"abstract":"<jats:p>High-resolution video object recognition (VOR) is evolving rapidly but is very compute-intensive, because VOR leverages compute-intensive deep neural networks (DNNs) for better accuracy. Although many works have been proposed for speedup, they mostly focus on DNN algorithms and hardware acceleration on the edge side. We observe that most video streams need to be losslessly compressed before going online, and that an encoder should have all the video information. Moreover, as the cloud should have abundant computing power to handle sophisticated VOR algorithms, we propose to take a one-shot effort for a modified VOR algorithm at the encoding stage in the cloud and to integrate the full VOR regeneration into a slightly extended decoder on the device. 
The scheme can enable lightweight VOR with server-class accuracy by simply leveraging the classic and economical video decoder universal to any mobile device. Meanwhile, the scheme can save massive computing power by not repetitively processing the same video on different user devices, which makes it extremely sustainable for green computing across the whole network.<\/jats:p>\n          <jats:p>\n            We propose E\n            <jats:sup>2<\/jats:sup>\n            -VOR, an end-to-end encoder and decoder architecture for efficient VOR. We carefully design the scheme to have minimal impact on the transmitted video bitstream. In the cloud, the VOR-extended video encoder tracks on a macro-block basis and packs intelligent information into the video stream for increased VOR accuracy and a fast regeneration process. On the edge device, we extend the traditional video decoder with a small piece of dedicated hardware to enable efficient VOR regeneration. Our experiments show that E\n            <jats:sup>2<\/jats:sup>\n            -VOR can achieve a 5.0\u00d7 performance improvement with less than 0.4% VOR accuracy loss compared to the state-of-the-art FAVOS scheme. 
On average, E\n            <jats:sup>2<\/jats:sup>\n            -VOR can run at over 54 frames per second (FPS) for 480P videos on an edge device.\n          <\/jats:p>","DOI":"10.1145\/3543852","type":"journal-article","created":{"date-parts":[[2022,6,17]],"date-time":"2022-06-17T09:00:44Z","timestamp":1655456444000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["E\n            <sup>2<\/sup>\n            -VOR: An End-to-End En\/Decoder Architecture for Efficient Video Object Recognition"],"prefix":"10.1145","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6494-4786","authenticated-orcid":false,"given":"Zhuoran","family":"Song","sequence":"first","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Minhang District, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8417-5796","authenticated-orcid":false,"given":"Naifeng","family":"Jing","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Minhang District, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2790-5884","authenticated-orcid":false,"given":"Xiaoyao","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Minhang District, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2022,12,10]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/76.538925"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.173"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_21"},{"key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1109\/ISCA.2018.00051","volume-title":"2018 ACM\/IEEE 45th Annual International Symposium on 
Computer Architecture (ISCA\u201918)","author":"Buckler Mark","year":"2018","unstructured":"Mark Buckler, Philip Bedoukian, Suren Jayasuriya, and Adrian Sampson. 2018. EVA \\( ^2 \\) : Exploiting temporal redundancy in live computer vision. In 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA\u201918). IEEE, 533\u2013546."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.565"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/1816038.1815993"},{"key":"e_1_3_1_8_2","article-title":"Semantic image segmentation with deep convolutional nets and fully connected CRFs","author":"Chen Liang-Chieh","year":"2014","unstructured":"Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. 2014. Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv:1412.7062. https:\/\/arxiv.org\/pdf\/1412.7062.pdf.","journal-title":"arXiv:1412.7062"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00774"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.81"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.316"},{"key":"e_1_3_1_13_2","first-page":"92","volume-title":"ACM SIGARCH Computer Architecture News","author":"Du Zidong","year":"2015","unstructured":"Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News, Vol. 43. 
ACM, 92\u2013104."},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.169"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"e_1_3_1_16_2","first-page":"682","volume-title":"CVPR Workshops","year":"2014","unstructured":"Vinayak Gokhale, Jonghoon Jin, Aysegul Dundar, Berin Martini, and Eugenio Culurciello. 2014. A 240 G-ops\/s mobile coprocessor for deep neural networks. In CVPR Workshops. 682\u2013687."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2015.2389824"},{"key":"e_1_3_1_18_2","article-title":"Impression network for video object detection","author":"Hetang Congrui","year":"2017","unstructured":"Congrui Hetang, Hongwei Qin, Shaohui Liu, and Junjie Yan. 2017. Impression network for video object detection. arXiv:1712.05896. https:\/\/arxiv.org\/pdf\/1712.05896.pdf.","journal-title":"arXiv:1712.05896"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_3_1_21_2","first-page":"1","volume-title":"2019 IEEE Hot Chips 31 Symposium (HCS\u201919)","author":"Liao Heng","year":"2019","unstructured":"Heng Liao, Jiajin Tu, Jing Xia, and Xiping Zhou. 2019. DaVinci: A scalable architecture for neural network computing. In 2019 IEEE Hot Chips 31 Symposium (HCS\u201919). IEEE, 1\u201344."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00713"},{"key":"e_1_3_1_24_2","first-page":"13","volume-title":"ICCD","year":"2013","unstructured":"Maurice Peemen, Arnaud A. A. Setio, Bart Mesman, and Henk Corporaal. 2013. Memory-centric accelerator design for convolutional neural networks. In ICCD, Vol. 2013. 
13\u201319."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.85"},{"key":"e_1_3_1_26_2","article-title":"The 2017 Davis challenge on video object segmentation","author":"Pont-Tuset Jordi","year":"2017","unstructured":"Jordi Pont-Tuset, Federico Perazzi, Sergi Caelles, Pablo Arbel\u00e1ez, Alex Sorkine-Hornung, and Luc Van Gool. 2017. The 2017 Davis challenge on video object segmentation. arXiv:1704.00675. https:\/\/arxiv.org\/pdf\/1704.00675.pdf.","journal-title":"arXiv:1704.00675"},{"key":"e_1_3_1_27_2","first-page":"91","volume-title":"Advances in Neural Information Processing Systems","author":"Ren Shaoqing","year":"2015","unstructured":"Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. 91\u201399."},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","first-page":"698","DOI":"10.1109\/MICRO50266.2020.00063","volume-title":"2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Song Zhuoran","year":"2020","unstructured":"Zhuoran Song, Feiyang Wu, Xueyuan Liu, Jing Ke, Naifeng Jing, and Xiaoyao Liang. 2020. VR-DANN: Real-time video recognition via decoder-assisted neural network acceleration. In 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). IEEE, 698\u2013710."},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2012.2221191"},{"key":"e_1_3_1_30_2","unstructured":"FFmpeg team. 2019. FFmpeg. https:\/\/ffmpeg.org."},{"key":"e_1_3_1_31_2","volume-title":"CACTI 5.1","author":"Thoziyoor Shyamkumar","year":"2008","unstructured":"Shyamkumar Thoziyoor, Naveen Muralimanohar, Jung Ho Ahn, and Norman P. Jouppi. 2008. CACTI 5.1. Technical Report HPL-2008-20, HP Labs. 
https:\/\/www.hpl.hp.com\/techreports\/2008\/HPL-2008-20.html."},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1105734.1105748"},{"key":"e_1_3_1_33_2","article-title":"FixyNN: Efficient hardware for mobile computer vision via transfer learning","author":"Whatmough Paul N.","year":"2019","unstructured":"Paul N. Whatmough, Chuteng Zhou, Patrick Hansen, Shreyas Kolala Venkataramanaiah, Jae-sun Seo, and Matthew Mattina. 2019. FixyNN: Efficient hardware for mobile computer vision via transfer learning. arXiv:1902.11128. https:\/\/arxiv.org\/pdf\/1902.11128.pdf.","journal-title":"arXiv:1902.11128"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241563"},{"key":"e_1_3_1_35_2","first-page":"80","volume-title":"2015 ISSCC","year":"2015","unstructured":"Seongwook Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi, and Hoi-Jun Yoo. 2015. A 1.93 TOPS\/W scalable deep learning\/inference processor with tetra-parallel mimd architecture for big data applications. In 2015 ISSCC. IEEE, 80\u201381."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC19947.2020.9063155"},{"key":"e_1_3_1_37_2","first-page":"4694","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ng Joe Yue-Hei","year":"2015","unstructured":"Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. 2015. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4694\u20134702."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE51398.2021.9474075"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2016.7418009"},{"key":"e_1_3_1_40_2","unstructured":"Qiuling Zhu Ofer Shacham Albert Meixner Jason Rupert Redgrave Daniel Frederic Finchelstein David Patterson Neeti Desai Donald Stark Edward T. 
Chang William R. Mark et\u00a0al. 2018. Architecture for high performance power efficient programmable image processing. US Patent No. 9 965 824. Patent: May 8 2018."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.52"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.441"},{"key":"e_1_3_1_43_2","article-title":"Euphrates: Algorithm-SoC co-design for low-power mobile continuous vision","author":"Zhu Yuhao","year":"2018","unstructured":"Yuhao Zhu, Anand Samajdar, Matthew Mattina, and Paul Whatmough. 2018. Euphrates: Algorithm-SoC co-design for low-power mobile continuous vision. arXiv:1803.11232. https:\/\/arxiv.org\/pdf\/1803.11232.pdf.","journal-title":"arXiv:1803.11232"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3543852","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3543852","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:39Z","timestamp":1750268979000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3543852"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,10]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,31]]}},"alternative-id":["10.1145\/3543852"],"URL":"https:\/\/doi.org\/10.1145\/3543852","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2022,12,10]]},"assertion":[{"value":"2021-05-28","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-06-03","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}