{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:20Z","timestamp":1750309520416,"version":"3.41.0"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T00:00:00Z","timestamp":1736121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>With the growing demand for video analysis on mobile devices, object tracking has demonstrated to be a suitable assistance to object detection under the Tracking-By-Detection (TBD) paradigm for reducing computational overhead and power demands. However, performing TBD with fixed hyper-parameters leads to computational inefficiency and ignores perceptual dynamics, as fixed setups tend to run suboptimally, given the variability of scenarios. In this article, we propose SmartTBD, a scheduling strategy for TBD based on multi-objective optimization of accuracy-latency metrics. SmartTBD is a novel deep reinforcement learning based scheduling architecture that computes appropriate TBD configurations in video sequences to improve the speed and detection accuracy. This involves a challenging optimization problem due to the intrinsic relation between the video characteristics and the TBD performance. Therefore, we leverage video characteristics, frame information, and the past TBD results to drive the optimization problem. Our approach surpasses baselines with fixed TBD configurations and recent research, achieving accuracy comparable to pure detection while significantly reducing latency. Moreover, it enables performance analysis of tracking and detection in diverse scenarios. The method is proven to be generalizable and highly practical in common video analytics datasets on resource-constrained devices.<\/jats:p>","DOI":"10.1145\/3703912","type":"journal-article","created":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T12:30:36Z","timestamp":1736166636000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SmartTBD: Smart Tracking for Resource-constrained Object Detection"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-9862-8399","authenticated-orcid":false,"given":"Shihang","family":"Zhou","sequence":"first","affiliation":[{"name":"KTH Royal Institute of Technology School of Electrical Engineering and Computer Science, Stockholm, Sweden and Ericsson Research, Ericsson AB, Stockholm, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1398-1296","authenticated-orcid":false,"given":"Alejandra C.","family":"Hernandez","sequence":"additional","affiliation":[{"name":"Ericsson Research, Ericsson AB, Stockholm, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4859-5053","authenticated-orcid":false,"given":"Clara","family":"Gomez","sequence":"additional","affiliation":[{"name":"Ericsson Research, Ericsson AB, Stockholm, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7189-1336","authenticated-orcid":false,"given":"Wenjie","family":"Yin","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology School of Electrical Engineering and Computer Science, Stockholm, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0579-3372","authenticated-orcid":false,"given":"M\u00e5rten","family":"Bj\u00f6rkman","sequence":"additional","affiliation":[{"name":"KTH Royal Institute of Technology School of Electrical Engineering and Computer Science, Stockholm, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,1,6]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"GitHub. n.d. AlexeyAB\/Darknet: Yolov4 Neural Networks for Object Detection (Windows and Linux Version of Darknet). Retrieved December 11 2024 from https:\/\/github.com\/AlexeyAB\/darknet"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510831"},{"key":"e_1_3_1_4_2","volume-title":"Constrained Markov Decision Processes","author":"Altman Eitan","year":"1999","unstructured":"Eitan Altman. 1999. Constrained Markov Decision Processes. Chapman & Hall\/CRC."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","unstructured":"Kittipat Apicharttrisorn Xukan Ran Jiasi Chen Srikanth V. Krishnamurthy and Amit K. Roy-Chowdhury. 2019. Frugal following: Power thrifty object detection and tracking for mobile augmented reality. In Proceedings of the 17th Conference on Embedded Networked Sensor Systems(SenSys\u201919). Association for Computing Machinery New York NY USA. DOI:10.1145\/3356250.3360044","DOI":"10.1145\/3356250.3360044"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","unstructured":"Alexey Bochkovskiy Chien-Yao Wang and Hong-Yuan Mark Liao. 2020. YOLOv4: Optimal speed and accuracy of object detection. arXiv:2004.10934 (2020). DOI:10.48550\/ARXIV.2004.10934","DOI":"10.48550\/ARXIV.2004.10934"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3071178.3079189"},{"key":"e_1_3_1_8_2","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. OpenAI Gym. arXiv:1606.01540 (2016)."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2022.103508"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2961959"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.272"},{"key":"e_1_3_1_12_2","article-title":"Open Source Computer Vision Library","year":"2015","unstructured":"Itseez. 2015. Open Source Computer Vision Library. Retrieved December 11, 2024 from https:\/\/github.com\/itseez\/opencv","journal-title":"https:\/\/github.com\/itseez\/opencv"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230574"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2010.675"},{"key":"e_1_3_1_15_2","unstructured":"Tsung-Yi Lin Michael Maire Serge Belongie Lubomir Bourdev Ross Girshick James Hays Pietro Perona Deva Ramanan C. Lawrence Zitnick and Piotr Doll\u00e1r. 2015. Microsoft COCO: Common Objects in Context. arXiv:cs.CV\/1405.0312 (2015)."},{"key":"e_1_3_1_16_2","first-page":"976","volume-title":"Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS\u201920)","author":"Liu Miaomiao","year":"2020","unstructured":"Miaomiao Liu, Xianzhong Ding, and Wan Du. 2020. Continuous, real-time object detection on mobile devices without offloading. In Proceedings of the 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS\u201920). IEEE, 976\u2013986."},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2020.103448"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.abm6074"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2013.2271451"},{"key":"e_1_3_1_20_2","volume-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et\u00a0al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Curran Associates, Red Hook, NY, USA."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2018.8485905"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_1_24_2","first-page":"4510","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Sandler Mark","year":"2018","unstructured":"Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. 2018. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510\u20134520."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347 [cs] (2017). DOI:10.48550\/ARXIV.1707.06347","DOI":"10.48550\/ARXIV.1707.06347"},{"key":"e_1_3_1_26_2","first-page":"621","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Sela Gur-Eyal","year":"2022","unstructured":"Gur-Eyal Sela, Ionel Gog, Justin Wong, Kumar Krishna Agrawal, Xiangxi Mo, Sukrit Kalra, Peter Schafhalter, Eric Leong, Xin Wang, Bharathan Balaji, et\u00a0al. 2022. Context-aware streaming perception in dynamic environments. In Proceedings of the European Conference on Computer Vision. 621\u2013638."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2021.3127492"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.1994.323794"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","unstructured":"Pavel Tokmakov Martial Hebert and Cordelia Schmid. 2020. Unsupervised learning of video representations via dense trajectory clustering. arXiv:2006.15731 (2020). DOI:10.48550\/ARXIV.2006.15731","DOI":"10.48550\/ARXIV.2006.15731"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1109\/ICCV.2013.441","volume-title":"Proceedings of the 2013 IEEE International Conference on Computer Vision","author":"Wang Heng","year":"2013","unstructured":"Heng Wang and Cordelia Schmid. 2013. Action recognition with improved trajectories. In Proceedings of the 2013 IEEE International Conference on Computer Vision. 3551\u20133558. DOI:10.1109\/ICCV.2013.441"},{"key":"e_1_3_1_31_2","first-page":"1","volume-title":"Proceedings of the IEEE Conference on Computer Communications (INFOCOM\u201921)","author":"Wang Xu","year":"2021","unstructured":"Xu Wang, Zheng Yang, Jiahang Wu, Yi Zhao, and Zimu Zhou. 2021. EdgeDuet: Tiling small object detection for edge assisted autonomous mobile vision. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM\u201921). 1\u201310. DOI:10.1109\/INFOCOM42981.2021.9488843"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2388226"},{"key":"e_1_3_1_33_2","doi-asserted-by":"crossref","unstructured":"Ran Xu. n.d. SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles. Retrieved December 11 2024 from https:\/\/openaccess.thecvf.com\/content\/CVPR2022\/papers\/Xu_SmartAdapt_Multi-Branch_Object_Detection_Framework_for_Videos_on_Mobiles_CVPR_2022_paper.pdf","DOI":"10.1109\/CVPR52688.2022.00256"},{"key":"e_1_3_1_34_2","first-page":"334","volume-title":"Proceedings of the 17th European Conference on Computer Systems (EuroSys\u201922)","author":"Xu Ran","year":"2022","unstructured":"Ran Xu, Jayoung Lee, Pengcheng Wang, Saurabh Bagchi, Yin Li, and Somali Chaterji. 2022. LiteReconfig: Cost and content aware reconfiguration of video object detection systems for mobile GPUs. In Proceedings of the 17th European Conference on Computer Systems (EuroSys\u201922). Association for Computing Machinery, New York, NY, USA, 334\u2013351. DOI:10.1145\/3492321.3519577"},{"key":"e_1_3_1_35_2","first-page":"449","volume-title":"Proceedings of the 18th Conference on Embedded Networked Sensor Systems (SenSys\u201920)","author":"Xu Ran","year":"2020","unstructured":"Ran Xu, Chen-Lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, and Saurabh Bagchi. 2020. ApproxDet: Content and contention-aware approximate object detection for mobiles. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems (SenSys\u201920). Association for Computing Machinery, New York, NY, USA, 449\u2013462. DOI:10.1145\/3384419.3431159"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796984"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3126101"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703912","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3703912","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:06Z","timestamp":1750295886000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3703912"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,6]]},"references-count":36,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3703912"],"URL":"https:\/\/doi.org\/10.1145\/3703912","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2025,1,6]]},"assertion":[{"value":"2023-11-06","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-27","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}