{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T13:59:20Z","timestamp":1769003960976,"version":"3.49.0"},"reference-count":9,"publisher":"Association for Computing Machinery (ACM)","issue":"4","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["GetMobile: Mobile Comp. and Comm."],"published-print":{"date-parts":[[2025,1,20]]},"abstract":"<jats:p>Mobile Augmented Reality (AR) applications demand high-quality, real-time visual prediction, including pixel-level depth and semantics, to enable immersive and context-aware user experiences. Recently, Vision Foundation Models (VFMs) have offered strong generalization capabilities on diverse and unseen data, supporting scalable mobile AR experiences. However, deploying VFMs on mobile devices is challenging due to computational limitations, particularly in maintaining both prediction accuracy and real-time performance. In this article, we present ARIA [3], the first system that enables on-device inference acceleration of a VFM. ARIA exploits the heterogeneity of mobile processors through a parallel and selective inference scheme: full-frame prediction is periodically offloaded to a processor with high parallelism, such as the GPU, while low-latency updates on dynamic regions are conducted via a specialized accelerator, such as the NPU. Implemented and evaluated on mobile devices, ARIA achieved significant improvements in accuracy and deadline success rate in real-world mobile AR scenarios.<\/jats:p>",
"DOI":"10.1145\/3793236.3793246","type":"journal-article","created":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T23:18:20Z","timestamp":1768951100000},"page":"31-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["VFM in Your Hands: Optimizing Real-Time Scene Understanding for Mobile Augmented Reality"],"prefix":"10.1145","volume":"29","author":[{"given":"Jeho","family":"Lee","sequence":"first","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chanyoung","family":"Jung","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gunjoong","family":"Kim","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiwon","family":"Kim","sequence":"additional","affiliation":[{"name":"Uppsala University, Uppsala, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Seonghoon","family":"Park","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hojung","family":"Cha","sequence":"additional","affiliation":[{"name":"Yonsei University, Seoul, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,1,20]]},"reference":[
{"key":"e_1_2_1_1_1","unstructured":"M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa and A. El-Nouby. 2023. DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision, 12179--12188","author":"Ranftl R.","unstructured":"R. Ranftl, A. Bochkovskiy and V. Koltun. 2021. Vision transformers for dense prediction. Proceedings of the IEEE\/CVF International Conference on Computer Vision, 12179--12188."},{"key":"e_1_2_1_3_1","volume-title":"ACM Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, 249--262","author":"Jung C.","year":"2025","unstructured":"C. Jung, J. Lee, G. Kim, J. Kim, S. Park and H. Cha. 2025. ARIA: Optimizing vision foundation model inference on heterogeneous mobile processors for augmented reality. ACM Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, 249--262. https:\/\/doi.org\/10.1145\/3711875.3729161"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1290--1299","author":"Cheng B.","unstructured":"B. Cheng, I. Misra, A.G. Schwing, A. Kirillov and R. Girdhar. 2022. Masked-attention mask transformer for universal image segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 1290--1299."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3669940.3707239"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, 129--144","author":"Xu M.","unstructured":"M. Xu, M. Zhu, Y. Liu, F. X. Lin and X. Liu. 2018. DeepCache: Principled cache for mobile deep vision. Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, 129--144."},
{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 633--641","author":"Zhou B.","unstructured":"B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. 2017. Scene parsing through ADE20K dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 633--641."},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 25th Annual International Conference on Mobile Computing and Networking, 1--16","author":"Lee R.","unstructured":"R. Lee, S.I. Venieris, L. Dudziak, S. Bhattacharya and N.D. Lane. 2019. MobiSR: Efficient on-device super-resolution through heterogeneous mobile processors. Proceedings of the 25th Annual International Conference on Mobile Computing and Networking, 1--16."},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the International Symposium on Low Power Electronics and Design, 1--6","author":"Park J.","unstructured":"J. Park, S. Lee and H. Cha. 2018. App-oriented thermal management of mobile devices. Proceedings of the International Symposium on Low Power Electronics and Design, 1--6."}],"container-title":["GetMobile: Mobile Computing and Communications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3793236.3793246","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,20]],"date-time":"2026-01-20T23:18:31Z","timestamp":1768951111000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3793236.3793246"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,20]]},"references-count":9,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,1,20]]}},"alternative-id":["10.1145\/3793236.3793246"],"URL":"https:\/\/doi.org\/10.1145\/3793236.3793246","relation":{},"ISSN":["2375-0529","2375-0537"],"issn-type":[{"value":"2375-0529","type":"print"},{"value":"2375-0537","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,20]]},"assertion":[{"value":"2026-01-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}