{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T17:42:22Z","timestamp":1757612542940,"version":"3.44.0"},"publisher-location":"New York, NY, USA","reference-count":39,"publisher":"ACM","license":[{"start":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T00:00:00Z","timestamp":1746489600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"NSF (National Science Foundation)","doi-asserted-by":"publisher","award":["CAREER CNS-1845969"],"award-info":[{"award-number":["CAREER CNS-1845969"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006374","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CPS Frontier CNS-1954556"],"award-info":[{"award-number":["CPS Frontier CNS-1954556"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,5,6]]},"DOI":"10.1145\/3716550.3722033","type":"proceedings-article","created":{"date-parts":[[2025,5,7]],"date-time":"2025-05-07T06:20:57Z","timestamp":1746598857000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4187-6167","authenticated-orcid":false,"given":"Joshua R.","family":"Waite","sequence":"first","affiliation":[{"name":"Iowa State University, Ames, IA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3213-2719","authenticated-orcid":false,"given":"Md Zahid","family":"Hasan","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, IA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9553-0789","authenticated-orcid":false,"given":"Qisai","family":"Liu","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, IA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5363-7898","authenticated-orcid":false,"given":"Zhanhong","family":"Jiang","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, IA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4574-8066","authenticated-orcid":false,"given":"Chinmay","family":"Hegde","sequence":"additional","affiliation":[{"name":"New York University, New York, NY, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6775-9199","authenticated-orcid":false,"given":"Soumik","family":"Sarkar","sequence":"additional","affiliation":[{"name":"Iowa State University, Ames, IA, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,5,7]]},"reference":[{"key":"e_1_3_2_1_1_1","first-page":"96","article-title":"Reinforcement learning: Theory and algorithms. CS Dept., UW Seattle, Seattle, WA, USA","volume":"32","author":"Agarwal Alekh","year":"2019","unstructured":"Alekh Agarwal, Nan Jiang, Sham M Kakade, and Wen Sun. 2019. Reinforcement learning: Theory and algorithms. CS Dept., UW Seattle, Seattle, WA, USA, Tech. Rep 32 (2019), 96.","journal-title":"Tech. Rep"},{"key":"e_1_3_2_1_2_1","volume-title":"Pushing RL Boundaries: Integrating Foundational Models, e.g. LLMs and VLMs, into Reinforcement Learning. Towards Data Science","author":"Aghapour Elahe","year":"2023","unstructured":"Elahe Aghapour and Salar Rahili. 2023. Pushing RL Boundaries: Integrating Foundational Models, e.g. LLMs and VLMs, into Reinforcement Learning. Towards Data Science (2023)."},{"key":"e_1_3_2_1_3_1","volume-title":"Uncertain Gradient Lower Bounds. In 8th International Conference on Learning Representations, ICLR 2020","author":"Ash Jordan T.","year":"2020","unstructured":"Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, and Alekh Agarwal. 2020. Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=ryghZJBKPS"},{"key":"e_1_3_2_1_4_1","unstructured":"Yuntao Bai Andy Jones Kamal Ndousse Amanda Askell Anna Chen Nova DasSarma Dawn Drain Stanislav Fort Deep Ganguli Tom Henighan et al. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint arXiv:2204.05862 (2022)."},{"key":"e_1_3_2_1_5_1","unstructured":"Lucas Beyer Andreas Steiner Andr\u00e9 Susano Pinto Alexander Kolesnikov Xiao Wang Daniel Salz Maxim Neumann Ibrahim Alabdulmohsin Michael Tschannen Emanuele Bugliarello Thomas Unterthiner Daniel Keysers Skanda Koppula Fangyu Liu Adam Grycner Alexey Gritsenko Neil Houlsby Manoj Kumar Keran Rong Julian Eisenschlos Rishabh Kabra Matthias Bauer Matko Bo\u0161njak Xi Chen Matthias Minderer Paul Voigtlaender Ioana Bica Ivana Balazevic Joan Puigcerver Pinelopi Papalampidi Olivier Henaff Xi Xiong Radu Soricut Jeremiah Harmsen and Xiaohua Zhai. 2024. PaliGemma: A versatile 3B VLM for transfer. arXiv:2407.07726 [cs.CV] https:\/\/arxiv.org\/abs\/2407.07726"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01370"},{"key":"e_1_3_2_1_7_1","unstructured":"An-Chieh Cheng Hongxu Yin Yang Fu Qiushan Guo Ruihan Yang Jan Kautz Xiaolong Wang and Sifei Liu. 2024. SpatialRGPT: Grounded Spatial Reasoning in Vision-Language Models. In NeurIPS."},{"key":"e_1_3_2_1_8_1","volume-title":"Garnett (Eds.)","volume":"30","author":"Christiano Paul F","year":"2017","unstructured":"Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017. Deep Reinforcement Learning from Human Preferences. In Advances in Neural Information Processing Systems, I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 34th International Conference on Machine Learning -","volume":"70","author":"Gal Yarin","year":"2017","unstructured":"Yarin Gal, Riashat Islam, and Zoubin Ghahramani. 2017. Deep Bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning - Volume 70 (Sydney, NSW, Australia) (ICML '17). JMLR.org, 1183--1192."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Jensen Gao Bidipta Sarkar Fei Xia Ted Xiao Jiajun Wu Brian Ichter Anirudha Majumdar and Dorsa Sadigh. 2024. Physically Grounded Vision-Language Models for Robotic Manipulation. arXiv:2309.02561 [cs.RO] https:\/\/arxiv.org\/abs\/2309.02561","DOI":"10.1109\/ICRA57147.2024.10610090"},{"key":"e_1_3_2_1_11_1","volume-title":"International conference on machine learning. PMLR","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. PMLR, 1861--1870."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2024.3381175"},{"key":"e_1_3_2_1_13_1","unstructured":"Neil Houlsby Ferenc Husz\u00e1r Zoubin Ghahramani and M\u00e1t\u00e9 Lengyel. 2011. Bayesian Active Learning for Classification and Preference Learning. arXiv:1112.5745 [stat.ML] https:\/\/arxiv.org\/abs\/1112.5745"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.215"},{"key":"e_1_3_2_1_15_1","volume-title":"Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627","author":"Juliani Arthur","year":"2020","unstructured":"Arthur Juliani, Vincent-Pierre Berges, Ervin Teng, Andrew Cohen, Jonathan Harper, Chris Elion, Chris Goy, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. 2020. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627 (2020). https:\/\/arxiv.org\/pdf\/1809.02627.pdf"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622737.1622748"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1177\/14759217221150376"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539476"},{"key":"e_1_3_2_1_19_1","volume-title":"Victoria Fernandez Abrevaya, and Michael J. Black","author":"Kulits Peter","year":"2024","unstructured":"Peter Kulits, Haiwen Feng, Weiyang Liu, Victoria Fernandez Abrevaya, and Michael J. Black. 2024. Re-Thinking Inverse Graphics With Large Language Models. Transactions on Machine Learning Research (2024). https:\/\/openreview.net\/forum?id=u0eiu1MTS7"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-72970-6_3"},{"key":"e_1_3_2_1_21_1","unstructured":"Trung Quoc Luong Xinbo Zhang Zhanming Jie Peng Sun Xiaoran Jin and Hang Li. 2024. ReFT: Reasoning with Reinforced Fine-Tuning. arXiv:2401.08967 [cs.CL] https:\/\/arxiv.org\/abs\/2401.08967"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10055-019-00399-5"},{"key":"e_1_3_2_1_23_1","volume-title":"Oh (Eds.)","volume":"35","author":"Ouyang Long","year":"2022","unstructured":"Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 27730--27744. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2022\/file\/b1efde53be364a73914f58805a001731-Paper-Conference.pdf"},{"key":"e_1_3_2_1_24_1","volume-title":"Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pam Mishkin, Jack Clark, et al.","author":"Radford Alec","year":"2021","unstructured":"Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pam Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020 (2021)."},{"key":"e_1_3_2_1_25_1","first-page":"1","article-title":"Stable-Baselines3: Reliable Reinforcement Learning Implementations","volume":"22","author":"Raffin Antonin","year":"2021","unstructured":"Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. 2021. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22, 268 (2021), 1--8. http:\/\/jmlr.org\/papers\/v22\/20-1364.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_1_26_1","unstructured":"Pengzhen Ren Yun Xiao Xiaojun Chang Po-Yao Huang Zhihui Li Brij B. Gupta Xiaojiang Chen and Xin Wang. 2021. A Survey of Deep Active Learning. arXiv:2009.00236 [cs.LG] https:\/\/arxiv.org\/abs\/2009.00236"},{"key":"e_1_3_2_1_27_1","unstructured":"Soumyadip Sengupta et al. 2023. Neural Inverse Rendering of an Indoor Scene From a Single Image. arXiv preprint arXiv:2312.03275 (2023)."},{"key":"e_1_3_2_1_28_1","unstructured":"Nisan Stiennon Long Ouyang Jeff Wu Daniel M. Ziegler Ryan Lowe Chelsea Voss Alec Radford Dario Amodei and Paul Christiano. 2022. Learning to summarize from human feedback. arXiv:2009.01325 [cs.CL] https:\/\/arxiv.org\/abs\/2009.01325"},{"key":"e_1_3_2_1_29_1","volume-title":"DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. In 8th Annual Conference on Robot Learning. https:\/\/openreview.net\/forum?id=928V4Umlys","author":"Tian Xiaoyu","year":"2024","unstructured":"Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, XianPeng Lang, and Hang Zhao. 2024. DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models. In 8th Annual Conference on Robot Learning. https:\/\/openreview.net\/forum?id=928V4Umlys"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2018.00143"},{"key":"e_1_3_2_1_31_1","unstructured":"Unity Technologies. 2022. Unity. https:\/\/unity.com\/ Game development platform."},{"key":"e_1_3_2_1_32_1","volume-title":"Subhadeep Chakraborty, and Soumik Sarkar.","author":"Waite Joshua R.","year":"2023","unstructured":"Joshua R. Waite, Jiale Feng, Riley Tavassoli, Laura Harris, Sin Yong Tan, Subhadeep Chakraborty, and Soumik Sarkar. 2023. Active shooter detection and robust tracking utilizing supplemental synthetic data. arXiv:2309.03381 [cs.CV] https:\/\/arxiv.org\/abs\/2309.03381"},{"key":"e_1_3_2_1_33_1","first-page":"59008","article-title":"Fine-grained human feedback gives better rewards for language model training","volume":"36","author":"Wu Zeqiu","year":"2023","unstructured":"Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A Smith, Mari Ostendorf, and Hannaneh Hajishirzi. 2023. Fine-grained human feedback gives better rewards for language model training. Advances in Neural Information Processing Systems 36 (2023), 59008--59033.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/IAVVC63304.2024.10786471"},{"key":"e_1_3_2_1_35_1","volume-title":"Learning Interactive Real-World Simulators. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=sFyTZEqmUY","author":"Yang Sherry","year":"2024","unstructured":"Sherry Yang, Yilun Du, Seyed Kamyar Seyed Ghasemipour, Jonathan Tompson, Leslie Pack Kaelbling, Dale Schuurmans, and Pieter Abbeel. 2024. Learning Interactive Real-World Simulators. In The Twelfth International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=sFyTZEqmUY"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00018"},{"key":"e_1_3_2_1_37_1","volume-title":"International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=KRLUvxh8uaX","author":"Yuksekgonul Mert","year":"2023","unstructured":"Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, and James Zou. 2023. When and why Vision-Language Models behave like Bags-of-Words, and what to do about it?. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=KRLUvxh8uaX"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/SSCI47803.2020.9308468"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIV.2024.3402136"}],"event":{"name":"ICCPS '25: ACM\/IEEE 16th International Conference on Cyber-Physical Systems","sponsor":["SIGBED ACM Special Interest Group on Embedded Systems"],"location":"Irvine CA USA","acronym":"ICCPS '25"},"container-title":["Proceedings of the ACM\/IEEE 16th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2025)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716550.3722033","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716550.3722033","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T14:02:38Z","timestamp":1756994558000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716550.3722033"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,6]]},"references-count":39,"alternative-id":["10.1145\/3716550.3722033","10.1145\/3716550"],"URL":"https:\/\/doi.org\/10.1145\/3716550.3722033","relation":{},"subject":[],"published":{"date-parts":[[2025,5,6]]},"assertion":[{"value":"2025-05-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}