{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:36Z","timestamp":1750220316751,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,1,8]],"date-time":"2022-01-08T00:00:00Z","timestamp":1641600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Robert Bosch Centre for Data Science and AI"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,1,8]]},"DOI":"10.1145\/3493700.3493712","type":"proceedings-article","created":{"date-parts":[[2022,1,7]],"date-time":"2022-01-07T23:54:21Z","timestamp":1641599661000},"page":"1-9","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["An Active Learning Framework for Efficient Robust Policy Search"],"prefix":"10.1145","author":[{"given":"Sai Kiran","family":"Narayanaswami","sequence":"first","affiliation":[{"name":"Indian Institute of Technology, Madras, IN"}]},{"given":"Nandan","family":"Sudarsanam","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology - Madras, IN"}]},{"given":"Balaraman","family":"Ravindran","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Madras, IN"}]}],"member":"320","published-online":{"date-parts":[[2022,1,8]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/2986459.2986717"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1214\/17-EJS1341SI"},{"key":"e_1_3_2_1_3_1","volume-title":"Proceedings of the 30th International Conference on International Conference on Machine Learning -","volume":"28","author":"Agrawal Shipra","year":"2013","unstructured":"Shipra Agrawal and Navin Goyal. 2013. Thompson Sampling for Contextual Bandits with Linear Payoffs. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87987-9_25"},{"key":"e_1_3_2_1_5_1","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. OpenAI Gym. arXiv:1606.01540"},{"volume-title":"Algorithmic Learning Theory","author":"Carpentier Alexandra","key":"e_1_3_2_1_6_1","unstructured":"Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, R\u00e9mi Munos, and Peter Auer. 2011. Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits. In Algorithmic Learning Theory. Springer Berlin Heidelberg."},{"key":"e_1_3_2_1_7_1","unstructured":"Prafulla Dhariwal Christopher Hesse Oleg Klimov Alex Nichol Matthias Plappert Alec Radford John Schulman Szymon Sidor Yuhuai Wu and Peter Zhokhov. 2017. OpenAI Baselines. https:\/\/github.com\/openai\/baselines."},{"key":"e_1_3_2_1_8_1","unstructured":"Takuya Hiraoka Takahisa Imagawa Tatsuya Mori Takashi Onishi and Yoshimasa Tsuruoka. 2019. Learning Robust Options by Conditional Value at Risk Optimization. In NeurIPS. 2615\u20132625. http:\/\/papers.nips.cc\/paper\/8530-learning-robust-options-by-conditional-value-at-risk-optimization"},{"key":"e_1_3_2_1_9_1","unstructured":"Thanard Kurutach Ignasi Clavera Yan Duan Aviv Tamar and Pieter Abbeel. 2018. Model-Ensemble Trust-Region Policy Optimization. In ICLR."},{"key":"e_1_3_2_1_10_1","unstructured":"Gilwoo Lee Brian Hou Aditya Mandalika Jeongseok Lee and Siddhartha\u00a0S. Srinivasa. 2019. Bayesian Policy Optimization for Model Uncertainty. In ICLR. https:\/\/openreview.net\/forum?id=SJGvns0qK7"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772758"},{"key":"e_1_3_2_1_12_1","unstructured":"Timothy\u00a0P. Lillicrap Jonathan\u00a0J. Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In ICLR."},{"volume-title":"2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS).","author":"Mordatch I.","key":"e_1_3_2_1_13_1","unstructured":"I. Mordatch, K. Lowrey, and E. Todorov. 2015. Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids. In 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS)."},{"key":"e_1_3_2_1_14_1","unstructured":"Jun Morimoto and Kenji Doya. 2001. Robust Reinforcement Learning. In Advances in Neural Information Processing Systems 13."},{"key":"e_1_3_2_1_15_1","unstructured":"Emilio Parisotto Jimmy Ba and Ruslan Salakhutdinov. 2016. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning. In ICLR."},{"volume-title":"Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA).","author":"Peng B.","key":"e_1_3_2_1_16_1","unstructured":"X.\u00a0B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel. 2018. Sim-to-Real Transfer of Robotic Control with Dynamics Randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA)."},{"volume-title":"Proceedings of the 34th International Conference on Machine Learning. PMLR.","author":"Pinto Lerrel","key":"e_1_3_2_1_17_1","unstructured":"Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta. [n.d.]. Robust Adversarial Reinforcement Learning. In Proceedings of the 34th International Conference on Machine Learning. PMLR."},{"key":"e_1_3_2_1_18_1","volume-title":"EPOpt: Learning Robust Neural Network Policies Using Model Ensembles. In International Conference on Learning Representations.","author":"Rajeswaran Aravind","year":"2017","unstructured":"Aravind Rajeswaran, Sarvjeet Ghotra, Balaraman Ravindran, and Sergey Levine. 2017. EPOpt: Learning Robust Neural Network Policies Using Model Ensembles. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Fabio Ramos Rafael Possas and Dieter Fox. 2019. BayesSim: adaptive domain randomization via probabilistic inference for robotics simulators. In Robotics: Science and Systems (RSS). https:\/\/arxiv.org\/abs\/1906.01728","DOI":"10.15607\/RSS.2019.XV.029"},{"key":"e_1_3_2_1_20_1","unstructured":"Andrei\u00a0A. Rusu Sergio\u00a0Gomez Colmenarejo Caglar Gulcehre Guillaume Desjardins James Kirkpatrick Razvan Pascanu Volodymyr Mnih Koray Kavukcuoglu and Raia Hadsell. 2016. Policy distillation. In ICLR."},{"key":"e_1_3_2_1_21_1","unstructured":"Andrei\u00a0A. Rusu Neil\u00a0C. Rabinowitz Guillaume Desjardins Hubert Soyer James Kirkpatrick Koray Kavukcuoglu Razvan Pascanu and Raia Hadsell. 2016. Progressive Neural Networks. CoRR abs\/1606.04671(2016)."},{"key":"e_1_3_2_1_22_1","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning -","volume":"37","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, and Pieter Abbeel. 2015. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37."},{"key":"e_1_3_2_1_23_1","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR).","author":"Schulman John","year":"2016","unstructured":"John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. 2016. High-Dimensional Continuous Control Using Generalized Advantage Estimation. In Proceedings of the International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_2_1_24_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal Policy Optimization Algorithms. CoRR abs\/1707.06347(2017)."},{"key":"e_1_3_2_1_26_1","volume-title":"International Conference on Learning Representations.","author":"Sharma Sahil","year":"2018","unstructured":"Sahil Sharma, Ashutosh\u00a0Kumar Jha, Parikshit\u00a0S Hegde, and Balaraman Ravindran. 2018. Learning to Multi-Task by Active Sampling. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_27_1","volume-title":"Active Learning in Linear Stochastic Bandits. In NIPS 2013 Workshop on Bayesian Optimization in Theory and Practice.","author":"Soare Marta","year":"2013","unstructured":"Marta Soare, Alessandro Lazaric, and Remi Munos. 2013. Active Learning in Linear Stochastic Bandits. In NIPS 2013 Workshop on Bayesian Optimization in Theory and Practice."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Aviv Tamar Yonatan Glassner and Shie Mannor. 2015. Optimizing the CVaR via Sampling. In AAAI.","DOI":"10.1609\/aaai.v29i1.9561"},{"key":"e_1_3_2_1_29_1","volume-title":"Taylor and Peter Stone","author":"E.","year":"2009","unstructured":"Matthew\u00a0E. Taylor and Peter Stone. 2009. Transfer Learning for Reinforcement Learning Domains: A Survey. J. Mach. Learn. Res.(2009)."},{"volume-title":"Domain randomization for transferring deep neural networks from simulation to the real world","author":"Tobin Josh","key":"e_1_3_2_1_30_1","unstructured":"Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel. 2017. Domain randomization for transferring deep neural networks from simulation to the real world. In IROS. IEEE, 23\u201330."},{"volume-title":"2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems.","author":"Todorov E.","key":"e_1_3_2_1_31_1","unstructured":"E. Todorov, T. Erez, and Y. Tassa. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems."},{"key":"e_1_3_2_1_32_1","volume-title":"Optimizing Walking Controllers for Uncertain Inputs and Environments. In ACM SIGGRAPH 2010 Papers.","author":"Wang M.","year":"2010","unstructured":"Jack\u00a0M. Wang, David\u00a0J. Fleet, and Aaron Hertzmann. 2010. Optimizing Walking Controllers for Uncertain Inputs and Environments. In ACM SIGGRAPH 2010 Papers."},{"key":"e_1_3_2_1_33_1","unstructured":"Ziyu Wang Victor Bapst Nicolas Heess Volodymyr Mnih Remi Munos Koray Kavukcuoglu and Nando de Freitas. 2017. Sample Efficient Actor-Critic with Experience Replay. In ICLR."},{"key":"e_1_3_2_1_34_1","unstructured":"Wenhao Yu Jie Tan Karen Liu and Greg Turk. 2017. Preparing for the Unknown: Learning a Universal Policy with Online System Identification. In Robotics: Science and Systems (RSS)."}],"event":{"name":"CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"],"location":"Bangalore India","acronym":"CODS-COMAD 2022"},"container-title":["Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3493700.3493712","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3493700.3493712","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:51Z","timestamp":1750191111000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3493700.3493712"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,8]]},"references-count":33,"alternative-id":["10.1145\/3493700.3493712","10.1145\/3493700"],"URL":"https:\/\/doi.org\/10.1145\/3493700.3493712","relation":{},"subject":[],"published":{"date-parts":[[2022,1,8]]},"assertion":[{"value":"2022-01-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}