{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:36Z","timestamp":1750220316356,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,1,8]],"date-time":"2022-01-08T00:00:00Z","timestamp":1641600000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Robert Bosch Center for Data Science and Artificial Intelligence"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,1,8]]},"DOI":"10.1145\/3493700.3493716","type":"proceedings-article","created":{"date-parts":[[2022,1,7]],"date-time":"2022-01-07T23:54:21Z","timestamp":1641599661000},"page":"63-71","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Smooth Imitation Learning via Smooth Costs and Smooth Policies"],"prefix":"10.1145","author":[{"given":"Sapana","family":"Chaudhary","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering, Texas A&M University, USA"}]},{"given":"Balaraman","family":"Ravindran","sequence":"additional","affiliation":[{"name":"Robert Bosch Centre for Data Science and AI, India and Computer Science and Engineering, Indian Institute of Technology Madras, India"}]}],"member":"320","published-online":{"date-parts":[[2022,1,8]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015430"},{"key":"e_1_3_2_1_2_1","volume-title":"A survey of robot learning from demonstration. Robotics and autonomous systems 57, 5","author":"Argall D","year":"2009","unstructured":"Brenna\u00a0D Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. 2009. A survey of robot learning from demonstration. 
Robotics and autonomous systems 57, 5 (2009), 469\u2013483."},{"key":"e_1_3_2_1_3_1","volume-title":"NIPS 2016 Workshop on Adversarial Training. In review for ICLR, Vol.\u00a02016","author":"Arjovsky Martin","year":"2017","unstructured":"Martin Arjovsky and L\u00e9on Bottou. 2017. Towards principled methods for training generative adversarial networks. In NIPS 2016 Workshop on Adversarial Training. In review for ICLR, Vol.\u00a02016."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"crossref","unstructured":"Michael Bain and Claude Sammut. 1995. A Framework for Behavioural Cloning.. In Machine Intelligence 15. 103\u2013129.","DOI":"10.1093\/oso\/9780198538677.003.0006"},{"key":"e_1_3_2_1_5_1","unstructured":"Lionel Blond\u00e9 Pablo Strasser and Alexandros Kalousis. 2020. Lipschitzness Is All You Need To Tame Off-policy Generative Adversarial Imitation Learning. arXiv preprint arXiv:2006.16785(2020)."},{"key":"e_1_3_2_1_6_1","unstructured":"Greg Brockman Vicki Cheung Ludwig Pettersson Jonas Schneider John Schulman Jie Tang and Wojciech Zaremba. 2016. Openai gym. arXiv preprint arXiv:1606.01540(2016)."},{"volume-title":"Robot programming by demonstration","author":"Calinon Sylvain","key":"e_1_3_2_1_7_1","unstructured":"Sylvain Calinon. 2009. Robot programming by demonstration. EPFL Press."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.507"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0896-6273(02)00963-7"},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the 33rd International Conference on Machine Learning, Vol.\u00a048","author":"Finn Chelsea","year":"2016","unstructured":"Chelsea Finn, Sergey Levine, and Pieter Abbeel. 2016. Guided cost learning: Deep inverse optimal control via policy optimization. In Proceedings of the 33rd International Conference on Machine Learning, Vol.\u00a048."},{"key":"e_1_3_2_1_11_1","unstructured":"Justin Fu Katie Luo and Sergey Levine. 2017. 
Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. arXiv preprint arXiv:1710.11248(2017)."},{"key":"e_1_3_2_1_12_1","unstructured":"Ishaan Gulrajani Faruk Ahmed Martin Arjovsky Vincent Dumoulin and Aaron Courville. 2017. Improved Training of Wasserstein GANs. arXiv preprint arXiv:1704.00028(2017)."},{"key":"e_1_3_2_1_13_1","volume-title":"International conference on machine learning. PMLR","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning. PMLR, 1861\u20131870."},{"key":"e_1_3_2_1_14_1","unstructured":"Dan Hendrycks Mantas Mazeika Saurav Kadavath and Dawn Song. 2019. Using self-supervised learning can improve model robustness and uncertainty. In Advances in Neural Information Processing Systems. 15663\u201315674."},{"key":"e_1_3_2_1_15_1","unstructured":"Jonathan Ho and Stefano Ermon. 2016. Generative adversarial imitation learning. In Advances in Neural Information Processing Systems. 4565\u20134573."},{"key":"e_1_3_2_1_16_1","unstructured":"Martin\u00a0Sleziak. [n.d.]. Is the maximum function Lipschitz continuous?Mathematics Stack Exchange. arXiv:https:\/\/math.stackexchange.com\/q\/1742410https:\/\/math.stackexchange.com\/q\/1742410 URL:https:\/\/math.stackexchange.com\/q\/1742410 (version: 2016-04-14)."},{"key":"e_1_3_2_1_17_1","volume-title":"Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. arXiv preprint arXiv:1911.03437(2019).","author":"Jiang Haoming","year":"2019","unstructured":"Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Tuo Zhao. 2019. Smart: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. 
arXiv preprint arXiv:1911.03437(2019)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2018.8618996"},{"key":"e_1_3_2_1_19_1","volume-title":"In Proc. 19th International Conference on Machine Learning. Citeseer.","author":"Kakade Sham","year":"2002","unstructured":"Sham Kakade and John Langford. 2002. Approximately optimal approximate reinforcement learning. In In Proc. 19th International Conference on Machine Learning. Citeseer."},{"key":"e_1_3_2_1_20_1","unstructured":"H\u00a0Jin Kim Michael\u00a0I Jordan Shankar Sastry and Andrew\u00a0Y Ng. 2004. Autonomous helicopter flight via reinforcement learning. In Advances in neural information processing systems. 799\u2013806."},{"key":"e_1_3_2_1_21_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014).","author":"Kingma P","year":"2014","unstructured":"Diederik\u00a0P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980(2014)."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2007.363075"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3390\/robotics2030122"},{"key":"e_1_3_2_1_24_1","volume-title":"International Conference on Machine Learning. PMLR, 680\u2013688","author":"Le Hoang","year":"2016","unstructured":"Hoang Le, Andrew Kang, Yisong Yue, and Peter Carr. 2016. Smooth imitation learning for online sequence prediction. In International Conference on Machine Learning. PMLR, 680\u2013688."},{"key":"e_1_3_2_1_25_1","unstructured":"Sergey Levine Zoran Popovic and Vladlen Koltun. 2011. Nonlinear inverse reinforcement learning with gaussian processes. In Advances in neural information processing systems. 19\u201327."},{"key":"e_1_3_2_1_26_1","unstructured":"Timothy\u00a0P Lillicrap Jonathan\u00a0J Hunt Alexander Pritzel Nicolas Heess Tom Erez Yuval Tassa David Silver and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. 
arXiv preprint arXiv:1509.02971(2015)."},{"key":"e_1_3_2_1_27_1","volume-title":"Virtual adversarial training: a regularization method for supervised and semi-supervised learning","author":"Miyato Takeru","year":"2018","unstructured":"Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. 2018. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE transactions on pattern analysis and machine intelligence 41, 8(2018), 1979\u20131993."},{"key":"e_1_3_2_1_28_1","unstructured":"Andrew\u00a0Y Ng Stuart\u00a0J Russell 2000. Algorithms for inverse reinforcement learning.. In Icml. 663\u2013670."},{"key":"e_1_3_2_1_29_1","unstructured":"Matteo Papini Andrea Battistello and Marcello Restelli. [n.d.]. Safe Exploration in Gaussian Policy Gradient. ([n. d.])."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Jason Pazis and Ronald Parr. 2011. Non-Parametric Approximate Linear Programming for MDPs.. In AAAI.","DOI":"10.1609\/aaai.v25i1.7930"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-control-053018-023825"},{"key":"e_1_3_2_1_32_1","volume-title":"Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 661\u2013668","author":"Ross St\u00e9phane","year":"2010","unstructured":"St\u00e9phane Ross and Drew Bagnell. 2010. Efficient reductions for imitation learning. In Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 661\u2013668."},{"key":"e_1_3_2_1_33_1","unstructured":"Stephane Ross and J\u00a0Andrew Bagnell. 2014. Reinforcement and imitation learning via interactive no-regret learning. arXiv preprint arXiv:1406.5979(2014)."},{"key":"e_1_3_2_1_34_1","volume-title":"Proceedings of the fourteenth international conference on artificial intelligence and statistics. 
627\u2013635","author":"Ross St\u00e9phane","year":"2011","unstructured":"St\u00e9phane Ross, Geoffrey Gordon, and Drew Bagnell. 2011. A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics. 627\u2013635."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279964"},{"key":"e_1_3_2_1_36_1","volume-title":"Towards Behavioural Cloning for Autonomous Driving. In 2019 Third IEEE International Conference on Robotic Computing (IRC). IEEE, 560\u2013567","author":"Saksena Saumya\u00a0Kumaar","year":"2019","unstructured":"Saumya\u00a0Kumaar Saksena, B Navaneethkrishnan, Sinchana Hegde, Pragadeesh Raja, and Ravi\u00a0M Vishwanath. 2019. Towards Behavioural Cloning for Autonomous Driving. In 2019 Third IEEE International Conference on Robotic Computing (IRC). IEEE, 560\u2013567."},{"key":"e_1_3_2_1_37_1","volume-title":"Is imitation learning the route to humanoid robots?Trends in cognitive sciences 3, 6","author":"Schaal Stefan","year":"1999","unstructured":"Stefan Schaal. 1999. Is imitation learning the route to humanoid robots?Trends in cognitive sciences 3, 6 (1999), 233\u2013242."},{"key":"e_1_3_2_1_38_1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML-15)","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 1889\u20131897."},{"key":"e_1_3_2_1_39_1","unstructured":"John Schulman Philipp Moritz Sergey Levine Michael Jordan and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. 
arXiv preprint arXiv:1506.02438(2015)."},{"key":"e_1_3_2_1_40_1","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347(2017)."},{"key":"e_1_3_2_1_41_1","unstructured":"Qianli Shen Yan Li Haoming Jiang Zhaoran Wang and Tuo Zhao. 2020. Deep Reinforcement Learning with Smooth Policy. arXiv preprint arXiv:2003.09534(2020)."},{"volume-title":"Introduction to reinforcement learning. Vol.\u00a0135","author":"Sutton S","key":"e_1_3_2_1_42_1","unstructured":"Richard\u00a0S Sutton, Andrew\u00a0G Barto, 1998. Introduction to reinforcement learning. Vol.\u00a0135. MIT press Cambridge."},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_3_2_1_44_1","unstructured":"Qizhe Xie Zihang Dai Eduard Hovy Minh-Thang Luong and Quoc\u00a0V Le. 2019. Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848(2019)."},{"key":"e_1_3_2_1_45_1","unstructured":"Hongyang Zhang Yaodong Yu Jiantao Jiao Eric\u00a0P Xing Laurent\u00a0El Ghaoui and Michael\u00a0I Jordan. 2019. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573(2019)."},{"key":"e_1_3_2_1_46_1","first-page":"1433","article-title":"Maximum Entropy Inverse Reinforcement Learning.. In AAAI, Vol.\u00a08","author":"Ziebart D","year":"2008","unstructured":"Brian\u00a0D Ziebart, Andrew\u00a0L Maas, J\u00a0Andrew Bagnell, and Anind\u00a0K Dey. 2008. Maximum Entropy Inverse Reinforcement Learning.. In AAAI, Vol.\u00a08. 
Chicago, IL, USA, 1433\u20131438.","journal-title":"Chicago, IL, USA"}],"event":{"name":"CODS-COMAD 2022: 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)","sponsor":["SIGGRAPH ACM Special Interest Group on Computer Graphics and Interactive Techniques"],"location":"Bangalore India","acronym":"CODS-COMAD 2022"},"container-title":["Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD)"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3493700.3493716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3493700.3493716","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:51Z","timestamp":1750191111000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3493700.3493716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,8]]},"references-count":46,"alternative-id":["10.1145\/3493700.3493716","10.1145\/3493700"],"URL":"https:\/\/doi.org\/10.1145\/3493700.3493716","relation":{},"subject":[],"published":{"date-parts":[[2022,1,8]]},"assertion":[{"value":"2022-01-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}