{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T05:42:31Z","timestamp":1751089351692},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,7,6]],"date-time":"2023-07-06T00:00:00Z","timestamp":1688601600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,6]],"date-time":"2023-07-06T00:00:00Z","timestamp":1688601600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Robot"],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper focuses on inverse reinforcement learning for autonomous navigation using distance and semantic category observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert\u2019s observations and state-control trajectory. We develop a map encoder, that infers semantic category probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the model parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. We propose a new model of expert behavior that enables error minimization using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. Our approach allows generalizing the learned behavior to new environments with new spatial configurations of the semantic categories. We analyze the different components of our model in a minigrid environment. 
We also demonstrate that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of buildings, sidewalks, and road lanes.<\/jats:p>","DOI":"10.1007\/s10514-023-10118-4","type":"journal-article","created":{"date-parts":[[2023,7,6]],"date-time":"2023-07-06T11:02:35Z","timestamp":1688641355000},"page":"809-830","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning"],"prefix":"10.1007","volume":"47","author":[{"given":"Tianyu","family":"Wang","sequence":"first","affiliation":[]},{"given":"Vikas","family":"Dhiman","sequence":"additional","affiliation":[]},{"given":"Nikolay","family":"Atanasov","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,7,6]]},"reference":[{"key":"10118_CR1","doi-asserted-by":"crossref","unstructured":"Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In International conference on machine learning (p. 1).","DOI":"10.1145\/1015330.1015430"},{"issue":"5","key":"10118_CR2","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1016\/j.robot.2008.10.024","volume":"57","author":"BD Argall","year":"2009","unstructured":"Argall, B. D., Chernova, S., Veloso, M., & Browning, B. (2009). A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5), 469\u2013483.","journal-title":"Robotics and Autonomous Systems"},{"key":"10118_CR3","unstructured":"Atkeson, C. G., & Schaal, S. (1997). Robot learning from demonstration. In International conference on machine learning (Vol. 97, pp. 
12\u201320)."},{"issue":"12","key":"10118_CR4","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481\u20132495.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"10118_CR5","unstructured":"Bajcsy, A., Losey, D. P., O\u2019malley, M. K., & Dragan, A. D. (2017). Learning robot objectives from physical human interaction. In Conference on robot learning."},{"key":"10118_CR6","unstructured":"Baker, C. L., Tenenbaum, J. B., & Saxe, R. R. (2007). Goal inference as inverse planning. In Annual meeting of the cognitive science society (Vol. 29)."},{"key":"10118_CR7","volume-title":"Dynamic programming and optimal control","author":"D Bertsekas","year":"1995","unstructured":"Bertsekas, D. (1995). Dynamic programming and optimal control. Athena Scientific."},{"key":"10118_CR8","unstructured":"Brown, D. S., Goo, W., & Niekum, S. (2020). Better-than-demonstrator imitation learning via automatically-ranked demonstrations. In Conference on robot learning (pp. 330\u2013359)."},{"key":"10118_CR9","unstructured":"Chen, L., Paleja, R., & Gombolay, M. (2021). Learning from suboptimal demonstration via self-supervised reward regression. In Conference on robot learning."},{"key":"10118_CR10","doi-asserted-by":"crossref","unstructured":"Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder\u2013decoder with atrous separable convolution for semantic image segmentation. In European conference on computer vision (pp. 801\u2013818).","DOI":"10.1007\/978-3-030-01234-2_49"},{"key":"10118_CR11","unstructured":"Chevalier-Boisvert, M., Willems, L., & Pal, S. (2018). Minimalistic grid-world environment for OpenAI Gym. 
https:\/\/github.com\/maximecb\/gym-minigrid. GitHub."},{"key":"10118_CR12","doi-asserted-by":"crossref","unstructured":"Choy, C., Gwak, J., & Savarese, S. (2019). 4D spatio-temporal convnets: Minkowski convolutional neural networks. In IEEE conference on computer vision and pattern recognition (pp. 3075\u20133084).","DOI":"10.1109\/CVPR.2019.00319"},{"key":"10118_CR13","unstructured":"Cohen, T., & Welling, M. (2016). Group equivariant convolutional networks. In International conference on machine learning (pp. 2990\u20132999)."},{"key":"10118_CR14","doi-asserted-by":"crossref","unstructured":"Cortinhal, T., Tzelepis, G., & Aksoy, E. E. (2020). SalsaNext: Fast, uncertainty-aware semantic segmentation of lidar point clouds for autonomous driving. arXiv preprint arXiv:2003.03653.","DOI":"10.1007\/978-3-030-64559-5_16"},{"key":"10118_CR15","doi-asserted-by":"crossref","unstructured":"Dohan, D., Matejek, B., & Funkhouser, T. (2015). Learning hierarchical semantic segmentations of lidar data. In International conference on 3D vision (pp. 273\u2013281).","DOI":"10.1109\/3DV.2015.38"},{"key":"10118_CR16","unstructured":"Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An open urban driving simulator. In Proceedings of the 1st annual conference on robot learning (pp. 1\u201316)."},{"key":"10118_CR17","unstructured":"Finn, C., Christiano, P., Abbeel, P., & Levine, S. (2016). A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852."},{"key":"10118_CR18","unstructured":"Finn, C., Levine, S., & Abbeel, P. (2016). Guided cost learning: Deep inverse optimal control via policy optimization. In International conference on machine learning (pp. 49\u201358)."},{"key":"10118_CR19","unstructured":"Fu, J., Luo, K., & Levine, S. (2018). Learning robust rewards with adverserial inverse reinforcement learning. 
In International conference on learning representations."},{"issue":"2","key":"10118_CR20","doi-asserted-by":"publisher","first-page":"790","DOI":"10.1109\/LRA.2020.2965390","volume":"5","author":"L Gan","year":"2020","unstructured":"Gan, L., Zhang, R., Grizzle, J. W., Eustice, R. M., & Ghaffari, M. (2020). Bayesian spatial kernel smoothing for scalable dense semantic mapping. IEEE Robotics and Automation Letters, 5(2), 790\u2013797.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"10118_CR21","unstructured":"Ghasemipour, S. K. S., Zemel, R., & Gu, S. (2020). A divergence minimization perspective on imitation learning methods. In Conference on robot learning (pp. 1259\u20131277)."},{"key":"10118_CR22","volume-title":"Deep learning","author":"I Goodfellow","year":"2016","unstructured":"Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press."},{"key":"10118_CR23","doi-asserted-by":"crossref","unstructured":"Gupta, S., Davidson, J., Levine, S., Sukthankar, R., & Malik, J. (2017). Cognitive mapping and planning for visual navigation. In Computer vision and pattern recognition (CVPR).","DOI":"10.1109\/CVPR.2017.769"},{"key":"10118_CR24","unstructured":"Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy-based policies. In International conference on machine learning (pp. 1352\u20131361)."},{"key":"10118_CR25","unstructured":"Ho, J., & Ermon, S. (2016). Generative adversarial imitation learning. In Advances in neural information processing systems (pp. 4565\u20134573)."},{"issue":"3","key":"10118_CR26","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1007\/s10514-012-9321-0","volume":"34","author":"A Hornung","year":"2013","unstructured":"Hornung, A., Wurm, K. M., Bennewitz, M., Stachniss, C., & Burgard, W. (2013). OctoMap: An efficient probabilistic 3D mapping framework based on octrees. 
Autonomous Robots, 34(3), 189\u2013206.","journal-title":"Autonomous Robots"},{"key":"10118_CR27","unstructured":"Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning (Vol. 37, pp. 448\u2013456)."},{"key":"10118_CR28","doi-asserted-by":"publisher","first-page":"1296","DOI":"10.1177\/0278364915581193","volume":"34","author":"A Jain","year":"2015","unstructured":"Jain, A., Sharma, S., Joachims, T., & Saxena, A. (2015). Learning preferences for manipulation tasks from online coactive feedback. The International Journal of Robotics Research, 34, 1296\u20131313.","journal-title":"The International Journal of Robotics Research"},{"key":"10118_CR29","first-page":"4415","volume":"33","author":"HJ Jeon","year":"2020","unstructured":"Jeon, H. J., Milli, S., & Dragan, A. (2020). Reward-rational (implicit) choice: A unifying formalism for reward learning. Advances in Neural Information Processing Systems, 33, 4415\u20134426.","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"7","key":"10118_CR30","doi-asserted-by":"publisher","first-page":"846","DOI":"10.1177\/0278364911406761","volume":"30","author":"S Karaman","year":"2011","unstructured":"Karaman, S., & Frazzoli, E. (2011). Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research, 30(7), 846\u2013894.","journal-title":"The International Journal of Robotics Research"},{"key":"10118_CR31","doi-asserted-by":"crossref","unstructured":"Ke, L., Choudhury, S., Barnes, M., Sun, W., Lee, G., & Srinivasa, S. (2020). Imitation learning as f-divergence minimization. In International workshop on the algorithmic foundations of robotics.","DOI":"10.1007\/978-3-030-66723-8_19"},{"key":"10118_CR32","unstructured":"Khan, A., Zhang, C., Atanasov, N., Karydis, K., Kumar, V., & Lee, D. D. (2018). Memory augmented control networks. 
In International conference on learning representations."},{"key":"10118_CR33","unstructured":"Kingma, D. P., & Ba, J. (2014). ADAM: A method for stochastic optimization. In International conference on learning representations."},{"key":"10118_CR34","unstructured":"LaValle, S. (1998). Rapidly-exploring random trees: A new tool for path planning (TR 98-11). Department of Computer Science, Iowa State University."},{"key":"10118_CR35","unstructured":"Levine, S. (2018). Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909."},{"key":"10118_CR36","unstructured":"Levine, S., Popovic, Z., & Koltun, V. (2011). Nonlinear inverse reinforcement learning with Gaussian processes. In Advances in neural information processing systems (pp. 19\u201327)."},{"issue":"1","key":"10118_CR37","first-page":"1334","volume":"17","author":"S Levine","year":"2016","unstructured":"Levine, S., Finn, C., Darrell, T., & Abbeel, P. (2016). End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1), 1334\u20131373.","journal-title":"The Journal of Machine Learning Research"},{"key":"10118_CR38","unstructured":"Likhachev, M., Gordon, G., & Thrun, S. (2004). ARA*: Anytime A* with provable bounds on sub-optimality. In Advances in neural information processing systems (pp. 767\u2013774)."},{"issue":"2","key":"10118_CR39","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1109\/LRA.2019.2891028","volume":"4","author":"C Lu","year":"2019","unstructured":"Lu, C., van de Molengraft, M. J. G., & Dubbelman, G. (2019). Monocular semantic occupancy grid mapping with convolutional variational encoder\u2013decoder networks. IEEE Robotics and Automation Letters, 4(2), 445\u2013452.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"10118_CR40","doi-asserted-by":"crossref","unstructured":"Milioto, A., Vizzo, I., Behley, J., & Stachniss, C. (2019). 
RangeNet++: Fast and accurate lidar semantic segmentation. In IEEE\/RSJ international conference on intelligent robots and systems (IROS) (pp. 4213\u20134220).","DOI":"10.1109\/IROS40897.2019.8967762"},{"key":"10118_CR41","unstructured":"Neu, G., & Szepesv\u00e1ri, C. (2007). Apprenticeship learning using inverse reinforcement learning and gradient methods. In Conference on uncertainty in artificial intelligence (pp. 295\u2013302)."},{"key":"10118_CR42","unstructured":"Ng, A. Y., & Russell, S. (2000). Algorithms for inverse reinforcement learning. In International conference on machine learning (pp. 663\u2013670)."},{"key":"10118_CR43","doi-asserted-by":"crossref","unstructured":"Oleynikova, H., Taylor, Z., Fehr, M., Siegwart, R., & Nieto, J. (2017). Voxblox: Incremental 3D Euclidean signed distance fields for onboard MAV planning. In IEEE\/RSJ international conference on intelligent robots and systems (IROS) (pp. 1366\u20131373).","DOI":"10.1109\/IROS.2017.8202315"},{"issue":"2\u20133","key":"10118_CR44","doi-asserted-by":"publisher","first-page":"286","DOI":"10.1177\/0278364919880273","volume":"39","author":"Y Pan","year":"2020","unstructured":"Pan, Y., Cheng, C.-A., Saigol, K., Lee, K., Yan, X., Theodorou, E. A., & Boots, B. (2020). Imitation learning for agile autonomous driving. The International Journal of Robotics Research, 39(2\u20133), 286\u2013302.","journal-title":"The International Journal of Robotics Research"},{"key":"10118_CR45","doi-asserted-by":"crossref","unstructured":"Papandreou, G., Chen, L.-C., Murphy, K. P., & Yuille, A. L. (2015). Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. In IEEE international conference on computer vision (pp. 1742\u20131750).","DOI":"10.1109\/ICCV.2015.203"},{"key":"10118_CR46","doi-asserted-by":"crossref","unstructured":"Pastor, P., Hoffmann, H., Asfour, T., & Schaal, S. (2009). Learning and generalization of motor skills by learning from demonstration. 
In IEEE international conference on robotics and automation (pp. 763\u2013768).","DOI":"10.1109\/ROBOT.2009.5152385"},{"key":"10118_CR47","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., & Desmaison, A. (2019). Pytorch: An imperative style, high-performance deep learning library. In Advances in neural information processing systems (pp. 8026\u20138037)."},{"key":"10118_CR48","doi-asserted-by":"crossref","unstructured":"Rajeswaran, A., Kumar, V., Gupta, A., Vezzani, G., Schulman, J., Todorov, E., & Levine, S. (2018). Learning complex dexterous manipulation with deep reinforcement learning and demonstrations. In Proceedings of robotics: Science and systems (RSS).","DOI":"10.15607\/RSS.2018.XIV.049"},{"key":"10118_CR49","unstructured":"Ramachandran, D., & Amir, E. (2007). Bayesian inverse reinforcement learning. In International joint conferences on artificial intelligence organization (Vol. 7, pp. 2586\u20132591)."},{"key":"10118_CR50","doi-asserted-by":"crossref","unstructured":"Ratliff, N. D., Bagnell, J. A., & Zinkevich, M. A. (2006). Maximum margin planning. In International conference on machine learning (pp. 729\u2013736).","DOI":"10.1145\/1143844.1143936"},{"key":"10118_CR51","unstructured":"Ross, S., Gordon, G., & Bagnell, D. (2011). A reduction of imitation learning and structured prediction to no-regret online learning. In International conference on artificial intelligence and statistics (pp. 627\u2013635)."},{"key":"10118_CR52","doi-asserted-by":"crossref","unstructured":"Sengupta, S., Sturgess, P., Ladick\u00fd, L., & Torr, P. H. (2012). Automatic dense visual semantic mapping from street-level imagery. In IEEE\/RSJ international conference on intelligent robots and systems (pp. 
857\u2013862).","DOI":"10.1109\/IROS.2012.6385958"},{"key":"10118_CR53","volume-title":"Minimization methods for nondifferentiable functions","author":"NZ Shor","year":"2012","unstructured":"Shor, N. Z. (2012). Minimization methods for nondifferentiable functions (Vol. 3). Springer."},{"key":"10118_CR54","unstructured":"Song, Y. (2019). Inverse reinforcement learning for autonomous ground navigation using aerial and satellite observation data (unpublished master\u2019s thesis). Carnegie Mellon University."},{"issue":"4","key":"10118_CR55","doi-asserted-by":"publisher","first-page":"3749","DOI":"10.1109\/LRA.2018.2856268","volume":"3","author":"L Sun","year":"2018","unstructured":"Sun, L., Yan, Z., Zaganidis, A., Zhao, C., & Duckett, T. (2018). Recurrent-OctoMap: Learning state-based map refinement for long-term semantic mapping with 3D-Lidar data. IEEE Robotics and Automation Letters, 3(4), 3749\u20133756.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"10118_CR56","doi-asserted-by":"crossref","unstructured":"Tamar, A., Wu, Y., Thomas, G., Levine, S., & Abbeel, P. (2016). Value iteration networks. In Advances in neural information processing systems (pp. 2154\u20132162).","DOI":"10.24963\/ijcai.2017\/700"},{"key":"10118_CR57","unstructured":"Tew, P. A. (2016). An investigation of sparse tensor formats for tensor libraries (unpublished doctoral dissertation). Massachusetts Institute of Technology."},{"key":"10118_CR58","volume-title":"Probabilistic robotics","author":"S Thrun","year":"2005","unstructured":"Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic robotics. MIT Press."},{"key":"10118_CR59","doi-asserted-by":"crossref","unstructured":"Wang, T., Dhiman, V., & Atanasov, N. (2020a). Learning navigation costs from demonstration in partially observable environments. 
In IEEE international conference on robotics and automation.","DOI":"10.1109\/ICRA40945.2020.9197199"},{"key":"10118_CR60","doi-asserted-by":"crossref","unstructured":"Wang, T., Dhiman, V., & Atanasov, N. (2020b). Learning navigation costs from demonstrations with semantic observations. In Conference on learning for dynamics and control.","DOI":"10.1109\/ICRA40945.2020.9197199"},{"key":"10118_CR61","doi-asserted-by":"crossref","unstructured":"Wu, B., Wan, A., Yue, X., & Keutzer, K. (2018). SqueezeSeg: Convolutional neural nets with recurrent CRF for real-time road-object segmentation from 3D lidar point cloud. In International conference on robotics and automation (pp. 1887\u20131893).","DOI":"10.1109\/ICRA.2018.8462926"},{"key":"10118_CR62","doi-asserted-by":"crossref","unstructured":"Wulfmeier, M., Wang, D. Z., & Posner, I. (2016). Watch this: Scalable cost-function learning for path planning in urban environments. In IEEE\/RSJ international conference on intelligent robots and systems (IROS) (pp. 2089\u20132095).","DOI":"10.1109\/IROS.2016.7759328"},{"key":"10118_CR63","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Wang, Z., Merel, J., Rusu, A., Erez, T., Cabi, S., Tunyasuvunakool, S., Kramar, J., Hadsell, R., de Freitas, N., & Heess, N. (2018). Reinforcement and imitation learning for diverse visuomotor skills. In Robotics: Science and systems.","DOI":"10.15607\/RSS.2018.XIV.009"},{"key":"10118_CR64","unstructured":"Ziebart, B. D., Maas, A., Bagnell, J., & Dey, A. K. (2008). Maximum entropy inverse reinforcement learning. In AAAI conference on artificial intelligence (pp. 
1433\u20131438)."}],"container-title":["Autonomous Robots"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-023-10118-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10514-023-10118-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-023-10118-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,28]],"date-time":"2023-07-28T02:26:44Z","timestamp":1690511204000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10514-023-10118-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,6]]},"references-count":64,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10118"],"URL":"https:\/\/doi.org\/10.1007\/s10514-023-10118-4","relation":{},"ISSN":["0929-5593","1573-7527"],"issn-type":[{"value":"0929-5593","type":"print"},{"value":"1573-7527","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,6]]},"assertion":[{"value":"1 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 June 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 July 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}