{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T15:58:46Z","timestamp":1776182326108,"version":"3.50.1"},"reference-count":85,"publisher":"SAGE Publications","issue":"7","license":[{"start":{"date-parts":[[2020,5,31]],"date-time":"2020-05-31T00:00:00Z","timestamp":1590883200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"NSFC\/RGC Joint Research Scheme","award":["HKU103\/16"],"award-info":[{"award-number":["HKU103\/16"]}]},{"DOI":"10.13039\/501100012479","name":"general research fund of shanghai normal university","doi-asserted-by":"publisher","award":["11202119"],"award-info":[{"award-number":["11202119"]}],"id":[{"id":"10.13039\/501100012479","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012479","name":"general research fund of shanghai normal university","doi-asserted-by":"publisher","award":["11207818"],"award-info":[{"award-number":["11207818"]}],"id":[{"id":"10.13039\/501100012479","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2020,6]]},"abstract":"<jats:p> Developing a safe and efficient collision-avoidance policy for multiple robots is challenging in the decentralized scenarios where each robot generates its paths with limited observation of other robots\u2019 states and intentions. Prior distributed multi-robot collision-avoidance systems often require frequent inter-robot communication or agent-level features to plan a local collision-free action, which is not robust and computationally prohibitive. In addition, the performance of these methods is not comparable with their centralized counterparts in practice. 
In this article, we present a decentralized sensor-level collision-avoidance policy for multi-robot systems, which shows promising results in practical applications. In particular, our policy directly maps raw sensor measurements to an agent\u2019s steering commands in terms of the movement velocity. As a first step toward reducing the performance gap between decentralized and centralized methods, we present a multi-scenario multi-stage training framework to learn an optimal policy. The policy is trained over a large number of robots in rich, complex environments simultaneously using a policy-gradient-based reinforcement-learning algorithm. The learning algorithm is also integrated into a hybrid control framework to further improve the policy\u2019s robustness and effectiveness. We validate the learned sensor-level collision-avoidance policy in a variety of simulated and real-world scenarios with thorough performance evaluations for large-scale multi-robot systems. The generalization of the learned policy is verified in a set of unseen scenarios including the navigation of a group of heterogeneous robots and a large-scale scenario with 100 robots. Although the policy is trained using simulation data only, we have successfully deployed it on physical robots with shapes and dynamics characteristics that are different from the simulated agents, in order to demonstrate the controller\u2019s robustness against the simulation-to-real modeling error. Finally, we show that the collision-avoidance policy learned from multi-robot navigation tasks provides an excellent solution for safe and effective autonomous navigation for a single robot working in a dense real human crowd. Our learned policy enables a robot to make effective progress in a crowd without getting stuck. More importantly, the policy has been successfully deployed on different types of physical robot platforms without tedious parameter tuning. 
Videos are available at https:\/\/sites.google.com\/view\/hybridmrca . <\/jats:p>","DOI":"10.1177\/0278364920916531","type":"journal-article","created":{"date-parts":[[2020,6,1]],"date-time":"2020-06-01T04:14:21Z","timestamp":1590984861000},"page":"856-892","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":303,"title":["Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios"],"prefix":"10.1177","volume":"39","author":[{"given":"Tingxiang","family":"Fan","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Hong Kong, Hong Kong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8440-3218","authenticated-orcid":false,"given":"Pinxin","family":"Long","sequence":"additional","affiliation":[{"name":"Baidu Research, Baidu, Inc., Beijing, China"}]},{"given":"Wenxi","family":"Liu","sequence":"additional","affiliation":[{"name":"College of Mathematics and Computer Science, Fuzhou University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9003-2054","authenticated-orcid":false,"given":"Jia","family":"Pan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Hong Kong, Hong Kong, China"}]}],"member":"179","published-online":{"date-parts":[[2020,5,31]]},"reference":[{"key":"bibr1-0278364920916531","volume-title":"Proceedings of the Conference on Autonomous Robot Systems and Competitions","author":"Adouane 
L","year":"2009"},{"key":"bibr2-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.110"},{"key":"bibr3-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917719333"},{"key":"bibr4-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2018.2793890"},{"key":"bibr5-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32723-0_15"},{"key":"bibr6-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-48119-2_14"},{"key":"bibr7-0278364920916531","first-page":"173","volume-title":"International Conference on International Conference on Machine Learning","author":"Amodei D","year":"2016"},{"key":"bibr8-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/70.736776"},{"key":"bibr9-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915576234"},{"key":"bibr10-0278364920916531","author":"Barreto A","year":"2017","journal-title":"Proceedings of Neural Information Processing Systems"},{"key":"bibr11-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553380"},{"key":"bibr12-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910396552"},{"key":"bibr13-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364909104290"},{"key":"bibr14-0278364920916531","author":"Chen YF","year":"2017","journal-title":"arXiv preprint arXiv:1703.08862"},{"key":"bibr15-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989037"},{"key":"bibr16-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386125"},{"key":"bibr17-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574707004092"},{"key":"bibr18-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1016\/S0005-1098(01)00185-6"},{"key":"bibr19-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910365417"},{"key":"bibr20-0278364920916531","author":"Everett 
M","year":"2018","journal-title":"arXiv preprint arXiv:1805.01956"},{"key":"bibr21-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2012.6225245"},{"key":"bibr22-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/100.580977"},{"key":"bibr23-0278364920916531","author":"Frans K","year":"2017","journal-title":"arXiv preprint arXiv:1710.09767"},{"key":"bibr24-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364910387173"},{"key":"bibr25-0278364920916531","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","author":"Godoy J","year":"2016"},{"key":"bibr26-0278364920916531","first-page":"294","author":"Godoy J","year":"2016","journal-title":"IJCAI"},{"key":"bibr27-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"bibr28-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00240"},{"key":"bibr29-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1998.677268"},{"key":"bibr30-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"bibr31-0278364920916531","author":"Heess N","year":"2017","journal-title":"arXiv preprint arXiv:1707.02286"},{"key":"bibr32-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.51.4282"},{"key":"bibr33-0278364920916531","first-page":"147","volume-title":"Proceedings of the International Conference on Autonomous Agents and Multiagent Systems","author":"Hennes D","year":"2012"},{"key":"bibr34-0278364920916531","author":"Horgan D","year":"2018","journal-title":"arXiv preprint arXiv:1803.00933"},{"key":"bibr35-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364911413479"},{"key":"bibr36-0278364920916531","author":"Kahn G","year":"2017","journal-title":"arXiv preprint 
arXiv:1702.01182"},{"key":"bibr37-0278364920916531","doi-asserted-by":"publisher","DOI":"10.23919\/ACC.2004.1384629"},{"key":"bibr38-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914555543"},{"key":"bibr39-0278364920916531","author":"Kingma D","year":"2014","journal-title":"arXiv preprint arXiv:1412.6980"},{"key":"bibr40-0278364920916531","first-page":"1097","author":"Krizhevsky A","year":"2012","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr41-0278364920916531","author":"Lenz I","year":"2015","journal-title":"Proceedings of the Robotics: Science and System"},{"issue":"39","key":"bibr42-0278364920916531","first-page":"1","volume":"17","author":"Levine S","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"bibr43-0278364920916531","author":"Lillicrap TP","year":"2015","journal-title":"arXiv preprint arXiv:1509.02971"},{"key":"bibr44-0278364920916531","volume-title":"Proceedings of the IEEE International Conference on Robotics and Automation","author":"Long P","year":"2017"},{"key":"bibr45-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2017.2651371"},{"key":"bibr46-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2011.6095085"},{"key":"bibr47-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-010-9205-0"},{"key":"bibr48-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"bibr49-0278364920916531","first-page":"739","author":"Muller U","year":"2006","journal-title":"Proceedings of Advances in Neural Information Processing Systems"},{"key":"bibr50-0278364920916531","first-page":"807","volume-title":"Proceedings of International Conference on Machine Learning","author":"Nair V","year":"2010"},{"key":"bibr51-0278364920916531","volume-title":"Proceedings of Robotics: Science and Systems, Workshop on Limits and Potentials of Deep Learning in Robotics","author":"Ondruska 
P","year":"2016"},{"key":"bibr52-0278364920916531","volume-title":"Proceedings of AAAI Conference on Artificial Intelligence","author":"Ondruska P","year":"2016"},{"key":"bibr53-0278364920916531","author":"Peng XB","year":"2017","journal-title":"arXiv preprint arXiv:1710.06537"},{"key":"bibr54-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989182"},{"key":"bibr55-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2004.1302413"},{"key":"bibr56-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2013.6630809"},{"key":"bibr57-0278364920916531","volume-title":"Conference on Robot Learning","author":"Rusu AA","year":"2016"},{"key":"bibr58-0278364920916531","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2017.XIII.034"},{"key":"bibr59-0278364920916531","author":"Schaul T","year":"2015","journal-title":"arXiv preprint arXiv:1511.05952"},{"key":"bibr60-0278364920916531","first-page":"1889","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Schulman J","year":"2015"},{"key":"bibr61-0278364920916531","author":"Schulman J","year":"2015","journal-title":"arXiv preprint arXiv:1506.02438"},{"key":"bibr62-0278364920916531","author":"Schulman J","year":"2017","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"bibr63-0278364920916531","author":"Schulman J","year":"2017","journal-title":"arXiv preprint arXiv:1707.06347"},{"key":"bibr64-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/027836498300200304"},{"key":"bibr65-0278364920916531","volume-title":"Proceedings of Australasian Conference on Robotics and Automation","author":"Sergeant 
J","year":"2015"},{"key":"bibr66-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2014.11.006"},{"key":"bibr67-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ACC.2007.4282736"},{"key":"bibr68-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2010.5652073"},{"key":"bibr69-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2011.2120810"},{"key":"bibr70-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00025-9"},{"key":"bibr71-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2009.2027384"},{"key":"bibr72-0278364920916531","volume-title":"Proceedings of the IEEE\/RSJ International Conference on Intelligent Robots and Systems","author":"Tai L","year":"2017"},{"key":"bibr73-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364917741532"},{"key":"bibr74-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8202133"},{"key":"bibr75-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913515307"},{"key":"bibr76-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19457-3_1"},{"key":"bibr77-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-19457-3_1"},{"key":"bibr78-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2008.4543489"},{"key":"bibr79-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"bibr80-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_16"},{"key":"bibr81-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2016.2593448"},{"key":"bibr82-0278364920916531","author":"Zhang J","year":"2016","journal-title":"arXiv preprint 
arXiv:1612.05533"},{"key":"bibr83-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487175"},{"key":"bibr84-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2017.2656241"},{"key":"bibr85-0278364920916531","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989381"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364920916531","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364920916531","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364920916531","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T10:44:31Z","timestamp":1740825871000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364920916531"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5,31]]},"references-count":85,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2020,6]]}},"alternative-id":["10.1177\/0278364920916531"],"URL":"https:\/\/doi.org\/10.1177\/0278364920916531","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,31]]}}}