{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T15:07:08Z","timestamp":1779030428145,"version":"3.51.4"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T00:00:00Z","timestamp":1778544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"name":"Infosys Foundation and the Department of Science and Technology, Government of India"},{"name":"iHub Anubhuti-IIITD Foundation, New Delhi, India"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2026,7,31]]},"abstract":"<jats:p>As robots (edge-devices, agents) find uses in an increasing number of settings and edge-cloud resources become pervasive, wireless networks will often be shared by flows of data traffic that result from communication between agents and their corresponding edge-cloud nodes (cloud compute or data resource accessed by an agent). In such a setting, any agent communicating with the edge-cloud is unaware of the state of the network resource, which evolves in response to not just the agent\u2019s own communication at any given time but also to communication by the other agents, which stays unknown to the agent.<\/jats:p>\n                  <jats:p>We address the challenge of an agent learning a policy that allows it to decide whether or not to communicate with its cloud node, using limited feedback it obtains from its own attempts to communicate, with the goal of optimizing its utility. The policy must generalize well to any number of other agents sharing the network and must not be trained for any particular network configuration. 
Our proposed policy is a deep reinforcement learning model Query Net (QNet) that we train using a proposed simulation-to-real framework. Our simulation model has just one parameter and is agnostic to specific configurations of any wireless network. It however allows training an agent\u2019s policy over a wide range of outcomes that an agent\u2019s communication with its edge-cloud node may face when using a shared network, by suitably randomizing the simulation parameter. We propose a learning algorithm that addresses the challenges we observe in training QNet. We validate our simulation-to-real driven approach through experiments conducted on real wireless networks including WiFi and cellular. We compare QNet with other policies to demonstrate its efficacy. Our WiFi experiments involved as few as five agents, resulting in barely any contention for the network, to as many as 50 agents, resulting in severe contention. The cellular experiments spanned a broad range of network conditions, with baseline network round-trip times ranging from a low of 0.07\u2009s to a high of 0.83\u2009s.<\/jats:p>","DOI":"10.1145\/3786203","type":"journal-article","created":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T13:04:51Z","timestamp":1766495091000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning to Communicate over an Unknown Shared Network"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0240-9454","authenticated-orcid":false,"given":"Shivangi","family":"Agarwal","sequence":"first","affiliation":[{"name":"Department of CSE, IIIT-Delhi, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1090-5367","authenticated-orcid":false,"given":"Adi","family":"Asija","sequence":"additional","affiliation":[{"name":"IIIT-Delhi, New Delhi, India and Department of ECE, Johns Hopkins University, 
Baltimore, Maryland, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5867-8584","authenticated-orcid":false,"given":"Sanjit","family":"K. Kaul","sequence":"additional","affiliation":[{"name":"Department of ECE, IIIT-Delhi, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2586-7308","authenticated-orcid":false,"given":"Arani","family":"Bhattacharya","sequence":"additional","affiliation":[{"name":"Department of CSE and ECE, IIIT-Delhi, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6229-3940","authenticated-orcid":false,"given":"Saket","family":"Anand","sequence":"additional","affiliation":[{"name":"Department of CSE and ECE, IIIT-Delhi, New Delhi, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,5,12]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3736419"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3387514.3405892"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2024.3362889"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2024.3418675"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS47612.2022.9981319"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/iros47612.2022.9981565"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/IOTM.001.2300102"},{"key":"e_1_3_3_9_2","first-page":"1615","article-title":"Robust multi-agent reinforcement learning method based on adversarial domain randomization for real-world dual-UAV cooperation","volume":"1","author":"Chen Shutong","year":"2023","unstructured":"Shutong Chen, Guanjun Liu, Ziyuan Zhou, Kaiwen Zhang, and Jiacun Wang. 2023. 
Robust multi-agent reinforcement learning method based on adversarial domain randomization for real-world dual-UAV cooperation. IEEE Transactions on Intelligent Vehicles 1 (2023), 1615\u20131627.","journal-title":"IEEE Transactions on Intelligent Vehicles"},{"key":"e_1_3_3_10_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Chen Xiaoyu","year":"2022","unstructured":"Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, and Liwei Wang. 2022. Understanding domain randomization for sim-to-real transfer. In Proceedings of the International Conference on Learning Representations. ICLR. Retrieved from https:\/\/openreview.net\/forum?id=T8vZHIRTrY"},{"key":"e_1_3_3_11_2","unstructured":"Petros Christodoulou. 2019. Soft actor-critic for discrete action settings. arXiv:1910.07207. Retrieved from https:\/\/arxiv.org\/abs\/1910.07207"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2023.104585"},{"key":"e_1_3_3_13_2","first-page":"1538","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Das Abhishek","year":"2019","unstructured":"Abhishek Das, Th\u00e9ophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Mike Rabbat, and Joelle Pineau. 2019. Tarmac: Targeted multi-agent communication. In Proceedings of the International Conference on Machine Learning. PMLR, 1538\u20131546."},{"key":"e_1_3_3_14_2","first-page":"22069","article-title":"Learning individually inferred communication for multi-agent cooperation","volume":"33","author":"Ding Ziluo","year":"2020","unstructured":"Ziluo Ding, Tiejun Huang, and Zongqing Lu. 2020. Learning individually inferred communication for multi-agent cooperation. 
Advances in Neural Information Processing Systems 33 (2020), 22069\u201322079.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_15_2","first-page":"2145","article-title":"Learning to communicate with deep multi-agent reinforcement learning","volume":"29","author":"Foerster Jakob","year":"2016","unstructured":"Jakob Foerster, Ioannis Alexandros Assael, Nando De Freitas, and Shimon Whiteson. 2016. Learning to communicate with deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems 29 (2016), 2145\u20132153.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2022.3160932"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s43154-022-00090-9"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-021-09996-w"},{"key":"e_1_3_3_19_2","first-page":"1861","volume-title":"Proceedings of International Conference on Machine Learning","author":"Haarnoja Tuomas","year":"2018","unstructured":"Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of International Conference on Machine Learning. PMLR, 1861\u20131870."},{"key":"e_1_3_3_20_2","unstructured":"Tuomas Haarnoja Aurick Zhou Kristian Hartikainen George Tucker Sehoon Ha Jie Tan Vikash Kumar Henry Zhu Abhishek Gupta Pieter Abbeel et al. 2018. Soft actor-critic algorithms and applications. arXiv:1812.05905. 
Retrieved from https:\/\/arxiv.org\/abs\/1812.05905"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.5555\/3545946.3598669"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2022.3207619"},{"key":"e_1_3_3_23_2","unstructured":"Jakob Hoydis Sebastian Cammerer Fay\u00e7al Ait Aoudia Merlin Nimier-David Lorenzo Maggi Guillermo Marcus Avinash Vem and Alexander Keller. 2022. Sionna. Retrieved from https:\/\/nvlabs.github.io\/sionna\/"},{"key":"e_1_3_3_24_2","first-page":"467","volume-title":"Proceedings of the Asian Conference on Machine Learning","author":"Hu Diyi","year":"2023","unstructured":"Diyi Hu, Chi Zhang, Viktor Prasanna, and Bhaskar Krishnamachari. 2023. Learning practical communication strategies in cooperative multi-agent reinforcement learning. In Proceedings of the Asian Conference on Machine Learning. PMLR, 467\u2013482."},{"key":"e_1_3_3_25_2","unstructured":"Guangzheng Hu Yuanheng Zhu Dongbin Zhao Mengchen Zhao and Jianye Hao. 2020. Event-triggered multi-agent reinforcement learning with communication under limited-bandwidth constraint. arXiv:2010.04978. Retrieved from https:\/\/arxiv.org\/abs\/2010.04978"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302509.3313784"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3013848"},{"key":"e_1_3_3_28_2","first-page":"1072","volume-title":"Proceedings of 5th Annual Learning for Dynamics and Control Conference","author":"Kesper Lukas","year":"2023","unstructured":"Lukas Kesper, Sebastian Trimpe, and Dominik Baumann. 2023. Toward multi-agent reinforcement learning for distributed event-triggered control. In Proceedings of 5th Annual Learning for Dynamics and Control Conference. Nikolai Matni, Manfred Morari, and George J. Pappas (Eds.), Vol. 211, PMLR, 1072\u20131085. 
Retrieved from https:\/\/proceedings.mlr.press\/v211\/kesper23a.html"},{"key":"e_1_3_3_29_2","unstructured":"Daewoo Kim Sangwoo Moon David Hostallero Wan Ju Kang Taeyoung Lee Kyunghwan Son and Yung Yi. 2019. Learning to schedule communication in multi-agent reinforcement learning. arXiv:1902.01554. Retrieved from https:\/\/arxiv.org\/abs\/1902.01554"},{"key":"e_1_3_3_30_2","doi-asserted-by":"crossref","first-page":"1476","DOI":"10.1109\/LRA.2023.3342561","article-title":"Robust MADER: Decentralized multiagent trajectory planner robust to communication delay in dynamic environments","volume":"9","author":"Kondo Kota","year":"2023","unstructured":"Kota Kondo, Reinaldo Figueroa, Juan Rached, Jesus Tordesillas, Parker C. Lusk, and Jonathan P. How. 2023. Robust MADER: Decentralized multiagent trajectory planner robust to communication delay in dynamic environments. IEEE Robotics and Automation Letters 9 (2023), 1476\u20131483.","journal-title":"IEEE Robotics and Automation Letters"},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpat.2022.102580"},{"issue":"3","key":"e_1_3_3_32_2","article-title":"Recent development and applications of SUMO-Simulation of urban MObility","volume":"5","author":"Krajzewicz Daniel","year":"2012","unstructured":"Daniel Krajzewicz, Jakob Erdmann, Michael Behrisch, and Laura Bieker. 2012. Recent development and applications of SUMO-Simulation of urban MObility. 
International Journal on Advances in Systems and Measurements 5, 3\u20134 (2012), 128\u2013138.","journal-title":"International Journal on Advances in Systems and Measurements"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2018.2820810"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2023.103638"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2019.2915983"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1609\/AAAI.v34i04.5957"},{"key":"e_1_3_3_37_2","unstructured":"Federico Mason Federico Chiariotti Andrea Zanella and Petar Popovski. 2023. Multi-agent reinforcement learning for pragmatic communication and control. arXiv:2302.14399. Retrieved from https:\/\/arxiv.org\/abs\/2302.14399"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3073973"},{"key":"e_1_3_3_39_2","first-page":"417","volume-title":"Proceedings of the 2015 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC \u201915)","author":"Netravali Ravi","year":"2015","unstructured":"Ravi Netravali, Anirudh Sivaraman, Somak Das, Ameesh Goyal, Keith Winstein, James Mickens, and Hari Balakrishnan. 2015. Mahimahi: Accurate record-and-replay for HTTP. In Proceedings of the 2015 USENIX Conference on USENIX Annual Technical Conference (USENIX ATC \u201915). USENIX Association, 417\u2013429."},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpat.2019.101933"},{"key":"e_1_3_3_41_2","first-page":"3803","volume-title":"Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA)","author":"Bin Peng Xue","year":"2018","unstructured":"Xue Bin Peng, Marcin Andrychowicz, Wojciech Zaremba, and Pieter Abbeel. 2018. Sim-to-real transfer of robotic control with dynamics randomization. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA). 
IEEE, 3803\u20133810."},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.65109\/MTWP6998"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.2022.3229213"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.02402"},{"key":"e_1_3_3_45_2","unstructured":"Amanpreet Singh Tushar Jain and Sainbayar Sukhbaatar. 2018. Learning when to communicate at scale in multiagent cooperative and competitive tasks. arXiv:1812.09755. Retrieved from https:\/\/arxiv.org\/abs\/1812.09755"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157348"},{"key":"e_1_3_3_47_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_3_48_2","doi-asserted-by":"crossref","unstructured":"Gabriele Tiboni Andrea Protopapa Tatiana Tommasi and Giuseppe Averta. 2023. Domain randomization for robust affordable and effective closed-loop control of soft robots. arXiv:2303.04136. Retrieved from https:\/\/arxiv.org\/abs\/2303.04136","DOI":"10.1109\/IROS55552.2023.10342537"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.62.1805"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341617"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCOMM.2016.2601087"},{"key":"e_1_3_3_52_2","first-page":"9908","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Wang Rundong","year":"2020","unstructured":"Rundong Wang, Xu He, Runsheng Yu, Wei Qiu, Bo An, and Zinovi Rabinovich. 2020. Learning efficient multi-agent communication: An information bottleneck approach. In Proceedings of the International Conference on Machine Learning. 
PMLR, 9908\u20139918."},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.5555\/2482626.2482670"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CDC40024.2019.9030168"},{"key":"e_1_3_3_55_2","first-page":"317","volume-title":"Proceedings of Conference on Robot Learning","author":"Xie Zhaoming","year":"2020","unstructured":"Zhaoming Xie, Patrick Clary, Jeremy Dao, Pedro Morais, Jonathan Hurst, and Michiel Panne. 2020. Learning locomotion skills for Cassie: Iterative design and sim-to-real. In Proceedings of Conference on Robot Learning. PMLR, Massachusetts Institute of Technology, 317\u2013329."},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9560837"},{"issue":"2","key":"e_1_3_3_57_2","first-page":"1080","article-title":"Multiplexing URLLC traffic within eMBB services in 5G NR: Fair scheduling","volume":"69","author":"Yin Hao","year":"2020","unstructured":"Hao Yin, Lyutianyang Zhang, and Sumit Roy. 2020. Multiplexing URLLC traffic within eMBB services in 5G NR: Fair scheduling. IEEE Transactions on Communications 69, 2 (2020), 1080\u20131093.","journal-title":"IEEE Transactions on Communications"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.5555\/3545946.3598752"},{"key":"e_1_3_3_59_2","unstructured":"Lei Yuan Ziqian Zhang Lihe Li Cong Guan and Yang Yu. 2023. A survey of progress on cooperative multi-agent reinforcement learning in open environment. arXiv:2312.01058. 
Retrieved from https:\/\/arxiv.org\/abs\/2312.01058"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVT.2023.3307409"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i10.26389"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-60990-0_12"},{"key":"e_1_3_3_63_2","volume-title":"Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control","author":"Zhang Sai Qian","year":"2019","unstructured":"Sai Qian Zhang, Qi Zhang, and Jieyu Lin. 2019. Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control. Curran Associates Inc."},{"key":"e_1_3_3_64_2","first-page":"17271","article-title":"Succinct and robust multi-agent communication with temporal message control","volume":"33","author":"Zhang Sai Qian","year":"2020","unstructured":"Sai Qian Zhang, Qi Zhang, and Jieyu Lin. 2020. Succinct and robust multi-agent communication with temporal message control. Advances in Neural Information Processing Systems 33 (2020), 17271\u201317282.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3116700"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2024.3445583"},{"key":"e_1_3_3_67_2","unstructured":"Ziyuan Zhou Guanjun Liu and Ying Tang. 2023. Multi-agent reinforcement learning: Methods applications visionary prospects and challenges. arXiv:2305.10091. 
Retrieved from https:\/\/arxiv.org\/abs\/2305.10091"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-023-09633-6"}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3786203","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T14:36:31Z","timestamp":1779028591000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3786203"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,12]]},"references-count":67,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,7,31]]}},"alternative-id":["10.1145\/3786203"],"URL":"https:\/\/doi.org\/10.1145\/3786203","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,12]]},"assertion":[{"value":"2025-05-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-12-09","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-05-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}