{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T09:08:35Z","timestamp":1762506515248,"version":"3.41.0"},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2015,3,9]],"date-time":"2015-03-09T00:00:00Z","timestamp":1425859200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Fund for Scientific Research","award":["FWOG.0291.09N: QoS"],"award-info":[{"award-number":["FWOG.0291.09N: QoS"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Auton. Adapt. Syst."],"published-print":{"date-parts":[[2015,3,25]]},"abstract":"<jats:p>In today\u2019s Internet, the commercial aspects of routing are gaining importance. Current technology allows Internet Service Providers (ISPs) to renegotiate contracts online to maximize profits. Changing link prices will influence interdomain routing policies that are now driven by monetary aspects as well as global resource and performance optimization. In this article, we consider an interdomain routing game in which the ISP\u2019s action is to set the price for its transit links. Assuming a cheapest path routing scheme, the optimal action is the price setting that yields the highest utility (i.e., profit) and depends both on the network load and the actions of other ISPs. We adapt a continuous and a discrete action learning automaton (LA) to operate in this framework as a tool that can be used by ISP operators to learn optimal price setting. In our model, agents representing different ISPs learn only on the basis of local information and do not need any central coordination or sensitive information exchange. 
Simulation results show that a single ISP employing LAs is able to learn the optimal price in a stationary environment. By introducing a selective exploration rule, LAs are also able to operate in nonstationary environments. When two ISPs employ LAs, we show that they converge to stable and fair equilibrium strategies.<\/jats:p>","DOI":"10.1145\/2719648","type":"journal-article","created":{"date-parts":[[2015,3,9]],"date-time":"2015-03-09T19:03:01Z","timestamp":1425927781000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["A Reinforcement Learning Approach for Interdomain Routing with Link Prices"],"prefix":"10.1145","volume":"10","author":[{"given":"Peter","family":"Vrancx","sequence":"first","affiliation":[{"name":"Vrije Universiteit Brussel, Brussels, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pasquale","family":"Gurzi","sequence":"additional","affiliation":[{"name":"Vrije Universiteit Brussel, Brussels, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdel","family":"Rodriguez","sequence":"additional","affiliation":[{"name":"Vrije Universiteit Brussel and Universidad Central Marta Abreu de las Villas, Brussels, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kris","family":"Steenhaut","sequence":"additional","affiliation":[{"name":"Vrije Universiteit Brussel, Brussels, Belgium"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ann","family":"Now\u00e9","sequence":"additional","affiliation":[{"name":"Vrije Universiteit Brussel"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2015,3,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1080\/09540091.2014.885268"},{"key":"e_1_2_1_2_1","first-page":"745","article-title":"Optimal transit price negotiation: The distributed learning 
perspective","volume":"14","author":"Barth Dominique","year":"2008","unstructured":"Dominique Barth and Loubna Echabbi. 2008. Optimal transit price negotiation: The distributed learning perspective. Journal of Universal Computer Science 14, 5, 745--765.","journal-title":"Journal of Universal Computer Science"},{"volume-title":"Innovations in Multi-Agent Systems and Applications. Studies in Computational Intelligence","author":"Bu\u015foniu Lucian","key":"e_1_2_1_3_1","unstructured":"Lucian Bu\u015foniu, Robert Babu\u0161ka, and Bart De Schutter. 2010. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications. Studies in Computational Intelligence, Vol. 281. Springer, 183--221."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.mathsocsci.2005.03.001"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/LANMAN.2011.6076922"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0967-0661(99)00141-0"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0957-4158(97)00003-2"},{"volume-title":"Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM\u201910)","author":"Hansan","key":"e_1_2_1_8_1","unstructured":"Hansan T. Karaoglu and Murat Yuksel. 2010. Value flows: Inter-domain routing over contract links. 
In Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM\u201910). 342--347."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851275.1851194"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/NGI.2005.1431661"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1006\/jeth.2001.2950"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-005-2631-2"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 3rd International Conference on Agents and Artificial Intelligence. 473--478","author":"Rodr\u00edguez Abdel","year":"2011","unstructured":"Abdel Rodr\u00edguez, Ricardo Grau, and Ann Now\u00e9. 2011. Continuous action reinforcement learning automata\u2014performance and convergence. In Proceedings of the 3rd International Conference on Agents and Artificial Intelligence. 473--478."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the AAMAS 2012 Workshop on Adaptive and Learning Agents. 17--23","author":"Rodriguez Abdel","year":"2012","unstructured":"Abdel Rodriguez, Peter Vrancx, Ricardo Grau, and Ann Now\u00e9. 2012a. Learning approach to coordinate exploration with limited communication in continuous action games. In Proceedings of the AAMAS 2012 Workshop on Adaptive and Learning Agents. 17--23."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 11th International Conference on Adaptive Agents and Multi-Agent Systems (AAMAS). 
1401--1402","author":"Rodriguez Abdel","year":"2012","unstructured":"Abdel Rodriguez, Peter Vrancx, Ricardo Grau, and Ann Now\u00e9. 2012b. An RL approach to common-interest continuous action games. In Proceedings of the 11th International Conference on Adaptive Agents and Multi-Agent Systems (AAMAS). 1401--1402."},{"key":"e_1_2_1_16_1","unstructured":"Abdel Rodriguez Peter Vrancx and Ann Now\u00e9. In Press. A reinforcement learning approach to coordinate exploration with limited communication in continuous action games. Knowledge Engineering Review 31 2. Available at http:\/\/ai.vub.ac.be\/ALA2012\/KER.html."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASCC.2013.6606290"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2009.2026748"},{"volume-title":"Networks of Learning Automata: Techniques for Online Stochastic Optimization","author":"Thathachar Mandayam A. L.","key":"e_1_2_1_19_1","unstructured":"Mandayam A. L. Thathachar and P. Shanthi Sastry. 2004. Networks of Learning Automata: Techniques for Online Stochastic Optimization. Kluwer Academic."},{"volume-title":"Proceedings of the Platinum Jubilee Conference on System Signal Processing.","author":"Thathachar Mandayam A. L.","key":"e_1_2_1_20_1","unstructured":"Mandayam A. L. Thathachar and P. 
Shanthi Sastry. 1986. Estimator algorithms for learning automata. In Proceedings of the Platinum Jubilee Conference on System Signal Processing."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2350124.2350127"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2018436.2018459"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00199-008-0338-8"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2008.920998"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.1986.1104342"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0019-9958(77)90354-0"}],"container-title":["ACM Transactions on Autonomous and Adaptive Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2719648","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2719648","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:55:54Z","timestamp":1750272954000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2719648"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,3,9]]},"references-count":26,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2015,3,25]]}},"alternative-id":["10.1145\/2719648"],"URL":"https:\/\/doi.org\/10.1145\/2719648","relation":{},"ISSN":["1556-4665","1556-4703"],"issn-type":[{"type":"print","value":"1556-4665"},{"type":"electronic","value":"1556-4703"}],"subject":[],"published":{"date-parts":[[2015,3,9]]},"assertion":[{"value":"2013-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-03-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}