{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T19:33:55Z","timestamp":1775072035289,"version":"3.50.1"},"reference-count":89,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,7,20]],"date-time":"2022-07-20T00:00:00Z","timestamp":1658275200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,20]],"date-time":"2022-07-20T00:00:00Z","timestamp":1658275200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reinforcement learning (RL) techniques support building solutions for sequential decision-making problems under uncertainty and ambiguity. In RL, an agent guided by a reward function interacts with a dynamic environment to find an optimal policy. RL has several associated problems: the reward function must be specified in advance, it is difficult to design, and RL struggles to handle large, complex problems. This led to the development of inverse reinforcement learning (IRL). IRL in turn suffers from many real-life problems, such as the lack of robust reward functions and ill-posedness, and different solutions have been proposed to address them, such as maximum entropy, support for multiple rewards, and non-linear reward functions. There are eight major problems associated with IRL, and eight solutions have been proposed to solve them. This paper proposes a hybrid fuzzy AHP\u2013TOPSIS approach to prioritize these solutions when implementing IRL. The Fuzzy Analytical Hierarchical Process (FAHP) is used to obtain the weights of the identified problems. 
The relative accuracy and root-mean-squared error using FAHP are 97.74 and 0.0349, respectively. The Fuzzy Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) uses these FAHP weights to prioritize the solutions. The most significant problem in IRL implementation is the \u2018lack of robust reward functions\u2019, with a weight of 0.180, whereas the most significant solution is \u2018Supports optimal policy and rewards functions along with stochastic transition models\u2019, with a closeness coefficient (CofC) of 0.967156846.<\/jats:p>","DOI":"10.1007\/s40747-022-00807-5","type":"journal-article","created":{"date-parts":[[2022,7,20]],"date-time":"2022-07-20T05:02:58Z","timestamp":1658293378000},"page":"493-513","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":95,"title":["Hybrid fuzzy AHP\u2013TOPSIS approach to prioritizing solutions for inverse reinforcement learning"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9760-0824","authenticated-orcid":false,"given":"Vinay","family":"Kukreja","sequence":"first","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,20]]},"reference":[{"issue":"3","key":"807_CR1","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1108\/17563781211255862","volume":"5","author":"S Zhifei","year":"2012","unstructured":"Zhifei S, Joo EM (2012) A survey of inverse reinforcement learning techniques. Int J Intell Comput Cybern 5(3):293\u2013311. https:\/\/doi.org\/10.1108\/17563781211255862","journal-title":"Int J Intell Comput Cybern"},{"issue":"5","key":"807_CR2","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1016\/j.robot.2008.10.024","volume":"57","author":"BD Argall","year":"2009","unstructured":"Argall BD, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469\u2013483. 
https:\/\/doi.org\/10.1016\/j.robot.2008.10.024","journal-title":"Robot Auton Syst"},{"key":"807_CR3","doi-asserted-by":"publisher","unstructured":"Datta P, Sharma B (2017) A survey on IoT architectures, protocols, security and smart city based applications. In: 8th IEEE International Conference on Computing, Communications and Networking Technologies, ICCCNT 2017, 1\u20135. https:\/\/doi.org\/10.1109\/ICCCNT.2017.8203943","DOI":"10.1109\/ICCCNT.2017.8203943"},{"issue":"6","key":"807_CR4","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/978-3-319-15425-1_6","volume":"3","author":"S Schaal","year":"1999","unstructured":"Schaal S (1999) Is imitation learning the route to humanoid robots? Trends Cogn Sci 3(6):97\u2013114. https:\/\/doi.org\/10.1007\/978-3-319-15425-1_6","journal-title":"Trends Cogn Sci"},{"key":"807_CR5","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.cobeha.2019.04.010","volume":"29","author":"J Jara-Ettinger","year":"2019","unstructured":"Jara-Ettinger J (2019) Theory of mind as inverse reinforcement learning. Curr Opin Behav Sci 29:105\u2013110. https:\/\/doi.org\/10.1016\/j.cobeha.2019.04.010","journal-title":"Curr Opin Behav Sci"},{"key":"807_CR6","doi-asserted-by":"crossref","unstructured":"Russell S (1998) Learning agents for uncertain environments. In: Proceedings of the Annual ACM Conference on Computational Learning Theory, 101\u2013103.","DOI":"10.1145\/279943.279964"},{"key":"807_CR7","first-page":"2","volume":"1","author":"AY Ng","year":"2000","unstructured":"Ng AY, Russell S (2000) Algorithms for inverse reinforcement learning. ICML 1:2\u20139","journal-title":"ICML"},{"key":"807_CR8","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-29946-9_27","author":"C Dimitrakakis","year":"2012","unstructured":"Dimitrakakis C, Rothkopf CA (2012) Bayesian multitask inverse reinforcement learning. Eur Worksh Reinforce Learn. 
https:\/\/doi.org\/10.1007\/978-3-642-29946-9_27","journal-title":"Eur Worksh Reinforce Learn"},{"key":"807_CR9","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3051012","author":"M Imani","year":"2021","unstructured":"Imani M, Ghoreishi SF (2021) Scalable inverse reinforcement learning through multifidelity bayesian optimization. IEEE Trans Neural Netw Learn Syst. https:\/\/doi.org\/10.1109\/TNNLS.2021.3051012","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"807_CR10","unstructured":"Ni T, Sikchi H, Wang Y, Gupta T, Lee L, Eysenbach B (2020) f-IRL: inverse reinforcement learning via state marginal matching. ArXiv: 1\u201325"},{"issue":"2","key":"807_CR11","doi-asserted-by":"publisher","first-page":"304","DOI":"10.1109\/TPAMI.2018.2873794","volume":"42","author":"N Rhinehart","year":"2020","unstructured":"Rhinehart N, Kitani KM (2020) First-person activity forecasting from video with online inverse reinforcement learning. IEEE Trans Pattern Anal Mach Intell 42(2):304\u2013317. https:\/\/doi.org\/10.1109\/TPAMI.2018.2873794","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"807_CR12","doi-asserted-by":"publisher","unstructured":"Majumdar A, Singh S, Mandlekar A, Pavone M (2017) Risk-sensitive inverse reinforcement learning via coherent risk models. Robot Sci Syst 16: 117\u2013126. https:\/\/doi.org\/10.15607\/rss.2017.xiii.069","DOI":"10.15607\/rss.2017.xiii.069"},{"key":"807_CR13","doi-asserted-by":"crossref","unstructured":"Pirotta M, Restelli M (2016) Inverse reinforcement learning through policy gradient minimization. In: 30th AAAI Conference on Artificial Intelligence, AAAI 2016, 1993\u20131999.","DOI":"10.1609\/aaai.v30i1.10313"},{"key":"807_CR14","unstructured":"Qureshi AH, Boots B, Yip MC (2018) Adversarial imitation via variational inverse reinforcement learning, 1\u201314. arXiv:1809.06404"},{"key":"807_CR15","unstructured":"Shiarlis K, Messias J, Whiteson S (2016) Inverse reinforcement learning from Failure. 
In: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 1060\u20131068"},{"key":"807_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.robot.2019.01.003","volume":"114","author":"C You","year":"2019","unstructured":"You C, Lu J, Filev D, Tsiotras P (2019) Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning. Robot Auton Syst 114:1\u201318. https:\/\/doi.org\/10.1016\/j.robot.2019.01.003","journal-title":"Robot Auton Syst"},{"key":"807_CR17","doi-asserted-by":"publisher","unstructured":"Kangasr\u00e4\u00e4si\u00f6 A, Kaski S (2018) Inverse reinforcement learning from summary data. Mach Learn 107: 1517\u20131535. https:\/\/doi.org\/10.1007\/s10994-018-5730-4","DOI":"10.1007\/s10994-018-5730-4"},{"key":"807_CR18","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1109\/CEC.2012.6256507","volume":"2012","author":"Z Shao","year":"2012","unstructured":"Shao Z, Er MJ (2012) A review of inverse reinforcement learning theory and recent advances. IEEE Cong Evolut Comput CEC 2012:10\u201315. https:\/\/doi.org\/10.1109\/CEC.2012.6256507","journal-title":"IEEE Cong Evolut Comput CEC"},{"key":"807_CR19","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04174-7_3","author":"M Lopes","year":"2009","unstructured":"Lopes M, Melo F, Montesano L (2009) Active learning for reward estimation in inverse reinforcement learning. Jt Eur Conf Mach Learn Knowl Discov Datab. https:\/\/doi.org\/10.1007\/978-3-642-04174-7_3","journal-title":"Jt Eur Conf Mach Learn Knowl Discov Datab"},{"key":"807_CR20","unstructured":"Brown DS, Goo W, Nagarajan P, Niekum S (2019) Extrapolating beyond suboptimal demonstrations via inverse reinforcement learning from observations. 
In: International Conference on Machine Learning, 783\u2013792."},{"key":"807_CR21","doi-asserted-by":"publisher","unstructured":"Ziebart BD, Maas A, Bagnell JA, Dey AK (2008) Maximum entropy inverse reinforcement learning. In: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 1433\u20131438. https:\/\/doi.org\/10.1007\/978-3-662-49390-8_64","DOI":"10.1007\/978-3-662-49390-8_64"},{"key":"807_CR22","unstructured":"Ziebart BD, Bagnell JA, Dey AK (2010) Modeling interaction via the principle of maximum causal entropy. In: Proceedings, 27th International Conference on Machine Learning, 1255\u20131262."},{"key":"807_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.artint.2021.103500","volume":"103500","author":"S Arora","year":"2021","unstructured":"Arora S, Doshi P (2021) A survey of inverse reinforcement learning: challenges, methods and progress. Artif Intell 103500:1\u201348. https:\/\/doi.org\/10.1016\/j.artint.2021.103500","journal-title":"Artif Intell"},{"key":"807_CR24","first-page":"1","volume":"1710","author":"J Fu","year":"2017","unstructured":"Fu J, Luo K, Levine S (2017) Learning robust rewards with adversarial inverse reinforcement learning. Arxiv 1710:1\u201315","journal-title":"Arxiv"},{"key":"807_CR25","unstructured":"Asri LE, Piot B, Geist M, Laroche R, Pietquin O (2016) Score-based Inverse Reinforcement Learning. In: International Conference on Autonomous Agents and Multiagent Systems, 1\u20139."},{"issue":"4","key":"807_CR26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1778765.1778859","volume":"29","author":"SJ Lee","year":"2010","unstructured":"Lee SJ, Popovi\u0107 Z (2010) Learning behavior styles with inverse reinforcement learning. ACM Trans Graph (TOG) 29(4):1\u20137. 
https:\/\/doi.org\/10.1145\/1778765.1778859","journal-title":"ACM Trans Graph (TOG)"},{"key":"807_CR27","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40988-2_1","author":"E Klein","year":"2013","unstructured":"Klein E, Piot B, Geist M, Pietquin O (2013) A cascaded supervised learning approach to inverse reinforcement learning. Jt Eur Conf Mach Learn Knowl Discov Datab. https:\/\/doi.org\/10.1007\/978-3-642-40988-2_1","journal-title":"Jt Eur Conf Mach Learn Knowl Discov Datab"},{"key":"807_CR28","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23808-6_3","author":"CA Rothkopf","year":"2011","unstructured":"Rothkopf CA, Dimitrakakis C (2011) Preference elicitation and inverse reinforcement learning. Jt Eur Conf Mach Learn Knowl Discov Datab. https:\/\/doi.org\/10.1007\/978-3-642-23808-6_3","journal-title":"Jt Eur Conf Mach Learn Knowl Discov Datab"},{"key":"807_CR29","unstructured":"Sharifzadeh S, Chiotellis I, Triebel R, Cremers D (2016) Learning to drive using inverse reinforcement learning and deep Q-networks. arXiv preprint http:\/\/arxiv.org\/abs\/1612.03653"},{"key":"807_CR30","unstructured":"\u0160o\u0161ic A, KhudaBukhsh WR, Zoubir AM, Koeppl H (2017) Inverse reinforcement learning in swarm systems. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, 1413\u20131420."},{"key":"807_CR31","doi-asserted-by":"publisher","first-page":"518","DOI":"10.1016\/j.ins.2019.09.066","volume":"512","author":"JL Lin","year":"2020","unstructured":"Lin JL, Hwang KS, Shi H, Pan W (2020) An ensemble method for inverse reinforcement learning. Inf Sci 512:518\u2013532. 
https:\/\/doi.org\/10.1016\/j.ins.2019.09.066","journal-title":"Inf Sci"},{"issue":"8","key":"807_CR32","doi-asserted-by":"publisher","first-page":"1814","DOI":"10.1109\/TNNLS.2016.2543000","volume":"28","author":"B Piot","year":"2017","unstructured":"Piot B, Geist M, Pietquin O (2017) Bridging the gap between imitation learning and inverse reinforcement learning. IEEE Trans Neural Netw Learn Syst 28(8):1814\u20131826. https:\/\/doi.org\/10.1109\/TNNLS.2016.2543000","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"807_CR33","doi-asserted-by":"publisher","unstructured":"Adams S, Cody T, Beling PA (2022) A survey of inverse reinforcement learning. In: Artificial Intelligence Review. Springer, Netherlands. https:\/\/doi.org\/10.1007\/s10462-021-10108-x","DOI":"10.1007\/s10462-021-10108-x"},{"key":"807_CR34","unstructured":"Hadfield-Menell D, Dragan A, Abbeel P, Russell S (2016) Cooperative inverse reinforcement learning. In: 30th Conference on Neural Information Processing Systems (NIPS), 3916\u20133924"},{"key":"807_CR35","unstructured":"Wulfmeier M, Ondruska P, Posner I (2015) Deep inverse reinforcement learning, 1\u20139. arXiv preprint arXiv:1507.04888"},{"key":"807_CR36","first-page":"102","volume":"51","author":"M Herman","year":"2016","unstructured":"Herman M, Gindele T, Wagner J, Schmitt F, Burgard W (2016) Inverse reinforcement learning with simultaneous estimation of rewards and dynamics. Artif Intell Stat 51:102\u2013110","journal-title":"Artif Intell Stat"},{"key":"807_CR37","unstructured":"Boularias A, Kober J, Peters J (2011) Relative entropy inverse reinforcement learning. In: JMLR Workshop and Conference Proceedings, 182\u2013189."},{"key":"807_CR38","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2014.6942731","author":"D Vasquez","year":"2014","unstructured":"Vasquez D, Okal B, Arras KO (2014) Inverse Reinforcement Learning algorithms and features for robot navigation in crowds: an experimental comparison. 
IEEE Int Conf Intell Robot Syst. https:\/\/doi.org\/10.1109\/IROS.2014.6942731","journal-title":"IEEE Int Conf Intell Robot Syst"},{"key":"807_CR39","unstructured":"Castro PS, Li S, Zhang D (2019) Inverse reinforcement learning with multiple ranked experts. arXiv:1907.13411."},{"key":"807_CR40","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2017.2775960","author":"M Bloem","year":"2014","unstructured":"Bloem M, Bambos N (2014) Infinite time horizon maximum causal entropy inverse reinforcement learning. IEEE Conf Decis Control. https:\/\/doi.org\/10.1109\/TAC.2017.2775960","journal-title":"IEEE Conf Decis Control"},{"key":"807_CR41","doi-asserted-by":"publisher","unstructured":"Self R, Abudia M, Kamalapurkar R (2020) Online inverse reinforcement learning for systems with disturbances. ArXiv. https:\/\/doi.org\/10.23919\/ACC45564.2020.9147344","DOI":"10.23919\/ACC45564.2020.9147344"},{"key":"807_CR42","doi-asserted-by":"publisher","DOI":"10.1109\/CDC42340.2020.9304190","author":"F Memarian","year":"2020","unstructured":"Memarian F, Xu Z, Wu B, Wen M, Topcu U (2020) Active task-inference-guided deep inverse reinforcement learning. Proc IEEE Conf Decis Control. https:\/\/doi.org\/10.1109\/CDC42340.2020.9304190","journal-title":"Proc IEEE Conf Decis Control"},{"key":"807_CR43","unstructured":"Nguyen QP, Low KH, Jaillet P (2015) Inverse reinforcement learning with locally consistent reward functions. Adv Neural Inform Process Syst: 1747\u20131755"},{"key":"807_CR44","doi-asserted-by":"publisher","first-page":"8376","DOI":"10.1109\/ACCESS.2018.2808266","volume":"6","author":"H Shi","year":"2018","unstructured":"Shi H, Lin Z, Hwang KS, Yang S, Chen J (2018) An adaptive strategy selection method with reinforcement learning for robotic soccer games. IEEE Access 6:8376\u20138386. 
https:\/\/doi.org\/10.1109\/ACCESS.2018.2808266","journal-title":"IEEE Access"},{"key":"807_CR45","doi-asserted-by":"publisher","unstructured":"Shi Z, Chen X, Qiu X, Huang X (2018) Toward diverse text generation with inverse reinforcement learning. IJCAI Int Joint Conf Artif Intell. https:\/\/doi.org\/10.24963\/ijcai.2018\/606","DOI":"10.24963\/ijcai.2018\/606"},{"issue":"4","key":"807_CR46","doi-asserted-by":"publisher","first-page":"793","DOI":"10.1109\/TCYB.2014.2336867","volume":"45","author":"J Choi","year":"2015","unstructured":"Choi J, Kim KE (2015) Nonparametric bayesian inverse reinforcement learning for multiple reward functions. IEEE Trans Cybern 45(4):793\u2013805. https:\/\/doi.org\/10.1109\/TCYB.2014.2336867","journal-title":"IEEE Trans Cybern"},{"key":"807_CR47","unstructured":"Brown DS, Cui Y, Niekum S (2018) Risk-aware active inverse reinforcement learning. In: Conference on Robot Learning, 362\u2013372."},{"key":"807_CR48","unstructured":"Abbeel P, Coates A, Quigley M, Ng A (2006) An application of reinforcement learning to aerobatic helicopter flight. Advances in Neural Information Processing Systems, 1\u20138."},{"key":"807_CR49","doi-asserted-by":"crossref","unstructured":"Inga J, K\u00f6pf F, Flad M, Hohmann S (2017) Individual human behavior identification using an inverse reinforcement learning method. IEEE Int Conf Syst Man Cybern (SMC): 99\u2013104.","DOI":"10.1109\/SMC.2017.8122585"},{"issue":"1","key":"807_CR50","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1007\/s12369-015-0310-2","volume":"8","author":"B Kim","year":"2016","unstructured":"Kim B, Pineau J (2016) Socially adaptive path planning in human environments using inverse reinforcement learning. 
Int J Soc Robot 8(1):51\u201366","journal-title":"Int J Soc Robot"},{"issue":"2","key":"807_CR51","doi-asserted-by":"publisher","first-page":"1387","DOI":"10.1109\/LRA.2019.2895892","volume":"4","author":"M Pflueger","year":"2019","unstructured":"Pflueger M, Agha A, Sukhatme GS (2019) Rover-IRL: inverse reinforcement learning with soft value iteration networks for planetary rover path planning. IEEE Robot Autom Lett 4(2):1387\u20131394","journal-title":"IEEE Robot Autom Lett"},{"key":"807_CR52","doi-asserted-by":"crossref","unstructured":"Kuderer M, Gulati S, Burgard W (2015) Learning driving styles for autonomous vehicles from demonstration. In: IEEE International Conference on Robotics and Automation (ICRA), 2641\u20132646.","DOI":"10.1109\/ICRA.2015.7139555"},{"key":"807_CR53","doi-asserted-by":"crossref","unstructured":"Kuderer M, Kretzschmar H, Burgard W (2013) Teaching mobile robots to cooperatively navigate in populated environments. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, 3138\u20133143.","DOI":"10.1109\/IROS.2013.6696802"},{"key":"807_CR54","doi-asserted-by":"crossref","unstructured":"Pfeiffer M, Schwesinger U, Sommer H, Galceran E, Siegwart R (2016) Predicting actions to act predictably: cooperative partial motion planning with maximum entropy models. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), 2096\u20132101.","DOI":"10.1109\/IROS.2016.7759329"},{"key":"807_CR55","doi-asserted-by":"crossref","unstructured":"Ziebart BD, Ratliff N, Gallagher G, Mertz C, Peterson K, Bagnell JA, Hebert M, Dey AK, Srinivasa S (2009) Planning-based prediction for pedestrians. In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, 3931\u20133936","DOI":"10.1109\/IROS.2009.5354147"},{"issue":"4","key":"807_CR56","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1007\/s10772-014-9224-x","volume":"17","author":"HR Chinaei","year":"2014","unstructured":"Chinaei HR, Chaib-Draa B (2014) Dialogue POMDP components (part II): learning the reward function. 
Int J Speech Technol 17(4):325\u2013340","journal-title":"Int J Speech Technol"},{"key":"807_CR57","doi-asserted-by":"crossref","unstructured":"Scobee DR, Royo VR, Tomlin CJ, Sastry SS (2018) Haptic assistance via inverse reinforcement learning. IEEE Int Conf Syst Man Cybern (SMC): 1510\u20131517","DOI":"10.1109\/SMC.2018.00262"},{"key":"807_CR58","doi-asserted-by":"crossref","unstructured":"Chandramohan S, Geist M, Lefevre F, Pietquin O (2011) User simulation in dialogue systems using inverse reinforcement learning. Interspeech, 1025\u20131028.","DOI":"10.21437\/Interspeech.2011-302"},{"issue":"4","key":"807_CR59","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1007\/s10772-014-9244-6","volume":"17","author":"HR Chinaei","year":"2014","unstructured":"Chinaei HR, Chaib-Draa B (2014) Dialogue POMDP components (part I): learning states and observations. Int J Speech Technol 17(4):309\u2013323","journal-title":"Int J Speech Technol"},{"key":"807_CR60","doi-asserted-by":"crossref","unstructured":"Elnaggar M, Bezzo N (2018) An IRL approach for cyber-physical attack intention prediction and recovery. In: IEEE Annual American Control Conference (ACC), 222\u2013227.","DOI":"10.23919\/ACC.2018.8430922"},{"issue":"10","key":"807_CR61","doi-asserted-by":"publisher","first-page":"1683","DOI":"10.1080\/14697688.2015.1011684","volume":"15","author":"SY Yang","year":"2015","unstructured":"Yang SY, Qiao Q, Beling PA, Scherer WT, Kirilenko A (2015) Gaussian process-based algorithmic trading strategy identification. Quant Finan 15(10):1683\u20131703","journal-title":"Quant Finan"},{"key":"807_CR62","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1016\/j.eswa.2018.07.056","volume":"114","author":"SY Yang","year":"2018","unstructured":"Yang SY, Yu Y, Almahdi S (2018) An investor sentiment reward-based trading system using Gaussian inverse reinforcement learning algorithm. 
Expert Syst Appl 114:388\u2013401","journal-title":"Expert Syst Appl"},{"key":"807_CR63","doi-asserted-by":"publisher","DOI":"10.1016\/0305-0483(87)90016-8","author":"TL Saaty","year":"1980","unstructured":"Saaty TL (1980) The analytic hierarchy process. McGraw-Hill, New York. https:\/\/doi.org\/10.1016\/0305-0483(87)90016-8","journal-title":"McGraw-Hill"},{"issue":"3","key":"807_CR64","doi-asserted-by":"publisher","first-page":"04019112","DOI":"10.1061\/(asce)co.1943-7862.0001757","volume":"146","author":"H-M Lyu","year":"2020","unstructured":"Lyu H-M, Sun W-J, Shen S-L, Zhou AN (2020) Risk assessment using a new consulting process in fuzzy AHP. J Constr Eng Manag 146(3):04019112. https:\/\/doi.org\/10.1061\/(asce)co.1943-7862.0001757","journal-title":"J Constr Eng Manag"},{"key":"807_CR65","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1016\/j.psep.2020.01.003","volume":"135","author":"M Li","year":"2020","unstructured":"Li M, Wang H, Wang D, Shao Z, He S (2020) Risk assessment of gas explosion in coal mines based on fuzzy AHP and bayesian network. Process Saf Environ Prot 135:207\u2013218. https:\/\/doi.org\/10.1016\/j.psep.2020.01.003","journal-title":"Process Saf Environ Prot"},{"key":"807_CR66","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.scs.2019.101861","volume":"52","author":"Y Wang","year":"2020","unstructured":"Wang Y, Xu L, Solangi YA (2020) Strategic renewable energy resources selection for Pakistan: based on SWOT-fuzzy AHP approach. Sustain Cities Soc 52:1\u201314. https:\/\/doi.org\/10.1016\/j.scs.2019.101861","journal-title":"Sustain Cities Soc"},{"key":"807_CR67","doi-asserted-by":"publisher","unstructured":"Zavadskas EK, Turskis Z, Stevi\u0107 \u017d, Mardani A (2020) Modelling procedure for the selection of steel pipe supplier by applying the fuzzy AHP method. Oper Res Eng Sci Theory Appl 3(2):39\u201353. 
https:\/\/doi.org\/10.31181\/oresta2003034z","DOI":"10.31181\/oresta2003034z"},{"key":"807_CR68","doi-asserted-by":"publisher","DOI":"10.1016\/j.jairtraman.2020.101817","author":"G B\u00fcy\u00fck\u00f6zkan","year":"2020","unstructured":"B\u00fcy\u00fck\u00f6zkan G, Havle CA, Feyzio\u011flu O (2020) A new digital service quality model and its strategic analysis in aviation industry using interval-valued intuitionistic fuzzy AHP. J Air Transp Manag. https:\/\/doi.org\/10.1016\/j.jairtraman.2020.101817","journal-title":"J Air Transp Manag"},{"issue":"4","key":"807_CR69","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1002\/bse.1946","volume":"26","author":"R Raut","year":"2017","unstructured":"Raut R, Cheikhrouhou N, Kharat M (2017) Sustainability in the banking industry: a strategic multi-criterion analysis. Bus Strateg Environ 26(4):550\u2013568. https:\/\/doi.org\/10.1002\/bse.1946","journal-title":"Bus Strateg Environ"},{"key":"807_CR70","doi-asserted-by":"publisher","first-page":"198","DOI":"10.1016\/j.techfore.2013.08.007","volume":"85","author":"N Somsuk","year":"2014","unstructured":"Somsuk N, Laosirihongthong T (2014) A fuzzy AHP to prioritize enabling factors for strategic management of university business incubators: resource-based view. Technol Forecast Soc Chang 85:198\u2013210. https:\/\/doi.org\/10.1016\/j.techfore.2013.08.007","journal-title":"Technol Forecast Soc Chang"},{"issue":"3","key":"807_CR71","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1016\/0377-2217(95)00300-2","volume":"95","author":"DY Chang","year":"1996","unstructured":"Chang DY (1996) Applications of the extent analysis method on fuzzy AHP. Eur J Oper Res 95(3):649\u2013655. 
https:\/\/doi.org\/10.1016\/0377-2217(95)00300-2","journal-title":"Eur J Oper Res"},{"key":"807_CR72","doi-asserted-by":"publisher","unstructured":"Kaya \u0130, \u00c7olak M, Terzi F (2019) A comprehensive review of fuzzy multi criteria decision making methodologies for energy policy making. Energy Strategy Rev 24:207\u2013228. https:\/\/doi.org\/10.1016\/j.esr.2019.03.003","DOI":"10.1016\/j.esr.2019.03.003"},{"issue":"23","key":"807_CR73","doi-asserted-by":"publisher","first-page":"8095","DOI":"10.3390\/s21238095","volume":"21","author":"KM Aamir","year":"2021","unstructured":"Aamir KM, Sarfraz L, Ramzan M, Bilal M, Shafi J, Attique M (2021) A fuzzy rule-based system for classification of diabetes. Sensors 21(23):8095. https:\/\/doi.org\/10.3390\/s21238095","journal-title":"Sensors"},{"key":"807_CR74","doi-asserted-by":"crossref","unstructured":"Hwang CL, Yoon K (1981) Multiple attribute decision making: methods and applications. Springer","DOI":"10.1007\/978-3-642-48318-9"},{"issue":"3","key":"807_CR75","doi-asserted-by":"publisher","first-page":"899","DOI":"10.1016\/j.eswa.2012.05.046","volume":"40","author":"A Baykaso\u011flu","year":"2013","unstructured":"Baykaso\u011flu A, Kaplanoglu V, Durmu\u015foglu ZDU, \u015eahin C (2013) Integrating fuzzy DEMATEL and fuzzy hierarchical TOPSIS methods for truck selection. Expert Syst Appl 40(3):899\u2013907. https:\/\/doi.org\/10.1016\/j.eswa.2012.05.046","journal-title":"Expert Syst Appl"},{"issue":"2","key":"807_CR76","doi-asserted-by":"publisher","first-page":"679","DOI":"10.1016\/j.eswa.2013.07.093","volume":"41","author":"SK Patil","year":"2014","unstructured":"Patil SK, Kant R (2014) A fuzzy AHP\u2013TOPSIS framework for ranking the solutions of knowledge management adoption in supply chain to overcome its barriers. Expert Syst Appl 41(2):679\u2013693. 
https:\/\/doi.org\/10.1016\/j.eswa.2013.07.093","journal-title":"Expert Syst Appl"},{"issue":"4","key":"807_CR77","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1080\/13504509.2019.1570981","volume":"26","author":"IS Rampasso","year":"2019","unstructured":"Rampasso IS, Siqueira RG, Anholon R, Silva D, Quelhas OLG, Leal Filho W, Brandli LL (2019) Some of the challenges in implementing education for sustainable development: perspectives from Brazilian engineering students. Int J Sust Dev World 26(4):367\u2013376. https:\/\/doi.org\/10.1080\/13504509.2019.1570981","journal-title":"Int J Sust Dev World"},{"issue":"1","key":"807_CR78","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1016\/j.eswa.2013.07.010","volume":"41","author":"S Senthil","year":"2014","unstructured":"Senthil S, Srirangacharyulu B, Ramesh A (2014) A robust hybrid multi-criteria decision making methodology for contractor evaluation and selection in third-party reverse logistics. Expert Syst Appl 41(1):50\u201358. https:\/\/doi.org\/10.1016\/j.eswa.2013.07.010","journal-title":"Expert Syst Appl"},{"key":"807_CR79","doi-asserted-by":"publisher","first-page":"599","DOI":"10.1016\/j.jmsy.2015.03.001","volume":"37","author":"C Prakash","year":"2015","unstructured":"Prakash C, Barua MK (2015) Integration of AHP\u2013TOPSIS method for prioritizing the solutions of reverse logistics adoption to overcome its barriers under fuzzy environment. J Manuf Syst 37:599\u2013615. https:\/\/doi.org\/10.1016\/j.jmsy.2015.03.001","journal-title":"J Manuf Syst"},{"issue":"3","key":"807_CR80","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1002\/spe.2853","volume":"51","author":"MZ Asghar","year":"2021","unstructured":"Asghar MZ, Subhan F, Ahmad H, Khan WZ, Hakak S, Gadekallu TR, Alazab M (2021) Senti-eSystem: a sentiment-based eSystem-using hybridized fuzzy and deep neural network for measuring customer satisfaction. Softw Pract Exp 51(3):571\u2013594. 
https:\/\/doi.org\/10.1002\/spe.2853","journal-title":"Softw Pract Exp"},{"issue":"2","key":"807_CR81","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1007\/s12065-019-00327-1","volume":"13","author":"GT Reddy","year":"2020","unstructured":"Reddy GT, Reddy MPK, Lakshmanna K, Rajput DS, Kaluri R, Srivastava G (2020) Hybrid genetic algorithm and a fuzzy logic classifier for heart disease diagnosis. Evol Intel 13(2):185\u2013196. https:\/\/doi.org\/10.1007\/s12065-019-00327-1","journal-title":"Evol Intel"},{"key":"807_CR82","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1016\/j.isatra.2020.01.016","volume":"101","author":"H Malik","year":"2020","unstructured":"Malik H, Sharma R, Mishra S (2020) Fuzzy reinforcement learning based intelligent classifier for power transformer faults. ISA Trans 101:390\u2013398. https:\/\/doi.org\/10.1016\/j.isatra.2020.01.016","journal-title":"ISA Trans"},{"issue":"6","key":"807_CR83","doi-asserted-by":"publisher","first-page":"953","DOI":"10.1109\/TEVC.2016.2560139","volume":"20","author":"G Chen","year":"2016","unstructured":"Chen G, Douch CIJ, Zhang M (2016) Accuracy-based learning classifier systems for multistep reinforcement learning: a fuzzy logic approach to handling continuous inputs and learning continuous actions. IEEE Trans Evol Comput 20(6):953\u2013971. https:\/\/doi.org\/10.1109\/TEVC.2016.2560139","journal-title":"IEEE Trans Evol Comput"},{"issue":"6","key":"807_CR84","doi-asserted-by":"publisher","first-page":"1178","DOI":"10.1109\/TFUZZ.2019.2952831","volume":"28","author":"G Capizzi","year":"2020","unstructured":"Capizzi G, Sciuto GL, Napoli C, Polap D, Wozniak M (2020) Small lung nodules detection based on fuzzy-logic and probabilistic neural network with bioinspired reinforcement learning. IEEE Trans Fuzzy Syst 28(6):1178\u20131189. 
https:\/\/doi.org\/10.1109\/TFUZZ.2019.2952831","journal-title":"IEEE Trans Fuzzy Syst"},{"issue":"10","key":"807_CR85","doi-asserted-by":"publisher","first-page":"3921","DOI":"10.1007\/s12652-019-01627-1","volume":"11","author":"Y Madani","year":"2020","unstructured":"Madani Y, Ezzikouri H, Erritali M, Hssina B (2020) Finding optimal pedagogical content in an adaptive e-learning platform using a new recommendation approach and reinforcement learning. J Ambient Intell Humaniz Comput 11(10):3921\u20133936. https:\/\/doi.org\/10.1007\/s12652-019-01627-1","journal-title":"J Ambient Intell Humaniz Comput"},{"issue":"December","key":"807_CR86","doi-asserted-by":"publisher","DOI":"10.1016\/j.oceaneng.2020.108477","volume":"220","author":"AV Le","year":"2021","unstructured":"Le AV, Kyaw PT, Veerajagadheswar P, Muthugala MAVJ, Elara MR, Kumar M, Khanh Nhan NH (2021) Reinforcement learning-based optimal complete water-blasting for autonomous ship hull corrosion cleaning system. Ocean Eng 220(December):108477. https:\/\/doi.org\/10.1016\/j.oceaneng.2020.108477","journal-title":"Ocean Eng"},{"key":"807_CR87","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1007\/s41066-021-00267-1","volume":"7","author":"R Joshi","year":"2022","unstructured":"Joshi R, Kumar S (2022) A novel VIKOR approach based on weighted correlation coefficients and picture fuzzy information for multicriteria decision making. Granul Comput 7:323\u2013336. https:\/\/doi.org\/10.1007\/s41066-021-00267-1","journal-title":"Granul Comput"},{"issue":"2","key":"807_CR88","first-page":"1","volume":"39","author":"R Joshi","year":"2022","unstructured":"Joshi R (2022) A new picture fuzzy information measure based on Tsallis\u2013Havrda\u2013Charvat concept with applications in presaging poll outcome. 
Comput Appl Math 39(2):1\u201324","journal-title":"Comput Appl Math"},{"key":"807_CR89","doi-asserted-by":"crossref","unstructured":"Diabat A, Khreishah A, Kannan G, Panikar V, Gunasekaran A (2013) Benchmarking the interactions among barriers in third-party logistics implementation: an ISM approach. Benchmark Int J 20(6):805\u2013824","DOI":"10.1108\/BIJ-04-2013-0039"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00807-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00807-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00807-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,22]],"date-time":"2023-02-22T18:55:18Z","timestamp":1677092118000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00807-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,20]]},"references-count":89,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,2]]}},"alternative-id":["807"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00807-5","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,20]]},"assertion":[{"value":"18 September 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 June 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 July 
2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"No potential conflict of interest was reported by the authors, and the authors have no financial or non-financial conflicts. On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}