{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T15:31:50Z","timestamp":1762183910842,"version":"build-2065373602"},"reference-count":51,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T00:00:00Z","timestamp":1761868800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute for Information & communications Technology Planning & Evaluation"},{"name":"Korea government","award":["RS-2022-II220369"],"award-info":[{"award-number":["RS-2022-II220369"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Multi-hop question answering tasks involve identifying relevant supporting sentences from a given set of documents, which serve as the rationale for deriving answers. Most research in this area consists of two main components: a rationale identification module and a reader module. Since the rationale identification module often relies on retrieval models or supervised learning, annotated rationales are typically essential. This reliance on annotations, however, creates challenges when adapting to open-domain settings. Moreover, when models are trained on annotated rationales, explainable artificial intelligence (XAI) requires clear explanations of how the model arrives at these rationales. Consequently, traditional multi-hop question answering (QA) approaches that depend on annotated rationales are ill-suited for XAI, which demands transparency in the model\u2019s reasoning process. To address this issue, we propose a rationale reasoning framework that can effectively infer rationales and clearly demonstrate the model\u2019s reasoning process, even in open-domain environments without annotations. The proposed model is applicable to various tasks without structural constraints, and experimental results demonstrate its significantly improved rationale reasoning capabilities in multi-hop question answering, relation extraction, and sentence classification tasks.<\/jats:p>","DOI":"10.3390\/bdcc9110273","type":"journal-article","created":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T13:55:22Z","timestamp":1762178122000},"page":"273","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Explainable Multi-Hop Question Answering: A Rationale-Based Approach"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-2906-1899","authenticated-orcid":false,"given":"Kyubeen","family":"Han","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0717-2433","authenticated-orcid":false,"given":"Youngjin","family":"Jang","sequence":"additional","affiliation":[{"name":"Department of Multilingual Understanding, NC AI, 12, Daewangpangyo-ro 644beon-gil, Bundang-gu, Seongnam-si, Gyeonggi-do 13494, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8286-7198","authenticated-orcid":false,"given":"Harksoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul 05029, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,31]]},"reference":[{"key":"ref_1","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., and Anadkat, S. (2023). Gpt-4 technical report. arXiv."},{"key":"ref_2","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Bhosale, S. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv."},{"key":"ref_3","unstructured":"Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., and Dong, Z. (2023). A survey of large language models. arXiv."},{"key":"ref_4","unstructured":"Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., and Fan, A. (2024). The llama 3 herd of models. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Bender, E.M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021, January 3\u201310). On the dangers of stochastic parrots: Can language models be too big?. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtually.","DOI":"10.1145\/3442188.3445922"},{"key":"ref_6","unstructured":"Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., and Brunskill, E. (2021). On the opportunities and risks of foundation models. arXiv."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1214\/21-SS133","article-title":"Interpretable machine learning: Fundamental principles and 10 grand challenges","volume":"16","author":"Rudin","year":"2022","journal-title":"Stat. Surv."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Shwartz, V., and Choi, Y. (2020, January 8\u201313). Do neural language models overcome reporting bias?. Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain.","DOI":"10.18653\/v1\/2020.coling-main.605"},{"key":"ref_9","unstructured":"Chen, J., Lin, S.t., and Durrett, G. (2019). Multi-hop question answering via reasoning chains. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Wu, H., Chen, W., Xu, S., and Xu, B. (2021, January 6\u201311). Counterfactual supporting facts extraction for explainable medical record based diagnosis with graph network. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online.","DOI":"10.18653\/v1\/2021.naacl-main.156"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhao, W., Chiu, J., Cardie, C., and Rush, A.M. (2023, January 6\u201310). Hop, Union, Generate: Explainable Multi-hop Reasoning without Rationale Supervision. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.1001"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W., Salakhutdinov, R., and Manning, C.D. (November, January 31). HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1259"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Qi, P., Lin, X., Mehr, L., Wang, Z., and Manning, C.D. (2019, January 3\u20137). Answering Complex Open-domain Questions Through Iterative Query Generation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China.","DOI":"10.18653\/v1\/D19-1261"},{"key":"ref_14","first-page":"9459","article-title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks","volume":"33","author":"Lewis","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., and Guestrin, C. (2016, January 13\u201317). \u201cWhy should i trust you?\u201d Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939778"},{"key":"ref_16","first-page":"44","article-title":"DARPA\u2019s explainable artificial intelligence program","volume":"40","author":"Gunning","year":"2019","journal-title":"AI Mag"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1162\/tacl_a_00324","article-title":"How can we know what language models know?","volume":"8","author":"Jiang","year":"2020","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Arras, L., Montavon, G., M\u00fcller, K.R., and Samek, W. (2017, January 29). Explaining Recurrent Neural Network Predictions in Sentiment Analysis. Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis WASSA 2017: Proceedings of the Workshop, Copenhagen, Denmark.","DOI":"10.18653\/v1\/W17-5221"},{"key":"ref_19","first-page":"4765","article-title":"A unified approach to interpreting model predictions","volume":"30","author":"Scott","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_20","unstructured":"Alvarez-Melis, D., and Jaakkola, T.S. (2018). On the Robustness of Interpretability Methods. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., and Kagal, L. (2018, January 1\u20133). Explaining explanations: An overview of interpretability of machine learning. Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy.","DOI":"10.1109\/DSAA.2018.00018"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jiang, Z., Zhang, Y., Yang, Z., Zhao, J., and Liu, K. (2021, January 1\u20136). Alignment rationale for natural language inference. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Bangkok, Thailand.","DOI":"10.18653\/v1\/2021.acl-long.417"},{"key":"ref_23","unstructured":"Jain, S., and Wallace, B.C. (2019, January 3\u20135). Attention is not explanation. Proceedings of the NAACL-HLT 2019, Minneapolis, MN, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Serrano, S., and Smith, N.A. (2019, January 4\u201313). Is attention interpretable?. Proceedings of the ACL 2019, Austin, TX, USA.","DOI":"10.18653\/v1\/P19-1282"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Wiegreffe, S., and Pinter, Y. (2019, January 3\u20137). Attention is not not explanation. Proceedings of the EMNLP-IJCNLP 2019, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1002"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Jacovi, A., and Goldberg, Y. (2020, January 5\u201310). Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?. Proceedings of the ACL 2020, Online.","DOI":"10.18653\/v1\/2020.acl-main.386"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lei, T., Barzilay, R., and Jaakkola, T. (2016, January 22). Rationalizing neural predictions. Proceedings of the EMNLP 2016, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1011"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"DeYoung, J., Jain, S., Rajani, N.F., Lehman, E., Xiong, C., Socher, R., and Wallace, B.C. (2020, January 5\u201310). ERASER: A benchmark to evaluate rationalized NLP models. Proceedings of the ACL 2020, Online.","DOI":"10.18653\/v1\/2020.acl-main.408"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1162\/tacl_a_00021","article-title":"Constructing datasets for multi-hop reading comprehension across documents","volume":"6","author":"Welbl","year":"2018","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yu, X., Min, S., Zettlemoyer, L., and Hajishirzi, H. (2023, January 9\u201314). CREPE: Open-Domain Question Answering with False Presuppositions. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.583"},{"key":"ref_31","first-page":"1","article-title":"Explainability for Large Language Models: A Survey","volume":"15","author":"Zhao","year":"2024","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_32","unstructured":"Bilal, A., Ebert, D., and Lin, B. (2025). LLMs for Explainable AI: A Comprehensive Survey. arXiv."},{"key":"ref_33","unstructured":"Huang, S., Zhou, Y., Liu, X., Zhang, J., Wang, W., and Huang, M. (2023). Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations. arXiv."},{"key":"ref_34","first-page":"74952","article-title":"Language Models Don\u2019t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting","volume":"36","author":"Turpin","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Atanasova, P., Simonsen, J.G., Lioma, C., and Augenstein, I. (March, January 22). Diagnostics-guided explanation generation. Proceedings of the AAAI Conference on Artificial Intelligence 2022, Online.","DOI":"10.1609\/aaai.v36i10.21287"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Glockner, M., Habernal, I., and Gurevych, I. (2020). Why do you think that? exploring faithful sentence-level rationales without supervision. arXiv.","DOI":"10.18653\/v1\/2020.findings-emnlp.97"},{"key":"ref_37","unstructured":"Min, S., Zhong, V., Zettlemoyer, L., and Hajishirzi, H. (August, January 28). Multi-hop Reading Comprehension through Question Decomposition and Rescoring. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Mao, J., Jiang, W., Wang, X., Liu, H., Xia, Y., Lyu, Y., and She, Q. (2022, January 7\u201311). Explainable question answering based on semantic graph by global differentiable learning and dynamic adaptive reasoning. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.356"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yin, Z., Wang, Y., Hu, X., Wu, Y., Yan, H., Zhang, X., Cao, Z., Huang, X., and Qiu, X. (2023, January 3\u20135). Rethinking label smoothing on multi-hop question answering. Proceedings of the China National Conference on Chinese Computational Linguistics, Harbin, China.","DOI":"10.1007\/978-981-99-6207-5_5"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Tishby, N., and Zaslavsky, N. (2015, January 11\u201315). Deep learning and the information bottleneck. Proceedings of the 2015 IEEE Information Theory Workshop (ITW) 2015, Jeju Island, Republic of Korea.","DOI":"10.1109\/ITW.2015.7133169"},{"key":"ref_41","unstructured":"Clark, K. (2020). Electra: Pre-training text encoders as discriminators rather than generators. arXiv."},{"key":"ref_42","unstructured":"Vinyals, O., Fortunato, M., and Jaitly, N. (2015). Pointer networks. Adv. Neural Inf. Process. Syst., 28."},{"key":"ref_43","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv."},{"key":"ref_44","first-page":"263","article-title":"The mathematics of statistical machine translation: Parameter estimation","volume":"19","author":"Brown","year":"1993","journal-title":"Comput. Linguist."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1162\/tacl_a_00475","article-title":"MuSiQue: Multihop Questions via Single-hop Question Composition","volume":"10","author":"Trivedi","year":"2022","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_46","unstructured":"Yao, Y., Ye, D., Li, P., Han, X., Lin, Y., Liu, Z., Liu, Z., Huang, L., Zhou, J., and Sun, M. (August, January 28). DocRED: A Large-Scale Document-Level Relation Extraction Dataset. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_47","unstructured":"Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., and Potts, C. (2011, January 19\u201327). Learning Word Vectors for Sentiment Analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M.M., and Gatford, M. (1995). Okapi at TREC-3, British Library Research and Development Department.","DOI":"10.6028\/NIST.SP.500-225.city"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"21247","DOI":"10.1007\/s00521-023-08892-4","article-title":"Multi-grained unsupervised evidence retrieval for question answering","volume":"35","author":"You","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_51","unstructured":"OpenAI (2025, September 17). Gpt-4o Mini: Advancing Cost-Efficient Intelligence. Available online: https:\/\/openai.com\/index\/gpt-4o-mini-advancing-cost-efficient-intelligence\/."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/11\/273\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,3]],"date-time":"2025-11-03T14:43:21Z","timestamp":1762181001000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/11\/273"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,31]]},"references-count":51,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["bdcc9110273"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9110273","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,10,31]]}}}