{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T19:26:48Z","timestamp":1776367608176,"version":"3.51.2"},"reference-count":47,"publisher":"World Scientific Pub Co Pte Ltd","issue":"02","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Soft. Eng. Knowl. Eng."],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:p> The paper addresses the limitations of traditional evaluation metrics for Question Answering (QA) systems that primarily focus on syntax and n-gram similarity. We propose a novel model-based evaluation metric, MQA-metric, and create a human-judgment-based dataset, squad-qametric and marco-qametric, to validate our approach. The research aims to solve several key problems: the objectivity in dataset labeling, the effectiveness of metrics when there is no syntax similarity, the impact of answer length on metric performance, and the influence of real answer quality on metric results. To tackle these challenges, we designed an interface for dataset labeling and conducted extensive experiments with human reviewers. Our analysis shows that the MQA-metric outperforms traditional metrics like BLEU, ROUGE and METEOR. Unlike existing metrics, MQA-metric leverages semantic comprehension through large language models (LLMs), enabling it to capture contextual nuances and synonymous expressions more effectively. This approach sets a standard for evaluating QA systems by prioritizing semantic accuracy over surface-level similarities. The proposed metric correlates better with human judgment, making it a more reliable tool for evaluating QA systems. Our contributions include the development of a robust evaluation workflow, creation of high-quality datasets, and an extensive comparison with existing evaluation methods. The results indicate that our model-based approach provides a significant improvement in assessing the quality of QA systems, which is crucial for their practical application and trustworthiness. <\/jats:p>","DOI":"10.1142\/s0218194025500032","type":"journal-article","created":{"date-parts":[[2024,11,29]],"date-time":"2024-11-29T15:29:11Z","timestamp":1732894151000},"page":"243-262","source":"Crossref","is-referenced-by-count":2,"title":["A Model-Based Evaluation Metric for Question Answering Systems"],"prefix":"10.1142","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6650-6942","authenticated-orcid":false,"given":"Dilan","family":"Bak\u0131r","sequence":"first","affiliation":[{"name":"Computer Engineering Department, Yildiz Technical University Istanbul, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7908-5067","authenticated-orcid":false,"given":"Mehmet S.","family":"Aktas","sequence":"additional","affiliation":[{"name":"Computer Engineering Department, Yildiz Technical University Istanbul, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7664-5145","authenticated-orcid":false,"given":"Beytullah","family":"Y\u0131ld\u0131z","sequence":"additional","affiliation":[{"name":"Software Engineering Department, Atilim University Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2025,1,27]]},"reference":[{"key":"S0218194025500032BIB001","first-page":"311","volume-title":"Proc. 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni K.","year":"2002"},{"key":"S0218194025500032BIB002","first-page":"74","volume-title":"Proc. Text Summarization Branches Out","author":"Lin C. Y.","year":"2004"},{"key":"S0218194025500032BIB003","first-page":"65","volume-title":"Proc. ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization","author":"Banerjee S.","year":"2005"},{"key":"S0218194025500032BIB004","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1230"},{"key":"S0218194025500032BIB005","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1429"},{"key":"S0218194025500032BIB006","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1051"},{"key":"S0218194025500032BIB007","first-page":"249","volume-title":"Proc. 11th Conf. European Chapter Association for Computational Linguistics","author":"Callison-Burch C.","year":"2006"},{"key":"S0218194025500032BIB008","volume-title":"Proc. Int. Conf. Natural Language Process","author":"Ananthakrishnan R.","year":"2007"},{"key":"S0218194025500032BIB009","doi-asserted-by":"publisher","DOI":"10.3115\/1699510.1699548"},{"key":"S0218194025500032BIB010","volume-title":"Proc. 8th Int. Conf. Learning Representations","author":"Zhang T.","year":"2020"},{"key":"S0218194025500032BIB011","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1395"},{"key":"S0218194025500032BIB012","series-title":"Lecture Notes in Computer Science","volume-title":"Evaluating Natural Language Processing Systems: An Analysis and Review","author":"Jones K. S.","year":"1996"},{"key":"S0218194025500032BIB014","first-page":"60","volume-title":"Proc. AAAI Intelligent Text Summarization Workshop","author":"Jing H.","year":"1998"},{"key":"S0218194025500032BIB016","first-page":"26","volume-title":"Proc. AAAI Intelligent Text Summarization Workshop, 1998","author":"Strzalkowski T."},{"key":"S0218194025500032BIB017","author":"Peters M. E.","year":"2018","journal-title":"Comput. Sci."},{"key":"S0218194025500032BIB018","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1241"},{"key":"S0218194025500032BIB019","doi-asserted-by":"publisher","DOI":"10.1109\/QRS54544.2021.00108"},{"key":"S0218194025500032BIB020","series-title":"Lecture Notes in Computer Science","volume-title":"Advances in Visual Informatics","volume":"14322","author":"Chakraborty S.","year":"2023"},{"key":"S0218194025500032BIB021","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K19-1038"},{"key":"S0218194025500032BIB022","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-1158"},{"key":"S0218194025500032BIB023","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.454"},{"key":"S0218194025500032BIB024","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194023500201"},{"key":"S0218194025500032BIB025","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194019500086"},{"key":"S0218194025500032BIB026","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194022500449"},{"key":"S0218194025500032BIB027","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194021400039"},{"key":"S0218194025500032BIB028","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194021500479"},{"key":"S0218194025500032BIB029","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194021400064"},{"key":"S0218194025500032BIB030","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194011005475"},{"key":"S0218194025500032BIB031","doi-asserted-by":"publisher","DOI":"10.1142\/S0219427905001274"},{"key":"S0218194025500032BIB032","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194013500332"},{"key":"S0218194025500032BIB033","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194023500122"},{"key":"S0218194025500032BIB034","doi-asserted-by":"publisher","DOI":"10.1142\/S0218194024500335"},{"key":"S0218194025500032BIB035","first-page":"77013","volume-title":"Advances in Neural Information Processing Systems","volume":"36","author":"Wang C.","year":"2023"},{"key":"S0218194025500032BIB036","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.170"},{"key":"S0218194025500032BIB037","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1312"},{"key":"S0218194025500032BIB038","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1199"},{"key":"S0218194025500032BIB039","doi-asserted-by":"publisher","DOI":"10.1007\/s00791-007-0083-8"},{"key":"S0218194025500032BIB040","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7643-8131-8_3"},{"key":"S0218194025500032BIB041","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7643-8757-0_11"},{"key":"S0218194025500032BIB042","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-09156-3_35"},{"key":"S0218194025500032BIB043","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-21413-9_11"},{"key":"S0218194025500032BIB044","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213024500179"},{"key":"S0218194025500032BIB045","doi-asserted-by":"publisher","DOI":"10.1109\/BigData59044.2023.10386689"},{"key":"S0218194025500032BIB046","doi-asserted-by":"publisher","DOI":"10.1109\/BigData50022.2020.9378153"},{"key":"S0218194025500032BIB047","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.6546"},{"key":"S0218194025500032BIB048","first-page":"3173","volume-title":"2020 IEEE Int. Conf. Big Data (Big Data)","author":"Olmezogullari E.","year":"2020"},{"key":"S0218194025500032BIB051","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1264"}],"container-title":["International Journal of Software Engineering and Knowledge Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218194025500032","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T06:09:24Z","timestamp":1741327764000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218194025500032"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,27]]},"references-count":47,"journal-issue":{"issue":"02","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["10.1142\/S0218194025500032"],"URL":"https:\/\/doi.org\/10.1142\/s0218194025500032","relation":{},"ISSN":["0218-1940","1793-6403"],"issn-type":[{"value":"0218-1940","type":"print"},{"value":"1793-6403","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,27]]}}}