{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:20:15Z","timestamp":1753885215914,"version":"3.41.2"},"reference-count":26,"publisher":"World Scientific Pub Co Pte Ltd","issue":"02","funder":[{"name":"Key Science and Technology Special Program of Yunnan Province","award":["202202AF080003"],"award-info":[{"award-number":["202202AF080003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Comp. Intel. Appl."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:p> Knowledge-based visual question answering relies on open-ended external knowledge and a fine-grained comprehension of both the visual content of images and semantic information. Existing methods for utilizing knowledge have the following limitations: (1) Language pre-training methods output answers in the form of plain text, which only understand shallow visual content; (2) The knowledge retrieved by image objects as labels is represented as first-order logic, making it difficult to infer complex questions. To address the above problems, this paper integrates visual-textual multimodal information, accumulates domain-specific and external multi-modal knowledge, introduces and supplements external objective facts, and proposes a multimodal knowledge graph construction and fact-assisted reasoning network (MKGFA). The network consists of three parts: the multimodal knowledge graph construction module (MKGC), the objective fact-assisted reasoning module (FAR), and the answer inference module. The MKGC engages in the coarse-to-fine-grained learning of triplet representations for multimodal knowledge units. The FAR establishes deep cross-modal relations between visual objects and factual words for correlating real answers. The answer inference module makes the final decision based on the results of both. 
Of these, the first two modules employ a pre-training and fine-tuning strategy, systematically accumulating foundational and domain-specific knowledge. Compared with state-of-the-art methods, MKGFA achieves 1.09% and 0.7% higher accuracy on the challenging OKVQA and KRVQA datasets, respectively. The experimental results demonstrate the complementary advantages of integrating the two modules. <\/jats:p>","DOI":"10.1142\/s1469026824500342","type":"journal-article","created":{"date-parts":[[2024,12,30]],"date-time":"2024-12-30T03:30:56Z","timestamp":1735529456000},"source":"Crossref","is-referenced-by-count":0,"title":["MKGFA: Multimodal Knowledge Graph Construction and Fact-Assisted Reasoning for VQA"],"prefix":"10.1142","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6164-1253","authenticated-orcid":false,"given":"Longbao","family":"Wang","sequence":"first","affiliation":[{"name":"College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China"},{"name":"Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Nanjing 210000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3004-4872","authenticated-orcid":false,"given":"Jinhao","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5712-092X","authenticated-orcid":false,"given":"Libing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Kunming Engineering Corporation Limited, Kunming 650000, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-0243-1337","authenticated-orcid":false,"given":"Shuai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Kunming Engineering Corporation Limited, Kunming 650000, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6802-9083","authenticated-orcid":false,"given":"Shufang","family":"Xu","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Hohai University, Changzhou 213200, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2201-5879","authenticated-orcid":false,"given":"Lin","family":"Yu","sequence":"additional","affiliation":[{"name":"College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8404-2464","authenticated-orcid":false,"given":"Hongmin","family":"Gao","sequence":"additional","affiliation":[{"name":"College of Information Science and Engineering, Hohai University, Changzhou 213200, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"219","published-online":{"date-parts":[[2024,12,30]]},"reference":[{"key":"S1469026824500342BIB001","first-page":"14106","volume-title":"IEEE\/CVF Conf. 
Computer Vision and Pattern Recognition","author":"Marino K.","year":"2020"},{"key":"S1469026824500342BIB003","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.44"},{"key":"S1469026824500342BIB005","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00857"},{"key":"S1469026824500342BIB006","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-demos.11"},{"key":"S1469026824500342BIB007","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.07.008"},{"key":"S1469026824500342BIB009","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.mrl-1.13"},{"key":"S1469026824500342BIB011","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.153"},{"key":"S1469026824500342BIB012","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.118669"},{"key":"S1469026824500342BIB014","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2020.10.007"},{"key":"S1469026824500342BIB016","doi-asserted-by":"publisher","DOI":"10.1561\/1500000019"},{"key":"S1469026824500342BIB018","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1514"},{"key":"S1469026824500342BIB020","first-page":"2787","volume-title":"Proc. 26th Int. Conf. Neural Information Processing Systems","volume":"2","author":"Bordes A.","year":"2013"},{"key":"S1469026824500342BIB021","doi-asserted-by":"publisher","DOI":"10.1142\/S1469026822500146"},{"key":"S1469026824500342BIB022","first-page":"6000","volume-title":"Proc. 31st Int. Conf. 
Neural Information Processing Systems","author":"Vaswani A.","year":"2017"},{"key":"S1469026824500342BIB024","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00208"},{"key":"S1469026824500342BIB025","doi-asserted-by":"publisher","DOI":"10.1016\/j.ymeth.2021.06.010"},{"key":"S1469026824500342BIB026","doi-asserted-by":"publisher","DOI":"10.1109\/TIE.2023.3274881"},{"key":"S1469026824500342BIB027","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2024.102575"},{"key":"S1469026824500342BIB028","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00331"},{"key":"S1469026824500342BIB029","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"S1469026824500342BIB030","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"S1469026824500342BIB032","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0966-6"},{"key":"S1469026824500342BIB033","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00553"},{"key":"S1469026824500342BIB035","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00444"},{"key":"S1469026824500342BIB039","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11671"},{"key":"S1469026824500342BIB040","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2817340"}],"container-title":["International Journal of Computational Intelligence and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S1469026824500342","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:28:13Z","timestamp":1750235293000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S1469026824500342"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,30]]},"references-count":26,"journal-issue":{"issue":"02","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.1142\/S1469026824500342"],"URL":"https:\/\/doi.org\/10.1142\/s1469026824500342","relation":{},"ISSN":["1469-0268","1757-5885"],"issn-type":[{"type":"print","value":"1469-0268"},{"type":"electronic","value":"1757-5885"}],"subject":[],"published":{"date-parts":[[2024,12,30]]},"article-number":"2450034"}}