{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:09:23Z","timestamp":1753884563355,"version":"3.41.2"},"reference-count":42,"publisher":"World Scientific Pub Co Pte Ltd","issue":"08","funder":[{"name":"National Key Technology R&D Program of China","award":["2021YFD2100605"],"award-info":[{"award-number":["2021YFD2100605"]}]},{"name":"Researchers Supporting, King Saud University, Riyadh, Saudi Arabia","award":["RSP2024R509"],"award-info":[{"award-number":["RSP2024R509"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["72301010"],"award-info":[{"award-number":["72301010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"The Project of Construction and Support for high-level Innovative Teams of Beijing Municipal Institutions","award":["BPHR20220104"],"award-info":[{"award-number":["BPHR20220104"]}]},{"name":"Beijing Scholars Program","award":["099"],"award-info":[{"award-number":["099"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J CIRCUIT SYST COMP"],"published-print":{"date-parts":[[2025,5,30]]},"abstract":"<jats:p> Recently, multimodal relation extraction (MRE) and multimodal-named entity recognition (MNER) have attracted widespread attention. However, prior research works have encountered challenges including inadequate semantic representation of images, cross-modal information fusion, and irrelevance between some images and text. To enhance semantic representation, we employ CLIP\u2019s image encoder, vision transformer (VIT), to generate visual features representing different semantic intensities. Addressing cross-modal semantic gaps, we introduce an image caption generation model and BERT to sequentially generate image captions and their features, transforming both modalities into text. Dynamic gates and attention mechanisms are introduced to efficiently fuse visual features, image description text features, and text features, mitigating noise from image-text irrelevance. Eventually, we successfully constructed an efficient MRE and MNER model. The experimental outcomes demonstrate that the model proposed in this paper improves 2.2% to 0.18% on the MRE and MNER datasets. Our code is available at https:\/\/github.com\/SiweiWei6\/VIT-CMNet . <\/jats:p>","DOI":"10.1142\/s0218126625501142","type":"journal-article","created":{"date-parts":[[2024,10,11]],"date-time":"2024-10-11T07:32:36Z","timestamp":1728631956000},"source":"Crossref","is-referenced-by-count":0,"title":["Efficient Image Semantic Representation and Visual\u2013Textual Semantic Fusion for Multimodal Relation Extraction and Multimodal-Named Entity Recognition"],"prefix":"10.1142","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9225-7660","authenticated-orcid":false,"given":"Qingchuan","family":"Zhang","sequence":"first","affiliation":[{"name":"National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No. 11 and No. 33, Fucheng Road, Haidian District, Beijing 100048, P. R. China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1718-9981","authenticated-orcid":false,"given":"Siwei","family":"Wei","sequence":"additional","affiliation":[{"name":"National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No. 11 and No. 33, Fucheng Road, Haidian District, Beijing 100048, P. R. China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8972-5953","authenticated-orcid":false,"given":"Fayez","family":"Alqahtani","sequence":"additional","affiliation":[{"name":"Software Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2669-8972","authenticated-orcid":false,"given":"Zafer","family":"Almakhadmeh","sequence":"additional","affiliation":[{"name":"Computer Science Department, Community College, King Saud University, Riyadh, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1310-033X","authenticated-orcid":false,"given":"Yuanyuan","family":"Cai","sequence":"additional","affiliation":[{"name":"National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No. 11 and No. 33, Fucheng Road, Haidian District, Beijing 100048, P. R. China"}]}],"member":"219","published-online":{"date-parts":[[2025,3,7]]},"reference":[{"key":"S0218126625501142BIB001","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i16.17687"},{"key":"S0218126625501142BIB002","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531992"},{"key":"S0218126625501142BIB003","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i9.26309"},{"key":"S0218126625501142BIB004","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-022-08667-2"},{"key":"S0218126625501142BIB007","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-acl.147"},{"key":"S0218126625501142BIB008","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.121561"},{"key":"S0218126625501142BIB009","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2023.3289879"},{"key":"S0218126625501142BIB010","first-page":"3810","volume-title":"Proc. 29th Int. Conf. Computational Linguistics","author":"Zong S.","year":"2022"},{"key":"S0218126625501142BIB012","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.306"},{"key":"S0218126625501142BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3013398"},{"key":"S0218126625501142BIB014","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3476968"},{"key":"S0218126625501142BIB015","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2023.103546"},{"key":"S0218126625501142BIB017","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1272"},{"key":"S0218126625501142BIB018","first-page":"1855","volume-title":"Proc. 29th Int. Conf. Computational Linguistics","author":"Xu B.","year":"2022"},{"key":"S0218126625501142BIB019","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-naacl.121"},{"volume-title":"ICLR 2021 - 9th Int. Conf. Learning Representations","year":"2020","author":"Dosovitskiy A.","key":"S0218126625501142BIB020"},{"key":"S0218126625501142BIB021","first-page":"8748","volume-title":"Int. Conf. Machine Learning","author":"Radford A.","year":"2021"},{"key":"S0218126625501142BIB023","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"S0218126625501142BIB024","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"S0218126625501142BIB027","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.421"},{"key":"S0218126625501142BIB028","doi-asserted-by":"publisher","DOI":"10.1109\/TNSE.2022.3190765"},{"key":"S0218126625501142BIB029","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2023.3268843"},{"key":"S0218126625501142BIB030","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01748"},{"key":"S0218126625501142BIB031","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00282"},{"key":"S0218126625501142BIB032","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.wnut-1.11"},{"key":"S0218126625501142BIB033","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"S0218126625501142BIB034","first-page":"2048","volume-title":"Int. Conf. Machine Learning","author":"Xu K.","year":"2015"},{"key":"S0218126625501142BIB035","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01094"},{"key":"S0218126625501142BIB036","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00483"},{"key":"S0218126625501142BIB037","first-page":"6000","volume":"30","author":"Vaswani A.","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218126625501142BIB038","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16328"},{"key":"S0218126625501142BIB039","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3243725"},{"key":"S0218126625501142BIB040","first-page":"12888","volume-title":"Proc. Machine Learning Research","author":"Li J.","year":"2022"},{"key":"S0218126625501142BIB041","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2577031"},{"key":"S0218126625501142BIB042","first-page":"12116","volume":"34","author":"Raghu M.","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218126625501142BIB043","doi-asserted-by":"publisher","DOI":"10.1016\/j.jfranklin.2022.10.050"},{"key":"S0218126625501142BIB044","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2021.3106861"},{"key":"S0218126625501142BIB045","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1203"},{"key":"S0218126625501142BIB046","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1279"},{"key":"S0218126625501142BIB047","doi-asserted-by":"publisher","DOI":"10.1016\/j.compag.2021.106134"},{"key":"S0218126625501142BIB048","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.306"},{"key":"S0218126625501142BIB049","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i16.17687"}],"container-title":["Journal of Circuits, Systems and Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218126625501142","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T00:59:10Z","timestamp":1746665950000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S0218126625501142"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,7]]},"references-count":42,"journal-issue":{"issue":"08","published-print":{"date-parts":[[2025,5,30]]}},"alternative-id":["10.1142\/S0218126625501142"],"URL":"https:\/\/doi.org\/10.1142\/s0218126625501142","relation":{},"ISSN":["0218-1266","1793-6454"],"issn-type":[{"type":"print","value":"0218-1266"},{"type":"electronic","value":"1793-6454"}],"subject":[],"published":{"date-parts":[[2025,3,7]]},"article-number":"2550114"}}