{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:38:00Z","timestamp":1760060280350,"version":"build-2065373602"},"reference-count":37,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2025,8,14]],"date-time":"2025-08-14T00:00:00Z","timestamp":1755129600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Guangxi Key Research and Development Program","award":["GuiKe AB24010309","2024GXNSFAA010242","2025KY0343"],"award-info":[{"award-number":["GuiKe AB24010309","2024GXNSFAA010242","2025KY0343"]}]},{"name":"Natural Science Foundation of Guangxi Zhuang Autonomous Region","award":["GuiKe AB24010309","2024GXNSFAA010242","2025KY0343"],"award-info":[{"award-number":["GuiKe AB24010309","2024GXNSFAA010242","2025KY0343"]}]},{"name":"Guangxi Education Department Program","award":["GuiKe AB24010309","2024GXNSFAA010242","2025KY0343"],"award-info":[{"award-number":["GuiKe AB24010309","2024GXNSFAA010242","2025KY0343"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>With the increasing content richness of social media platforms, Multimodal Named Entity Recognition (MNER) faces the dual challenges of heterogeneous feature fusion and accurate entity recognition. 
Aiming at the key problems of inconsistent distribution of textual and visual information, insufficient feature alignment, and noise interference in fusion, this paper proposes CASF-MNER, a multimodal named entity recognition model based on a dual-stream Transformer. The model designs a cross-modal cross-attention mechanism over visual and textual features, building a bidirectional interaction between single-layer features to model higher-order semantic correlations and achieve cross-correlation alignment of modal features; it constructs a dynamic saliency-aware mechanism for multimodal features based on multiscale pooling, with an entropy-weighting strategy over the global feature distribution that adaptively suppresses noise and redundancy while enhancing key feature expression; and it establishes a deep semantic fusion method based on a hybrid isomorphic model, whose progressive cross-modal interaction structure, combined with contrastive learning, achieves global fusion in the deep semantic space and optimizes representational consistency. 
The experimental results show that CASF-MNER achieves excellent performance on both Twitter-2015 and Twitter-2017 public datasets, which verifies the effectiveness and advancement of the method proposed in this paper.<\/jats:p>","DOI":"10.3390\/a18080511","type":"journal-article","created":{"date-parts":[[2025,8,14]],"date-time":"2025-08-14T15:44:21Z","timestamp":1755186261000},"page":"511","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["CASF: Correlation-Alignment and Significance-Aware Fusion for Multimodal Named Entity Recognition"],"prefix":"10.3390","volume":"18","author":[{"given":"Hui","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou 545006, China"},{"name":"Guangxi Key Laboratory of Intelligent Computing and Distributed Information Processing, Liuzhou 545006, China"},{"name":"Cybersecurity Monitoring Center for Guangxi Education System, Liuzhou 545006, China"},{"name":"Liuzhou Key Laboratory of Big Data Intelligent Processing and Security, Liuzhou 545006, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-7356-6981","authenticated-orcid":false,"given":"Yunshi","family":"Tao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou 545006, China"},{"name":"Guangxi Key Laboratory of Intelligent Computing and Distributed Information Processing, Liuzhou 545006, China"},{"name":"Cybersecurity Monitoring Center for Guangxi Education System, Liuzhou 545006, China"},{"name":"Liuzhou Key Laboratory of Big Data Intelligent Processing and Security, Liuzhou 545006, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9688-6242","authenticated-orcid":false,"given":"Huan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Guangxi University of Science 
and Technology, Liuzhou 545006, China"},{"name":"Guangxi Key Laboratory of Intelligent Computing and Distributed Information Processing, Liuzhou 545006, China"},{"name":"Cybersecurity Monitoring Center for Guangxi Education System, Liuzhou 545006, China"},{"name":"Liuzhou Key Laboratory of Big Data Intelligent Processing and Security, Liuzhou 545006, China"}]},{"given":"Zhe","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou 545006, China"},{"name":"Guangxi Key Laboratory of Intelligent Computing and Distributed Information Processing, Liuzhou 545006, China"},{"name":"Cybersecurity Monitoring Center for Guangxi Education System, Liuzhou 545006, China"},{"name":"Liuzhou Key Laboratory of Big Data Intelligent Processing and Security, Liuzhou 545006, China"}]},{"given":"Qingzheng","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou 545006, China"},{"name":"Guangxi Key Laboratory of Intelligent Computing and Distributed Information Processing, Liuzhou 545006, China"},{"name":"Cybersecurity Monitoring Center for Guangxi Education System, Liuzhou 545006, China"},{"name":"Liuzhou Key Laboratory of Big Data Intelligent Processing and Security, Liuzhou 545006, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,8,14]]},"reference":[{"key":"ref_1","first-page":"e77313","article-title":"Current application and future prospects of artificial intelligence in healthcare and medical education: A review of literature","volume":"17","author":"Joseph","year":"2025","journal-title":"Cureus"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Souza, A.S.d., Amorim, V.M.d.F., Soares, E.P., de Souza, R.F., and Guzzo, C.R. (2025). 
Antagonistic Trends Between Binding Affinity and Drug-Likeness in SARS-CoV-2 Mpro Inhibitors Revealed by Machine Learning. Viruses, 17.","DOI":"10.3390\/v17070935"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"71639","DOI":"10.1007\/s11042-024-18472-w","article-title":"MVPN: Multi-granularity visual prompt-guided fusion network for multimodal named entity recognition","volume":"83","author":"Liu","year":"2024","journal-title":"Multimed. Tools Appl."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Alfaqeeh, M. (2024, January 11\u201313). TriMod Fusion for Multimodal Named Entity Recognition in Social Media. Proceedings of the 2024 34th International Conference on Collaborative Advances in Software and COmputiNg (CASCON), Toronto, ON, Canada.","DOI":"10.1109\/CASCON62161.2024.10837944"},{"key":"ref_5","unstructured":"Chen, F., and Feng, Y. (2023). Chain-of-thought prompt distillation for multimodal named entity recognition and multimodal relation extraction. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"He, L., Wang, Q., Liu, J., Duan, J., and Wang, H. (2024). Visual Clue Guidance and Consistency Matching Framework for Multimodal Named Entity Recognition. Appl. Sci., 14.","DOI":"10.3390\/app14062333"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, H., Wang, Y., and Liu, D. (2024, January 18\u201320). A Multimodal Named Entity Recognition Approach Based on Multi-Perspective Contrastive Learning. Proceedings of the 2024 7th International Conference on Machine Learning and Natural Language Processing (MLNLP), Chengdu, China.","DOI":"10.1109\/MLNLP63328.2024.10799991"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1109\/TCSVT.2023.3284474","article-title":"Towards bridged vision and language: Learning cross-modal knowledge representation for relation extraction","volume":"34","author":"Feng","year":"2023","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Li, E., Li, T., Luo, H., Chu, J., Duan, L., and Lv, F. (2025). Adaptive Multi-Scale Language Reinforcement for Multimodal Named Entity Recognition. IEEE Trans. Multimed.","DOI":"10.1109\/TMM.2025.3543105"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, J., Li, H., Pan, Z., Sun, D., Wang, J., Zhang, W., and Pan, G. (2023). Prompting chatgpt in MNER: Enhanced multimodal named entity recognition with auxiliary refined knowledge. arXiv.","DOI":"10.18653\/v1\/2023.findings-emnlp.184"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, X., Ye, J., Li, Z., Tian, J., Jiang, Y., Yan, M., Zhang, J., and Xiao, Y. (2022, January 18\u201322). CAT-MNER: Multimodal named entity recognition with knowledge-refined cross-modal attention. Proceedings of the 2022 IEEE International Conference on Multimedia and Expo (ICME), Taipei, Taiwan.","DOI":"10.1109\/ICME52920.2022.9859972"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"128388","DOI":"10.1016\/j.neucom.2024.128388","article-title":"A multi-task framework based on decomposition for multimodal named entity recognition","volume":"604","author":"Cai","year":"2024","journal-title":"Neurocomputing"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"124867","DOI":"10.1016\/j.eswa.2024.124867","article-title":"ICKA: An instruction construction and knowledge alignment framework for multimodal named entity recognition","volume":"255","author":"Zeng","year":"2024","journal-title":"Expert Syst. 
Appl."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"111940","DOI":"10.1016\/j.patcog.2025.111940","article-title":"Dual similarity enhanced hybrid orthogonal fusion for multimodal named entity recognition","volume":"169","author":"Jiang","year":"2025","journal-title":"Pattern Recognit."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"130982","DOI":"10.1016\/j.neucom.2025.130982","article-title":"Improving Multimodal Named Entity Recognition via Text-image Relevance Prediction with Large Language Models","volume":"651","author":"Zeng","year":"2025","journal-title":"Neurocomputing"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"103405","DOI":"10.1016\/j.inffus.2025.103405","article-title":"Multimodal Named Entity Recognition based on topic prompt and multi-curriculum denoising","volume":"124","author":"Xu","year":"2025","journal-title":"Inf. Fusion"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_18","unstructured":"Huang, S., Xu, B., Li, C., Ye, J., and Lin, X. (2024, January 20\u201325). Mner-mi: A multi-image dataset for multimodal named entity recognition in social media. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italia."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Bao, X., Tian, M., Wang, L., Zha, Z., and Qin, B. (2024, January 10\u201314). Contrastive pre-training with multi-level alignment for grounded multimodal named entity recognition. 
Proceedings of the 2024 International Conference on Multimedia Retrieval, Phuket, Thailand.","DOI":"10.1145\/3652583.3658011"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1109\/TASLP.2023.3345146","article-title":"Enhancing multimodal entity and relation extraction with variational information bottleneck","volume":"32","author":"Cui","year":"2024","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yang, L. (July, January 30). SAMNER: Image Screening and Cross-Modal Alignment Networks for Multimodal Named Entity Recognition. Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan.","DOI":"10.1109\/IJCNN60899.2024.10651087"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Yu, J., Jiang, J., Yang, L., and Xia, R. (2020). Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2020.acl-main.306"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Guo, A., Zhao, X., Tan, Z., and Xiao, W. (2023, January 21\u201325). MGICL: Multi-grained interaction contrastive learning for multimodal named entity recognition. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK.","DOI":"10.1145\/3583780.3614967"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Xiao, L., Mao, R., Zhang, X., He, L., and Cambria, E. (2024, January 12\u201316). Vanessa: Visual connotation and aesthetic attributes understanding network for multimodal aspect-based sentiment analysis. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA.","DOI":"10.18653\/v1\/2024.findings-emnlp.671"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Li, Z., Kong, J., and Zuo, M. (2024). 
Multimodal Entity Recognition and Relation Extraction via Dynamic Visual-Textual Enhanced Fusion for Social Media Applications in Consumer Electronics. IEEE Trans. Consum. Electron.","DOI":"10.1109\/TCE.2024.3505598"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bao, X., Tian, M., Zha, Z., and Qin, B. (2023, January 21\u201325). MPMRC-MNER: A unified MRC framework for multimodal named entity recognition based multimodal prompt. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK.","DOI":"10.1145\/3583780.3614975"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wei, P., Ouyang, H., Hu, Q., Zeng, B., Feng, G., and Wen, Q. (2024, January 10\u201314). VEC-MNER: Hybrid Transformer with Visual-Enhanced Cross-Modal Multi-level Interaction for Multimodal NER. Proceedings of the 2024 International Conference on Multimedia Retrieval, Phuket, Thailand.","DOI":"10.1145\/3652583.3658097"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Yan, T., Zhao, S., Ma, W., Song, S., Wang, C., Rao, Z., Chen, S., Luo, Z., and Liu, X. (2025). FRCL-MNER: A Finer Grained Rank-Based Contrastive Learning Framework for Multimodal NER. IEEE Trans. Neural Netw. Learn. Syst.","DOI":"10.1109\/TNNLS.2025.3528567"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, Q., Fu, J., Liu, X., and Huang, X. (2018, January 2\u20137). Adaptive co-attention network for named entity recognition in tweets. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11962"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Lu, D., Neves, L., Carvalho, V., Zhang, N., and Ji, H. (2018, January 15\u201320). Visual attention model for name tagging in multimodal social media. 
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1185"},{"key":"ref_31","unstructured":"Huang, Z., Xu, W., and Yu, K. (2015). Bidirectional LSTM-CRF models for sequence tagging. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ma, X., and Hovy, E. (2016). End-to-end sequence labeling via bi-directional lstm-cnns-crf. arXiv.","DOI":"10.18653\/v1\/P16-1101"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., and Dyer, C. (2016). Neural architectures for named entity recognition. arXiv.","DOI":"10.18653\/v1\/N16-1030"},{"key":"ref_34","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). Bert: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhang, D., Wei, S., Li, S., Wu, H., Zhu, Q., and Zhou, G. (2021, January 2\u20139). Multi-modal graph fusion for named entity recognition with targeted visual guidance. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual.","DOI":"10.1609\/aaai.v35i16.17687"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"3944","DOI":"10.1109\/TCSS.2023.3323402","article-title":"Gdn-cmcf: A gated disentangled network with cross-modality consensus fusion for multimodal named entity recognition","volume":"11","author":"Huang","year":"2023","journal-title":"IEEE Trans. Comput. Soc. 
Syst."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"103546","DOI":"10.1016\/j.ipm.2023.103546","article-title":"Multi-granularity cross-modal representation learning for named entity recognition on social media","volume":"61","author":"Liu","year":"2024","journal-title":"Inf. Process. Manag."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/8\/511\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:27:03Z","timestamp":1760034423000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/8\/511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,14]]},"references-count":37,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2025,8]]}},"alternative-id":["a18080511"],"URL":"https:\/\/doi.org\/10.3390\/a18080511","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2025,8,14]]}}}