{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T01:04:50Z","timestamp":1767834290806,"version":"3.49.0"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T00:00:00Z","timestamp":1690416000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T00:00:00Z","timestamp":1690416000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2021YFC2801000"],"award-info":[{"award-number":["2021YFC2801000"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61872231"],"award-info":[{"award-number":["61872231"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Major Research plan of the National Social Science Foundation of China","award":["20 &ZD130"],"award-info":[{"award-number":["20 &ZD130"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Abstractive summarization (AS) aims to generate more flexible and informative descriptions than extractive summarization. Nevertheless, it often distorts or fabricates facts in the original article. To address this problem, some existing approaches attempt to evaluate or verify factual consistency, or design models to reduce factual errors. However, most of the efforts either have limited effects or result in lower rouge scores while reducing factual errors. 
In other words, it is challenging to promote factual consistency while maintaining the informativeness of generated summaries. Inspired by the knowledge graph embedding technique, in this paper, we propose a novel cross-modal knowledge guided model (CKGM) for AS, which embeds a multimodal knowledge graph (MKG) combining image entity-relationship information and textual factual information (FI) into BERT to accomplish cross-modal information interaction and knowledge expansion. The pre-training method obtains rich contextual semantic information, while the knowledge graph supplements the textual information. In addition, an entity memory embedding algorithm is further proposed to improve information fusion efficiency and model training speed. We elaborately conducted ablation experiments and evaluated our model on the Visual Genome, FewRel, MSCOCO, and CNN\/DailyMail datasets. Experimental results demonstrate that our model can significantly improve the FI consistency and informativeness of generated summaries.<\/jats:p>","DOI":"10.1007\/s40747-023-01170-9","type":"journal-article","created":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T03:26:09Z","timestamp":1690428369000},"page":"577-594","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Cross-modal knowledge guided model for abstractive 
summarization"],"prefix":"10.1007","volume":"10","author":[{"given":"Hong","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7249-698X","authenticated-orcid":false,"given":"Jin","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Mingyang","family":"Duan","sequence":"additional","affiliation":[]},{"given":"Peizhu","family":"Gong","sequence":"additional","affiliation":[]},{"given":"Zhongdai","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Junxiang","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Bing","family":"Han","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,7,27]]},"reference":[{"key":"1170_CR1","doi-asserted-by":"publisher","first-page":"2126","DOI":"10.1002\/rnc.5350","volume":"31","author":"H Fang","year":"2022","unstructured":"Fang H, Zhu G, Stojanovic V, Nie R, He S, Luan X, Liu F (2022) Adaptive optimization algorithm for nonlinear Markov jump systems with partial unknown dynamics. Int J Robust Nonlinear Control 31:2126\u20132140","journal-title":"Int J Robust Nonlinear Control"},{"key":"1170_CR2","first-page":"1","volume":"99","author":"S Chang","year":"2020","unstructured":"Chang S, Liu J (2020) Multi-lane capsule network for classifying images with complex background. IEEE Access 99:1\u20131","journal-title":"IEEE Access"},{"key":"1170_CR3","first-page":"359","volume":"9","author":"X Song","year":"2022","unstructured":"Song X, Sun P, Song S, Stojanovic V (2022) Event-driven NN adaptive fixed-time control for nonlinear systems with guaranteed performance. J Franklin Inst 9:359","journal-title":"J Franklin Inst"},{"issue":"4","key":"1170_CR4","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1145\/3200864","volume":"36","author":"P Ren","year":"2018","unstructured":"Ren P, Chen Z, Ren Z, Wei F, Nie L, Ma J, Rijke MD (2018) Sentence relations for extractive summarization with deep neural networks. 
ACM Trans Inf Syst 36(4):39\u201313932","journal-title":"ACM Trans Inf Syst"},{"key":"1170_CR5","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/j.neucom.2020.02.102","volume":"425","author":"Z Deng","year":"2020","unstructured":"Deng Z, Ma F, Lan R, Huang W, Luo X (2020) A two-stage Chinese text summarization algorithm using keyword information and adversarial learning. Neurocomputing 425:117\u2013126","journal-title":"Neurocomputing"},{"key":"1170_CR6","first-page":"1","volume":"2","author":"J Liu","year":"2019","unstructured":"Liu J, Yang Y, Lv S, Wang J, Chen H (2019) Attention-based BiGRU-CNN for Chinese question classification. J Ambient Intell Humaniz Comput 2:1\u201312","journal-title":"J Ambient Intell Humaniz Comput"},{"key":"1170_CR7","doi-asserted-by":"publisher","first-page":"90410","DOI":"10.1109\/ACCESS.2020.2993875","volume":"8","author":"S Shang","year":"2020","unstructured":"Shang S, Liu J, Yang Y (2020) Multi-layer transformer aggregation encoder for answer generation. IEEE Access 8:90410\u201390419","journal-title":"IEEE Access"},{"key":"1170_CR8","doi-asserted-by":"crossref","unstructured":"Cho K, Merrienboer BV, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. Comput Sci","DOI":"10.3115\/v1\/D14-1179"},{"key":"1170_CR9","doi-asserted-by":"crossref","unstructured":"Cao Z, Wei F, Li W, Li S (2017) Faithful to the original: fact aware neural abstractive summarization","DOI":"10.1609\/aaai.v32i1.11912"},{"key":"1170_CR10","doi-asserted-by":"crossref","unstructured":"Falke T, Ribeiro L, Utama PA, Dagan I, Gurevych I (2019) Ranking generated summaries by correctness: an interesting but challenging application for natural language inference. 
In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics","DOI":"10.18653\/v1\/P19-1213"},{"key":"1170_CR11","doi-asserted-by":"crossref","unstructured":"Kryscinski W, Mccann B, Xiong C, Socher R (2020) Evaluating the factual consistency of abstractive text summarization. In: Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP)","DOI":"10.18653\/v1\/2020.emnlp-main.750"},{"key":"1170_CR12","doi-asserted-by":"crossref","unstructured":"Wang A, Cho K, Lewis M (2020) Asking and answering questions to evaluate the factual consistency of summaries","DOI":"10.18653\/v1\/2020.acl-main.450"},{"key":"1170_CR13","doi-asserted-by":"crossref","unstructured":"Zhang Y, Merck D, Tsai EB, Manning CD, Langlotz CP (2019) Optimizing the factual correctness of a summary: a study of summarizing radiology reports","DOI":"10.18653\/v1\/2020.acl-main.458"},{"key":"1170_CR14","doi-asserted-by":"crossref","unstructured":"Dong Y, Wang S, Gan Z, Cheng Y, Liu J (2020) Multi-fact correction in abstractive text summarization","DOI":"10.18653\/v1\/2020.emnlp-main.749"},{"key":"1170_CR15","doi-asserted-by":"crossref","unstructured":"Zhu C, Hinthorn W, Xu R, Zeng Q, Zeng M, Huang X, Jiang M (2020) Boosting factual correctness of abstractive summarization","DOI":"10.18653\/v1\/2021.naacl-main.58"},{"key":"1170_CR16","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1016\/j.neucom.2020.04.056","volume":"403","author":"LA Jin","year":"2020","unstructured":"Jin LA, Yy A, Hh B (2020) Multi-level semantic representation enhancement network for relationship extraction. Neurocomputing 403:282\u2013293","journal-title":"Neurocomputing"},{"key":"1170_CR17","doi-asserted-by":"crossref","unstructured":"Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. 
Comput Sci","DOI":"10.18653\/v1\/D15-1044"},{"key":"1170_CR18","doi-asserted-by":"crossref","unstructured":"Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Conference of the North American Chapter of the Association for Computational Linguistics: human language technologies","DOI":"10.18653\/v1\/N16-1012"},{"key":"1170_CR19","doi-asserted-by":"crossref","unstructured":"See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks","DOI":"10.18653\/v1\/P17-1099"},{"key":"1170_CR20","unstructured":"Paulus R, Xiong C, Socher R (2017) A deep reinforced model for abstractive summarization"},{"key":"1170_CR21","doi-asserted-by":"crossref","unstructured":"Li W, Xiao X, Lyu Y, Wang Y (2018) Improving neural abstractive document summarization with structural regularization. In: Proceedings of the 2018 conference on empirical methods in natural language processing","DOI":"10.18653\/v1\/D18-1441"},{"key":"1170_CR22","doi-asserted-by":"publisher","unstructured":"Zhang C, Zhang Z, Li J, Liu Q, Zhu H (2021) Ctnr: compress-then-reconstruct approach for multimodal abstractive summarization. In: International joint conference on neural networks. https:\/\/doi.org\/10.1109\/IJCNN52387.2021.9534082","DOI":"10.1109\/IJCNN52387.2021.9534082"},{"key":"1170_CR23","doi-asserted-by":"crossref","unstructured":"Li H, Zhu J, Zhang J, He X, Zong C (2020) Multimodal sentence summarization via multimodal selective encoding. In: International conference on computational linguistics","DOI":"10.18653\/v1\/2020.coling-main.496"},{"key":"1170_CR24","doi-asserted-by":"crossref","unstructured":"Zhu J, Xiang L, Zhou Y, Zhang J, Zong C (2021) Graph-based multimodal ranking models for multimodal summarization. 
Transactions on Asian and low-resource language information processing","DOI":"10.1145\/3445794"},{"key":"1170_CR25","doi-asserted-by":"publisher","first-page":"224837","DOI":"10.1109\/ACCESS.2020.3044308","volume":"8","author":"P Gong","year":"2020","unstructured":"Gong P, Liu J, Yang Y, He H (2020) Towards knowledge enhanced language model for machine reading comprehension. IEEE Access 8:224837\u2013224851","journal-title":"IEEE Access"},{"key":"1170_CR26","doi-asserted-by":"crossref","unstructured":"Schlichtkrull M, Kipf TN, Bloem P, Berg R, Titov I, Welling M (2017) Modeling relational data with graph convolutional networks","DOI":"10.1007\/978-3-319-93417-4_38"},{"key":"1170_CR27","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. IEEE","DOI":"10.1109\/CVPR.2016.90"},{"key":"1170_CR28","doi-asserted-by":"crossref","unstructured":"Lin TY, Dollar P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2017.106"},{"key":"1170_CR29","doi-asserted-by":"crossref","unstructured":"Bodla N, Singh B, Chellappa R, Davis LS (2017) Soft-nms\u2014improving object detection with one line of code","DOI":"10.1109\/ICCV.2017.593"},{"key":"1170_CR30","doi-asserted-by":"crossref","unstructured":"Dey R, Salemt FM (2017) Gate-variants of gated recurrent unit (gru) neural networks. In: IEEE international Midwest symposium on circuits and systems, pp 1597\u2013 1600","DOI":"10.1109\/MWSCAS.2017.8053243"},{"key":"1170_CR31","doi-asserted-by":"crossref","unstructured":"Krishna R, Zhu Y, Groth O, Johnson J, Li FF (2017) Visual genome: connecting language and vision using crowdsourced dense image annotations. 
Int J Comput Vis 123(1)","DOI":"10.1007\/s11263-016-0981-7"},{"key":"1170_CR32","doi-asserted-by":"crossref","unstructured":"Han X, Zhu H, Yu P, Wang Z, Yao Y, Liu Z, Sun M (2018) Fewrel: a large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In: Proceedings of the 2018 conference on empirical methods in natural language processing","DOI":"10.18653\/v1\/D18-1514"},{"key":"1170_CR33","volume-title":"Microsoft coco: common objects in context","author":"TY Lin","year":"2014","unstructured":"Lin TY, Maire M, Belongie S, Hays J, Zitnick CL (2014) Microsoft coco: common objects in context. Springer International Publishing, Cham"},{"key":"1170_CR34","volume-title":"Teaching machines to read and comprehend","author":"KM Hermann","year":"2015","unstructured":"Hermann KM, Koisk T, Grefenstette E, Espeholt L, Kay W, Suleyman M, Blunsom P (2015) Teaching machines to read and comprehend. MIT Press, Cambridge"},{"key":"1170_CR35","unstructured":"Lin CY (2004) Rouge: a package for automatic evaluation of summaries. In: Proceedings of the workshop on text summarization branches out (WAS 2004)"},{"key":"1170_CR36","doi-asserted-by":"crossref","unstructured":"Xu D, Zhu Y, Choy CB, Fei-Fei L (2017) Scene graph generation by iterative message passing. IEEE Computer Society","DOI":"10.1109\/CVPR.2017.330"},{"key":"1170_CR37","doi-asserted-by":"publisher","unstructured":"Leibe B, Matas J, Sebe N, Welling M (2016) Lecture notes in computer science. Computer vision\u2014ECCV 2016, vol 9905. Visual relationship detection with language priors (Chapter 51), pp 852\u2013869. 
https:\/\/doi.org\/10.1007\/978-3-319-46448-0","DOI":"10.1007\/978-3-319-46448-0"},{"key":"1170_CR38","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need"},{"key":"1170_CR39","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding"},{"key":"1170_CR40","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach"},{"key":"1170_CR41","doi-asserted-by":"crossref","unstructured":"Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Computer vision and pattern recognition","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"1170_CR42","doi-asserted-by":"crossref","unstructured":"Ma L, Lu Z, Shang L, Li H (2015) Multimodal convolutional neural networks for matching image and sentence. In: IEEE international conference on computer vision","DOI":"10.1109\/ICCV.2015.301"},{"key":"1170_CR43","doi-asserted-by":"crossref","unstructured":"Wang L, Yin L, Lazebnik S (2016) Learning deep structure-preserving image-text embeddings. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2016.541"},{"key":"1170_CR44","unstructured":"Faghri F, Fleet DJ, Kiros JR, Fidler S (2017) Vse++: improved visual-semantic embeddings"},{"key":"1170_CR45","doi-asserted-by":"crossref","unstructured":"Lee KH, Xi C, Gang H, Hu H, He X (2018) Stacked cross attention for image-text matching","DOI":"10.1007\/978-3-030-01225-0_13"},{"key":"1170_CR46","doi-asserted-by":"crossref","unstructured":"Shi B, Ji L, Lu P, Niu Z, Duan N (2019) Knowledge aware semantic concept expansion for image-text matching. 
In: Twenty-eighth international joint conference on artificial intelligence IJCAI-19","DOI":"10.24963\/ijcai.2019\/720"},{"key":"1170_CR47","doi-asserted-by":"crossref","unstructured":"Wang Y, Yang H, Qian X, Ma L, Fan X (2019) Position focused attention network for image-text matching","DOI":"10.24963\/ijcai.2019\/526"},{"key":"1170_CR48","unstructured":"Yue D, Shen Y, Crawford E, Hoof HV, Cheung, J (2018) Banditsum: extractive summarization as a contextual bandit. In: Proceedings of the 2018 conference on empirical methods in natural language processing"},{"key":"1170_CR49","doi-asserted-by":"crossref","unstructured":"Zhou Q, Yang N, Wei F, Huang S, Zhou M, Zhao T (2018) Neural document summarization by jointly learning to score and select sentences. In: Proceedings of the 56th annual meeting of the Association for Computational Linguistics, vol 1: Long Papers","DOI":"10.18653\/v1\/P18-1061"},{"key":"1170_CR50","doi-asserted-by":"crossref","unstructured":"Xu J, Durrett G (2019) Neural extractive text summarization with syntactic compression","DOI":"10.18653\/v1\/D19-1324"},{"key":"1170_CR51","doi-asserted-by":"crossref","unstructured":"Chowdhury T, Kumar S, Chakraborty T (2020) Neural abstractive summarization with structural attention","DOI":"10.24963\/ijcai.2020\/514"},{"key":"1170_CR52","doi-asserted-by":"crossref","unstructured":"Narayan S, Cohen SB, Lapata M (2018) Ranking sentences for extractive summarization with reinforcement learning","DOI":"10.18653\/v1\/N18-1158"},{"key":"1170_CR53","doi-asserted-by":"crossref","unstructured":"Zhang X, Lapata M, Wei F, Ming Z (2018) Neural latent extractive document summarization. 
In: Proceedings of the 2018 conference on empirical methods in natural language processing","DOI":"10.18653\/v1\/D18-1088"},{"key":"1170_CR54","doi-asserted-by":"crossref","unstructured":"Liu Y, Lapata M (2019) Text summarization with pretrained encoders","DOI":"10.18653\/v1\/D19-1387"},{"key":"1170_CR55","doi-asserted-by":"crossref","unstructured":"Zhong M, Liu P, Wang D, Qiu X, Huang X (2019) Searching for effective neural extractive summarization: what works and what\u2019s next. arXiv e-prints, arXiv:1907.03491 [cs.CL]","DOI":"10.18653\/v1\/P19-1100"},{"key":"1170_CR56","doi-asserted-by":"crossref","unstructured":"Zhang X, Wei F, Zhou M (2019) Hibert: document level pre-training of hierarchical bidirectional transformers for document summarization. In: Proceedings of the 57th annual meeting of the association for computational linguistics","DOI":"10.18653\/v1\/P19-1499"},{"key":"1170_CR57","doi-asserted-by":"crossref","unstructured":"Loem M, Takase S, Kaneko M, Okazaki N (2022) Extraphrase: efficient data augmentation for abstractive summarization","DOI":"10.18653\/v1\/2022.naacl-srw.3"},{"key":"1170_CR58","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1016\/j.neucom.2021.02.028","volume":"448","author":"W Liao","year":"2021","unstructured":"Liao W, Ma Y, Yin Y, Ye G, Zuo D (2021) Improving abstractive summarization based on dynamic residual network with reinforce dependency. 
Neurocomputing 448:228\u2013237","journal-title":"Neurocomputing"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01170-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01170-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01170-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,10]],"date-time":"2024-02-10T22:21:22Z","timestamp":1707603682000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01170-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,27]]},"references-count":58,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,2]]}},"alternative-id":["1170"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01170-9","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,27]]},"assertion":[{"value":"6 September 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 June 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 July 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that there is no conflict of interest in this 
paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}