{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T11:00:14Z","timestamp":1760785214680,"version":"3.40.4"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T00:00:00Z","timestamp":1738195200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T00:00:00Z","timestamp":1738195200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100015720","name":"Universidad de Extremadura","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100015720","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2025,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The use of keywords is increasingly being applied across diverse domains, including the movie industry, whose main platforms are adopting advanced natural language processing techniques. Algorithms for automatic extraction of keywords can provide relevant information in this domain. The most novel approaches covering several categories (statistics, graphs, word embedding, and hybrid) have been considered in a model study framework. They have been implemented, applied, and evaluated with standard datasets. In addition, a movie dataset with gold standard keywords, based on textual metadata from synopses and reviews, has been specifically developed for this scope. Keyword extraction models have been evaluated in terms of F-score and computation time. Furthermore, content analysis, both quantitative and qualitative, of the extracted keywords in the movie context has been performed. Results show a great variability in model performance and computation time among the different models. Qualitative results, in addition to F-score and computation time, demonstrate that keyword extraction works better with synopses than with reviews. The quantitative content analysis revealed that EmbedRank effectively reduces redundancy and limits the use of proper nouns, leading to high-quality keywords.<\/jats:p>","DOI":"10.1007\/s10115-025-02350-4","type":"journal-article","created":{"date-parts":[[2025,1,30]],"date-time":"2025-01-30T16:17:34Z","timestamp":1738253854000},"page":"4301-4323","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["A keyword extraction model study in the movie domain with synopsis and reviews"],"prefix":"10.1007","volume":"67","author":[{"given":"Carlos","family":"Gonz\u00e1lez-Santos","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miguel A.","family":"Vega-Rodr\u00edguez","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carlos J.","family":"P\u00e9rez","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"I\u00f1aki","family":"Mart\u00ednez-Sarriegui","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joaqu\u00edn M.","family":"L\u00f3pez-Mu\u00f1oz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,1,30]]},"reference":[{"key":"2350_CR1","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1016\/j.psep.2021.09.022","volume":"155","author":"A Ahadh","year":"2021","unstructured":"Ahadh A, Binish GV, Srinivasan R (2021) Text mining of accident reports using semi-supervised keyword extraction and topic modeling. Process Saf Environ Prot 155:455\u2013465. https:\/\/doi.org\/10.1016\/j.psep.2021.09.022","journal-title":"Process Saf Environ Prot"},{"issue":"2","key":"2350_CR2","first-page":"55","volume":"15","author":"GO Aquino","year":"2015","unstructured":"Aquino GO, Lanzarini LC (2015) Keyword identification in Spanish documents using neural networks. J Comput Sci & Technol 15(2):55\u201360","journal-title":"J Comput Sci & Technol"},{"key":"2350_CR3","doi-asserted-by":"publisher","unstructured":"Bennani-Smires K, Musat C, Hossmann A et\u00a0al (2018) Simple unsupervised keyphrase extraction using sentence embeddings. In: Proceedings of the 22nd Conference on Computational Natural Language Learning, pp 221\u2013229, https:\/\/doi.org\/10.18653\/v1\/K18-1022","DOI":"10.18653\/v1\/K18-1022"},{"key":"2350_CR4","doi-asserted-by":"publisher","unstructured":"Campos R, Mangaravite V, Pasquali A et\u00a0al (2018) YAKE! Collection-independent automatic keyword extractor. In: Advances in information retrieval, pp 806\u2013810, https:\/\/doi.org\/10.1007\/978-3-319-76941-7_80","DOI":"10.1007\/978-3-319-76941-7_80"},{"key":"2350_CR5","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1016\/j.ins.2019.09.013","volume":"509","author":"R Campos","year":"2020","unstructured":"Campos R, Mangaravite V, Pasquali A et al (2020) YAKE! Keyword extraction from single documents using multiple local features. Inf Sci 509:257\u2013289. https:\/\/doi.org\/10.1016\/j.ins.2019.09.013","journal-title":"Inf Sci"},{"key":"2350_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107014","volume":"223","author":"L Chi","year":"2021","unstructured":"Chi L, Hu L (2021) ISKE: an unsupervised automatic keyphrase extraction approach using the iterated sentences based on graph method. Knowl-Based Syst 223:107014. https:\/\/doi.org\/10.1016\/j.knosys.2021.107014","journal-title":"Knowl-Based Syst"},{"issue":"1","key":"2350_CR7","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1016\/j.is.2008.05.002","volume":"34","author":"SR El-Beltagy","year":"2009","unstructured":"El-Beltagy SR, Rafea A (2009) KP-Miner: a keyphrase extraction system for English and Arabic documents. Inf Syst 34(1):132\u2013144. https:\/\/doi.org\/10.1016\/j.is.2008.05.002","journal-title":"Inf Syst"},{"issue":"2","key":"2350_CR8","doi-asserted-by":"publisher","first-page":"30","DOI":"10.3390\/mti4020030","volume":"4","author":"I Gagliardi","year":"2020","unstructured":"Gagliardi I, Artese MT (2020) Semantic unsupervised automatic keyphrases extraction by integrating word embedding with clustering methods. Multimodal Technol Interact 4(2):30. https:\/\/doi.org\/10.3390\/mti4020030","journal-title":"Multimodal Technol Interact"},{"key":"2350_CR9","doi-asserted-by":"publisher","unstructured":"Grootendorst M, d\u00a0Warmerdam V (2021) KeyBERT: minimal keyword extraction with BERT. https:\/\/doi.org\/10.5281\/zenodo.5534341, Zenodo, Version v0.5.0","DOI":"10.5281\/zenodo.5534341"},{"key":"2350_CR10","doi-asserted-by":"publisher","unstructured":"Hasan KS, Ng V (2014) Automatic keyphrase extraction: a survey of the state of the art. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp 1262\u20131273, https:\/\/doi.org\/10.3115\/v1\/P14-1119","DOI":"10.3115\/v1\/P14-1119"},{"key":"2350_CR11","doi-asserted-by":"publisher","unstructured":"Hulth A (2003) Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 conference on empirical methods in natural language processing, pp 216\u2013223, https:\/\/doi.org\/10.3115\/1119355.1119383","DOI":"10.3115\/1119355.1119383"},{"key":"2350_CR12","doi-asserted-by":"publisher","unstructured":"Jayasiriwardene TD, Ganegoda GU (2020) Keyword extraction from tweets using NLP tools for collecting relevant news. In: 2020 International research conference on smart computing and systems engineering (SCSE), pp 129\u2013135, https:\/\/doi.org\/10.1109\/SCSE49731.2020.9313024","DOI":"10.1109\/SCSE49731.2020.9313024"},{"key":"2350_CR13","unstructured":"Jhajharia M (2021) Reconstructing dependency trees for unsupervised keyphrase extraction. Undergraduate Thesis, University of Delhi, India"},{"key":"2350_CR14","doi-asserted-by":"publisher","unstructured":"Knittel J, Koch S, Ertl T (2021) ELSKE: efficient large-scale keyphrase extraction. In: Proceedings of the 21st ACM symposium on document engineering, p\u00a09, https:\/\/doi.org\/10.1145\/3469096.3474930","DOI":"10.1145\/3469096.3474930"},{"key":"2350_CR15","doi-asserted-by":"publisher","unstructured":"Ko\u0161\u00fat M, \u0160imko M (2016) Improving keyword extraction from movie subtitles by utilizing temporal properties. In: SOFSEM 2016: theory and practice of computer science, pp 544\u2013555, https:\/\/doi.org\/10.1007\/978-3-662-49192-8_44","DOI":"10.1007\/978-3-662-49192-8_44"},{"key":"2350_CR16","unstructured":"Krapivin M, Autaeu A, Marchese M (2009) Large dataset for keyphrase extraction. Tech. Rep. DISI-09-055, University of Trento, Italy"},{"issue":"4","key":"2350_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2020.101066","volume":"14","author":"W Lu","year":"2020","unstructured":"Lu W, Liu Z, Huang Y et al (2020) How do authors select keywords? A preliminary study of author keyword selection behavior. J Informet 14(4):101066. https:\/\/doi.org\/10.1016\/j.joi.2020.101066","journal-title":"J Informet"},{"issue":"6","key":"2350_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2019.102088","volume":"56","author":"Z Nasar","year":"2019","unstructured":"Nasar Z, Jaffry SW, Malik MK (2019) Textual keyword extraction and summarization: state-of-the-art. Inf Process & Manag 56(6):102088. https:\/\/doi.org\/10.1016\/j.ipm.2019.102088","journal-title":"Inf Process & Manag"},{"key":"2350_CR19","doi-asserted-by":"publisher","unstructured":"Nguyen TD, Kan MY (2007) Keyphrase extraction in scientific publications. In: Proceedings of the 10th international conference on asian digital libraries: looking back 10 years and forging new frontiers, pp 317\u2013326, https:\/\/doi.org\/10.1007\/978-3-540-77094-7_41","DOI":"10.1007\/978-3-540-77094-7_41"},{"key":"2350_CR20","doi-asserted-by":"publisher","unstructured":"Nikzad-Khasmakhi N, Feizi-Derakhshi MR, Asgari-Chenaghlu M, et al. (2021) Phraseformer: multimodal key-phrase extraction using transformer and graph embedding. https:\/\/doi.org\/10.48550\/arXiv.2106.04939","DOI":"10.48550\/arXiv.2106.04939"},{"key":"2350_CR21","unstructured":"Piskorski J, Stefanovitch N, Jacquet G, et\u00a0al (2021) Exploring linguistically-lightweight keyword extraction techniques for indexing news articles in a multilingual set-up. In: Proceedings of the EACL Hackashop on news media content analysis and automated report generation, pp 35\u201344"},{"key":"2350_CR22","doi-asserted-by":"publisher","unstructured":"Sang EFTK, Veenstra J (1999) Representing text chunks. In: Proceedings of the ninth conference on european chapter of the association for computational linguistics, pp 173\u2013179, https:\/\/doi.org\/10.3115\/977035.977059","DOI":"10.3115\/977035.977059"},{"key":"2350_CR23","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1007\/s00779-021-01605-5","volume":"27","author":"GL Sarrac\u00e9n","year":"2023","unstructured":"Sarrac\u00e9n GL, Rosso P (2023) Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation. Pers Ubiquit Comput 27:45\u201357. https:\/\/doi.org\/10.1007\/s00779-021-01605-5","journal-title":"Pers Ubiquit Comput"},{"key":"2350_CR24","doi-asserted-by":"publisher","unstructured":"Saxena A, Mangal M, Jain G (2020) KeyGames: a game theoretic approach to automatic keyphrase extraction. In: Proceedings of the 28th international conference on computational linguistics, pp 2037\u20132048, https:\/\/doi.org\/10.18653\/v1\/2020.coling-main.184","DOI":"10.18653\/v1\/2020.coling-main.184"},{"issue":"1","key":"2350_CR25","doi-asserted-by":"publisher","first-page":"2590","DOI":"10.48084\/etasr.1813","volume":"8","author":"ZA Shaikh","year":"2018","unstructured":"Shaikh ZA (2018) Keyword detection techniques. Eng Technol & Appl Sci Res 8(1):2590\u20132594. https:\/\/doi.org\/10.48084\/etasr.1813","journal-title":"Eng Technol & Appl Sci Res"},{"key":"2350_CR26","doi-asserted-by":"publisher","unstructured":"Singhal A, Sharma DK (2021) Keyword extraction using Renyi entropy: a statistical and domain independent method. In: 2021 7th international conference on advanced computing and communication systems (ICACCS), pp 1970\u20131975, https:\/\/doi.org\/10.1109\/ICACCS51430.2021.9441909","DOI":"10.1109\/ICACCS51430.2021.9441909"},{"key":"2350_CR27","doi-asserted-by":"publisher","unstructured":"Song M, Feng Y, Jing L (2023) A survey on recent advances in keyphrase extraction from pre-trained language models. In: Findings of the association for computational linguistics: EACL 2023, pp 2153\u20132164, https:\/\/doi.org\/10.18653\/v1\/2023.findings-eacl.161","DOI":"10.18653\/v1\/2023.findings-eacl.161"},{"key":"2350_CR28","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.389","volume":"7","author":"N Tahir","year":"2021","unstructured":"Tahir N, Asif M, Ahmad S et al (2021) FNG-IE: an improved graph-based method for keyword extraction from scholarly big-data. PeerJ Comput Sci 7:e389. https:\/\/doi.org\/10.7717\/peerj-cs.389","journal-title":"PeerJ Comput Sci"},{"key":"2350_CR29","doi-asserted-by":"publisher","unstructured":"Thiyagarajan G, Prasanna S, Uma B (2021) Automation of discussion board evaluation through keyword extraction techniques: a comparative study. In: IOP Conference series: materials science and engineering, p 012017, https:\/\/doi.org\/10.1088\/1757-899x\/1131\/1\/012017","DOI":"10.1088\/1757-899x\/1131\/1\/012017"},{"key":"2350_CR30","doi-asserted-by":"publisher","unstructured":"Thushara MG, Mownika T, Mangamuru R (2019) A comparative study on different keyword extraction algorithms. In: 2019 3rd International conference on computing methodologies and communication (ICCMC), pp 969\u2013973, https:\/\/doi.org\/10.1109\/ICCMC.2019.8819630","DOI":"10.1109\/ICCMC.2019.8819630"},{"key":"2350_CR31","doi-asserted-by":"publisher","unstructured":"Timonen M, Toivanen T, Teng Y et\u00a0al (2012) Informativeness-based keyword extraction from short documents. In: Proceedings of the international conference on knowledge discovery and information retrieval - SSTM, (IC3K 2012), pp 411\u2013421, https:\/\/doi.org\/10.5220\/0004130704110421","DOI":"10.5220\/0004130704110421"},{"key":"2350_CR32","doi-asserted-by":"publisher","unstructured":"Uehara K, Harada T (2020) Unsupervised keyword extraction for full-sentence VQA. In: Proceedings of the first international workshop on natural language processing beyond text, pp 51\u201359, https:\/\/doi.org\/10.18653\/v1\/2020.nlpbt-1.6","DOI":"10.18653\/v1\/2020.nlpbt-1.6"},{"key":"2350_CR33","unstructured":"Wan X, Xiao J (2008) Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the 23rd AAAI conference on artificial intelligence, pp 855\u2013860"},{"key":"2350_CR34","doi-asserted-by":"publisher","unstructured":"Wang Y, Zhang J (2017) Keyword extraction from online product reviews based on bi-directional LSTM recurrent neural network. In: 2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), pp 2241\u20132245, https:\/\/doi.org\/10.1109\/IEEM.2017.8290290","DOI":"10.1109\/IEEM.2017.8290290"},{"issue":"19","key":"2350_CR35","doi-asserted-by":"publisher","first-page":"25355","DOI":"10.1007\/s11042-018-5788-9","volume":"77","author":"X Wu","year":"2018","unstructured":"Wu X, Du Z, Guo Y (2018) A visual attention-based keyword extraction for document classification. Multimed Tools Appl 77(19):25355\u201325367. https:\/\/doi.org\/10.1007\/s11042-018-5788-9","journal-title":"Multimed Tools Appl"},{"issue":"6","key":"2350_CR36","doi-asserted-by":"publisher","first-page":"886","DOI":"10.26599\/TST.2020.9010051","volume":"26","author":"A Xiong","year":"2021","unstructured":"Xiong A, Liu D, Tian H et al (2021) News keyword extraction algorithm based on semantic clustering and word graph model. Tsinghua Sci Technol 26(6):886\u2013893. https:\/\/doi.org\/10.26599\/TST.2020.9010051","journal-title":"Tsinghua Sci Technol"},{"key":"2350_CR37","doi-asserted-by":"publisher","unstructured":"Yanan Q, Fuqiang T (2020) Keyword extraction for film reviews based on social network analysis and natural language technology. In: E3S Web of Conferences, p 03019, https:\/\/doi.org\/10.1051\/e3sconf\/202018903019","DOI":"10.1051\/e3sconf\/202018903019"},{"key":"2350_CR38","doi-asserted-by":"publisher","unstructured":"Zehtab-Salmasi A, Feizi-Derakhshi MR, Balafar MA (2021) FRAKE: fusional real-time automatic keyword extraction. 1\u201312 https:\/\/doi.org\/10.48550\/arXiv.2104.04830","DOI":"10.48550\/arXiv.2104.04830"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-025-02350-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-025-02350-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-025-02350-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,12]],"date-time":"2025-04-12T03:40:55Z","timestamp":1744429255000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-025-02350-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,30]]},"references-count":38,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5]]}},"alternative-id":["2350"],"URL":"https:\/\/doi.org\/10.1007\/s10115-025-02350-4","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"type":"print","value":"0219-1377"},{"type":"electronic","value":"0219-3116"}],"subject":[],"published":{"date-parts":[[2025,1,30]]},"assertion":[{"value":"5 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 November 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 January 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}