{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T10:18:03Z","timestamp":1778149083752,"version":"3.51.4"},"reference-count":101,"publisher":"Cambridge University Press (CUP)","issue":"3","license":[{"start":{"date-parts":[[2019,11,11]],"date-time":"2019-11-11T00:00:00Z","timestamp":1573430400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2020,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Due to the considerable growth of the volume of text documents on the Internet and in digital libraries, manual analysis of these documents is no longer feasible. Having efficient approaches to keyword extraction in order to retrieve the \u2018key\u2019 elements of the studied documents is now a necessity. Keyword extraction has been an active research field for many years, covering various applications in Text Mining, Information Retrieval, and Natural Language Processing, and meeting different requirements. However, it is not a unified domain of research. In spite of the existence of many approaches in the field, there is no single approach that effectively extracts keywords from different data sources. This shows the importance of having a comprehensive review, which discusses the complexity of the task and categorizes the main approaches of the field based on the features and methods of extraction that they use. This paper presents a general introduction to the field of keyword\/keyphrase extraction. Unlike the existing surveys, different aspects of the problem along with the main challenges in the field are discussed. 
This mainly includes the unclear definition of \u2018keyness\u2019, complexities of targeting proper features for capturing desired keyness properties and selecting efficient extraction methods, and also the evaluation issues. By classifying a broad range of state-of-the-art approaches and analysing the benefits and drawbacks of different features and methods, we provide a clearer picture of them. This review is intended to help readers find their way around all the works related to keyword extraction and guide them in choosing or designing a method that is appropriate for the application they are targeting.<\/jats:p>","DOI":"10.1017\/s1351324919000457","type":"journal-article","created":{"date-parts":[[2019,11,11]],"date-time":"2019-11-11T11:13:12Z","timestamp":1573470792000},"page":"259-291","source":"Crossref","is-referenced-by-count":170,"title":["Keyword extraction: Issues and methods"],"prefix":"10.1017","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5865-8281","authenticated-orcid":false,"given":"Nazanin","family":"Firoozeh","sequence":"first","affiliation":[]},{"given":"Adeline","family":"Nazarenko","sequence":"additional","affiliation":[]},{"given":"Fabrice","family":"Alizon","sequence":"additional","affiliation":[]},{"given":"B\u00e9atrice","family":"Daille","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,11,11]]},"reference":[{"key":"S1351324919000457_ref100","unstructured":"Zhang, F. , Huang, L. and Peng, B. (2013). WordTopic-MultiRank: A new method for automatic keyphrase extraction. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Asian Federation of Natural Language Processing, Nagoya, Japan, pp. 
10\u201318."},{"key":"S1351324919000457_ref98","first-page":"1169","article-title":"Automatic keyword extraction from documents using conditional random fields","volume":"4","author":"Zhang","year":"2008","journal-title":"Computational Information Systems"},{"key":"S1351324919000457_ref97","unstructured":"Zesch, T. and Gurevych, I. (2009). Approximate matching for evaluating keyphrase extraction. In Proceedings of the 7th International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria, pp. 484\u2013489."},{"key":"S1351324919000457_ref95","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-71701-0_95"},{"key":"S1351324919000457_ref94","unstructured":"Wan, X. , Yang, J. and Xiao, J. (2007). Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL). Prague, Czech Republic: ACL, pp. 552\u2013559."},{"key":"S1351324919000457_ref91","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277949"},{"key":"S1351324919000457_ref90","unstructured":"Voorhees, E.M. (2001). The philosophy of information retrieval evaluation. In Evaluation of Cross-Language Information Retrieval Systems: Second Workshop of the Cross-Language Evaluation Forum, pp. 355\u2013370."},{"key":"S1351324919000457_ref89","unstructured":"Voorhees, E.M. (1999). The TREC-8 question answering track report. In Proceedings of The Eighth Text REtrieval Conference, pp. 77\u201382."},{"key":"S1351324919000457_ref88","first-page":"360","article-title":"Understanding interobserver agreement: The kappa statistic","volume":"37","author":"Viera","year":"2005","journal-title":"Family Medicine"},{"key":"S1351324919000457_ref87","unstructured":"Unesco. (1975). 
UNISIST Indexing Principle SC.75\/WS\/58."},{"key":"S1351324919000457_ref85","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009976227802"},{"key":"S1351324919000457_ref84","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2172"},{"key":"S1351324919000457_ref82","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-017-9395-6"},{"key":"S1351324919000457_ref81","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1198"},{"key":"S1351324919000457_ref79","unstructured":"Singhal, A. , Kasturi, R. , Sharma, A. and Srivastava, J. (2017). Leveraging web resources for keyword assignment to short text documents. In Computing Research Repository (CoRR)."},{"key":"S1351324919000457_ref75","unstructured":"Sarkar, K. , Nasipuri, M. and Ghose, S. (2010). A new approach to keyphrase extraction using neural networks. In Computing Research Repository (CoRR), abs\/1004.3274."},{"key":"S1351324919000457_ref73","first-page":"1","volume-title":"Text Mining: Applications and Theory","author":"Rose","year":"2010"},{"key":"S1351324919000457_ref69","doi-asserted-by":"publisher","DOI":"10.1209\/epl\/i2002-00528-3"},{"key":"S1351324919000457_ref71","volume-title":"C4.5: Programs for Machine Learning","author":"Quinlan","year":"1993"},{"key":"S1351324919000457_ref68","doi-asserted-by":"publisher","DOI":"10.1109\/ADL.1998.670375"},{"key":"S1351324919000457_ref66","unstructured":"Navigli, R. and Velardi, P. (2002). Semantic interpretation of terminological strings. In Proceedings of the Conference on Terminology and Knowledge Engineering (TKE), pp. 95\u2013100."},{"key":"S1351324919000457_ref65","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1075\/li.30.1.03nad","article-title":"A survey of named entity recognition and classification","volume":"30","author":"Nadeau","year":"2007","journal-title":"Linguisticae Investigationes"},{"key":"S1351324919000457_ref63","unstructured":"Mori, J. , Matsuo, Y. , Ishizuka, M. and Faltings, B. (2004). 
Keyword extraction from the web for foaf metadata. In Workshop on Friend of a Friend, Social Networking and the Semantic Web."},{"key":"S1351324919000457_ref62","unstructured":"Momtazi, S. , Khudanpur, S. and Klakow, D. (2010). A comparative study of word co-occurrence for term clustering in language model-based sentence retrieval. In Proceedings of Human Language Technologies (NAACL). Los Angeles, CA, USA: ACL, pp. 325\u2013328."},{"key":"S1351324919000457_ref57","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1054"},{"key":"S1351324919000457_ref55","doi-asserted-by":"publisher","DOI":"10.1016\/j.physa.2011.04.013"},{"key":"S1351324919000457_ref53","doi-asserted-by":"publisher","DOI":"10.3115\/1699648.1699678"},{"key":"S1351324919000457_ref52","unstructured":"Medelyan, O. (2009). Human-Competitive Automatic Topic Indexing. PhD thesis, The University of Waikato."},{"key":"S1351324919000457_ref47","doi-asserted-by":"publisher","DOI":"10.1147\/rd.14.0309"},{"key":"S1351324919000457_ref101","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1080"},{"key":"S1351324919000457_ref46","unstructured":"Lossio-Ventura, J.A. , Jonquet, C. , Roche, M. and Teisseire, M. (2013). Combining C-value and keyword extraction methods for biomedical terms extraction. In LBM: Languages in Biology and Medicine, Tokyo, Japan."},{"key":"S1351324919000457_ref43","unstructured":"Liu, Z. , Huang, W. , Zheng, Y. and Sun, M. (2010). Automatic keyphrase extraction via topic decomposition. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). Cambridge, Massachusetts: ACL, pp. 366\u2013376."},{"key":"S1351324919000457_ref38","unstructured":"Kelleher, D. and Luz, S. (2005). Automatic hypertext keyphrase detection. In Kaelbling, L.P. and Saffiotti, A. (eds), Proceedings of the 19th International Joint Conference on Artificial intelligence (IJCAI), San Francisco, CA, USA, pp. 
1608\u20131609."},{"key":"S1351324919000457_ref37","first-page":"144","article-title":"Effective approaches for extraction of keywords","volume":"7","author":"Kaur","year":"2010","journal-title":"International Journal of Computer Science Issues (IJCSI)"},{"key":"S1351324919000457_ref35","first-page":"599","volume-title":"The Oxford Handbook of Computational Linguistics","author":"Jacquemin","year":"2003"},{"key":"S1351324919000457_ref34","unstructured":"Hussey, R. , Williams, S. and Mitchell, R. (2012). Automatic keyphrase extraction: A comparison of methods. In Proceedings of the 4th International Conference on Information Process, and Knowledge Management (eKNOW), Valencia, Spain, pp. 18\u201323."},{"key":"S1351324919000457_ref32","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.92"},{"key":"S1351324919000457_ref31","doi-asserted-by":"publisher","DOI":"10.3115\/981823.981857"},{"key":"S1351324919000457_ref30","doi-asserted-by":"publisher","DOI":"10.1140\/epjb\/e2008-00206-x"},{"key":"S1351324919000457_ref29","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1119"},{"key":"S1351324919000457_ref28","doi-asserted-by":"publisher","DOI":"10.1080\/00437956.1954.11659520"},{"key":"S1351324919000457_ref27","doi-asserted-by":"publisher","DOI":"10.1007\/11510888_26"},{"key":"S1351324919000457_ref26","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45224-9_112"},{"key":"S1351324919000457_ref80","doi-asserted-by":"publisher","DOI":"10.1108\/eb026526"},{"key":"S1351324919000457_ref25","doi-asserted-by":"crossref","unstructured":"Grineva, M. , Grinev, M. and Lizorkin, D. (2009). Extracting key terms from noisy and multi-theme documents. In Proceedings of the 18th International Conference on World Wide Web (WWW). New York, NY, USA: ACM, pp. 661\u2013670.","DOI":"10.1145\/1526709.1526798"},{"key":"S1351324919000457_ref24","unstructured":"Frank, E. , Paynter, G.W. , Witten, I.H. , Gutwin, C. and Nevill-Manning, C.G. (1999). Domain-specific keyphrase extraction. 
In Proceedings of the 6th International Joint Conference on Artificial Intelligence (IJCAI). San Francisco, CA, USA: Morgan Kaufmann, pp. 668\u2013673."},{"key":"S1351324919000457_ref23","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2015.06.004"},{"key":"S1351324919000457_ref22","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S15-1013"},{"key":"S1351324919000457_ref21","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199504)46:3<162::AID-ASI2>3.0.CO;2-6"},{"key":"S1351324919000457_ref93","doi-asserted-by":"publisher","DOI":"10.3115\/1599081.1599203"},{"key":"S1351324919000457_ref20","unstructured":"Chung, W. , Chen, H. and Nunamaker, J.F. (2003). Business intelligence explorer: A knowledge map framework for discovering business intelligence on the Web. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences."},{"key":"S1351324919000457_ref83","doi-asserted-by":"publisher","DOI":"10.3115\/1119282.1119287"},{"key":"S1351324919000457_ref18","unstructured":"Cellier, P. , Charnois, T. , Hotho, A. , Matwin, S. , Moens, M. and Toussaint, Y. , (eds.) (2014). Proceedings of the 1st International Workshop on Interactions between Data Mining and Natural Language Processing (DMNLP@PKDD\/ECML), volume 1202 of CEUR Workshop Proceedings, Nancy, France."},{"key":"S1351324919000457_ref17","doi-asserted-by":"publisher","DOI":"10.1016\/j.physa.2012.11.052"},{"key":"S1351324919000457_ref15","doi-asserted-by":"publisher","DOI":"10.1145\/2740908.2742776"},{"key":"S1351324919000457_ref58","unstructured":"Mihalcea, R. and Tarau, P. (2004). TextRank: Bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Barcelona, Spain, pp. 404\u2013411."},{"key":"S1351324919000457_ref72","unstructured":"Robertson, S.E. , Walker, S. , Jones, S. , Hancock-Beaulieu, M.M. and Gatford, M. (1996). Okapi at TREC-3. In Overview of the Third Text REtrieval Conference (TREC-3). 
Gaithersburg, MD: NIST, pp. 109\u2013126."},{"key":"S1351324919000457_ref61","unstructured":"Mimouni, N. , Nazarenko, A. and Salotti, S. (2015). Search and discovery in legal document networks. In Legal Knowledge and Information Systems (JURIX), Braga, Portugal, pp. 187\u2013188."},{"key":"S1351324919000457_ref19","first-page":"161","volume-title":"Proceedings of the 20th Australasian Database Conference (ADC)","volume":"92","author":"Chen","year":"2009"},{"key":"S1351324919000457_ref40","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324140"},{"key":"S1351324919000457_ref64","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-1997-1103"},{"key":"S1351324919000457_ref56","doi-asserted-by":"publisher","DOI":"10.1016\/j.physleta.2015.04.030"},{"key":"S1351324919000457_ref51","unstructured":"Matsuo, Y. , Ohsawa, Y. and Ishizuka, M. (2001). Keyworld: Extracting keywords from a document as a small world. In Proceedings of the 4th International Conference on Discovery Science (DS), volume 2226 of LNCS, pp. 271\u2013281."},{"key":"S1351324919000457_ref11","unstructured":"Bougouin, A. , Boudin, F. and Daille, B. (2013). TopicRank: Graph-based topic ranking for keyphrase extraction. In Sixth International Joint Conference on Natural Language Processing (IJCNLP), Nagoya, Japan, pp. 543\u2013551."},{"key":"S1351324919000457_ref41","unstructured":"Lahiri, S. , Choudhury, S.R. and Caragea, C. (2014). Keyword and keyphrase extraction using centrality measures on collocation networks. In Computing Research Repository (CoRR), abs\/1401.6571."},{"key":"S1351324919000457_ref59","unstructured":"Mikolov, T. , Chen, K. , Corrado, G. and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. In Computing Research Repository (CoRR), pp. 
1\u201312."},{"key":"S1351324919000457_ref3","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2091"},{"key":"S1351324919000457_ref44","doi-asserted-by":"publisher","DOI":"10.3115\/1620754.1620845"},{"key":"S1351324919000457_ref77","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"S1351324919000457_ref5","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K18-1022"},{"key":"S1351324919000457_ref60","doi-asserted-by":"publisher","DOI":"10.1093\/ijl\/3.4.235"},{"key":"S1351324919000457_ref78","doi-asserted-by":"publisher","DOI":"10.5120\/19161-0607"},{"key":"S1351324919000457_ref39","unstructured":"Kim, S. , Medelyan, O. , Kan, M. and Baldwin, T. (2010). SemEval-2010 Task 5: Automatic keyphrase extraction from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden, pp. 21\u201326."},{"key":"S1351324919000457_ref74","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630260106"},{"key":"S1351324919000457_ref70","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-77046-6_62"},{"key":"S1351324919000457_ref1","doi-asserted-by":"publisher","DOI":"10.1016\/j.amc.2014.04.090"},{"key":"S1351324919000457_ref96","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135813"},{"key":"S1351324919000457_ref10","unstructured":"Bougouin, A. (2015). Automatic Domain-Specific Keyphrase Annotation. PhD thesis, Universit\u00e9 de Nantes."},{"key":"S1351324919000457_ref8","unstructured":"Boudin, F. (2013). A comparison of centrality measures for graph-based keyphrase extraction. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, pp. 834\u2013838."},{"key":"S1351324919000457_ref6","unstructured":"Bharti, S.K. and Babu, K.S. (2017). Automatic keyword extraction for text summarization: A survey. 
In Computing Research Repository (CoRR), Volume abs\/1704.03242."},{"key":"S1351324919000457_ref12","first-page":"107","article-title":"The anatomy of a large-scale hypertextual web search engine","volume":"30","author":"Brin","year":"1998","journal-title":"Computer Networks"},{"key":"S1351324919000457_ref67","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-77094-7_41"},{"key":"S1351324919000457_ref49","unstructured":"Martins, C.B. , Pardo, T.A.S. , Espina, A.P. and Rino, L.H.M. (2001). Introdu\u00e7\u00e3o \u00e0 sumariza\u00e7\u00e3o autom\u00e1tica. Technical report RT-DC 002\/2001, ICMC-USP."},{"key":"S1351324919000457_ref33","doi-asserted-by":"publisher","DOI":"10.3115\/1119355.1119383"},{"key":"S1351324919000457_ref92","unstructured":"Wan, X. and Xiao, J. (2008). Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI), Chicago, Illinois, pp. 855\u2013860."},{"key":"S1351324919000457_ref42","doi-asserted-by":"publisher","DOI":"10.3115\/1613172.1613178"},{"key":"S1351324919000457_ref36","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1572113"},{"key":"S1351324919000457_ref14","unstructured":"Budanitsky, A. and Hirst, G. (2001). Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Workshop on Wordnet and Other Lexical Resources, Second Meeting of the North American Chapter of the Association for Computational Linguistics."},{"key":"S1351324919000457_ref7","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2013","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324919000457_ref16","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1150"},{"key":"S1351324919000457_ref48","volume-title":"Foundations of statistical Natural Language Processing","author":"Manning","year":"1990"},{"key":"S1351324919000457_ref45","unstructured":"Lopez, P. and Romary, L. 
(2010). HUMB: Automatic key term extraction from scientific articles in GROBID. In Proceedings of the 5th International Workshop on Semantic Evaluation, pp. 248\u2013251."},{"key":"S1351324919000457_ref50","doi-asserted-by":"publisher","DOI":"10.1142\/S0218213004001466"},{"key":"S1351324919000457_ref76","unstructured":"SEOmoz (2012). The beginners guide to SEO. Technical report."},{"key":"S1351324919000457_ref9","unstructured":"Boudin, F. and Morin, E. (2013). Keyphrase extraction for N-best reranking in multi-sentence compression. In North American Chapter of the Association for Computational Linguistics (NAACL), Atlanta, United States, pp. 298\u2013305."},{"key":"S1351324919000457_ref54","doi-asserted-by":"publisher","DOI":"10.1145\/1141753.1141819"},{"key":"S1351324919000457_ref86","unstructured":"Turney, P.D. (2003). Coherent keyphrase extraction via web mining. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), Morgan Kaufmann, pp. 434\u2013439."},{"key":"S1351324919000457_ref99","doi-asserted-by":"publisher","DOI":"10.1007\/11775300_8"},{"key":"S1351324919000457_ref13","doi-asserted-by":"publisher","DOI":"10.1145\/1008992.1009000"},{"key":"S1351324919000457_ref2","doi-asserted-by":"publisher","DOI":"10.1109\/DEXA.2013.16"},{"key":"S1351324919000457_ref4","first-page":"1","article-title":"An overview of graph-based keyword extraction methods and approaches","volume":"39","author":"Beliga","year":"2015","journal-title":"Journal of Information and Organizational Sciences"}],"container-title":["Natural Language 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324919000457","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,2]],"date-time":"2021-02-02T18:47:51Z","timestamp":1612291671000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324919000457\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,11]]},"references-count":101,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5]]}},"alternative-id":["S1351324919000457"],"URL":"https:\/\/doi.org\/10.1017\/s1351324919000457","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,11]]}}}