{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,4,1]],"date-time":"2022-04-01T07:31:46Z","timestamp":1648798306150},"reference-count":61,"publisher":"Cambridge University Press (CUP)","issue":"3","license":[{"start":{"date-parts":[[2019,9,10]],"date-time":"2019-09-10T00:00:00Z","timestamp":1568073600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2020,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Extraction keyphrase systems traditionally use classification algorithms and do not consider the fact that part of the keyphrases may not be found in the text, reducing the accuracy of such algorithms a priori. In this work, we propose to improve the accuracy of these systems with inferential mechanisms that use a knowledge representation model, including symbolic models of knowledge bases and distributional semantics, to expand the set of keyphrase candidates to be submitted to the classification algorithm with terms that are not in the text (not-in-text terms). The basic assumption we have is that not-in-text terms have a semantic relationship with terms that are in the text. To represent this relationship, we have defined two new features to be represented as input to the classification algorithms. The first feature refers to the power of discrimination of the inferred not-in-text terms. The intuition behind this is that good candidates for a keyphrase are those that are deduced from various textual terms in a specific document and that are not often deduced in other documents. The other feature represents the descriptive strength of a not-in-text candidate. 
We argue that not-in-text keyphrases must have a strong semantic relationship with the text and that the power of this semantic relationship can be measured in a similar way as popular metrics like TFxIDF. The method proposed in this work was compared with state-of-the-art systems using five corpora and the results show that it has significantly improved automatic keyphrase extraction, dealing with the limitation of extracting keyphrases absent from the text.<\/jats:p>","DOI":"10.1017\/s1351324919000342","type":"journal-article","created":{"date-parts":[[2019,9,10]],"date-time":"2019-09-10T08:09:27Z","timestamp":1568102967000},"page":"293-318","source":"Crossref","is-referenced-by-count":1,"title":["Learning keyphrases from corpora and knowledge models"],"prefix":"10.1017","volume":"26","author":[{"given":"R.","family":"Silveira","sequence":"first","affiliation":[]},{"given":"V.","family":"Furtado","sequence":"additional","affiliation":[]},{"given":"V.","family":"Pinheiro","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,9,10]]},"reference":[{"key":"S1351324919000342_ref15","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007413511361"},{"key":"S1351324919000342_ref13","unstructured":"Debanjan, M. , Kuriakose, J. , Shan, R.R. , Zimmermann, R. and Talburt, J.R. (2018b). Theme-weighted ranking of keywords from text documents using phrase embeddings. In arXiv:1807.05962v1."},{"key":"S1351324919000342_ref11","volume-title":"Lexical and Computational Semantics","author":"Danesh","year":"2015"},{"key":"S1351324919000342_ref2","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(98)00030-2"},{"key":"S1351324919000342_ref1","unstructured":"Ammar, W. , Peters, M.E. , Bhagavatula, C. and Power, R. (2017). The AI2 system at SemEval-2017 task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction. In Proceedings of the 11th International Workshop on Semantic Evaluations (SemEval-2017). 
Vancouver: Association for Computational Linguistics."},{"key":"S1351324919000342_ref52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1198"},{"key":"S1351324919000342_ref6","unstructured":"Bougouin, A. , Boudin, F. and Daille, B. (2016). Keyphrase annotation with graph co-ranking. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Osaka: The COLING 2016 Organizing Committee, pp. 2945\u20132955."},{"key":"S1351324919000342_ref55","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2161"},{"key":"S1351324919000342_ref4","unstructured":"Bojanowski, P. , Grave, E. , Joulin, A. and Mikolov, T. (2017). Enriching word vectors with subword information. In arXiv:1607.04606."},{"key":"S1351324919000342_ref20","volume-title":"Technical report","author":"Gutwin","year":"1998"},{"key":"S1351324919000342_ref16","unstructured":"El-Beltagy, S.R. and Rafea, A. (2010). Kpminer: participation in Semeval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation. Uppsala: Association for Computational Linguistics, pp. 190\u2013193."},{"key":"S1351324919000342_ref10","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1439"},{"key":"S1351324919000342_ref58","unstructured":"Ye, H. and Wang, L. (2018). Semi-Supervised Learning for Neural Keyphrase Generation. In arXiv:1808.06773."},{"key":"S1351324919000342_ref59","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1080"},{"key":"S1351324919000342_ref14","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"S1351324919000342_ref12","unstructured":"Debanjan, M. , Kuriakose, J. , Shan, R.R. and Zimmermann, R. (2018a). Key2Vec: automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). 
New Orleans, LA: Association for Computational Linguistics, pp. 634\u2013639."},{"key":"S1351324919000342_ref5","unstructured":"Bougouin, A. , Boudin, F. and Beatrice, D. (2013). Topicrank: Graph-based topic ranking for keyphrase extraction. In Proceedings of the 6th International Joint Conference on Natural Language Processing. Nagoya: Asian Federation of Natural Language Processing, pp. 543\u2013551."},{"key":"S1351324919000342_ref7","first-page":"5","volume-title":"Machine Learning","author":"Breiman","year":"2001"},{"key":"S1351324919000342_ref50","unstructured":"Saric, F. , Glavas, G. , Karan, M. , Snajder, J. and Basic, B.D. (2012). Takelab: systems for measuring semantic text similarity. In First Joint Conference on Lexical and Computational Semantics, Canada, pp. 441\u2013448."},{"key":"S1351324919000342_ref9","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"S1351324919000342_ref17","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/7287.001.0001"},{"key":"S1351324919000342_ref42","doi-asserted-by":"publisher","DOI":"10.1145\/219717.219748"},{"key":"S1351324919000342_ref47","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"S1351324919000342_ref51","doi-asserted-by":"publisher","DOI":"10.1109\/BRACIS.2015.35"},{"key":"S1351324919000342_ref3","unstructured":"Berend, G. (2011). Opinion expression mining by exploiting keyphrase extraction. In Proceedings of the 5th International Joint Conference on Natural Language Processing. Chiang Mai: Asian Federation of Natural Language Processing, pp. 1162\u20131170."},{"key":"S1351324919000342_ref8","doi-asserted-by":"publisher","DOI":"10.17851\/2237-2083.23.3.695-726"},{"key":"S1351324919000342_ref56","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19548-3_21"},{"key":"S1351324919000342_ref22","doi-asserted-by":"publisher","DOI":"10.1109\/WI-IATW.2007.46"},{"key":"S1351324919000342_ref28","unstructured":"Le, Q. and Mikolov, T. (2014). 
Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014). Beijing: JMLR, pp. 1188\u20131196."},{"key":"S1351324919000342_ref34","unstructured":"Medelyan, O. (2009). Human-competitive automatic topic indexing. PhD thesis. Department of Computer Science, University of Waikato, New Zealand."},{"key":"S1351324919000342_ref61","unstructured":"Zhang, Y. , Li, J. , Song, Y. and Zhang, C. (2018). Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts. In Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). Minneapolis, MN: Association for Computational Linguistics, pp. 1676\u20131686."},{"key":"S1351324919000342_ref23","doi-asserted-by":"publisher","DOI":"10.3115\/1119355.1119383"},{"key":"S1351324919000342_ref26","unstructured":"Krapivin, M. , Autayeu, A. and Marchese, M. (2008). Large dataset for keyphrases extraction. Technical report disi-09-055, DISI, Trento, Italy."},{"key":"S1351324919000342_ref27","doi-asserted-by":"publisher","DOI":"10.1080\/15427951.2004.10129091"},{"key":"S1351324919000342_ref30","doi-asserted-by":"publisher","DOI":"10.1023\/B:BTTJ.0000047600.45421.6d"},{"key":"S1351324919000342_ref24","unstructured":"Hulth, A. and Megyesi, B.B. (2016). A study on automatically extracted keywords in text categorization. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics. Sydney: ACM, pp. 537\u2013544."},{"key":"S1351324919000342_ref31","unstructured":"Liu, Z. , Huang, W. , Zheng, Y. and Sun, M. (2010). Automatic keyphrase extraction via topic decomposition. 
In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing."},{"key":"S1351324919000342_ref32","unstructured":"Manning, C.D. , Surdeanu, M. , Bauer, J. , Finkel, J. , Bethard, S.J. and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Baltimore, MD: Association for Computational Linguistics."},{"key":"S1351324919000342_ref35","doi-asserted-by":"publisher","DOI":"10.1145\/1141753.1141819"},{"key":"S1351324919000342_ref36","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20790"},{"key":"S1351324919000342_ref37","doi-asserted-by":"publisher","DOI":"10.3115\/1699648.1699678"},{"key":"S1351324919000342_ref38","unstructured":"Meng, R. , Zhao, S. , Han, S. , He, D. , Brusilovsky, P. and Chi, Y. (2017). Deep keyphrase generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017). Vancouver: Association for Computational Linguistics."},{"key":"S1351324919000342_ref33","unstructured":"Medelyan, O. (2005). Automatic keyphrase indexing with a domain-specific thesaurus. Master thesis. Albert-Ludwigs-Universitaet Freiburg im Breisgau, Germany."},{"key":"S1351324919000342_ref39","unstructured":"Mihalcea, H. and Tarau, P. (2004). TextRank: bringing order into texts. In Association for Computational Linguistics. Barcelona, pp. 404\u2013411."},{"key":"S1351324919000342_ref43","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1105"},{"key":"S1351324919000342_ref40","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321475"},{"key":"S1351324919000342_ref21","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"S1351324919000342_ref41","unstructured":"Mikolov, T. , Chen, K. , Corrado, G. and Dean, J. (2013). Efficient estimation of word representations in vector space. 
In Proceedings of International Conference on Learning Representations - ICLR Workshop."},{"key":"S1351324919000342_ref29","doi-asserted-by":"publisher","DOI":"10.3115\/1613172.1613178"},{"key":"S1351324919000342_ref44","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-77094-7_41"},{"key":"S1351324919000342_ref45","unstructured":"Pagliardini, M. , Gupta, P. and Jaggi, M. (2017). Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features. In arXiv:1703.02507v2."},{"key":"S1351324919000342_ref46","unstructured":"Papagiannopoulou, E. and Tsoumakas, G. (2018). Local Word Vectors Guiding Keyphrase Extraction. In arXiv:1710.07503."},{"key":"S1351324919000342_ref48","doi-asserted-by":"publisher","DOI":"10.1109\/ISI.2010.5484783"},{"key":"S1351324919000342_ref49","volume-title":"Introduction to Modern Information Retrieval","author":"Salton","year":"1983"},{"key":"S1351324919000342_ref53","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1008"},{"key":"S1351324919000342_ref54","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009976227802"},{"key":"S1351324919000342_ref57","doi-asserted-by":"publisher","DOI":"10.1145\/313238.313437"},{"key":"S1351324919000342_ref60","unstructured":"Zhang, X. , Zhao, J. and Lecun, Y. (2015). Character-level convolutional networks for text classification. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015). Montreal: MIT Press Cambridge, pp. 649\u2013657."},{"key":"S1351324919000342_ref25","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-012-9210-3"},{"key":"S1351324919000342_ref19","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526798"},{"key":"S1351324919000342_ref18","unstructured":"Frank, E. , Paynter, G.W. , Witten, I.H. , Gutwin, C. and Nevill-Manning, C.G. (1999). Domain-specific keyphrase extraction. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, Stockholm. 
San Francisco, CA: Morgan Kaufmann Publishers, pp. 668\u2013673."}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324919000342","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,4,7]],"date-time":"2020-04-07T08:08:28Z","timestamp":1586246908000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324919000342\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,9,10]]},"references-count":61,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5]]}},"alternative-id":["S1351324919000342"],"URL":"https:\/\/doi.org\/10.1017\/s1351324919000342","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,9,10]]}}}