{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T05:38:31Z","timestamp":1770961111794,"version":"3.50.1"},"reference-count":42,"publisher":"Cambridge University Press (CUP)","issue":"2","license":[{"start":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T00:00:00Z","timestamp":1473379200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2017,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Emails constitute an important genre of online communication. Many of us are often faced with the daunting task of sifting through increasingly large amounts of emails on a daily basis. Keywords extracted from emails can help us combat such information overload by allowing a systematic exploration of the topics contained in emails. Existing literature on keyword extraction has not covered the email genre, and no human-annotated gold standard datasets are currently available. In this paper, we introduce a new dataset for keyword extraction from emails, and evaluate supervised and unsupervised methods for keyword extraction from emails. 
The results obtained with our supervised keyword extraction system (38.99% F-score) improve over the results obtained with the best performing systems participating in the <jats:sc>SemEval<\/jats:sc> 2010 keyword extraction task.<\/jats:p>","DOI":"10.1017\/s1351324916000231","type":"journal-article","created":{"date-parts":[[2016,9,9]],"date-time":"2016-09-09T09:47:42Z","timestamp":1473414462000},"page":"295-317","source":"Crossref","is-referenced-by-count":17,"title":["Keyword extraction from emails"],"prefix":"10.1017","volume":"23","author":[{"given":"S.","family":"LAHIRI","sequence":"first","affiliation":[]},{"given":"R.","family":"MIHALCEA","sequence":"additional","affiliation":[]},{"given":"P.-H.","family":"LAI","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2016,9,9]]},"reference":[{"key":"S1351324916000231_ref032","doi-asserted-by":"crossref","unstructured":"Nguyen T. D. , and Kan M.-Y. , 2007. Keyphrase extraction in scientific publications. In Proceedings of the 10th International Conference on Asian Digital Libraries: Looking Back 10 Years and Forging New Frontiers, ICADL\u201907, Hanoi, Vietnam, pp. 317\u2013326.","DOI":"10.1007\/978-3-540-77094-7_41"},{"key":"S1351324916000231_ref017","doi-asserted-by":"crossref","unstructured":"Hasan K. S. , and Ng V. 2014. Automatic keyphrase extraction: a survey of the state of the art. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Baltimore, Maryland, pp. 1262\u20131273.","DOI":"10.3115\/v1\/P14-1119"},{"key":"S1351324916000231_ref014","doi-asserted-by":"crossref","unstructured":"Grineva M. , Grinev M. , and Lizorkin D. , 2009. Extracting key terms from noisy and multi-theme documents. In Proceedings of the 18th International World Wide Web Conference, WWW 2009, Madrid, Spain, pp. 
661\u2013670.","DOI":"10.1145\/1526709.1526798"},{"key":"S1351324916000231_ref013","unstructured":"Goodman Joshua , and Carvalho Vitor R. 2005. Implicit Queries for Email. In Proceedings of the Second Conference on Email and Anti-Spam (CEAS). July. Stanford, California, USA."},{"key":"S1351324916000231_ref012","doi-asserted-by":"crossref","unstructured":"Finkel J. R. , Grenager T. , and Manning C. , 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the Association for Computational Linguistics, ACL \u201905, Ann Arbor, Michigan, USA, pp. 363\u2013370.","DOI":"10.3115\/1219840.1219885"},{"key":"S1351324916000231_ref009","first-page":"932","volume-title":"ACL","author":"Csomai","year":"2008"},{"key":"S1351324916000231_ref001","unstructured":"Batagelj V. , and Zaver\u0161nik M. 2003. An O(m) algorithm for cores decomposition of networks. CoRR cs.DS\/0310049, 1\u201310."},{"key":"S1351324916000231_ref040","unstructured":"Wan X. , and Xiao J. , 2008. Single document keyphrase extraction using neighborhood knowledge. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2, AAAI\u201908, AAAI Press, Chicago, Illinois, USA, pp. 855\u2013860."},{"key":"S1351324916000231_ref003","unstructured":"Berend G. , and Farkas R. 2010. SZTERGAK: feature engineering for keyphrase extraction. In Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden."},{"key":"S1351324916000231_ref035","unstructured":"Pianta E. , and Tonelli S. 2010. KX: a flexible system for keyphrase eXtraction. In Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden."},{"key":"S1351324916000231_ref022","unstructured":"Klimt B. , and Yang Y. 2004. Introducing the Enron corpus. In Proceedings of the 1st Conference on Email and Anti-Spam (CEAS), Mountain View, California, USA."},{"key":"S1351324916000231_ref016","unstructured":"Hasan K. S. , and Ng V. , 2010. 
Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, Beijing, China, pp. 365\u2013373."},{"key":"S1351324916000231_ref033","unstructured":"Page L. , Brin S. , Motwani R. , and Winograd T. , 1998. The PageRank citation ranking: bringing order to the web. In Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, pp. 161\u2013172."},{"key":"S1351324916000231_ref004","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"S1351324916000231_ref023","doi-asserted-by":"crossref","unstructured":"Laclav\u00edk M. , and Maynard D. 2009. Motivating intelligent e-mail in business: an investigation into current trends for e-mail processing and communication research. In IEEE Conference on Commerce and Enterprise Computing. CEC \u201909, Vienna, Austria.","DOI":"10.1109\/CEC.2009.47"},{"key":"S1351324916000231_ref037","doi-asserted-by":"crossref","unstructured":"Tomokiyo T. , and Hurst M. , 2003. A language model approach to keyphrase extraction. In Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment-Volume 18, Association for Computational Linguistics, Sapporo, Japan, pp. 33\u201340.","DOI":"10.3115\/1119282.1119287"},{"key":"S1351324916000231_ref036","doi-asserted-by":"publisher","DOI":"10.1016\/0378-8733(83)90028-X"},{"key":"S1351324916000231_ref008","first-page":"211","volume-title":"FLAIRS Conference","author":"Csomai","year":"2007"},{"key":"S1351324916000231_ref042","doi-asserted-by":"crossref","unstructured":"Yih W.-tau , Goodman J. , and Carvalho V. R. , 2006. Finding advertising keywords on web pages. In Proceedings of the 15th International Conference on World Wide Web, WWW \u201906, New York, NY, USA: ACM, pp. 
213\u2013222.","DOI":"10.1145\/1135777.1135813"},{"key":"S1351324916000231_ref039","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009976227802"},{"key":"S1351324916000231_ref031","unstructured":"Mihalcea R. , and Tarau P. 2004. TextRank: bringing order into texts. In D. Lin , and D. Wu (eds.), Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 404\u2013411."},{"key":"S1351324916000231_ref026","doi-asserted-by":"crossref","unstructured":"Litvak M. , and Last M. , 2008. Graph-based keyword extraction for single-document summarization. In Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization, MMIES \u201908, Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 17\u201324.","DOI":"10.3115\/1613172.1613178"},{"key":"S1351324916000231_ref007","first-page":"163","volume-title":"The Digital Word","author":"Clear","year":"1993"},{"key":"S1351324916000231_ref025","doi-asserted-by":"crossref","unstructured":"Li Z. , Zhou D. , Juan Y.-F. , and Han J. , 2010. Keyword extraction for social snippets. In Proceedings of the 19th International Conference on World Wide Web, WWW \u201910, Raleigh, North Carolina, USA, pp. 1143\u20131144.","DOI":"10.1145\/1772690.1772845"},{"key":"S1351324916000231_ref029","unstructured":"Loza V. , Lahiri S. , Mihalcea R. , and Lai P.-H. 2014. Building a dataset for summarization and keyword extraction from emails. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914), European Language Resources Association (ELRA), Reykjavik, Iceland, pp. 26\u201331."},{"key":"S1351324916000231_ref030","doi-asserted-by":"crossref","unstructured":"Mihalcea R. , and Csomai A. , 2007. Wikify!: linking documents to encyclopedic knowledge. 
In Proceedings of the 16th ACM Conference on Conference on Information and Knowledge Management, CIKM \u201907, Lisboa, Portugal, pp. 233\u2013242.","DOI":"10.1145\/1321440.1321475"},{"key":"S1351324916000231_ref015","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"S1351324916000231_ref006","doi-asserted-by":"publisher","DOI":"10.1145\/2362364.2362367"},{"key":"S1351324916000231_ref028","unstructured":"Liu Z. , Huang W. , Zheng Y. , and Sun M. , 2010. Automatic keyphrase extraction via topic decomposition. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP \u201910, MIT, Massachusetts, USA, pp. 366\u2013376."},{"key":"S1351324916000231_ref038","doi-asserted-by":"crossref","unstructured":"Tonella P. , Ricca F. , Pianta E. , and Girardi C. , 2003. Using keyword extraction for web site clustering. In Proceedings of the 5th IEEE International Workshop on Web Site Evolution, 2003. Theme: Architecture, Amsterdam, The Netherlands, pp. 41\u201348.","DOI":"10.1109\/WSE.2003.1234007"},{"key":"S1351324916000231_ref010","doi-asserted-by":"crossref","unstructured":"Dredze M. , Wallach H. M. , Puller D. , and Pereira F. 2008. Generating summary keywords for emails using topics. In Proceedings of the 13th International Conference on Intelligent User Interfaces (IUI \u201908). ACM, New York, NY, USA, pp. 199\u2013206.","DOI":"10.1145\/1378773.1378800"},{"key":"S1351324916000231_ref011","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27302-5_2"},{"key":"S1351324916000231_ref018","doi-asserted-by":"crossref","unstructured":"Hulth A. , 2003. Improved automatic keyword extraction given more linguistic knowledge. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, EMNLP \u201903, Sapporo, Japan, pp. 216\u2013223.","DOI":"10.3115\/1119355.1119383"},{"key":"S1351324916000231_ref002","unstructured":"Berend G. , 2011. 
Opinion expression mining by exploiting keyphrase extraction. In Proceedings of 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand: Asian Federation of Natural Language Processing, pp. 1162\u20131170."},{"key":"S1351324916000231_ref027","doi-asserted-by":"crossref","unstructured":"Liu F. , Pennell D. , Liu F. , and Liu Y. , 2009. Unsupervised approaches for automatic keyword extraction using meeting transcripts. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL \u201909, Boulder, Colorado, USA, pp. 620\u2013628.","DOI":"10.3115\/1620754.1620845"},{"key":"S1351324916000231_ref019","doi-asserted-by":"crossref","unstructured":"Jiang X. , Hu Y. , and Li H. , 2009. A ranking approach to keyphrase extraction. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Boston, Massachusetts, USA, pp. 756\u2013757.","DOI":"10.1145\/1571941.1572113"},{"key":"S1351324916000231_ref041","doi-asserted-by":"crossref","unstructured":"Witten I. H. , Paynter G. W. , Frank E. , Gutwin C. , and Nevill-Manning C. G. , 1999. KEA: practical automatic keyphrase extraction. In Proceedings of the 4th ACM Conference on Digital Libraries, DL \u201999, Berkeley, California, USA, pp. 254\u2013255.","DOI":"10.1145\/313238.313437"},{"key":"S1351324916000231_ref020","unstructured":"Kim S. N. , Medelyan O. , Kan M.-Y. , and Baldwin T. , 2010. SemEval-2010 task 5: automatic keyphrase extraction from scientific articles. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval \u201910, Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 21\u201326."},{"key":"S1351324916000231_ref021","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324140"},{"key":"S1351324916000231_ref005","unstructured":"Boudin F. 2013. 
A comparison of centrality measures for graph-based keyphrase extraction. In Proceedings of the 6th International Joint Conference on Natural Language Processing, Nagoya, Japan."},{"key":"S1351324916000231_ref034","unstructured":"Phan X.-H. 2006. CRFTagger: CRF English POS Tagger."},{"key":"S1351324916000231_ref024","doi-asserted-by":"crossref","unstructured":"Lee S. , and Kim H.-J. , 2008. News keyword extraction for topic tracking. In Proceedings of the 2008 4th International Conference on Networked Computing and Advanced Information Management - Volume 02, NCM \u201908, Washington, DC, USA: IEEE Computer Society, pp. 554\u2013559.","DOI":"10.1109\/NCM.2008.199"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324916000231","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,9,13]],"date-time":"2019-09-13T09:13:05Z","timestamp":1568365985000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324916000231\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,9]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,3]]}},"alternative-id":["S1351324916000231"],"URL":"https:\/\/doi.org\/10.1017\/s1351324916000231","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,9]]}}}