{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T10:59:38Z","timestamp":1758279578708,"version":"3.41.2"},"reference-count":62,"publisher":"Emerald","issue":"6","license":[{"start":{"date-parts":[[2023,5,2]],"date-time":"2023-05-02T00:00:00Z","timestamp":1682985600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JD"],"published-print":{"date-parts":[[2023,10,24]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>The purpose of this paper is to propose a methodology for the enrichment and tailoring of a knowledge organization system (KOS), in order to support the information extraction (IE) task for the analysis of documents in the tourism domain. In particular, the KOS is used to develop a named entity recognition (NER) system.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>A method to improve and customize an available thesaurus by leveraging documents related to the tourism in Italy is firstly presented. Then, the obtained thesaurus is used to create an annotated NER corpus, exploiting both distant supervision, deep learning and a light human supervision.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, it is very useful to support and ease the annotation task using the proposed methodology, allowing to annotate a corpus with a fraction of the effort required for a manual annotation.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.<\/jats:p><\/jats:sec>","DOI":"10.1108\/jd-02-2023-0019","type":"journal-article","created":{"date-parts":[[2023,5,2]],"date-time":"2023-05-02T03:33:55Z","timestamp":1682998435000},"page":"1440-1458","source":"Crossref","is-referenced-by-count":5,"title":["Integrated use of KOS and deep learning for data set annotation in\u00a0tourism domain"],"prefix":"10.1108","volume":"79","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0390-6556","authenticated-orcid":false,"given":"Giovanna","family":"Aracri","sequence":"first","affiliation":[]},{"given":"Antonietta","family":"Folino","sequence":"additional","affiliation":[]},{"given":"Stefano","family":"Silvestri","sequence":"additional","affiliation":[]}],"member":"140","published-online":{"date-parts":[[2023,5,2]]},"reference":[{"volume-title":"Named Entity Recognition for Cultural Heritage Preservation","year":"2021","first-page":"249","key":"key2023102318081643400_ref001"},{"issue":"2","key":"key2023102318081643400_ref002","first-page":"398","article-title":"A semiautomatic annotation approach for sentiment analysis","volume":"49","year":"2021","journal-title":"Journal of Information Science"},{"key":"key2023102318081643400_ref003","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1016\/j.compbiomed.2016.01.014","article-title":"Unsupervised entity and relation extraction from clinical records in Italian","volume":"72","year":"2016","journal-title":"Computers in Biology and Medicine"},{"year":"2017","first-page":"338","article-title":"KIRA: a system for knowledge-based access to multimedia art collections","key":"key2023102318081643400_ref004"},{"year":"2015","article-title":"Annotation and extraction of relations from Italian medical records","key":"key2023102318081643400_ref005"},{"issue":"1","key":"key2023102318081643400_ref006","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1093\/jamiaopen\/ooy057","article-title":"Weakly supervised natural language processing for assessing patient-centered outcome following prostate cancer treatment","volume":"2","year":"2019","journal-title":"JAMIA Open"},{"year":"2015","article-title":"Word embeddings go to Italy: a comparison of models and training datasets","key":"key2023102318081643400_ref007"},{"issue":"1","key":"key2023102318081643400_ref008","first-page":"267","article-title":"The unified medical language system (UMLS): integrating biomedical terminology","volume":"32","year":"2004","journal-title":"Nucleic Acids Research"},{"unstructured":"Broughton, V. (2008), \u201cCostruire thesauri: strumenti per indicizzazione e metadati semantic\u201d, in Ballestra, L. and Venuti, L. (Eds), Translated from Essential Thesaurus Construction, P. Cavaleri. Bibliografica, Milano.","key":"key2023102318081643400_ref009"},{"key":"key2023102318081643400_ref010","first-page":"11","article-title":"Corpus-based knowledge representation in specialized domains","volume":"210","year":"2015","journal-title":"Corpus based Studies on Language Varieties"},{"doi-asserted-by":"crossref","unstructured":"Cheng, C.K., Pan, X. and Kurfess, F. (2004), \u201cOntology-based semantic classification of unstructured documents\u201d, in N\u00fcrnberger, A. and Detyniecki, M. (Eds), Adaptive Multimedia Retrieval, Springer Berlin Heidelberg, Berlin, pp.\u00a0120-131.","key":"key2023102318081643400_ref011","DOI":"10.1007\/978-3-540-25981-7_8"},{"unstructured":"Chollet, F. (2015), \u201cKeras\u201d, available at: https:\/\/keras.io.","key":"key2023102318081643400_ref012"},{"key":"key2023102318081643400_ref013","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1613\/jair.295","article-title":"Active learning with statistical models","volume":"4","year":"1996","journal-title":"Journal of Artificial Intelligence Research"},{"year":"2017","first-page":"301","article-title":"Query expansion based on Wordnet and Word2vec for Italian question answering systems","key":"key2023102318081643400_ref014"},{"year":"2011","first-page":"413","article-title":"Semantic enhancement: the key to massive and heterogeneous data pools","key":"key2023102318081643400_ref015"},{"key":"key2023102318081643400_ref016","first-page":"30","article-title":"Integrating heritage management and tourism at Italian cultural destinations","volume":"12","year":"2010","journal-title":"International Journal of Arts Management"},{"year":"2014","first-page":"2062","article-title":"T2k\u02c62: a system for automatically extracting and organizing knowledge from texts","key":"key2023102318081643400_ref017"},{"year":"2019","first-page":"4171","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","key":"key2023102318081643400_ref018"},{"year":"2017","first-page":"182","article-title":"A novel system for the automatic extraction of a patient problem summary","key":"key2023102318081643400_ref019"},{"key":"key2023102318081643400_ref020","article-title":"Improving graph embeddings via entity linking: a case study on Italian clinical notes","volume":"17","year":"2023","journal-title":"Intelligent Systems with Applications"},{"issue":"12","key":"key2023102318081643400_ref021","doi-asserted-by":"crossref","first-page":"4480","DOI":"10.1108\/IJCHM-09-2021-1176","article-title":"Deep learning in hospitality and tourism: a research framework agenda for future research","volume":"34","year":"2022","journal-title":"International Journal of Contemporary Hospitality Management"},{"issue":"1","key":"key2023102318081643400_ref022","doi-asserted-by":"crossref","first-page":"47","DOI":"10.26599\/BDMA.2020.9020015","article-title":"Hybrid recommender system for tourism based on big data and AI: a conceptual framework","volume":"4","year":"2021","journal-title":"Big Data Mining and Analytics"},{"year":"2020","first-page":"7732","article-title":"Rethinking generalization of neural models: a named entity recognition case study","key":"key2023102318081643400_ref023"},{"key":"key2023102318081643400_ref024","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.imu.2018.10.011","article-title":"Learning for clinical named entity recognition without manual annotations","volume":"13","year":"2018","journal-title":"Informatics in Medicine Unlocked"},{"key":"key2023102318081643400_ref025","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.cosrev.2018.06.001","article-title":"Recent named entity recognition and classification techniques: a systematic review","volume":"29","year":"2018","journal-title":"Computer Science Review"},{"year":"2014","article-title":"Helping users find the \u2018good stuff\u2019: using the semantic analysis method (SAM) tool to identify and extract potential access points from archival finding aids","key":"key2023102318081643400_ref026"},{"year":"2014","first-page":"413","article-title":"Semantic analysis method (SAM): a tool for identifying potential access points in unstructured text","key":"key2023102318081643400_ref027"},{"issue":"5","key":"key2023102318081643400_ref028","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1109\/MIS.2015.68","article-title":"Information extraction","volume":"30","year":"2015","journal-title":"IEEE Intelligent Systems"},{"year":"2015","first-page":"147","article-title":"Improving access to large-scale digital libraries through semantic-enhanced search and disambiguation","key":"key2023102318081643400_ref029"},{"issue":"6","key":"key2023102318081643400_ref030","doi-asserted-by":"crossref","first-page":"1468","DOI":"10.1016\/j.tourman.2012.01.016","article-title":"Web users' behavioural patterns of tourism information\u00a0search: from online to offline","volume":"33","year":"2012","journal-title":"Tourism Management"},{"issue":"6","key":"key2023102318081643400_ref031","first-page":"1223","article-title":"Named-entity recognition for early modern textual documents: a review of capabilities and challenges with strategies for the future","volume":"22","year":"2021","journal-title":"Journal of Documentation"},{"volume-title":"Information and Documentation \u2014 Thesauri and Interoperability with Other Vocabularies \u2014 Part 1: Thesauri for Information Retrieval","year":"2011","author":"ISO25964-1:2011","key":"key2023102318081643400_ref032"},{"issue":"3","key":"key2023102318081643400_ref033","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1080\/13683500.2011.555528","article-title":"Automated web harvesting to collect and analyse user-generated content for tourism","volume":"15","year":"2012","journal-title":"Current Issues in Tourism"},{"issue":"2","key":"key2023102318081643400_ref034","first-page":"289","article-title":"Active learning: a step towards automating medical concept extraction","volume":"23","year":"2016","journal-title":"JAMIA"},{"key":"key2023102318081643400_ref035","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.ijmedinf.2017.08.001","article-title":"Active learning reduces annotation time for\u00a0clinical concept extraction","volume":"106","year":"2017","journal-title":"International Journal of Medical Informatics"},{"year":"2016","first-page":"260","article-title":"Neural architectures for named entity recognition","key":"key2023102318081643400_ref036"},{"volume-title":"Testo e computer. Introduzione alla linguistica computazionale","year":"2005","key":"key2023102318081643400_ref037"},{"volume-title":"Development of Information and Communication Technology: from E-Tourism to Smart Tourism","year":"2022","first-page":"1","key":"key2023102318081643400_ref038"},{"issue":"1","key":"key2023102318081643400_ref039","first-page":"50","article-title":"A survey on deep learning for named entity recognition","volume":"34","year":"2020","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"key2023102318081643400_ref040","first-page":"58","article-title":"Assessing online sustainability communication of Italian cultural destinations \u2013 a web content mining approach","year":"2021","journal-title":"Information and Communication Technologies in Tourism 2021"},{"issue":"1","key":"key2023102318081643400_ref041","doi-asserted-by":"crossref","first-page":"54","DOI":"10.5771\/0943-7444-2018-1-54","article-title":"Knowledge Organization System (KOS): an introductory critical account","volume":"45","year":"2018","journal-title":"Knowledge Organization"},{"year":"2013","article-title":"Efficient estimation of word representations in vector space","key":"key2023102318081643400_ref042"},{"year":"2011","first-page":"37","article-title":"Thesaurus alignment for linked data publishing","key":"key2023102318081643400_ref043"},{"year":"2018","first-page":"2033","article-title":"Annotation of a large clinical entity corpus","key":"key2023102318081643400_ref044"},{"key":"key2023102318081643400_ref045","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2017\/7831897","article-title":"Semantic annotation of unstructured documents using concepts similarity","volume":"2017","year":"2017","journal-title":"Scientific Programming"},{"year":"2008","first-page":"2603","article-title":"The TextPro tool suite","key":"key2023102318081643400_ref046"},{"issue":"1","key":"key2023102318081643400_ref047","first-page":"1","article-title":"Sensing and making sense of tourism flows and urban data to foster sustainability awareness: a real-world experience","volume":"8","year":"2021","journal-title":"Journal of big Data"},{"year":"2019","first-page":"1129","article-title":"Improving biomedical information extraction with word embeddings trained on closed-domain corpora","key":"key2023102318081643400_ref048"},{"issue":"12","key":"key2023102318081643400_ref049","doi-asserted-by":"crossref","first-page":"5775","DOI":"10.3390\/app12125775","article-title":"Iterative annotation of biomedical NER corpora with deep neural networks and knowledge bases","volume":"12","year":"2022","journal-title":"Applied Sciences"},{"doi-asserted-by":"crossref","unstructured":"Stiller, J., Petras, V., G\u00e4de, M. and Isaac, A. (2014), \u201cAutomatic enrichments with controlled vocabularies in Europeana: challenges and consequences\u201d, in Ioannides, M., Magnenat-Thalmann, N., Fink, E., \u017darni\u0107, R., Yen, A.Y. and Quak, E. (Eds), Digital Heritage. Progress in Cultural Heritage: Documentation, Preservation, and Protection, Springer International Publishing, Cham, pp.\u00a0238-247.","key":"key2023102318081643400_ref050","DOI":"10.1007\/978-3-319-13695-0_23"},{"year":"2020","first-page":"383","article-title":"Contextualized embeddings in named-entity recognition: an empirical study on generalization","key":"key2023102318081643400_ref051"},{"issue":"2","key":"key2023102318081643400_ref052","first-page":"180","article-title":"How diverse is hotel website accessibility? A study in the central region of Portugal using web diagnostic tools","volume":"22","year":"2021","journal-title":"Tourism and Hospitality Research"},{"year":"2000","first-page":"127","article-title":"Introduction to the CoNLL-2000 shared task chunking","key":"key2023102318081643400_ref053"},{"year":"2009","first-page":"105","article-title":"Reducing class imbalance during active learning for named entity annotation","key":"key2023102318081643400_ref054"},{"issue":"2","key":"key2023102318081643400_ref055","first-page":"262","article-title":"Exploring entity recognition and disambiguation for cultural heritage collections","volume":"30","year":"2013","journal-title":"Digital Scholarship in the Humanities"},{"year":"2020","first-page":"53","article-title":"Knowledge-based named entity recognition of archaeological concepts in Dutch","key":"key2023102318081643400_ref056"},{"issue":"1","key":"key2023102318081643400_ref057","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12911-018-0723-6","article-title":"A clinical text classification paradigm using weak supervision and deep representation","volume":"19","year":"2019","journal-title":"BMC Medical Informatics and Decision Making"},{"key":"key2023102318081643400_ref058","volume-title":"UNWTO Tourism Highlights","author":"WTO","year":"2018","edition":"2018 Edition"},{"year":"2018","first-page":"2145","article-title":"A survey on recent advances in named entity recognition from deep learning models","key":"key2023102318081643400_ref059"},{"year":"2018","first-page":"2159","article-title":"Distantly supervised NER with partial annotation learning and reinforcement learning","key":"key2023102318081643400_ref060"},{"issue":"2-3","key":"key2023102318081643400_ref061","doi-asserted-by":"crossref","first-page":"160","DOI":"10.5771\/0943-7444-2008-2-3-160","article-title":"Knowledge organization systems (KOS)","volume":"35","year":"2008","journal-title":"Knowledge Organization"},{"key":"key2023102318081643400_ref062","article-title":"Application of big data technology in the impact of tourism e-commerce on tourism planning","volume":"2021","year":"2021","journal-title":"Complex"}],"container-title":["Journal of Documentation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/JD-02-2023-0019\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/JD-02-2023-0019\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T22:33:26Z","timestamp":1753396406000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/jd\/article\/79\/6\/1440-1458\/207182"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,2]]},"references-count":62,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2023,5,2]]},"published-print":{"date-parts":[[2023,10,24]]}},"alternative-id":["10.1108\/JD-02-2023-0019"],"URL":"https:\/\/doi.org\/10.1108\/jd-02-2023-0019","relation":{},"ISSN":["0022-0418"],"issn-type":[{"type":"print","value":"0022-0418"}],"subject":[],"published":{"date-parts":[[2023,5,2]]}}}