{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T10:13:49Z","timestamp":1766139229353,"version":"3.41.2"},"reference-count":17,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T00:00:00Z","timestamp":1751414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010784","name":"Banco Santander","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100010784","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>The labor market is rapidly evolving, leading to a mismatch between existing Knowledge, Skills, and Abilities (KSAs) and future occupational requirements. Reports from organizations like the World Economic Forum and the OECD emphasize the need for dynamic skill identification. This paper introduces a novel system for constructing a dynamic taxonomy using Natural Language Processing (NLP) techniques, specifically Named Entity Recognition (NER) and Relation Extraction (RE), to identify and predict future skills. By leveraging machine learning models, this taxonomy aims to bridge the gap between current skills and future demands, contributing to educational and professional development.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>To achieve this, an NLP-based architecture was developed using a combination of text preprocessing, NER, and RE models. The NER model identifies and categorizes KSAs and occupations from a corpus of labor market reports, while the RE model establishes the relationships between these entities. A custom pipeline was used for PDF text extraction, tokenization, and lemmatization to standardize the data. The models were trained and evaluated using over 1,700 annotated documents, with the training process optimized for both entity recognition and relationship prediction accuracy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The NER and RE models demonstrated promising performance. The NER model achieved a best micro-averaged F1-score of 65.38% in identifying occupations, skills, and knowledge entities. The RE model subsequently achieved a best micro-F1 score of 82.2% for accurately classifying semantic relationships between these entities at epoch 1,009. The taxonomy generated from these models effectively identified emerging skills and occupations, offering insights into future workforce requirements. Visualizations of the taxonomy were created using various graph structures, demonstrating its applicability across multiple sectors. The results indicate that this system can dynamically update and adapt to changes in skill demand over time.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>The dynamic taxonomy model not only provides real-time updates on current competencies but also predicts emerging skill trends, offering a valuable tool for workforce planning. The high recall rates in NER suggest strong entity recognition capabilities, though precision improvements are needed to reduce false positives. Limitations include the need for a larger corpus and sector-specific models. Future work will focus on expanding the corpus, improving model accuracy, and incorporating expert feedback to further refine the taxonomy.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2025.1579998","type":"journal-article","created":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T05:52:19Z","timestamp":1751435539000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Dynamic taxonomy generation for future skills identification using a named entity recognition and relation extraction pipeline"],"prefix":"10.3389","volume":"8","author":[{"given":"Luis Jose","family":"Gonzalez-Gomez","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sofia Margarita","family":"Hernandez-Munoz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abiel","family":"Borja","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fernando A.","family":"Arana-Salas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jose Daniel","family":"Azofeifa","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julieta","family":"Noguez","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patricia","family":"Caratozzolo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1364","DOI":"10.1093\/jamia\/ocz068","article-title":"Development and application of a high throughput natural language processing architecture to convert all clinical documents in a clinical data warehouse into standardized medical vocabularies","volume":"26","author":"Afshar","year":"2019","journal-title":"J. Am. Med. Inform. Assoc"},{"key":"B2","first-page":"1431","article-title":"\u201cAge recommendation for texts,\u201d","volume-title":"Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)","author":"Blandin","year":"2020"},{"key":"B3","doi-asserted-by":"publisher","first-page":"3279","DOI":"10.1109\/TKDE.2021.3126456","article-title":"A general survey on attention mechanisms in deep learning","volume":"35","author":"Brauwers","year":"2023","journal-title":"IEEE Trans. Knowl. Data Eng"},{"key":"B4","doi-asserted-by":"publisher","first-page":"30157","DOI":"10.1109\/ACCESS.2022.3158975","article-title":"Injecting user identity into pretrained language models for document-level sentiment classification","volume":"10","author":"Cao","year":"2022","journal-title":"IEEE Access"},{"key":"B5","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1109\/VAST.2012.6400552","article-title":"\u201cWatch this: a taxonomy for dynamic data visualization,\u201d","volume-title":"2012 IEEE Conference on Visual Analytics Science and Technology (VAST)","author":"Cottam","year":"2012"},{"key":"B6","first-page":"1","article-title":"\u201cAspect-based emotion analysis on speech for predicting performance in collaborative learning,\u201d","volume-title":"2021 IEEE Frontiers in Education Conference (FIE)","author":"Dehbozorgi","year":"2021"},{"key":"B7","first-page":"1928","volume-title":"Corel: Seed-guided Topical Taxonomy Construction by Concept Learning and Relation Transferring. KDD '20","author":"Huang","year":"2020"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1191","article-title":"Deep residual learning for weakly-supervised relation extraction","author":"Huang","year":"2017","journal-title":"arXiv preprint arXiv:1707.08866"},{"key":"B9","doi-asserted-by":"publisher","first-page":"925","DOI":"10.1145\/3485447.3511935","article-title":"TaxoEnrich: self-supervised taxonomy completion via structure-semantic representations","author":"Jiang","year":"2022","journal-title":"arXiv:2202.04887"},{"key":"B10","doi-asserted-by":"publisher","first-page":"139742","DOI":"10.1109\/ACCESS.2024.3465409","article-title":"Analyzing natural language processing techniques to extract meaningful information on skills acquisition from textual content","volume":"12","author":"Jose Gonzalez-Gomez","year":"2024","journal-title":"IEEE Access"},{"journal-title":"Speech and Language Processing","year":"2009","author":"Jurafsky","key":"B11"},{"year":"2021","author":"Laverghetta Jr","key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2106.06849"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3495162","article-title":"A survey on text classification: from traditional to deep learning","volume":"13","author":"Li","year":"2022","journal-title":"ACM Trans. Intell. Syst. Technol"},{"key":"B14","doi-asserted-by":"publisher","first-page":"2471","DOI":"10.1007\/s00521-022-07727-y","article-title":"Integration of global and local information for text classification","volume":"35","author":"Li","year":"2023","journal-title":"Neural Comput. Appl"},{"key":"B15","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3389\/frai.2019.00001","article-title":"A pattern-based method for medical entity recognition from chinese diagnostic imaging text","volume":"2","author":"Liang","year":"2019","journal-title":"Front. Artif. Intell"},{"key":"B16","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1145\/3543873.3587667","article-title":"\u201cSkill graph construction from semantic understanding,\u201d","volume-title":"Companion Proceedings of the ACM Web Conference 2023, WWW '23 Companion","author":"Lin","year":"2023"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1375419","DOI":"10.3389\/frai.2024.1375419","article-title":"Sats: simplification aware text summarization of scientific documents","volume":"7","author":"Zaman","year":"2024","journal-title":"Front. Artif. Intell"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1579998\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T05:52:21Z","timestamp":1751435541000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1579998\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,2]]},"references-count":17,"alternative-id":["10.3389\/frai.2025.1579998"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1579998","relation":{},"ISSN":["2624-8212"],"issn-type":[{"type":"electronic","value":"2624-8212"}],"subject":[],"published":{"date-parts":[[2025,7,2]]},"article-number":"1579998"}}