{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:40:10Z","timestamp":1760035210898,"version":"build-2065373602"},"reference-count":25,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T00:00:00Z","timestamp":1751414400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Career planning agencies and other organizations can help workers if they are able to effectively identify related occupations that are relevant to the task at hand. Occupational knowledge bases such as O*NET and ESCO represent mature attempts to categorize occupations and describe them in detail so that they can be used to search for related occupations. Vector databases offer an opportunity to find related occupations based on large pre-trained word and sentence embeddings and their associated retrieval algorithms for similarity search. This paper reports a systematic empirical evaluation of the possibilities of using vector databases for related occupation retrieval using different document structures, embeddings, and retrieval configurations for two popular open source vector databases, and using the O*NET curated database. The objective was to understand the extent to which curated relations capture all the meaningful relations in a context of retrieval. The results show that, independent of the database used, distance metrics, sentence embeddings, and the selection of text fragments are all significant in the overall retrieval performance when comparing with curated relations, but they also retrieve other relevant occupations based on text similarity. Further, the precision is high for smaller cutoffs in the results list, which is especially important for settings in which vector database retrieval is set up as part of a Retrieval Augmented Generation (RAG) pattern. The inspection of highly ranked retrieved related occupations not explicit in the curated database reveals that text similarity captures the taxonomical grouping of some occupations in some cases, but also other cross-cuts different aspects that are distinct from the hierarchical organization of the database in most of the cases. This suggests that text retrieval should be combined with querying explicit relations in practical applications.<\/jats:p>","DOI":"10.3390\/bdcc9070175","type":"journal-article","created":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T06:46:12Z","timestamp":1751438772000},"page":"175","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Using Vector Databases for the Selection of Related Occupations: An Empirical Evaluation Using O*NET"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0442-3399","authenticated-orcid":false,"given":"Lino","family":"Gonzalez-Garcia","sequence":"first","affiliation":[{"name":"Computer Science Department, Universidad de Alcal\u00e1, 28801 Madrid, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3067-4180","authenticated-orcid":false,"given":"Miguel-Angel","family":"Sicilia","sequence":"additional","affiliation":[{"name":"Computer Science Department, Universidad de Alcal\u00e1, 28801 Madrid, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6752-9599","authenticated-orcid":false,"given":"Elena","family":"Garc\u00eda-Barriocanal","sequence":"additional","affiliation":[{"name":"Computer Science Department, Universidad de Alcal\u00e1, 28801 Madrid, Spain"}]}],"member":"1968","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1002\/j.2161-0045.2005.tb00140.x","article-title":"Use of technology in delivering career services worldwide","volume":"54","author":"Sampson","year":"2005","journal-title":"Career Dev. Q."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1111\/j.1744-6570.2001.tb00100.x","article-title":"Understanding work using the Occupational Information Network (O*NET): Implications for practice and research","volume":"54","author":"Peterson","year":"2001","journal-title":"Pers. Psychol."},{"key":"ref_3","first-page":"57","article-title":"Esco: Boosting job matching in europe with semantic interoperability","volume":"47","author":"Papantoniou","year":"2014","journal-title":"Computer"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1111\/j.1744-6570.2004.tb02497.x","article-title":"Matching individuals to occupations using abilities and the O*NET: Issues and an application in career guidance","volume":"57","author":"Converse","year":"2004","journal-title":"Pers. Psychol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"100757","DOI":"10.1016\/j.patter.2023.100757","article-title":"Occupational models from 42 million unstructured job postings","volume":"4","author":"Dixon","year":"2023","journal-title":"Patterns"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1591","DOI":"10.1007\/s00778-024-00864-x","article-title":"Survey of vector database management systems","volume":"33","author":"Pan","year":"2024","journal-title":"VLDB J."},{"key":"ref_7","unstructured":"Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"104658","DOI":"10.1016\/j.respol.2022.104658","article-title":"Routinization, within-occupation task changes and long-run employment dynamics","volume":"52","author":"Consoli","year":"2023","journal-title":"Res. Policy"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1693","DOI":"10.1177\/00027642221127239","article-title":"Identifying alternative occupations for truck drivers displaced due to autonomous vehicles by leveraging the O*NET database","volume":"67","author":"Chang","year":"2023","journal-title":"Am. Behav. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4215","DOI":"10.1108\/IJCHM-01-2021-0073","article-title":"A network analysis of cross-occupational skill transferability for the hospitality industry","volume":"33","author":"Huang","year":"2021","journal-title":"Int. J. Contemp. Hosp. Manag."},{"key":"ref_11","first-page":"353","article-title":"Illustrating the application of a skills taxonomy, machine learning and online data to inform career and training decisions","volume":"40","author":"Mason","year":"2023","journal-title":"Int. J. Inf. Learn. Technol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"127043","DOI":"10.1016\/j.eswa.2025.127043","article-title":"CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations","volume":"275","author":"Rosenberger","year":"2025","journal-title":"Expert Syst. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"100509","DOI":"10.1016\/j.bdr.2025.100509","article-title":"A novel approach for job matching and skill recommendation using transformers and the O*NET database","volume":"39","author":"Alonso","year":"2025","journal-title":"Big Data Res."},{"key":"ref_14","unstructured":"Dahlke, J.A., Putka, D.J., Shewach, O., and Lewis, P. (2025, June 04). Developing Related Occupations for the O*NET Program. National Center for O*NET Development. Available online: https:\/\/www.onetcenter.org\/reports\/Related_2022.html."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"113302","DOI":"10.1016\/j.knosys.2025.113302","article-title":"LLM4Jobs: Unsupervised occupation extraction and standardization leveraging Large Language Models","volume":"316","author":"Li","year":"2025","journal-title":"Knowl.-Based Syst."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Liu, S., and Wang, J. (2024, January 13\u201317). Are there fundamental limitations in supporting vector data management in relational databases? A case study of PostgreSQL. Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands.","DOI":"10.1109\/ICDE60146.2024.00280"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"101216","DOI":"10.1016\/j.cogsys.2024.101216","article-title":"Vector database management systems: Fundamental concepts, use-cases, and current challenges","volume":"85","author":"Taipalus","year":"2024","journal-title":"Cogn. Syst. Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3678261","article-title":"Recommending career transitions to job seekers using earnings estimates, skills similarity, and occupational demand","volume":"5","author":"Howison","year":"2024","journal-title":"Digit. Gov. Res. Pract."},{"key":"ref_19","unstructured":"Decorte, J.J., Van Hautte, J., Demeester, T., and Develder, C. (2021). JobBERT: Understanding Job Titles through Skills. arXiv."},{"key":"ref_20","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_21","first-page":"24824","article-title":"Chain-of-thought prompting elicits reasoning in large language models","volume":"35","author":"Wei","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Chan, B., Schweter, S., and M\u00f6ller, T. (2020). German\u2019s next language model. arXiv.","DOI":"10.18653\/v1\/2020.coling-main.598"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Nalis, I., Kubicek, B., and Korunka, C. (2021). From shock to shift\u2013A qualitative analysis of accounts in mid-career about changes in the career path. Front. Psychol., 12.","DOI":"10.3389\/fpsyg.2021.641248"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/s12651-016-0199-8","article-title":"The O*NET content model: Strengths and limitations","volume":"49","author":"Handel","year":"2016","journal-title":"J. Labour Mark. Res."},{"key":"ref_25","unstructured":"Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., Zhang, Y., and Tang, S. (2024). Graph retrieval-augmented generation: A survey. arXiv."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/7\/175\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:03:02Z","timestamp":1760032982000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/7\/175"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,2]]},"references-count":25,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["bdcc9070175"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9070175","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,7,2]]}}}