{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:00:18Z","timestamp":1772906418387,"version":"3.50.1"},"reference-count":114,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,5,13]],"date-time":"2020-05-13T00:00:00Z","timestamp":1589328000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>The emergence of online professional platforms, such as LinkedIn and Indeed, has led to unprecedented volumes of rich resume data that have revolutionized the study of careers. One of the most prevalent problems in this space is the extraction of prototype career paths from a workforce. Previous research has consistently relied on a two-step approach to tackle this problem. The first step computes the pairwise distances between all the career sequences in the database. The second step uses the distance matrix to create clusters, with each cluster representing a different prototype path. As we demonstrate in this work, this approach faces two significant challenges when applied on large resume databases. First, the overwhelming diversity of job titles in the modern workforce prevents the accurate evaluation of distance between career sequences. Second, the clustering step of the standard approach leads to highly heterogeneous clusters, due to its inability to handle categorical sequences and sensitivity to outliers. This leads to non-representative centroids and spurious prototype paths that do not accurately represent the actual groups in the workforce. 
Our work addresses these two challenges and has practical implications for the numerous researchers and practitioners working on the analysis of career data across domains.<\/jats:p>","DOI":"10.1145\/3379984","type":"journal-article","created":{"date-parts":[[2020,5,19]],"date-time":"2020-05-19T10:42:16Z","timestamp":1589884936000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Mining Career Paths from Large Resume Databases"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4669-4170","authenticated-orcid":false,"given":"Theodoros","family":"Lappas","sequence":"first","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, New Jersey"}]}],"member":"320","published-online":{"date-parts":[[2020,5,13]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1086\/229495"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1177\/0049124100029001001"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.1120.1582"},{"key":"e_1_2_1_4_1","volume-title":"Reddy","author":"Aggarwal Charu C.","year":"2014"},{"key":"e_1_2_1_5_1","volume-title":"A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering 63, 2","author":"Ahmad Amir","year":"2007"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2009.150"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1080\/09585192.2012.697476"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2008.12.011"},{"key":"e_1_2_1_9_1","volume-title":"My brilliant career: Characterizing the early labor market trajectories of British women from generation X. 
Sociological Methods & Research 38, 3","author":"Anyadike-Danes Michael","year":"2010"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 2007 International Conference on Computational Intelligence and Multimedia Applications, Vol. 2. IEEE, 13--17","author":"Aranganayagi S."},{"key":"e_1_2_1_11_1","volume-title":"Rousseau","author":"Arthur Michael Bernard","year":"2001"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1404506"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1287\/orsc.2015.1003"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1080\/09585190902850190"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvb.2012.06.003"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1086\/210177"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/17.10.935"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1080\/03610927408827101"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1300\/J124v19n01_01"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/2481674.2481683"},{"key":"e_1_2_1_21_1","volume-title":"KDD Workshop on Data Cleaning and Object Consolidation","volume":"3","author":"Cohen William","year":"2003"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1744-6570.1995.tb01771.x"},{"key":"e_1_2_1_23_1","unstructured":"Kashyap Dalal. 2017. Data Analyst vs. Data Scientist - What\u2019s the Difference? Retrieved from https:\/\/www.simplilearn.com\/data-analyst-vs-data-scientist-article."},
{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvb.2015.04.005"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MS.2014.72"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0263-7863(99)00034-4"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/16.5.451"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S1053-4822(03)00048-2"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.2307\/2340521"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1111\/jbl.12150"},{"key":"e_1_2_1_31_1","volume-title":"Frey and Delbert Dueck","author":"Brendan","year":"2007"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2017.2784"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1744-6570.1991.tb00697.x"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556195.2559893"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/990680.990684"},{"key":"e_1_2_1_36_1","unstructured":"David Goulden. 2017. What\u2019s the Difference between Program Product and Project Managers? Retrieved from https:\/\/www.clarizen.com\/whats-difference-program-product-project-managers\/."},{"key":"e_1_2_1_37_1","volume-title":"Handbook of Organizational Behavior","author":"Greenhaus Jeffrey H."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2018.00054"},{"key":"e_1_2_1_39_1","volume-title":"Software engineering at Google. 
arXiv preprint arXiv:1702.01715","author":"Henderson Fergus","year":"2017"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti1049"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1348\/096317906X119738"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvb.2006.12.003"},{"key":"e_1_2_1_43_1","volume-title":"Statistical Models for Data Analysis","author":"Iezzi Domenica Fioredistella"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigDataService.2015.61"},{"key":"e_1_2_1_45_1","volume-title":"Towards a job title classification system. arXiv preprint arXiv:1606.00917","author":"Javed Faizan","year":"2016"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1287\/isre.2014.0558"},{"key":"e_1_2_1_47_1","volume-title":"Soon Ang, and Sandra A. Slaughter.","author":"Joseph Damien","year":"2012"},{"key":"e_1_2_1_48_1","volume-title":"Quality assurance institute worldwide annual software testing conference. Exploratory Testing","author":"Kaner Cem","year":"2006"},{"key":"e_1_2_1_49_1","volume-title":"Rousseeuw","author":"Kaufman Leonard","year":"2009"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-8-286"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/1046103.1046108"},{"key":"e_1_2_1_52_1","volume-title":"Content analysis: An introduction to its methodology","author":"Klaus Krippendorff"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1002\/hrm.21759"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ssresearch.2006.03.004"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tourman.2017.03.005"},{"key":"e_1_2_1_56_1","unstructured":"Klaus Krippendorff. 2011. Computing Krippendorff\u2019s alpha-reliability."},
{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0036144503424786"},{"key":"e_1_2_1_58_1","unstructured":"Jey Han Lau and Timothy Baldwin. 2016. An empirical evaluation of doc2vec with practical insights into document embedding generation. Retrieved from https:\/\/www.clarizen.com\/whats-difference-program-product-project-managers\/."},{"key":"e_1_2_1_59_1","volume-title":"Proceedings of the 31st International Conference on Machine Learning. 1188--1196","author":"Le Quoc","year":"2014"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2017.2871"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.17"},{"key":"e_1_2_1_62_1","volume-title":"Proceedings of the 13th AAAI Conference on Artificial Intelligence, February 12--17, 2016","author":"Liu Ye","year":"1885"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.2218\/ijdc.v11i2.417"},{"key":"e_1_2_1_64_1","volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2959086"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v074.i09"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330969"},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing.","author":"Mihalcea Rada","year":"2004"},{"key":"e_1_2_1_69_1","volume-title":"Efficient estimation of word representations in vector space. 
arXiv preprint arXiv:1301.3781","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00148-009-0296-x"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02294245"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1365-2648.2004.03290.x"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.53"},{"key":"e_1_2_1_74_1","volume-title":"The Art of Software Testing","author":"Myers Glenford J."},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2004.04.002"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature03607"},{"key":"e_1_2_1_77_1","first-page":"844","article-title":"Cataloging professionals in the digital environment: A content analysis of job descriptions","volume":"60","author":"Park Jungran","year":"2009","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"e_1_2_1_78_1","unstructured":"Payscale. 2018. Software Engineering Manager. Retrieved from https:\/\/www.payscale.com\/research\/US\/Job=Software_Engineering_Manager\/Salary."},{"key":"e_1_2_1_79_1","volume-title":"Proceedings of the International Conference on Information Systems, 150","author":"Jeria"},{"key":"e_1_2_1_80_1","unstructured":"Meghan Raphael. 2016. Software engineer vs computer programmer: What is the difference? Retrieved from https:\/\/www.electronicproducts.com\/Education\/Career\/Software_engineer_vs_computer_programmer_what_s_the_difference.aspx."},
{"key":"e_1_2_1_81_1","volume-title":"Mohammad Al Hasan, and Mohammed J. Zaki","author":"Reddy Chandan K.","year":"2013"},{"key":"e_1_2_1_82_1","volume-title":"Reddy and Bhanukiran Vinzamuri","author":"Chandan","year":"2013"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.2307\/249467"},{"key":"e_1_2_1_84_1","volume-title":"Proceedings of the ACM SIGCPR Conference on Computer Personnel Research","author":"Catherine","year":"1998"},{"key":"e_1_2_1_85_1","unstructured":"Richard Rivera and Adam Haverson. 2014. Data Scientist vs Data Analyst. Retrieved from https:\/\/www.captechconsulting.com\/blogs\/data-scientist-vs-data-analyst."},{"key":"e_1_2_1_86_1","unstructured":"Walker Royce. 1999. Software Project Management. Pearson Education India."},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICTAI.2004.50"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1177\/0894486511427559"},{"key":"e_1_2_1_89_1","volume-title":"A concise guide to market research","author":"Sarstedt Marko"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2009.01.003"},{"key":"e_1_2_1_91_1","volume-title":"\u201cO\u2019Reilly Media","author":"Schutt Rachel"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1033"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.1108\/13620430810891437"},{"key":"e_1_2_1_94_1","volume-title":"Proceedings of the 9th International Conference on Neural Information Processing Systems. 
648--654","author":"Smyth Padhraic","year":"1997"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8258354"},{"key":"e_1_2_1_96_1","volume-title":"Career trajectories of entrepreneurs, executives, and senior managers in high-tech.","author":"Stephens Bryan","year":"2018"},{"key":"e_1_2_1_97_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvb.2005.09.001"},{"key":"e_1_2_1_98_1","volume-title":"Bachrach","author":"Super Donald E.","year":"1957"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.2014.1899"},{"key":"e_1_2_1_100_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.1110.1445"},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-9868.00293"},{"key":"e_1_2_1_102_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2006.06.015"},{"key":"e_1_2_1_103_1","unstructured":"John R. Vacca. 2012. Computer and Information Security Handbook. Newnes."},{"key":"e_1_2_1_104_1","doi-asserted-by":"crossref","unstructured":"Matia Vannoni and Peter John. 2015. Proto stars dwarfs and giants: A career path analysis of British MPs. (2015). Available on SSRN: https:\/\/ssrn.com\/abstract=2684490","DOI":"10.2139\/ssrn.2684490"},{"key":"e_1_2_1_105_1","volume-title":"Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 225--236","author":"Wang Gang"},{"key":"e_1_2_1_106_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2007.331"},{"key":"e_1_2_1_107_1","volume-title":"Kamakura","author":"Wedel Michel","year":"2012"},{"key":"e_1_2_1_108_1","volume-title":"Widmer and Gilbert Ritschard","author":"Eric","year":"2009"},{"key":"e_1_2_1_109_1","unstructured":"Karl Wiegers and Joy Beatty. 2013. Software Requirements. Pearson Education."},
{"key":"e_1_2_1_110_1","doi-asserted-by":"publisher","DOI":"10.1177\/0049124100029001003"},{"key":"e_1_2_1_111_1","volume-title":"Proceedings of the 2013 IEEE\/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE, 779--786","author":"Yelena Wu Meng Qi","year":"2013"},{"key":"e_1_2_1_112_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2011.13"},{"key":"e_1_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.1145\/1150402.1150506"},{"key":"e_1_2_1_114_1","volume-title":"Proceedings of the 13th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017","author":"Zhu Yun","year":"2017"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3379984","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3379984","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:20Z","timestamp":1750200080000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3379984"}},"subtitle":["Evidence from IT Professionals"],"short-title":[],"issued":{"date-parts":[[2020,5,13]]},"references-count":114,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3379984"],"URL":"https:\/\/doi.org\/10.1145\/3379984","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,5,13]]},"assertion":[{"value":"2019-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2020-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}