{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T22:44:59Z","timestamp":1759963499968,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,5,7]],"date-time":"2019-05-07T00:00:00Z","timestamp":1557187200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2019,6,30]]},"abstract":"<jats:p>Mappings of first name to gender have been widely recognized as a critical tool for the completion, study, and validation of data records in a range of areas. In this study, we investigate how organizations with large databases of existing entities can create their own mappings between first names and gender and how these mappings can be improved and utilized. Therefore, we first explore a dataset with demographic information on more than 4 million people, which was provided by a car insurance company. Then, we study how naming conventions have changed over time and how they differ by nationality. Next, we build a probabilistic first-name-to-gender mapping and augment the mapping by adding nationality and decade of birth to improve the mapping's performance. We test our mapping in two-label and three-label settings and further validate our mapping by categorizing patent filings by gender of the inventor. We compare the results with previous studies\u2019 outcomes and find that our mapping produces high-precision results. We validate that the additional information of nationality and year of birth improve the precision scores of name-to-gender mappings. 
Therefore, the proposed approach constitutes an efficient process for improving the data quality of organizations\u2019 records, if the gender attribute is missing or unreliable.<\/jats:p>","DOI":"10.1145\/3297720","type":"journal-article","created":{"date-parts":[[2019,5,8]],"date-time":"2019-05-08T14:11:11Z","timestamp":1557324671000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Augmenting Data Quality through High-Precision Gender Categorization"],"prefix":"10.1145","volume":"11","author":[{"given":"Daniel","family":"M\u00fcller","sequence":"first","affiliation":[{"name":"ETH Zurich, Switzerland"}]},{"given":"Pratiksha","family":"Jain","sequence":"additional","affiliation":[{"name":"ETH Zurich, Switzerland"}]},{"given":"Yieh-Funk","family":"Te","sequence":"additional","affiliation":[{"name":"ETH Zurich, Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2019,5,7]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.bushor.2017.01.002"},{"key":"e_1_2_1_2_1","volume-title":"Russel Horton, and Mark Olsen.","author":"Argamon Shlomo","year":"2017","unstructured":"Shlomo Argamon , Jean Baptiste Goulain , Russel Horton, and Mark Olsen. 2017 . DHQ\u202f: Digital humanities quarterly - vive la diff\u00e9rence! Text Mining Gender Difference in French Literature 3, 2 (2017), 1--11. Shlomo Argamon, Jean Baptiste Goulain, Russel Horton, and Mark Olsen. 2017. DHQ\u202f: Digital humanities quarterly - vive la diff\u00e9rence! Text Mining Gender Difference in French Literature 3, 2 (2017), 1--11."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1080\/713827181"},{"key":"e_1_2_1_4_1","first-page":"1","article-title":"Jane, John \u2026 Leslie? A historical method for algorithmic gender prediction","volume":"9","author":"Blevins Cameron","year":"2015","unstructured":"Cameron Blevins and Lincoln Mullen . 2015 . 
Jane, John \u2026 Leslie? A historical method for algorithmic gender prediction . Digital Humanities Quarterly 9 , 3 (2015), 1 -- 19 . Cameron Blevins and Lincoln Mullen. 2015. Jane, John \u2026 Leslie? A historical method for algorithmic gender prediction. Digital Humanities Quarterly 9, 3 (2015), 1--19.","journal-title":"Digital Humanities Quarterly"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5334\/dsj-2015-002"},{"key":"e_1_2_1_6_1","volume-title":"Does the Cream Always Rise to the Top\u202f? The Misallocation of Talent in Innovation","author":"Alp Murat Celik","year":"2015","unstructured":"Murat Celik Alp . 2015. Does the Cream Always Rise to the Top\u202f? The Misallocation of Talent in Innovation . University of Toronto . Working Paper. ( 2015 ). Murat Celik Alp. 2015. Does the Cream Always Rise to the Top\u202f? The Misallocation of Talent in Innovation. University of Toronto. Working Paper. (2015)."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/2481674.2481683"},{"key":"e_1_2_1_8_1","first-page":"191","article-title":"Understanding customer relationship management (CRM): People, process and technology","volume":"21","author":"Chen Injazz","year":"2017","unstructured":"Injazz Chen and Karen Popovich . 2017 . Understanding customer relationship management (CRM): People, process and technology . Business Process Management Journal 21 , 2 (2017), 191 -- 206 . Injazz Chen and Karen Popovich. 2017. Understanding customer relationship management (CRM): People, process and technology. Business Process Management Journal 21, 2 (2017), 191--206.","journal-title":"Business Process Management Journal"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.140216"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1080\/14640748108400805"},{"key":"e_1_2_1_11_1","volume-title":"Inventing social capital: Evidence from African American inventors from 1843--1930. 
48, 4","author":"Cook Lisa D.","year":"2011","unstructured":"Lisa D. Cook . 2011. Inventing social capital: Evidence from African American inventors from 1843--1930. 48, 4 ( 2011 ), 507--518. Lisa D. Cook. 2011. Inventing social capital: Evidence from African American inventors from 1843--1930. 48, 4 (2011), 507--518."},{"key":"e_1_2_1_12_1","volume-title":"Smith","author":"Darley William K.","year":"1995","unstructured":"William K. Darley and Robert E . Smith . 1995 . Gender differences in information processing strategies: An empirical test of the selectivity model in advertising response. Journal of Advertising 24 1 (1995), 41--56. William K. Darley and Robert E. Smith. 1995. Gender differences in information processing strategies: An empirical test of the selectivity model in advertising response. Journal of Advertising 24 1 (1995), 41--56."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/240455.240479"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.4236\/jilsa.2012.43017"},{"key":"e_1_2_1_15_1","unstructured":"Flugzentrale. 2018. Die beliebtesten Schweizer Nachnamen. https:\/\/flugzentrale.de\/1905\/blog\/swiss-surnames.  Flugzentrale. 2018. Die beliebtesten Schweizer Nachnamen. https:\/\/flugzentrale.de\/1905\/blog\/swiss-surnames."},{"key":"e_1_2_1_16_1","volume-title":"How companies learn your secrets. New York Times Magazine","author":"Duhigg Charles","year":"2012","unstructured":"Charles Duhigg . 2012. How companies learn your secrets. New York Times Magazine ( 2012 ), 1--16. Charles Duhigg. 2012. How companies learn your secrets. New York Times Magazine (2012), 1--16."},{"key":"e_1_2_1_17_1","unstructured":"Federal Statistical Office\u2014Look for Statistics. 2017. https:\/\/www.bfs.admin.ch\/bfs\/en\/home\/statistics.html. Accessed: 2017-03-02.  Federal Statistical Office\u2014Look for Statistics. 2017. https:\/\/www.bfs.admin.ch\/bfs\/en\/home\/statistics.html. 
Accessed: 2017-03-02."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1353\/nlh.2014.0025"},{"key":"e_1_2_1_19_1","volume-title":"Record Linkage: Current Practice and Future Directions. CSIRO Mathematical and Information Sciences. Technical Report","author":"Gu Lifang","year":"2003","unstructured":"Lifang Gu , Rohan Baxter , Deanne Vickers , and Chris Rainsford . 2003 . Record Linkage: Current Practice and Future Directions. CSIRO Mathematical and Information Sciences. Technical Report (2003). Lifang Gu, Rohan Baxter, Deanne Vickers, and Chris Rainsford. 2003. Record Linkage: Current Practice and Future Directions. CSIRO Mathematical and Information Sciences. Technical Report (2003)."},{"key":"e_1_2_1_20_1","first-page":"1","article-title":"Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations","volume":"10","author":"Gudivada Venkat","year":"2017","unstructured":"Venkat Gudivada , Amy Apon , and Junhua Ding . 2017 . Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations . International Journal on Advances in Software 10 , 1 (2017), 1 -- 20 . Venkat Gudivada, Amy Apon, and Junhua Ding. 2017. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. International Journal on Advances in Software 10, 1 (2017), 1--20.","journal-title":"International Journal on Advances in Software"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242594"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.respol.2012.11.004"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-937X.2008.00531.x"},{"key":"e_1_2_1_24_1","volume-title":"Technological forecasting & social change demographic patterns and trends in patenting: Gender, age, and education of inventors. 
Technological Forecasting & Social Change 86","author":"Jung Taehyun","year":"2014","unstructured":"Taehyun Jung and Olof Ejermo . 2014. Technological forecasting & social change demographic patterns and trends in patenting: Gender, age, and education of inventors. Technological Forecasting & Social Change 86 ( 2014 ), 110--124. Taehyun Jung and Olof Ejermo. 2014. Technological forecasting & social change demographic patterns and trends in patenting: Gender, age, and education of inventors. Technological Forecasting & Social Change 86 (2014), 110--124."},{"key":"e_1_2_1_25_1","unstructured":"Lens patent database. 2017. https:\/\/www.lens.org\/lens\/.  Lens patent database. 2017. https:\/\/www.lens.org\/lens\/."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.archger.2014.04.004"},{"key":"e_1_2_1_27_1","volume-title":"The gender patenting gap. Briefing Paper, Stem and Innovation","author":"Milli Jessica","year":"2016","unstructured":"Jessica Milli , Barbara Gault , Emma Williams-Barron , Jenny Xia , and Meika Berlan . 2016. The gender patenting gap. Briefing Paper, Stem and Innovation . Institute for Women\u2019s Policy Research. Washington , DC. ( 2016 ), 1--10. Jessica Milli, Barbara Gault, Emma Williams-Barron, Jenny Xia, and Meika Berlan. 2016. The gender patenting gap. Briefing Paper, Stem and Innovation. Institute for Women\u2019s Policy Research. Washington, DC. (2016), 1--10."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2017.8258223"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the 2005 International Conference on Information Quality. MIT","author":"Oliveira Paulo","year":"2005","unstructured":"Paulo Oliveira , F\u00e1tima Henriques , and Pedro Henriques . 2005 . A formal definition of data quality problems . In Proceedings of the 2005 International Conference on Information Quality. MIT , Boston, MA, 1--14. Paulo Oliveira, F\u00e1tima Henriques, and Pedro Henriques. 2005. 
A formal definition of data quality problems. In Proceedings of the 2005 International Conference on Information Quality. MIT, Boston, MA, 1--14."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/269012.269023"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/505248.506010"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-006-0032-0"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/65943.65945"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816764"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/646290.686929"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 3rd Intl. Workshop on Design and Management of Data Warehouses. (DMDW\u201901)","author":"Galhardas Helena","year":"2001","unstructured":"Helena Galhardas , Daniela Florescu , Dennis Shasha , Eric Simon , and Cristian-Augustin Saita . ( 2001 ). Improving data cleaning quality using a data lineage facility . In Proceedings of the 3rd Intl. Workshop on Design and Management of Data Warehouses. (DMDW\u201901) . Interlaken, Switzerland. Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon, and Cristian-Augustin Saita. (2001). Improving data cleaning quality using a data lineage facility. In Proceedings of the 3rd Intl. Workshop on Design and Management of Data Warehouses. (DMDW\u201901). Interlaken, Switzerland."},{"key":"e_1_2_1_37_1","unstructured":"Statista. 2017. Share of Patent Applicants that were Female in Switzerland from 1980 to 2013 Statista Accounts. https:\/\/www.statista.com\/statistics\/422266\/share-of-women-inventors-female-patent-applicants-y-on-y-switzerland\/.  Statista. 2017. Share of Patent Applicants that were Female in Switzerland from 1980 to 2013 Statista Accounts. 
https:\/\/www.statista.com\/statistics\/422266\/share-of-women-inventors-female-patent-applicants-y-on-y-switzerland\/."},{"key":"e_1_2_1_38_1","unstructured":"Statistiken \u00fcber das Telefonbuch der Schweiz. 2018. http:\/\/www.adp-gmbh.ch\/misc\/tel_book_ch.html.  Statistiken \u00fcber das Telefonbuch der Schweiz. 2018. http:\/\/www.adp-gmbh.ch\/misc\/tel_book_ch.html."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0128000"},{"key":"e_1_2_1_40_1","volume-title":"Gotland VT2011","author":"Sultan Muhammad Umar","year":"2011","unstructured":"Muhammad Umar Sultan and Nasir Uddin . 2011 . Consumers\u2019 Attitude towards Online Shopping. Master thesis, Department of Business Administration. H\u00f6gskolan p\u00e5 Gotland VT2011 . Muhammad Umar Sultan and Nasir Uddin. 2011. Consumers\u2019 Attitude towards Online Shopping. Master thesis, Department of Business Administration. H\u00f6gskolan p\u00e5 Gotland VT2011."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/269012.269021"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.404034"},{"key":"e_1_2_1_43_1","volume-title":"Gender differences in Internet use patterns and Internet application preferences: A two-sample comparison. CyberPsychology & Behavior","author":"Weiser Erik B.","year":"2000","unstructured":"Erik B. Weiser . 2000. Gender differences in Internet use patterns and Internet application preferences: A two-sample comparison. CyberPsychology & Behavior ( 2000 ). Erik B. Weiser. 2000. Gender differences in Internet use patterns and Internet application preferences: A two-sample comparison. CyberPsychology & Behavior (2000)."},{"key":"e_1_2_1_44_1","unstructured":"Swissinfo. 2017. Welches sind die h\u00e4ufigsten Schweizer Nachnamen? https:\/\/www.swissinfo.ch\/ger\/wirtschaft\/umstrittene-aliasnamen-in-callcentern_welches-sind-die-haeufigsten-schweizer-nachnamen\/43291670.  Swissinfo. 2017. 
Welches sind die h\u00e4ufigsten Schweizer Nachnamen? https:\/\/www.swissinfo.ch\/ger\/wirtschaft\/umstrittene-aliasnamen-in-callcentern_welches-sind-die-haeufigsten-schweizer-nachnamen\/43291670."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the International Conference on Management Science and Engineering. 78--83","author":"Shen","year":"2006","unstructured":"Shen Xue-wu, Nie Gui-hua, and Shen Yan Ling . 2006 . Gender-based differences in the effect of web advertising in e-business . In Proceedings of the International Conference on Management Science and Engineering. 78--83 . Shen Xue-wu, Nie Gui-hua, and Shen Yan Ling. 2006. Gender-based differences in the effect of web advertising in e-business. In Proceedings of the International Conference on Management Science and Engineering. 78--83."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cities.2013.11.006"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3297720","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3297720","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:10Z","timestamp":1750204450000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3297720"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5,7]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,6,30]]}},"alternative-id":["10.1145\/3297720"],"URL":"https:\/\/doi.org\/10.1145\/3297720","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2019,5,7]]},"assertion":[{"value":"2018-03-01","order":0,"name":"
received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-05-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}