{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:28:41Z","timestamp":1777854521048,"version":"3.51.4"},"reference-count":44,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T00:00:00Z","timestamp":1653955200000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.sagepub.com\/licence-information-for-chorus"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1917663"],"award-info":[{"award-number":["1917663"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2023,6]]},"abstract":"<jats:p>Chinese author names are known to be more difficult to disambiguate than other ethnic names because they tend to share surnames and forenames, thus creating many homonyms. In this study, we demonstrate how using Chinese characters can affect machine learning for author name disambiguation. For analysis, 15K author names recorded in Chinese are transliterated into English and simplified by initialising their forenames to create counterfactual scenarios, reflecting real-world indexing practices in which Chinese characters are usually unavailable. The results show that Chinese author names that are highly ambiguous in English or with initialised forenames tend to become less confusing if their Chinese characters are included in the processing. Our findings indicate that recording Chinese author names in native script can help researchers and digital libraries enhance authority control of Chinese author names that continue to increase in size in bibliographic data.<\/jats:p>","DOI":"10.1177\/01655515211018171","type":"journal-article","created":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T12:25:34Z","timestamp":1622636734000},"page":"711-725","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":5,"title":["Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation"],"prefix":"10.1177","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6481-2065","authenticated-orcid":false,"given":"Jinseok","family":"Kim","sequence":"first","affiliation":[{"name":"Institute for Research on Innovation & Science, Survey Research Center, School of Information, University of Michigan, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7438-448X","authenticated-orcid":false,"given":"Jenna","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Information Sciences, University of Illinois at Urbana-Champaign, USA"}]},{"given":"Jinmo","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Information Sciences, University of Illinois at Urbana-Champaign, USA"}]}],"member":"179","published-online":{"date-parts":[[2021,5,31]]},"reference":[{"key":"bibr1-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1038\/223763b0"},{"key":"bibr2-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1002\/asi.23489"},{"key":"bibr3-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-015-1699-y"},{"key":"bibr4-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2013.06.006"},{"key":"bibr5-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1002\/asi.22695"},{"key":"bibr6-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1145\/2350036.2350040"},{"key":"bibr7-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-016-1892-7"},{"key":"bibr8-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888917000182"},{"key":"bibr9-01655515211018171","first-page":"11","volume":"3","author":"Torvik VI","year":"2009","journal-title":"Acm T Knowl Discov D"},{"key":"bibr10-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-013-0978-8"},{"key":"bibr11-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-018-2788-5"},{"key":"bibr12-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-018-2753-3"},{"key":"bibr13-01655515211018171","unstructured":"Wu Z, Yuan D, Treeratpituk P, et al. Science and ethnicity: how ethnicities shape the evolution of computer science research community. arXiv, 2014, https:\/\/arxiv.org\/abs\/1411.1129"},{"key":"bibr14-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-017-2363-5"},{"key":"bibr15-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1177\/0165551518761011"},{"key":"bibr16-01655515211018171","first-page":"102","volume":"40","author":"Fong MCM","year":"2012","journal-title":"J Chin Linguist"},{"key":"bibr17-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-010-0196-6"},{"key":"bibr18-01655515211018171","volume-title":"Proceedings of the 2013 KDD Cup 2013 workshop","author":"Chin WS"},{"key":"bibr19-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1038\/451766a"},{"key":"bibr20-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1002\/asi.24298"},{"key":"bibr21-01655515211018171","first-page":"287","volume":"43","author":"Smalheiser NR","year":"2009","journal-title":"Annu Rev Inform Sci"},{"key":"bibr22-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-018-2865-9"},{"key":"bibr23-01655515211018171","volume-title":"JCDL 2004: proceedings of the fourth ACM\/IEEE joint conference on digital libraries","author":"Han H"},{"key":"bibr24-01655515211018171","unstructured":"Kim K, Sefid A, Giles CL. Scaling author name disambiguation with CNF Blocking. arXiv, 2017, https:\/\/arxiv.org\/abs\/1709.09657"},{"key":"bibr25-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-017-2338-6"},{"key":"bibr26-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-43997-6_31"},{"key":"bibr27-01655515211018171","first-page":"1002","volume-title":"Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining","author":"Zhang Y"},{"key":"bibr28-01655515211018171","volume-title":"2011 IEEE 11th international conference on data mining","author":"Wang X"},{"key":"bibr29-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21491"},{"key":"bibr30-01655515211018171","first-page":"59","volume-title":"International Conference on Theory and Practice of Digital Libraries (TPDL)","author":"Ackermann MR"},{"key":"bibr31-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2015.08.004"},{"key":"bibr32-01655515211018171","first-page":"39","volume-title":"JCDL 2009: proceedings of the 2009 ACM\/IEEE joint conference on digital libraries","author":"Treeratpituk P"},{"key":"bibr33-01655515211018171","volume-title":"Proceedings of the Fifth workshop on building and evaluating resources for biomedical text mining (BioTxtM2016)","author":"Vishnyakova D"},{"key":"bibr34-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1177\/0165551519888605"},{"key":"bibr35-01655515211018171","first-page":"334","volume-title":"Proceedings of the 5th ACM\/IEEE joint conference on digital libraries","author":"Han H"},{"key":"bibr36-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1002\/asi.22621"},{"key":"bibr37-01655515211018171","first-page":"272","volume":"649","author":"Louppe G","year":"2016","journal-title":"Comm Com Inf Sc"},{"key":"bibr38-01655515211018171","first-page":"265","volume-title":"2018 IEEE international conference on web services (ICWS)","author":"Kim K"},{"key":"bibr39-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-012-0681-1"},{"key":"bibr40-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0070299"},{"key":"bibr41-01655515211018171","first-page":"803","volume-title":"Proceedings of the 27th ACM international conference on information and knowledge management","author":"Backes T"},{"key":"bibr42-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1016\/j.joi.2012.11.010"},{"key":"bibr43-01655515211018171","doi-asserted-by":"publisher","DOI":"10.1177\/0165551511398573"},{"key":"bibr44-01655515211018171","volume-title":"Library of congress international symposium on science of science","author":"Torvik VI"}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211018171","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515211018171","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211018171","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515211018171","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T23:09:20Z","timestamp":1777504160000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515211018171"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,31]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["10.1177\/01655515211018171"],"URL":"https:\/\/doi.org\/10.1177\/01655515211018171","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"value":"0165-5515","type":"print"},{"value":"1741-6485","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,31]]}}}