{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,17]],"date-time":"2025-12-17T13:03:31Z","timestamp":1765976611070,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":24,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T00:00:00Z","timestamp":1663632000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSF","award":["1823288"],"award-info":[{"award-number":["1823288"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,9,20]]},"DOI":"10.1145\/3558100.3563850","type":"proceedings-article","created":{"date-parts":[[2022,11,18]],"date-time":"2022-11-18T18:03:30Z","timestamp":1668794610000},"page":"1-4","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Scholarly big data quality assessment"],"prefix":"10.1145","author":[{"given":"Jian","family":"Wu","sequence":"first","affiliation":[{"name":"Old Dominion University"}]},{"given":"Ryan","family":"Hiltabrand","sequence":"additional","affiliation":[{"name":"Old Dominion University"}]},{"given":"Dominik","family":"So\u00f3s","sequence":"additional","affiliation":[{"name":"Old Dominion University"}]},{"given":"C. Lee","family":"Giles","sequence":"additional","affiliation":[{"name":"Pennsylvania State University"}]}],"member":"320","published-online":{"date-parts":[[2022,11,18]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Saleh Rehiel Alenazi, Kamsuriah Ahmad, and Akeem Olowolayemo. 2017. A review of similarity measurement for record duplication detection.
In ICEEI.","DOI":"10.1109\/ICEEI.2017.8312386"},{"key":"e_1_3_2_1_2_1","volume-title":"Proceedings of the Annual Conference of CAIS\/Actes du congr\u00e8s annuel de l'ACSI.","author":"Bui Yen","year":"2006","unstructured":"Yen Bui and Jung-ran Park. 2006. An assessment of metadata quality: A case study of the national science digital library metadata repository. In Proceedings of the Annual Conference of CAIS\/Actes du congr\u00e8s annuel de l'ACSI."},{"key":"e_1_3_2_1_3_1","volume-title":"The Challenges of Data Quality and Data Quality Assessment in the Big Data Era. Data Sci. J. 14","author":"Cai Li","year":"2015","unstructured":"Li Cai and Yangyong Zhu. 2015. The Challenges of Data Quality and Data Quality Assessment in the Big Data Era. Data Sci. J. 14 (2015), 2."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1093\/bjaed\/mkv034"},{"key":"e_1_3_2_1_5_1","unstructured":"Abhinandan S. Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. 2007. Google News Personalization: Scalable Online Collaborative Filtering. In WWW."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(94)90020-5"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318299.3318369"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/276675.276685"},{"key":"e_1_3_2_1_9_1","volume-title":"Winkler","author":"Herzog Thomas N.","year":"2007","unstructured":"Thomas N. Herzog, Fritz J. Scheuren, and William E.
Winkler. 2007. Data quality and record linkage techniques. Springer."},{"key":"e_1_3_2_1_10_1","unstructured":"Omid Jafari, Preeti Maurya, Parth Nagarkar, et al. 2021. A Survey on Locality Sensitive Hashing Algorithms and their Applications."},{"key":"e_1_3_2_1_11_1","article-title":"CORE: Three Access Levels to Underpin Open Access","volume":"18","author":"Knoth Petr","year":"2012","unstructured":"Petr Knoth and Zdenek Zdr\u00e1hal. 2012. CORE: Three Access Levels to Underpin Open Access. D Lib Mag. 18, 11\/12 (2012).","journal-title":"D Lib Mag."},{"key":"e_1_3_2_1_12_1","volume-title":"Mining of Massive Datasets","author":"Leskovec Jure","unstructured":"Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets (2nd ed.). Cambridge University Press, USA.","edition":"2"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.447"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04346-8_62"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1080\/01639370902737240"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3050547"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of ISSI.","author":"van Eck Nees Jan","year":"2017","unstructured":"Nees Jan van Eck and Ludo Waltman. 2017.
Accuracy of citation data in Web of Science and Scopus. In Proceedings of ISSI."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.609"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Kuansan Wang, Zhihong Shen, et al. 2020. Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies 1, 1 (02 2020), 396--413.","DOI":"10.1162\/qss_a_00021"},{"key":"e_1_3_2_1_20_1","unstructured":"Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, et al. 2020. CORD-19: The Covid-19 Open Research Dataset. CoRR abs\/2004.10706 (2020)."},{"volume-title":"Proceedings of DocEng.","author":"Williams Kyle","key":"e_1_3_2_1_21_1","unstructured":"Kyle Williams and C. Lee Giles. 2013. Near Duplicate Detection in an Academic Digital Library. In Proceedings of DocEng."},{"volume-title":"Proceedings of SBD@SIGMOD.","author":"Wu Jian","key":"e_1_3_2_1_22_1","unstructured":"Jian Wu, Chen Liang, Huaiyu Yang, and C. Lee Giles. 2016. CiteSeerX data: semanticizing scholarly papers.
In Proceedings of SBD@SIGMOD."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.sdp-1.3"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2016.2641460"}],"event":{"name":"DocEng '22: ACM Symposium on Document Engineering 2022","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGDOC ACM Special Interest Group on Systems Documentation"],"location":"San Jose California","acronym":"DocEng '22"},"container-title":["Proceedings of the 22nd ACM Symposium on Document Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3558100.3563850","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3558100.3563850","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3558100.3563850","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:49:32Z","timestamp":1750182572000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3558100.3563850"}},"subtitle":["a case study of document linking and conflation with S2ORC"],"short-title":[],"issued":{"date-parts":[[2022,9,20]]},"references-count":24,"alternative-id":["10.1145\/3558100.3563850","10.1145\/3558100"],"URL":"https:\/\/doi.org\/10.1145\/3558100.3563850","relation":{},"subject":[],"published":{"date-parts":[[2022,9,20]]},"assertion":[{"value":"2022-11-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}