{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,11]],"date-time":"2025-11-11T15:45:41Z","timestamp":1762875941332,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":18,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,12,4]],"date-time":"2017-12-04T00:00:00Z","timestamp":1512345600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,12,4]]},"DOI":"10.1145\/3148011.3154470","type":"proceedings-article","created":{"date-parts":[[2017,12,18]],"date-time":"2017-12-18T13:22:50Z","timestamp":1513603370000},"page":"1-4","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["A Supervised Learning Approach To Entity Matching Between Scholarly Big Datasets"],"prefix":"10.1145","author":[{"given":"Jian","family":"Wu","sequence":"first","affiliation":[{"name":"IST, Pennsylvania State University, University Park, PA, USA"}]},{"given":"Athar","family":"Sefid","sequence":"additional","affiliation":[{"name":"CSE, Pennsylvania State University, University Park, PA, USA"}]},{"given":"Allen C.","family":"Ge","sequence":"additional","affiliation":[{"name":"IST, Pennsylvania State University, University Park, PA, USA"}]},{"given":"C. 
Lee","family":"Giles","sequence":"additional","affiliation":[{"name":"IST, Pennsylvania State University, University Park, PA, USA and CSE, Pennsylvania State University, University Park, PA, USA"}]}],"member":"320","published-online":{"date-parts":[[2017,12,4]]},"reference":[{"volume-title":"CiteSeerX: A Scholarly Big Dataset","author":"Caragea Cornelia","key":"e_1_3_2_1_1_1","unstructured":"Cornelia Caragea, Jian Wu, Alina Ciobanu, Kyle Williams, Juan Fern\u00e1ndez-Ram\u00edrez, Hung-Hsuan Chen, Zhaohui Wu, and Lee Giles. 2014. CiteSeerX: A Scholarly Big Dataset. Springer International Publishing, Cham, 311--322."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509965"},{"volume-title":"Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection","author":"Christen Peter","key":"e_1_3_2_1_3_1","unstructured":"Peter Christen. 2012. Data matching: concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer Science & Business Media."},{"volume-title":"Proceedings of the 3rd ACM\/IEEE-CS Joint Conference on Digital Libraries (JCDL '03)","author":"Han Hui","key":"e_1_3_2_1_4_1","unstructured":"Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. 2003. 
Automatic document metadata extraction using support vector machines. In Proceedings of the 3rd ACM\/IEEE-CS Joint Conference on Digital Libraries (JCDL '03). 37--48."},{"key":"e_1_3_2_1_5_1","volume-title":"Data Quality and Record Linkage Techniques","author":"Herzog Thomas N.","year":"2007","unstructured":"Thomas N. Herzog, Fritz J. Scheuren, and William E. Winkler. 2007. Data Quality and Record Linkage Techniques (1st ed.). Springer Publishing Company, Incorporated."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2910896.2925465"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2467696.2467753"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1812799.1812875"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/2740769.2740783"},{"key":"e_1_3_2_1_10_1","volume-title":"Computer Vision - ECCV 2016 - 14th European Conference","author":"Siegel Noah","year":"2016","unstructured":"Noah Siegel, Zachary Horvitz, Roie Levin, Santosh Kumar Divvala, and Ali Farhadi. 2016. FigureSeer: Parsing Result-Figures in Research Papers. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VII. 
664--680."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigMM.2017.63"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1555400.1555408"},{"key":"e_1_3_2_1_13_1","volume-title":"Web Information Systems Engineering - WISE 2016 - 17th International Conference","author":"Wang Yan","year":"2016","unstructured":"Yan Wang, Hao Zhang, Yaxin Li, Deyun Wang, YanLin Ma, Tong Zhou, and Jianguo Lu. 2016. A Data Cleaning Method for CiteSeer Dataset. In Web Information Systems Engineering - WISE 2016 - 17th International Conference, Shanghai, China, November 8-10, 2016, Proceedings, Part I. 35--49."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2494266.2494312"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815833.2815834"},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence","author":"Wu Jian","year":"2014","unstructured":"Jian Wu, Kyle Williams, Hung-Hsuan Chen, Madian Khabsa, Cornelia Caragea, Alexander Ororbia, Douglas Jordan, and C. Lee Giles. 2014. CiteSeerX: AI in a Digital Library Search Engine. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27-31, 2014, Qu\u00e9bec City, Qu\u00e9bec, Canada. 
2930--2937."},{"key":"e_1_3_2_1_17_1","volume-title":"10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2014","author":"Wu Jian","year":"2014","unstructured":"Jian Wu, Kyle Williams, Madian Khabsa, and C. Lee Giles. 2014. The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective. In 10th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2014, Miami, Florida, USA, October 22-25, 2014. 171--176."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016","author":"Xiao Shuai","year":"2016","unstructured":"Shuai Xiao, Junchi Yan, Changsheng Li, Bo Jin, Xiangfeng Wang, Xiaokang Yang, Stephen M. Chu, and Hongyuan Zha. 2016. On Modeling and Predicting Individual Paper Citation Count over Time. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016. 
2676--2682."}],"event":{"name":"K-CAP 2017: Knowledge Capture Conference","sponsor":["SIGAI ACM Special Interest Group on Artificial Intelligence"],"location":"Austin TX USA","acronym":"K-CAP 2017"},"container-title":["Proceedings of the Knowledge Capture Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3148011.3154470","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3148011.3154470","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:33Z","timestamp":1750213593000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3148011.3154470"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,4]]},"references-count":18,"alternative-id":["10.1145\/3148011.3154470","10.1145\/3148011"],"URL":"https:\/\/doi.org\/10.1145\/3148011.3154470","relation":{},"subject":[],"published":{"date-parts":[[2017,12,4]]},"assertion":[{"value":"2017-12-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}