{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:13:44Z","timestamp":1750220024718,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":17,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100011755","name":"National Center for Research and Development","doi-asserted-by":"publisher","award":["POIR.01.01.01-00-0287\/19"],"award-info":[{"award-number":["POIR.01.01.01-00-0287\/19"]}],"id":[{"id":"10.13039\/501100011755","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Polish Ministry of Education and Science","award":["Applied Doctorate grant no. DWD\/4\/24\/2020"],"award-info":[{"award-number":["Applied Doctorate grant no. DWD\/4\/24\/2020"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,3,27]]},"DOI":"10.1145\/3555776.3578724","type":"proceedings-article","created":{"date-parts":[[2023,6,7]],"date-time":"2023-06-07T17:16:29Z","timestamp":1686158189000},"page":"297-300","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["On evaluating text similarity measures for customer data deduplication"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4914-9394","authenticated-orcid":false,"given":"Pawel","family":"Boinski","sequence":"first","affiliation":[{"name":"Poznan University of Technology, Poznan, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1665-4928","authenticated-orcid":false,"given":"Mariusz","family":"Sienkiewicz","sequence":"additional","affiliation":[{"name":"Poznan University of Technology, Poznan, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6037-5718","authenticated-orcid":false,"given":"Robert","family":"Wrembel","sequence":"additional","affiliation":[{"name":"Poznan University of Technology, Poznan, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6426-3809","authenticated-orcid":false,"given":"Bartosz","family":"Bebel","sequence":"additional","affiliation":[{"name":"Poznan University of Technology, Poznan, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9486-929X","authenticated-orcid":false,"given":"Witold","family":"Andrzejewski","sequence":"additional","affiliation":[{"name":"Poznan University of Technology, Poznan, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,6,7]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Int. Joint Conf. on Neural Networks (IJCNN). IEEE","author":"Alamuri Madhavi","year":"2014","unstructured":"Madhavi Alamuri , Bapi Raju Surampudi , and Atul Negi . 2014 . A survey of distance\/similarity measures for categorical data . In Int. Joint Conf. on Neural Networks (IJCNN). IEEE , 1907--1914. Madhavi Alamuri, Bapi Raju Surampudi, and Atul Negi. 2014. A survey of distance\/similarity measures for categorical data. In Int. Joint Conf. on Neural Networks (IJCNN). IEEE, 1907--1914."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1007\/s00778-008-0098-x","article-title":"Swoosh: a generic approach to entity resolution","volume":"18","author":"Benjelloun Omar","year":"2009","unstructured":"Omar Benjelloun , Hector Garcia-Molina , David Menestrina , Qi Su , Steven Euijong Whang , and Jennifer Widom . 2009 . Swoosh: a generic approach to entity resolution . VLDB Journal 18 , 1 (2009), 255 -- 276 . Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang, and Jennifer Widom. 2009. Swoosh: a generic approach to entity resolution. VLDB Journal 18, 1 (2009), 255--276.","journal-title":"VLDB Journal"},{"key":"e_1_3_2_1_3_1","volume-title":"Proc. of the Workshops of the EDBT\/ICDT 2022 Joint Conference (CEUR Workshop Proceedings)","volume":"3135","author":"Boi\u0144ski Pawe\u0142","year":"2022","unstructured":"Pawe\u0142 Boi\u0144ski , Mariusz Sienkiewicz , Bartosz B\u0119bel , Robert Wrembel , Dariusz Ga\u0142\u0119zowski , and Waldemar Graniszewski . 2022 . On Customer Data Deduplication: Lessons Learned from a R&D Project in the Financial Sector . In Proc. of the Workshops of the EDBT\/ICDT 2022 Joint Conference (CEUR Workshop Proceedings) , Vol. 3135 . CEUR-WS.org. Pawe\u0142 Boi\u0144ski, Mariusz Sienkiewicz, Bartosz B\u0119bel, Robert Wrembel, Dariusz Ga\u0142\u0119zowski, and Waldemar Graniszewski. 2022. On Customer Data Deduplication: Lessons Learned from a R&D Project in the Financial Sector. In Proc. of the Workshops of the EDBT\/ICDT 2022 Joint Conference (CEUR Workshop Proceedings), Vol. 3135. CEUR-WS.org."},{"key":"e_1_3_2_1_4_1","volume-title":"Similarity Measures for Categorical Data: A Comparative Evaluation. In SIAM Int. Conf. on Data Mining (SDM). SIAM, 243--254","author":"Boriah Shyam","year":"2008","unstructured":"Shyam Boriah , Varun Chandola , and Vipin Kumar . 2008 . Similarity Measures for Categorical Data: A Comparative Evaluation. In SIAM Int. Conf. on Data Mining (SDM). SIAM, 243--254 . Shyam Boriah, Varun Chandola, and Vipin Kumar. 2008. Similarity Measures for Categorical Data: A Comparative Evaluation. In SIAM Int. Conf. on Data Mining (SDM). SIAM, 243--254."},{"key":"e_1_3_2_1_5_1","volume-title":"Int. Conf. on Data Mining (ICDM). IEEE Computer Society, 290--294","author":"Christen Peter","year":"2006","unstructured":"Peter Christen . 2006 . A Comparison of Personal Name Matching: Techniques and Practical Issues . In Int. Conf. on Data Mining (ICDM). IEEE Computer Society, 290--294 . Peter Christen. 2006. A Comparison of Personal Name Matching: Techniques and Practical Issues. In Int. Conf. on Data Mining (ICDM). IEEE Computer Society, 290--294."},{"volume-title":"Entity Resolution, and Duplicate Detection","author":"Christen Peter","key":"e_1_3_2_1_6_1","unstructured":"Peter Christen . 2012. Data Matching - Concepts and Techniques for Record Linkage , Entity Resolution, and Duplicate Detection . Springer . Peter Christen. 2012. Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer."},{"key":"e_1_3_2_1_7_1","volume-title":"An Overview of End-to-End Entity Resolution for Big Data. Comput. Surveys 53, 6","author":"Christophides Vassilis","year":"2021","unstructured":"Vassilis Christophides , Vasilis Efthymiou , Themis Palpanas , George Papadakis , and Kostas Stefanidis . 2021. An Overview of End-to-End Entity Resolution for Big Data. Comput. Surveys 53, 6 ( 2021 ), 127:1--127:42. Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, and Kostas Stefanidis. 2021. An Overview of End-to-End Entity Resolution for Big Data. Comput. Surveys 53, 6 (2021), 127:1--127:42."},{"key":"e_1_3_2_1_8_1","unstructured":"Adrian Colyer. 2020. The morning paper on An overview of end-to-end entity resolution for big data. https:\/\/blog.acolyer.org\/2020\/12\/14\/entity-resolution\/.  Adrian Colyer. 2020. The morning paper on An overview of end-to-end entity resolution for big data. https:\/\/blog.acolyer.org\/2020\/12\/14\/entity-resolution\/."},{"key":"e_1_3_2_1_9_1","volume-title":"Int. Conf. on Advances in Databases, Knowledge, and Data Applications (DBKDA). 63--69","author":"del Pilar Angeles Mar\u00eda","year":"2015","unstructured":"Mar\u00eda del Pilar Angeles and Adrian Espino-Gamez . 2015 . Comparison of Methods Hamming Distance, Jaro, and Monge-Elkan . In Int. Conf. on Advances in Databases, Knowledge, and Data Applications (DBKDA). 63--69 . Mar\u00eda del Pilar Angeles and Adrian Espino-Gamez. 2015. Comparison of Methods Hamming Distance, Jaro, and Monge-Elkan. In Int. Conf. on Advances in Databases, Knowledge, and Data Applications (DBKDA). 63--69."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TKDE.2007.250581","article-title":"Duplicate Record Detection: A Survey","volume":"19","author":"Elmagarmid Ahmed K.","year":"2007","unstructured":"Ahmed K. Elmagarmid , Panagiotis G. Ipeirotis , and Vassilios S. Verykios . 2007 . Duplicate Record Detection: A Survey . IEEE Transactions on Knowledge and Data Engineering 19 , 1 (2007), 1 -- 16 . Ahmed K. Elmagarmid, Panagiotis G. Ipeirotis, and Vassilios S. Verykios. 2007. Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering 19, 1 (2007),1--16.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_3_2_1_11_1","volume-title":"Generalized Mongue-Elkan Method for Approximate Text String Comparison. In Int. Conf. on Computational Linguistics and Intelligent Text Processing (CICLing) (LNCS), Alexander F. Gelbukh (Ed.)","volume":"5449","author":"Jim\u00e9nez Sergio","unstructured":"Sergio Jim\u00e9nez , Claudia Jeanneth Becerra , Alexander F. Gelbukh , and Fabio A. Gonz\u00e1lez . 2009 . Generalized Mongue-Elkan Method for Approximate Text String Comparison. In Int. Conf. on Computational Linguistics and Intelligent Text Processing (CICLing) (LNCS), Alexander F. Gelbukh (Ed.) , Vol. 5449 . Springer, 559--570. Sergio Jim\u00e9nez, Claudia Jeanneth Becerra, Alexander F. Gelbukh, and Fabio A. Gonz\u00e1lez. 2009. Generalized Mongue-Elkan Method for Approximate Text String Comparison. In Int. Conf. on Computational Linguistics and Intelligent Text Processing (CICLing) (LNCS), Alexander F. Gelbukh (Ed.), Vol. 5449. Springer, 559--570."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.datak.2009.10.003","article-title":"Frameworks for entity matching: A comparison","volume":"69","author":"K\u00f6pcke Hanna","year":"2010","unstructured":"Hanna K\u00f6pcke and Erhard Rahm . 2010 . Frameworks for entity matching: A comparison . Data & Knowledge Engineering 69 , 2 (2010), 197 -- 210 . Hanna K\u00f6pcke and Erhard Rahm. 2010. Frameworks for entity matching: A comparison. Data & Knowledge Engineering 69, 2 (2010), 197--210.","journal-title":"Data & Knowledge Engineering"},{"volume-title":"An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records. In Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD).","author":"Alvaro","key":"e_1_3_2_1_13_1","unstructured":"Alvaro E. Monge and Charles Elkan. 1997 . An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records. In Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD). Alvaro E. Monge and Charles Elkan. 1997. An Efficient Domain-Independent Algorithm for Detecting Approximately Duplicate Database Records. In Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD)."},{"volume-title":"Similarity measures","author":"Naumann Felix","key":"e_1_3_2_1_14_1","unstructured":"Felix Naumann . 2013. Similarity measures . Hasso Plattner Institut . Felix Naumann. 2013. Similarity measures. Hasso Plattner Institut."},{"key":"e_1_3_2_1_15_1","volume-title":"Blocking and Filtering Techniques for Entity Resolution: A Survey. Comput. Surveys 53, 2","author":"Papadakis George","year":"2020","unstructured":"George Papadakis , Dimitrios Skoutas , Emmanouil Thanos , and Themis Palpanas . 2020. Blocking and Filtering Techniques for Entity Resolution: A Survey. Comput. Surveys 53, 2 ( 2020 ), 31:1--31:42. George Papadakis, Dimitrios Skoutas, Emmanouil Thanos, and Themis Palpanas. 2020. Blocking and Filtering Techniques for Entity Resolution: A Survey. Comput. Surveys 53, 2 (2020), 31:1--31:42."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1145\/3385658.3385664","article-title":"Domain- and Structure-Agnostic End-to-End Entity Resolution with JedAI","volume":"48","author":"Papadakis George","year":"2019","unstructured":"George Papadakis , Leonidas Tsekouras , Emmanouil Thanos , George Giannakopoulos , Themis Palpanas , and Manolis Koubarakis . 2019 . Domain- and Structure-Agnostic End-to-End Entity Resolution with JedAI . SIGMOD Record 48 , 4 (2019), 30 -- 36 . George Papadakis, Leonidas Tsekouras, Emmanouil Thanos, George Giannakopoulos, Themis Palpanas, and Manolis Koubarakis. 2019. Domain- and Structure-Agnostic End-to-End Entity Resolution with JedAI. SIGMOD Record 48, 4 (2019), 30--36.","journal-title":"SIGMOD Record"},{"key":"e_1_3_2_1_17_1","unstructured":"Textdistance. [n. d.]. Python package: textdistance. https:\/\/pypi.org\/project\/textdistance\/.  Textdistance. [n. d.]. Python package: textdistance. https:\/\/pypi.org\/project\/textdistance\/."}],"event":{"name":"SAC '23: 38th ACM\/SIGAPP Symposium on Applied Computing","sponsor":["SIGAPP ACM Special Interest Group on Applied Computing"],"location":"Tallinn Estonia","acronym":"SAC '23"},"container-title":["Proceedings of the 38th ACM\/SIGAPP Symposium on Applied Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555776.3578724","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3555776.3578724","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:35Z","timestamp":1750182695000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3555776.3578724"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,27]]},"references-count":17,"alternative-id":["10.1145\/3555776.3578724","10.1145\/3555776"],"URL":"https:\/\/doi.org\/10.1145\/3555776.3578724","relation":{},"subject":[],"published":{"date-parts":[[2023,3,27]]},"assertion":[{"value":"2023-06-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}