{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,20]],"date-time":"2025-02-20T05:18:22Z","timestamp":1740028702284,"version":"3.37.3"},"reference-count":0,"publisher":"IOS Press","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018]]},"abstract":"<jats:p>Duplicate detection aims to identify different records in data sources that refer to the same real-world entity. It is a fundamental task for item catalog fusion, customer database integration, fraud detection, and more. In this work we present BigDedup, a toolkit that efficiently detects duplicate records in Big Data sources. BigDedup makes state-of-the-art duplicate detection techniques available on Apache Spark, a modern framework for distributed computing in Big Data scenarios. It can be used in two ways: (i) through a simple graphical interface that lets the user process structured and unstructured data quickly and effectively; (ii) as a library that provides components that can be easily extended and customized. 
In the paper we show how to use BigDedup and its usefulness through some industrial examples.<\/jats:p>","DOI":"10.3233\/978-1-61499-898-3-1015","type":"book-chapter","created":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T17:19:26Z","timestamp":1739985566000},"source":"Crossref","is-referenced-by-count":0,"title":["BigDedup: A Big Data Integration Toolkit for Duplicate Detection in Industrial Scenarios"],"prefix":"10.3233","author":[{"family":"Gagliardelli Luca","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Zhu Song","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Simonini Giovanni","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"family":"Bergamaschi Sonia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"7437","container-title":["Advances in Transdisciplinary Engineering","Transdisciplinary Engineering Methods for Social Innovation of Industry 4.0"],"original-title":[],"deposited":{"date-parts":[[2025,2,19]],"date-time":"2025-02-19T17:43:35Z","timestamp":1739987015000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.medra.org\/servlet\/aliasResolver?alias=iospressISBN&isbn=978-1-61499-897-6&spage=1015&doi=10.3233\/978-1-61499-898-3-1015"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018]]},"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/978-1-61499-898-3-1015","relation":{},"ISSN":["2352-751X"],"issn-type":[{"value":"2352-751X","type":"print"}],"subject":[],"published":{"date-parts":[[2018]]}}}