{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T16:19:26Z","timestamp":1772554766531,"version":"3.50.1"},"reference-count":13,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2020,7,22]],"date-time":"2020-07-22T00:00:00Z","timestamp":1595376000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2020,7,22]]},"abstract":"<jats:p>Entity matching (EM) finds data instances that refer to the same real-world entity. In 2015, we started the Magellan project at UW-Madison, jointly with industrial partners, to build EM systems. Most current EM systems are stand-alone monoliths. In contrast, Magellan borrows ideas from the field of data science (DS), to build a new kind of EM systems, which is ecosystems of interoperable tools for multiple execution environments, such as on-premise, cloud, and mobile. This paper describes Magellan, focusing on the system aspects. We argue why EM can be viewed as a special class of DS problems and thus can benefit from system building ideas in DS. We discuss how these ideas have been adapted to build &lt;code&gt;PyMatcher&lt;\/code&gt; and &lt;code&gt;CloudMatcher&lt;\/code&gt;, sophisticated on-premise tools for power users and self-service cloud tools for lay users. These tools exploit techniques from the fields of machine learning, big data scaling, efficient user interaction, databases, and cloud systems. They have been successfully used in 13 companies and domain science groups, have been pushed into production for many customers, and are being commercialized. 
We discuss the lessons learned and explore applying the Magellan template to other tasks in data exploration, cleaning, and integration.<\/jats:p>","DOI":"10.1145\/3405476","type":"journal-article","created":{"date-parts":[[2020,7,22]],"date-time":"2020-07-22T22:14:24Z","timestamp":1595456064000},"page":"83-91","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":32,"title":["Magellan"],"prefix":"10.1145","volume":"63","author":[{"given":"AnHai","family":"Doan","sequence":"first","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Pradap","family":"Konda","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Paul","family":"Suganthan G. C.","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Yash","family":"Govind","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Derek","family":"Paulsen","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Kaushik","family":"Chandrasekhar","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Philip","family":"Martinkus","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]},{"given":"Matthew","family":"Christie","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison"}]}],"member":"320","published-online":{"date-parts":[[2020,7,22]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Workshop on Human-In-the-Loop Data Analytics http:\/\/hilda.io\/.  
"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/2344108"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035960"},{"key":"e_1_2_1_4_1","first-page":"2","article-title":"Toward a system building agenda for Data Integration (and Data Science)","volume":"41","author":"Doan A.","year":"2018","unstructured":"Doan, A., et al. Toward a system building agenda for Data Integration (and Data Science). IEEE Data Eng. Bull. 41, 2 (2018), 35--46.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_5_1","volume-title":"Morgan Kaufmann","author":"Doan A.","year":"2012","unstructured":"Doan, A., Halevy, A.Y., Ives, Z.G. Principles of Data Integration. Morgan Kaufmann, 2012."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.250581"},{"key":"e_1_2_1_7_1","volume-title":"BIGDAS","author":"Govind Y.","year":"2017","unstructured":"Govind, Y., et al. Cloudmatcher: A cloud\/crowd service for entity matching. In BIGDAS (2017)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314042"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994535"},{"key":"e_1_2_1_10_1","volume-title":"EDBT","author":"Konda P.","year":"2019","unstructured":"Konda, P., et al. Performing entity matching end to end: A case study. 
In EDBT (2019)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196926"},{"key":"e_1_2_1_12_1","first-page":"12","article-title":"The return of JedAI: End-to-End entity resolution for structured and semi-structured data","volume":"11","author":"Papadakis G.","year":"2018","unstructured":"Papadakis, G., et al. The return of JedAI: End-to-End entity resolution for structured and semi-structured data. PVLDB 11, 12 (2018), 1950--1953.","journal-title":"PVLDB"},{"key":"e_1_2_1_13_1","volume-title":"End-to-End Entity Resolution. In The Web Conference (WWW)","author":"Papadakis G.","year":"2018","unstructured":"Papadakis, G., et al. Web-scale, Schema-Agnostic, End-to-End Entity Resolution. 
In The Web Conference (WWW), (Lyon, France, April), 2018."}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3405476","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3405476","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:32:09Z","timestamp":1750195929000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3405476"}},"subtitle":["toward building ecosystems of entity matching solutions"],"short-title":[],"issued":{"date-parts":[[2020,7,22]]},"references-count":13,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2020,7,22]]}},"alternative-id":["10.1145\/3405476"],"URL":"https:\/\/doi.org\/10.1145\/3405476","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"value":"0001-0782","type":"print"},{"value":"1557-7317","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7,22]]},"assertion":[{"value":"2020-07-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}