{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T22:23:18Z","timestamp":1761862998280},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2011,5,18]],"date-time":"2011-05-18T00:00:00Z","timestamp":1305676800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Comput Sci Res Dev"],"published-print":{"date-parts":[[2012,2]]},"DOI":"10.1007\/s00450-011-0177-x","type":"journal-article","created":{"date-parts":[[2011,5,17]],"date-time":"2011-05-17T15:38:01Z","timestamp":1305646681000},"page":"45-63","source":"Crossref","is-referenced-by-count":56,"title":["Multi-pass sorted neighborhood blocking with MapReduce"],"prefix":"10.1007","volume":"27","author":[{"given":"Lars","family":"Kolb","sequence":"first","affiliation":[]},{"given":"Andreas","family":"Thor","sequence":"additional","affiliation":[]},{"given":"Erhard","family":"Rahm","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2011,5,18]]},"reference":[{"key":"177_CR1","doi-asserted-by":"crossref","unstructured":"Armbrust M, Fox A, Griffith R, Joseph AD, Katz RH, Konwinski A, Lee G, Patterson DA, Rabkin A, Stoica I, Zaharia M (2009) Above the clouds: A berkeley view of cloud computing. Tech rep, EECS Department. University of California, Berkeley","DOI":"10.1145\/1721654.1721672"},{"key":"177_CR2","volume-title":"Data quality: concepts, methodologies and techniques. Data-centric systems and applications","author":"C Batini","year":"2006","unstructured":"Batini C, Scannapieco M (2006) Data quality: concepts, methodologies and techniques. Data-centric systems and applications. Springer, Berlin"},{"key":"177_CR3","first-page":"25","volume-title":"ACM SIGKDD","author":"R Baxter","year":"2003","unstructured":"Baxter R, Christen P, Churches T (2003) A comparison of fast blocking methods for record linkage. In: ACM SIGKDD, vol 3, pp 25\u201327"},{"key":"177_CR4","first-page":"39","volume-title":"KDD","author":"M Bilenko","year":"2003","unstructured":"Bilenko M, Mooney RJ (2003) Adaptive duplicate detection using learnable string similarity measures. In: KDD, pp 39\u201348"},{"key":"177_CR5","unstructured":"Borthakur D (2007) The hadoop distributed file system: Architecture and design. Hadoop Project Website"},{"key":"177_CR6","doi-asserted-by":"crossref","first-page":"1065","DOI":"10.1145\/1401890.1402020","volume-title":"KDD","author":"P Christen","year":"2008","unstructured":"Christen P (2008) Febrl -: an open source data cleaning, deduplication and record linkage system with a graphical user interface. In: KDD, pp 1065\u20131068"},{"key":"177_CR7","first-page":"638","volume-title":"PAKDD","author":"P Christen","year":"2004","unstructured":"Christen P, Churches T, Hegland M (2004) Febrl\u2014a parallel open source data linkage system. In: PAKDD, pp 638\u2013647"},{"key":"177_CR8","first-page":"137","volume-title":"OSDI","author":"J Dean","year":"2004","unstructured":"Dean J, Ghemawat S (2004) MapReduce: Simplified data processing on large clusters. In: OSDI, pp 137\u2013150"},{"issue":"1","key":"177_CR9","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","volume":"51","author":"J Dean","year":"2008","unstructured":"Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107\u2013113","journal-title":"Commun ACM"},{"issue":"6","key":"177_CR10","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1145\/129888.129894","volume":"35","author":"D DeWitt","year":"1992","unstructured":"DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database systems. Commun ACM 35(6):85\u201398","journal-title":"Commun ACM"},{"key":"177_CR11","first-page":"27","volume-title":"VLDB","author":"DJ DeWitt","year":"1992","unstructured":"DeWitt DJ, Naughton JF, Schneider DA, Seshadri S (1992) Practical skew handling in parallel joins. In: VLDB, pp 27\u201340"},{"issue":"1","key":"177_CR12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/TKDE.2007.250581","volume":"19","author":"AK Elmagarmid","year":"2007","unstructured":"Elmagarmid AK, Ipeirotis PG, Verykios VS (2007) Duplicate record detection: a survey. IEEE Trans Knowl Data Eng 19(1):1\u201316","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"177_CR13","unstructured":"Foundation AS (2006) Hadoop. http:\/\/hadoop.apache.org\/mapreduce\/"},{"key":"177_CR14","first-page":"127","volume-title":"SIGMOD Conference","author":"MA Hern\u00e1ndez","year":"1995","unstructured":"Hern\u00e1ndez MA, Stolfo SJ (1995) The merge\/purge problem for large databases. In: SIGMOD Conference, pp 127\u2013138"},{"key":"177_CR15","first-page":"283","volume-title":"CIKM","author":"HS Kim","year":"2007","unstructured":"Kim HS, Lee D (2007) Parallel linkage. In: CIKM, pp 283\u2013292"},{"key":"177_CR16","volume-title":"8th International Workshop on Quality in Databases","author":"T Kirsten","year":"2010","unstructured":"Kirsten T, Kolb L, Hartung M, Gross A, K\u00f6pcke H, Rahm E (2010) Data partitioning for parallel entity matching. In: 8th International Workshop on Quality in Databases"},{"key":"177_CR17","first-page":"45","volume-title":"BTW","author":"L Kolb","year":"2011","unstructured":"Kolb L, Thor A, Rahm E (2011) Parallel sorted neighborhood blocking with mapreduce. In: BTW, pp 45\u201364"},{"issue":"2","key":"177_CR18","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1016\/j.datak.2009.10.003","volume":"69","author":"H K\u00f6pcke","year":"2010","unstructured":"K\u00f6pcke H, Rahm E (2010) Frameworks for entity matching: a comparison. Data Knowl Eng 69(2):197\u2013210","journal-title":"Data Knowl Eng"},{"key":"177_CR19","volume-title":"VLDB","author":"H K\u00f6pcke","year":"2010","unstructured":"K\u00f6pcke H, Thor A, Rahm E (2010) Evaluation of entity resolution approaches on real-world match problems. In: VLDB, pp\u00a0484\u2013493"},{"key":"177_CR20","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1109\/MIC.2010.58","volume":"14","author":"H K\u00f6pcke","year":"2010","unstructured":"K\u00f6pcke H, Thor A, Rahm E (2010) Learning-based approaches for matching web data entities. IEEE Internet Comput 14:23\u201331","journal-title":"IEEE Internet Comput"},{"issue":"1","key":"177_CR21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.2200\/S00274ED1V01Y201006HLT007","volume":"3","author":"J Lin","year":"2010","unstructured":"Lin J, Dyer C (2010) Data-intensive text processing with mapreduce. Synth Lect Hum Lang Technol 3(1):1\u2013177","journal-title":"Synth Lect Hum Lang Technol"},{"issue":"4","key":"177_CR22","first-page":"3","volume":"23","author":"E Rahm","year":"2000","unstructured":"Rahm E, Do HH (2000) Data cleaning: problems and current approaches. IEEE Data Eng Bull 23(4):3\u201313","journal-title":"IEEE Data Eng Bull"},{"key":"177_CR23","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1145\/1807167.1807222","volume-title":"SIGMOD Conference","author":"R Vernica","year":"2010","unstructured":"Vernica R, Carey MJ, Li C (2010) Efficient parallel set-similarity joins using mapreduce. In: SIGMOD Conference, pp 495\u2013506"}],"container-title":["Computer Science - Research and Development"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s00450-011-0177-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s00450-011-0177-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s00450-011-0177-x","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,6,10]],"date-time":"2019-06-10T20:58:12Z","timestamp":1560200292000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s00450-011-0177-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5,18]]},"references-count":23,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,2]]}},"alternative-id":["177"],"URL":"https:\/\/doi.org\/10.1007\/s00450-011-0177-x","relation":{},"ISSN":["1865-2034","1865-2042"],"issn-type":[{"value":"1865-2034","type":"print"},{"value":"1865-2042","type":"electronic"}],"subject":[],"published":{"date-parts":[[2011,5,18]]}}}