{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T05:51:51Z","timestamp":1767678711057,"version":"3.48.0"},"reference-count":47,"publisher":"SAGE Publications","issue":"6","license":[{"start":{"date-parts":[[2022,9,16]],"date-time":"2022-09-16T00:00:00Z","timestamp":1663286400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Information Science"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:p>Data deduplication is the process of discovering multiple representations of the same entity in an information system. Blocking has been a benchmark technique for avoiding pair-wise record comparisons in data deduplication. Standard blocking (SB) aims to place potential duplicate records in the same block on the basis of a blocking key. Afterwards, detailed comparisons are made only among the records residing in the same block. The selection of a blocking key is a tedious process that involves exponentially many alternatives. The outcome of SB varies considerably with a change in the blocking key. To this end, we propose a robust blocking technique called Locality Sensitive Blocking (LSB) that does not require the selection of a blocking key. The experimental results show an increase of up to 0.448 in <jats:italic toggle=\"yes\">F<\/jats:italic>-score as compared with SB. 
Furthermore, LSB is found to be more robust to blocking parameters and data noise.<\/jats:p>","DOI":"10.1177\/01655515221121963","type":"journal-article","created":{"date-parts":[[2022,9,16]],"date-time":"2022-09-16T06:43:57Z","timestamp":1663310637000},"page":"1400-1413","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["Locality sensitive blocking (LSB): A robust blocking technique for data deduplication"],"prefix":"10.1177","volume":"50","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6251-3261","authenticated-orcid":false,"given":"Asif","family":"Sohail","sequence":"first","affiliation":[{"name":"Department of Information Technology, Faculty of Computing and Information Technology, University of the Punjab, Lahore, Pakistan"}]},{"given":"Waqar ul","family":"Qounain","sequence":"additional","affiliation":[{"name":"National Center of Artificial Intelligence, University of the Punjab, Lahore, Pakistan; Department of Information Technology, Faculty of Computing and Information Technology, University of the Punjab, Lahore, Pakistan"}]}],"member":"179","published-online":{"date-parts":[[2022,9,16]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1177\/14604582221077055"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1177\/2053951717745678"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551516650410"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1080\/08874417.2021.1965052"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.250581"},{"key":"e_1_3_3_7_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-031-01835-0","volume-title":"An introduction to duplicate detection","author":"Naumann F","year":"2010","unstructured":"Naumann F, Herschel M. An introduction to duplicate detection (Synthesis lectures on data management), vol. 
2, No. 1. San Rafael, CA: Morgan & Claypool Publishers, 2010, pp. 1\u201387."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515577912"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551512459923"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551518789874"},{"issue":"2","key":"e_1_3_3_11_2","first-page":"20","article-title":"Evaluating integration approaches adopted by healthcare organizations","volume":"47","author":"Khoumbati K","year":"2006","unstructured":"Khoumbati K, Themistocleous M. Evaluating integration approaches adopted by healthcare organizations. J Comput Inform Syst 2006; 47(2): 20\u201327.","journal-title":"J Comput Inform Syst"},{"issue":"4","key":"e_1_3_3_12_2","first-page":"3","article-title":"Data cleaning: problems and current approaches","volume":"23","author":"Rahm E","year":"2000","unstructured":"Rahm E, Do HH. Data cleaning: problems and current approaches. IEEE Data Eng Bull 2000; 23(4): 3\u201313.","journal-title":"IEEE Data Eng Bull"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/4037.001.0001"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/0-387-69505-2_2"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1177\/01655515211013693"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-12-381972-7.00002-6"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-31164-2"},{"issue":"2","key":"e_1_3_3_18_2","first-page":"681","article-title":"A solution for data inconsistency in data integration","volume":"27","author":"Wang X","year":"2011","unstructured":"Wang X, Huang L, Xu X et al. A solution for data inconsistency in data integration. 
J Inf Sci Eng 2011; 27(2): 681\u2013695.","journal-title":"J Inf Sci Eng"},{"issue":"5","key":"e_1_3_3_19_2","first-page":"1505","article-title":"An automatic domain independent schema matching in integrating schemas of heterogeneous relational databases","volume":"30","author":"Ibrahim H","year":"2014","unstructured":"Ibrahim H, Karasneh Y, Mirabi M et al. An automatic domain independent schema matching in integrating schemas of heterogeneous relational databases. J Inf Sci Eng 2014; 30(5): 1505\u20131536.","journal-title":"J Inf Sci Eng"},{"key":"e_1_3_3_20_2","volume-title":"Handbook of record linkage: methods for health and statistical studies, administration, and business","author":"Newcombe HB","year":"1988","unstructured":"Newcombe HB. Handbook of record linkage: methods for health and statistical studies, administration, and business. Oxford: Oxford University Press, 1988."},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515590097"},{"key":"e_1_3_3_22_2","doi-asserted-by":"crossref","unstructured":"Han J, Kamber M, Pei J. Data preprocessing. In: Han J, Kamber M, Pei J (eds) Data mining: concepts and techniques (The Morgan Kaufmann series in data management systems). 3rd ed. Waltham, MA: Morgan Kaufmann, 2011, pp. 83\u2013124.","DOI":"10.1016\/B978-0-12-381479-1.00003-4"},{"key":"e_1_3_3_23_2","first-page":"362","volume-title":"Proceedings of the international conference on advanced information technology, services and systems","author":"Alaoui SS","unstructured":"Alaoui SS, Farhaoui Y, Aksasse B. A comparative study of the four well-known classification algorithms in data mining. In: Proceedings of the international conference on advanced information technology, services and systems, Tangier, 14\u201315 April 2017, pp. 
362\u2013373."},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2009.10.003"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10729-014-9276-0"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/568271.223807"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.127"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.14778\/2947618.2947624"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1080\/2330443X.2017.1389620"},{"key":"e_1_3_3_30_2","first-page":"219","volume-title":"Proceedings of the 2009 ACM SIGMOD international conference on management of data","author":"Whang SE","unstructured":"Whang SE, Menestrina D, Koutrika G et al. Entity resolution with iterative blocking. In: Proceedings of the 2009 ACM SIGMOD international conference on management of data, Providence, RI, 29 June\u20132 July 2009, pp. 219\u2013232. New York: ACM."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-021-00656-7"},{"key":"e_1_3_3_32_2","first-page":"210","volume-title":"Proceedings of the international conference on data warehousing and knowledge discovery","author":"Lehti P","unstructured":"Lehti P, Fankhauser P. A precise blocking method for record linkage. In: Proceedings of the international conference on data warehousing and knowledge discovery, Copenhagen, 22\u201326 August 2005, pp. 210\u2013220. Berlin; Heidelberg: Springer-Verlag."},{"key":"e_1_3_3_33_2","first-page":"185","volume-title":"Proceedings of the 7th ACM\/IEEE-CS joint conference on digital libraries","author":"Yan S","unstructured":"Yan S, Lee D, Kan M-Y et al. Adaptive sorted neighborhood methods for efficient record linkage. In: Proceedings of the 7th ACM\/IEEE-CS joint conference on digital libraries, Vancouver, BC, Canada, 18\u201323 June 2007, pp. 185\u2013194. 
New York: ACM."},{"key":"e_1_3_3_34_2","first-page":"1073","volume-title":"Proceedings of the 2012 IEEE 28th international conference on data engineering","author":"Draisbach U","unstructured":"Draisbach U, Naumann F, Szott S et al. Adaptive windows for duplicate detection. In: Proceedings of the 2012 IEEE 28th international conference on data engineering, Arlington, VA, 1\u20135 April 2012, pp. 1073\u20131083. New York: IEEE."},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1080\/14639230400005974"},{"key":"e_1_3_3_36_2","first-page":"78","volume-title":"Proceedings of the 2012 IEEE international conference on service operations and logistics, and informatics","author":"Prasad KH","unstructured":"Prasad KH, Chaturvedi S, Faruquie TA et al. Automated selection of blocking columns for record linkage. In: Proceedings of the 2012 IEEE international conference on service operations and logistics, and informatics, Suzhou, China, 8\u201310 July 2012, pp. 78\u201383. New York: IEEE."},{"volume-title":"Proceedings of the 12th Alberto Mendelzon international workshop on foundations of data management","author":"Souza L","key":"e_1_3_3_37_2","unstructured":"Souza L, Murai F, Da Silva APC et al. Automatic identification of best attributes for indexing in data deduplication. In: Proceedings of the 12th Alberto Mendelzon international workshop on foundations of data management, Cali, 21\u201325 May 2018."},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2012.11.008"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.13"},{"key":"e_1_3_3_40_2","first-page":"340","volume-title":"Proceedings of the 2013 IEEE 13th international conference on data mining","author":"Kejriwal M","unstructured":"Kejriwal M, Miranker DP. An unsupervised algorithm for learning blocking schemes. In: Proceedings of the 2013 IEEE 13th international conference on data mining, Dallas, TX, 7\u201310 December 2013, pp. 340\u2013349. 
New York: IEEE."},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18032-8_45"},{"key":"e_1_3_3_42_2","first-page":"440","volume-title":"Proceedings of the 21st national conference on artificial intelligence","volume":"1","author":"Michelson M","unstructured":"Michelson M, Knoblock CA. Learning blocking schemes for record linkage. In: Proceedings of the 21st national conference on artificial intelligence, Boston, MA, 16\u201320 July 2006, vol. 1, pp. 440\u2013445. Palo Alto, CA: AAAI Press."},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.datak.2019.06.005"},{"key":"e_1_3_3_44_2","volume-title":"Mining of massive datasets","author":"Leskovec J","year":"2011","unstructured":"Leskovec J, Rajaraman A, Ullman JD. Mining of massive datasets. Cambridge: Cambridge University Press, 2011."},{"key":"e_1_3_3_45_2","unstructured":"datasketch: Big Data Looks Small \u2013 datasketch 1.0.0 documentation http:\/\/ekzhu.com\/datasketch\/index.html"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-016-0280-9"},{"key":"e_1_3_3_47_2","first-page":"18","volume-title":"Proceedings of the 2011 international conference on data and knowledge engineering (ICDKE)","author":"Draisbach U","unstructured":"Draisbach U, Naumann F. A generalization of blocking and windowing algorithms for duplicate detection. In: Proceedings of the 2011 international conference on data and knowledge engineering (ICDKE), Milan, 6 September 2011, pp. 18\u201324. New York: IEEE."},{"key":"e_1_3_3_48_2","first-page":"1065","volume-title":"Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining","author":"Christen P","unstructured":"Christen P. Febrl \u2013 an open source data cleaning, deduplication and record linkage system with a graphical user interface. 
In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, NV, 24\u201327 August 2008, pp. 1065\u20131068. New York: ACM."}],"container-title":["Journal of Information Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221121963","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01655515221121963","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01655515221121963","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,6]],"date-time":"2026-01-06T05:47:01Z","timestamp":1767678421000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01655515221121963"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,16]]},"references-count":47,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["10.1177\/01655515221121963"],"URL":"https:\/\/doi.org\/10.1177\/01655515221121963","relation":{},"ISSN":["0165-5515","1741-6485"],"issn-type":[{"type":"print","value":"0165-5515"},{"type":"electronic","value":"1741-6485"}],"subject":[],"published":{"date-parts":[[2022,9,16]]}}}