{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T10:32:17Z","timestamp":1769596337133,"version":"3.49.0"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,3]]},"DOI":"10.1007\/s10618-021-00815-y","type":"journal-article","created":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T07:03:04Z","timestamp":1641279784000},"page":"537-565","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Efficient binary embedding of categorical data using BinSketch"],"prefix":"10.1007","volume":"36","author":[{"given":"Bhisham Dev","family":"Verma","sequence":"first","affiliation":[]},{"given":"Rameshwar","family":"Pratap","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1626-7514","authenticated-orcid":false,"given":"Debajyoti","family":"Bera","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,4]]},"reference":[{"key":"#cr-split#-815_CR1.1","doi-asserted-by":"crossref","unstructured":"Achlioptas D (2001) Database-friendly random projections. In: Buneman P","DOI":"10.1145\/375551.375608"},{"key":"#cr-split#-815_CR1.2","unstructured":"(ed) Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems, May 21-23, 2001, Santa Barbara, California, USA. ACM"},{"key":"815_CR2","first-page":"1111","volume":"15","author":"A Agarwal","year":"2014","unstructured":"Agarwal A, Chapelle O, Dud\u00edk M, Langford J (2014) A reliable effective terascale linear learning system. J Mach Learn Res 15:1111\u20131133","journal-title":"J Mach Learn Res"},{"key":"815_CR3","doi-asserted-by":"crossref","unstructured":"Agrawal R, Imielinski T, and Swami A (1993) Mining association rules between sets of items in large databases. In: SIGMOD \u201993: Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pp 207\u2013216, New York, NY, USA, 1993. ACM Press","DOI":"10.1145\/170035.170072"},{"key":"815_CR4","unstructured":"Arthur D, and Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA \u201907, pp 1027\u20131035, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics"},{"key":"815_CR5","doi-asserted-by":"crossref","unstructured":"Blasius J and Greenacre M (2006) Multiple correspondence analysis and related methods. In: Multiple correspondence analysis and related methods, 06 2006","DOI":"10.1201\/9781420011319"},{"key":"815_CR6","first-page":"2003","volume":"3","author":"M Blei David","year":"2003","unstructured":"Blei David M, Ng Andrew Y, Jordan Michael I, Lafferty J (2003) Latent Dirichlet allocation. J Mach Learn Res 3:2003","journal-title":"J Mach Learn Res"},{"key":"815_CR7","first-page":"298","volume":"23","author":"C Boutsidis","year":"2010","unstructured":"Boutsidis C, Zouzias A, Drineas P (2010) Random projections for $$k$$-means clustering. Adv Neural Inf Process Syst 23:298\u2013306","journal-title":"Adv Neural Inf Process Syst"},{"key":"815_CR8","doi-asserted-by":"crossref","unstructured":"Broder AZ, Charikar M, Frieze AM, and Mitzenmacher M (1998) Min-wise independent permutations (extended abstract). In: Proceedings of the thirtieth annual ACM symposium on the theory of computing, Dallas, Texas, USA, May 23\u201326, 1998, pp 327\u2013336","DOI":"10.1145\/276698.276781"},{"key":"815_CR9","doi-asserted-by":"crossref","unstructured":"Charikar M (2002) Similarity estimation techniques from rounding algorithms. In: Proceedings on 34th annual ACM symposium on theory of computing, May 19\u201321, 2002, Montr\u00e9al, Qu\u00e9bec, Canada, pp 380\u2013388","DOI":"10.1145\/509907.509965"},{"issue":"3","key":"815_CR10","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1109\/TKDE.2003.1198388","volume":"15","author":"G Cormode","year":"2003","unstructured":"Cormode G, Datar M, Indyk P, Muthukrishnan S (2003) Comparing data streams using Hamming norms (how to zero in). IEEE Trans Knowl Data Eng 15(3):529\u2013540","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"6","key":"815_CR11","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","volume":"41","author":"S Deerwester","year":"1990","unstructured":"Deerwester S, Dumais Susan T, Furnas George W, Landauer Thomas K, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inf Sci 41(6):391\u2013407","journal-title":"J Am Soc Inf Sci"},{"key":"815_CR12","unstructured":"Gionis A, Indyk P, and Motwani R (1999) Similarity search in high dimensions via hashing. In: VLDB\u201999, proceedings of 25th international conference on very large data bases, Sep 7\u201310, 1999, Edinburgh, Scotland, UK, pp 518\u2013529"},{"key":"815_CR13","unstructured":"Grigorev A (2017) Mastering Java for data science: building data science applications in Java. Packt Publishing"},{"issue":"3","key":"815_CR14","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1023\/A:1009769707641","volume":"2","author":"Z Huang","year":"1998","unstructured":"Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min Knowl Discov 2(3):283\u2013304","journal-title":"Data Min Knowl Discov"},{"key":"815_CR15","doi-asserted-by":"crossref","unstructured":"H\u00e4m\u00e4l\u00e4inen W, and\u00a0Nyk\u00e4nen M (2008) Efficient discovery of statistically significant association rules. In: 2008 Eighth IEEE international conference on data mining, pp 203\u2013212","DOI":"10.1109\/ICDM.2008.144"},{"key":"815_CR16","doi-asserted-by":"crossref","unstructured":"Indyk P, and Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on the theory of computing, Dallas, Texas, USA, May 23\u201326, 1998, pp 604\u2013613","DOI":"10.1145\/276698.276876"},{"key":"815_CR17","doi-asserted-by":"crossref","unstructured":"Johnson WB, and Lindenstrauss J (1983) Extensions of Lipschitz mappings into a hilbert space. In: Conference in modern analysis and probability (New Haven, Conn., 1982), Am. Math. Soc., Providence, R.I., pp 189\u2013206","DOI":"10.1090\/conm\/026\/737400"},{"key":"815_CR18","doi-asserted-by":"crossref","unstructured":"Kane DM, Nelson J, and Woodruff DP (2010) An optimal algorithm for the distinct elements problem. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS 2010, June 6\u201311, 2010, Indianapolis, Indiana, USA, pp 41\u201352","DOI":"10.1145\/1807085.1807094"},{"issue":"1\/2","key":"815_CR19","doi-asserted-by":"publisher","first-page":"81","DOI":"10.2307\/2332226","volume":"30","author":"MG Kendall","year":"1938","unstructured":"Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1\/2):81\u201393","journal-title":"Biometrika"},{"key":"815_CR20","doi-asserted-by":"crossref","unstructured":"Kim M, and Smaragdis P (2018) Bitwise neural networks for efficient single-channel source separation. In: 2018 IEEE international conference on acoustics, speech and signal processing, ICASSP 2018, Calgary, AB, Canada, April 15\u201320, 2018, pp 701\u2013705. IEEE","DOI":"10.1109\/ICASSP.2018.8461824"},{"key":"815_CR21","unstructured":"Kingma DP, and Welling M (2014) Auto-encoding variational bayes. In: 2nd International conference on learning representations, ICLR 2014, Banff, AB, Canada, April 14\u201316, 2014, Conference track proceedings"},{"key":"815_CR22","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1016\/S0933-3657(01)00082-3","volume":"23","author":"L Kurgan","year":"2001","unstructured":"Kurgan L, Cios K, Tadeusiewicz R, Ogiela M, Goodenday L (2001) Knowledge discovery approach to automated cardiac spect diagnosis. Artificial Intell Med 23:149\u201369","journal-title":"Artificial Intell Med"},{"key":"815_CR23","doi-asserted-by":"crossref","unstructured":"Lavergne J, Benton R, and Raghavan VV (2012) Min\u2013max itemset trees for dense and categorical datasets. In: Chen L, Felfernig A, Liu J, and Ra\u015b ZW (eds) Foundations of intelligent systems, pp 51\u201360. Springer, Berlin, Heidelberg, 2012","DOI":"10.1007\/978-3-642-34624-8_6"},{"key":"815_CR24","unstructured":"Lee DD, and Sebastian Seung H (2000) Algorithms for non-negative matrix factorization. In: Leen TK, Dietterich TG, and Tresp V (eds) NIPS, pp 556\u2013562. MIT Press"},{"key":"815_CR25","doi-asserted-by":"crossref","unstructured":"Li P, Hastie TJ, and Church KW (2006) Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, KDD \u201906, pp 287\u2013296, New York, NY, USA, 2006. Association for Computing Machinery","DOI":"10.1145\/1150402.1150436"},{"key":"815_CR26","unstructured":"Lichman M (2013) UCI machine learning repository"},{"key":"815_CR27","unstructured":"Liu H, and Setiono R (1995) Chi2: feature selection and discretization of numeric attributes. In: Seventh international conference on tools with artificial intelligence, ICTAI \u201995, Herndon, VA, USA, Nov 5\u20138, 1995, pp 388\u2013391"},{"key":"815_CR28","doi-asserted-by":"crossref","unstructured":"Mitzenmacher M, Pagh R, and Pham N (2014) Efficient estimation for high similarities using odd sketches. In: 23rd International World Wide Web Conference, WWW\u201914, Seoul, Republic of Korea, Apr 7\u201311, 2014, pp 109\u2013118","DOI":"10.1145\/2566486.2568017"},{"key":"815_CR29","unstructured":"Moody J, Touretsky D, Kaufmann M, Noordewier MO, Towell GG, and Shavlik JW (eds) (1991) Training knowledge-based neural networks to recognize genes in DNA sequences"},{"issue":"6","key":"815_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pcbi.1006907","volume":"15","author":"LH Nguyen","year":"2019","unstructured":"Nguyen LH, Holmes S (2019) Ten quick tips for effective dimensionality reduction. PLOS Comput Biology 15(6):1\u201319","journal-title":"PLOS Comput Biology"},{"key":"815_CR31","doi-asserted-by":"crossref","unstructured":"Patwary MMA, Byna S, Satish NR, Sundaram N, Luki\u0107 Z, Roytershteyn V, Anderson MJ, Yao Y, Prabhat, and Dubey P (2015) Bd-cats: big data clustering at trillion particle scale. In: SC \u201915: proceedings of the international conference for high performance computing, networking, storage and analysis, pp 1\u201312","DOI":"10.1145\/2807591.2807616"},{"issue":"8","key":"815_CR32","doi-asserted-by":"publisher","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","volume":"27","author":"H Peng","year":"2005","unstructured":"Peng H, Long F, Ding CHQ (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8):1226\u20131238","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"815_CR33","doi-asserted-by":"crossref","unstructured":"Pratap R, Bera D, and Revanuru K (2019) Efficient sketching algorithm for sparse binary data. In: 2019 IEEE international conference on data mining, ICDM 2019, Beijing, China, Nov 8\u201311, 2019, pp 508\u2013517","DOI":"10.1109\/ICDM.2019.00061"},{"key":"815_CR34","doi-asserted-by":"crossref","unstructured":"Pratap R, Kulkarni R, and Sohony I (2018) Efficient dimensionality reduction for sparse binary data. In: IEEE international conference on big data, Big Data 2018, Seattle, WA, USA, Dec 10\u201313, 2018, pp 152\u2013157","DOI":"10.1109\/BigData.2018.8622338"},{"key":"815_CR35","doi-asserted-by":"crossref","unstructured":"Pratap R, Sohony I, and Kulkarni R (2018) Efficient compression technique for sparse sets. In: Advances in knowledge discovery and data mining\u201422nd Pacific-Asia conference, PAKDD 2018, Melbourne, VIC, Australia, June 3\u20136, 2018, Proceedings, Part III, pp 164\u2013176","DOI":"10.1007\/978-3-319-93040-4_14"},{"key":"815_CR36","first-page":"12","volume":"31","author":"T Rognvaldsson","year":"2014","unstructured":"Rognvaldsson T, You L, Garwicz D (2014) State of the art prediction of HIV-1 protease cleavage sites. Bioinformatics (Oxford, England) 31:12","journal-title":"Bioinformatics (Oxford, England)"},{"key":"815_CR37","doi-asserted-by":"crossref","unstructured":"Sidana S, Laclau C, and Amini M-R (2018) Learning to recommend diverse items over implicit feedback on pandor, pp 427\u2013431","DOI":"10.1145\/3240323.3240400"},{"key":"815_CR38","unstructured":"Spaen QP (2019) Applications and advances in similarity-based machine learning. PhD thesis, University of California, Berkeley"},{"key":"815_CR39","doi-asserted-by":"crossref","unstructured":"Steinbach M, Ert\u00f6z L, and Kumar V (2004) The challenges of clustering high dimensional data, pp 273\u2013309. Springer, Berlin Heidelberg","DOI":"10.1007\/978-3-662-08968-2_16"},{"key":"815_CR40","doi-asserted-by":"publisher","first-page":"e0135918","DOI":"10.1371\/journal.pone.0135918","volume":"10","author":"C Wang","year":"2015","unstructured":"Wang C, Kao W-H, Kate Hsiao C (2015) Using hamming distance as information for SNP-sets clustering and testing in disease association studies. PLoS ONE 10:e0135918","journal-title":"PLoS ONE"},{"key":"815_CR41","doi-asserted-by":"crossref","unstructured":"Weinberger KQ, Dasgupta A, Langford J, Smola AJ, and Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of the 26th annual international conference on machine learning, ICML 2009, Montreal, Quebec, Canada, June 14\u201318, 2009, pp 1113\u20131120","DOI":"10.1145\/1553374.1553516"},{"key":"815_CR42","unstructured":"Zheng G, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, Ziraldo SB, Wheeler TD, McDermott GP, Zhu J, et\u00a0al (2017) Massively parallel digital transcriptional profiling of single cells. Nature Commun 8(1):1\u201312 Made available by 10$$\\times $$ Genomics at https:\/\/support.10xgenomics.com\/single-cell-gene-expression\/datasets\/1.3.0\/1M_neurons"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00815-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-021-00815-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00815-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T09:58:45Z","timestamp":1648547925000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-021-00815-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,4]]},"references-count":43,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["815"],"URL":"https:\/\/doi.org\/10.1007\/s10618-021-00815-y","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,4]]},"assertion":[{"value":"30 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}