{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T06:26:34Z","timestamp":1778048794325,"version":"3.51.4"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T00:00:00Z","timestamp":1634515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Grid Computing"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Correlation determination brings out relationships in data that had not been seen before and it is imperative to successfully use the power of correlations for data mining. In this paper, we have used the concepts of correlations to cluster data, and merged it with recommendation algorithms. We have proposed two correlation clustering algorithms (RBACC and LGBACC), that are based on finding Spearman\u2019s rank correlation coefficient among data points, and using dimensionality reduction approach (PCA) along with graph theory respectively, to produce high quality hierarchical clusters. Both these algorithms have been tested on real life data (New York yellow cabs dataset taken from<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/www.nyc.gov\">http:\/\/www.nyc.gov<\/jats:ext-link>), using distributed and parallel computing (Spark and R). They are found to be scalable and perform better than the existing hierarchical clustering algorithms. These two approaches have been used to replace similarity measures in recommendation algorithms and generate a correlation clustering based recommendation system model. We have combined the power of correlation analysis with that of prediction analysis to propose a better recommendation system. It is found that this model makes better quality recommendations as compared to the random recommendation model. This model has been validated using a real time, large data set (MovieLens dataset, taken from<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/grouplens.org\/datasets\/movielens\/latest\">http:\/\/grouplens.org\/datasets\/movielens\/latest<\/jats:ext-link>). The results show that combining correlated points with the predictive power of recommendation algorithms, produce better quality recommendations which are faster to compute. LGBACC has approximately 25% better prediction capability but at the same time takes significantly more prediction time compared to RBACC.<\/jats:p>","DOI":"10.1007\/s10723-021-09585-9","type":"journal-article","created":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T02:00:53Z","timestamp":1634608853000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["A Correlation Based Recommendation System for Large Data Sets"],"prefix":"10.1007","volume":"19","author":[{"given":"Divya","family":"Pandove","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Avleen","family":"Malhi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,10,18]]},"reference":[{"issue":"2","key":"9585_CR1","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1007\/s11615-001-0048-3","volume":"42","author":"V Didelez","year":"2001","unstructured":"Didelez, V., Pigeot, I.: Judea pearl: Causality: Models, reasoning, and inference. Politische Vierteljahresschrift 42(2), 313\u2013315 (2001)","journal-title":"Politische Vierteljahresschrift"},{"issue":"6062","key":"9585_CR2","doi-asserted-by":"publisher","first-page":"1518","DOI":"10.1126\/science.1205438","volume":"334","author":"DN Reshef","year":"2011","unstructured":"Reshef, D.N., Reshef, Y.A., Finucane, H.K., Grossman, S.R., McVean, G., Turnbaugh, P.J., Lander, E.S., Mitzenmacher, M., Sabeti, P.C.: Detecting novel associations in large data sets. Science 334(6062), 1518\u20131524 (2011)","journal-title":"Science"},{"issue":"6","key":"9585_CR3","doi-asserted-by":"publisher","first-page":"1217","DOI":"10.1109\/TPAMI.2015.2478471","volume":"38","author":"N Armanfard","year":"2016","unstructured":"Armanfard, N., Reilly, J.P., Komeili, M.: Local feature selection for data classification. IEEE Trans. Pattern Anal. Mach Intell. 38(6), 1217\u20131227 (2016)","journal-title":"IEEE Trans. Pattern Anal. Mach Intell."},{"issue":"1-3","key":"9585_CR4","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1023\/B:MACH.0000033116.57574.95","volume":"56","author":"N Bansal","year":"2004","unstructured":"Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 56(1-3), 89\u2013113 (2004)","journal-title":"Mach. Learn."},{"key":"9585_CR5","doi-asserted-by":"crossref","unstructured":"B\u00f6hm, C., Kailing, K., Kr\u00f6ger, P., Zimek, A.: Computing clusters of correlation connected objects. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of data, pp 455\u2013466. ACM (2004)","DOI":"10.1145\/1007568.1007620"},{"key":"9585_CR6","volume-title":"Finding Generalized Projected Clusters in High Dimensional Spaces, vol. 29","author":"CC Aggarwal","year":"2000","unstructured":"Aggarwal, C.C., Yu, P.S.: Finding Generalized Projected Clusters in High Dimensional Spaces, vol. 29. ACM, New York (2000)"},{"key":"9585_CR7","unstructured":"Yang, J., Wang, W., Wang, H., Yu, P.: \/spl delta\/-clusters: capturing subspace correlation in a large data set. In: Data Engineering, 2002 Proceedings 18th International Conference on, pp 517\u2013528. IEEE (2002)"},{"key":"9585_CR8","doi-asserted-by":"crossref","unstructured":"Achtert, E., Bohm, C., Kroger, P., Zimek, A.: Mining hierarchies of correlation clusters. In: Scientific and Statistical Database Management, 2006 18th International Conference on, pp 119\u2013128. IEEE (2006)","DOI":"10.1109\/SSDBM.2006.35"},{"key":"9585_CR9","doi-asserted-by":"crossref","unstructured":"Li, J., Huang, X., Selke, C., Yong, J.: A fast algorithm for finding correlation clusters in noise data. Adv. Know. Discov. Data Min., 639\u2013647 (2007)","DOI":"10.1007\/978-3-540-71701-0_68"},{"key":"9585_CR10","doi-asserted-by":"crossref","unstructured":"Achtert, E., B\u00f6hm, C., Kriegel, H.-P., Kr\u00f6ger, P., Zimek, A.: Robust, complete, and efficient correlation clustering. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp 413\u2013418. SIAM (2007)","DOI":"10.1137\/1.9781611972771.37"},{"key":"9585_CR11","doi-asserted-by":"crossref","unstructured":"Achtert, E., Bohm, C., Kriegel, H.-P., Kroger, P., Zimek, A.: On exploring complex relationships of correlation clusters. In: Scientific and Statistical Database Management, 2007 SSBDM\u201907, 19th International Conference on, pp 7\u20137. IEEE (2007)","DOI":"10.1109\/SSDBM.2007.21"},{"issue":"3","key":"9585_CR12","doi-asserted-by":"publisher","first-page":"993","DOI":"10.1016\/j.patcog.2014.08.027","volume":"48","author":"P Mukhopadhyay","year":"2015","unstructured":"Mukhopadhyay, P., Chaudhuri, B.B.: A survey of hough transform. Pattern Recogn. 48(3), 993\u20131010 (2015)","journal-title":"Pattern Recogn."},{"key":"9585_CR13","doi-asserted-by":"crossref","unstructured":"Chattopadhyay, A.K., Chattyopadhyay, T., De, T., Mondal, S.: Independent component analysis for dimension reduction classification: Hough transform and cash algorithm. In: Astrostatistical Challenges for the New Astronomy, pp 185\u2013202. Springer (2013)","DOI":"10.1007\/978-1-4614-3508-2_9"},{"key":"9585_CR14","volume-title":"Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications, vol. 27","author":"R Agrawal","year":"1998","unstructured":"Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications, vol. 27. ACM, New York (1998)"},{"key":"9585_CR15","doi-asserted-by":"crossref","unstructured":"Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. In: ACM SIGMoD Record, vol. 28, pp 61\u201372. ACM (1999)","DOI":"10.1145\/304181.304188"},{"key":"9585_CR16","doi-asserted-by":"crossref","unstructured":"Kailing, K., Kriegel, H. -P., Kr\u00f6ger, P.: Density-connected subspace clustering for high-dimensional data. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp 246\u2013256. SIAM (2004)","DOI":"10.1137\/1.9781611972740.23"},{"key":"9585_CR17","doi-asserted-by":"crossref","unstructured":"Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.: A monte carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pp 418\u2013427. ACM (2002)","DOI":"10.1145\/564691.564739"},{"key":"9585_CR18","doi-asserted-by":"crossref","unstructured":"Wang, H., Wang, W., Yang, J., Yu, P.S.: Clustering by pattern similarity in large data sets. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp 394\u2013405. ACM (2002)","DOI":"10.1145\/564691.564737"},{"key":"9585_CR19","unstructured":"Pei, J., Zhang, X., Cho, M., Wang, H., Yu, P.S.: Maple: A fast algorithm for maximal pattern-based clustering. In: Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pp 259\u2013266. IEEE (2003)"},{"key":"9585_CR20","doi-asserted-by":"crossref","unstructured":"Liu, J., Wang, W.: Op-cluster: Clustering by tendency in high dimensional space. In: Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pp 187\u2013194. IEEE (2003)","DOI":"10.1109\/ICDM.2003.1250919"},{"issue":"3","key":"9585_CR21","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1016\/0167-8191(87)90031-7","volume":"4","author":"R Melhem","year":"1987","unstructured":"Melhem, R.: Parallel gauss-jordan elimination for the solution of dense linear systems. Parallel Comput. 4(3), 339\u2013343 (1987)","journal-title":"Parallel Comput."},{"issue":"1","key":"9585_CR22","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1145\/1656274.1656286","volume":"11","author":"A Zimek","year":"2009","unstructured":"Zimek, A.: Correlation clustering. ACM SIGKDD Explor. Newsl. 11(1), 53\u201354 (2009)","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"9585_CR23","doi-asserted-by":"crossref","unstructured":"Feng, J., Lin, Z., Xu, H., Yan, S.: Robust subspace segmentation with block-diagonal prior. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3818\u20133825 (2014)","DOI":"10.1109\/CVPR.2014.482"},{"issue":"1","key":"9585_CR24","doi-asserted-by":"publisher","first-page":"116","DOI":"10.1109\/TAC.2005.861710","volume":"51","author":"Y Kim","year":"2006","unstructured":"Kim, Y., Mesbahi, M.: On maximizing the second smallest eigenvalue of a state-dependent graph laplacian. IEEE Trans. Autom. Control 51(1), 116\u2013120 (2006)","journal-title":"IEEE Trans. Autom. Control"},{"key":"9585_CR25","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316801","volume-title":"Finding groups in data. An introduction to cluster analysis","author":"L Kauffman","year":"1990","unstructured":"Kauffman, L., Rousseeuw, P.: Finding groups in data. An introduction to cluster analysis. John Willey & Sons, New York (1990)"},{"issue":"5","key":"9585_CR26","doi-asserted-by":"publisher","first-page":"873","DOI":"10.1016\/j.jmva.2006.11.013","volume":"98","author":"M Meil\u0103","year":"2007","unstructured":"Meil\u0103, M.: Comparing clusterings? an information based distance. J. Multiv. Anal. 98(5), 873\u2013895 (2007)","journal-title":"J. Multiv. Anal."},{"key":"9585_CR27","doi-asserted-by":"crossref","unstructured":"Xiao, C., Ye, J., Esteves, R.M., Rong, C.: Using spearman\u2019s correlation coefficients for exploratory data analysis on big dataset. Concurr. Comput. Pract. Experience, 1\u201313 (2015)","DOI":"10.1002\/cpe.3745"},{"key":"9585_CR28","doi-asserted-by":"crossref","unstructured":"Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: ACM Sigmod Record, vol. 29, pp 1\u201312. ACM (2000)","DOI":"10.1145\/335191.335372"},{"issue":"1","key":"9585_CR29","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1145\/963770.963776","volume":"22","author":"M Deshpande","year":"2004","unstructured":"Deshpande, M., Karypis, G.: Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst. (TOIS) 22(1), 143\u2013177 (2004)","journal-title":"ACM Trans. Inf. Syst. (TOIS)"},{"issue":"3","key":"9585_CR30","doi-asserted-by":"publisher","first-page":"700","DOI":"10.1016\/j.eswa.2005.04.037","volume":"29","author":"J-S Lee","year":"2005","unstructured":"Lee, J.-S., Jun, C.-H., Lee, J., Kim, S.: Classification-based collaborative filtering using market basket data. Expert Syst. Appl. 29(3), 700\u2013704 (2005)","journal-title":"Expert Syst. Appl."},{"issue":"2","key":"9585_CR31","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1023\/B:DAMI.0000031629.31935.ac","volume":"9","author":"A Demiriz","year":"2004","unstructured":"Demiriz, A.: Enhancing product recommender systems on sparse binary data. Data Min. Knowl. Disc. 9(2), 147\u2013170 (2004)","journal-title":"Data Min. Knowl. Disc."},{"key":"9585_CR32","first-page":"2935","volume":"10","author":"A Gunawardana","year":"2009","unstructured":"Gunawardana, A., Shani, G.: A survey of accuracy evaluation metrics of recommendation tasks. J. M. Learn. Res. 10, 2935\u20132962 (2009)","journal-title":"J. M. Learn. Res."},{"key":"9585_CR33","doi-asserted-by":"crossref","unstructured":"Hahsler, M.: recommenderlab: A framework for developing and testing recommendation algorithms, Southern Methodist University (2011)","DOI":"10.32614\/CRAN.package.recommenderlab"},{"key":"9585_CR34","unstructured":"Chowdhury, G.G.: Introduction to modern information retrieval. Facet Publishing, London, United Kingdom (2010)"},{"issue":"4","key":"9585_CR35","first-page":"19","volume":"5","author":"FM Harper","year":"2016","unstructured":"Harper, F.M., Konstan, J.A.: The movielens datasets: History and context. ACM Trans. Interact. Intell. Syst. (TiiS) 5(4), 19 (2016)","journal-title":"ACM Trans. Interact. Intell. Syst. (TiiS)"},{"key":"9585_CR36","doi-asserted-by":"crossref","unstructured":"Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., Duval, E.: Dataset-driven research for improving recommender systems for learning. In: Proceedings of the 1st International Conference on Learning Analytics and Knowledge, pp 44\u201353. ACM (2011)","DOI":"10.1145\/2090116.2090122"},{"key":"9585_CR37","doi-asserted-by":"crossref","unstructured":"Shah, M., Parikh, D., Deshpande, B.: Movie recommendation system employing latent graph features in extremely randomized trees. In: Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, p 42. ACM (2016)","DOI":"10.1145\/2905055.2905248"},{"issue":"3","key":"9585_CR38","first-page":"41","volume":"7","author":"S Dooms","year":"2016","unstructured":"Dooms, S., Bellog\u00edn, A., Pessemier, T.D., Martens, L.: A framework for dataset benchmarking and its application to a new movie rating dataset. ACM Tran. Intell. Syst. Technol. (TIST) 7(3), 41 (2016)","journal-title":"ACM Tran. Intell. Syst. Technol. (TIST)"}],"container-title":["Journal of Grid Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10723-021-09585-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10723-021-09585-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10723-021-09585-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,10]],"date-time":"2024-09-10T04:19:38Z","timestamp":1725941978000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10723-021-09585-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,18]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["9585"],"URL":"https:\/\/doi.org\/10.1007\/s10723-021-09585-9","relation":{},"ISSN":["1570-7873","1572-9184"],"issn-type":[{"value":"1570-7873","type":"print"},{"value":"1572-9184","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,18]]},"assertion":[{"value":"24 June 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 August 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 October 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 July 2022","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"License copyright has been updated.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"42"}}