{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T07:06:57Z","timestamp":1777446417993,"version":"3.51.4"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,2,3]],"date-time":"2020-02-03T00:00:00Z","timestamp":1580688000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,2,3]],"date-time":"2020-02-03T00:00:00Z","timestamp":1580688000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Telecom companies log customers\u2019 actions, which generates a huge amount of data that can yield important findings about customers\u2019 behavior and needs. The main characteristics of such data are the large number of features and the high sparsity, which pose challenges for the analytics steps. This paper aims to explore dimensionality reduction on a real telecom dataset and to evaluate customer clustering in the reduced and latent space, compared to the original space, in order to achieve better-quality clustering results. The original dataset contains 220 features belonging to 100,000 customers. Dimensionality reduction is an important data preprocessing step in the data mining process, especially in the presence of the curse of dimensionality. In particular, the aim of data reduction techniques is to filter out irrelevant features and noisy data samples. 
To reduce the high-dimensional data, we projected it onto a subspace using the well-known Principal Component Analysis (PCA) decomposition and a novel approach based on an autoencoder neural network. K-Means clustering is then applied to both the original and the reduced data sets. Several internal measures were used to evaluate the clustering for different numbers of dimensions, and we then evaluated how the reduction method impacts the clustering task.<\/jats:p>","DOI":"10.1186\/s40537-020-0286-0","type":"journal-article","created":{"date-parts":[[2020,2,3]],"date-time":"2020-02-03T03:03:23Z","timestamp":1580699003000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":112,"title":["A comparative dimensionality reduction study in telecom customer segmentation using deep learning and PCA"],"prefix":"10.1186","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0509-6796","authenticated-orcid":false,"given":"Maha","family":"Alkhayrat","sequence":"first","affiliation":[]},{"given":"Mohamad","family":"Aljnidi","sequence":"additional","affiliation":[]},{"given":"Kadan","family":"Aljoumaa","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,2,3]]},"reference":[{"issue":"1","key":"286_CR1","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1186\/s40537-019-0180-9","volume":"6","author":"IM Al-Zuabi","year":"2019","unstructured":"Al-Zuabi IM, Jafar A, Aljoumaa K. Predicting customer\u2019s gender and age depending on mobile phone data. J Big Data. 2019;6(1):18.","journal-title":"J Big Data"},{"key":"286_CR2","doi-asserted-by":"crossref","unstructured":"Joulin A, Bach F, Ponce J. Discriminative clustering for image co-segmentation. In: 2010 IEEE computer society conference on computer vision and pattern recognition. New York: IEEE; 2010. p. 
1943\u201350.","DOI":"10.1109\/CVPR.2010.5539868"},{"key":"286_CR3","doi-asserted-by":"crossref","unstructured":"Liu H, Shao M, Li S, Fu Y. Infinite ensemble for image clustering. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM; 2016. p. 1745\u201354.","DOI":"10.1145\/2939672.2939813"},{"key":"286_CR4","unstructured":"Wang R, Shan S, Chen X, Gao W. Manifold-manifold distance with application to face recognition based on image set. In: 2008 IEEE conference on computer vision and pattern recognition. New York: IEEE; 2008. p. 1\u20138."},{"key":"286_CR5","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1007\/978-1-4614-3223-4_4","volume-title":"Mining text data","author":"CC Aggarwal","year":"2012","unstructured":"Aggarwal CC, Zhai C. A survey of text clustering algorithms. Mining text data. Berlin: Springer; 2012. p. 77\u2013128."},{"key":"286_CR6","doi-asserted-by":"crossref","unstructured":"Beil F, Ester M, Xu X. Frequent term-based text clustering. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM; 2002. p. 436\u201342.","DOI":"10.1145\/775047.775110"},{"key":"286_CR7","doi-asserted-by":"crossref","unstructured":"Xu J, Peng W, Guanhua T, Bo X, Jun Z, Fangyuan W, Hongwei H, et al. Short text clustering via convolutional neural networks; 2015.","DOI":"10.3115\/v1\/W15-1509"},{"key":"286_CR8","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1016\/j.ymeth.2016.06.024","volume":"110","author":"K Tian","year":"2016","unstructured":"Tian K, Shao M, Wang Y, Guan J, Zhou S. Boosting compound\u2013protein interaction prediction by deep learning. Methods. 2016;110:64\u201372.","journal-title":"Methods"},{"key":"286_CR9","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1186\/1471-2105-16-S5-S2","volume":"16","author":"R Zhang","year":"2015","unstructured":"Zhang R, Cheng Z, Guan J, Zhou S. 
Exploiting topic modeling to boost metagenomic reads binning. BMC Bioinform. 2015;16:2.","journal-title":"BMC Bioinform"},{"key":"286_CR10","doi-asserted-by":"crossref","unstructured":"Dueck D, Frey BJ. Non-metric affinity propagation for unsupervised image categorization. In: 2007 IEEE 11th international conference on computer vision. New York: IEEE; 2007. p. 1\u20138.","DOI":"10.1109\/ICCV.2007.4408853"},{"key":"286_CR11","unstructured":"Ng AY, Jordan MI, Weiss Y. On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems. MIT Press; 2001. p. 849\u201356."},{"key":"286_CR12","doi-asserted-by":"publisher","first-page":"881","DOI":"10.1109\/TPAMI.2002.1017616","volume":"7","author":"T Kanungo","year":"2002","unstructured":"Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY. An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell. 2002;7:881\u201392.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"286_CR13","volume-title":"Pattern recognition and machine learning","author":"CM Bishop","year":"2006","unstructured":"Bishop CM. Pattern recognition and machine learning. Berlin: Springer; 2006."},{"key":"286_CR14","volume-title":"Adaptive control processes: a guided tour","author":"RE Bellman","year":"2015","unstructured":"Bellman RE. Adaptive control processes: a guided tour, vol. 2045. Princeton: Princeton University Press; 2015."},{"key":"286_CR15","volume-title":"Introduction to data mining","author":"P-N Tan","year":"2005","unstructured":"Tan P-N, Steinbach M, Kumar V. Introduction to data mining. Boston: Addison-Wesley Longman Publishing Co., Inc.; 2005."},{"issue":"1","key":"286_CR16","doi-asserted-by":"publisher","first-page":"115","DOI":"10.2333\/bhmk.41.115","volume":"41","author":"M Yamamoto","year":"2014","unstructured":"Yamamoto M, Hwang H. A general formulation of cluster analysis with dimension reduction and subspace separation. 
Behaviormetrika. 2014;41(1):115\u201329.","journal-title":"Behaviormetrika"},{"issue":"1","key":"286_CR17","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1109\/TKDE.2016.2606098","volume":"29","author":"K Allab","year":"2016","unstructured":"Allab K, Labiod L, Nadif M. A semi-nmf-pca unified framework for data clustering. IEEE Trans Knowl Data Eng. 2016;29(1):2\u201316.","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"12","key":"286_CR18","doi-asserted-by":"publisher","first-page":"6396","DOI":"10.1109\/TNNLS.2018.2815623","volume":"29","author":"K Allab","year":"2018","unstructured":"Allab K, Labiod L, Nadif M. Simultaneous spectral data embedding and clustering. IEEE Trans Neural Netw Learn Syst. 2018;29(12):6396\u2013401.","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"286_CR19","doi-asserted-by":"crossref","unstructured":"Wold S, Esbensen KH, Geladi P. Principal component analysis; 1987.","DOI":"10.1016\/0169-7439(87)80084-9"},{"key":"286_CR20","doi-asserted-by":"publisher","first-page":"1171","DOI":"10.1214\/009053607000000677","volume":"36","author":"T Hofmann","year":"2008","unstructured":"Hofmann T, Sch\u00f6lkopf B, Smola AJ. Kernel methods in machine learning. Ann Stat. 2008;36:1171\u2013220.","journal-title":"Ann Stat"},{"issue":"5786","key":"286_CR21","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","volume":"313","author":"GE Hinton","year":"2006","unstructured":"Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504\u20137.","journal-title":"Science"},{"issue":"1","key":"286_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2200000006","volume":"2","author":"Y Bengio","year":"2009","unstructured":"Bengio Y, et al. Learning deep architectures for ai. Found Trends\u00ae Mach Learn. 
2009;2(1):1\u2013127.","journal-title":"Found Trends\u00ae Mach Learn"},{"key":"286_CR23","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2015. p. 1\u20139.","DOI":"10.1109\/CVPR.2015.7298594"},{"issue":"8","key":"286_CR24","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798\u2013828.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"286_CR25","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"J Schmidhuber","year":"2015","unstructured":"Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85\u2013117.","journal-title":"Neural Netw"},{"key":"286_CR26","unstructured":"Xie J, Girshick R, Farhadi A. Unsupervised deep embedding for clustering analysis. In: International conference on machine learning; 2016. p. 478\u201387."},{"key":"286_CR27","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1016\/j.patcog.2018.05.019","volume":"83","author":"F Li","year":"2018","unstructured":"Li F, Qiao H, Zhang B. Discriminatively boosted image clustering with fully convolutional auto-encoders. Pattern Recognit. 2018;83:161\u201373.","journal-title":"Pattern Recognit"},{"key":"286_CR28","doi-asserted-by":"crossref","unstructured":"Wang Z, Chang S, Zhou J, Wang M, Huang TS. Learning a task-specific deep architecture for clustering. In: Proceedings of the 2016 SIAM international conference on data mining. Bangkok: SIAM; 2016. p. 369\u201377.","DOI":"10.1137\/1.9781611974348.42"},{"key":"286_CR29","unstructured":"Baldi P. 
Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML workshop on unsupervised and transfer learning; 2012. p. 37\u201349."},{"key":"286_CR30","doi-asserted-by":"crossref","unstructured":"Tian F, Gao B, Cui Q, Chen E, Liu T-Y. Learning deep representations for graph clustering. In: Twenty-eighth AAAI conference on artificial intelligence; 2014.","DOI":"10.1609\/aaai.v28i1.8916"},{"key":"286_CR31","unstructured":"Shao M, Li S, Ding Z, Fu Y. Deep linear coding for fast graph clustering. In: Twenty-fourth international joint conference on artificial intelligence; 2015."},{"key":"286_CR32","doi-asserted-by":"crossref","unstructured":"Wang W, Huang Y, Wang Y, Wang L. Generalized autoencoder: A neural network framework for dimensionality reduction. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops; 2014. p. 490\u20137.","DOI":"10.1109\/CVPRW.2014.79"},{"key":"286_CR33","doi-asserted-by":"crossref","unstructured":"Huang P, Huang Y, Wang W, Wang L. Deep embedding network for clustering. In: 2014 22nd international conference on pattern recognition. New York: IEEE; 2014. p. 1532\u20137.","DOI":"10.1109\/ICPR.2014.272"},{"key":"286_CR34","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1007\/978-3-319-57529-2_62","volume-title":"Advances in Knowledge Discovery and Data Mining","author":"Milad Leyli-Abadi","year":"2017","unstructured":"Leyli-Abadi M, Labiod L, Nadif M. Denoising autoencoder as an effective dimensionality reduction and clustering of text data. In: Pacific-Asia conference on knowledge discovery and data mining. Berlin: Springer; 2017. p. 801\u201313."},{"key":"286_CR35","unstructured":"Yang B, Fu X, Sidiropoulos ND, Hong M. Towards k-means-friendly spaces: Simultaneous deep learning and clustering. In: Proceedings of the 34th international conference on machine learning, vol. 70. JMLR. org; 2017. p. 
3861\u201370."},{"key":"286_CR36","doi-asserted-by":"publisher","first-page":"809","DOI":"10.1007\/978-3-319-71246-8_49","volume-title":"Machine Learning and Knowledge Discovery in Databases","author":"Kai Tian","year":"2017","unstructured":"Tian K, Zhou S, Guan J. Deepcluster: A general clustering framework based on deep learning. In: Joint European conference on machine learning and knowledge discovery in databases. Berlin: Springer; 2017. p. 809\u201325."},{"key":"286_CR37","doi-asserted-by":"crossref","unstructured":"Seuret M, Alberti M, Liwicki M, Ingold R. Pca-initialized deep neural networks applied to document image analysis. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR). vol. 1. New York: IEEE; 2017. p. 877\u201382.","DOI":"10.1109\/ICDAR.2017.148"},{"key":"286_CR38","first-page":"380","volume-title":"Lecture Notes in Computer Science","author":"Ershad Banijamali","year":"2017","unstructured":"Banijamali E, Ghodsi A. Fast spectral clustering using autoencoders and landmarks. In: International conference image analysis and recognition. Berlin: Springer; 2017. p. 380\u20138."},{"key":"286_CR39","doi-asserted-by":"crossref","unstructured":"Wang S, Ding Z, Fu Y. Feature selection guided auto-encoder. In: Thirty-first AAAI conference on artificial intelligence; 2017.","DOI":"10.1609\/aaai.v31i1.10811"},{"key":"286_CR40","doi-asserted-by":"crossref","unstructured":"Affeldt S, Labiod L, Nadif M. Spectral clustering via ensemble deep autoencoder learning (sc-edae); 2019. arXiv preprint arXiv:1901.02291.","DOI":"10.1016\/j.patcog.2020.107522"},{"key":"286_CR41","doi-asserted-by":"crossref","unstructured":"Lai X-a. Segmentation study on enterprise customers based on data mining technology. In: 2009 first international workshop on database technology and applications. New York: IEEE; 2009. p. 247\u201350.","DOI":"10.1109\/DBTA.2009.96"},{"key":"286_CR42","unstructured":"Jansen S. 
Customer segmentation and customer profiling for a mobile telecommunications company based on usage behavior. A Vodafone case study; 2007. p. 66."},{"key":"286_CR43","unstructured":"Aheleroff S. Customer segmentation for a mobile telecommunications company based on service usage behavior. In: The 3rd international conference on data mining and intelligent information technology applications. New York: IEEE; 2011. p. 308\u201313."},{"key":"286_CR44","doi-asserted-by":"crossref","unstructured":"Masood S, Ali M, Arshad F, Qamar AM, Kamal A, Rehman A. Customer segmentation and analysis of a mobile telecommunication company of pakistan using two phase clustering algorithm. In: Eighth international conference on digital information management (ICDIM 2013). New York: IEEE; 2013. p. 137\u201342.","DOI":"10.1109\/ICDIM.2013.6693978"},{"key":"286_CR45","doi-asserted-by":"crossref","unstructured":"Guo X, Gao L, Liu X, Yin J. Improved deep embedded clustering with local structure preservation. In: IJCAI. 2017. p. 1753\u20139.","DOI":"10.24963\/ijcai.2017\/243"},{"key":"286_CR46","first-page":"2252","volume":"16","author":"L Yang","year":"2016","unstructured":"Yang L, Cao X, He D, Wang C, Wang X, Zhang W. Modularity based community detection with deep learning. IJCAI. 2016;16:2252\u20138.","journal-title":"IJCAI"},{"key":"286_CR47","doi-asserted-by":"crossref","unstructured":"Aparna U, Paul S. Feature selection and extraction in data mining. In: 2016 online international conference on green engineering and technologies (IC-GET). New York: IEEE; 2016. p. 1\u20133.","DOI":"10.1109\/GET.2016.7916845"},{"issue":"17","key":"286_CR48","doi-asserted-by":"publisher","first-page":"3299","DOI":"10.19026\/rjaset.6.3638","volume":"6","author":"IB Mohamad","year":"2013","unstructured":"Mohamad IB, Usman D. Standardization and its effects on k-means clustering algorithm. Res J Appl Sci Eng Technol. 
2013;6(17):3299\u2013303.","journal-title":"Res J Appl Sci Eng Technol"},{"issue":"4","key":"286_CR49","doi-asserted-by":"publisher","first-page":"974","DOI":"10.1016\/j.csda.2004.06.015","volume":"49","author":"PR Peres-Neto","year":"2005","unstructured":"Peres-Neto PR, Jackson DA, Somers KM. How many principal components? stopping rules for determining the number of non-trivial axes revisited. Comput Stat Data Anal. 2005;49(4):974\u201397.","journal-title":"Comput Stat Data Anal"},{"key":"286_CR50","unstructured":"Reddi SJ, Kale S, Kumar S. On the convergence of adam and beyond; 2019. arXiv preprint arXiv:1904.09237."},{"key":"286_CR51","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v061.i06","volume":"61","author":"M Charrad","year":"2014","unstructured":"Charrad M, Ghazzali N, Boiteux V, Niknafs A. NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw. 2014;61:1\u201336.","journal-title":"J Stat Softw"},{"key":"286_CR52","doi-asserted-by":"crossref","unstructured":"Liu Y, Li Z, Xiong H, Gao X, Wu J. Understanding of internal clustering validation measures. In: 2010 IEEE international conference on data mining. New York: IEEE; 2010. p. 
911\u20136.","DOI":"10.1109\/ICDM.2010.35"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-0286-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s40537-020-0286-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-020-0286-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,14]],"date-time":"2022-10-14T01:57:21Z","timestamp":1665712641000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-020-0286-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,3]]},"references-count":52,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["286"],"URL":"https:\/\/doi.org\/10.1186\/s40537-020-0286-0","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,3]]},"assertion":[{"value":"25 September 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 January 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 February 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors Ethics approval and consent to participate.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors consent for 
publication.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"9"}}