{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T17:25:49Z","timestamp":1772385949039,"version":"3.50.1"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2018,6,12]],"date-time":"2018-06-12T00:00:00Z","timestamp":1528761600000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2018,12]]},"DOI":"10.1007\/s10994-018-5722-4","type":"journal-article","created":{"date-parts":[[2018,6,12]],"date-time":"2018-06-12T12:42:20Z","timestamp":1528807340000},"page":"1987-2025","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Clustering with missing features: a penalized dissimilarity measure based approach"],"prefix":"10.1007","volume":"107","author":[{"given":"Shounak","family":"Datta","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Supritam","family":"Bhattacharjee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Swagatam","family":"Das","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2018,6,12]]},"reference":[{"key":"5722_CR1","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1007\/978-3-642-17103-1_60","volume-title":"Classification, clustering, and data mining applications, studies in classification, data analysis, and knowledge organisation","author":"E Acu\u00f1a","year":"2004","unstructured":"Acu\u00f1a, E., & Rodriguez, C. (2004). The treatment of missing values and its effect on classifier accuracy. In D. Banks, F. R. McMorris, P. 
Arabie, & W. Gaul (Eds.), Classification, clustering, and data mining applications, studies in classification, data analysis, and knowledge organisation (pp. 639\u2013647). Berlin, Heidelberg: Springer."},{"key":"5722_CR2","first-page":"393","volume-title":"Advances in neural information processing systems 5","author":"S Ahmad","year":"1993","unstructured":"Ahmad, S., & Tresp, V. (1993). Some solutions to the missing feature problem in vision. In S. Hanson, J. Cowan, & C. Giles (Eds.), Advances in neural information processing systems 5 (pp. 393\u2013400). Los Altos, CA: Morgan-Kaufmann."},{"key":"5722_CR3","doi-asserted-by":"crossref","unstructured":"Barcel\u00f3, C. (2008). The impact of alternative imputation methods on the measurement of income and wealth: Evidence from the Spanish survey of household finances. In Working paper series. Banco de Espa\u00f1a.","DOI":"10.2139\/ssrn.1321827"},{"issue":"3","key":"5722_CR4","doi-asserted-by":"publisher","first-page":"34e","DOI":"10.1093\/nar\/gnh026","volume":"32","author":"T. H. Bo","year":"2004","unstructured":"Bo, T. H., Dysvik, B., & Jonassen, I. (2004). Lsimpute: Accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Research, 32(3).","journal-title":"Nucleic Acids Research"},{"issue":"8\u201313","key":"5722_CR5","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1016\/S0169-7552(97)00031-7","volume":"29","author":"AZ Broder","year":"1997","unstructured":"Broder, A. Z., Glassman, S. C., Manasse, M. S., & Zweig, G. (1997). Syntactic clustering of the web. Computer Networks and ISDN Systems, 29(8\u201313), 1157\u20131166.","journal-title":"Computer Networks and ISDN Systems"},{"issue":"338","key":"5722_CR6","first-page":"473","volume":"67","author":"LS Chan","year":"1972","unstructured":"Chan, L. S., & Dunn, O. J. (1972). The treatment of missing values in discriminant analysis-1. The sampling experiment. 
Journal of the American Statistical Association, 67(338), 473\u2013477.","journal-title":"Journal of the American Statistical Association"},{"issue":"3","key":"5722_CR7","doi-asserted-by":"publisher","first-page":"370","DOI":"10.2307\/3151899","volume":"34","author":"Anil Chaturvedi","year":"1997","unstructured":"Chaturvedi, A., Carroll, J.\u00a0D., Green, P.\u00a0E., & Rotondo, J.\u00a0A. (1997). A feature-based approach to market segmentation via overlapping k-centroids clustering. Journal of Marketing Research, pp. 370\u2013377.","journal-title":"Journal of Marketing Research"},{"key":"5722_CR8","first-page":"1","volume":"9","author":"G Chechik","year":"2008","unstructured":"Chechik, G., Heitz, G., Elidan, G., Abbeel, P., & Koller, D. (2008). Max-margin classification of data with absent features. Journal of Machine Learning Research, 9, 1\u201321.","journal-title":"Journal of Machine Learning Research"},{"key":"5722_CR9","unstructured":"Chen, F. (2013). Missing no more: Using the MCMC procedure to model missing data. In Proceedings of the SAS global forum 2013 conference, pp. 1\u201323. SAS Institute Inc."},{"key":"5722_CR10","unstructured":"Datta, S., Bhattacharjee, S., & Das, S. (2016a). Clustering with missing features: A penalized dissimilarity measure based approach. CoRR, arXiv:1604.06602."},{"key":"5722_CR11","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1016\/j.patrec.2016.06.023","volume":"80","author":"S Datta","year":"2016","unstructured":"Datta, S., Misra, D., & Das, S. (2016b). A feature weighted penalty based dissimilarity measure for k-nearest neighbor classification with missing features. Pattern Recognition Letters, 80, 231\u2013237.","journal-title":"Pattern Recognition Letters"},{"key":"5722_CR12","unstructured":"Dempster, A.\u00a0P., & Rubin, D.\u00a0B. (1983). Incomplete data in sample surveys, vol.\u00a02, chap. Part I: Introduction, pp. 3\u201310. 
New York: Academic Press."},{"key":"5722_CR13","unstructured":"Dheeru, D., & Taniskidou, E. K. (2017). UCI machine learning repository. Online repository at http:\/\/archive.ics.uci.edu\/ml."},{"issue":"10","key":"5722_CR14","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1109\/TSMC.1979.4310090","volume":"9","author":"JK Dixon","year":"1979","unstructured":"Dixon, J. K. (1979). Pattern recognition with partly missing data. IEEE Transactions on Systems, Man and Cybernetics, 9(10), 617\u2013621.","journal-title":"IEEE Transactions on Systems, Man and Cybernetics"},{"issue":"10","key":"5722_CR15","doi-asserted-by":"publisher","first-page":"1087","DOI":"10.1016\/j.jclinepi.2006.01.014","volume":"59","author":"ART Donders","year":"2006","unstructured":"Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G. M. (2006). Review: A gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10), 1087\u20131091.","journal-title":"Journal of Clinical Epidemiology"},{"key":"5722_CR16","first-page":"768","volume":"21","author":"EW Forgy","year":"1965","unstructured":"Forgy, E. W. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21, 768\u2013769.","journal-title":"Biometrics"},{"key":"5722_CR17","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1007\/3-540-45554-X_46","volume-title":"Rough Sets and Current Trends in Computing","author":"Jerzy W. Grzymala-Busse","year":"2001","unstructured":"Grzymala-Busse, J.\u00a0W., & Hu, M. (2001). A comparison of several approaches to missing attribute values in data mining. In Rough sets and current trends in computing, pp. 378\u2013385. Berlin: Springer."},{"issue":"5","key":"5722_CR18","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1109\/3477.956035","volume":"31","author":"RJ Hathaway","year":"2001","unstructured":"Hathaway, R. 
J., & Bezdek, J. C. (2001). Fuzzy c-means clustering of incomplete data. IEEE Transactions on Systems, Man, and Cybernetics: Part B: Cybernetics, 31(5), 735\u2013744.","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics: Part B: Cybernetics"},{"key":"5722_CR19","unstructured":"Haveliwala, T., Gionis, A., & Indyk, P. (2000). Scalable techniques for clustering the web. Tech. rep.: Stanford University."},{"issue":"3","key":"5722_CR20","first-page":"207","volume":"50","author":"DF Heitjan","year":"1996","unstructured":"Heitjan, D. F., & Basu, S. (1996). Distinguishing \u201cmissing at random\u201d and \u201cmissing completely at random\u201d. The American Statistician, 50(3), 207\u2013213.","journal-title":"The American Statistician"},{"key":"5722_CR21","doi-asserted-by":"crossref","unstructured":"Himmelspach, L., & Conrad, S. (2010). Clustering approaches for data with missing values: Comparison and evaluation. In Digital Information Management (ICDIM), 2010 fifth international conference on, pp. 19\u201328.","DOI":"10.1109\/ICDIM.2010.5664691"},{"issue":"3","key":"5722_CR22","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1198\/000313001317098266","volume":"55","author":"NJ Horton","year":"2001","unstructured":"Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55(3), 244\u2013254.","journal-title":"The American Statistician"},{"issue":"1","key":"5722_CR23","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193\u2013218.","journal-title":"Journal of Classification"},{"key":"5722_CR24","unstructured":"Jin, J. (2017). Genomics dataset repository. 
Online Repository at http:\/\/www.stat.cmu.edu\/~jiashun\/Research\/software\/GenomicsData\/."},{"key":"5722_CR25","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1007\/978-3-540-25966-4_9","volume-title":"Multiple Classifier Systems","author":"Piotr Juszczak","year":"2004","unstructured":"Juszczak, P., & Duin, R. P.\u00a0W. (2004). Combining one-class classifiers to classify missing data. In Multiple classifier systems, pp. 92\u2013101. Berlin: Springer."},{"key":"5722_CR26","doi-asserted-by":"crossref","unstructured":"Krause, S., & Polikar, R. (2003). An ensemble of classifiers approach for the missing feature problem. In Proceedings of the international joint conference on neural networks, vol.\u00a01, pp. 553\u2013558. IEEE.","DOI":"10.1109\/IJCNN.2003.1223406"},{"key":"5722_CR27","unstructured":"Lasdon, L. S. (2013). Optimization theory for large systems. Courier Corporation."},{"key":"5722_CR28","doi-asserted-by":"crossref","unstructured":"Lei, L. (2010). Identify earthquake hot spots with 3-dimensional density-based clustering analysis. In Geoscience and remote sensing symposium (IGARSS), 2010 IEEE international, pp. 530\u2013533. IEEE.","DOI":"10.1109\/IGARSS.2010.5652510"},{"key":"5722_CR29","volume-title":"Statistical analysis with missing data","author":"RJA Little","year":"1987","unstructured":"Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley."},{"issue":"2","key":"5722_CR30","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"SP Lloyd","year":"1982","unstructured":"Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129\u2013137.","journal-title":"IEEE Transactions on Information Theory"},{"key":"5722_CR31","unstructured":"MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. 
In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol.\u00a01, pp. 281\u2013297. University of California Press."},{"key":"5722_CR32","unstructured":"Marlin, B.\u00a0M. (2008). Missing data problems in machine learning. Ph.D. thesis, University of Toronto."},{"key":"5722_CR33","doi-asserted-by":"crossref","unstructured":"Mill\u00e1n-Giraldo, M., Duin, R.\u00a0P., & S\u00e1nchez, J.\u00a0S. (2010). Dissimilarity-based classification of data with missing attributes. In Cognitive information processing (CIP), 2010 2nd international workshop on, pp. 293\u2013298. IEEE.","DOI":"10.1109\/CIP.2010.5604125"},{"issue":"1","key":"5722_CR34","first-page":"86","volume":"2","author":"F Murtagh","year":"2012","unstructured":"Murtagh, F., & Contreras, P. (2012). Algorithms for hierarchical clustering: An overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1), 86\u201397.","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"issue":"11","key":"5722_CR35","doi-asserted-by":"publisher","first-page":"999","DOI":"10.1109\/32.965340","volume":"27","author":"I Myrtveit","year":"2001","unstructured":"Myrtveit, I., Stensrud, E., & Olsson, U. H. (2001). Analyzing data sets with missing data: An empirical evaluation of imputation methods and likelihood-based methods. IEEE Transactions on Software Engineering, 27(11), 999\u20131013.","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"1","key":"5722_CR36","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/j.artmed.2011.11.006","volume":"55","author":"L Nanni","year":"2012","unstructured":"Nanni, L., Lumini, A., & Brahnam, S. (2012). A classifier ensemble approach for the missing feature problem. 
Artificial Intelligence in Medicine, 55(1), 37\u201350.","journal-title":"Artificial Intelligence in Medicine"},{"key":"5722_CR37","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1007\/978-3-642-41822-8_27","volume-title":"Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications","author":"Diana Porro-Mu\u00f1oz","year":"2013","unstructured":"Porro-Mu\u00f1oz, D., Duin, R.\u00a0P., & Talavera, I. (2013). Missing values in dissimilarity-based classification of multi-way data. In Iberoamerican congress on pattern recognition, pp. 214\u2013221. Berlin: Springer."},{"issue":"3","key":"5722_CR38","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1093\/biomet\/63.3.581","volume":"63","author":"DB Rubin","year":"1976","unstructured":"Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581\u2013592.","journal-title":"Biometrika"},{"key":"5722_CR39","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316696","volume-title":"Multiple imputation for nonresponse in surveys","author":"DB Rubin","year":"1987","unstructured":"Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. London: Wiley."},{"issue":"1","key":"5722_CR40","first-page":"110","volume":"16","author":"AS Sabau","year":"2012","unstructured":"Sabau, A. S. (2012). Survey of clustering based financial fraud detection research. Informatica Economica, 16(1), 110.","journal-title":"Informatica Economica"},{"key":"5722_CR41","doi-asserted-by":"publisher","DOI":"10.1201\/9781439821862","volume-title":"Analysis of incomplete multivariate data","author":"JL Schafer","year":"1997","unstructured":"Schafer, J. L. (1997). Analysis of incomplete multivariate data. Boca Raton, FL: CRC Press."},{"issue":"2","key":"5722_CR42","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1037\/1082-989X.7.2.147","volume":"7","author":"JL Schafer","year":"2002","unstructured":"Schafer, J. L., & Graham, J. W. (2002). 
Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147\u2013177.","journal-title":"Psychological Methods"},{"issue":"10","key":"5722_CR43","doi-asserted-by":"publisher","first-page":"2417","DOI":"10.1093\/bioinformatics\/bti345","volume":"21","author":"MSB Sehgal","year":"2005","unstructured":"Sehgal, M. S. B., Gondal, I., & Dooley, L. S. (2005). Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data. Bioinformatics, 21(10), 2417\u20132423.","journal-title":"Bioinformatics"},{"issue":"1","key":"5722_CR44","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1109\/TPAMI.1984.4767478","volume":"6","author":"SZ Selim","year":"1984","unstructured":"Selim, S. Z., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(1), 81\u201387.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"5722_CR45","doi-asserted-by":"crossref","unstructured":"Shelly, D. R., Ellsworth, W. L., Ryberg, T., Haberland, C., Fuis, G. S., Murphy, J., et al. (2009). Precise location of San Andreas fault tremors near Cholame, California using seismometer clusters: Slip on the deep extension of the fault? Geophysical Research Letters, 36(1).","DOI":"10.1029\/2008GL036367"},{"issue":"6","key":"5722_CR46","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","volume":"17","author":"O Troyanskaya","year":"2001","unstructured":"Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., et al. (2001). Missing value estimation methods for DNA microarrays. 
Bioinformatics, 17(6), 520\u2013525.","journal-title":"Bioinformatics"},{"key":"5722_CR47","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1007\/978-3-642-17103-1_61","volume-title":"Classification, Clustering, and Data Mining Applications","author":"Kiri Wagstaff","year":"2004","unstructured":"Wagstaff, K.\u00a0L. (2004). Clustering with missing values: No imputation required. In Proceedings of the meeting of the international Federation of classification societies, pp. 649\u2013658."},{"key":"5722_CR48","unstructured":"Wagstaff, K.\u00a0L., & Laidler, V.\u00a0G. (2005). Making the most of missing values: Object clustering with partial data in astronomy. In Astronomical data analysis software and systems XIV, ASP Conference Series, pp. 172\u2013176. Astronomical Society of the Pacific."},{"issue":"3","key":"5722_CR49","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1111\/1467-9469.00306","volume":"29","author":"Q Wang","year":"2002","unstructured":"Wang, Q., & Rao, J. N. K. (2002a). Empirical likelihood-based inference in linear models with missing data. Scandinavian Journal of Statistics, 29(3), 563\u2013576.","journal-title":"Scandinavian Journal of Statistics"},{"issue":"3","key":"5722_CR50","doi-asserted-by":"publisher","first-page":"896","DOI":"10.1214\/aos\/1028674845","volume":"30","author":"Q Wang","year":"2002","unstructured":"Wang, Q., & Rao, J. N. K. (2002b). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896\u2013924.","journal-title":"The Annals of Statistics"},{"issue":"2","key":"5722_CR51","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1111\/j.1365-246X.2008.03997.x","volume":"176","author":"G Weatherill","year":"2009","unstructured":"Weatherill, G., & Burton, P. W. (2009). Delineation of shallow seismic source zones using k-means cluster analysis, with application to the Aegean region. 
Geophysical Journal International, 176(2), 565\u2013588.","journal-title":"Geophysical Journal International"},{"key":"5722_CR52","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1287\/opre.24.4.643","volume":"24","author":"RE Wendel","year":"1976","unstructured":"Wendel, R. E., & Hurter, A. P, Jr. (1976). Minimization of a non-separable objective function subject to disjoint constraints. Operations Research, 24, 643\u2013657.","journal-title":"Operations Research"},{"issue":"6","key":"5722_CR53","doi-asserted-by":"publisher","first-page":"80","DOI":"10.2307\/3001968","volume":"1","author":"F Wilcoxon","year":"1945","unstructured":"Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80\u201383.","journal-title":"Biometrics Bulletin"},{"issue":"02","key":"5722_CR54","doi-asserted-by":"publisher","first-page":"185","DOI":"10.1142\/S0218194012400025","volume":"22","author":"W Zhang","year":"2012","unstructured":"Zhang, W., Yang, Y., & Wang, Q. (2012). A comparative study of absent features and unobserved values in software effort data. 
International Journal of Software Engineering and Knowledge Engineering, 22(02), 185\u2013202.","journal-title":"International Journal of Software Engineering and Knowledge Engineering"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-018-5722-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-018-5722-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-018-5722-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,6,11]],"date-time":"2019-06-11T20:03:48Z","timestamp":1560283428000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-018-5722-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,12]]},"references-count":54,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2018,12]]}},"alternative-id":["5722"],"URL":"https:\/\/doi.org\/10.1007\/s10994-018-5722-4","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,6,12]]},"assertion":[{"value":"16 October 2017","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 June 2018","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 June 2018","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}