{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:06:26Z","timestamp":1760144786497,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,5,23]],"date-time":"2024-05-23T00:00:00Z","timestamp":1716422400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004489","name":"MITACS","doi-asserted-by":"publisher","award":["IT17587"],"award-info":[{"award-number":["IT17587"]}],"id":[{"id":"10.13039\/501100004489","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Classification, the task of discerning the class of an unlabeled data point using information from a set of labeled data points, is a well-studied area of machine learning with a variety of approaches. Many of these approaches are closely linked to the selection of metrics or the generalizing of similarities defined by kernels. These metrics or similarity measures often require their parameters to be tuned in order to achieve the highest accuracy for each dataset. For example, an extensive search is required to determine the value of K or the choice of distance metric in K-NN classification. This paper explores a method of kernel construction that, when used in classification, performs consistently over a variety of datasets and does not require the parameters to be tuned. Inspired by dimensionality reduction techniques (DRT), we construct a kernel-based similarity measure that captures the topological structure of the data. This work compares the accuracy of K-NN classifiers, computed with the specific operating parameters that obtain the highest accuracy per dataset, to a single trial of the kernel classifier proposed here, run with no specialized parameters, on standard benchmark sets. The proposed kernel, used with simple classifiers, achieves accuracy comparable to the \u2018best-case\u2019 K-NN classifiers without requiring the tuning of operating parameters.<\/jats:p>","DOI":"10.3390\/make6020052","type":"journal-article","created":{"date-parts":[[2024,5,23]],"date-time":"2024-05-23T05:35:20Z","timestamp":1716442520000},"page":"1126-1144","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Locally-Scaled Kernels and Confidence Voting"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-3689-5923","authenticated-orcid":false,"given":"Elizabeth","family":"Hofer","sequence":"first","affiliation":[{"name":"Department of Computing and Software, McMaster University, Hamilton, ON L8S 4L8, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1390-620X","authenticated-orcid":false,"given":"Martin","family":"v. Mohrenschildt","sequence":"additional","affiliation":[{"name":"Department of Computing and Software, McMaster University, Hamilton, ON L8S 4L8, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2024,5,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1089\/big.2018.0175","article-title":"Effects of distance measure choice on k-nearest neighbor classifier performance: A review","volume":"7","author":"Hassanat","year":"2019","journal-title":"Big Data"},{"key":"ref_2","unstructured":"Alkasassbeh, M., Altarawneh, G.A., and Hassanat, A. (2015). On enhancing the performance of nearest neighbour classifiers using Hassanat distance metric. arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"012004","DOI":"10.1088\/1742-6596\/2161\/1\/012004","article-title":"Study of distance metrics on k-nearest neighbor algorithm for star categorization","volume":"2161","author":"Nayak","year":"2022","journal-title":"J. Phys. Conf. Ser."},{"key":"ref_4","first-page":"8212546","article-title":"Hybrid metric k-nearest neighbor algorithm and applications","volume":"2022","author":"Zhang","year":"2022","journal-title":"Math. Probl. Eng."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Yean, C.W., Khairunizam, W., Omar, M.I., Murugappan, M., Zheng, B.S., Bakar, S.A., Razlan, Z.M., and Ibrahim, Z. (2018, January 15\u201317). Analysis of the distance metrics of KNN classifier for EEG signal in stroke patients. Proceedings of the 2018 International Conference on Computational Approach in Smart Systems Design and Applications (ICASSDA), Kuching, Malaysia.","DOI":"10.1109\/ICASSDA.2018.8477601"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"97","DOI":"10.56705\/ijodas.v4i2.71","article-title":"Comparison of Performance of Four Distance Metric Algorithms in K-Nearest Neighbor Method on Diabetes Patient Data","volume":"4","author":"Ratnasari","year":"2023","journal-title":"Indones. J. Data Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hofer, E., and v. Mohrenschildt, M. (2022). Model-Free Data Mining of Families of Rotating Machinery. Appl. Sci., 12.","DOI":"10.3390\/app12063178"},{"key":"ref_8","unstructured":"Ghojogh, B., Ghodsi, A., Karray, F., and Crowley, M. (2021). Reproducing Kernel Hilbert Space, Mercer\u2019s Theorem, Eigenfunctions, Nystr\u00f6m Method, and Use of Kernels in Machine Learning: Tutorial and Survey. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1016\/j.neucom.2017.06.005","article-title":"Kernel-driven similarity learning","volume":"267","author":"Kang","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_10","first-page":"73","article-title":"Robust statistics for outlier detection","volume":"Volume 1","author":"Rousseeuw","year":"2011","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"key":"ref_11","unstructured":"Dua, D., and Graff, C. (2017). UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest neighbor pattern classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2733","DOI":"10.1214\/12-AOS1049","article-title":"Optimal weighted nearest neighbour classifiers","volume":"40","author":"Samworth","year":"2012","journal-title":"Ann. Statist."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1759","DOI":"10.1016\/j.asoc.2013.01.010","article-title":"New empirical nonparametric kernels for support vector machine classification","volume":"13","author":"Turabieh","year":"2013","journal-title":"Appl. Soft Comput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"McInnes, L., Healy, J., and Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arXiv.","DOI":"10.21105\/joss.00861"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1373","DOI":"10.1162\/089976603321780317","article-title":"Laplacian eigenmaps for dimensionality reduction and data representation","volume":"15","author":"Belkin","year":"2003","journal-title":"Neural Comput."},{"key":"ref_17","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1109\/TSMC.1978.4309958","article-title":"The distance-weighted k-nearest neighbor rule","volume":"8","author":"Dudani","year":"1978","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_19","first-page":"1429","article-title":"A new distance-weighted k-nearest neighbor classifier","volume":"9","author":"Gou","year":"2012","journal-title":"J. Inf. Comput. Sci"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Hong, P., Luo, L., and Lin, C. (2011, January 17\u201318). The Parameter Optimization of Gaussian Function via the Similarity Comparison within Class and between Classes. Proceedings of the 2011 Third Pacific-Asia Conference on Circuits, Communications and System (PACCS), Wuhan, China.","DOI":"10.1109\/PACCS.2011.5990298"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1090\/jams\/852","article-title":"Testing the manifold hypothesis","volume":"29","author":"Fefferman","year":"2016","journal-title":"J. Am. Math. Soc."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"2319","DOI":"10.1126\/science.290.5500.2319","article-title":"A global geometric framework for nonlinear dimensionality reduction","volume":"290","author":"Tenenbaum","year":"2000","journal-title":"Science"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1559","DOI":"10.1007\/s42452-019-1356-9","article-title":"Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets","volume":"1","author":"Ali","year":"2019","journal-title":"SN Appl. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"984","DOI":"10.1016\/j.patcog.2014.09.020","article-title":"Least squares twin multi-class classification support vector machine","volume":"48","author":"Nasiri","year":"2015","journal-title":"Pattern Recognit."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1937","DOI":"10.1016\/j.eswa.2013.08.089","article-title":"Hybrid decision tree and na\u00efve Bayes classifiers for multi-class classification tasks","volume":"41","author":"Farid","year":"2014","journal-title":"Expert Syst. Appl."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"105524","DOI":"10.1016\/j.asoc.2019.105524","article-title":"Investigating the impact of data normalization on classification performance","volume":"97","author":"Singh","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"100804","DOI":"10.1016\/j.patter.2023.100804","article-title":"Leakage and the reproducibility crisis in machine-learning-based science","volume":"4","author":"Kapoor","year":"2023","journal-title":"Patterns"},{"key":"ref_28","unstructured":"Grandini, M., Bagli, E., and Visani, G. (2020). Metrics for multi-class classification: An overview. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.ipm.2009.03.002","article-title":"A systematic analysis of performance measures for classification tasks","volume":"45","author":"Sokolova","year":"2009","journal-title":"Inf. Process. Manag."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.aci.2018.08.003","article-title":"Classification assessment methods","volume":"17","author":"Tharwat","year":"2020","journal-title":"Appl. Comput. Inform."},{"key":"ref_31","unstructured":"Kundu, D. (2005). Advances in Ranking and Selection, Multiple Comparisons, and Reliability: Methodology and Applications, Springer."},{"key":"ref_32","unstructured":"Hajij, M., Zamzmi, G., Papamarkou, T., Maroulas, V., and Cai, X. (2021). Simplicial complex representation learning. arXiv."},{"key":"ref_33","unstructured":"Ramirez-Padron, R., Foregger, D., Manuel, J., Georgiopoulos, M., and Mederos, B. (2010, January 19\u201321). Similarity kernels for nearest neighbor-based outlier detection. Proceedings of the Advances in Intelligent Data Analysis IX: 9th International Symposium, IDA 2010, Tucson, AZ, USA. Proceedings 9."},{"key":"ref_34","unstructured":"Dik, A., Jebari, K., Bouroumi, A., and Ettouhami, A. (2014). Similarity-based approach for outlier detection. arXiv."},{"key":"ref_35","unstructured":"Zhou, D., Bousquet, O., Lal, T., Weston, J., and Sch\u00f6lkopf, B. (2003). Advances in Neural Information Processing Systems, The MIT Press."},{"key":"ref_36","unstructured":"Liu, W., Qian, B., Cui, J., and Liu, J. (2009, January 11\u201317). Spectral kernel learning for semi-supervised classification. Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, Pasadena, CA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1016\/j.eswa.2017.02.049","article-title":"Feature selection based on FDA and F-score for multi-class classification","volume":"81","author":"Song","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Khan, M.M.R., Arif, R.B., Siddique, M.A.B., and Oishe, M.R. (2018, January 13\u201315). Study and observation of the variation of accuracies of KNN, SVM, LMNN, ENN algorithms on eleven different datasets from UCI machine learning repository. Proceedings of the 2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), Dhaka, Bangladesh.","DOI":"10.1109\/CEEICT.2018.8628041"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1413","DOI":"10.1007\/s11222-016-9696-4","article-title":"Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC","volume":"27","author":"Vehtari","year":"2016","journal-title":"Stat. Comput."},{"key":"ref_40","unstructured":"Guennebaud, G., Jacob, B., Avery, P., Bachrach, A., Barthelemy, S., Becker, C., Benjamin, D., Berger, C., Berres, A., and Luis Blanco, J. (2019, September 25). Eigen, Version v3. Available online: http:\/\/eigen.tuxfamily.org."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/2\/52\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:47:04Z","timestamp":1760107624000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/2\/52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,23]]},"references-count":40,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["make6020052"],"URL":"https:\/\/doi.org\/10.3390\/make6020052","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2024,5,23]]}}}