{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:51:25Z","timestamp":1760241085874,"version":"build-2065373602"},"reference-count":53,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2019,11,23]],"date-time":"2019-11-23T00:00:00Z","timestamp":1574467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Bounding the best achievable error probability for binary classification problems is relevant to many applications including machine learning, signal processing, and information theory. Many bounds on the Bayes binary classification error rate depend on information divergences between the pair of class distributions. Recently, the Henze\u2013Penrose (HP) divergence has been proposed for bounding classification error probability. We consider the problem of empirically estimating the HP-divergence from random samples. We derive a bound on the convergence rate for the Friedman\u2013Rafsky (FR) estimator of the HP-divergence, which is related to a multivariate runs statistic for testing between two distributions. The FR estimator is derived from a multicolored Euclidean minimal spanning tree (MST) that spans the merged samples. We obtain a concentration inequality for the Friedman\u2013Rafsky estimator of the Henze\u2013Penrose divergence. We validate our results experimentally and illustrate their application to real datasets.<\/jats:p>","DOI":"10.3390\/e21121144","type":"journal-article","created":{"date-parts":[[2019,11,25]],"date-time":"2019-11-25T03:10:00Z","timestamp":1574651400000},"page":"1144","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Convergence Rates for Empirical Estimation of Binary Classification Bounds"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4630-5593","authenticated-orcid":false,"given":"Salimeh Yasaei","family":"Sekeh","sequence":"first","affiliation":[{"name":"School of Computing and Information Science, University of Maine, Orono, ME 04469, USA"}]},{"given":"Morteza","family":"Noshad","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Kevin R.","family":"Moon","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Statistics, Utah State University, Logan, UT 84322, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2531-9670","authenticated-orcid":false,"given":"Alfred O.","family":"Hero","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,11,23]]},"reference":[{"key":"ref_1","unstructured":"Xuan, G., Chia, P., and Wu, M. (1996, January 25\u201329). Bhattacharyya distance feature selection. Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Hamza, A., and Krim, H. (2003). Image registration and segmentation by maximizing the Jensen-Renyi divergence. Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2003, Springer.","DOI":"10.1007\/978-3-540-45063-4_10"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1109\/97.923043","article-title":"Blind source separation using Renyi\u2019s mutual information","volume":"8","author":"Hild","year":"2001","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1016\/j.sigpro.2012.09.003","article-title":"Divergence measures for statistical data processing\u2013An annotated bibliography","volume":"93","author":"Basseville","year":"2013","journal-title":"Signal Process."},{"key":"ref_5","first-page":"401","article-title":"On a measure of divergence between two multinomial populations","volume":"7","author":"Battacharyya","year":"1946","journal-title":"Sankhy \u0101 Indian J. Stat."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1109\/18.61115","article-title":"Divergence Measures Based on the Shannon Entropy","volume":"37","author":"Lin","year":"1991","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"988","DOI":"10.1109\/LSP.2014.2378514","article-title":"Empirical non-parametric estimation of the Fisher information","volume":"22","author":"Berisha","year":"2015","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1109\/TSP.2015.2477805","article-title":"Empirically estimable classification bounds based on a nonparametric divergence measure","volume":"64","author":"Berisha","year":"2016","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_9","unstructured":"Moon, K., and Hero, A. (2014, January 8\u201313). Multivariate f-divergence estimation with confidence. Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada."},{"key":"ref_10","unstructured":"Moon, K., and Hero, A. (July, January 29). Ensemble estimation of multivariate f-divergence. Proceedings of the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Moon, K., Sricharan, K., Greenewald, K., and Hero, A. (2016, January 10\u201315). Improving convergence of divergence functional ensemble estimators. Proceedings of the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain.","DOI":"10.1109\/ISIT.2016.7541476"},{"key":"ref_12","unstructured":"Moon, K., Sricharan, K., Greenewald, K., and Hero, A. (2016). Nonparametric ensemble estimation of distributional functionals. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Noshad, M., Moon, K., Yasaei Sekeh, S., and Hero, A. (2017, January 25\u201330). Direct Estimation of Information Divergence Using Nearest Neighbor Ratios. Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany.","DOI":"10.1109\/ISIT.2017.8006659"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yasaei Sekeh, S., Oselio, B., and Hero, A. (2018, January 15\u201320). A Dimension-Independent discriminant between distributions. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462306"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Noshad, M., and Hero, A. (2018, January 15\u201320). Rate-optimal Meta Learning of Classification Error. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461949"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wisler, A., Berisha, V., Wei, D., Ramamurthy, K., and Spanias, A. (2016, January 20\u201325). Empirically-estimable multi-class classification bounds. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China.","DOI":"10.1109\/ICASSP.2016.7472146"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yukich, J. (1998). Probability Theory of Classical Euclidean Optimization, Springer. Lecture Notes in Mathematics.","DOI":"10.1007\/BFb0093472"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1214\/aos\/1176349952","article-title":"An Efron\u2013Stein inequality for nonsymmetric statistics","volume":"14","author":"Steele","year":"1986","journal-title":"Ann. Stat."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1007\/BF01194923","article-title":"Asymptotic for Euclidean minimal spanning trees on random points","volume":"92","author":"Aldous","year":"1992","journal-title":"Probab. Theory Relat. Fields"},{"key":"ref_20","unstructured":"Ma, B., Hero, A., Gorman, J., and Michel, O. (2000, January 10\u201313). Image registration with minimal spanning tree algorithm. Proceedings of the IEEE International Conference on Image Processing, Vancouver, BC, Canada."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1016\/j.sigpro.2004.10.002","article-title":"Image registration using entropy measures and entropic graphs","volume":"85","author":"Neemuchwala","year":"2005","journal-title":"Eur. J. Signal Process."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1109\/MSP.2002.1028355","article-title":"Applications of entropic spanning graphs","volume":"19","author":"Hero","year":"2002","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_23","unstructured":"Hero, A., and Michel, O. (1999, January 16). Estimation of R\u00e9nyi information divergence via pruned minimal spanning trees. Proceedings of the IEEE Workshop on Higher Order Statistics, Caesarea, Isreal."},{"key":"ref_24","first-page":"3","article-title":"On the estimation of the discrepancy between empirical curves of distribution for two independent samples","volume":"2","author":"Smirnov","year":"1939","journal-title":"Bull. Mosc. Univ."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1214\/aoms\/1177731909","article-title":"On a test whether two samples are from the same population","volume":"11","author":"Wald","year":"1940","journal-title":"Ann. Math. Stat."},{"key":"ref_26","unstructured":"Gibbons, J. (1971). Nonparametric Statistical Inference, McGraw-Hill."},{"key":"ref_27","unstructured":"Singh, S., and P\u00f3czos, B. (1997). Probability Theory and Combinatorial Optimization, Society for Industrial and Applied Mathematics (SIAM). CBMF-NSF Regional Conference in Applied Mathematics."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1057","DOI":"10.1214\/aoap\/1177004902","article-title":"Limit theorems and rates of convergence for Euclidean functionals","volume":"4","author":"Redmond","year":"1994","journal-title":"Ann. Appl. Probab."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1016\/0304-4149(95)00075-5","article-title":"Asymptotics for Euclidean functionals with power weighted edges","volume":"6","author":"Redmond","year":"1996","journal-title":"Stoch. Process. Their Appl."},{"key":"ref_30","unstructured":"Hero, A., Costa, J., and Ma, B. (2019, November 18). Convergence Rates of Minimal Graphs with Random Vertices. Available online: https:\/\/pdfs.semanticscholar.org\/7817\/308a5065aa0dd44098319eb66f81d4fa7a14.pdf."},{"key":"ref_31","unstructured":"Hero, A., Costa, J., and Ma, B. (2003). Asymptotic Relations between Minimal Graphs and Alpha-Entropy, Communication and Signal Processing Laboratory (CSPL), Department EECS, University of Michigan. Tech. Rep."},{"key":"ref_32","unstructured":"Lorentz, G. (1996). Approximation of Functions, Holt, Rinehart and Winston."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1007\/BF02699376","article-title":"Concentration of measure and isoperimetric inequalities in product spaces","volume":"81","author":"Talagrand","year":"1995","journal-title":"Publications Math\u00e9matiques de i\u2019I. H. E. S."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","article-title":"On information and sufficiency","volume":"22","author":"Kullback","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_35","unstructured":"R\u00e9nyi, A. (July, January 20). On measures of entropy and information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1111\/j.2517-6161.1966.tb00626.x","article-title":"A general class of coefficients of divergence of one distribution from another","volume":"28","author":"Ali","year":"1996","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_37","first-page":"300","article-title":"Comprehensive survey on distance\/similarity measures between probability density functions","volume":"1","author":"Cha","year":"2007","journal-title":"Int. J. Math. Models Methods Appl. Sci."},{"key":"ref_38","unstructured":"Rukhin, A. (September, January 29). Optimal estimator for the mixture parameter by the method of moments and information affinity. Proceedings of the 12th Prague Conference on Information Theory, Prague, Czech Republic."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/0031-3203(80)90066-7","article-title":"The relative neighborhood graph of a finite planar set","volume":"12","author":"Toussaint","year":"1980","journal-title":"Pattern Recognit."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1109\/T-C.1971.223083","article-title":"Graph-theoretical methods for detecting and describing Gestalt clusters","volume":"100","author":"Zahn","year":"1971","journal-title":"IEEE Trans. Comput."},{"key":"ref_41","unstructured":"Joseph Newton, H. (1992). The minimal spanning tree for nonparametric regression and structure discovery. Computing Science and Statistics, Proceedings of the 24th Symposium on the Interface, Interface Foundation of North America."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/0167-8655(83)90059-4","article-title":"A test of randomness based on the minimal spanning tree","volume":"1","author":"Hoffman","year":"1983","journal-title":"Pattern Recognit. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1214\/aos\/1176345462","article-title":"The jackknife estimate of variance","volume":"9","author":"Efron","year":"1981","journal-title":"Ann. Stat."},{"key":"ref_44","unstructured":"Singh, S., and P\u00f3czos, B. (2014, January 22\u201324). Generalized exponential concentration inequality for R\u00e9nyi divergence estimation. Proceedings of the 31st International Conference on Machine Learning (ICML-14), Bejing, China."},{"key":"ref_45","unstructured":"Singh, S., and P\u00f3czos, B. (2014, January 8\u201313). Exponential concentration of a density functional estimator. Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada."},{"key":"ref_46","unstructured":"Lichman, M. (2019, November 18). UCI Machine Learning Repository. Available online: https:\/\/www.re3data.org\/repository\/r3d100010960."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bhatt, R.B., Sharma, G., Dhall, A., and Chaudhury, S. (2009, January 16\u201318). Efficient skin region segmentation using low complexity fuzzy decision tree model. Proceedings of the IEEE-INDICON, Ahmedabad, India.","DOI":"10.1109\/INDCON.2009.5409447"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"809","DOI":"10.2307\/3214207","article-title":"On the number of leaves of a euclidean minimal spanning tree","volume":"24","author":"Steele","year":"1987","journal-title":"J. Appl. Prob."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1214\/aos\/1018031112","article-title":"On the multivarite runs test","volume":"27","author":"Henze","year":"1999","journal-title":"Ann. Stat."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"794","DOI":"10.1214\/aoap\/1177005364","article-title":"A matching problem and subadditive Euclidean funetionals","volume":"3","author":"Rhee","year":"1993","journal-title":"Ann. Appl. Prob."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Whittaker, E., and Watson, G. (1996). A Course in Modern Analysis, Cambridge University Press. [4th ed.].","DOI":"10.1017\/CBO9780511608759"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Kingman, J. (1993). Poisson Processes, Oxford Univ. Press.","DOI":"10.1093\/oso\/9780198536932.001.0001"},{"key":"ref_53","unstructured":"P\u00e1l, D., P\u00f3czos, B., and Szapesv\u00e1ri, C. (2010, January 6\u20139). Estimation of Renyi entropy andmutual information based on generalized nearest-neighbor graphs. Proceedings of the 23th International Conference on Neural Information Processing Systems (NIPS 2010), Vancouver, BC, Canada."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/12\/1144\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:36:55Z","timestamp":1760189815000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/12\/1144"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,23]]},"references-count":53,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2019,12]]}},"alternative-id":["e21121144"],"URL":"https:\/\/doi.org\/10.3390\/e21121144","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,11,23]]}}}