{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:57:30Z","timestamp":1760245050585,"version":"build-2065373602"},"reference-count":49,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2019,8,12]],"date-time":"2019-08-12T00:00:00Z","timestamp":1565568000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>This paper proposes a geometric estimator of dependency between a pair of multivariate random variables. The proposed estimator of dependency is based on a randomly permuted geometric graph (the minimal spanning tree) over the two multivariate samples. This estimator converges to a quantity that we call the geometric mutual information (GMI), which is equivalent to the Henze\u2013Penrose divergence. between the joint distribution of the multivariate samples and the product of the marginals. The GMI has many of the same properties as standard MI but can be estimated from empirical data without density estimation; making it scalable to large datasets. The proposed empirical estimator of GMI is simple to implement, involving the construction of an minimal spanning tree (MST) spanning over both the original data and a randomly permuted version of this data. We establish asymptotic convergence of the estimator and convergence rates of the bias and variance for smooth multivariate density functions belonging to a H\u00f6lder class. 
We demonstrate the advantages of our proposed geometric dependency estimator in a series of experiments.<\/jats:p>","DOI":"10.3390\/e21080787","type":"journal-article","created":{"date-parts":[[2019,8,13]],"date-time":"2019-08-13T04:31:21Z","timestamp":1565670681000},"page":"787","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Geometric Estimation of Multivariate Dependency"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4630-5593","authenticated-orcid":false,"given":"Salimeh","family":"Yasaei Sekeh","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA"}]},{"given":"Alfred O.","family":"Hero","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,8,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Lewi, J., Butera, R., and Paninski, L. (2006). Real-time adaptive information theoretic optimization of neurophysiology experiments. Advances in Neural Information Processing Systems, The MIT Press.","DOI":"10.7551\/mitpress\/7503.003.0112"},{"key":"ref_2","unstructured":"Peng, H.C., Herskovits, E.H., and Davatzikos, C. (2002, January 7\u201310). Bayesian Clustering Methods for Morphological Analysis of MR Images. Proceedings of the IEEE International Symposium on Biomedical Imaging, Washington, DC, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Moon, K.R., Noshad, M., Yasaei Sekeh, S., and Hero, A.O. (2017, January 5\u20139). Information theoretic structure learning with confidence. 
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA.","DOI":"10.1109\/ICASSP.2017.7953327"},{"key":"ref_4","first-page":"163","article-title":"Some data analyses using mutual information","volume":"18","author":"Brillinger","year":"2004","journal-title":"Braz. J. Probab. Stat."},{"key":"ref_5","first-page":"1415","article-title":"Feature extraction by non-parametric mutual information maximization","volume":"3","author":"Torkkola","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1007\/s00521-013-1368-0","article-title":"A review of feature selection methods based on mutual information","volume":"24","author":"Vergara","year":"2014","journal-title":"Neural Comput. Appl."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1109\/34.574797","article-title":"Evaluation, Application, and Small Sample Performance","volume":"19","author":"Peng","year":"1997","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1007\/11550907_87","article-title":"Mutual information and k-nearest neighbors approximator for time series prediction","volume":"3697","author":"Sorjamaa","year":"2005","journal-title":"Lecture Notes Comput. Sci."},{"key":"ref_10","unstructured":"Mohamed, S., and Rezende, D.J. (2015). Variational information maximization for intrinsically motivated reinforcement learning. 
Advances in Neural Information Processing Systems, The MIT Press."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Neemuchwala, H., and Hero, A.O. (2005). Entropic graphs for registration. Multi-Sensor Image Fusion and its Applications, CRC Press Book.","DOI":"10.1201\/9781420026986.ch6"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1002\/ima.20079","article-title":"Image registration methods in high-dimensional space","volume":"16","author":"Neemuchwala","year":"2006","journal-title":"Int. J. Imaging Syst. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1214\/aos\/1018031112","article-title":"On the multivariate runs test","volume":"27","author":"Henze","year":"1999","journal-title":"Ann. Stat."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"988","DOI":"10.1109\/LSP.2014.2378514","article-title":"Empirical non-parametric estimation of the Fisher information","volume":"22","author":"Berisha","year":"2015","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1109\/TSP.2015.2477805","article-title":"Empirically estimable classification bounds based on a nonparametric divergence measure","volume":"64","author":"Berisha","year":"2016","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1109\/TCOM.1967.1089532","article-title":"The divergence and Bhattacharyya distance measures in signal selection","volume":"15","author":"Kailath","year":"1967","journal-title":"IEEE Trans. Commun. Technol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yasaei Sekeh, S., Oselio, B., and Hero, A.O. (2018, January 2\u20135). Multi-class Bayes error estimation with a global minimal spanning tree. 
Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA.","DOI":"10.1109\/ALLERTON.2018.8635642"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Noshad, M., Moon, K.R., Yasaei Sekeh, S., and Hero, A.O. (2017, January 25\u201330). Direct Estimation of Information Divergence Using Nearest Neighbor Ratios. Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany.","DOI":"10.1109\/ISIT.2017.8006659"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"March, W., Ram, P., and Gray, A. (2010, January 20\u201323). Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA.","DOI":"10.1145\/1835804.1835882"},{"key":"ref_20","first-page":"37","article-title":"O jist\u00e9m probl\u00e9mu minim\u00e1ln\u00edm","volume":"3","year":"1926","journal-title":"Pr\u00e1ce Moravsk\u00e9 P\u0159\u00edrodov\u011bdeck\u00e9 Spole\u010dnosti"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"066","DOI":"10.1103\/PhysRevE.69.066138","article-title":"Estimating mutual information","volume":"69","author":"Kraskov","year":"2004","journal-title":"Phys. Rev. E"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Moon, K.R., Sricharan, K., and Hero, A.O. (2017, January 25\u201330). Ensemble Estimation of Mutual Information. Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany.","DOI":"10.1109\/ISIT.2017.8007086"},{"key":"ref_23","unstructured":"Moon, K.R., and Hero, A.O. (2014, January 8\u201313). Multivariate f-divergence estimation with confidence. Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Moon, K.R., Sricharan, K., Greenewald, K., and Hero, A.O. 
(2016, January 10\u201315). Improving convergence of divergence functional ensemble estimators. Proceedings of the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain.","DOI":"10.1109\/ISIT.2016.7541476"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2153","DOI":"10.1214\/07-AOS539","article-title":"A class of R\u00e9nyi information estimators for multidimensional densities","volume":"36","author":"Leonenko","year":"2008","journal-title":"Ann. Stat."},{"key":"ref_26","unstructured":"Gao, S., Ver Steeg, G., and Galstyan, A. (2015, January 9\u201312). Efficient estimation of mutual information for strongly dependent variables. Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA."},{"key":"ref_27","unstructured":"P\u00e1l, D., P\u00f3czos, B., and Szepesv\u00e1ri, C. (2010, January 6\u20139). Estimation of R\u00e9nyi entropy and mutual information based on generalized nearest-neighbor graphs. Proceedings of the 24th Annual Conference on Neural Information Processing Systems 2010, Vancouver, BC, Canada."},{"key":"ref_28","unstructured":"Krishnamurthy, A., Kandasamy, K., P\u00f3czos, B., and Wasserman, L. (2014, January 21\u201326). Nonparametric estimation of R\u00e9nyi divergence and friends. Proceedings of the 31st International Conference on Machine Learning, Beijing, China."},{"key":"ref_29","unstructured":"Kandasamy, K., Krishnamurthy, A., P\u00f3czos, B., Wasserman, L., and Robins, J. (2015, January 7\u201312). Nonparametric von Mises estimators for entropies, divergences and mutual informations. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_30","unstructured":"Singh, S., and P\u00f3czos, B. (2016, January 5\u201310). Analysis of k nearest neighbor distances with application to entropy estimation. 
Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"80","DOI":"10.3390\/e15010080","article-title":"Machine learning with squared-loss mutual information","volume":"15","author":"Sugiyama","year":"2012","journal-title":"Entropy"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1109\/TSP.2004.831130","article-title":"Geodesic entropic graphs for dimension and entropy estimation in manifold learning","volume":"52","author":"Costa","year":"2004","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1214\/aos\/1176344722","article-title":"Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests","volume":"7","author":"Friedman","year":"1979","journal-title":"Ann. Stat."},{"key":"ref_34","first-page":"3","article-title":"On the estimation of the discrepancy between empirical curves of distribution for two independent samples","volume":"2","author":"Smirnov","year":"1939","journal-title":"Bull. Moscow Univ."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1214\/aoms\/1177731909","article-title":"On a test whether two samples are from the same population","volume":"11","author":"Wald","year":"1940","journal-title":"Ann. Math. Stat."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Noshad, M., and Hero, A.O. (2019, January 12\u201317). Scalable mutual information estimation using dependence graphs. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTAT), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683351"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Yasaei Sekeh, S., Oselio, B., and Hero, A.O. (2018, January 15\u201320). A Dimension-Independent discriminant between distributions. 
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8462306"},{"key":"ref_38","first-page":"299","article-title":"Information-type measures of difference of probability distributions and indirect observations","volume":"2","year":"1967","journal-title":"Studia Sci. Math. Hungar."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1111\/j.2517-6161.1966.tb00626.x","article-title":"A general class of coefficients of divergence of one distribution from another","volume":"28","author":"Ali","year":"1966","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_40","unstructured":"Cover, T., and Thomas, J. (1991). Elements of Information Theory, John Wiley & Sons. [1st ed.]."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"H\u00e4rdle, W. (1991). Applied Nonparametric Regression, Cambridge University Press.","DOI":"10.1017\/CCOL0521382483"},{"key":"ref_42","unstructured":"Lorentz, G.G. (1966). Approximation of Functions, Holt, Rinehart and Winston."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1006\/acha.1997.0219","article-title":"Characterization of pointwise H\u00f6lder regularity","volume":"4","author":"Andersson","year":"1997","journal-title":"Appl. Comput. Harmon. Anal."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Robins, G., and Salowe, J.S. (1994, January 6\u20138). On the maximum degree of minimum spanning trees. Proceedings of the SCG 94 Tenth Annual Symposium on Computational Geometry, Stony Brook, NY, USA.","DOI":"10.1145\/177424.177978"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1214\/aos\/1176345462","article-title":"The jackknife estimate of variance","volume":"9","author":"Efron","year":"1981","journal-title":"Ann. 
Stat."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Yasaei Sekeh, S., Noshad, M., Moon, K.R., and Hero, A.O. (2018). Convergence Rates for Empirical Estimation of Binary Classification Bounds. arXiv.","DOI":"10.3390\/e21121144"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Kingman, J.F.C. (1993). Poisson Processes, Oxford Univ. Press.","DOI":"10.1093\/oso\/9780198536932.001.0001"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Yukich, J.E. (1998). Probability Theory of Classical Euclidean Optimization, Springer. Vol. 1675 of Lecture Notes in Mathematics.","DOI":"10.1007\/BFb0093472"},{"key":"ref_49","unstructured":"Luenberger, D.G. (1969). Optimization by Vector Space Methods, Wiley-Interscience."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/8\/787\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:10:35Z","timestamp":1760188235000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/8\/787"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,12]]},"references-count":49,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2019,8]]}},"alternative-id":["e21080787"],"URL":"https:\/\/doi.org\/10.3390\/e21080787","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,8,12]]}}}