{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T16:23:43Z","timestamp":1775060623483,"version":"3.50.1"},"reference-count":299,"publisher":"Emerald","issue":"1-2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,6,28]]},"abstract":"<jats:p>A Hilbert space embedding of a distribution\u2014in short, a kernel mean embedding\u2014has recently emerged as a powerful tool for machine learning and statistical inference. The basic idea behind this framework is to map distributions into a reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel methods can be extended to probability measures. It can be viewed as a generalization of the original \u201cfeature map\u201d common to support vector machines (SVMs) and other kernel methods. In addition to the classical applications of kernel methods, the kernel mean embedding has found novel applications in fields ranging from probabilistic modeling to statistical inference, causal discovery, and deep learning. This survey aims to give a comprehensive review of existing work and recent advances in this research area, and to discuss challenging issues and open problems that could potentially lead to new research directions. The survey begins with a brief introduction to the RKHS and positive definite kernels which forms the backbone of this survey, followed by a thorough discussion of the Hilbert space embedding of marginal distributions, theoretical guarantees, and a review of its applications. The embedding of distributions enables us to apply RKHS methods to probability measures which prompts a wide range of applications such as kernel two-sample testing, independent testing, and learning on distributional data. Next, we discuss the Hilbert space embedding for conditional distributions, give theoretical insights, and review some applications. 
The conditional mean embedding enables us to perform sum, product, and Bayes' rules\u2014which are ubiquitous in graphical model, probabilistic inference, and reinforcement learning\u2014 in a non-parametric way using this new representation of distributions. We then discuss relationships between this framework and other related areas. Lastly, we give some suggestions on future research directions.<\/jats:p>","DOI":"10.1561\/2200000060","type":"journal-article","created":{"date-parts":[[2017,6,28]],"date-time":"2017-06-28T04:34:57Z","timestamp":1498624497000},"page":"1-141","source":"Crossref","is-referenced-by-count":283,"title":["Kernel Mean Embedding of Distributions: A Review and Beyond"],"prefix":"10.1561","volume":"10","author":[{"given":"Krikamol","family":"Muandet","sequence":"first","affiliation":[{"name":"Department of Mathematics, Faculty of Science, 272 Rama VI Road, Ratchathewi, Bangkok 10400, Thailand"},{"name":"Max Planck Institute for Intelligent Systems Empirical Inference Department, Spemannstra\u00dfe 38, T\u00fcbingen 72076,","place":["Germany"]}]},{"given":"Kenji","family":"Fukumizu","sequence":"additional","affiliation":[{"name":"Institute of Statistical Mathematics , 10-3 Midoricho, Tachikawa, Tokyo 190-8562,","place":["Japan"]}]},{"given":"Bharath","family":"Sriperumbudur","sequence":"additional","affiliation":[{"name":"Department of Statistics, Pennsylvania State University, University Park , 16802,","place":["PA, USA"]}]},{"given":"Bernhard","family":"Sch\u00f6lkopf","sequence":"additional","affiliation":[{"name":"Max Planck Institute for Intelligent Systems , Spemannstra\u00dfe 38, T\u00fcbingen 72076,","place":["Germany"]}]}],"member":"140","published-online":{"date-parts":[[2017,6,28]]},"reference":[{"key":"2026033012243668900_ref001","first-page":"263","volume-title":"Proceeding of 4th International Symposium on Independent Component Analysis and Blind Source Separation 
(ICA2003)","author":"Achard"},{"key":"2026033012243668900_ref002","volume-title":"Kernel Methods for Nonparametric Bayesian Inference of Probability Densities and Point Processes","author":"Adams","year":"2009"},{"key":"2026033012243668900_ref003","volume-title":"Sobolev spaces","author":"Adams","year":"2003"},{"key":"2026033012243668900_ref004","volume-title":"Theory of linear operators in Hilbert space","author":"Akhiezer"},{"issue":"7","key":"2026033012243668900_ref005","doi-asserted-by":"crossref","first-page":"3730","DOI":"10.1016\/j.csda.2007.12.013","article-title":"A test for the two-sample problem based on empirical characteristic functions","volume":"52","author":"Alba Fern\u00e0ndez","year":"2008","journal-title":"Computational Statistics & Data Analysis"},{"issue":"3","key":"2026033012243668900_ref006","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1561\/2200000036","article-title":"Kernels for vector-valued functions: A review","volume":"4","author":"","year":"2012","journal-title":"Foundation and Trends in Machine Learning"},{"key":"2026033012243668900_ref007","first-page":"285","volume-title":"Statistical Signal Processing Workshop","author":"Anderson","year":"2011"},{"issue":"1","key":"2026033012243668900_ref008","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1006\/jmva.1994.1033","article-title":"Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates","volume":"50","author":"Anderson","year":"1994","journal-title":"Journal of Multivariate Analysis"},{"issue":"4","key":"2026033012243668900_ref009","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1007\/s11222-008-9110-y","article-title":"A tutorial on adaptive MCMC","volume":"18","author":"Andrieu","year":"2008","journal-title":"Statistics and 
Computing"},{"issue":"3","key":"2026033012243668900_ref010","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1090\/S0002-9947-1950-0051437-7","article-title":"Theory of reproducing kernels","volume":"68","author":"Aronszajn","year":"1950","journal-title":"Transactions of the American Mathematical Society"},{"issue":"4","key":"2026033012243668900_ref011","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1111\/j.1467-842X.2004.00360.x","article-title":"Partial correlation and conditional correlation as measures of conditional independence","volume":"46","author":"Baba","year":"2004","journal-title":"Australian & New Zealand Journal of Statistics"},{"key":"2026033012243668900_ref012","first-page":"185","article-title":"Sharp analysis of low-rank kernel matrix approximations","volume":"30","author":"Bach","year":"2013","journal-title":"The 26th Annual Conference on Learning Theory"},{"key":"2026033012243668900_ref013","author":"Bach","year":"2015"},{"key":"2026033012243668900_ref014","first-page":"1","article-title":"Kernel independent component analysis","volume":"3","author":"Bach","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref015","first-page":"1359","article-title":"On the equivalence between herding and conditional gradient algorithms","author":"Bach","year":"2012","journal-title":"Proceedings of the 29th International Conference on Machine Learning"},{"issue":"2","key":"2026033012243668900_ref016","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1137\/0119044","article-title":"Mutual information for Gaussian processes","volume":"19","author":"Baker","year":"1970","journal-title":"SIAM Journal on Applied Mathematics"},{"key":"2026033012243668900_ref017","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1090\/S0002-9947-1973-0336795-3","article-title":"Joint measures and cross-covariance operators","volume":"186","author":"Baker","year":"1973","journal-title":"Transactions of 
the American Mathematical Society"},{"issue":"1","key":"2026033012243668900_ref018","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1016\/j.jco.2006.07.001","article-title":"On regularization algorithms in learning theory","volume":"23","author":"Bauer","year":"2007","journal-title":"Journal of Complexity"},{"issue":"6","key":"2026033012243668900_ref019","doi-asserted-by":"crossref","first-page":"1554","DOI":"10.1214\/aoms\/1177699147","article-title":"Statistical inference for probabilistic functions of finite state Markov chains","volume":"37","author":"Baum","year":"1966","journal-title":"Annals of Mathematical Statistics"},{"issue":"3","key":"2026033012243668900_ref020","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1007\/s00357-011-9092-x","article-title":"On the Schoenberg transformations in data analysis: Theory and illustrations","volume":"28","author":"Bavaud","year":"2011","journal-title":"Journal of Classification"},{"key":"2026033012243668900_ref021","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1512\/iumj.1957.6.56038","article-title":"A Markovian decision process","volume":"6","author":"Bellman","year":"1957","journal-title":"Indiana University Mathematics Journal"},{"key":"2026033012243668900_ref022","author":"Bellman","year":"2003","journal-title":"Dynamic Programming"},{"issue":"3","key":"2026033012243668900_ref023","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1016\/0047-259X(83)90018-0","article-title":"Estimating the mean function of a Gaussian process and the Stein effect","volume":"13","author":"Berger","year":"1983","journal-title":"Journal of Multivariate Analysis"},{"key":"2026033012243668900_ref024","volume-title":"EURANDOM-report","author":"Bergsma","year":"2004"},{"key":"2026033012243668900_ref025","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4419-9096-9","volume-title":"Reproducing Kernel Hilbert Spaces in Probability and 
Statistics","author":"Berlinet","year":"2004"},{"key":"2026033012243668900_ref026","first-page":"2535","volume-title":"Advances in Neural Information Processing Systems 26","author":"Besserve","year":"2013"},{"issue":"1","key":"2026033012243668900_ref027","first-page":"99","article-title":"On a measure of divergence between two statistical populations defined by their probability distributions","volume":"35","author":"Bhattacharyya","year":"1943","journal-title":"Bulletin of Calcutta Mathematical Society"},{"issue":"11","key":"2026033012243668900_ref028","doi-asserted-by":"crossref","first-page":"3965","DOI":"10.1109\/TIT.2005.856979","article-title":"On the asymptotic properties of a nonparametric l\/sub 1\/-test statistic of homogeneity","volume":"51","author":"Biau","year":"2005","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012243668900_ref029","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop","year":"2006"},{"key":"2026033012243668900_ref030","first-page":"2178","volume-title":"Advances in Neural Information Processing Systems","author":"Blanchard","year":"2011"},{"issue":"3","key":"2026033012243668900_ref031","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/biomet\/63.3.573","article-title":"Some properties of incomplete U-statistics","volume":"63","author":"Blom","year":"1976","journal-title":"Biometrika"},{"key":"2026033012243668900_ref032","first-page":"52","volume-title":"Proceedings of the International Conference on Subspace, Latent Structure and Feature Selection","author":"Blum","year":"2005"},{"issue":"1","key":"2026033012243668900_ref033","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1007\/BF01452844","article-title":"Monotone funktionen, Stieltjessche integrale und harmonische analyse","volume":"108","author":"Bochner","year":"1933","journal-title":"Mathematische 
Annalen"},{"key":"2026033012243668900_ref034","first-page":"92","author":"Boots","year":"2013","journal-title":"Proceedings of the 29th International Conference on Uncertainty in Artificial Intelligence"},{"issue":"14","key":"2026033012243668900_ref035","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1093\/bioinformatics\/btl242","article-title":"Integrating structured biological data by kernel maximum mean discrepancy","volume":"22","author":"Borgwardt","year":"2006","journal-title":"Bioinformatics"},{"key":"2026033012243668900_ref036","first-page":"144","volume-title":"Proceedings of the 5th Annual Workshop on Computational Learning Theory","author":"Boser","year":"1992"},{"key":"2026033012243668900_ref037","first-page":"352","volume-title":"Advances in Neural Information Processing Systems 29","author":"Bouchacourt","year":"2016"},{"key":"2026033012243668900_ref038","first-page":"20","volume-title":"Proceedings of the 32nd International Conference on Machine Learning","author":"Bounliphone","year":"2015"},{"key":"2026033012243668900_ref039","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Bounliphone","year":"2016"},{"key":"2026033012243668900_ref040","first-page":"121","volume-title":"Data Mining and Knowledge Discovery","author":"Burges","year":"1998"},{"key":"2026033012243668900_ref041","first-page":"331","volume-title":"Foundations of Computational Mathematics","author":"Caponnetto","year":"2007"},{"key":"2026033012243668900_ref042","first-page":"1615","article-title":"Universal multi-task kernels","volume":"9","author":"Caponnetto","year":"2008","journal-title":"Journal of Machine Learning Research"},{"issue":"04","key":"2026033012243668900_ref043","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1142\/S0219530506000838","volume":"04","author":"Carmeli","year":"2006","journal-title":"Analysis and 
Applications"},{"key":"2026033012243668900_ref044","first-page":"416","volume-title":"Advances in Neural Information Processing Systems 13","author":"Chapelle","year":"2001"},{"key":"2026033012243668900_ref045","first-page":"109","volume-title":"Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence","author":"Chen","year":"2010"},{"issue":"7","key":"2026033012243668900_ref046","doi-asserted-by":"crossref","first-page":"1484","DOI":"10.1162\/NECO_a_00599","volume":"26","author":"Chen","year":"2014","journal-title":"Neural Computation"},{"key":"2026033012243668900_ref047","first-page":"406","volume-title":"Advances in Neural Information Processing Systems","author":"Christmann","year":"2010"},{"key":"2026033012243668900_ref048","first-page":"1422","article-title":"A kernel independence test for random processes","volume":"32","author":"Chwialkowski","year":"2014","journal-title":"Proceedings of The 31st International Conference on Machine Learning"},{"key":"2026033012243668900_ref049","first-page":"3608","volume-title":"Advances in Neural Information Processing Systems 27","author":"Chwialkowski","year":"2014"},{"key":"2026033012243668900_ref050","first-page":"1972","volume-title":"Advances in Neural Information Processing Systems 28","author":"Chwialkowski","year":"2015"},{"key":"2026033012243668900_ref051","first-page":"2606","volume-title":"Proceedings of the 33nd International Conference on Machine Learning","author":"Chwialkowski","year":"2016"},{"issue":"3","key":"2026033012243668900_ref052","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1023\/A:1022627411411","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Machine Learning"},{"key":"2026033012243668900_ref053","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1145\/1102351.1102371","volume-title":"Proceedings of the 22nd International Conference on Machine 
Learning","author":"Cortes","year":"2005"},{"key":"2026033012243668900_ref054","first-page":"5274","author":"Cruz Cort\u00e9s","year":"2014","journal-title":"IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP)"},{"issue":"1","key":"2026033012243668900_ref055","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1090\/S0273-0979-01-00923-5","article-title":"On the mathematical foundations of learning","volume":"39","author":"Cucker","year":"2002","journal-title":"Bulletin of the American Mathematical Society"},{"key":"2026033012243668900_ref056","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511618796","volume-title":"Learning Theory: An approximation theory viewpoint","author":"Cucker","year":"2007"},{"key":"2026033012243668900_ref057","first-page":"1169","article-title":"Semigroup kernels on measures","volume":"6","author":"Cuturi","year":"2005","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref058","first-page":"223","article-title":"Testing hypotheses by regularized maximum mean discrepancy","volume":"3","author":"Danafar","year":"2014","journal-title":"International Journal of Computer and Information Technology"},{"issue":"1","key":"2026033012243668900_ref059","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1002\/rsa.10073","article-title":"An elementary proof of a theorem of Johnson and Lindenstrauss","volume":"22","author":"Dasgupta","year":"2003","journal-title":"Random Structures & Algorithms"},{"issue":"3","key":"2026033012243668900_ref060","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1093\/biomet\/67.3.581","article-title":"Partial association measures and an application to qualitative regression","volume":"67","author":"Daudin","year":"1980","journal-title":"Biometrika"},{"key":"2026033012243668900_ref061","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511618864","volume-title":"Linear Operators and Their 
Spectra","author":"Davies","year":"2007"},{"issue":"1","key":"2026033012243668900_ref062","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1979.tb01052.x","article-title":"Conditional independence in statistical theory","volume":"41","author":"Dawid","year":"1979","journal-title":"Journal of the Royal Statistical Society. Series B (Methodological)"},{"key":"2026033012243668900_ref063","volume-title":"Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems","author":"Dayan","year":"2005"},{"key":"2026033012243668900_ref064","author":"De Vito","year":"2006"},{"key":"2026033012243668900_ref065","first-page":"551","volume-title":"Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Dhillon","year":"2004"},{"key":"2026033012243668900_ref066","doi-asserted-by":"crossref","DOI":"10.1090\/surv\/015","volume-title":"Vector Measures","author":"Diestel","year":"1977"},{"key":"2026033012243668900_ref067","doi-asserted-by":"crossref","DOI":"10.1002\/9781118033012","volume-title":"Vector Integration and Stochastic Integration in Banach Spaces","author":"Dinculeanu","year":"2000"},{"key":"2026033012243668900_ref068","first-page":"1660","volume-title":"Proceedings of the 27th AAAI Conference on Artificial Intelligence","author":"Doran","year":"2013"},{"key":"2026033012243668900_ref069","first-page":"132","article-title":"A permutation-based kernel conditional independence test","author":"Doran","year":"2014","journal-title":"30th Conference on Uncertainty in Artificial Intelligence"},{"key":"2026033012243668900_ref070","volume-title":"Pattern Classification and Scene Analysis","author":"Duda","year":"1973"},{"key":"2026033012243668900_ref071","first-page":"1","volume-title":"Proceedings of the 5th International Conference on Theory and Applications of Models of Computation 
(TAMC2008)","author":"Dwork","year":"2008"},{"key":"2026033012243668900_ref072","first-page":"258","volume-title":"Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence","author":"Dziugaite","year":"2015"},{"key":"2026033012243668900_ref073","first-page":"154","volume-title":"Proceedings of the 20th International Conference on Machine Learning","author":"Engel","year":"2003"},{"key":"2026033012243668900_ref074","doi-asserted-by":"crossref","DOI":"10.1007\/978-94-009-1740-8","volume-title":"Regularization of Inverse Problems","author":"Engl","year":"1996"},{"key":"2026033012243668900_ref075","first-page":"154","volume-title":"Advances in Neural Information Processing Systems 27","author":"Eslami","year":"2014"},{"issue":"1","key":"2026033012243668900_ref076","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1214\/aos\/1176343742","article-title":"The empirical characteristic function and its applications","volume":"5","author":"Feuerverger","year":"1977","journal-title":"The Annals of Statistics"},{"key":"2026033012243668900_ref077","first-page":"243","article-title":"Efficient SVM training using low-rank kernel representations","volume":"2","author":"Fine","year":"2001","journal-title":"Journal of Machine Learning Research"},{"issue":"2","key":"2026033012243668900_ref078","first-page":"22:1","article-title":"Gaussian processes for independence tests with non-iid data in causal inference","volume":"7","author":"Flaxman","year":"2015","journal-title":"ACM Transactions on Intelligent Systems and Technology"},{"key":"2026033012243668900_ref079","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1145\/2783258.2783300","volume-title":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Flaxman","year":"2015"},{"key":"2026033012243668900_ref080","first-page":"182","volume-title":"Proceedings of the 32nd Conference on Uncertainty in Artificial 
Intelligence","author":"Flaxman","year":"2016"},{"key":"2026033012243668900_ref081","volume-title":"Real analysis","author":"Folland","year":"1999"},{"key":"2026033012243668900_ref082","first-page":"1","volume-title":"Modern Methodology and Applications in Spatial-Temporal Modeling","author":"Fukumizu","year":"2015"},{"key":"2026033012243668900_ref083","first-page":"73","article-title":"Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces","volume":"5","author":"Fukumizu","year":"2004","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref084","first-page":"361","article-title":"Statistical consistency of kernel canonical correlation analysis","volume":"8","author":"Fukumizu","year":"2007","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref085","first-page":"489","volume-title":"Advances in Neural Information Processing Systems 20","author":"Fukumizu","year":"2008"},{"issue":"4","key":"2026033012243668900_ref086","doi-asserted-by":"crossref","first-page":"1871","DOI":"10.1214\/08-AOS637","article-title":"Kernel dimension reduction in regression","volume":"37","author":"Fukumizu","year":"2009","journal-title":"Annals of Statistics"},{"key":"2026033012243668900_ref087","first-page":"473","volume-title":"Advances in Neural Information Processing Systems 21","author":"Fukumizu","year":"2009"},{"key":"2026033012243668900_ref088","first-page":"1737","volume-title":"Advances in Neural Information Processing Systems","author":"Fukumizu","year":"2011"},{"key":"2026033012243668900_ref089","first-page":"3753","article-title":"Kernel Bayes' rule: Bayesian inference with positive definite kernels","volume":"14","author":"Fukumizu","year":"2013","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"2026033012243668900_ref090","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1145\/959242.959248","article-title":"A survey of kernels for structured 
data","volume":"5","author":"G\u00e4rtner","year":"2003","journal-title":"SIGKDD Explorations Newsletter"},{"key":"2026033012243668900_ref091","first-page":"179","volume-title":"In Proceeding of the 19th International Conference on Machine Learning","author":"Gartner","year":"2002"},{"key":"2026033012243668900_ref092","first-page":"299","article-title":"Classes of kernels for machine learning: A statistics perspective","volume":"2","author":"Genton","year":"2002","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref093","first-page":"529","volume-title":"Advances in Neural Information Processing Systems 15","author":"Girard","year":"2002"},{"issue":"1\u20131","key":"2026033012243668900_ref094","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1109\/TGRS.2009.2026425","article-title":"Mean map kernel methods for semisupervised cloud classification","volume":"48","author":"Gomez-Chova","year":"2010","journal-title":"IEEE Transaction on Geoscience and Remote Sensing"},{"key":"2026033012243668900_ref095","first-page":"2672","volume-title":"Advances in Neural Information Processing Systems 27","author":"Goodfellow","year":"2014"},{"key":"2026033012243668900_ref096","author":"Goodman","year":"2011","journal-title":"Psychological Review"},{"key":"2026033012243668900_ref097","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1145\/2593882.2593900","volume-title":"Proceedings of the on Future of Software Engineering","author":"Gordon","year":"2014"},{"key":"2026033012243668900_ref098","unstructured":"A. Gretton. Reproducing kernel Hilbert spaces in machine learning. 
http:\/\/www.gatsby.ucl.ac.uk\/\u02dcgretton\/coursefiles\/rkhscourse.html, 2016."},{"key":"2026033012243668900_ref099","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/11564089_7","volume-title":"Proceedings of the 16th International Conference on Algorithmic Learning Theory","author":"Gretton","year":"2005"},{"key":"2026033012243668900_ref100","first-page":"2075","article-title":"Kernel methods for measuring independence","volume":"6","author":"Gretton","year":"2005","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref101","first-page":"513","volume-title":"Advances in Neural Information Processing Systems","author":"Gretton","year":"2007"},{"key":"2026033012243668900_ref102","first-page":"131","volume-title":"Covariate Shift by Kernel Mean Matching","author":"Gretton","year":"2009"},{"key":"2026033012243668900_ref103","first-page":"723","article-title":"A kernel two-sample test","volume":"13","author":"Gretton","year":"2012","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref104","first-page":"1214","volume-title":"Advances in Neural Information Processing Systems","author":"Gretton","year":"2012"},{"key":"2026033012243668900_ref105","first-page":"535","volume-title":"Proceedings of the 29th International Conference on Machine Learning","author":"Grunewalder","year":"2012"},{"key":"2026033012243668900_ref106","first-page":"1823","volume-title":"Proceedings of the 29th International Conference on Machine Learning","author":"Grunewalder","year":"2012"},{"key":"2026033012243668900_ref107","first-page":"1184","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Grunewalder","year":"2013"},{"key":"2026033012243668900_ref108","author":"Guevara","year":"2015","journal-title":"ODDx3 Workshop on Outlier Definition, Detection, and Description at the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data 
Mining"},{"key":"2026033012243668900_ref109","unstructured":"I. Guyon. Cause-effect pairs Kaggle competition, 2013. URL https:\/\/www.kaggle.com\/c\/cause-effect-pairs\/."},{"key":"2026033012243668900_ref110","unstructured":"I. Guyon. ChaLearn fast causation coefficient challenge, 2014. URL https:\/\/www.codalab.org\/competitions\/1381."},{"key":"2026033012243668900_ref111","first-page":"1157","article-title":"An introduction to variable and feature selection","volume":"3","author":"Guyon","year":"2003","journal-title":"Journal of Machine Learning Research"},{"issue":"2","key":"2026033012243668900_ref112","doi-asserted-by":"crossref","first-page":"223","DOI":"10.2307\/3318737","article-title":"An adaptive Metropolis algorithm","volume":"7","author":"Haario","year":"2001","journal-title":"Bernoulli"},{"key":"2026033012243668900_ref113","author":"Hammersley","year":"1971"},{"key":"2026033012243668900_ref114","first-page":"609","volume-title":"Advances in Neural Information Processing Systems 20","author":"Harchaoui","year":"2007"},{"key":"2026033012243668900_ref115","first-page":"609","volume-title":"Advances in Neural Information Processing Systems 21","author":"Harchaoui","year":"2009"},{"key":"2026033012243668900_ref116","first-page":"1665","volume-title":"IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Harchaoui","year":"2009"},{"key":"2026033012243668900_ref117","first-page":"1083","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"Harmeling","year":"2013"},{"key":"2026033012243668900_ref118","author":"Haussler","year":"1999"},{"key":"2026033012243668900_ref119","author":"Hein","year":"2004"},{"key":"2026033012243668900_ref120","first-page":"136","author":"Hein","year":"2005","journal-title":"Proceedings of the 12th International Conference on Artificial Intelligence and 
Statistics"},{"issue":"2","key":"2026033012243668900_ref121","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1214\/aoms\/1177728261","article-title":"The efficiency of some nonparametric competitors of the t-test","volume":"27","author":"Hodges","year":"1956","journal-title":"The Annals of Mathematical Statistics"},{"issue":"3","key":"2026033012243668900_ref122","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1214\/aoms\/1177730196","article-title":"A class of statistics with asymptotically normal distribution","volume":"19","author":"Hoeffding","year":"1948","journal-title":"The Annals of Mathematical Statistics"},{"issue":"3","key":"2026033012243668900_ref123","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1214\/009053607000000677","article-title":"Kernel methods in machine learning","volume":"36","author":"Hofmann","year":"2008","journal-title":"The Annals of Statistics"},{"issue":"6","key":"2026033012243668900_ref124","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1037\/h0071325","article-title":"Analysis of a complex of statistical variables into principal components","volume":"24","author":"Hotelling","year":"1933","journal-title":"Journal of Educational Psychology"},{"issue":"5","key":"2026033012243668900_ref125","doi-asserted-by":"crossref","first-page":"1460","DOI":"10.1016\/j.jcss.2011.12.025","article-title":"A spectral algorithm for learning hidden Markov models","volume":"78","author":"Hsu","year":"2012","journal-title":"Journal of Computer and System Sciences"},{"key":"2026033012243668900_ref126","first-page":"601","volume-title":"Advances in Neural Information Processing Systems","author":"Huang","year":"2007"},{"issue":"1","key":"2026033012243668900_ref127","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1214\/aoms\/1177703732","article-title":"Robust estimation of a location parameter","volume":"35","author":"Huber","year":"1964","journal-title":"The Annals of Mathematical 
Statistics"},{"key":"2026033012243668900_ref128","first-page":"377","author":"Huszar","year":"2012","journal-title":"Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence"},{"key":"2026033012243668900_ref129","first-page":"695","article-title":"Estimation of non-normalized statistical models by score matching","volume":"6","author":"Hyvarinen","year":"2005","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref130","first-page":"256","article-title":"Particle belief propagation","volume":"5","author":"Ihler","year":"2009","journal-title":"International Conference on Artificial Intelligence and Statistics"},{"key":"2026033012243668900_ref131","first-page":"487","author":"Jaakkola","year":"1998","journal-title":"Advances in Neural Information Processing Systems 11"},{"issue":"6","key":"2026033012243668900_ref132","doi-asserted-by":"crossref","first-page":"1371","DOI":"10.1162\/089976600300015411","article-title":"Observable operator models for discrete stochastic time series","volume":"12","author":"Jaeger","year":"2000","journal-title":"Neural Computation"},{"key":"2026033012243668900_ref133","first-page":"361","author":"James","year":"1961","journal-title":"Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability"},{"key":"2026033012243668900_ref134","first-page":"383","author":"Janzing","year":"2011","journal-title":"Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence"},{"key":"2026033012243668900_ref135","author":"Janzing","year":"2013","journal-title":"The Annals of Statistics"},{"key":"2026033012243668900_ref136","first-page":"819","article-title":"Probability product kernels","volume":"5","author":"Jebara","year":"2004","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref137","first-page":"144","volume":"5803","author":"Jegelka","year":"2009","journal-title":"KI 2009: AI and Automation, Lecture Notes in Computer 
Science"},{"key":"2026033012243668900_ref138","first-page":"405","author":"Jitkrittum","year":"2015","journal-title":"Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence"},{"key":"2026033012243668900_ref139","first-page":"181","author":"Jitkrittum","year":"2016","journal-title":"Advances in Neural Information Processing Systems 29"},{"key":"2026033012243668900_ref140","first-page":"457","volume-title":"Proceedings of the 17th International Conference on Artificial Intelligence and Statistics","author":"Kanagawa","year":"2014"},{"key":"2026033012243668900_ref141","author":"Kanagawa","year":"2013"},{"issue":"2","key":"2026033012243668900_ref142","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1162\/NECO_a_00806","article-title":"Filtering with state-observation examples via kernel Monte Carlo filter","volume":"28","author":"Kanagawa","year":"2016","journal-title":"Neural Computation"},{"key":"2026033012243668900_ref143","first-page":"3288","volume-title":"Advances in Neural Information Processing Systems 29","author":"Kanagawa","year":"2016"},{"key":"2026033012243668900_ref144","volume-title":"Consistent Testing of Total Independence Based on the Empirical Characteristic Function","author":"Kankainen","year":"1995"},{"issue":"5","key":"2026033012243668900_ref145","doi-asserted-by":"crossref","first-page":"1486","DOI":"10.1007\/BF02362283","article-title":"A consistent modification of a test for independence based on the empirical characteristic function","volume":"89","author":"Kankainen","year":"1998","journal-title":"Journal of Mathematical Sciences"},{"key":"2026033012243668900_ref146","first-page":"583","article-title":"Random feature maps for dot product kernels","volume":"22","author":"Kar","year":"2012","journal-title":"Proceedings of the 15th International Conference on Artificial Intelligence and Statistics"},{"key":"2026033012243668900_ref147","first-page":"2280","volume-title":"Advances in Neural Information Processing 
Systems 29","author":"Kim","year":"2016"},{"key":"2026033012243668900_ref148","first-page":"2529","article-title":"Robust kernel density estimation","volume":"13","author":"Kim","year":"2012","journal-title":"Journal of Machine Learning Research"},{"issue":"9","key":"2026033012243668900_ref149","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1109\/TPAMI.2005.181","article-title":"Iterative kernel principal component analysis for image modeling","volume":"27","author":"Kim","year":"2005","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2026033012243668900_ref150","first-page":"361","volume-title":"Proceedings of the 20th International Conference on Machine Learning","author":"Kondor","year":"2003"},{"key":"2026033012243668900_ref151","first-page":"729","volume-title":"Advances in Neural Information Processing Systems 24","author":"Kpotufe","year":"2011"},{"key":"2026033012243668900_ref152","volume-title":"Introductory Functional Analysis with Application","author":"Kreyszig","year":"1978"},{"issue":"2-3","key":"2026033012243668900_ref153","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1561\/2200000044","article-title":"Determinantal point processes for machine learning","volume":"5","author":"Kulesza","year":"2012","journal-title":"Foundations and Trends in Machine Learning"},{"issue":"6","key":"2026033012243668900_ref154","doi-asserted-by":"crossref","first-page":"1517","DOI":"10.1109\/TNN.2004.837781","article-title":"The pre-image problem in kernel methods","volume":"15","author":"Kwok","year":"2004","journal-title":"IEEE Transactions on Neural Networks"},{"key":"2026033012243668900_ref155","first-page":"544","article-title":"Sequential kernel herding: Frank-Wolfe optimization for particle filtering","volume":"38","author":"Lacoste-Julien","year":"2015","journal-title":"Proceedings of the 18th International Conference on Artificial Intelligence and 
Statistics"},{"key":"2026033012243668900_ref156","first-page":"244","article-title":"Fastfood-approximating kernel expansions in loglinear time","volume":"28","author":"Le","year":"2013","journal-title":"Proceedings of the 30th International Conference on Machine Learning"},{"key":"2026033012243668900_ref157","first-page":"1718","article-title":"Generative moment matching networks","volume":"37","author":"Li","year":"2015","journal-title":"Proceedings of the 32nd International Conference on Machine Learning"},{"key":"2026033012243668900_ref158","first-page":"76","volume-title":"Proceedings of the 33rd International Conference on Machine Learning","author":"Liu","year":"2016"},{"key":"2026033012243668900_ref159","first-page":"829","volume-title":"Advances in Neural Information Processing Systems 28","author":"Lloyd","year":"2015"},{"key":"2026033012243668900_ref160","first-page":"97","article-title":"Learning transferable features with deep adaptation networks","volume":"37","author":"Long","year":"2015","journal-title":"Proceedings of the 32nd International Conference on Machine Learning"},{"key":"2026033012243668900_ref161","first-page":"136","volume-title":"Advances in Neural Information Processing Systems 29","author":"Long","year":"2016"},{"key":"2026033012243668900_ref162","first-page":"1452","article-title":"Towards a learning theory of cause-effect inference","volume":"37","author":"Lopez-Paz","year":"2015","journal-title":"Proceedings of the 32nd International Conference on Machine Learning"},{"issue":"1","key":"2026033012243668900_ref163","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1214\/aos\/1176350264","article-title":"Admissibility as a touchstone","volume":"15","author":"Mandelbaum","year":"1987","journal-title":"Annals of Statistics"},{"key":"2026033012243668900_ref164","first-page":"935","article-title":"Nonextensive information theoretic kernels on measures","volume":"10","author":"Martins","year":"2009","journal-title":"Journal of 
Machine Learning Research"},{"key":"2026033012243668900_ref165","first-page":"2845","volume-title":"IEEE International Conference on Robotics and Automation","author":"McCalman","year":"2013"},{"key":"2026033012243668900_ref166","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4612-0603-3","volume-title":"An Introduction to Banach Space Theory","author":"Megginson","year":"1998"},{"key":"2026033012243668900_ref167","volume-title":"CoRR","author":"Mehta","year":"2010"},{"issue":"441-458","key":"2026033012243668900_ref168","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1098\/rsta.1909.0016","article-title":"Functions of positive and negative type, and their connection with the theory of integral equations","volume":"209","author":"Mercer","year":"1909","journal-title":"Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences"},{"issue":"1","key":"2026033012243668900_ref169","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1162\/0899766052530802","article-title":"On learning vector-valued functions","volume":"17","author":"Micchelli","year":"2005","journal-title":"Neural Computation"},{"key":"2026033012243668900_ref170","first-page":"362","volume-title":"Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence","author":"Minka","year":"2001"},{"key":"2026033012243668900_ref171","volume-title":"Foundations of Machine Learning","author":"Mohri","year":"2012"},{"issue":"32","key":"2026033012243668900_ref172","first-page":"1","article-title":"Distinguishing cause from effect using observational data: Methods and benchmarks","volume":"17","author":"Mooij","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref173","first-page":"1385","volume-title":"Advances in Neural Information Processing Systems 16","author":"Moreno","year":"2004"},{"key":"2026033012243668900_ref174","volume-title":"From Points to Probability Measures: Statistical 
Learning on Distributions with Kernel Mean Embedding","author":"Muandet","year":"2015"},{"key":"2026033012243668900_ref175","first-page":"449","volume-title":"Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence","author":"Muandet","year":"2013"},{"key":"2026033012243668900_ref176","first-page":"10","volume-title":"Advances in Neural Information Processing Systems","author":"Muandet","year":"2012"},{"key":"2026033012243668900_ref177","first-page":"10","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Muandet","year":"2013"},{"key":"2026033012243668900_ref178","first-page":"10","article-title":"Kernel mean estimation and Stein effect","volume":"32","author":"Muandet","year":"2014","journal-title":"Proceedings of The 31st International Conference on Machine Learning"},{"key":"2026033012243668900_ref179","first-page":"10","volume-title":"Advances in Neural Information Processing Systems 27","author":"Muandet","year":"2014"},{"issue":"48","key":"2026033012243668900_ref180","first-page":"1","article-title":"Kernel mean shrinkage estimators","volume":"17","author":"Muandet","year":"2016","journal-title":"Journal of Machine Learning Research"},{"issue":"2","key":"2026033012243668900_ref181","doi-asserted-by":"crossref","first-page":"429","DOI":"10.2307\/1428011","article-title":"Integral probability metrics and their generating classes of functions","volume":"29","author":"Muller","year":"1997","journal-title":"Advances in Applied Probability"},{"key":"2026033012243668900_ref182","volume-title":"Machine Learning: A Probabilistic Perspective","author":"Murphy","year":"2012"},{"key":"2026033012243668900_ref183","author":"Neal","year":"1993"},{"key":"2026033012243668900_ref184","first-page":"1089","volume-title":"Advances in Neural Information Processing Systems 20","author":"Nguyen","year":"2007"},{"issue":"180","key":"2026033012243668900_ref185","first-page":"1","article-title":"Characteristic kernels 
and infinitely divisible distributions","volume":"17","author":"Nishiyama","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref186","first-page":"644","volume-title":"Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence","author":"Nishiyama","year":"2012"},{"key":"2026033012243668900_ref187","volume-title":"Journal of the Royal Statistical Society: Series B (Statistical Methodology)","author":"Oates","year":"2016"},{"key":"2026033012243668900_ref188","first-page":"1049","article-title":"Distribution to distribution regression","author":"Oliva","year":"2013","journal-title":"Proceedings of the 30th International Conference on Machine Learning"},{"key":"2026033012243668900_ref189","first-page":"706","article-title":"Fast distribution to real regression","volume":"33","author":"Oliva","year":"2014","journal-title":"Proceedings of the 17th International Conference on Artificial Intelligence and Statistics"},{"issue":"2","key":"2026033012243668900_ref190","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1109\/TNN.2010.2091281","article-title":"Domain adaptation via transfer component analysis","volume":"22","author":"Pan","year":"2011","journal-title":"IEEE Transactions on Neural Networks"},{"key":"2026033012243668900_ref191","first-page":"398","volume-title":"Proceedings of the 19th International Conference on Artificial Intelligence and Statistics","author":"Park","year":"2016"},{"key":"2026033012243668900_ref192","volume-title":"Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference","author":"Pearl","year":"1988"},{"key":"2026033012243668900_ref193","volume-title":"Causality: Models, Reasoning, and Inference","author":"Pearl","year":"2000"},{"issue":"6","key":"2026033012243668900_ref194","first-page":"559","volume":"2","author":"Pearson","year":"1901","journal-title":"Philosophical 
Magazine"},{"key":"2026033012243668900_ref195","first-page":"1846","volume-title":"Advances in Neural Information Processing Systems 28","author":"Pennington","year":"2015"},{"key":"2026033012243668900_ref196","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1145\/2487575.2487591","volume-title":"Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Pham","year":"2013"},{"key":"2026033012243668900_ref197","first-page":"599","volume-title":"Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence","author":"P\u00f3czos","year":"2011"},{"key":"2026033012243668900_ref198","first-page":"2989","volume-title":"IEEE Conference on Computer Vision and Pattern Recognition","author":"P\u00f3czos","year":"2012"},{"key":"2026033012243668900_ref199","first-page":"507","volume":"31","author":"P\u00f3czos","year":"2013","journal-title":"Proceedings of the 16th International Conference on Artificial Intelligence and Statistics"},{"key":"2026033012243668900_ref200","first-page":"2329","article-title":"Point-based value iteration for continuous POMDPs","volume":"7","author":"Porta","year":"2006","journal-title":"Journal of Machine Learning Research"},{"issue":"5","key":"2026033012243668900_ref201","doi-asserted-by":"crossref","first-page":"2531","DOI":"10.1214\/07-AOS540","article-title":"Stein estimation for the drift of Gaussian processes using the Malliavin calculus","volume":"36","author":"Privault","year":"2008","journal-title":"Annals of Statistics"},{"key":"2026033012243668900_ref202","first-page":"1289","volume-title":"Advances in Neural Information Processing Systems 21","author":"Quadrianto","year":"2009"},{"key":"2026033012243668900_ref203","first-page":"388","volume-title":"Advances in Neural Information Processing Systems 27","author":"Quang","year":"2014"},{"key":"2026033012243668900_ref204","first-page":"1177","volume-title":"Advances in Neural Information Processing Systems 
20","author":"Rahimi","year":"2007"},{"key":"2026033012243668900_ref205","first-page":"3777","volume-title":"Proceedings of the 2015 International Joint Conference on Artificial Intelligence","author":"Ramdas","year":"2015"},{"key":"2026033012243668900_ref206","volume-title":"CoRR","author":"Ramdas","year":"2015"},{"key":"2026033012243668900_ref207","first-page":"3571","volume-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence","author":"Ramdas","year":"2015"},{"key":"2026033012243668900_ref208","first-page":"489","volume-title":"Advances in Neural Information Processing Systems 15","author":"Rasmussen","year":"2002"},{"key":"2026033012243668900_ref209","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/3206.001.0001","volume-title":"Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning)","author":"Rasmussen","year":"2005"},{"key":"2026033012243668900_ref210","first-page":"772","volume-title":"Proceedings of the 18th International Conference on Artificial Intelligence and Statistics","author":"Reddi","year":"2015"},{"key":"2026033012243668900_ref211","volume":"1","author":"Reed","year":"1981","journal-title":"Functional Analysis, Volume"},{"issue":"3-4","key":"2026033012243668900_ref212","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1007\/BF02024507","article-title":"On measures of dependence","volume":"10","author":"R\u00e9nyi","year":"1959","journal-title":"Acta Mathematica Academiae Scientiarum Hungarica"},{"key":"2026033012243668900_ref213","volume-title":"Monte Carlo Statistical Methods (Springer Texts in Statistics)","author":"Robert","year":"2005"},{"key":"2026033012243668900_ref214","first-page":"905","article-title":"On learning with integral operators","volume":"11","author":"Rosasco","year":"2010","journal-title":"Journal of Machine Learning 
Research"},{"issue":"6","key":"2026033012243668900_ref215","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1037\/h0042519","article-title":"The perceptron: a probabilistic model for information storage and organization in the brain","volume":"65","author":"Rosenblatt","year":"1958","journal-title":"Psychological Review"},{"key":"2026033012243668900_ref216","volume-title":"Functional Analysis","author":"Rudin","year":"1991"},{"issue":"4","key":"2026033012243668900_ref217","doi-asserted-by":"crossref","first-page":"811","DOI":"10.2307\/1968466","article-title":"Metric spaces and completely monotone functions","volume":"39","author":"Schoenberg","year":"1938","journal-title":"The Annals of Mathematics"},{"key":"2026033012243668900_ref218","volume-title":"Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond","author":"Sch\u00f6lkopf","year":"2002"},{"issue":"5","key":"2026033012243668900_ref219","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1162\/089976698300017467","article-title":"Nonlinear component analysis as a kernel eigenvalue problem","volume":"10","author":"Sch\u00f6lkopf","year":"1998","journal-title":"Neural Computation"},{"key":"2026033012243668900_ref220","first-page":"416","volume-title":"Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory","author":"Sch\u00f6lkopf","year":"2001"},{"key":"2026033012243668900_ref221","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/4057.001.0001","volume-title":"Kernel Methods in Computational Biology","author":"Sch\u00f6lkopf","year":"2004"},{"key":"2026033012243668900_ref222","first-page":"1255","volume-title":"Proceedings of the 29th International Conference on Machine 
Learning","author":"Sch\u00f6lkopf","year":"2012"},{"issue":"4","key":"2026033012243668900_ref223","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1007\/s11222-015-9558-5","article-title":"Computing functions of random variables via reproducing kernel Hilbert space representations","volume":"25","author":"Sch\u00f6lkopf","year":"2015","journal-title":"Statistics and Computing"},{"key":"2026033012243668900_ref224","volume-title":"CoRR","author":"Schuster","year":"2016"},{"key":"2026033012243668900_ref225","first-page":"1111","volume-title":"Proceedings of the 29th International Conference on Machine Learning","author":"Sejdinovic","year":"2012"},{"key":"2026033012243668900_ref226","first-page":"1124","volume-title":"Advances in Neural Information Processing Systems 26","author":"Sejdinovic","year":"2013"},{"issue":"5","key":"2026033012243668900_ref227","doi-asserted-by":"crossref","first-page":"2263","DOI":"10.1214\/13-AOS1140","article-title":"Equivalence of distance-based and RKHS-based statistics in hypothesis testing","volume":"41","author":"Sejdinovic","year":"2013","journal-title":"The Annals of Statistics"},{"key":"2026033012243668900_ref228","first-page":"1665","volume-title":"Proceedings of the 31st International Conference on Machine Learning","author":"Sejdinovic","year":"2014"},{"key":"2026033012243668900_ref229","volume-title":"Approximation theorems of mathematical statistics","author":"Serfling","year":"1981"},{"key":"2026033012243668900_ref230","first-page":"556","volume-title":"Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence","author":"Sgouritsa","year":"2013"},{"key":"2026033012243668900_ref231","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511809682","volume-title":"Kernel Methods for Pattern Analysis","author":"Shawe-Taylor","year":"2004"},{"key":"2026033012243668900_ref232","first-page":"468","article-title":"A framework for probability density 
estimation","volume":"2","author":"Shawe-Taylor","year":"2007","journal-title":"Proceedings of the 11th International Conference on Artificial Intelligence and Statistics"},{"key":"2026033012243668900_ref233","first-page":"1283","article-title":"Second order cone programming approaches for handling missing and uncertain data","volume":"7","author":"Shivaswamy","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref234","volume-title":"Density Estimation for Statistics and Data Analysis","author":"Silverman","year":"1986"},{"key":"2026033012243668900_ref235","first-page":"1732","volume-title":"Advances in Neural Information Processing Systems 29","author":"Simon-Gabriel","year":"2016"},{"key":"2026033012243668900_ref236","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/s00365-006-0659-y","article-title":"Learning theory estimates via integral operators and their approximations","volume":"26","author":"Smale","year":"2007","journal-title":"Constructive Approximation"},{"issue":"5","key":"2026033012243668900_ref237","doi-asserted-by":"crossref","first-page":"1071","DOI":"10.1287\/opre.21.5.1071","article-title":"The optimal control of partially observable Markov processes over a finite horizon","volume":"21","author":"Smallwood","year":"1973","journal-title":"Operations Research"},{"key":"2026033012243668900_ref238","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1007\/978-3-540-75225-7_5","volume-title":"Proceedings of the 18th International Conference on Algorithmic Learning Theory","author":"Smola","year":"2007"},{"key":"2026033012243668900_ref239","volume-title":"Learning via Hilbert Space Embedding of Distributions","author":"Song","year":"2008"},{"key":"2026033012243668900_ref240","first-page":"3228","volume-title":"Advances in Neural Information Processing Systems 
26","author":"Song","year":"2013"},{"key":"2026033012243668900_ref241","doi-asserted-by":"crossref","first-page":"815","DOI":"10.1145\/1273496.1273599","volume-title":"Proceedings of the 24th International Conference on Machine Learning","author":"Song","year":"2007"},{"key":"2026033012243668900_ref242","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1145\/1273496.1273600","volume-title":"Proceedings of the 24th International Conference on Machine Learning","author":"Song","year":"2007"},{"key":"2026033012243668900_ref243","doi-asserted-by":"crossref","first-page":"992","DOI":"10.1145\/1390156.1390281","volume-title":"Proceedings of the 25th International Conference on Machine Learning","author":"Song","year":"2008"},{"key":"2026033012243668900_ref244","first-page":"961","volume-title":"Proceedings of the 26th International Conference on Machine Learning","author":"Song","year":"2009"},{"key":"2026033012243668900_ref245","first-page":"991","volume-title":"Proceedings of the 27th International Conference on Machine Learning","author":"Song","year":"2010"},{"key":"2026033012243668900_ref246","first-page":"765","article-title":"Nonparametric tree graphical models via kernel embeddings","volume":"9","author":"Song","year":"2010","journal-title":"Proceedings of the 13th International Conference on Artificial Intelligence and Statistics"},{"key":"2026033012243668900_ref247","first-page":"707","volume-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics","author":"Song","year":"2011"},{"key":"2026033012243668900_ref248","first-page":"2708","volume-title":"Advances in Neural Information Processing Systems","author":"Song","year":"2011"},{"issue":"4","key":"2026033012243668900_ref249","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1109\/MSP.2013.2252713","article-title":"Kernel embeddings of conditional distributions: A unified kernel framework for nonparametric inference in graphical 
models","volume":"30","author":"Song","year":"2013","journal-title":"IEEE Signal Processing Magazine"},{"key":"2026033012243668900_ref250","volume-title":"Causation, prediction, and search","author":"Spirtes","year":"2000"},{"key":"2026033012243668900_ref251","volume-title":"The Algebra of Random Variables","author":"Springer","year":"1979"},{"key":"2026033012243668900_ref252","first-page":"1027","volume-title":"Proceedings of the IEEE International Symposium on Information Theory","author":"Sriperumbudur","year":"2011"},{"issue":"3","key":"2026033012243668900_ref253","doi-asserted-by":"crossref","first-page":"1839","DOI":"10.3150\/15-BEJ713","article-title":"On the optimal estimation of probability measures in weak and strong topologies","volume":"22","author":"Sriperumbudur","year":"2016","journal-title":"Bernoulli"},{"key":"2026033012243668900_ref254","first-page":"1144","volume-title":"Advances in Neural Information Processing Systems 28","author":"Sriperumbudur","year":"2015"},{"key":"2026033012243668900_ref255","first-page":"111","volume-title":"The 21st Annual Conference on Learning Theory","author":"Sriperumbudur","year":"2008"},{"key":"2026033012243668900_ref256","first-page":"1750","volume-title":"Advances in Neural Information Processing Systems 22","author":"Sriperumbudur","year":"2009"},{"key":"2026033012243668900_ref257","first-page":"1517","article-title":"Hilbert space embeddings and metrics on probability measures","volume":"99","author":"Sriperumbudur","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref299","first-page":"2389","article-title":"Universality, characteristic kernels and RKHS embedding of measures","volume":"12","author":"Sriperumbudur","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref258","first-page":"1773","volume-title":"Advances in Neural Information Processing Systems 
24","author":"Sriperumbudur","year":"2011"},{"key":"2026033012243668900_ref259","doi-asserted-by":"crossref","first-page":"1550","DOI":"10.1214\/12-EJS722","article-title":"On the empirical estimation of integral probability metrics","volume":"6","author":"Sriperumbudur","year":"2012","journal-title":"Electronic Journal of Statistics"},{"key":"2026033012243668900_ref260","volume-title":"CoRR","author":"Sriperumbudur","year":"2013"},{"key":"2026033012243668900_ref261","first-page":"197","volume-title":"Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability","author":"Stein","year":"1955"},{"key":"2026033012243668900_ref262","first-page":"67","article-title":"On the influence of the kernel on the consistency of support vector machines","volume":"2","author":"Steinwart","year":"2002","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref263","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-77242-4","volume-title":"Support Vector Machines","author":"Steinwart","year":"2008"},{"issue":"3","key":"2026033012243668900_ref264","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1007\/s00365-012-9153-3","article-title":"Mercer's theorem on general domains: On the interaction between measures, kernels, and RKHSs","volume":"35","author":"Steinwart","year":"2012","journal-title":"Constructive Approximation"},{"key":"2026033012243668900_ref265","first-page":"955","volume-title":"Advances in Neural Information Processing Systems 28","author":"Strathmann","year":"2015"},{"key":"2026033012243668900_ref266","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1017\/S0266466608080341","article-title":"A nonparametric Hellinger metric test for conditional independence","volume":"24","author":"Su","year":"2008","journal-title":"Econometric Theory"},{"issue":"10","key":"2026033012243668900_ref267","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1145\/1831407.1831431","article-title":"Nonparametric 
belief propagation","volume":"53","author":"Sudderth","year":"2010","journal-title":"Communications of the ACM"},{"key":"2026033012243668900_ref268","first-page":"862","volume-title":"Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence","author":"Sutherland","year":"2015"},{"key":"2026033012243668900_ref269","doi-asserted-by":"crossref","DOI":"10.1109\/TNN.1998.712192","volume-title":"Introduction to Reinforcement Learning","author":"Sutton","year":"1998"},{"issue":"152","key":"2026033012243668900_ref270","first-page":"1","article-title":"Learning theory for distribution regression","volume":"17","author":"Szab\u00f3","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref271","volume-title":"InterStat","author":"Sz\u00e9kely","year":"2004"},{"issue":"1","key":"2026033012243668900_ref272","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1016\/j.jmva.2003.12.002","article-title":"A new test for multivariate normality","volume":"93","author":"Sz\u00e9kely","year":"2005","journal-title":"Journal of Multivariate Analysis"},{"issue":"4","key":"2026033012243668900_ref273","first-page":"1236","article-title":"Brownian distance covariance","volume":"3","author":"Sz\u00e9kely","year":"2009","journal-title":"Annals of Applied Statistics"},{"issue":"6","key":"2026033012243668900_ref274","doi-asserted-by":"crossref","first-page":"2769","DOI":"10.1214\/009053607000000505","article-title":"Measuring and testing dependence by correlation of distances","volume":"35","author":"Sz\u00e9kely","year":"2007","journal-title":"Annals of Statistics"},{"key":"2026033012243668900_ref275","volume-title":"CoRR","author":"Tolstikhin","year":"2016"},{"key":"2026033012243668900_ref276","first-page":"322","article-title":"Permutation testing improves Bayesian network learning","volume":"6323","author":"Tsamardinos","year":"2010","journal-title":"ECML\/PKDD 
(3)"},{"key":"2026033012243668900_ref277","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511802256","volume-title":"Asymptotic Statistics","author":"van der Vaart","year":"1998"},{"key":"2026033012243668900_ref278","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-3264-1","volume-title":"The Nature of Statistical Learning Theory","author":"Vapnik","year":"2000"},{"issue":"3","key":"2026033012243668900_ref279","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1109\/TPAMI.2011.153","article-title":"Efficient additive kernels via explicit feature maps","volume":"34","author":"Vedaldi","year":"2012","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2026033012243668900_ref280","volume-title":"All of Nonparametric Statistics","author":"Wasserman","year":"2006"},{"key":"2026033012243668900_ref281","volume-title":"All of Statistics: A Concise Course in Statistical Inference","author":"Wasserman","year":"2010"},{"key":"2026033012243668900_ref282","first-page":"1121","volume-title":"Proceedings of the 26th International Conference on Machine Learning","author":"Welling","year":"2009"},{"key":"2026033012243668900_ref283","first-page":"599","volume-title":"Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence","author":"Welling","year":"2009"},{"issue":"1","key":"2026033012243668900_ref284","article-title":"Statistical inference using weak chaos and infinite memory","volume":"233","author":"Welling","year":"2010","journal-title":"Journal of Physics: Conference Series"},{"key":"2026033012243668900_ref285","volume-title":"Scattered Data Approximation","author":"Wendland","year":"2005"},{"key":"2026033012243668900_ref286","first-page":"682","volume-title":"Advances in Neural Information Processing Systems 13","author":"Williams","year":"2001"},{"issue":"1","key":"2026033012243668900_ref287","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1023\/A:1013848912046","article-title":"Conformal 
transformation of kernel functions: A data-dependent way to improve support vector machine classifiers","volume":"15","author":"Wu","year":"2002","journal-title":"Neural Processing Letters"},{"key":"2026033012243668900_ref288","first-page":"1071","volume-title":"Advances in Neural Information Processing Systems 24","author":"Xiong","year":"2011"},{"key":"2026033012243668900_ref289","first-page":"789","article-title":"Hierarchical probabilistic models for group anomaly detection","volume":"15","author":"Xiong","year":"2011","journal-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics"},{"key":"2026033012243668900_ref290","first-page":"485","article-title":"Quasi-Monte Carlo feature maps for shift-invariant kernels","volume":"32","author":"Yang","year":"2014","journal-title":"Proceedings of the 31st International Conference on Machine Learning"},{"key":"2026033012243668900_ref291","first-page":"1961","volume-title":"Advances in Neural Information Processing Systems 27","author":"Yoshikawa","year":"2014"},{"key":"2026033012243668900_ref292","first-page":"3129","volume-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence","author":"Yoshikawa","year":"2015"},{"key":"2026033012243668900_ref293","first-page":"755","volume-title":"Advances in Neural Information Processing Systems 26","author":"Zaremba","year":"2013"},{"key":"2026033012243668900_ref294","first-page":"2741","article-title":"Reproducing kernel Banach spaces for machine learning","volume":"10","author":"Zhang","year":"2009","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012243668900_ref295","first-page":"804","volume-title":"Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence","author":"Zhang","year":"2011"},{"key":"2026033012243668900_ref296","first-page":"1937","volume-title":"Advances in Neural Information Processing Systems 
21","author":"Zhang","year":"2008"},{"issue":"6","key":"2026033012243668900_ref297","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1162\/NECO_a_00732","article-title":"FastMMD: Ensemble of circular discrepancy for efficient two-sample test","volume":"27","author":"Zhao","year":"2015","journal-title":"Neural Computation"},{"key":"2026033012243668900_ref298","first-page":"594","volume":"3120","author":"Zwald","year":"2004","journal-title":"Proceedings of the 17th Annual Conference on Learning Theory"}],"container-title":["Foundations and Trends\u00ae in Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/10\/1-2\/1\/11154061\/2200000060en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/10\/1-2\/1\/11154061\/2200000060en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T16:25:28Z","timestamp":1774887928000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftmal\/article\/10\/1-2\/1\/1332390\/Kernel-Mean-Embedding-of-Distributions-A-Review"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,6,28]]},"references-count":299,"journal-issue":{"issue":"1-2","published-print":{"date-parts":[[2017,6,28]]}},"URL":"https:\/\/doi.org\/10.1561\/2200000060","relation":{},"ISSN":["1935-8237","1935-8245"],"issn-type":[{"value":"1935-8237","type":"print"},{"value":"1935-8245","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,6,28]]}}}