{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,10]],"date-time":"2024-07-10T06:34:37Z","timestamp":1720593277164},"reference-count":45,"publisher":"MIT Press - Journals","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Neural Computation"],"published-print":{"date-parts":[[2020,7]]},"abstract":"<jats:p>Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (i.i.d.). Recent research has demonstrated that this assumption can be problematic as it simplifies the manifold of structured data. This has motivated different research areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley homology framework, which provides a quantitative metric for the influence of a sample of the homology of a simplicial complex. Our proposed framework consists of two main parts: homology analysis, where we compute the Betti number of the target topological space, and Shapley value calculation, where we decompose the topological features of a complex built from data points to individual points. By interpreting the influence as a probability measure, we further define an entropy that reflects the complexity of the data manifold. Furthermore, we provide a preliminary discussion of the connection of the Shapley homology to the Vapnik-Chervonenkis dimension. Empirical studies show that when the zero-dimensional Shapley homology is used on neighboring graphs, samples with higher influence scores have a greater impact on the accuracy of neural networks that determine graph connectivity and on several regular grammars whose higher entropy values imply greater difficulty in being learned.<\/jats:p>","DOI":"10.1162\/neco_a_01289","type":"journal-article","created":{"date-parts":[[2020,5,20]],"date-time":"2020-05-20T23:03:41Z","timestamp":1590015821000},"page":"1355-1378","source":"Crossref","is-referenced-by-count":4,"title":["Shapley Homology: Topological Analysis of Sample Influence for Neural Networks"],"prefix":"10.1162","volume":"32","author":[{"given":"Kaixuan","family":"Zhang","sequence":"first","affiliation":[{"name":"Information Sciences and Technology, Pennsylvania State University, State College, PA 16802, U.S.A."}]},{"given":"Qinglong","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada"}]},{"given":"Xue","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, McGill University, Montreal, Quebec H3A 0G4, Canada"}]},{"given":"C. Lee","family":"Giles","sequence":"additional","affiliation":[{"name":"Information Sciences and Technology, Pennsylvania State University, State College, PA 16802, U.S.A."}]}],"member":"281","reference":[{"key":"B1","author":"Anirudh R.","year":"2017","journal-title":"Influential sample selection: A graph signal processing approach"},{"issue":"1","key":"B2","first-page":"77","volume":"16","author":"Bubenik P.","year":"2015","journal-title":"Journal of Machine Learning Research"},{"key":"B3","author":"Carlsson G.","year":"2018","journal-title":"Topological approaches to deep learning"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0056-x"},{"key":"B5","author":"Chazal F.","year":"2017","journal-title":"An introduction to topological data analysis: Fundamental and practical aspects for data scientists"},{"key":"B6","author":"Chen J.","year":"2018","journal-title":"L-Shapley and c-Shapley: Efficient model interpretation for structured data."},{"key":"B7","author":"Chen X.","year":"2017","journal-title":"Targeted backdoor attacks on deep learning systems using data poisoning"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4012"},{"key":"B9","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1145\/3219819.3220119","author":"Cohen-Steiner D.","year":"2018","journal-title":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining"},{"key":"B10","author":"Conrad K.","year":"2008","journal-title":"Group actions."},{"key":"B11","author":"Dai H.","year":"2016","journal-title":"Discriminative embeddings of latent variable models for structured data."},{"key":"B12","author":"Dai H.","year":"2018","journal-title":"Adversarial attack on graph structured data"},{"key":"B13","first-page":"598","author":"Datta A.","year":"2016","journal-title":"Proceedings of the IEEE Symposium on Security and Privacy"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139194655"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.1090\/conm\/453\/08802"},{"key":"B16","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog1402_1"},{"key":"B17","author":"Gunning D.","year":"2017","journal-title":"Explainable artificial intelligence (XAI)"},{"issue":"1","key":"B18","first-page":"1319","volume":"17","author":"Hanneke S.","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"B19","author":"Hatcher A.","year":"2005","journal-title":"Algebraic topology"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"B21","first-page":"1633","volume-title":"Advances in neural information processing systems","volume":"30","author":"Hofer C.","year":"2017"},{"key":"B22","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-218X(96)00025-X"},{"key":"B23","author":"Khoury M.","year":"2018","journal-title":"On the geometry of adversarial examples"},{"key":"B24","first-page":"1885","author":"Koh P. W.","year":"2017","journal-title":"Proceedings of the 34th International Conference on Machine Learning"},{"key":"B25","author":"Lee D.","year":"2019","journal-title":"Learning to augment influential data."},{"key":"B26","first-page":"1995","author":"Li C.","year":"2014","journal-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition"},{"key":"B27","first-page":"4768","volume":"30","author":"Lundberg S. M.","year":"2017","journal-title":"Advances in neural information processing systems"},{"key":"B28","author":"Marsden A.","year":"2013","journal-title":"Eigenvalues of the Laplacian and their relationship to the connectedness of a graph"},{"key":"B29","author":"Narahari Y.","year":"2012","journal-title":"The Shapley value"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1016\/j.socnet.2004.11.009"},{"key":"B31","first-page":"4331","author":"Ren M.","year":"2018","journal-title":"Proceedings of the 35th International Conference on Machine Learning"},{"key":"B32","author":"Rezaei S. S. C.","year":"2013","journal-title":"Entropy and graphs"},{"key":"B33","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"B34","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-33259-6_7"},{"key":"B35","author":"Schapira P.","year":"2001","journal-title":"Categories and homological algebra"},{"key":"B36","first-page":"105","author":"Tomita M.","year":"1982","journal-title":"Proceedings of the Fourth Annual Conference of the Cognitive Science Society"},{"key":"B37","doi-asserted-by":"publisher","DOI":"10.1093\/imaiai\/iau011"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3264-1"},{"key":"B39","author":"Wang Q.","year":"2018","journal-title":"Verification of recurrent neural networks through rule extraction"},{"key":"B40","author":"Wang Q.","year":"2018","journal-title":"A comparative study of rule extraction for recurrent neural networks"},{"key":"B41","doi-asserted-by":"publisher","DOI":"10.1162\/neco_a_01111"},{"key":"B43","author":"Wang Y.","year":"2018","journal-title":"Data poisoning attacks against online learning"},{"key":"B44","author":"Weiss G.","year":"2017","journal-title":"Extracting automata from recurrent neural networks using queries and counterexamples"},{"key":"B45","doi-asserted-by":"publisher","DOI":"10.1090\/psapm\/060\/2078843"},{"key":"B46","first-page":"9311","volume-title":"Advances in neural information processing systems, 31","author":"Yeh C.","year":"2018"}],"container-title":["Neural Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/neco_a_01289","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,16]],"date-time":"2021-03-16T02:38:16Z","timestamp":1615862296000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/32\/7\/1355-1378\/95599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,7]]},"references-count":45,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2020,7]]}},"alternative-id":["10.1162\/neco_a_01289"],"URL":"https:\/\/doi.org\/10.1162\/neco_a_01289","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,7]]}}}