{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:04:09Z","timestamp":1750309449924,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,12,6]],"date-time":"2024-12-06T00:00:00Z","timestamp":1733443200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>Causality-informed machine learning has been proposed as an avenue for achieving many of the goals of modern machine learning, from ensuring generalization under domain shifts to attaining fairness, robustness, and interpretability. A key component of causal machine learning is the inference of causal structures from observational data; in practice, this data may be incompletely observed. Prior work has demonstrated that adversarial perturbations of completely observed training data may be used to force the learning of inaccurate structural causal models (SCMs). However, when the data can be audited for correctness (e.g., it is cryptographically signed by its source), this adversarial mechanism is invalidated. This work introduces a novel attack methodology wherein the adversary deceptively omits a portion of the true training data to bias the learned causal structures in a desired manner (under strong signed sample input validation, this behavior seems to be the only strategy available to the adversary). Under this model, theoretically sound attack mechanisms are derived for the case of arbitrary SCMs, and a sample-efficient learning-based heuristic is given. 
Experimental validation of these approaches on real and synthetic datasets, across a range of SCMs from the family of additive noise models (linear Gaussian, linear non-Gaussian, and non-linear Gaussian), demonstrates the effectiveness of adversarial missingness attacks at deceiving popular causal structure learning algorithms.<\/jats:p>","DOI":"10.1145\/3682065","type":"journal-article","created":{"date-parts":[[2024,8,27]],"date-time":"2024-08-27T12:18:43Z","timestamp":1724761123000},"page":"1-60","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Adversarial Missingness Attacks on Causal Structure Learning"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6316-8853","authenticated-orcid":false,"given":"Deniz","family":"Koyuncu","sequence":"first","affiliation":[{"name":"Rensselaer Polytechnic Institute, Troy, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3482-0157","authenticated-orcid":false,"given":"Alex","family":"Gittens","sequence":"additional","affiliation":[{"name":"Rensselaer Polytechnic Institute, Troy, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3989-6097","authenticated-orcid":false,"given":"B\u00fclent","family":"Yener","sequence":"additional","affiliation":[{"name":"Rensselaer Polytechnic Institute, Troy, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0848-0873","authenticated-orcid":false,"given":"Moti","family":"Yung","sequence":"additional","affiliation":[{"name":"Google LLC, New York, NY, USA and Columbia University, New York, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,12,6]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Emad Alsuwat Hatim Alsuwat Marco Valtorta 
and Csilla Farkas. 2018. Cyber attacks against the PC learning algorithm. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer 159\u2013176.","DOI":"10.1007\/978-3-030-13453-2_13"},{"issue":"1","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1080\/03081079.2019.1630401","article-title":"Adversarial data poisoning attacks against the PC learning algorithm","volume":"49","author":"Alsuwat Emad","year":"2020","unstructured":"Emad Alsuwat, Hatim Alsuwat, Marco Valtorta, and Csilla Farkas. 2020. Adversarial data poisoning attacks against the PC learning algorithm. International Journal of General Systems 49, 1 (2020), 3\u201331.","journal-title":"International Journal of General Systems"},{"key":"e_1_3_2_4_2","unstructured":"Rohit Bhattacharya Razieh Nabi Ilya Shpitser and James M. Robins. 2020. Identification in missing data models represented by directed acyclic graphs. In Proceedings of the 35th Uncertainty in Artificial Intelligence Conference. PMLR 1149\u20131158."},{"key":"e_1_3_2_5_2","unstructured":"Ruichu Cai Zhiyi Huang Wei Chen Zhifeng Hao and Kun Zhang. 2023. Causal discovery with latent confounders based on higher-order cumulants. In Proceedings of the 40th International Conference on Machine Learning (ICML\u201923) Vol. 202 JMLR.org 3380\u20133407."},{"issue":"2019","key":"e_1_3_2_6_2","first-page":"138872","article-title":"A backdoor attack against LSTM-based text classification systems","volume":"7","author":"Dai Jiazhu","year":"2019","unstructured":"Jiazhu Dai, Chuanshuai Chen, and Yufeng Li. 2019. A backdoor attack against LSTM-based text classification systems. IEEE Access 7 (2019), 138872\u2013138878.","journal-title":"IEEE Access"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-021-00516-9"},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"Minghong Fang Neil Zhenqiang Gong and Jia Liu. 2020. 
Influence function based data poisoning attacks to top-n recommender systems. In Proceedings of the Web Conference 3019\u20133025.","DOI":"10.1145\/3366423.3380072"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1093\/biostatistics\/kxm045"},{"key":"e_1_3_2_10_2","unstructured":"Erdun Gao Ignavier Ng Mingming Gong Li Shen Wei Huang Tongliang Liu Kun Zhang and Howard Bondell. 2022. MissDAG: Causal discovery in the presence of missing data with continuous additive noise models. arXiv:2205.13869. Retrieved from https:\/\/arxiv.org\/abs\/2205.13869"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12874-018-0615-6"},{"key":"e_1_3_2_12_2","first-page":"586","volume-title":"Advances in Cyber Security","author":"Kashmoola M. Y.","year":"2021","unstructured":"M. Y. Kashmoola I. Ahmed, M. Ibrahim. 2021. Threats on machine learning technique by data poisoning attack: A survey. In Advances in Cyber Security. M. Anbar N. Abdullah, S. Manickam (Eds.), Springer Singapore, Singapore, 586\u2013600."},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Kang Liu Brendan Dolan-Gavitt and Siddharth Garg. 2018. Fine-pruning: Defending against backdooring attacks on deep neural networks. In International Symposium on Research in Attacks Intrusions and Defenses. Springer 273\u2013294.","DOI":"10.1007\/978-3-030-00470-5_13"},{"key":"e_1_3_2_14_2","doi-asserted-by":"crossref","unstructured":"Karthika Mohan and Judea Pearl. 2014. Graphical models for recovering probabilistic and causal queries from missing data. In Proceedings of the 27th International Conference on Neural Information Processing Systems 1520\u20131528.","DOI":"10.21236\/ADA614408"},{"key":"e_1_3_2_15_2","first-page":"17943","volume-title":"Advances in Neural Information Processing Systems","author":"Ng Ignavier","year":"2020","unstructured":"Ignavier Ng, AmirEmad Ghassami, and Kun Zhang. 2020. On the role of sparsity and DAG constraints for learning linear DAGs. 
In Advances in Neural Information Processing Systems, Vol. 33, Curran Associates, Inc., 17943\u201317954."},{"issue":"1","key":"e_1_3_2_16_2","first-page":"2009","article-title":"Causal discovery with continuous additive noise models","volume":"15","author":"Peters Jonas","year":"2014","unstructured":"Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Sch\u00f6lkopf. 2014. Causal discovery with continuous additive noise models. The Journal of Machine Learning Research 15, 1 (2014), 2009\u20132053.","journal-title":"The Journal of Machine Learning Research"},{"issue":"3","key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1093\/biomet\/63.3.581","article-title":"Inference and missing data","volume":"63","author":"Rubin Donald B.","year":"1976","unstructured":"Donald B. Rubin. 1976. Inference and missing data. Biometrika 63, 3 (1976), 581\u2013592.","journal-title":"Biometrika"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1105809"},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","unstructured":"Skipper Seabold and Josef Perktold. 2010. statsmodels: Econometric and statistical modeling with python. In Proceedings of the 9th Python in Science Conference.","DOI":"10.25080\/Majora-92bf1922-011"},{"key":"e_1_3_2_20_2","unstructured":"Ilya Shpitser Karthika Mohan and Judea Pearl. 2015. Missing data as a causal and probabilistic problem. In Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI\u201915). AUAI Press 802\u2013811."},{"key":"e_1_3_2_21_2","volume-title":"Causation, Prediction, and Search","author":"Spirtes Peter","year":"2000","unstructured":"Peter Spirtes, Clark N. Glymour, Richard Scheines, and David Heckerman. 2000. Causation, Prediction, and Search. 
MIT Press."},{"issue":"2010","key":"e_1_3_2_22_2","first-page":"219","article-title":"Missing values: Sparse inverse covariance estimation and an extension to sparse regression","volume":"22","author":"St\u00e4dler N.","year":"2010","unstructured":"N. St\u00e4dler and P. B\u00fchlmann. 2010. Missing values: Sparse inverse covariance estimation and an extension to sparse regression. Statistics and Computing 22 (2010), 219\u2013235.","journal-title":"Statistics and Computing"},{"issue":"4","key":"e_1_3_2_23_2","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1080\/10705511.2014.937378","article-title":"Graphical representation of missing data problems","volume":"22","author":"Thoemmes Felix","year":"2015","unstructured":"Felix Thoemmes and Karthika Mohan. 2015. Graphical representation of missing data problems. Structural Equation Modeling: A Multidisciplinary Journal 22, 4 (2015), 631\u2013642.","journal-title":"Structural Equation Modeling: A Multidisciplinary Journal"},{"key":"e_1_3_2_24_2","unstructured":"Ruibo Tu Cheng Zhang Paul Ackermann Karthika Mohan Hedvig Kjellstr\u00f6m and Kun Zhang. 2019. Causal discovery in the presence of missing data. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. PMLR 1762\u20131770."},{"key":"e_1_3_2_25_2","unstructured":"Feng Xie Ruichu Cai Biwei Huang Clark Glymour Zeng Hao and Kun Zhang. 2020. Generalized independent noise condition for estimating latent variable causal graphs. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS \u201920). Curran Associates Inc. 14891\u201314902."},{"key":"e_1_3_2_26_2","volume-title":"Advances in Neural Information Processing Systems","author":"Zheng Xun","year":"2018","unstructured":"Xun Zheng, Bryon Aragam, Pradeep K. Ravikumar, and Eric P. Xing. 2018. DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems, Vol. 
31, Curran Associates, Inc."},{"key":"e_1_3_2_27_2","unstructured":"Xun Zheng Chen Dan Bryon Aragam Pradeep Ravikumar and Eric Xing. 2020. Learning sparse nonparametric DAGs. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. PMLR 3414\u20133425."},{"key":"e_1_3_2_28_2","unstructured":"Shengyu Zhu Ignavier Ng and Zhitang Chen. 2020. Causal discovery with reinforcement learning. arXiv:1906.04477. Retrieved from https:\/\/arxiv.org\/abs\/1906.04477"},{"key":"e_1_3_3_7_5_1_2","unstructured":"Mart\u00edn Abadi Ashish Agarwal Paul Barham Eugene Brevdo Zhifeng Chen Craig Citro Greg S. Corrado Andy Davis Jeffrey Dean Matthieu Devin Sanjay Ghemawat Ian Goodfellow Andrew Harp Geoffrey Irving Michael Isard Yangqing Jia Rafal Jozefowicz Lukasz Kaiser Manjunath Kudlur Josh Levenberg Dan Mane Rajat Monga Sherry Moore Derek Murray Chris Olah Mike Schuster Jonathon Shlens Benoit Steiner Ilya Sutskever Kunal Talwar Paul Tucker Vincent Vanhoucke Vijay Vasudevan Fernanda Viegas Oriol Vinyals Pete Warden Martin Wattenberg Martin Wicke Yuan Yu and Xiaoqiang Zheng. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467. Retrieved from https:\/\/arxiv.org\/abs\/1603.04467"},{"issue":"1","key":"e_1_3_3_7_5_2_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster A. P.","year":"1977","unstructured":"A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39, 1 (1977), 1\u201338.","journal-title":"Journal of the Royal Statistical Society. 
Series B (Methodological)"},{"key":"e_1_3_3_7_5_3_2","volume-title":"Machine Learning: A Probabilistic Perspective","author":"Murphy Kevin P.","year":"2012","unstructured":"Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. MIT Press."},{"key":"e_1_3_3_7_5_4_2","first-page":"2825","article-title":"Scikit-Learn: Machine learning in python","volume":"12","author":"Pedregosa F.","year":"2011","unstructured":"F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-Learn: Machine learning in python. Journal of Machine Learning Research 12 (2011), 2825\u20132830.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_7_5_5_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/ast043"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3682065","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3682065","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:03Z","timestamp":1750295403000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3682065"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,6]]},"references-count":32,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3682065"],"URL":"https:\/\/doi.org\/10.1145\/3682065","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2024,12,6]]},"assertion":[{"value":"2023-10-20","order":0,"nam
e":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-04","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-12-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}