{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T07:13:09Z","timestamp":1779174789963,"version":"3.51.4"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"7","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:p>Outlier detection is crucial for preventing financial fraud, network intrusions, and device failures. Users often expect systems to automatically summarize and interpret outlier detection results to reduce human effort and convert outliers into actionable insights. However, existing methods fail to effectively assist users in identifying the root causes of outliers, as they only pinpoint data attributes without considering that outliers in the same subspace may have different causes.<\/jats:p>\n          <jats:p>\n            To fill this gap, we propose STAIR, which learns concise and human-understandable\n            <jats:italic>rules<\/jats:italic>\n            to summarize and explain outlier detection results with\n            <jats:italic>finer<\/jats:italic>\n            granularity. These rules consider both attributes and associated values. STAIR employs an interpretation-aware optimization objective to generate a small number of rules with minimal complexity for strong interpretability. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions data and learns a set of localized rules for each partition. 
Our experimental study on many outlier benchmark datasets shows that STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them more amenable for humans to understand and evaluate.\n          <\/jats:p>","DOI":"10.14778\/3654621.3654627","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T22:21:08Z","timestamp":1717107668000},"page":"1591-1604","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Outlier Summarization via Human Interpretable Rules"],"prefix":"10.14778","volume":"17","author":[{"given":"Yuhao","family":"Deng","sequence":"first","affiliation":[{"name":"Beijing Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Wang","sequence":"additional","affiliation":[{"name":"University of California"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"University of Arizona\/MIT"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lianpeng","family":"Qiao","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuping","family":"Wang","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingzhe","family":"Xu","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yizhou","family":"Yan","sequence":"additional","affiliation":[{"name":"Worcester Polytechnic 
Institute"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samuel","family":"Madden","sequence":"additional","affiliation":[{"name":"MIT"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"1993. Mammography. https:\/\/www.kaggle.com\/datasets\/kmader\/mias-mammography."},{"key":"e_1_2_1_2_1","unstructured":"1993. Satimage-2. https:\/\/odds.cs.stonybrook.edu\/satimage-2-dataset\/."},{"key":"e_1_2_1_3_1","unstructured":"1995. PageBlock. https:\/\/archive.ics.uci.edu\/dataset\/78\/page+blocks+classification."},{"key":"e_1_2_1_4_1","unstructured":"1998. Covertype. https:\/\/archive.ics.uci.edu\/dataset\/31\/covertype."},{"key":"e_1_2_1_5_1","unstructured":"1999. Spambase. http:\/\/archive.ics.uci.edu\/dataset\/94\/spambase."},{"key":"e_1_2_1_6_1","unstructured":"2007. Shuttle. https:\/\/archive.ics.uci.edu\/dataset\/148\/statlog+shuttle."},{"key":"e_1_2_1_7_1","unstructured":"2014. Pendigits. https:\/\/datahub.io\/machine-learning\/pendigits#readme."},{"key":"e_1_2_1_8_1","unstructured":"2016. Pima. https:\/\/www.dbs.ifi.lmu.de\/research\/outlier-evaluation\/DAMI\/semantic\/Pima\/Pima_35.html."},{"key":"e_1_2_1_9_1","unstructured":"2017. Satellite. https:\/\/datahub.io\/machine-learning\/satellite#readme."},{"key":"e_1_2_1_10_1","unstructured":"2018. Thursday-01-03. https:\/\/www.kaggle.com\/datasets\/karenp\/original-network-traffic-thursday-01-03-2018-logs."},{"key":"e_1_2_1_11_1","unstructured":"2023. https:\/\/github.com\/baodaBBji\/anonymous-Tech-Report\/blob\/main\/Outlier_Tech_Report.pdf."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-47578-3"},{"key":"e_1_2_1_13_1","article-title":"Learning Certifiably Optimal Rule Lists for Categorical Data","volume":"18","author":"Angelino Elaine","year":"2017","unstructured":"Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo I. 
Seltzer, and Cynthia Rudin. 2017. Learning Certifiably Optimal Rule Lists for Categorical Data. J. Mach. Learn. Res. 18 (2017), 234:1--234:78. http:\/\/jmlr.org\/papers\/v18\/17-716.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Fabrizio Angiulli and Clara Pizzuti. 2002. Fast Outlier Detection in High Dimensional Spaces. In PKDD. 15--26.","DOI":"10.1007\/3-540-45681-3_2"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0130140"},{"key":"e_1_2_1_16_1","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035928"},{"key":"e_1_2_1_18_1","volume-title":"Comparative Analysis of Decision Tree Algorithms","author":"Batra Mridula","unstructured":"Mridula Batra and Rashmi Agrawal. 2018. Comparative Analysis of Decision Tree Algorithms. In Nature Inspired Computing, Bijaya Ketan Panigrahi, M. N. Hoda, Vinod Sharma, and Shivendra Goel (Eds.). Springer Singapore, Singapore, 31--36."},{"key":"e_1_2_1_19_1","first-page":"462","article-title":"A method of choosing multiway partitions for classification and decision trees","volume":"18","author":"Biggs David","year":"1991","unstructured":"David Biggs, Barry De Ville, and Ed Suen. 1991. A method of choosing multiway partitions for classification and decision trees. Journal of applied statistics 18, 1 (1991), 462.","journal-title":"Journal of applied statistics"},{"key":"e_1_2_1_20_1","unstructured":"L. Breiman J. H. Friedman R. A. Olshen and C.J. Stone. 1984. Classification and Regression Trees. 
Wadsworth and Brooks Monterey CA."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335388"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389772"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589302"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/3091622.3091637"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2009.05.016"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.14778\/3648160.3648161"},{"key":"e_1_2_1_27_1","volume-title":"IDE: A System for Iterative Mislabel Detection. In Companion of the 2024 International Conference on Management of Data, SIGMOD\/PODS 2024","author":"Deng Yuhao","year":"2024","unstructured":"Yuhao Deng, Qiyan Deng, Chengliang Chai, Lei Cao, Nan Tang, Ju Fan, Jiayi Wang, Ye Yuan, and Guoren Wang. 2024. IDE: A System for Iterative Mislabel Detection. In Companion of the 2024 International Conference on Management of Data, SIGMOD\/PODS 2024, Santiago, Chile, June 9--15, 2024. ACM."},{"key":"e_1_2_1_28_1","volume-title":"Comparing interpretability and explainability for feature selection. CoRR abs\/2105.05328","author":"Dunn Jack","year":"2021","unstructured":"Jack Dunn, Luca Mingardi, and Ying Daisy Zhuo. 2021. Comparing interpretability and explainability for feature selection. CoRR abs\/2105.05328 (2021)."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, LOPAL 2018","author":"Elaidi Halima","year":"2018","unstructured":"Halima Elaidi, Zahra Benabbou, and Hassan Abbar. 2018. A comparative study of algorithms constructing decision trees: ID3 and C4.5. In Proceedings of the International Conference on Learning and Optimization Algorithms: Theory and Applications, LOPAL 2018, Rabat, Morocco, May 2--5, 2018. 
26:1--26:5."},{"key":"e_1_2_1_30_1","first-page":"226","article-title":"A density-based algorithm for discovering clusters in large spatial databases with noise","volume":"96","author":"Ester Martin","year":"1996","unstructured":"Martin Ester, Hans-Peter Kriegel, J\u00f6rg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise.. In Kdd, Vol. 96. 226--231.","journal-title":"Kdd"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/10618600.2019.1647846"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00778-015-0394-1"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2488388.2488425"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735461.2735467"},{"key":"e_1_2_1_35_1","volume-title":"Local Rule-Based Explanations of Black Box Decision Systems. CoRR abs\/1805.10820","author":"Guidotti Riccardo","year":"2018","unstructured":"Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Dino Pedreschi, Franco Turini, and Fosca Giannotti. 2018. Local Rule-Based Explanations of Black Box Decision Systems. CoRR abs\/1805.10820 (2018)."},{"key":"e_1_2_1_36_1","volume-title":"Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD","author":"Gupta Nikhil","year":"2018","unstructured":"Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, and Christos Faloutsos. 2018. Beyond Outlier Detection: LookOut for Pictorial Explanation. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2018, Dublin, Ireland, September 10--14, 2018, Proceedings, Part I (Lecture Notes in Computer Science, Vol. 11051), Michele Berlingerio, Francesco Bonchi, Thomas G\u00e4rtner, Neil Hurley, and Georgiana Ifrim (Eds.). 
Springer, 122--138."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2002.1017616"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.2307\/2986296"},{"key":"e_1_2_1_39_1","volume-title":"HiCS: High Contrast Subspaces for Density-Based Outlier Ranking. In IEEE 28th International Conference on Data Engineering (ICDE 2012","author":"Keller Fabian","year":"2012","unstructured":"Fabian Keller, Emmanuel M\u00fcller, and Klemens B\u00f6hm. 2012. HiCS: High Contrast Subspaces for Density-Based Outlier Ranking. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1--5 April, 2012, Anastasios Kementsietsidis and Marcos Antonio Vaz Salles (Eds.). IEEE Computer Society, 1037--1048."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505560"},{"key":"e_1_2_1_41_1","volume-title":"Ng","author":"Knorr Edwin M.","year":"1999","unstructured":"Edwin M. Knorr and Raymond T. Ng. 1999. Finding Intensional Knowledge of Distance-Based Outliers. In VLDB. 211--222."},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Himabindu Lakkaraju Stephen H. Bach and Jure Leskovec. 2016. Interpretable Decision Sets: A Joint Framework for Description and Prediction. In KDD. ACM 1675--1684.","DOI":"10.1145\/2939672.2939874"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.17"},{"key":"e_1_2_1_44_1","volume-title":"A unified approach to interpreting model predictions. Advances in neural information processing systems 30","author":"Lundberg Scott M","year":"2017","unstructured":"Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. 
Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3494124.3494143"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2022.3186498"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352071"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2864812"},{"key":"e_1_2_1_49_1","volume-title":"Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021","author":"Myrtakis Nikolaos","year":"2021","unstructured":"Nikolaos Myrtakis, Vassilis Christophides, and Eric Simon. 2021. A Comparative Evaluation of Anomaly Explanation Algorithms. In Proceedings of the 24th International Conference on Extending Database Technology, EDBT 2021, Nicosia, Cyprus, March 23 - 26, 2021. OpenProceedings.org, 97--108."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-016-0453-2"},{"key":"e_1_2_1_51_1","volume-title":"ICCS (5) (Lecture Notes in Computer Science","author":"Pa\u00e7aci G\u00f6rkem","unstructured":"G\u00f6rkem Pa\u00e7aci, David Johnson, Steve McKeever, and Andreas Hamfelt. 2019. \"Why Did You Do That?\" - Explaining Black Box Models with Inductive Synthesis. In ICCS (5) (Lecture Notes in Computer Science, Vol. 11540). Springer, 334--345."},{"key":"e_1_2_1_52_1","volume-title":"Open the Black Box Data-Driven Explanation of Black Box Decision Systems. CoRR abs\/1806.09936","author":"Pedreschi Dino","year":"2018","unstructured":"Dino Pedreschi, Fosca Giannotti, Riccardo Guidotti, Anna Monreale, Luca Pappalardo, Salvatore Ruggieri, and Franco Turini. 2018. Open the Black Box Data-Driven Explanation of Black Box Decision Systems. CoRR abs\/1806.09936 (2018)."},{"key":"e_1_2_1_53_1","volume-title":"Maryland","author":"Phillips P Jonathon","year":"2020","unstructured":"P Jonathon Phillips, Carina A Hahn, Peter C Fontana, David A Broniatowski, and Mark A Przybocki. 
2020. Four principles of explainable artificial intelligence. Gaithersburg, Maryland (2020)."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00116251"},{"key":"e_1_2_1_55_1","unstructured":"J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335437"},{"key":"e_1_2_1_57_1","doi-asserted-by":"crossref","unstructured":"Marco T\u00falio Ribeiro Sameer Singh and Carlos Guestrin. 2016. \"Why Should I Trust You?\": Explaining the Predictions of Any Classifier. In KDD. ACM 1135--1144.","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_2_1_58_1","volume-title":"Anchors: High-Precision Model-Agnostic Explanations","author":"Ribeiro Marco T\u00falio","year":"2018","unstructured":"Marco T\u00falio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-Precision Model-Agnostic Explanations. In AAAI. AAAI Press, 1527--1535."},{"key":"e_1_2_1_59_1","volume-title":"Contemporary issues in exploratory data mining in the behavioral sciences","author":"Ritschard Gilbert","unstructured":"Gilbert Ritschard. 2013. CHAID and earlier supervised tree methods. In Contemporary issues in exploratory data mining in the behavioral sciences. Routledge, 70--96."},{"key":"e_1_2_1_60_1","volume-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence 1, 5","author":"Rudin Cynthia","year":"2019","unstructured":"Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. 
Nature machine intelligence 1, 5 (2019), 206--215."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/21.97458"},{"key":"e_1_2_1_62_1","volume-title":"ICML (Proceedings of Machine Learning Research","volume":"3153","author":"Shrikumar Avanti","year":"2017","unstructured":"Avanti Shrikumar, Peyton Greenside, and Anshul Kundaje. 2017. Learning Important Features Through Propagating Activation Differences. In ICML (Proceedings of Machine Learning Research, Vol. 70). PMLR, 3145--3153."},{"key":"e_1_2_1_63_1","unstructured":"Karen Simonyan Andrea Vedaldi and Andrew Zisserman. 2014. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. In ICLR (Workshop Poster)."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3058739"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.14778\/3149193.3149199"},{"key":"e_1_2_1_66_1","volume-title":"ICML (Proceedings of Machine Learning Research","volume":"3328","author":"Sundararajan Mukund","year":"2017","unstructured":"Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic Attribution for Deep Networks. In ICML (Proceedings of Machine Learning Research, Vol. 70). PMLR, 3319--3328."},{"key":"e_1_2_1_67_1","volume-title":"BlackboxNLP@EMNLP","author":"Sushil Madhumita","unstructured":"Madhumita Sushil, Simon Suster, and Walter Daelemans. 2018. Rule induction for global explanation of trained models. In BlackboxNLP@EMNLP. Association for Computational Linguistics, 82--97."},{"key":"e_1_2_1_68_1","volume-title":"Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015, San Diego, California, USA, May 9--12, 2015 (JMLR Workshop and Conference Proceedings","author":"Wang Fulton","year":"2015","unstructured":"Fulton Wang and Cynthia Rudin. 2015. Falling Rule Lists. 
In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2015, San Diego, California, USA, May 9--12, 2015 (JMLR Workshop and Conference Proceedings, Vol. 38), Guy Lebanon and S. V. N. Vishwanathan (Eds.). JMLR.org. http:\/\/proceedings.mlr.press\/v38\/wang15a.html"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536354.2536356"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2023.3293129"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3654621.3654627","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T22:27:17Z","timestamp":1717108037000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3654621.3654627"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3]]},"references-count":70,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10.14778\/3654621.3654627"],"URL":"https:\/\/doi.org\/10.14778\/3654621.3654627","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,3]]},"assertion":[{"value":"2024-05-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}