{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T18:40:03Z","timestamp":1744051203822,"version":"3.40.3"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"3","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,11]]},"abstract":"<jats:p>Understanding the reasons behind differences between aggregated sequences derived from SQL queries is crucial for data scientists. However, existing methods often suffer from being labor-intensive, lacking scalability, providing only approximate solutions, and inadequately supporting sequence difference explanations. In response, we introduce SDEcho, a novel framework designed to automate the explanation searching for sequence differences in high-dimensional and high-volume datasets. SDEcho utilizes advanced pruning techniques, considering pattern, order, and dimension perspectives, as well as their interactions, to prune the entire explanation space while maintaining explanations accurate and concise. This hybrid pruning approach significantly accelerates the explanation searching process, making SDEcho a valuable tool for data analysis tasks. Extensive experiments on synthetic and real-world datasets, along with a case study, demonstrate that SDEcho outperforms existing methods in terms of both effectiveness and efficiency.<\/jats:p>","DOI":"10.14778\/3712221.3712242","type":"journal-article","created":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T18:03:04Z","timestamp":1744048984000},"page":"784-797","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["SDEcho: Efficient Explanation of Aggregated Sequence Difference"],"prefix":"10.14778","volume":"18","author":[{"given":"Fei","family":"Ye","sequence":"first","affiliation":[{"name":"Fudan University"}]},{"given":"Zikang","family":"Liu","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Xi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Yinan","family":"Jing","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Zhenying","family":"He","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Yuxin","family":"Che","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Haoran","family":"Xiong","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Kai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"X. Sean","family":"Wang","sequence":"additional","affiliation":[{"name":"Fudan University"}]}],"member":"320","published-online":{"date-parts":[[2025,4,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. CDC. https:\/\/www.cdc.gov\/nchs Last accessed: 2024-12-16."},{"key":"e_1_2_1_2_1","unstructured":"2024. CMS. https:\/\/www.cms.gov\/priorities\/key-initiatives\/open-payments\/data Last accessed: 2024-5-1."},{"key":"e_1_2_1_3_1","unstructured":"2024. Google Trend. https:\/\/trends.google.com\/trends\/explore?q=%2Fg%2F11j0_8y5xw %2Fg%2F11c75ypgws Last accessed: 2024-12-16."},{"key":"e_1_2_1_4_1","unstructured":"2024. Imply. https:\/\/docs.imply.io\/latest\/explain\/ Last accessed: 2024-12-16."},{"key":"e_1_2_1_5_1","unstructured":"2024. Natality Dataset. https:\/\/www.cdc.gov\/nchs\/data_access\/VitalStatsOnline.htm#Births Last accessed: 2024-12-16."},{"key":"e_1_2_1_6_1","unstructured":"2024. Power BI. https:\/\/learn.microsoft.com\/en-us\/power-bi\/visuals\/power-bi-visualization-influencers?tabs=powerbi-desktop Last accessed: 2024-12-16."},{"key":"e_1_2_1_7_1","unstructured":"2024. Tableau. https:\/\/help.tableau.com\/current\/pro\/desktop\/en-us\/explain_data_basics.htm Last accessed: 2024-12-16."},{"key":"e_1_2_1_8_1","unstructured":"2024. U.S. Department of Health & Human Services. https:\/\/www.hhs.gov\/sites\/default\/files\/call-to-action-maternal-health.pdf Last accessed: 2024-12-16."},{"key":"e_1_2_1_9_1","unstructured":"2024. Yale Medicine. https:\/\/www.yalemedicine.org\/news\/maternal-mortality-on-the-rise Last accessed: 2024-12-16."},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3276463","article-title":"Macrobase: Prioritizing attention in fast data","volume":"43","author":"Abuzaid Firas","year":"2018","unstructured":"Firas Abuzaid, Peter Bailis, Jialin Ding, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, and Sahaana Suri. 2018. Macrobase: Prioritizing attention in fast data. ACM Transactions on Database Systems (TODS) 43, 4 (2018), 1--45.","journal-title":"ACM Transactions on Database Systems (TODS)"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/3297753.3297761"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989284.1989302"},{"key":"e_1_2_1_13_1","volume-title":"Pattern recognition and machine learning","author":"Bishop Christopher M","unstructured":"Christopher M Bishop and Nasser M Nasrabadi. 2006. Pattern recognition and machine learning. Springer."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/543613.543633"},{"key":"e_1_2_1_15_1","volume-title":"2023 IEEE 39th International Conference on Data Engineering (ICDE). 708--720","author":"Chen Yiru","year":"2023","unstructured":"Yiru Chen and Silu Huang. 2023. TSExplain: Explaining Aggregated Time Series by Surfacing Evolving Contributors. In 2023 IEEE 39th International Conference on Data Engineering (ICDE). 708--720."},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"James Cheney Laura Chiticariu Wang-Chiew Tan et al. 2009. Provenance in databases: Why how and where. Foundations and Trends\u00ae in Databases 1 4 (2009) 379--474.","DOI":"10.1561\/1900000006"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639329"},{"key":"e_1_2_1_18_1","volume-title":"C","author":"Dadvar Vargha","year":"2022","unstructured":"Vargha Dadvar, Lukasz Golab, and Divesh Srivastava. 2022. Exploring Data Using Patterns: A Survey. Information Systems 108, C (2022), 11."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","first-page":"3854","DOI":"10.14778\/3565838.3565841","article-title":"FEDEX: An Explainability Framework for Data Exploration Steps","volume":"15","author":"Deutch Daniel","year":"2022","unstructured":"Daniel Deutch, Amir Gilad, Tova Milo, Amit Mualem, and Amit Somech. 2022. FEDEX: An Explainability Framework for Data Exploration Steps. Proceedings of the VLDB Endowment 15, 13 (2022), 3854--3868.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314037"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1038\/s41372-020-00912-8","article-title":"Temporal trends in preterm birth phenotypes by plurality: Black-White disparity over half a century","volume":"41","author":"Dongarwar Deepa","year":"2021","unstructured":"Deepa Dongarwar, Danyal Tahseen, Liye Wang, Muktar H Aliyu, and Hamisu M Salihu. 2021. Temporal trends in preterm birth phenotypes by plurality: Black-White disparity over half a century. Journal of Perinatology 41, 2 (2021), 204--211.","journal-title":"Journal of Perinatology"},{"key":"e_1_2_1_22_1","unstructured":"Richard O Duda Peter E Hart et al. 2006. Pattern classification. John Wiley & Sons."},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","first-page":"112718","DOI":"10.1016\/j.knosys.2024.112718","article-title":"A confidence-based knowledge integration framework for cross-domain table question answering","volume":"306","author":"Fan Yuankai","year":"2024","unstructured":"Yuankai Fan, Tonghui Ren, Can Huang, Beini Zheng, Yinan Jing, Zhenying He, Jinbao Li, and Jianxin Li. 2024. A confidence-based knowledge integration framework for cross-domain table question answering. Knowledge-Based Systems 306 (2024), 112718.","journal-title":"Knowledge-Based Systems"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132960.1132963"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","DOI":"10.1561\/9781680838817","volume-title":"Trends in explanations: Understanding and debugging data-driven systems. Foundations and Trends\u00ae in Databases 11, 3","author":"Glavic Boris","year":"2021","unstructured":"Boris Glavic, Alexandra Meliou, and Sudeepa Roy. 2021. Trends in explanations: Understanding and debugging data-driven systems. Foundations and Trends\u00ae in Databases 11, 3 (2021)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-017-0486-1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3517854"},{"key":"e_1_2_1_28_1","volume-title":"Disentangling age, gender, and racial\/ethnic disparities in multiple myeloma burden: a modeling study. Nature communications 14, 1","author":"Huber John H","year":"2023","unstructured":"John H Huber, Mengmeng Ji, Yi-Hsuan Shih, Mei Wang, Graham Colditz, and Su-Hsin Chang. 2023. Disentangling age, gender, and racial\/ethnic disparities in multiple myeloma burden: a modeling study. Nature communications 14, 1 (2023), 5768."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824103"},{"key":"e_1_2_1_31_1","volume-title":"Prevention (CDC), et al.","author":"Johnson Nicole Blair","year":"2014","unstructured":"Nicole Blair Johnson, Locola D Hayes, Kathryn Brown, Elizabeth C Hoo, Kathleen A Ethier, Centers for Disease Control, Prevention (CDC), et al. 2014. CDC National Health Report: leading causes of morbidity and mortality and associated behavioral risk and protective factors-United States, 2005-2013. MMWR suppl 63, 4 (2014), 3--27."},{"key":"e_1_2_1_32_1","volume-title":"International Conference on Big Data Analytics and Knowledge Discovery. 235--244","author":"Labaien Jokin","year":"2020","unstructured":"Jokin Labaien, Ekhi Zugasti, and Xabier De Carlos. 2020. Contrastive explanations for a deep learning model on time-series data. In International Conference on Big Data Analytics and Knowledge Discovery. 235--244."},{"key":"e_1_2_1_33_1","unstructured":"Kin Kwan Leung Clayton Rooke Jonathan Smith Saba Zuberi and Maksims Volkovs. 2023. Temporal Dependencies in Feature Importance for Time Series Predictions. arXiv:2107.14317"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3459246"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476304"},{"key":"e_1_2_1_36_1","volume-title":"DeepEye: Towards Automatic Data Visualization. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). 101--112","author":"Luo Yuyu","year":"2018","unstructured":"Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li. 2018. DeepEye: Towards Automatic Data Visualization. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). 101--112."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-demo.31"},{"key":"e_1_2_1_38_1","first-page":"27","article-title":"XInsight: EXplainable Data Analysis Through The Lens of Causality","volume":"1","author":"Ma Pingchuan","year":"2023","unstructured":"Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, and Dongmei Zhang. 2023. XInsight: EXplainable Data Analysis Through The Lens of Causality. Proceedings of the ACM on Management of Data 1, 2 (2023), 27.","journal-title":"Proceedings of the ACM on Management of Data"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733070"},{"key":"e_1_2_1_40_1","doi-asserted-by":"crossref","first-page":"1898","DOI":"10.14778\/3352063.3352094","article-title":"LensXPlain: Visualizing and explaining contributing subsets for aggregate query answers","volume":"12","author":"Miao Zhengjie","year":"2019","unstructured":"Zhengjie Miao, Andrew Lee, and Sudeepa Roy. 2019. LensXPlain: Visualizing and explaining contributing subsets for aggregate query answers. Proceedings of the VLDB Endowment 12, 12 (2019), 1898--1901.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319866"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300066"},{"key":"e_1_2_1_43_1","volume-title":"Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12","author":"Pedregosa Fabian","year":"2011","unstructured":"Fabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. the Journal of machine Learning research 12 (2011), 2825--2830."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856329"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588578"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196914"},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1023\/A:1011494927464","article-title":"idiff: Informative summarization of differences in multidimensional aggregates","volume":"5","author":"Sarawagi Sunita","year":"2001","unstructured":"Sunita Sarawagi. 2001. idiff: Informative summarization of differences in multidimensional aggregates. Data Mining and Knowledge Discovery 5 (2001), 255--276.","journal-title":"Data Mining and Knowledge Discovery"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","first-page":"2419","DOI":"10.14778\/3476249.3476291","article-title":"COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics","volume":"14","author":"Siddiqui Tarique","year":"2021","unstructured":"Tarique Siddiqui, Surajit Chaudhuri, and Vivek Narasayya. 2021. COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics. Proceedings of the VLDB Endowment 14, 11 (2021), 2419--2431.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","first-page":"457","DOI":"10.14778\/3025111.3025126","article-title":"Effortless data exploration with zenvisage: an expressive and interactive visual analytics system","volume":"10","author":"Siddiqui Tarique","year":"2016","unstructured":"Tarique Siddiqui, Albert Kim, John Lee, Karrie Karahalios, and Aditya Parameswaran. 2016. Effortless data exploration with zenvisage: an expressive and interactive visual analytics system. Proceedings of the VLDB Endowment 10, 4 (2016), 457--468.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389762"},{"key":"e_1_2_1_51_1","first-page":"799","article-title":"What went wrong and when? Instance-wise feature importance for time-series black-box models","volume":"33","author":"Tonekaboni Sana","year":"2020","unstructured":"Sana Tonekaboni, Shalmali Joshi, Kieran Campbell, David K Duvenaud, and Anna Goldenberg. 2020. What went wrong and when? Instance-wise feature importance for time-series black-box models. Advances in Neural Information Processing Systems 33 (2020), 799--809.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/2831360.2831371"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750549"},{"key":"e_1_2_1_54_1","volume-title":"Principal component analysis. Chemometrics and intelligent laboratory systems 2, 1-3","author":"Wold Svante","year":"1987","unstructured":"Svante Wold, Kim Esbensen, and Paul Geladi. 1987. Principal component analysis. Chemometrics and intelligent laboratory systems 2, 1-3 (1987), 37--52."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536354.2536356"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538603"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3639328"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467279"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3712221.3712242","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,7]],"date-time":"2025-04-07T18:08:32Z","timestamp":1744049312000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3712221.3712242"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11]]},"references-count":58,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["10.14778\/3712221.3712242"],"URL":"https:\/\/doi.org\/10.14778\/3712221.3712242","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,11]]},"assertion":[{"value":"2025-04-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}