{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T09:00:19Z","timestamp":1768294819684,"version":"3.49.0"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGKDD Explor. Newsl."],"published-print":{"date-parts":[[2025,12,30]]},"abstract":"<jats:p>Interpretable (or explainable) machine learning models, such as decision trees, play a crucial role in the context of trustworthy AI. However, finding optimal decision trees (i.e., minimum size and maximum accuracy trees) is not a simple task and remains an active area of research. While a single decision tree has limited expressivity, using an ensemble of decision trees can effectively capture the complex structures found in many real-world applications. Many existing tree ensemble methods are greedy and suboptimal, and often suffer from randomness in the tree generation process. In this paper, we introduce DT-sampler, a SAT-based decision tree ensemble which allows explicit control over both the size and accuracy of the sampled trees. We developed a novel SAT-based encoding method that utilizes only branch nodes, resulting in a compact representation of decision tree space. Additionally, standard point predictions made using decision tree ensembles do not offer any statistical guarantee over miscoverage rate. We employ conformal prediction (CP), a distribution-free statistical framework which provides a valid finite-sample coverage guarantee, to demonstrate that DT-sampler is statistically more efficient and produces stable results when compared with random forest classifier. We demonstrate the effectiveness of our method through several benchmark and real-world datasets.<\/jats:p>","DOI":"10.1145\/3787470.3787484","type":"journal-article","created":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T00:46:21Z","timestamp":1767228381000},"page":"132-141","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["DT-sampler: A SAT-based Decision Tree Ensemble"],"prefix":"10.1145","volume":"27","author":[{"given":"Xiaotian","family":"Xue","sequence":"first","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chao","family":"Huang","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Koji","family":"Tsuda","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Diptesh","family":"Das","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,12,31]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939874"},{"key":"e_1_2_1_2_1","volume-title":"An interpretable machine learning model for accurate prediction of sepsis in the icu. Critical care medicine, 46:547--553","author":"Nemati Shamim","year":"2018","unstructured":"Shamim Nemati, Andre Holder, Fereshteh Razmi, Matthew D Stanley, Gari D Clifford, and Timothy G Buchman. An interpretable machine learning model for accurate prediction of sepsis in the icu. Critical care medicine, 46:547--553, 2018."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3233547.3233667"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.7717\/peerj.6543"},{"key":"e_1_2_1_5_1","volume-title":"Department of Computational Biology and Medical Sciences","year":"2019","unstructured":"Diptesh. Das. Interpretable Machine Learning Models for Medical Data. Ph.D. diss., Department of Computational Biology and Medical Sciences, The University of Tokyo, Kashiwa, Japan, 2019."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i9.21238"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1002\/sta4.633"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-0947-6_31"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3098047"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10940-022-09545-w"},{"key":"e_1_2_1_11_1","first-page":"17760","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Liu Jiachang","year":"2022","unstructured":"Jiachang Liu, Chudi Zhong, Boxuan Li, Margo Seltzer, and Cynthia Rudin. Fasterrisk: fast and accurate interpretable risk scores. In Proceedings of the 36th International Conference on Neural Information Processing Systems, pages 17760--17773, 2022."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v38i3.2741"},{"key":"e_1_2_1_13_1","volume-title":"Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206--215","author":"Rudin Cynthia","year":"2019","unstructured":"Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature machine intelligence, 1(5):206--215, 2019."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1996.tb02080.x"},{"key":"e_1_2_1_15_1","volume-title":"Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70:849--911","author":"Fan Jianqing","year":"2008","unstructured":"Jianqing Fan and Jinchi Lv. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70:849--911, 2008."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.1993.342465"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022643204877"},{"key":"e_1_2_1_18_1","volume-title":"5: programs for machine learning","author":"Quinlan J Ross","year":"2014","unstructured":"J Ross Quinlan. C4. 5: programs for machine learning. Elsevier, 2014."},{"key":"e_1_2_1_19_1","volume-title":"Classification and Regression Trees","author":"Breiman L.","year":"1984","unstructured":"L. Breiman, J. Friedman, C.J. Stone, and R.A. Olshen. Classification and Regression Trees. Taylor & Francis, 1984."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2023.1124553"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/189"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/85.2.363"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1998.10473750"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013916107446"},{"key":"e_1_2_1_25_1","volume-title":"Bart: Bayesian additive regression trees","author":"Chipman Hugh A","year":"2010","unstructured":"Hugh A Chipman, Edward I George, and Robert E Mc- Culloch. Bart: Bayesian additive regression trees. 2010."},{"key":"e_1_2_1_26_1","volume-title":"Tyler H Mc- Cormick, and David Madigan. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model","author":"Letham Benjamin","year":"2015","unstructured":"Benjamin Letham, Cynthia Rudin, Tyler H Mc- Cormick, and David Madigan. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. 2015."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2013.6557778"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1550723"},{"key":"e_1_2_1_30_1","volume-title":"A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511","author":"Angelopoulos Anastasios N","year":"2021","unstructured":"Anastasios N Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511, 2021."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1390681.1390693"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2013.85"},{"key":"e_1_2_1_33_1","first-page":"173","volume-title":"CP 2009 Lisbon, Portugal, September 20--24, 2009 Proceedings 15","author":"Bessiere Christian","year":"2009","unstructured":"Christian Bessiere, Emmanuel Hebrard, and Barry O'Sullivan. Minimising decision tree size as combinatorial optimisation. In Principles and Practice of Constraint Programming-CP 2009: 15th International Conference, CP 2009 Lisbon, Portugal, September 20--24, 2009 Proceedings 15, pages 173--187. Springer, 2009."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10601-020-09312-3"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-51825-7_35"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/1792734.1792766"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3180155.3180248"},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Mate Soos Stephan Gocht and Kuldeep Meel. Tinted Detached and Lazy CNF-XOR Solving and Its Applications to Counting and Sampling pages 463--484. 07 2020.","DOI":"10.1007\/978-3-030-53288-8_22"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1039\/D1DD00043H"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-36755-1_29"},{"key":"e_1_2_1_41_1","volume-title":"UCI machine learning repository","author":"Dua Dheeru","year":"2017","unstructured":"Dheeru Dua and Casey Graff. UCI machine learning repository, 2017."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-019-12130-8"},{"key":"e_1_2_1_43_1","first-page":"11","volume-title":"ProPublica","author":"Larson Jeff","year":"2016","unstructured":"Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. How we analyzed the compas recidivism algorithm. ProPublica, May 23, 2016. Accessed: 2025--11-09."}],"container-title":["ACM SIGKDD Explorations Newsletter"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3787470.3787484","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,13]],"date-time":"2026-01-13T00:43:54Z","timestamp":1768265034000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3787470.3787484"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,30]]},"references-count":43,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,12,30]]}},"alternative-id":["10.1145\/3787470.3787484"],"URL":"https:\/\/doi.org\/10.1145\/3787470.3787484","relation":{},"ISSN":["1931-0145","1931-0153"],"issn-type":[{"value":"1931-0145","type":"print"},{"value":"1931-0153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,30]]},"assertion":[{"value":"2025-12-31","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}