{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T20:14:03Z","timestamp":1776111243801,"version":"3.50.1"},"reference-count":56,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2020,10,24]],"date-time":"2020-10-24T00:00:00Z","timestamp":1603497600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFB1800204"],"award-info":[{"award-number":["2018YFB1800204"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61771273"],"award-info":[{"award-number":["61771273"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"R&amp;D Program of Shenzhen","award":["JCYJ20180508152204044"],"award-info":[{"award-number":["JCYJ20180508152204044"]}]},{"name":"PCL Future Greater-Bay Area Network Facilities for Large-scale Experiments and Applications","award":["LZC0019"],"award-info":[{"award-number":["LZC0019"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Deep Neural Networks (DNNs) usually work in an end-to-end manner. This makes trained DNNs easy to use, but their decision process remains opaque for every test case. Unfortunately, the interpretability of decisions is crucial in some scenarios, such as medical or financial data mining and decision-making. In this paper, we propose a Tree-Network-Tree (TNT) learning framework for explainable decision-making, where knowledge is alternately transferred between the tree model and DNNs. 
Specifically, the proposed TNT learning framework exerts the advantages of different models at different stages: (1) a novel James\u2013Stein Decision Tree (JSDT) is proposed to generate better knowledge representations for DNNs, especially when the input data are low-frequency or low-quality; (2) the DNNs output high-performing predictions from the knowledge-embedding inputs and behave as a teacher model for the following tree model; and (3) a novel distillable Gradient Boosted Decision Tree (dGBDT) is proposed to learn interpretable trees from the soft labels and make predictions comparable to those of DNNs. Extensive experiments on various machine learning tasks demonstrate the effectiveness of the proposed method.<\/jats:p>","DOI":"10.3390\/e22111203","type":"journal-article","created":{"date-parts":[[2020,10,26]],"date-time":"2020-10-26T02:34:54Z","timestamp":1603679694000},"page":"1203","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["TNT: An Interpretable Tree-Network-Tree Learning Framework using Knowledge Distillation"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3873-8003","authenticated-orcid":false,"given":"Jiawei","family":"Li","sequence":"first","affiliation":[{"name":"Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2258-265X","authenticated-orcid":false,"given":"Yiming","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China"}]},{"given":"Xingchun","family":"Xiang","sequence":"additional","affiliation":[{"name":"Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China"}]},{"given":"Shu-Tao","family":"Xia","sequence":"additional","affiliation":[{"name":"Tsinghua Shenzhen International Graduate School, Tsinghua 
University, Shenzhen 518055, China"},{"name":"PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen 518055, China"}]},{"given":"Siyi","family":"Dong","sequence":"additional","affiliation":[{"name":"Ping An Life Insurance Company of China, Ltd., Shenzhen 518046, China"}]},{"given":"Yun","family":"Cai","sequence":"additional","affiliation":[{"name":"Ping An Life Insurance Company of China, Ltd., Shenzhen 518046, China"}]}],"member":"1968","published-online":{"date-parts":[[2020,10,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Pan, Y., Mei, T., Yao, T., Li, H., and Rui, Y. (2016, January 27\u201330). Jointly modeling embedding and translation to bridge video and language. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.497"},{"key":"ref_2","unstructured":"Huang, L., Wang, W., Chen, J., and Wei, X.Y. (November, January 27). Attention on attention for image captioning. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_3","unstructured":"Lu, J., Yang, J., Batra, D., and Parikh, D. (2016, January 5\u201310). Hierarchical question-image co-attention for visual question answering. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3236009","article-title":"A survey of methods for explaining black box models","volume":"51","author":"Guidotti","year":"2018","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_5","unstructured":"Molnar, C. (2018, June 06). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 
Available online: https:\/\/christophm.github.io\/interpretable-ml-book\/."},{"key":"ref_6","first-page":"371","article-title":"Interpretable deep models for ICU outcome prediction","volume":"2016","author":"Che","year":"2016","journal-title":"AMIA Annu. Symp. Proc."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"106384","DOI":"10.1016\/j.asoc.2020.106384","article-title":"Deep learning for financial applications: A survey","volume":"93","author":"Ozbayoglu","year":"2020","journal-title":"Appl. Soft Comput."},{"key":"ref_8","unstructured":"Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2016). Understanding deep learning requires rethinking generalization. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kontschieder, P., Fiterau, M., Criminisi, A., and Rota Bulo, S. (2015, January 13\u201316). Deep neural decision forests. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.172"},{"key":"ref_10","unstructured":"Ioannou, Y., Robertson, D., Zikic, D., Kontschieder, P., Shotton, J., Brown, M., and Criminisi, A. (2016). Decision forests, convolutional networks and the models in-between. arXiv."},{"key":"ref_11","unstructured":"Frosst, N., and Hinton, G. (2017). Distilling a neural network into a soft decision tree. arXiv."},{"key":"ref_12","unstructured":"Feng, J., Xu, Y.X., Jiang, Y., and Zhou, Z.H. (2020). Soft Gradient Boosting Machine. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wang, X., He, X., Feng, F., Nie, L., and Chua, T.S. (2018, January 23\u201327). Tem: Tree-enhanced embedding model for explainable recommendation. Proceedings of the 2018 World Wide Web Conference, Lyon, France.","DOI":"10.1145\/3178876.3186066"},{"key":"ref_14","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. 
(2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhao, B., Xiao, X., Zhang, W., Zhang, B., Gan, G., and Xia, S. (2020, January 4\u20138). Self-Paced Probabilistic Principal Component Analysis for Data with Outliers. Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9054487"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Li, J., Dai, T., Tang, Q., Xing, Y., and Xia, S.T. (2018, January 7\u201310). Cyclic annealing training convolutional neural networks for image classification with noisy labels. Proceedings of the 2018 IEEE International Conference on Image Processing, Athens, Greece.","DOI":"10.1109\/ICIP.2018.8451331"},{"key":"ref_18","unstructured":"Papernot, N., McDaniel, P., and Goodfellow, I. (2016). Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, X., Yan, X., Zheng, F., Jiang, Y., Xia, S., Zhao, Y., and Ji, R. (2020, January 13\u201319). One-Shot Adversarial Attacks on Visual Tracking With Dual Attention. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01019"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yan, X., Chen, X., Jiang, Y., Xia, S., Zhao, Y., and Zheng, F. (2020, January 4\u20138). Hijacking Tracker: A Powerful Adversarial Attack on Visual Tracking. 
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain.","DOI":"10.1109\/ICASSP40776.2020.9053574"},{"key":"ref_21","unstructured":"Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., and Jordan, M.I. (2019). Theoretically Principled Trade-off between Robustness and Accuracy. arXiv."},{"key":"ref_22","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., and Torralba, A. (2016, January 27\u201330). Learning deep features for discriminative localization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.319"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, W., Zhu, M., Wang, J., Zeng, X., and Yang, Z. (2017, January 22\u201324). End-to-end encrypted traffic classification with one-dimensional convolution neural networks. Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics, Beijing, China.","DOI":"10.1109\/ISI.2017.8004872"},{"key":"ref_25","unstructured":"Bai, S., Kolter, J.Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. 
Stat."},{"key":"ref_28","unstructured":"Chen, H., Zhang, H., Boning, D., and Hsieh, C.J. (2019, January 10\u201315). Robust Decision Trees Against Adversarial Examples. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_29","unstructured":"Bai, J., Li, Y., Li, J., Jiang, Y., and Xia, S. (2020). Rectified Decision Trees: Exploring the Landscape of Interpretable and Effective Machine Learning. arXiv."},{"key":"ref_30","unstructured":"Chen, H., Zhang, H., Si, S., Li, Y., Boning, D., and Hsieh, C.J. (2019, January 8\u201314). Robustness verification of tree-based models. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_31","first-page":"59","article-title":"Robustness Verification of Decision Tree Ensembles","volume":"2509","author":"Ranzato","year":"2019","journal-title":"OVERLAY@ AI* IA"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Cheng, H.T., Koc, L., Harmsen, J., Shaked, T., Chandra, T., Aradhye, H., Anderson, G., Corrado, G., Chai, W., and Ispir, M. (2016, January 15\u201319). Wide & deep learning for recommender systems. Proceedings of the 1st workshop on Deep Learning for Recommender Systems, Boston, MA, USA.","DOI":"10.1145\/2988450.2988454"},{"key":"ref_33","unstructured":"Irsoy, O., Y\u0131ld\u0131z, O.T., and Alpayd\u0131n, E. (2012, January 11\u201315). Soft decision trees. Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan."},{"key":"ref_34","unstructured":"Zhou, Z.H., and Feng, J. (2019, January 8\u201314). Deep forest: Towards an alternative to deep neural networks. Proceedings of the 26th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Rota Bulo, S., and Kontschieder, P. (2014, January 24\u201327). Neural decision forests for semantic image labelling. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.18"},{"key":"ref_36","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Li, J., Xiang, X., Dai, T., and Xia, S.T. (2019, January 1\u20134). Making Large Ensemble of Convolutional Neural Networks via Bootstrap Re-sampling. Proceedings of the 2019 IEEE Visual Communications and Image Processing, Sydney, Australia.","DOI":"10.1109\/VCIP47243.2019.8965828"},{"key":"ref_38","first-page":"86","article-title":"UA-DRN: Unbiased Aggregation of Deep Neural Networks for Regression Ensemble","volume":"15","author":"Li","year":"2019","journal-title":"Aust. J. Intell. Inf. Process. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Yim, J., Joo, D., Bae, J., and Kim, J. (2017, January 21\u201326). A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.754"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"4334","DOI":"10.1109\/TII.2018.2789925","article-title":"Distilling the knowledge from handcrafted features for human activity recognition","volume":"14","author":"Chen","year":"2018","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_41","unstructured":"Shen, Z., He, Z., and Xue, X. (February, January 27). Meal: Multi-model ensemble via adversarial learning. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_42","unstructured":"Breiman, L., Friedman, J., Stone, C.J., and Olshen, R.A. (1984). Classification and Regression Trees, CRC Press."},{"key":"ref_43","unstructured":"Xiang, X., Tang, Q., Zhang, H., Dai, T., Li, J., and Xia, S. (2020). JSRT: James-Stein Regression Tree. 
arXiv."},{"key":"ref_44","unstructured":"James, W., and Stein, C. (July, January 20). Estimation with quadratic loss. Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Efron, B., and Hastie, T. (2016). Computer Age Statistical Inference, Cambridge University Press.","DOI":"10.1017\/CBO9781316576533"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1214\/aos\/1176343009","article-title":"Minimax estimators of the mean of a multivariate normal distribution","volume":"3","author":"Bock","year":"1975","journal-title":"Ann. Stat."},{"key":"ref_47","unstructured":"Feldman, S., Gupta, M., and Frigyik, B. (2012, January 3\u20138). Multi-task averaging. Proceedings of the Advances in Neural Information Processing Systems, Stateline, NV, USA."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Shi, T., Agostinelli, F., Staib, M., Wipf, D., and Moscibroda, T. (2016, January 13\u201317). Improving survey aggregation with sparsely represented signals. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939876"},{"key":"ref_49","unstructured":"Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.Y. (2017, January 4\u20139). Lightgbm: A highly efficient gradient boosting decision tree. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_50","unstructured":"Huang, J., Li, G., Yan, Z., Luo, F., and Li, S. (2020). Joint learning of interpretation and distillation. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Ke, G., Xu, Z., Zhang, J., Bian, J., and Liu, T.Y. (2019, January 4\u20138). DeepGBM: A deep learning framework distilled by GBDT for online prediction tasks. 
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330858"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Fukui, S., Yu, J., and Hashimoto, M. (2019, January 18\u201321). Distilling Knowledge for Non-Neural Networks. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Lanzhou, China.","DOI":"10.1109\/APSIPAASC47483.2019.9023120"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3309547","article-title":"Temporal relational ranking for stock prediction","volume":"37","author":"Feng","year":"2019","journal-title":"ACM Trans. Inf. Syst."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"103269","DOI":"10.1016\/j.jbi.2019.103269","article-title":"ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU","volume":"98","author":"Gutierrez","year":"2019","journal-title":"J. Biomed. Inform."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"He, X., Yang, X., Zhang, S., Zhao, J., Zhang, Y., Xing, E., and Xie, P. (2020). Sample-Efficient Deep Learning for COVID-19 Diagnosis Based on CT Scans. medRxiv.","DOI":"10.1101\/2020.04.13.20063941"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Guo, H., Tang, R., Ye, Y., Li, Z., and He, X. (2017, January 19\u201325). DeepFM: A factorization-machine based neural network for CTR prediction. 
Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia.","DOI":"10.24963\/ijcai.2017\/239"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/11\/1203\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:27:41Z","timestamp":1760178461000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/11\/1203"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,24]]},"references-count":56,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,11]]}},"alternative-id":["e22111203"],"URL":"https:\/\/doi.org\/10.3390\/e22111203","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,24]]}}}