{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T05:31:09Z","timestamp":1774935069999,"version":"3.50.1"},"reference-count":60,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T00:00:00Z","timestamp":1690329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Science and Higher Education of the Russian Federation","award":["075-15-2022-311"],"award-info":[{"award-number":["075-15-2022-311"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>A new random forest-based model for solving the Multiple Instance Learning problem under small tabular data, called the Soft Tree Ensemble Multiple Instance Learning, is proposed. A new type of soft decision trees is considered, which is similar to the well-known soft oblique trees, but with a smaller number of trainable parameters. In order to train the trees, it is proposed to convert them into neural networks of a specific form, which approximate the tree functions. It is also proposed to aggregate the instance and bag embeddings (output vectors) by using the attention mechanism. The whole Soft Tree Ensemble Multiple Instance Learning model, including soft decision trees, neural networks, the attention mechanism and a classifier, is trained in an end-to-end manner. Numerical experiments with well-known real tabular datasets show that the proposed model can outperform many existing multiple instance learning models. A code implementing the model is publicly available.<\/jats:p>","DOI":"10.3390\/a16080358","type":"journal-article","created":{"date-parts":[[2023,7,27]],"date-time":"2023-07-27T02:07:17Z","timestamp":1690423637000},"page":"358","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Multiple Instance Learning with Trainable Soft Decision Tree Ensembles"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1542-6480","authenticated-orcid":false,"given":"Andrei","family":"Konstantinov","sequence":"first","affiliation":[{"name":"Department of Artificial Intelligence, Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, 195251 St. Petersburg, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5637-1420","authenticated-orcid":false,"given":"Lev","family":"Utkin","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, 195251 St. Petersburg, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3583-7324","authenticated-orcid":false,"given":"Vladimir","family":"Muliukha","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, Peter the Great St.Petersburg Polytechnic University, Polytechnicheskaya, 29, 195251 St. Petersburg, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2023,7,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"6423","DOI":"10.1038\/s41598-020-62724-2","article-title":"Resolving challenges in deep learning-based analyses of histopathological images using explanation methods","volume":"10","author":"Hagele","year":"2020","journal-title":"Sci. Rep."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"775","DOI":"10.1038\/s41591-021-01343-4","article-title":"Deep learning in histopathology: The path to the clinic","volume":"27","author":"Litjens","year":"2021","journal-title":"Nat. Med."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5642","DOI":"10.1038\/s41467-019-13647-8","article-title":"Automated acquisition of explainable knowledge from unannotated histopathology images","volume":"10","author":"Yamamoto","year":"2019","journal-title":"Nat. Commun."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1016\/S0004-3702(96)00034-3","article-title":"Solving the multiple instance problem with axis-parallel rectangles","volume":"89","author":"Dietterich","year":"1997","journal-title":"Artif. Intell."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhu, L., Zhao, B., and Gao, Y. (2008, January 18\u201320). Multi-class multi-instance learning for lung cancer image classification based on bag feature selection. Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, Jinan, China.","DOI":"10.1109\/FSKD.2008.54"},{"key":"ref_6","first-page":"2109","article-title":"Multiple instance learning with emerging novel class","volume":"33","author":"Wei","year":"2019","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.artint.2013.06.003","article-title":"Multiple instance classification: Review, taxonomy and comparative study","volume":"201","author":"Amores","year":"2013","journal-title":"Artif. Intell."},{"key":"ref_8","unstructured":"Babenko, B. (2008). Multiple Instance Learning: Algorithms and Applications, University of California. Technical Report."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/j.patcog.2017.10.009","article-title":"Multiple instance learning: A survey of problem characteristics and applications","volume":"77","author":"Carbonneau","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1016\/j.media.2019.03.009","article-title":"Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis","volume":"54","author":"Cheplygina","year":"2019","journal-title":"Med. Image Anal."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1109\/RBME.2017.2651164","article-title":"Multiple-Instance Learning for Medical Image and Video Analysis","volume":"10","author":"Quellec","year":"2017","journal-title":"IEEE Rev. Biomed. Eng."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"101789","DOI":"10.1016\/j.media.2020.101789","article-title":"Whole slide images based cancer survival prediction using attention guided deep multiple instance learning network","volume":"65","author":"Yao","year":"2020","journal-title":"Med. Image Anal."},{"key":"ref_13","unstructured":"Zhou, Z.H. (2004). Multi-Instance Learning: A Survey, National Laboratory for Novel Software Technology, Nanjing University. Technical Report."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"101813","DOI":"10.1016\/j.media.2020.101813","article-title":"Deep neural network models for computational histopathology: A survey","volume":"67","author":"Srinidhi","year":"2021","journal-title":"Med. Image Anal."},{"key":"ref_15","unstructured":"Andrews, S., Tsochantaridis, I., and Hofmann, T. (2002). Proceedings of the 15th International Conference on Neural Information Processing Systems, NIPS\u201902, MIT Press."},{"key":"ref_16","first-page":"204","article-title":"Solving multiple-instance and multiple-part learning problems with decision trees and rule sets. Application to the mutagenesis problem","volume":"Volume 2056","author":"Chevaleyre","year":"2001","journal-title":"Proceedings of the Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"i52","DOI":"10.1093\/bioinformatics\/btw252","article-title":"Classifying and segmenting microscopy images with deep multiple instance learning","volume":"32","author":"Kraus","year":"2016","journal-title":"Bioinformatics"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sun, M., Han, T., Liu, M.C., and Khodayari-Rostamabad, A. (2016, January 4\u20138). Multiple instance learning convolutional neural networks for object recognition. Proceedings of the International Conference on Pattern Recognition (ICPR), Cancun, Mexico.","DOI":"10.1109\/ICPR.2016.7900139"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.patcog.2017.08.026","article-title":"Revisiting multiple instance neural networks","volume":"74","author":"Wang","year":"2018","journal-title":"Pattern Recognit."},{"key":"ref_20","unstructured":"Wang, J., and Zucker, J.D. (July, January 29). Solving the multiple-instance problem: A lazy learning approach. Proceedings of the Seventeenth International Conference on Machine Learning, ICML, Stanford, CA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1613\/jair.5240","article-title":"Explicit Document Modeling through Weighted Multiple-Instance Learning","volume":"58","author":"Pappas","year":"2017","journal-title":"J. Artif. Intell. Res."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fuster, S., Eftestol, T., and Engan, K. (2021). Nested multiple instance learning with attention mechanisms. arXiv.","DOI":"10.1109\/ICMLA55696.2022.00038"},{"key":"ref_23","unstructured":"Ilse, M., Tomczak, J., and Welling, M. (2018, January 10\u201315). Attention-based Deep Multiple Instance Learning. Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholm, Sweden."},{"key":"ref_24","unstructured":"Jiang, S., Suriawinata, A., and Hassanpour, S. (2021). MHAttnSurv: Multi-Head Attention for Survival Prediction Using Whole-Slide Pathology Images. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"14029","DOI":"10.1007\/s00521-022-07259-5","article-title":"Multi-attention multiple instance learning","volume":"34","author":"Konstantinov","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_26","unstructured":"Rymarczyk, D., Kaczynska, A., Kraus, J., Pardyl, A., and Zielinski, B. (2021). ProtoMIL: Multiple Instance Learning with Prototypical Parts for Fine-Grained Interpretability. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhou, Y., Huang, J., Liu, Z., Li, L., Xu, W., and Cheng, J.Z. (2020, January 16\u201319). Hierarchical Attention-Based Multiple Instance Learning Network for Patient-Level Lung Cancer Diagnosis. Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Republic of Korea.","DOI":"10.1109\/BIBM49941.2020.9313417"},{"key":"ref_28","unstructured":"Heath, D., Kasif, S., and IJCAI, S.S. (September, January 28). Induction of oblique decision trees. Proceedings of the International Joint Conference on Artificial Intelligence, Chamb\u00e9ry, France."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Taser, P., Birant, K., and Birant, D. (2019, January 3\u20135). Comparison of Ensemble-Based Multiple Instance Learning Approaches. Proceedings of the 2019 IEEE International Symposium on INnovations in Intelligent SysTems and Applications (INISTA), Sofia, Bulgaria.","DOI":"10.1109\/INISTA.2019.8778273"},{"key":"ref_30","first-page":"4384","article-title":"Multiple-Instance Learning from Distributions","volume":"17","author":"Doran","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Feng, J., and Zhou, Z.H. (2017, January 4\u20139). Deep miml network. Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.10890"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1044","DOI":"10.1016\/j.neucom.2015.08.061","article-title":"MI-ELM: Highly efficient multi-instance learning based on hierarchical extreme learning machine","volume":"173","author":"Liu","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"826","DOI":"10.1016\/j.neucom.2015.07.024","article-title":"Multiple-instance learning based decision neural networks for image retrieval and classification","volume":"171","author":"Xu","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Rymarczyk, D., Borowa, A., Tabor, J., and Zielinski, B. (2021, January 3\u20138). Kernel Self-Attention for Weakly-supervised Image Classification using Deep Multiple Instance Learning. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00176"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3454009","article-title":"MILL: Channel Attention\u2013based Deep Multiple Instance Learning for Landslide Recognition","volume":"17","author":"Tang","year":"2021","journal-title":"ACM Trans. Multimed. Comput. Commun. Appl. (TOMM)"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, B., Li, Y., and Eliceiri, K. (2021, January 20\u201325). Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01409"},{"key":"ref_37","unstructured":"Qi, C., Hao, S., Kaichun, M., and Leonidas, J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3D classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_38","unstructured":"Schmidt, A., Morales-Alvarez, P., and Molina, R. (2021). Probabilistic attention based on Gaussian processes for deep multiple instance learning. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"4765","DOI":"10.1007\/s10462-022-10275-5","article-title":"Recent advances in decision trees: An updated survey","volume":"56","author":"Costa","year":"2022","journal-title":"Artif. Intell. Rev."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.csda.2015.11.006","article-title":"HHCART: An oblique decision tree","volume":"96","author":"Wickramarachchi","year":"2016","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_41","unstructured":"Carreira-Perpinan, M., and Tavallali, P. (2018, January 3\u20138). Alternating optimization of decision trees, with application to learning sparse oblique trees. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1959","DOI":"10.1007\/s10994-021-06094-4","article-title":"One-Stage Tree: End-to-end tree builder and pruner","volume":"111","author":"Xu","year":"2022","journal-title":"Mach. Learn."},{"key":"ref_43","first-page":"453","article-title":"On oblique random forests","volume":"Volume 22","author":"Menze","year":"2011","journal-title":"Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2011"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"107078","DOI":"10.1016\/j.patcog.2019.107078","article-title":"Heterogeneous oblique random forest","volume":"99","author":"Katuwal","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1109\/TEVC.2002.806857","article-title":"Inducing oblique decision trees with evolutionary algorithms","volume":"7","author":"Kamath","year":"2003","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1007\/s11263-019-01237-6","article-title":"End-to-End Learning of Decision Trees and Forests","volume":"128","author":"Hehn","year":"2020","journal-title":"Int. J. Comput. Vis."},{"key":"ref_47","unstructured":"Lee, G.H., and Jaakkola, T. (2019). Oblique decision trees from derivatives of relu networks. arXiv."},{"key":"ref_48","unstructured":"Hazimeh, H., Ponomareva, N., Mol, P., Tan, Z., and Mazumder, R. (2020, January 13\u201318). The tree ensemble layer: Differentiability meets conditional computation. Proceedings of the International Conference on Machine Learning, Virtual."},{"key":"ref_49","unstructured":"Frosst, N., and Hinton, G. (2017). Distilling a neural network into a soft decision tree. arXiv."},{"key":"ref_50","unstructured":"Karthikeyan, A., Jain, N., Natarajan, N., and Jain, P. (2021). Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent. arXiv."},{"key":"ref_51","unstructured":"Madaan, L., Bhojanapalli, S., Jain, H., and Jain, P. (2022). Treeformer: Dense Gradient Trees for Efficient Attention Computation. arXiv."},{"key":"ref_52","unstructured":"Bengio, Y., Leonard, N., and Courville, A. (2013). Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Leistner, C., Saffari, A., and Bischof, H. (2010, January 5\u201311). MIForests: Multiple-instance learning with randomized trees. Proceedings of the European Conference on Computer Vision, Crete, Greece.","DOI":"10.1007\/978-3-642-15567-3_3"},{"key":"ref_54","unstructured":"Gartner, T., Flach, P., Kowalczyk, A., and Smola, A. (2002, January 8\u201312). Multi-instance kernels. Proceedings of the ICML, Sydney, Australia."},{"key":"ref_55","unstructured":"Zhang, Q., and Goldman, S. (2002, January 9\u201314). Em-dd: An improved multiple-instance learning technique. Proceedings of the NIPS, Vancouver, BC, Canada."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Zhou, Z.H., Sun, Y.Y., and Li, Y.F. (2009, January 14\u201318). Multi-instance learning by treating instances as non-iid samples. Proceedings of the ICML, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553534"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1109\/TNNLS.2016.2519102","article-title":"Scalable algorithms for multi-instance learning","volume":"28","author":"Wei","year":"2017","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s10994-006-6226-1","article-title":"Extremely randomized trees","volume":"63","author":"Geurts","year":"2006","journal-title":"Mach. Learn."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: A gradient boosting machine","volume":"29","author":"Friedman","year":"2001","journal-title":"Ann. Stat."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","article-title":"Stochastic gradient boosting","volume":"38","author":"Friedman","year":"2002","journal-title":"Comput. Stat. Data Anal."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/8\/358\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:19:36Z","timestamp":1760127576000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/8\/358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,26]]},"references-count":60,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["a16080358"],"URL":"https:\/\/doi.org\/10.3390\/a16080358","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,7,26]]}}}