{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T04:41:11Z","timestamp":1768970471427,"version":"3.49.0"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"9","license":[{"start":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T00:00:00Z","timestamp":1746835200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T00:00:00Z","timestamp":1746835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","award":["RTG 2126 (Algorithmic Optimization)"],"award-info":[{"award-number":["RTG 2126 (Algorithmic Optimization)"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100018755","name":"Universit\u00e4t Trier","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100018755","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Optim Lett"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Random forests are among the most famous algorithms for solving classification problems, in particular for large-scale data sets. Considering a set of labeled points and several decision trees, the method takes the majority vote to classify a new given point. In some scenarios, however, labels are only accessible for a proper subset of the given points. Moreover, this subset can be non-representative, e.g., due to collection bias. Semi-supervised learning considers the setting of labeled and unlabeled data and often improves the reliability of the results. 
In addition, it can be possible to obtain additional information about class sizes from undisclosed sources. We propose a mixed-integer linear optimization model for computing a semi-supervised random forest that covers the setting of labeled and unlabeled data points as well as the overall number of points in each class for a binary classification. Since the solution time rapidly grows as the number of variables increases, we present some problem-tailored preprocessing techniques and an intuitive branching rule. Our numerical results show that our approach leads to better accuracy and a better Matthews correlation coefficient for biased samples compared to random forests by majority vote, even if only a few labeled points are available.<\/jats:p>","DOI":"10.1007\/s11590-025-02191-8","type":"journal-article","created":{"date-parts":[[2025,5,10]],"date-time":"2025-05-10T08:14:09Z","timestamp":1746864849000},"page":"1717-1735","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Mixed-integer linear optimization for cardinality-constrained random forests"],"prefix":"10.1007","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5771-6179","authenticated-orcid":false,"given":"Jan Pablo","family":"Burgard","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4045-0597","authenticated-orcid":false,"given":"Maria Eduarda","family":"Pinheiro","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6208-5677","authenticated-orcid":false,"given":"Martin","family":"Schmidt","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,5,10]]},"reference":[{"key":"2191_CR1","unstructured":"Amini, M.-R., Gallinari, P.: \u201cSemi-supervised logistic regression.\u201d In: Proceedings of the 15th European Conference on Artificial Intelligence. ECAI\u201902. 
Lyon, France: IOS Press, pp.\u00a0390\u2013394 (2002)"},{"key":"2191_CR2","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/s11749-016-0481-7","volume":"25.2","author":"G Biau","year":"2016","unstructured":"Biau, G., Scornet, E.: A random forest guided tour. TEST Off. J. Span. Soc. Stat. Op. Res. 25.2, 197\u2013227 (2016). https:\/\/doi.org\/10.1007\/s11749-016-0481-7","journal-title":"TEST Off. J. Span. Soc. Stat. Op. Res."},{"key":"2191_CR3","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45.1","author":"L Breiman","year":"2001","unstructured":"Breiman, L.: Random forests. Mach. Learn. 45.1, 5\u201332 (2001). https:\/\/doi.org\/10.1023\/A:1010933404324","journal-title":"Mach. Learn."},{"key":"2191_CR4","doi-asserted-by":"publisher","first-page":"107048","DOI":"10.1016\/j.csda.2020.107048","volume":"154","author":"JP Burgard","year":"2021","unstructured":"Burgard, J.P., Krause, J., Schmaus, S.: Estimation of regional transition probabilities for spatial dynamic microsimulations from survey data lacking in regional detail. Comput. Stat. Data Anal. 154, 107048 (2021). https:\/\/doi.org\/10.1016\/j.csda.2020.107048","journal-title":"Comput. Stat. Data Anal."},{"key":"2191_CR5","unstructured":"Burgard, J.P., Pinheiro, M.E., Schmidt, M.: Mixed-Integer Linear Optimization for Semi-Supervised Optimal Classification Trees. (2024a) arXiv: 2401.09848 [math.OC]"},{"key":"2191_CR6","doi-asserted-by":"publisher","DOI":"10.1007\/s11750-024-00668-w","author":"JP Burgard","year":"2024","unstructured":"Burgard, J.P., Pinheiro, M.E., Schmidt, M.: Mixed-integer quadratic optimization and iterative clustering techniques for semi-supervised support vector machines. TOP (2024). 
https:\/\/doi.org\/10.1007\/s11750-024-00668-w","journal-title":"TOP"},{"key":"2191_CR7","first-page":"3348","volume-title":"Advances in Neural Information Processing Systems","author":"D Bzdok","year":"2015","unstructured":"Bzdok, D., Eickenberg, M., Grisel, O., Thirion, B., Varoquaux, G.: Semi-supervised factored logistic regression for high-dimensional neuroimaging data. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 3348\u20133356. MIT Press, Cambridge (2015)"},{"key":"2191_CR8","doi-asserted-by":"publisher","unstructured":"Chapelle, O., Chi, M., Zien, A.: \u201cA continuation method for semi-supervised SVMs.\u201d In: Proceedings of the 23rd International Conference on Machine Learning. ICML \u201906. New York, NY, USA: Association for Computing Machinery, pp.\u00a0185\u2013192. (2006). https:\/\/doi.org\/10.1145\/1143844.1143868","DOI":"10.1145\/1143844.1143868"},{"key":"2191_CR9","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1007\/978-1-4419-9326-7_5","volume-title":"Ensemble Machine Learning: Methods and Applications","author":"A Cutler","year":"2012","unstructured":"Cutler, A., Cutler, D.R., Stevens, J.R.: Random Forests. In: Zhang, C., Ma, Y. (eds.) Ensemble Machine Learning: Methods and Applications, pp. 157\u2013175. Springer, New York (2012). https:\/\/doi.org\/10.1007\/978-1-4419-9326-7_5"},{"key":"2191_CR10","doi-asserted-by":"publisher","unstructured":"Dogru, N., Subasi, A.: Traffic accident detection using random forest classifier. In: 2018 15th Learning and Technology Conference (L&T). IEEE, pp.\u00a040\u201345. (2018). 
https:\/\/doi.org\/10.1109\/LT.2018.8368509","DOI":"10.1109\/LT.2018.8368509"},{"key":"2191_CR11","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1137\/15M1020575","volume":"59.2","author":"I Dunning","year":"2017","unstructured":"Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59.2, 295\u2013320 (2017). https:\/\/doi.org\/10.1137\/15M1020575","journal-title":"SIAM Rev."},{"key":"2191_CR12","doi-asserted-by":"publisher","first-page":"116","DOI":"10.26599\/BDMA.2020.9020016","volume":"4.2","author":"VK Gupta","year":"2021","unstructured":"Gupta, V.K., Gupta, A., Kumar, D., Sardana, A.: Prediction of COVID-19 confirmed, death, and cured cases in India using random forest model. Big Data Min. Anal. 4.2, 116\u2013123 (2021). https:\/\/doi.org\/10.26599\/BDMA.2020.9020016","journal-title":"Big Data Min. Anal."},{"key":"2191_CR13","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-84858-7","volume-title":"The Elements of Statistical Learning: Data Mining Inference and Prediction","author":"T Hastie","year":"2009","unstructured":"Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining Inference and Prediction, 2nd edn. Springer, New York (2009). https:\/\/doi.org\/10.1007\/978-0-387-84858-7","edition":"2"},{"key":"2191_CR14","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/j.patcog.2016.04.016","volume":"60","author":"K Kim","year":"2016","unstructured":"Kim, K.: A hybrid classification algorithm by subspace partitioning through semi-supervised decision tree. Pattern Recognit. 60, 157\u2013163 (2016). https:\/\/doi.org\/10.1016\/j.patcog.2016.04.016","journal-title":"Pattern Recognit."},{"key":"2191_CR15","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1007\/s10844-017-0457-4","volume":"49","author":"MCD Kocev","year":"2017","unstructured":"Kocev, M.C.D., Levati\u0107, J., D\u017eeroski, S.: Semi-supervised classification trees. J. Intell. 
Inform. Syst. 49, 461\u2013486 (2017). https:\/\/doi.org\/10.1007\/s10844-017-0457-4","journal-title":"J. Intell. Inform. Syst."},{"key":"2191_CR16","unstructured":"Lee, D.-H.: Pseudo-Label : The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks. In: ICML 2013 Workshop : Challenges in Representation Learning (WREPL) (2013)"},{"key":"2191_CR17","doi-asserted-by":"publisher","unstructured":"Leistner, C., Saffari, A., Santner, J., Bischof, H.: Semi-Supervised Random Forests. In: 2009 IEEE 12th International Conference on Computer Vision, pp.\u00a0506\u2013513. (2009) https:\/\/doi.org\/10.1109\/ICCV.2009.5459198","DOI":"10.1109\/ICCV.2009.5459198"},{"key":"2191_CR18","doi-asserted-by":"publisher","first-page":"1088","DOI":"10.1109\/TSMCA.2007.904745","volume":"37.6","author":"M Li","year":"2007","unstructured":"Li, M., Zhou, Z.-H.: Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 37.6, 1088\u20131098 (2007). https:\/\/doi.org\/10.1109\/TSMCA.2007.904745","journal-title":"IEEE Trans. Syst. Man Cybern. Part A Syst. Hum."},{"key":"2191_CR19","doi-asserted-by":"publisher","unstructured":"Melacci, S., Belkin, M.: Laplacian Support Vector Machines Trained in the Primal. J. Mach. Learn. Res. Vol. 12. (2009). https:\/\/doi.org\/10.48550\/ARXIV.0909.5422","DOI":"10.48550\/ARXIV.0909.5422"},{"key":"2191_CR20","doi-asserted-by":"publisher","unstructured":"Nguyen, T.\u00a0N.\u00a0N., Veeravalli, B., Fong, X.: A semi-supervised learning method for spiking neural networks based on pseudo-labeling. In: 2023 International Joint Conference on Neural Networks (IJCNN), pp.\u00a01\u20137. (2023). 
https:\/\/doi.org\/10.1109\/IJCNN54540.2023.10191317","DOI":"10.1109\/IJCNN54540.2023.10191317"},{"key":"2191_CR21","doi-asserted-by":"publisher","unstructured":"Oliver, A., Odena, A., Raffel, C.\u00a0A., Cubuk, E.\u00a0D., Goodfellow, I.: Realistic evaluation of deep semi-supervised learning algorithms. In: Advances in Neural Information Processing Systems. Vol.\u00a031. Curran Associates, Inc. (2018). https:\/\/doi.org\/10.48550\/arXiv.1804.09170","DOI":"10.48550\/arXiv.1804.09170"},{"key":"2191_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13040-017-0154-4","volume":"10.36","author":"RS Olson","year":"2017","unstructured":"Olson, R.S., La Cava, W., Orzechowski, P., Urbanowicz, R.J., Moore, J.H.: PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Min. 10.36, 1\u201313 (2017). https:\/\/doi.org\/10.1186\/s13040-017-0154-4","journal-title":"BioData Min."},{"key":"2191_CR23","doi-asserted-by":"publisher","unstructured":"Pal, M., Parija, S.: Prediction of Heart Diseases using Random Forest. In: Journal of Physics: Conference Series, Vol. 1817.1, p. 012009. (2021). https:\/\/doi.org\/10.1088\/1742-6596\/1817\/1\/012009","DOI":"10.1088\/1742-6596\/1817\/1\/012009"},{"key":"2191_CR24","doi-asserted-by":"publisher","unstructured":"Sadeghi, B., Chiarawongse, P., Squire, K., Jones, D.\u00a0C., Noack, A., St-Jean, C., Huijzer, R., Sch\u00e4tzle, R., Butterworth, I., Peng, Y.-F., Blaom, A.: DecisionTree.jl - a Julia implementation of the CART decision tree and random forest algorithms. Version\u00a00.11.3. (2022). https:\/\/doi.org\/10.5281\/zenodo.7359268","DOI":"10.5281\/zenodo.7359268"},{"key":"2191_CR25","doi-asserted-by":"publisher","unstructured":"Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., Blake, A.: Real-time human pose recognition in parts from single depth images. In: CVPR 2011, pp.\u00a01297\u20131304. (2011). 
https:\/\/doi.org\/10.1109\/CVPR.2011.5995316","DOI":"10.1109\/CVPR.2011.5995316"},{"key":"2191_CR26","doi-asserted-by":"publisher","first-page":"953","DOI":"10.1093\/biomet\/asr058","volume":"98.4","author":"CJ Skinner","year":"2011","unstructured":"Skinner, C.J., D\u2019Arrigo, J.: Inverse probability weighting for clustered nonresponse. Biometrika 98.4, 953\u2013966 (2011). https:\/\/doi.org\/10.1093\/biomet\/asr058","journal-title":"Biometrika"},{"key":"2191_CR27","doi-asserted-by":"publisher","unstructured":"Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., Jiang, C.: Random forest for credit card fraud detection. In: 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), pp.\u00a01\u20136. (2018). https:\/\/doi.org\/10.1109\/ICNSC.2018.8361343","DOI":"10.1109\/ICNSC.2018.8361343"},{"key":"2191_CR28","doi-asserted-by":"publisher","DOI":"10.3390\/rs11242974","author":"Y Zhang","year":"2019","unstructured":"Zhang, Y., Cao, G., Li, X., Wang, B., Fu, P.: Active semi-supervised random forest for hyperspectral image classification. Remote Sens. (2019). https:\/\/doi.org\/10.3390\/rs11242974","journal-title":"Remote Sens."},{"key":"2191_CR29","first-page":"2392","volume":"35","author":"A Zharmagambetov","year":"2022","unstructured":"Zharmagambetov, A., Carreira-Perpinan, M.A.: Semi-supervised learning with decision trees: graph laplacian tree alternating optimization. Adv. Neural Inform. Process. Syst. 35, 2392\u20132405 (2022)","journal-title":"Adv. Neural Inform. Process. Syst."},{"key":"2191_CR30","doi-asserted-by":"publisher","unstructured":"Zhu, X., Goldberg, A.B.: Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers. (2009). 
https:\/\/doi.org\/10.2200\/S00196ED1V01Y200906AIM006","DOI":"10.2200\/S00196ED1V01Y200906AIM006"}],"container-title":["Optimization Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11590-025-02191-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11590-025-02191-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11590-025-02191-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,14]],"date-time":"2025-11-14T01:02:46Z","timestamp":1763082166000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11590-025-02191-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,10]]},"references-count":30,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["2191"],"URL":"https:\/\/doi.org\/10.1007\/s11590-025-02191-8","relation":{},"ISSN":["1862-4472","1862-4480"],"issn-type":[{"value":"1862-4472","type":"print"},{"value":"1862-4480","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,10]]},"assertion":[{"value":"17 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 May 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}