{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T11:22:07Z","timestamp":1780053727714,"version":"3.54.0"},"reference-count":35,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T00:00:00Z","timestamp":1676937600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"World Bank funding","award":["ESC 91"],"award-info":[{"award-number":["ESC 91"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Tax fraud is a common problem for many tax administrations, costing billions of dollars. Different tax administrations have considered several options to optimize revenue; among them, there is the so-called electronic billing machine (EBM), which aims to monitor all business transactions and, as a result, boost value added tax (VAT) revenue and compliance. Most of the current research has focused on the impact of EBMs on VAT revenue collection and compliance rather than understanding how EBM reporting behavior influences future compliance. The essential contribution of this study is that it leverages both EBM\u2019s historical reporting behavior and actual business characteristics to understand and predict the future reporting behavior of EBMs. Herein, tree-based machine learning algorithms such as decision trees, random forest, gradient boost, and XGBoost are utilized, tested, and compared for better performance. The results exhibit the robustness of the random forest model, among others, with an accuracy of 92.3%. This paper clearly presents our approach contribution with respect to existing approaches through well-defined research questions, analysis mechanisms, and constructive discussions. 
Once applied, we believe that our approach could ultimately help the tax-collecting agency conduct timely interventions on EBM compliance, which will help achieve the EBM objective of improving VAT compliance.<\/jats:p>","DOI":"10.3390\/info14030140","type":"journal-article","created":{"date-parts":[[2023,2,21]],"date-time":"2023-02-21T03:04:12Z","timestamp":1676948652000},"page":"140","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Comparison of Tree-Based Machine Learning Algorithms to Predict Reporting Behavior of Electronic Billing Machines"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7798-1781","authenticated-orcid":false,"given":"Belle Fille","family":"Murorunkwere","sequence":"first","affiliation":[{"name":"African Center of Excellence in Data Science, University of Rwanda, Kigali P.O. Box 4285, Rwanda"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4463-6268","authenticated-orcid":false,"given":"Jean Felicien","family":"Ihirwe","sequence":"additional","affiliation":[{"name":"Department of Information Engineering Computer Science and Mathematics, University of l\u2019Aquila, 56121 Pisa, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3175-5081","authenticated-orcid":false,"given":"Idrissa","family":"Kayijuka","sequence":"additional","affiliation":[{"name":"Department of Applied Statistics, University of Rwanda, Kigali P.O. Box 4285, Rwanda"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2973-9258","authenticated-orcid":false,"given":"Joseph","family":"Nzabanita","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Science and Technology, University of Rwanda, Kigali P.O. 
Box 3900, Rwanda"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dominique","family":"Haughton","sequence":"additional","affiliation":[{"name":"Department of Mathematical Sciences and Global Studies, Bentley University, Waltham, MA 02452-4705, USA"},{"name":"Department of Mathematical Sciences and Global Studies, Universit\u00e9 Paris 1 (SAMM), 75634 Paris, France"},{"name":"Department of Mathematical Sciences and Global Studies, Universit\u00e9 Toulouse 1 (TSE-R), 31042 Toulouse, France"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,21]]},"reference":[{"key":"ref_1","unstructured":"Cobham, A. (2022, April 01). Taxation Policy and Development. Available online: https:\/\/www.files.ethz.ch\/isn\/110040."},{"key":"ref_2","first-page":"56","article-title":"Electronic Fiscal Devices (EFDs) An Empirical Study of their Impact on Taxpayer Compliance and Administrative Efficiency","volume":"15","author":"Casey","year":"2015","journal-title":"IMF Work. Pap."},{"key":"ref_3","unstructured":"Steenbergen, V. (2017). Reaping the Benefits of Electronic Billing Machines Using Data-Driven Tools to Improve VAT Compliance, International Growth Centre. Working Paper."},{"key":"ref_4","unstructured":"Eissa, N., Zeitlin, A., and Using mobile technologies to increase VAT compliance in Rwanda (2023, February 01). Unpublished Working Paper. Available online: https:\/\/scholar.google.com\/scholar?hl=en&as_sdt=0%2C5&q=Using+mobile+technologies+to+increase+VAT+compliance+in+Rwanda&btnG=."},{"key":"ref_5","unstructured":"Rwanda Revenue Authority (2022, July 01). Tax Statistics Publication in Rwanda, Available online: https:\/\/www.rra.gov.rw\/Publication\/."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Botchey, F.E., Qin, Z., and Hughes-Lartey, K. (2020).
Mobile Money Fraud Prediction\u2014A Cross-Case Analysis on the Efficiency of Support Vector Machines, Gradient Boosted Decision Trees, and Na\u00efve Bayes Algorithms. Information, 11.","DOI":"10.3390\/info11080383"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Andrade, J.P.A., Paulucio, L.S., Paixao, T.M., Berriel, R.F., Carneiro, T.C.J., Carneiro, R.V., De Souza, A.F., Badue, C., and Oliveira-Santos, T. (2021, January 29). A machine learning-based system for financial fraud detection. Proceedings of the Anais do XVIII Encontro Nacional de Intelig\u00eancia Artificial e Computacional, SBC, online.","DOI":"10.5753\/eniac.2021.18250"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1016\/j.ins.2020.03.089","article-title":"Anomaly detection in electronic invoice systems based on machine learning","volume":"535","author":"Tang","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_9","unstructured":"Hu, P. (2015). Predicting and Improving Invoice-to-Cash Collection through Machine Learning. [Ph.D. Thesis, Massachusetts Institute of Technology]."},{"key":"ref_10","unstructured":"Siarka, P., and Chojnacka-Komorowska, A. (2022). Fraud in Accounting and Taxation and Its Detection, Publishing House of Wroclaw University of Economics and Busine."},{"key":"ref_11","first-page":"107","article-title":"A comparison of psychological factors for tax compliance: Self employed versus salaried people","volume":"2","author":"Khurana","year":"2014","journal-title":"Int. J. Manag. Soc. Sci."},{"key":"ref_12","unstructured":"Murphy, R. (2022, February 01). The Cost of Tax Abuse. A Briefing Paper on the Cost of Tax Evasion Worldwide. Available online: https:\/\/openaccess.city.ac.uk\/id\/eprint\/16561\/1\/cost_of_tax_."},{"key":"ref_13","first-page":"125","article-title":"Tax compliance research: Findings, problems and prospects","volume":"5","author":"Jackson","year":"1986","journal-title":"Int. J. Account. 
Lit."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.intaccaudtax.2004.09.001","article-title":"Relationship between tax compliance internationally and selected determinants of tax morale","volume":"13","year":"2004","journal-title":"J. Int. Account. Audit. Tax."},{"key":"ref_15","unstructured":"Trivedi, V., Shehata, M., and Mestelman, S. (2004). Attitudes, Incentives and Tax Compliance, McMaster University. Department of Economics Working Papers."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1069","DOI":"10.1016\/j.sbspro.2013.12.590","article-title":"Tax Knowledge, Tax Complexity and Tax Compliance: Taxpayers\u2019 View","volume":"109","author":"Saad","year":"2014","journal-title":"Procedia-Soc. Behav. Sci."},{"key":"ref_17","unstructured":"Ngigi, E.W. (2011). The Effect of Electronic Tax Register System on the Duration of Value Added tax Audit in Kenya. [Doctoral Dissertation, University of Nairobi]."},{"key":"ref_18","unstructured":"Chege, J.M. (2010). The Impact of Using Electronic tax Register on Value Added Tax Compliance in Kenya: A case Study of Classified Hotels in Nairobi. [Doctoral Dissertation, University of Nairobi]."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"349","DOI":"10.17722\/ijrbt.v5i3.349","article-title":"Assessment of Challenges Facing the Implementation of Electronic Fiscal Devices (EFDs) in Revenue Collection in Tanzania","volume":"5","author":"Ikasu","year":"2014","journal-title":"Int. J. Res. Bus. Technol."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mascagni, G., Monkam, N., and Nell, C. (2016). Unlocking the Potential of Administrative Data in Africa: Tax Compliance and Progressivity in Rwanda, International Centre for Tax & Development. International Centre for Tax & Development, Working Paper.","DOI":"10.2139\/ssrn.3120309"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ranaldi, L., and Pucci, G. (2023). 
Knowing Knowledge: Epistemological Study of Knowledge in Transformers. Appl. Sci., 13.","DOI":"10.3390\/app13020677"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Murorunkwere, B.F., Tuyishimire, O., Haughton, D., and Nzabanita, J. (2022). Fraud detection using neural networks: A case study of income tax. Future Internet, 14.","DOI":"10.3390\/fi14060168"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Bel, N., Bracons, G., and Anderberg, S. (2021). Finding Evidence of Fraudster Companies in the CEO\u2019s Letter to Shareholders with Sentiment Analysis. Information, 12.","DOI":"10.3390\/info12080307"},{"key":"ref_24","unstructured":"Humski, L., Vrdoljak, B., and Skocir, Z. (2012, January 18\u201320). Concept, development and implementation of FER e-invoice system. Proceedings of the SoftCOM 2012, 20th International Conference on Software, Telecommunications and Computer Networks, Split-Primosten, Croatia."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Shao, P.E., and Dida, M. (2020). The Implementation of an Enhanced EFD System with an Embedded Tax Evasion Detection Features: A Case of Tanzania. J. Inf. Syst. Eng. Manag., 5.","DOI":"10.29333\/jisem\/7824"},{"key":"ref_26","unstructured":"Geron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O\u2019Reilly Media, Inc.. [2nd ed.]."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Maimon, O., and Rokach, L. (2005). Data Mining and Knowledge Discovery Handbook, Springer.","DOI":"10.1007\/b107408"},{"key":"ref_28","unstructured":"Dangeti, P. (2017). Statistics for Machine Learning, Packt Publishing, Limited. [1st ed.]."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Liu, B., Ma, M., and Chang, J. (2012, January 14\u201316). New Machine Learning Algorithm: Random Forest. 
Proceedings of the Information Computing and Applications, Chengde, China.","DOI":"10.1007\/978-3-642-34062-8_32"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"21","DOI":"10.3389\/fnbot.2013.00021","article-title":"Gradient boosting machines, a tutorial","volume":"7","author":"Natekin","year":"2013","journal-title":"Front. Neurorobot."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Dhieb, N., Ghazzai, H., Besbes, H., and Massoud, Y. (2019, January 4). Extreme Gradient Boosting Machine Learning Algorithm For Safe Auto Insurance Operations. Proceedings of the 2019 IEEE International Conference on Vehicular Electronics and Safety (ICVES), Cairo, Egypt.","DOI":"10.1109\/ICVES.2019.8906396"},{"key":"ref_32","unstructured":"Wallach, H., Larochelle, H., Beygelzimer, A., d\u2019Alch\u00e9-Buc, F., Fox, E., and Garnett, R. (2019, January 8\u201314). Regularized Gradient Boosting. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5121\/ijdkp.2015.5201","article-title":"A Review on Evaluation Metrics for Data Classification Evaluations","volume":"5","author":"Hossin","year":"2015","journal-title":"Int. J. Data Min. Knowl. Manag. Process"},{"key":"ref_34","first-page":"120670","article-title":"Classification Model Evaluation Metrics","volume":"12","author":"Vujovic","year":"2021","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_35","unstructured":"Singh, A., and Zhu, J. (2017, January 20\u201322). Beta calibration: A well-founded and easily implemented improvement on logistic calibration for binary classifiers. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Ft.
Lauderdale, FL, USA."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/3\/140\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:37:55Z","timestamp":1760121475000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/14\/3\/140"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,21]]},"references-count":35,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3]]}},"alternative-id":["info14030140"],"URL":"https:\/\/doi.org\/10.3390\/info14030140","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,2,21]]}}}