{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T23:42:46Z","timestamp":1780616566491,"version":"3.54.1"},"reference-count":31,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T00:00:00Z","timestamp":1606694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Electronic invoicing has been mandatory for Italian companies since January 2019. All the invoices are structured in a predefined xml template which facilitates the extraction of the information. The main aim of this paper is to exploit the information contained in electronic invoices to build an intelligent system which can simplify accountants\u2019 work. More precisely, this contribution shows how it is possible to automate part of the accounting process: all the invoices of a company are classified into specific codes which represent the economic nature of the financial transactions. To accomplish this classification task, a multiclass classification algorithm is proposed to predict two different target variables, the account and the VAT codes, which are part of the general ledger entry. To apply this model to real datasets, a multi-step procedure is proposed: first, a matching algorithm is used for the reconstruction of the training set, then input data are elaborated and prepared for the training phase, and finally a classification algorithm is trained. Different classification algorithms are compared in terms of prediction accuracy, including ensemble models and neural networks. The models under comparison show optimal results in the prediction of the target variables, meaning that machine learning classifiers succeed in translating the complex rules of the accounting process into an automated model. A final study suggests that best performances can be achieved considering the hierarchical structure of the account codes, splitting the classification task into smaller sub-problems.<\/jats:p>","DOI":"10.3390\/make2040033","type":"journal-article","created":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T10:26:12Z","timestamp":1606731972000},"page":"617-629","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Automatic Electronic Invoice Classification Using Machine Learning Models"],"prefix":"10.3390","volume":"2","author":[{"given":"Chiara","family":"Bardelli","sequence":"first","affiliation":[{"name":"Department of Computational Mathematics and Decision Sciences, University of Pavia, 27100 Pavia, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Alessandro","family":"Rondinelli","sequence":"additional","affiliation":[{"name":"Datev.it S.p.a., 20090 Assago, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ruggero","family":"Vecchio","sequence":"additional","affiliation":[{"name":"Datev.it S.p.a., 20090 Assago, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5756-7831","authenticated-orcid":false,"given":"Silvia","family":"Figini","sequence":"additional","affiliation":[{"name":"Department of Political and Social Sciences, University of Pavia, 27100 Pavia, Italy"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,11,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.techfore.2016.08.019","article-title":"The future of employment: How susceptible are jobs to computerisation?","volume":"114","author":"Frey","year":"2017","journal-title":"Technol. Forecast. Soc. Chang."},{"key":"ref_2","unstructured":"Tekbas, I., and Nonwoven, K. (2018). The Profession of the digital age: Accounting Engineering. IFAC Proceedings Volumes, Project: The Theory of Accounting, Enginnering, Elsevier."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gulin, D., Hladika, M., and Valenta, I. (2019, January 12\u201314). Digitalization and the Challenges for the Accounting Profession. Proceedings of the 2019 ENTRENOVA Conference, Rovinj, Croatia.","DOI":"10.2139\/ssrn.3492237"},{"key":"ref_4","unstructured":"ICAEW (2020, November 29). Artificial Intelligence and the Future of Accountancy. Available online: https:\/\/www.icaew.com\/technical\/technology\/artificial-intelligence\/artificial-intelligence-the-future-of-accountancy."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"738","DOI":"10.1109\/21.376488","article-title":"Financial document processing based on staff line and description language","volume":"25","author":"Tang","year":"1995","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"730","DOI":"10.1109\/34.689303","article-title":"INFORMys: A flexible invoice-like form-reader system","volume":"20","author":"Cesarini","year":"1998","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_7","unstructured":"Holt, X., and Chisholm, A. (2018, January 10\u201312). Extracting structured data from invoices. Proceedings of the Australasian Language Technology Association Workshop 2018, Dunedin, New Zealand."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Wang, Y., Gui, G., Zhao, N., Yin, Y., Huang, H., Li, Y., Wang, J., Yang, J., and Zhang, H. (2018, January 14\u201316). Deep learning for optical character recognition and its application to VAT invoice recognition. Proceedings of the International Conference in Communications, Signal Processing, and Systems, Dalian, China.","DOI":"10.1007\/978-981-13-6508-9_12"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1109\/ICDAR.2017.74","article-title":"Cloudscan-a configuration-free invoice analysis system using recurrent neural networks","volume":"Volume 1","author":"Palm","year":"2017","journal-title":"Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)"},{"key":"ref_10","unstructured":"Schreyer, M., Sattarov, T., Borth, D., Dengel, A., and Reimer, B. (2017). Detection of anomalies in large scale accounting data using deep autoencoder networks. arXiv."},{"key":"ref_11","unstructured":"Zupan, M., Letinic, S., and Budimir, V. (2020, November 29). Accounting Journal Reconstruction with Variational Autoencoders and Long Short-term Memory Architecture. Available online: http:\/\/ceur-ws.org\/Vol-2646\/05-paper.pdf."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Schultz, M., and Tropmann-Frick, M. (2020, January 7\u201310). Autoencoder Neural Networks versus External Auditors: Detecting Unusual Journal Entries in Financial Statement Audits. Proceedings of the 53rd Hawaii International Conference on System Sciences, Maui, HI, USA.","DOI":"10.24251\/HICSS.2020.666"},{"key":"ref_13","unstructured":"Bengtsson, H., and Jansson, J. (2015). Using Classification Algorithms for Smart Suggestions in Accounting Systems. [Master\u2019s Thesis, Chalmers University of Technology]."},{"key":"ref_14","unstructured":"Bergdorf, J. (2020, November 29). Machine Learning and Rule Induction in Invoice Processing: Comparing Machine Learning Methods in Their Ability to Assign Account Codes in the Bookkeeping Process. Available online: http:\/\/www.diva-portal.se\/smash\/get\/diva2:1254853\/FULLTEXT01.pdf."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1504\/IJOR.2005.007433","article-title":"The linear multiple choice knapsack problem with equity constraints","volume":"1","author":"Kozanidis","year":"2005","journal-title":"Int. J. Oper. Res."},{"key":"ref_16","unstructured":"Pyle, D. (1999). Data Preparation for Data Mining, Morgan Kaufmann."},{"key":"ref_17","first-page":"4","article-title":"A review of machine learning algorithms for text-documents classification","volume":"1","author":"Khan","year":"2010","journal-title":"J. Adv. Inf. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines, Springer Science & Business Media.","DOI":"10.1007\/978-1-4615-0907-3"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505282.505283","article-title":"Machine learning in automated text categorization","volume":"34","author":"Sebastiani","year":"2002","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_20","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., and Brown, D. (2019). Text classification algorithms: A survey. Information, 10.","DOI":"10.3390\/info10040150"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1457","DOI":"10.1109\/TKDE.2006.180","article-title":"Some effective techniques for naive bayes text classification","volume":"18","author":"Kim","year":"2006","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wang, Z., He, Y., and Jiang, M. (2006, January 16\u201320). A comparison among three neural networks for text classification. Proceedings of the 2006 8th International Conference on Signal Processing, Beijing, China.","DOI":"10.1109\/ICOSP.2006.345923"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, Z.Q., Sun, X., Zhang, D.X., and Li, X. (2006, January 13\u201316). An optimal SVM-based text classification algorithm. Proceedings of the 2006 International Conference on Machine Learning and Cybernetics, Dalian, China.","DOI":"10.1109\/ICMLC.2006.258708"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kanakaraj, M., and Guddeti, R.M.R. (2015, January 7\u20139). Performance analysis of Ensemble methods on Twitter sentiment analysis using NLP techniques. Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015), Anaheim, CA, USA.","DOI":"10.1109\/ICOSC.2015.7050801"},{"key":"ref_26","unstructured":"Colas, F., and Brazdil, P. (2006, January 21\u201324). Comparison of SVM and some older classification algorithms in text classification tasks. Proceedings of the IFIP International Conference on Artificial Intelligence in Theory and Practice, Santiago, Chile."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"349","DOI":"10.4310\/SII.2009.v2.n3.a8","article-title":"Multi-class adaboost","volume":"2","author":"Hastie","year":"2009","journal-title":"Stat. Its Interface"},{"key":"ref_29","unstructured":"Delashmit, W.H., and Manry, M.T. (2005, January 11). Recent developments in multilayer perceptron neural networks. Proceedings of the Seventh Annual Memphis Area Engineering and Science Conference, MAESC, Memphis, TN, USA."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10618-010-0175-9","article-title":"A survey of hierarchical classification across different application domains","volume":"22","author":"Silla","year":"2011","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_31","first-page":"31","article-title":"Incremental algorithms for hierarchical classification","volume":"7","author":"Gentile","year":"2006","journal-title":"J. Mach. Learn. Res."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/2\/4\/33\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:39:29Z","timestamp":1760179169000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/2\/4\/33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":31,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["make2040033"],"URL":"https:\/\/doi.org\/10.3390\/make2040033","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202010.0057.v1","asserted-by":"object"}]},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,30]]}}}