{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T04:19:15Z","timestamp":1772252355109,"version":"3.50.1"},"reference-count":26,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T00:00:00Z","timestamp":1641772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Classification problems are common activities in many different domains, and supervised learning algorithms have shown great promise in these areas. The classification of goods in international trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for tax classification of goods with NCM codes, which are the official classification system for import and export products in Brazil. In particular, this article presents results from using a specific Portuguese-language-pretrained BERT model, as well as results from using a multilingual-pretrained BERT model. 
Experimental results show that the Portuguese model performed slightly better than the multilingual model, achieving an MCC of 0.8491, and confirm that the classifiers could be used to improve specialists\u2019 performance in the classification of goods.<\/jats:p>","DOI":"10.3390\/bdcc6010008","type":"journal-article","created":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T17:42:25Z","timestamp":1641836545000},"page":"8","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade"],"prefix":"10.3390","volume":"6","author":[{"given":"Roberta Rodrigues","family":"de Lima","sequence":"first","affiliation":[{"name":"Laboratory of Applied Intelligence, School of the Sea Science and Technology-University of Vale do Itaja\u00ed, Itaja\u00ed 88302-901, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2986-5353","authenticated-orcid":false,"given":"Anita M. 
R.","family":"Fernandes","sequence":"additional","affiliation":[{"name":"Laboratory of Applied Intelligence, School of the Sea Science and Technology-University of Vale do Itaja\u00ed, Itaja\u00ed 88302-901, Brazil"}]},{"given":"James Roberto","family":"Bombasar","sequence":"additional","affiliation":[{"name":"Analysis and Systems Development Course, Centro Universit\u00e1rio Avantis, Balne\u00e1rio Cambori\u00fa 88339-125, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7995-0695","authenticated-orcid":false,"given":"Bruno Alves","family":"da Silva","sequence":"additional","affiliation":[{"name":"Laboratory of Applied Intelligence, School of the Sea Science and Technology-University of Vale do Itaja\u00ed, Itaja\u00ed 88302-901, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6824-6136","authenticated-orcid":false,"given":"Paul","family":"Crocker","sequence":"additional","affiliation":[{"name":"Instituto de Telecomunica\u00e7\u00f5es and Departamento de Inform\u00e1tica, Universidade da Beira Interior, 6201-001 Covilh\u00e3, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0446-9271","authenticated-orcid":false,"given":"Valderi Reis Quietinho","family":"Leithardt","sequence":"additional","affiliation":[{"name":"COPELABS, Lus\u00f3fona University of Humanities and Technologies, Campo Grande 376, 1749-024 Lisboa, Portugal"},{"name":"VALORIZA, Research Center for Endogenous Resources Valorization, Instituto Polit\u00e9cnico de Portalegre, 7300-555 Portalegre, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,10]]},"reference":[{"key":"ref_1","unstructured":"Keedi, S. (2011). ABC do Com\u00e9rcio Exterior: Abrindo as Primeiras P\u00e1ginas, Aduaneiras. [4th ed.]."},{"key":"ref_2","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017). Attention is all you need. 
Advances in Neural Information Processing Systems, The MIT Press."},{"key":"ref_3","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, June 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA. Volume 1 (Long and Short Papers)."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Souza, F., Nogueira, R., and Lotufo, R. (2020, October 20\u201323). BERTimbau: Pretrained BERT models for Brazilian Portuguese. Intelligent Systems. Proceedings of the Brazilian Conference on Intelligent Systems, Rio Grande, Brazil.","DOI":"10.1007\/978-3-030-61377-8_28"},{"key":"ref_5","unstructured":"Receita Federal do Brasil (2021, July 21). NCM: Nomenclatura Comum do Mercosul, Available online: https:\/\/receita.economia.gov.br\/orientacao\/aduaneira\/classificacao-fiscal-de-mercadorias\/ncm."},{"key":"ref_6","unstructured":"Receita Federal do Brasil (2021, July 07). Tarifas Vigentes\/Lista de Bens sem Similar Nacional (Lessin), Available online: https:\/\/www.gov.br\/produtividade-e-comercio-exterior\/pt-br\/assuntos\/camex\/estrategia-comercial\/listas-vigentes."},{"key":"ref_7","unstructured":"Bizelli, J.S. (2003). Classifica\u00e7\u00e3o Fiscal de Mercadorias, Aduaneiras."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Aggarwal, C.C. (2015). Data Mining: The Textbook, Springer.","DOI":"10.1007\/978-3-319-14142-8"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bramer, M. (2016). Principles of Data Mining, Springer. [3rd ed.].","DOI":"10.1007\/978-1-4471-7307-6"},{"key":"ref_10","unstructured":"Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., and Macherey, K. (2016). Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. 
arXiv."},{"key":"ref_11","unstructured":"Alammar, J. (2021, July 07). The Illustrated BERT, ELMo, and Co.: How NLP Cracked Transfer Learning. Available online: http:\/\/jalammar.github.io\/illustrated-bert\/."},{"key":"ref_12","unstructured":"Nayak, P. (2021, December 11). Understanding Searches Better than Ever Before. Available online: https:\/\/blog.google\/products\/search\/search-language-understanding-bert\/."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., and Guestrin, C. (2016, August 13\u201317). \u201cWhy Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939778"},{"key":"ref_14","unstructured":"Receita Federal do Brasil (2021, July 25). Sistema Apoio Siscori, Available online: https:\/\/siscori.receita.fazenda.gov.br\/apoiosiscori\/consulta.jsf."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1462","DOI":"10.1016\/j.procs.2015.08.224","article-title":"Auto-Categorization of HS Code Using Background Net Approach","volume":"60","author":"Ding","year":"2015","journal-title":"Procedia Comput. Sci."},{"key":"ref_16","first-page":"4","article-title":"Classifica\u00e7\u00e3o Autom\u00e1tica de C\u00f3digos NCM Utilizando o Algoritmo Na\u00efve Bayes","volume":"11","author":"Batista","year":"2018","journal-title":"iSys\u2014Rev. Bras. Sist. Inf."},{"key":"ref_17","unstructured":"Rajapakse, T. (2021, May 02). Simple Transformers. Available online: https:\/\/simpletransformers.ai\/."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Fava, L.P., Furtado, J.C., Helfer, G.A., Barbosa, J.L.V., Beko, M., Correia, S.D., and Leithardt, V.R.Q. (2021). A Multi-Start Algorithm for Solving the Capacitated Vehicle Routing Problem with Two-Dimensional Loading Constraints. 
Symmetry, 13.","DOI":"10.20944\/preprints202109.0125.v1"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Jurman, G., Riccadonna, S., and Furlanello, C. (2012). A Comparison of MCC and CEN Error Measures in Multi-Class Prediction. PLoS ONE, 7.","DOI":"10.1371\/journal.pone.0041882"},{"key":"ref_20","unstructured":"Zafar, I., Tzanidou, G., Burton, R., Patel, N., and Araujo, L. (2018). Hands-On Convolutional Neural Networks with TensorFlow: Solve Computer Vision Problems with Modeling in Tensorflow and Python, Packt Publishing."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Cui, Y., Jia, M., Lin, T., Song, Y., and Belongie, S. (2019, June 15\u201320). Class-Balanced Loss Based on Effective Number of Samples. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00949"},{"key":"ref_22","unstructured":"Gorodkin, J. (2021, August 08). The Rk Page. Available online: https:\/\/rth.dk\/resources\/rk\/introduction\/index.html."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Chicco, D., and Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom., 21.","DOI":"10.1186\/s12864-019-6413-7"},{"key":"ref_24","unstructured":"Biewald, L. (2021, June 09). Experiment Tracking with Weights and Biases. Weights and Biases. Available online: http:\/\/wandb.com\/."},{"key":"ref_25","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_26","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. 
arXiv."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/1\/8\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T13:38:28Z","timestamp":1760362708000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/1\/8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,10]]},"references-count":26,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["bdcc6010008"],"URL":"https:\/\/doi.org\/10.3390\/bdcc6010008","relation":{"has-preprint":[{"id-type":"doi","id":"10.20944\/preprints202111.0378.v1","asserted-by":"object"}]},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,10]]}}}