{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:38:29Z","timestamp":1760236709003,"version":"build-2065373602"},"reference-count":43,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2021,12,17]],"date-time":"2021-12-17T00:00:00Z","timestamp":1639699200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["INCoDe.2030"],"award-info":[{"award-number":["INCoDe.2030"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The Natural Language Processing (NLP) community has witnessed huge improvements in the last years. However, most achievements are evaluated on benchmarked curated corpora, with little attention devoted to user-generated content and less-resourced languages. Despite the fact that recent approaches target the development of multi-lingual tools and models, they still underperform in languages such as Portuguese, for which linguistic resources do not abound. This paper exposes a set of challenges encountered when dealing with a real-world complex NLP problem, based on user-generated complaint data in Portuguese. This case study meets the needs of a country-wide governmental institution responsible for food safety and economic surveillance, and its responsibilities in handling a high number of citizen complaints. Beyond looking at the problem from an exclusively academic point of view, we adopt application-level concerns when analyzing the progress obtained through different techniques, including the need to obtain explainable decision support. We discuss modeling choices and provide useful insights for researchers working on similar problems or data.<\/jats:p>","DOI":"10.3390\/info12120525","type":"journal-article","created":{"date-parts":[[2021,12,19]],"date-time":"2021-12-19T20:37:27Z","timestamp":1639946247000},"page":"525","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Robust Complaint Processing in Portuguese"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1252-7515","authenticated-orcid":false,"given":"Henrique","family":"Lopes-Cardoso","sequence":"first","affiliation":[{"name":"Laborat\u00f3rio de Intelig\u00eancia Artificial e Ci\u00eancia de Computadores (LIACC), Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal"}]},{"given":"Tom\u00e1s Freitas","family":"Os\u00f3rio","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de Intelig\u00eancia Artificial e Ci\u00eancia de Computadores (LIACC), Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal"}]},{"given":"Lu\u00eds Vilar","family":"Barbosa","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de Intelig\u00eancia Artificial e Ci\u00eancia de Computadores (LIACC), Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8252-7292","authenticated-orcid":false,"given":"Gil","family":"Rocha","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de Intelig\u00eancia Artificial e Ci\u00eancia de Computadores (LIACC), Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4709-1718","authenticated-orcid":false,"given":"Lu\u00eds Paulo","family":"Reis","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de Intelig\u00eancia Artificial e Ci\u00eancia de Computadores (LIACC), Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal"}]},{"given":"Jo\u00e3o Pedro","family":"Machado","sequence":"additional","affiliation":[{"name":"Autoridade para a Seguran\u00e7a Alimentar e Econ\u00f3mica (ASAE), Rua Rodrigo da Fonseca, 73, 1269-274 Lisbon, Portugal"}]},{"given":"Ana Maria","family":"Oliveira","sequence":"additional","affiliation":[{"name":"Autoridade para a Seguran\u00e7a Alimentar e Econ\u00f3mica (ASAE), Rua Rodrigo da Fonseca, 73, 1269-274 Lisbon, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,17]]},"reference":[{"key":"ref_1","unstructured":"Eggers, W.D., Malik, N., and Gracie, M. (2019). Using AI to Unleash the Power of Unstructured Government Data. Deloitte Insights, Available online: https:\/\/www2.deloitte.com\/us\/en\/insights\/focus\/cognitive-technologies\/natural-language-processing-examples-in-government-data.html."},{"key":"ref_2","unstructured":"Kowalski, R., Esteve, M., and Mikhaylov, S.J. (2017). Application of Natural Language Processing to Determine User Satisfaction in Public Services. CoRR."},{"key":"ref_3","first-page":"41:1","article-title":"A Survey on Assessment and Ranking Methodologies for User-Generated Content on the Web","volume":"48","author":"Momeni","year":"2015","journal-title":"ACM Comput. Surv."},{"key":"ref_4","first-page":"249","article-title":"Automatic Identification of Economic Activities in Complaints","volume":"Volume 11816","author":"Barbosa","year":"2019","journal-title":"Statistical Language and Speech Processing, 7th International Conference, SLSP 2019, Ljubljana, Slovenia, 14\u201316 October 2019"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Filgueiras, J., Barbosa, L., Rocha, G., Lopes Cardoso, H., Reis, L.P., Machado, J.P., and Oliveira, A.M. (2019). Complaint Analysis and Classification for Economic and Food Safety. Proceedings of the Second Workshop on Economics and Natural Language Processing, Association for Computational Linguistics.","DOI":"10.18653\/v1\/D19-5107"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/s00146-014-0549-4","article-title":"Social media analytics: A survey of techniques, tools and platforms","volume":"30","author":"Batrinca","year":"2015","journal-title":"AI Soc."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Petz, G., Karpowicz, M., F\u00fcrschu\u00df, H., Auinger, A., St\u0159\u00edtesk\u00fd, V., and Holzinger, A. (2013). Opinion Mining on the Web 2.0\u2014Characteristics of User Generated Content and Their Impacts. Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data, Springer.","DOI":"10.1007\/978-3-642-39146-0_4"},{"key":"ref_8","unstructured":"Diaz, G.O., and Ng, V. (2018, January 15\u201320). Modeling and Prediction of Online Product Review Helpfulness: A Survey. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia."},{"key":"ref_9","first-page":"74","article-title":"Determining the Level of Clients\u2019 Dissatisfaction from Their Commentaries","volume":"Volume 9727","author":"Forte","year":"2016","journal-title":"Proceedings of the Computational Processing of the Portuguese Language\u201412th International Conference PROPOR"},{"key":"ref_10","unstructured":"Liu, C.H., Moriya, Y., Poncelas, A., and Groves, D. (December, January 27). IJCNLP-2017 Task 4: Customer Feedback Analysis. Proceedings of the IJCNLP, Taipei, Taiwan."},{"key":"ref_11","unstructured":"Liu, C., Groves, D., Akira, H., Poncelas, A., and Liu, Q. (2018). Understanding Meanings in Multilingual Customer Feedback. arXiv."},{"key":"ref_12","unstructured":"Plank, B. (December, January 27). All-In-1 at IJCNLP-2017 Task 4: Short Text Classification with One Model for All Languages. Proceedings of the IJCNLP, Taipei, Taiwan."},{"key":"ref_13","unstructured":"Wang, N., Wang, J., and Zhang, X. (December, January 27). YNU-HPCC at IJCNLP-2017 Task 4: Attention-based Bi-directional GRU Model for Customer Feedback Analysis Task of English. Proceedings of the IJCNLP, Taipei, Taiwan."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1177\/1094670514524625","article-title":"Analyzing Customer Experience Feedback Using Text Mining: A Linguistics-Based Approach","volume":"17","author":"Ordenes","year":"2014","journal-title":"J. Serv. Res."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Dong, S., and Wang, Z. (2015, January 27\u201329). Evaluating service quality in insurance customer complaint handling throught text categorization. Proceedings of the 2015 International Conference on Logistics, Informatics and Service Sciences (LISS), Barcelona, Brazil.","DOI":"10.1109\/LISS.2015.7369671"},{"key":"ref_16","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013, January 5\u201310). Distributed Representations of Words and Phrases and Their Compositionality. Proceedings of the 26th International Conference on Neural Information Processing Systems\u2014Volume 2, Lake Tahoe, NV, USA."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1207\/s15516709cog1402_1","article-title":"Finding Structure in Time","volume":"14","author":"Elman","year":"1990","journal-title":"Cogn. Sci."},{"key":"ref_18","unstructured":"Assawinjaipetch, P., Shirai, K., Sornlertlamvanich, V., and Marukata, S. (2016). Recurrent Neural Network with Word Embedding for Complaint Classification. Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI\/OIAF4HLT2016), The COLING 2016 Organizing Committee."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, H. (2014). Learning to Rank for Information Retrieval and Natural Language Processing, Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publ.. [2nd ed.].","DOI":"10.1007\/978-3-031-02155-8"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Fauzan, A., and Khodra, M.L. (2014, January 20\u201321). Automatic Multilabel Categorization using Learning to Rank Framework for Complaint Text on Bandung Government. Proceedings of the 2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA), Institut Teknologi Bandung, Bandung, Indonesia.","DOI":"10.1109\/ICAICTA.2014.7005910"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Kalyoncu, F., Zeydan, E., Yigit, I.O., and Yildirim, A. (2018, January 28\u201331). A Customer Complaint Analysis Tool for Mobile Network Operators. Proceedings of the 2018 IEEE\/ACM Int. Conf. on Advances in Social Networks Analysis and Mining (ASONAM), Barcelona, Spain.","DOI":"10.1109\/ASONAM.2018.8508289"},{"key":"ref_22","first-page":"993","article-title":"Latent Dirichlet Allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Qi, P., Zhang, Y., Zhang, Y., Bolton, J., and Manning, C.D. (2020, January 5\u201310). Stanza: A Python Natural Language Processing Toolkit for Many Human Languages. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.acl-demos.14"},{"key":"ref_24","unstructured":"Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., and Grobler, J. (2013, January 23\u201327). API design for machine learning software: Experiences from the scikit-learn project. Proceedings of the ECML PKDD Workshop: Languages for Data Mining and Machine Learning, Prague, Czech Republic."},{"key":"ref_25","first-page":"1157","article-title":"An Introduction to Variable and Feature Selection","volume":"3","author":"Guyon","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_26","unstructured":"Silva, J., Ribeiro, R., Quaresma, P., Adami, A., and Branco, A. (2016). CONTO.PT: Groundwork for the Automatic Creation of a Fuzzy Portuguese Wordnet. Computational Processing of the Portuguese Language, Springer International Publishing."},{"key":"ref_27","unstructured":"Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., and Macherey, K. (2016). Google\u2019s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chung, J., Cho, K., and Bengio, Y. (2016). A Character-level Decoder without Explicit Segmentation for Neural Machine Translation. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/P16-1160"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2019). HuggingFace\u2019s Transformers: State-of-the-art Natural Language Processing. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_30","first-page":"7059","article-title":"Cross-lingual Language Model Pretraining","volume":"Volume 32","author":"Wallach","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"ref_31","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient estimation of word representations in vector space. Proceedings of the 1st International Conference on Learning Representations, ICLR 2013\u2014Workshop Track Proceedings, Scottsdale, AZ, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Kim, Y. (2014, January 25\u201329). Convolutional neural networks for sentence classification. Proceedings of the EMNLP 2014\u20142014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar.","DOI":"10.3115\/v1\/D14-1181"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2017, January 3\u20137). Bag of tricks for efficient text classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain.","DOI":"10.18653\/v1\/E17-2068"},{"key":"ref_34","unstructured":"Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., and Aluisio, S. (2017). Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. arXiv."},{"key":"ref_35","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics."},{"key":"ref_36","first-page":"194","article-title":"How to Fine-Tune BERT for Text Classification?","volume":"Volume 11856 LNAI","author":"Sun","year":"2019","journal-title":"Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)"},{"key":"ref_37","unstructured":"Pham, T. (2019). Super-convergence: Very fast training of neural networks using large learning rates. Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications, SPIE."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Maslej-Kre\u0161\u0148\u00e1kov\u00e1, V., Sarnovsk\u00fd, M., Butka, P., and Machov\u00e1, K. (2020). Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification. Appl. Sci., 10.","DOI":"10.3390\/app10238631"},{"key":"ref_39","unstructured":"Cerri, R., and Prati, R.C. (2020). BERTimbau: Pretrained BERT Models for Brazilian Portuguese. Intelligent Systems, Springer International Publishing."},{"key":"ref_40","unstructured":"Jain, S., and Wallace, B.C. (2019). Attention is not Explanation. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Wiegreffe, S., and Pinter, Y. Attention is not not Explanation. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP).","DOI":"10.18653\/v1\/D19-1002"},{"key":"ref_42","unstructured":"Jurafsky, D., Chai, J., Schluter, N., and Tetreault, J.R. (2020, January 5\u201310). Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1016\/j.procs.2013.05.005","article-title":"The Role of Text Pre-processing in Sentiment Analysis","volume":"17","author":"Haddi","year":"2013","journal-title":"Procedia Comput. Sci."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/12\/12\/525\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:50:17Z","timestamp":1760169017000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/12\/12\/525"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,17]]},"references-count":43,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["info12120525"],"URL":"https:\/\/doi.org\/10.3390\/info12120525","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2021,12,17]]}}}