{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:11:40Z","timestamp":1760238700414,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2020,9,1]],"date-time":"2020-09-01T00:00:00Z","timestamp":1598918400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"AIA: Agente Inteligente para Atendimento no Balc\u00e3o do Empreendedor","award":["FCT INCoDe 2030"],"award-info":[{"award-number":["FCT INCoDe 2030"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>This paper describes how we tackled the development of Amaia, a conversational agent for Portuguese entrepreneurs. After introducing the domain corpus used as Amaia\u2019s Knowledge Base (KB), we make an extensive comparison of approaches for automatically matching user requests with Frequently Asked Questions (FAQs) in the KB, covering Information Retrieval (IR), approaches based on static and contextual word embeddings, and a model of Semantic Textual Similarity (STS) trained for Portuguese, which achieved the best performance. We further describe how we decreased the model\u2019s complexity and improved scalability, with minimal impact on performance. In the end, Amaia combines an IR library and an STS model with reduced features. Towards a more human-like behavior, Amaia can also answer out-of-domain questions, based on a second corpus integrated in the KB. 
Such interactions are identified with a text classifier, also described in the paper.<\/jats:p>","DOI":"10.3390\/info11090428","type":"journal-article","created":{"date-parts":[[2020,9,1]],"date-time":"2020-09-01T08:53:43Z","timestamp":1598950423000},"page":"428","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Developing Amaia: A Conversational Agent for Helping Portuguese Entrepreneurs\u2014An Extensive Exploration of Question-Matching Approaches for Portuguese"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9207-9761","authenticated-orcid":false,"given":"Jos\u00e9","family":"Santos","sequence":"first","affiliation":[{"name":"CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2008-8647","authenticated-orcid":false,"given":"Lu\u00eds","family":"Duarte","sequence":"additional","affiliation":[{"name":"CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0203-8443","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Ferreira","sequence":"additional","affiliation":[{"name":"CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3692-338X","authenticated-orcid":false,"given":"Ana","family":"Alves","sequence":"additional","affiliation":[{"name":"CISUC, DEI, University of Coimbra, 3030-290 Coimbra, Portugal"},{"name":"Instituto Superior de Engenharia de Coimbra (ISEC), Instituto Polit\u00e9cnico de Coimbra, 3030-199 Coimbra, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5779-8645","authenticated-orcid":false,"given":"Hugo Gon\u00e7alo","family":"Oliveira","sequence":"additional","affiliation":[{"name":"CISUC, DEI, University of Coimbra, 3030-290 Coimbra, 
Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2020,9,1]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Manning, C.D., Raghavan, P., and Sch\u00fctze, H. (2008). Introduction to Information Retrieval, Cambridge University Press.","DOI":"10.1017\/CBO9780511809071"},{"key":"ref_2","unstructured":"Agirre, E., Diab, M., Cer, D., and Gonzalez-Agirre, A. (2012, January 7). SemEval-2012 Task 6: A pilot on semantic textual similarity. Proceedings of the *SEM 2012: The First Joint Conference on Lexical and Computational Semantics\u2013Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation SemEval 2012, Montr\u00e9al, QC, Canada."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., and Specia, L. (2017, January 3\u20134). SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for Computational Linguistics, Vancouver, BC, Canada.","DOI":"10.18653\/v1\/S17-2001"},{"key":"ref_4","first-page":"3","article-title":"Vis\u00e3o Geral da Avalia\u00e7\u00e3o de Similaridade Sem\u00e2ntica e Infer\u00eancia Textual","volume":"8","author":"Fonseca","year":"2016","journal-title":"Linguam\u00e1tica"},{"key":"ref_5","unstructured":"Gon\u00e7alo Oliveira, H., Real, L., and Fonseca, E. (2019, January 15). Organizing the ASSIN 2 Shared Task. Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese, Salvador, BA, Brazil."},{"key":"ref_6","unstructured":"Gon\u00e7alo Oliveira, H., Ferreira, J., Santos, J., Fialho, P., Rodrigues, R., Coheur, L., and Alves, A. (2020, January 11\u201316). AIA-BDE: A Corpus of FAQs in Portuguese and their Variations. 
Proceedings of the 12th International Conference on Language Resources and Evaluation, Marseille, France."},{"key":"ref_7","unstructured":"Ameixa, D., Coheur, L., and Redol, R.A. (2013). From Subtitles to Human Interactions: Introducing the Subtle Corpus, INESC-ID. Technical Report."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Santos, J., Alves, A., and Gon\u00e7alo Oliveira, H. (2020, January 2\u20134). Leveraging on Semantic Textual Similarity for developing a Portuguese Dialogue System. Proceedings of the Portuguese Language-14th International Conference, PROPOR 2020, \u00c9vora, Portugal.","DOI":"10.1007\/978-3-030-41505-1_13"},{"key":"ref_9","unstructured":"Vinyals, O., and Le, Q.V. (2015, January 6\u201311). A Neural Conversational Model. Proceedings of the Deep Learning Workshop at ICML, Lille, France."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"361","DOI":"10.1017\/S1351324901002789","article-title":"The TREC Question Answering Track","volume":"7","author":"Voorhees","year":"2001","journal-title":"Nat. Lang. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Rinaldi, F., Dowdall, J., Hess, M., Moll\u00e1, D., Schwitter, R., and Kaljurand, K. (2003, January 3\u20135). Knowledge-Based Question Answering. Proceedings of the 7th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES 2003), Oxford, UK.","DOI":"10.1007\/978-3-540-45224-9_106"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"5412","DOI":"10.1016\/j.ins.2011.07.047","article-title":"A Survey on Question Answering Technology from an Information Retrieval Perspective","volume":"181","author":"Kolomiyets","year":"2011","journal-title":"Inf. Sci."},{"key":"ref_13","unstructured":"Ji, Z., Lu, Z., and Li, H. (2014). An Information Retrieval Approach to Short Text Conversation. arXiv."},{"key":"ref_14","unstructured":"Cui, L., Huang, S., Wei, F., Tan, C., Duan, C., and Zhou, M. (August, January 30). 
SuperAgent: A Customer Service Chatbot for E-commerce Websites. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Agirre, E., Banea, C., Cer, D., Diab, M., Gonzalez-Agirre, A., Mihalcea, R., Rigau, G., and Wiebe, J. (2016, January 16\u201317). SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation. Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), San Diego, CA, USA. Association for Computational Linguistics.","DOI":"10.18653\/v1\/S16-1081"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Nakov, P., M\u00e0rquez, L., Moschitti, A., Magdy, W., Mubarak, H., Freihat, A.A., Glass, J., and Randeree, B. (2016, January 16\u201317). SemEval-2016 Task 3: Community Question Answering. Proceedings of the 10th International Workshop on Semantic Evaluation, (SemEval-2016), San Diego, CA, USA.","DOI":"10.18653\/v1\/S16-1083"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Nakov, P., Hoogeveen, D., M\u00e0rquez, L., Moschitti, A., Mubarak, H., Baldwin, T., and Verspoor, K. (2017, January 3\u20134). SemEval-2017 Task 3: Community Question Answering. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, BC, Canada.","DOI":"10.18653\/v1\/S17-2003"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Kothari, G., Negi, S., Faruquie, T.A., Chakaravarthy, V.T., and Subramaniam, L.V. (2009, January 2\u20137). SMS Based Interface for FAQ Retrieval. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, Singapore.","DOI":"10.3115\/1690219.1690266"},{"key":"ref_19","unstructured":"Karan, M., \u017dmak, L., and \u0160najder, J. (2013, January 8\u20139). 
Frequently Asked Questions Retrieval for Croatian Based on Semantic Textual Similarity. Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, Sofia, Bulgaria."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Caputo, A., Degemmis, M., Lops, P., Lovecchio, F., and Manzari, V. (2016, January 12). Overview of the EVALITA 2016 Question Answering for Frequently Asked Questions (QA4FAQ) Task. Proceedings of the 3rd Italian Conference on Computational Linguistics (CLiC-it 2016) & 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Final Workshop (EVALITA 2016), Naples, Italy.","DOI":"10.4000\/books.aaccademia.1970"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pipitone, A., Tirone, G., and Pirrone, R. (2016, January 20). ChiLab4It system in the QA4FAQ competition. Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Naples, Italy.","DOI":"10.4000\/books.aaccademia.1988"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Fonseca, E.R., Magnolini, S., Feltracco, A., Qwaider, M.R.H., and Magnini, B. (2016, January 20). Tweaking Word Embeddings for FAQ Ranking. Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian, Naples, Italy.","DOI":"10.4000\/books.aaccademia.1981"},{"key":"ref_23","unstructured":"Magarreiro, D., Coheur, L., and Melour, F.S. (2014, January 16\u201318). Using subtitles to deal with Out-of-Domain interactions. Proceedings of the 18th Workshop on the Semantics and Pragmatics of Dialogue (SemDial), Edinburgh, UK."},{"key":"ref_24","unstructured":"Li, J., Galley, M., Brockett, C., Spithourakis, G., Gao, J., and Dolan, B. (May, January 28). A Persona-Based Neural Conversation Model. 
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Melo, G., and Coheur, L. (2020, January 2\u20134). Towards a Conversational Agent with \u201cCharacter\u201d. Proceedings of the Portuguese Language-14th International Conference, PROPOR 2020, \u00c9vora, Portugal.","DOI":"10.1007\/978-3-030-41505-1_41"},{"key":"ref_26","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_27","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013, January 2\u20134). Efficient Estimation of Word Representations in Vector Space. Proceedings of the 1st International Conference on Learning Representations, Scottsdale, AZ, USA."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 25\u201329). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching Word Vectors with Subword Information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_30","unstructured":"Hartmann, N.S., Fonseca, E.R., Shulby, C.D., Treviso, M.V., Rodrigues, J.S., and Alu\u00edsio, S.M. (2017, January 26). Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology (STIL 2017), Uberl\u00e2ndia, Brazil."},{"key":"ref_31","unstructured":"\u0158eh\u016f\u0159ek, R., and Sojka, P. (2010, January 22). Software Framework for Topic Modelling with Large Corpora. 
Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta."},{"key":"ref_32","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota."},{"key":"ref_33","unstructured":"Souza, F., Nogueira, R., and Lotufo, R. (2019). Portuguese Named Entity Recognition using BERT-CRF. arXiv."},{"key":"ref_34","unstructured":"Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python, O\u2019Reilly Media."},{"key":"ref_35","first-page":"18:1","article-title":"Improving NLTK for Processing Portuguese","volume":"Volume 74","author":"Ferreira","year":"2019","journal-title":"Symposium on Languages, Applications and Technologies (SLATE 2019), Coimbra, Portugal"},{"key":"ref_36","unstructured":"Speer, R., Chin, J., and Havasi, C. (, January 4\u20139). ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Gon\u00e7alo Oliveira, H. (2018, January 24\u201326). Learning Word Embeddings from Portuguese Lexical-Semantic Knowledge Bases. Proceedings of the Computational Processing of the Portuguese Language-13th International Conference, PROPOR 2018, Canela, Brazil.","DOI":"10.1007\/978-3-319-99722-3_27"},{"key":"ref_38","unstructured":"Santos, J., Alves, A., and Gon\u00e7alo Oliveira, H. (2019, January 15). ASAPPpy: A Python Framework for Portuguese STS. 
Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese co-located with XII Symposium in Information and Human Language Technology (STIL 2019), Salvador, Brazil."},{"key":"ref_39","unstructured":"Rodrigues, R., Couto, P., and Rodrigues, I. (2019, January 15). IPR: The Semantic Textual Similarity and Recognizing Textual Entailment Systems. Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese co-located with XII Symposium in Information and Human Language Technology (STIL 2019), Salvador, BA, Brazil."},{"key":"ref_40","unstructured":"Fonseca, E., and Alvarenga, J.P.R. (2019, January 15). Multilingual Transformer Ensembles for Portuguese Natural Language Tasks. Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese Co-Located with XII Symposium in Information and Human Language Technology (STIL 2019), Salvador, Brazil."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/9\/428\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:05:33Z","timestamp":1760177133000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/9\/428"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,1]]},"references-count":40,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["info11090428"],"URL":"https:\/\/doi.org\/10.3390\/info11090428","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2020,9,1]]}}}