{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,28]],"date-time":"2025-11-28T17:26:20Z","timestamp":1764350780610,"version":"build-2065373602"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,11,7]],"date-time":"2022-11-07T00:00:00Z","timestamp":1667779200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Fapemig"},{"name":"CAPES"},{"name":"Aws"},{"name":"CNPq"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,11,7]]},"DOI":"10.1145\/3539637.3557052","type":"proceedings-article","created":{"date-parts":[[2022,9,26]],"date-time":"2022-09-26T22:14:00Z","timestamp":1664230440000},"page":"191-201","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Evaluating Topic Modeling Pre-processing Pipelines for Portuguese Texts"],"prefix":"10.1145","author":[{"given":"Ant\u00f4nio Pereira De Souza","family":"J\u00fanior","sequence":"first","affiliation":[{"name":"Departamento de Ci\u00eancia da Computa\u00e7\u00e3o, Universidade Federal de S\u00e3o Jo\u00e3o del-Rei, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Pablo","family":"Cecilio","sequence":"additional","affiliation":[{"name":"Departamento de Ci\u00eancia da Computa\u00e7\u00e3o, Universidade Federal de S\u00e3o Jo\u00e3o del-Rei, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Felipe","family":"Viegas","sequence":"additional","affiliation":[{"name":"Departamento de Ci\u00eancia da Computa\u00e7\u00e3o, Universidade Federal de Minas Gerais, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Washington","family":"Cunha","sequence":"additional","affiliation":[{"name":"Departamento de Ci\u00eancia da Computa\u00e7\u00e3o, Universidade Federal de Minas Gerais, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elisa Tuler De","family":"Albergaria","sequence":"additional","affiliation":[{"name":"Departamento de Ci\u00eancia da Computa\u00e7\u00e3o, Universidade Federal de S\u00e3o Jo\u00e3o del-Rei, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leonardo Chaves Dutra Da","family":"Rocha","sequence":"additional","affiliation":[{"name":"Departamento de Ci\u00eancia da Computa\u00e7\u00e3o, Universidade Federal de S\u00e3o Jo\u00e3o del-Rei, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,11,7]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323503.3360644"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/BRACIS.2014.56"},{"key":"e_1_3_2_1_3_1","article-title":"Latent Dirichlet Allocation","author":"Blei M.","year":"2003","unstructured":"David\u00a0 M. Blei , Andrew\u00a0 Y. Ng , and Michael\u00a0 I. Jordan . 2003 . Latent Dirichlet Allocation . J. Mach. Learn. Res. 3, null ( March 2003), 993\u20131022. David\u00a0M. Blei, Andrew\u00a0Y. Ng, and Michael\u00a0I. Jordan. 2003. Latent Dirichlet Allocation. J. Mach. Learn. Res. 3, null (March 2003), 993\u20131022.","journal-title":"J. Mach. Learn. Res. 3, null"},{"key":"e_1_3_2_1_4_1","volume-title":"What\u2019s all the talk about? Topic modelling in a mental health Internet support group. BMC psychiatry 16, 1","author":"Carron-Arthur Bradley","year":"2016","unstructured":"Bradley Carron-Arthur , Julia Reynolds , Kylie Bennett , Anthony Bennett , and Kathleen\u00a0 M Griffiths . 2016. What\u2019s all the talk about? Topic modelling in a mental health Internet support group. BMC psychiatry 16, 1 ( 2016 ), 367. https:\/\/doi.org\/10.1186\/s12888-016-1073-5 10.1186\/s12888-016-1073-5 Bradley Carron-Arthur, Julia Reynolds, Kylie Bennett, Anthony Bennett, and Kathleen\u00a0M Griffiths. 2016. What\u2019s all the talk about? Topic modelling in a mental health Internet support group. BMC psychiatry 16, 1 (2016), 367. https:\/\/doi.org\/10.1186\/s12888-016-1073-5"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102263"},{"key":"e_1_3_2_1_6_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805(2018). https:\/\/arxiv.org\/abs\/1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805(2018). https:\/\/arxiv.org\/abs\/1810.04805 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805(2018). https:\/\/arxiv.org\/abs\/1810.04805"},{"key":"e_1_3_2_1_7_1","unstructured":"Derek Greene and James\u00a0P. Cross. 2016. Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach. arxiv:1607.03055\u00a0[cs.CL]  Derek Greene and James\u00a0P. Cross. 2016. Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach. arxiv:1607.03055\u00a0[cs.CL]"},{"key":"e_1_3_2_1_8_1","volume-title":"Article arXiv:1708.06025 (Aug.","author":"Hartmann Nathan","year":"2017","unstructured":"Nathan Hartmann , Erick Fonseca , Christopher Shulby , Marcos Treviso , Jessica Rodrigues , and Sandra Aluisio . 2017. Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. arXiv e-prints , Article arXiv:1708.06025 (Aug. 2017 ), arXiv:1708.06025\u00a0pages. arxiv:1708.06025\u00a0[cs.CL] Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jessica Rodrigues, and Sandra Aluisio. 2017. Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks. arXiv e-prints, Article arXiv:1708.06025 (Aug. 2017), arXiv:1708.06025\u00a0pages. arxiv:1708.06025\u00a0[cs.CL]"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-173364"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1080\/21670811.2015.1093271"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISA.2016.7785373"},{"key":"e_1_3_2_1_12_1","volume-title":"Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755","author":"Lee D.","year":"1999","unstructured":"Daniel\u00a0 D. Lee and H.\u00a0 Sebastian Seung . 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 ( 1999 ), 788\u2013791. https:\/\/doi.org\/10.1038\/44565 10.1038\/44565 Daniel\u00a0D. Lee and H.\u00a0Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788\u2013791. https:\/\/doi.org\/10.1038\/44565"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.5555\/3008751.3008829"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3091108"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186168"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1080\/10548408.2020.1740138"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2914720"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2914720"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1177\/0165551515617393"},{"key":"e_1_3_2_1_20_1","unstructured":"D. Nunes D. Matos J. Gomes and F. Neto. 2021. Chronic Pain and Language: A Topic Modelling Approach to Personal Pain Descriptions. https:\/\/arxiv.org\/abs\/2109.00402. arxiv:2109.00402\u00a0[cs.CL]  D. Nunes D. Matos J. Gomes and F. Neto. 2021. Chronic Pain and Language: A Topic Modelling Approach to Personal Pain Descriptions. https:\/\/arxiv.org\/abs\/2109.00402. arxiv:2109.00402\u00a0[cs.CL]"},{"key":"e_1_3_2_1_21_1","volume-title":"Antonio Jos\u00e9\u00a0G. Busson, and S\u00e9rgio Colcher.","author":"Adler\u00a0Soares Pinto Matheus","year":"2020","unstructured":"Matheus Adler\u00a0Soares Pinto , Antonio Fernando Lavareda\u00a0Jacob Junior , Antonio Jos\u00e9\u00a0G. Busson, and S\u00e9rgio Colcher. 2020 . Relacionando Modelagem de T\u00f3picos e Classifica\u00e7\u00e3o de Sentimentos para An\u00e1lise de Mensagens do Twitter Durante a Pandemia da COVID-19. In Anais Estendidos do XXVI Simp\u00f3sio Brasileiro de Sistemas Multim\u00eddia e Web (S\u00e3o Lu\u00eds). SBC, Porto Alegre, RS, Brasil , 61\u201364. https:\/\/doi.org\/10.5753\/webmedia_estendido.2020.13064 10.5753\/webmedia_estendido.2020.13064 Matheus Adler\u00a0Soares Pinto, Antonio Fernando Lavareda\u00a0Jacob Junior, Antonio Jos\u00e9\u00a0G. Busson, and S\u00e9rgio Colcher. 2020. Relacionando Modelagem de T\u00f3picos e Classifica\u00e7\u00e3o de Sentimentos para An\u00e1lise de Mensagens do Twitter Durante a Pandemia da COVID-19. In Anais Estendidos do XXVI Simp\u00f3sio Brasileiro de Sistemas Multim\u00eddia e Web (S\u00e3o Lu\u00eds). SBC, Porto Alegre, RS, Brasil, 61\u201364. https:\/\/doi.org\/10.5753\/webmedia_estendido.2020.13064"},{"key":"e_1_3_2_1_22_1","volume-title":"Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications 181 (07","author":"Qaiser Shahzad","year":"2018","unstructured":"Shahzad Qaiser and Ramsha Ali . 2018 . Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications 181 (07 2018). https:\/\/doi.org\/10.5120\/ijca2018917395 10.5120\/ijca2018917395 Shahzad Qaiser and Ramsha Ali. 2018. Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications 181 (07 2018). https:\/\/doi.org\/10.5120\/ijca2018917395"},{"volume-title":"Advances in Knowledge Discovery and Data Mining, Jinho Kim, Kyuseok Shim, Longbing Cao, Jae-Gil Lee, Xuemin Lin, and Yang-Sae Moon (Eds.)","author":"Qiang Jipeng","key":"e_1_3_2_1_23_1","unstructured":"Jipeng Qiang , Ping Chen , Tong Wang , and Xindong Wu. 2017. Topic Modeling over Short Texts by Incorporating Word Embeddings . In Advances in Knowledge Discovery and Data Mining, Jinho Kim, Kyuseok Shim, Longbing Cao, Jae-Gil Lee, Xuemin Lin, and Yang-Sae Moon (Eds.) . Springer International Publishing , Cham , 363\u2013374. Jipeng Qiang, Ping Chen, Tong Wang, and Xindong Wu. 2017. Topic Modeling over Short Texts by Incorporating Word Embeddings. In Advances in Knowledge Discovery and Data Mining, Jinho Kim, Kyuseok Shim, Longbing Cao, Jae-Gil Lee, Xuemin Lin, and Yang-Sae Moon (Eds.). Springer International Publishing, Cham, 363\u2013374."},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2020.01.083"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323503.3360628"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186009"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3178876.3186009"},{"key":"e_1_3_2_1_28_1","volume-title":"Preprocessing Egyptian Dialect Tweets for Sentiment Mining. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages. Association for Machine Translation in the Americas","author":"Shoukry Amira","year":"2012","unstructured":"Amira Shoukry and Ahmed Rafea . 2012 . Preprocessing Egyptian Dialect Tweets for Sentiment Mining. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages. Association for Machine Translation in the Americas , San Diego, California, USA, 47\u201356. https:\/\/aclanthology.org\/ 2012.amta-caas14.7 Amira Shoukry and Ahmed Rafea. 2012. Preprocessing Egyptian Dialect Tweets for Sentiment Mining. In Fourth Workshop on Computational Approaches to Arabic-Script-based Languages. Association for Machine Translation in the Americas, San Diego, California, USA, 47\u201356. https:\/\/aclanthology.org\/2012.amta-caas14.7"},{"key":"e_1_3_2_1_29_1","volume-title":"Modelagem de t\u00f3picos: Resumir e organizar corpus de dados por meio de algoritmos de aprendizagem de m\u00e1quina. M\u00faltiplos Olhares em Ci\u00eancia da Informa\u00e7\u00e3o 9, 2 (jan","author":"Souza Marcos\u00a0de","year":"2020","unstructured":"Marcos\u00a0de Souza and Renato\u00a0Rocha Souza . 2020. Modelagem de t\u00f3picos: Resumir e organizar corpus de dados por meio de algoritmos de aprendizagem de m\u00e1quina. M\u00faltiplos Olhares em Ci\u00eancia da Informa\u00e7\u00e3o 9, 2 (jan . 2020 ). https:\/\/periodicos.ufmg.br\/index.php\/moci\/article\/view\/19138 Marcos\u00a0de Souza and Renato\u00a0Rocha Souza. 2020. Modelagem de t\u00f3picos: Resumir e organizar corpus de dados por meio de algoritmos de aprendizagem de m\u00e1quina. M\u00faltiplos Olhares em Ci\u00eancia da Informa\u00e7\u00e3o 9, 2 (jan. 2020). https:\/\/periodicos.ufmg.br\/index.php\/moci\/article\/view\/19138"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2013.08.006"},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3291032"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.724"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271797"}],"event":{"name":"WebMedia '22: Brazilian Symposium on Multimedia and Web","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGMM ACM Special Interest Group on Multimedia"],"location":"Curitiba Brazil","acronym":"WebMedia '22"},"container-title":["Proceedings of the Brazilian Symposium on Multimedia and the Web"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3539637.3557052","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3539637.3557052","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:38:03Z","timestamp":1750178283000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3539637.3557052"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,7]]},"references-count":33,"alternative-id":["10.1145\/3539637.3557052","10.1145\/3539637"],"URL":"https:\/\/doi.org\/10.1145\/3539637.3557052","relation":{},"subject":[],"published":{"date-parts":[[2022,11,7]]},"assertion":[{"value":"2022-11-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}