{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:23:05Z","timestamp":1760149385234,"version":"build-2065373602"},"reference-count":13,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2023,7,31]],"date-time":"2023-07-31T00:00:00Z","timestamp":1690761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior\u2014Brazil (CAPES)","award":["001"],"award-info":[{"award-number":["001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>One of the areas in which knowledge management has application is in companies that are concerned with maintaining and disseminating their practices among their members. However, studies involving these two domains may end up suffering from the issue of data confidentiality. Furthermore, it is difficult to find data regarding organizations\u2019 processes and associated knowledge. Therefore, this paper presents a method to support the generation of a labeled dataset composed of texts that simulate corporate emails containing sensitive information regarding disclosure, written in Portuguese. The method begins with the definition of the dataset\u2019s size and content distribution; the structure of its emails\u2019 texts; and the guidelines for specialists to build the emails\u2019 texts. It aims to create datasets that can be used in the validation of a tacit knowledge extraction process considering the 5W1H approach for the resulting base. The method was applied to create a dataset with content related to several domains, such as Federal Court and Registry Office and Marketing, giving it diversity and realism, while simulating real-world situations in the specialists\u2019 professional life. 
The dataset generated is available in an open-access repository so that it can be downloaded and, eventually, expanded.<\/jats:p>","DOI":"10.3390\/data8080127","type":"journal-article","created":{"date-parts":[[2023,7,31]],"date-time":"2023-07-31T09:22:38Z","timestamp":1690795358000},"page":"127","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["eMailMe: A Method to Build Datasets of Corporate Emails in Portuguese"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8597-4303","authenticated-orcid":false,"given":"Akira A. de Moura Galv\u00e3o","family":"Uematsu","sequence":"first","affiliation":[{"name":"Engenharia de Computa\u00e7\u00e3o e Sistemas Digitais, Escola Polit\u00e9cnica-Universidade de S\u00e3o Paulo, Av. Prof. Luciano Gualberto, S\u00e3o Paulo 05508-010, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8992-4768","authenticated-orcid":false,"given":"Anarosa A. F.","family":"Brand\u00e3o","sequence":"additional","affiliation":[{"name":"Engenharia de Computa\u00e7\u00e3o e Sistemas Digitais, Escola Polit\u00e9cnica-Universidade de S\u00e3o Paulo, Av. Prof. Luciano Gualberto, S\u00e3o Paulo 05508-010, Brazil"}]}],"member":"1968","published-online":{"date-parts":[[2023,7,31]]},"reference":[{"key":"ref_1","unstructured":"Jurisica, I., Mylopoulos, J., and Yu, E. (1999, January 1\u20134). Using Ontologies for Knowledge Management: An Information Systems Perspective. Proceedings of the Annual Conference of the American Society for Information Science, Washington DC, USA."},{"key":"ref_2","first-page":"110","article-title":"Guidelines for Tacit Knowledge Acquisition","volume":"38","author":"Mohammad","year":"2012","journal-title":"J. Theor. Appl. Inf. Technol."},{"key":"ref_3","unstructured":"Hamborg, F., Breitinger, C., and Gipp, B. (2019, January 19). GiveMe5W1H: A universal system for extracting main events from news articles. 
Proceedings of the INRA-International Workshop on News Recommendation and Analytics, Copenhagen, Denmark."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1504\/IJKMS.2020.105074","article-title":"The innovative model for extracting tacit knowledge in organisations","volume":"11","author":"Supnitchaisiri","year":"2020","journal-title":"Int. J. Knowl. Manag. Stud."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Carnaz, G., Nogueira, V., and Antunes, M. (2021). A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics, 8.","DOI":"10.3390\/informatics8020037"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Carnaz, G., Antunes, M., and Nogueira, V.B. (2021). An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing. Data, 6.","DOI":"10.3390\/data6070071"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Islam, M.T., Hasan, K.M.A., and Hossen, M.I. (2022, January 17\u201319). Classification and Resource Generation for Bangla Emails Based on Machine Learning Algorithms. Proceedings of the 2022 25th International Conference on Computer and Information Technology, ICCIT 2022, Cox\u2019s Bazar, Bangladesh.","DOI":"10.1109\/ICCIT57492.2022.10054742"},{"key":"ref_8","unstructured":"Cha, I., Oh, J., Park, C.Y., Han, J., and Lee, H. (2022). The Grind for Good Data: Understanding ML Practitioners\u2019 Struggles and Aspirations in Making Good Data. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hristov, E., Petrova-Antonova, D., Petrov, A., Borukova, M., and Shirinyan, E. (2023). Remote Sensing Data Preparation for Recognition and Classification of Building Roofs. Data, 8.","DOI":"10.3390\/data8050080"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Alshammari, T., Alshammari, N., Sedky, M., and Howard, C. (2018). SIMADL: Simulated Activities of Daily Living Dataset. 
Data, 3.","DOI":"10.3390\/data3020011"},{"key":"ref_11","unstructured":"Bussab, W.O., and Morettin, P.A. (2006). Estat\u00edstica B\u00e1sica, Saraiva."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Searle, J. (1969). Speech Acts: An Essay in the Philosophy of Language, Cambridge University Press.","DOI":"10.1017\/CBO9781139173438"},{"key":"ref_13","unstructured":"The R Foundation (2023, July 25). R. Version 3.6.3. Available online: https:\/\/www.r-project.org\/."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/8\/127\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:22:55Z","timestamp":1760127775000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/8\/127"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7,31]]},"references-count":13,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2023,8]]}},"alternative-id":["data8080127"],"URL":"https:\/\/doi.org\/10.3390\/data8080127","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2023,7,31]]}}}