{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T11:18:53Z","timestamp":1762082333656,"version":"build-2065373602"},"reference-count":36,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T00:00:00Z","timestamp":1672272000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Studies that use medical records are often impeded due to the information presented in narrative fields. However, recent studies have used artificial intelligence to extract and process secondary health data from electronic medical records. The aim of this study was to develop a neural network that uses data from unstructured medical records to capture information regarding symptoms, diagnoses, medications, conditions, exams, and treatment. Data from 30,000 medical records of patients hospitalized in the Clinical Hospital of the Botucatu Medical School (HCFMB), S\u00e3o Paulo, Brazil, were obtained, creating a corpus with 1200 clinical texts. A natural language algorithm for text extraction and convolutional neural networks for pattern recognition were used to evaluate the model with goodness-of-fit indices. The results showed good accuracy, considering the complexity of the model, with an F-score of 63.9% and a precision of 72.7%. The patient condition class reached a precision of 90.3% and the medication class reached 87.5%. The proposed neural network will facilitate the detection of relationships between diseases and symptoms and prevalence and incidence, in addition to detecting the identification of clinical conditions, disease evolution, and the effects of prescribed medications.<\/jats:p>","DOI":"10.3390\/data8010011","type":"journal-article","created":{"date-parts":[[2022,12,29]],"date-time":"2022-12-29T02:52:21Z","timestamp":1672282341000},"page":"11","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Natural Language Processing to Extract Information from Portuguese-Language Medical Records"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1684-2574","authenticated-orcid":false,"given":"Naila","family":"da Rocha","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Institute of Biosciences, Universidade Estadual Paulista (UNESP), Botucatu 18618-970, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3668-8911","authenticated-orcid":false,"given":"Abner","family":"Barbosa","sequence":"additional","affiliation":[{"name":"Medical School, Universidade Estadual Paulista (UNESP), Botucatu 18618-970, Brazil"}]},{"given":"Yaron","family":"Schnr","sequence":"additional","affiliation":[{"name":"Medical School, Universidade Estadual Paulista (UNESP), Botucatu 18618-970, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3984-4959","authenticated-orcid":false,"given":"Juliana","family":"Machado-Rugolo","sequence":"additional","affiliation":[{"name":"Health Technology Assessment Center (Clinical Hospital of the Botucatu Medical School), Botucatu 18618-970, Brazil"}]},{"given":"Luis","family":"de Andrade","sequence":"additional","affiliation":[{"name":"Medical School, Universidade Estadual Paulista (UNESP), Botucatu 18618-970, Brazil"}]},{"given":"Jos\u00e9","family":"Corrente","sequence":"additional","affiliation":[{"name":"Research Support Office, Funda\u00e7\u00e3o para o Desenvolvimento M\u00e9dico e Hospitalar (FAMESP), Botucatu 18618-687, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8931-5495","authenticated-orcid":false,"given":"Liciana","family":"de Arruda Silveira","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Institute of Biosciences, Universidade Estadual Paulista (UNESP), Botucatu 18618-970, Brazil"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"34","DOI":"10.5007\/1518-2924.2006v11n21p34","article-title":"Prontu\u00e1rio eletr\u00f4nico do paciente: Documento t\u00e9cnico de informa\u00e7\u00e3o e comunica\u00e7\u00e3o do dom\u00ednio da sa\u00fade","volume":"11","author":"Pinto","year":"2006","journal-title":"Encontros Bibli Rev. Eletr\u00f4nica De Bibliotecon. E Ci\u00eancia Da Inf."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"358","DOI":"10.1093\/bib\/bbm045","article-title":"Frontiers of biomedical text mining: Current progress","volume":"Volume 8","author":"Zweigenbaum","year":"2007","journal-title":"Briefings in Bioinformatics"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1016\/j.tibtech.2006.10.002","article-title":"Text mining and its potential applications in systems biology","volume":"Volume 24","author":"Ananiadou","year":"2006","journal-title":"Trends in Biotechnology"},{"key":"ref_4","first-page":"1","article-title":"Indecs: M\u00e9todo automatizado de classifica\u00e7\u00e3o de p\u00e1ginas web de sa\u00fade usando minera\u00e7\u00e3o de texto e descritores em ci\u00eancias da sa\u00fade (DECS)","volume":"1","author":"Mancini","year":"2009","journal-title":"J. Health Inform."},{"key":"ref_5","first-page":"13","article-title":"Analyzing medical data","volume":"55","author":"Goth","year":"2012","journal-title":"Commun. ACM"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1038\/nrg2999","article-title":"Using electronic health records to drive discovery in disease genomics","volume":"12","author":"Kohane","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"ref_7","unstructured":"Song, M. (2013). Opinion: Text mining in the clinic. Scientist, 1, Available online: https:\/\/www.the-scientist.com\/opinion\/opinion-text-mining-in-the-clinic-39531."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1093\/bib\/6.3.239","article-title":"Text mining and ontologies in biomedicine: Making sense of raw text","volume":"6","author":"Spasic","year":"2005","journal-title":"Brief. Bioinform."},{"key":"ref_9","first-page":"281","article-title":"Electronic medical records for clinical research: Application to the identification of heart failure","volume":"13","author":"Pakhomov","year":"2007","journal-title":"Am. J. Manag. Care"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1136\/amiajnl-2011-000456","article-title":"Importance of multi-modal approaches to effectively identify cataract cases from electronic health records","volume":"19","author":"Peissig","year":"2012","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Roque, F.S., Jensen, P.B., Schmock, H., Dalgaard, M., Andreatta, M., Hansen, T.F., S\u00f8eby, K., Bredkj\u00e6r, S., Juul, A., and Werge, T. (2011). Using Electronic Patient Records to Discover Disease Correlations and Stratify Patient Cohorts. PLoS Comput. Biol., 7.","DOI":"10.1371\/journal.pcbi.1002141"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Lopes, F., Teixeira, C., and Oliveira, H.G. (2019, January 1). Contributions to clinical named entity recognition in Portuguese. Proceedings of the 18th BioNLP Workshop and Shared Task, Florence, Italy. Available online: https:\/\/www.aclweb.org\/anthology\/W19-5024.","DOI":"10.18653\/v1\/W19-5024"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"de Souza, J.V.A., Gumiel, Y.B., Silva, L.E., and Moro, C.M.C. (2019, January 11\u201314). Named entity recognition for clinical Portuguese corpus with conditional random fields and semantic groups. Proceedings of the Anais do XIX Simp\u00f3sio Brasileiro de Computa\u00e7\u00e3o Aplicada \u00e0 Sa\u00fade, SBC, Niter\u00f3i, Brazil.","DOI":"10.5753\/sbcas.2019.6269"},{"key":"ref_14","unstructured":"e Oliveira, L.E.S., Peters, A.C., da Silva, A.M.P., Gebeluca, C.P., Gumiel, Y.B., Cintho, L.M.M., Carvalho, D.R., Al Hasan, S., and Moro, C.M.C. (2020). Semclinbr\u2013a multi institutional and multi-specialty semantically annotated corpus for Portuguese clinical nlp tasks. arXiv."},{"key":"ref_15","first-page":"506","article-title":"da S. Information extraction from Portuguese hospital discharge letters","volume":"8","author":"Ferreira","year":"2010","journal-title":"Evolution"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Wang, X., Song, X., Li, B., Guan, Y., and Han, J. (2020). Comprehensive named entity recognition on cord-19 with distant or weak supervision. arXiv.","DOI":"10.1109\/BigData50022.2020.9378052"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Andrade, V.D., Ruas, P., and Couto, F.M. (2021). Named entity recognition and linking: A Portuguese and Spanish oncological parallel corpus. bioRxiv.","DOI":"10.1101\/2021.09.16.460605"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Dias, M., Bon\u00e9, J., Ferreira, J.C., Ribeiro, R., and Maia, R. (2020). Named Entity Recognition for Sensitive Data Discovery in Portuguese. Appl. Sci., 10.","DOI":"10.3390\/app10072303"},{"key":"ref_19","unstructured":"Ferreira, L., Teixeira, A., and Cunha, J.P.S. (2013). Handbook of Research on ICTs for Human-Centered Healthcare and Social Care Services, IGI Global."},{"key":"ref_20","unstructured":"Leite-Moreira, A., Mendes, A., Pedrosa, A., Rocha-Sousa, A., Azevedo, A., Amaral-Gomes, A., Pinto, C., Figueira, H., Pereira, N.R., and Mendes, P. (2022). An NLP solution to foster the use of information in electronic health records for efficiency in decision-making in hospital care. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10916-020-1542-8","article-title":"Comparing Different Methods for Named Entity Recognition in Portuguese Neurology Text","volume":"44","author":"Lopes","year":"2020","journal-title":"J. Med. Syst."},{"key":"ref_22","unstructured":"Oleynik, M., Nohama, P., Cancian, P.S., and Schulz, S. (2010). MEDINFO, IOS Press."},{"key":"ref_23","unstructured":"Peters, A.C., Oleynik, M., Pacheco, E.J., Moro, C.M.C., Schulz, S., and Nohama, P. (2010, January 18\u201322). Elabora\u00e7\u00e3o de um corpus m\u00e9dico baseado em narrativas cl\u00ednicas contidas em sum\u00e1rios de alta hospitalar. Proceedings of the Anais do XII Congresso Brasileiro de Inform\u00e1tica em Sa\u00fade, Ipojuca, Brazil."},{"key":"ref_24","unstructured":"Schneider, E.T.R., Gumiel, Y.B., Luz, M.A.P.D., Paraiso, E.C., and Moro, C. (December, January 29). Experiments on Portuguese clinical question answering. Proceedings of the Brazilian Conference on Intelligent Systems, Virtual Event."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Terumi Rubel Schneider, E., Andrioli de Souza, J.V., Knafou, J.D.M., Silva e Oliveira, L.E., Copara Zea, J.L., Bonescki Gumiel, Y., Ferro Antunes de Oliveira, L., Cabrera Paraiso, E., Teodoro, D., and Cabral Moro Barra, C.M. (2020, January 19). BioBERTpt-a Portuguese neural language model for clinical named entity recognition. Proceedings of the 3rd Clinical Natural Language Processing Workshop, Online. Available online: https:\/\/www.aclweb.org\/anthology\/2020.clinicalnlp-1.7.","DOI":"10.18653\/v1\/2020.clinicalnlp-1.7"},{"key":"ref_26","unstructured":"Souza, F., Nogueira, R., and Lotufo, R. (2019). Portuguese named entity recognition using bert-crf. arXiv."},{"key":"ref_27","unstructured":"de Souza, J.V.A., Schneider, E.T.R., Cezar, J.O., Silva, L.E., Gumiel, Y.B., Paraiso, E.C., Teodoro, D., and Barra, C.M.C.M. (2020). A multilabel approach to Portuguese clinical named entity recognition. J. Health Inform., 366\u2013372."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Arnaud, \u00c9., Elbattah, M., Gignon, M., and Dequen, G. (2022, January 9\u201311). Learning Embeddings from Free-text Triage Notes using Pretrained Transformer Models. Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies, Online.","DOI":"10.5220\/0011012800003123"},{"key":"ref_29","unstructured":"HCFMB (2022, October 03). Hospital das Clinicas da Faculdade de Medicina de Botucatu. Available online: http:\/\/www.hcfmb.unesp.br\/."},{"key":"ref_30","unstructured":"Murugavel, M. (2022, October 03). Spacy Annotation Tool. Available online: https:\/\/manivannanmurugavel.github.io\/annotating-tool\/spacy-ner-annotator\/."},{"key":"ref_31","unstructured":"Zhang, Y., and Wallace, B. (2015). A sensitivity analysis of (and practitioners\u2019 guide to) convolutional neural networks for sentence classification. arXiv."},{"key":"ref_32","unstructured":"Ai Hub, T.M. (2022, October 03). Named Entity Recognition using Spacy and Tensorflow. Available online: https:\/\/aihub.cloud.google.com\/p\/products%2F2290fc65-0041-4c87-a898-0289f59aa8ba."},{"key":"ref_33","unstructured":"Slatton, T.G. (2022, October 03). A Comparison of Dropout and Weight Decay for Regularizing Deep Neural Networks. Available online: https:\/\/scholarworks.uark.edu\/cgi\/viewcontent.cgi?article=1028&context=csceuht."},{"key":"ref_34","first-page":"1929","article-title":"Dropout: A simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1527","DOI":"10.1162\/neco.2006.18.7.1527","article-title":"A Fast Learning Algorithm for Deep Belief Nets","volume":"18","author":"Hinton","year":"2006","journal-title":"Neural Comput."},{"key":"ref_36","unstructured":"SPACY (2022, October 03). Language Processing Pipelines. Available online: https:\/\/spacy.io\/usage\/processing-pipelines."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/1\/11\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:54:57Z","timestamp":1760147697000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/1\/11"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,29]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["data8010011"],"URL":"https:\/\/doi.org\/10.3390\/data8010011","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2022,12,29]]}}}