{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T02:51:04Z","timestamp":1761101464071,"version":"build-2065373602"},"reference-count":65,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2021,9,29]],"date-time":"2021-09-29T00:00:00Z","timestamp":1632873600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/CCI-BIO\/28685\/2017"],"award-info":[{"award-number":["PTDC\/CCI-BIO\/28685\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"name":"LASIGE Research Unit","award":["ref. UIDB\/00408\/2020 and ref. UIDP\/00408\/2020"],"award-info":[{"award-number":["ref. UIDB\/00408\/2020 and ref. UIDP\/00408\/2020"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Biomolecules"],"abstract":"<jats:p>In the assembly of biological networks it is important to provide reliable interactions in an effort to have the most accurate possible representation of real-life systems. Commonly, the data used to build a network comes from diverse high-throughput assays; however, most of the interaction data is available through scientific literature. This has become a challenge with the notable increase in scientific literature being published, as it is hard for human curators to track all recent discoveries without using efficient tools to help them identify these interactions in an automatic way. This can be overcome by using text mining approaches, which are capable of extracting knowledge from scientific documents. One of the most important tasks in text mining for biological network building is relation extraction, which identifies relations between the entities of interest. 
Many interaction databases already use text mining systems, and the development of these tools will lead to more reliable networks, as well as the possibility to personalize the networks by selecting the desired relations. This review will focus on different approaches of automatic information extraction from biomedical text that can be used to enhance existing networks or create new ones, such as deep learning state-of-the-art approaches, focusing on cancer disease as a case-study.<\/jats:p>","DOI":"10.3390\/biom11101430","type":"journal-article","created":{"date-parts":[[2021,9,30]],"date-time":"2021-09-30T00:03:42Z","timestamp":1632960222000},"page":"1430","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Text Mining for Building Biomedical Networks Using Cancer as a Case Study"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8891-3546","authenticated-orcid":false,"given":"Sofia I. 
R.","family":"Concei\u00e7\u00e3o","sequence":"first","affiliation":[{"name":"LASIGE, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0627-1496","authenticated-orcid":false,"given":"Francisco M.","family":"Couto","sequence":"additional","affiliation":[{"name":"LASIGE, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"D573","DOI":"10.1093\/nar\/gky1126","article-title":"HumanNet v2: Human gene networks for disease research","volume":"47","author":"Hwang","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"294","DOI":"10.3389\/fgene.2019.00294","article-title":"Network medicine in the age of biomedical big data","volume":"10","author":"Sonawane","year":"2019","journal-title":"Front. Genet."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"193","DOI":"10.12700\/APH.18.1.2021.1.12","article-title":"Analyse the Readability of LINQ Code using an Eye-Tracking-based Evaluation","volume":"18","author":"Katona","year":"2021","journal-title":"Acta Polytech. Hung."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"D607","DOI":"10.1093\/nar\/gky1131","article-title":"STRING v11: Protein\u2013protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets","volume":"47","author":"Szklarczyk","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Singhal, A., Leaman, R., Catlett, N., Lemberger, T., McEntyre, J., Polson, S., Xenarios, I., Arighi, C., and Lu, Z. (2016). Pressing needs of biomedical text mining in biocuration and beyond: Opportunities and challenges. 
Database, 2016.","DOI":"10.1093\/database\/baw161"},{"key":"ref_6","unstructured":"Ranganathan, S., Gribskov, M., Nakai, K., and Sch\u00f6nbach, C. (2019). Text Mining for Bioinformatics Using Biomedical Literature. Encyclopedia of Bioinformatics and Computational Biology, Academic Press."},{"key":"ref_7","unstructured":"(2021, January 21). World Health Organization: Cancer. Available online: https:\/\/www.who.int\/health-topics\/cancer#tab=tab_1."},{"key":"ref_8","unstructured":"(2021, January 21). World Health Organization: Cancer. Available online: https:\/\/www.who.int\/news-room\/fact-sheets\/detail\/cancer."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Korhonen, A., S\u00e9aghdha, D.\u00d3., Silins, I., Sun, L., H\u00f6gberg, J., and Stenius, U. (2012). Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS ONE, 7.","DOI":"10.1371\/journal.pone.0033427"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1016\/j.jbi.2012.10.007","article-title":"Biomedical text mining and its applications in cancer research","volume":"46","author":"Zhu","year":"2013","journal-title":"J. Biomed. Inform."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"605","DOI":"10.1016\/j.ijmedinf.2014.06.009","article-title":"Text mining of cancer-related information: Review of current status and future directions","volume":"83","author":"Livsey","year":"2014","journal-title":"Int. J. Med. Inform."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Couto, F.M. (2019). Data and Text Processing for Health and Life Sciences, Springer Nature.","DOI":"10.1007\/978-3-030-13845-5"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Jurca, G., Addam, O., Aksac, A., Gao, S., \u00d6zyer, T., Demetrick, D., and Alhajj, R. (2016). Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends. Bmc Res. 
Notes, 9.","DOI":"10.1186\/s13104-016-2023-5"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"i37","DOI":"10.1093\/bioinformatics\/btx228","article-title":"Deep learning with word embeddings improves biomedical named entity recognition","volume":"33","author":"Habibi","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1186\/s12859-019-2813-6","article-title":"Collabonet: Collaboration of deep neural networks for biomedical named entity recognition","volume":"20","author":"Yoon","year":"2019","journal-title":"Bmc Bioinform."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"D955","DOI":"10.1093\/nar\/gky1032","article-title":"Human Disease Ontology 2018 update: Classification, content and workflow expansion","volume":"47","author":"Schriml","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"ref_17","first-page":"D1018","article-title":"Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources","volume":"47","author":"Carmody","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"ref_18","first-page":"D7","article-title":"Database resources of the National Center for Biotechnology Information","volume":"44","author":"Coordinators","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"D1214","DOI":"10.1093\/nar\/gkv1031","article-title":"ChEBI in 2016: Improved services and an expanding collection of metabolites","volume":"44","author":"Hastings","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Baltoumas, F.A., Zafeiropoulou, S., Karatzas, E., Paragkamian, S., Thanati, F., Iliopoulos, I., Eliopoulos, A.G., Schneider, R., Jensen, L.J., and Pafilis, E. (2021). OnTheFly2.0: A text-mining web application for automated biomedical entity recognition, document annotation, network and functional enrichment analysis. 
bioRxiv, 2021.05.14.444150.","DOI":"10.1101\/2021.05.14.444150"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pafilis, E., Buttigieg, P.L., Ferrell, B., Pereira, E., Schnetzer, J., Arvanitidis, C., and Jensen, L.J. (2016). EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation. Database.","DOI":"10.1093\/database\/baw005"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1093\/bioinformatics\/btz504","article-title":"Towards reliable named entity recognition in the biomedical domain","volume":"36","author":"Giorgi","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2792","DOI":"10.1093\/bioinformatics\/btab042","article-title":"HunFlair: An easy-to-use tool for state-of-the-art biomedical named entity recognition","volume":"37","author":"Weber","year":"2021","journal-title":"Bioinformatics"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"178","DOI":"10.15265\/IY-2016-022","article-title":"Knowledge representation and management: A linked data perspective","volume":"25","author":"Barros","year":"2016","journal-title":"Yearb. Med. Inform."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Bunescu, R., Mooney, R., Ramani, A., and Marcotte, E. (2006, January 8). Integrating co-occurrence statistics with information extraction for robust retrieval of protein interactions from Medline. Proceedings of the HTLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, New York, NY, USA.","DOI":"10.3115\/1654415.1654424"},{"key":"ref_26","unstructured":"Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E.D., Gutierrez, J.B., and Kochut, K. (2017). A brief survey of text mining: Classification, clustering and extraction techniques. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Hearst, M.A. (1992, January 23\u201328). 
Automatic acquisition of hyponyms from large text corpora. Proceedings of the Coling 1992 volume 2: The 14th International Conference on Computational Linguistics, Nantes, France.","DOI":"10.3115\/992133.992154"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"104130","DOI":"10.1016\/j.engappai.2020.104130","article-title":"Pattern-based bootstrapping framework for biomedical relation extraction","volume":"99","author":"Deepika","year":"2021","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_29","unstructured":"Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (, January August). Distant supervision for relation extraction without labeled data. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Yan, Y., Okazaki, N., Matsuo, Y., Yang, Z., and Ishizuka, M. (2009, January 2\u20137). Unsupervised relation extraction by mining wikipedia texts using information from the web. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore.","DOI":"10.3115\/1690219.1690289"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_32","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. 
arXiv."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1006\/knac.1993.1008","article-title":"A translation approach to portable ontology specifications","volume":"5","author":"Gruber","year":"1993","journal-title":"Knowl. Acquis."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-018-2584-5","article-title":"BO-LSTM: Classifying relations via long short-term memory networks along biomedical ontologies","volume":"20","author":"Lamurias","year":"2019","journal-title":"BMC Bioinform."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Sousa, D., and Couto, F.M. (2020). BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction, Springer. European Conference on Information Retrieval.","DOI":"10.1007\/978-3-030-45442-5_46"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.jbi.2018.03.011","article-title":"A hybrid model based on neural networks for biomedical relation extraction","volume":"81","author":"Zhang","year":"2018","journal-title":"J. Biomed. Inform."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Quan, C., Luo, Z., and Wang, S. (2020). A Hybrid Deep Learning Model for Protein\u2013Protein Interactions Extraction from Biomedical Literature. Appl. Sci., 10.","DOI":"10.3390\/app10082690"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1162\/tacl_a_00049","article-title":"Cross-sentence N-ary relation extraction with graph LSTMs","volume":"5","author":"Peng","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"107230","DOI":"10.1016\/j.asoc.2021.107230","article-title":"Biomedical cross-sentence relation extraction via multihead attention and graph convolutional networks","volume":"104","author":"Zhao","year":"2021","journal-title":"Appl. 
Soft Comput."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Baltoumas, F.A., Zafeiropoulou, S., Karatzas, E., Koutrouli, M., Thanati, F., Voutsadaki, K., Gkonta, M., Hotova, J., Kasionis, I., and Hatzis, P. (2021). Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review. Biomolecules, 11.","DOI":"10.3390\/biom11081245"},{"key":"ref_41","unstructured":"(2021, July 20). Online Mendelian Inheritance in Man, OMIM\u00ae McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD). Available online: https:\/\/omim.org\/."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"bav028","DOI":"10.1093\/database\/bav028","article-title":"DisGeNET: A discovery platform for the dynamical exploration of human diseases and their genes","volume":"2015","author":"Pinero","year":"2015","journal-title":"Database"},{"key":"ref_43","first-page":"D833","article-title":"DisGeNET: A comprehensive platform integrating information on human disease-associated genes and variants","volume":"45","author":"Bravo","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"ref_44","first-page":"D845","article-title":"The DisGeNET knowledge platform for disease genomics: 2019 update","volume":"48","author":"Ronzano","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"2883","DOI":"10.1093\/bioinformatics\/btw234","article-title":"SETH detects and normalizes genetic variants in text","volume":"32","author":"Thomas","year":"2016","journal-title":"Bioinformatics"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"D506","DOI":"10.1093\/nar\/gky1049","article-title":"UniProt: A worldwide hub of protein knowledge","volume":"47","author":"Consortium","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Bravo, \u00c0., Pi\u00f1ero, J., Queralt-Rosinach, N., Rautschka, M., and Furlong, L.I. (2015). 
Extraction of relations between genes and diseases from text and large-scale data analysis: Implications for translational research. BMC Bioinform., 16.","DOI":"10.1186\/s12859-015-0472-9"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Bundschus, M., Dejori, M., Stetter, M., Tresp, V., and Kriegel, H.P. (2008). Extraction of semantic biomedical relations from text using conditional random fields. BMC Bioinform., 9.","DOI":"10.1186\/1471-2105-9-207"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"D1302","DOI":"10.1093\/nar\/gkaa1027","article-title":"Open Targets Platform: Supporting systematic drug\u2013target identification and prioritisation","volume":"49","author":"Ochoa","year":"2021","journal-title":"Nucleic Acids Res."},{"key":"ref_50","unstructured":"(2021, January 27). LIterature coNcept Knowledgebase. Available online: https:\/\/link.opentargets.io\/."},{"key":"ref_51","first-page":"1","article-title":"Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases","volume":"12","author":"Winter","year":"2011","journal-title":"BMC Bioinform."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1002\/pro.3978","article-title":"The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions","volume":"30","author":"Oughtred","year":"2021","journal-title":"Protein Sci."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"D857","DOI":"10.1093\/nar\/gkr930","article-title":"MINT, the molecular interaction database: 2012 update","volume":"40","author":"Licata","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.ymeth.2014.11.020","article-title":"DISEASES: Text mining and data integration of disease\u2013gene 
associations","volume":"74","author":"Tsafou","year":"2015","journal-title":"Methods"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"23","DOI":"10.4103\/2153-3539.97788","article-title":"The feasibility of using natural language processing to extract clinical information from breast pathology reports","volume":"3","author":"Buckley","year":"2012","journal-title":"J. Pathol. Inform."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/s10549-016-4035-1","article-title":"Using machine learning to parse breast pathology reports","volume":"161","author":"Yala","year":"2017","journal-title":"Breast Cancer Res. Treat."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Kawashima, K., Bai, W., and Quan, C. (2017, January 26\u201328). Text Mining and Pattern Clustering for Relation Extraction of Breast Cancer and Related Genes. Proceedings of the 2017 18th IEEE\/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel\/Distributed Computing (SNPD), Kanazawa, Japan.","DOI":"10.1109\/SNPD.2017.8022701"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12885-020-06931-0","article-title":"Text mining in a literature review of urothelial cancer using topic model","volume":"20","author":"Lin","year":"2020","journal-title":"BMC Cancer"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"104139","DOI":"10.1016\/j.ijmedinf.2020.104139","article-title":"Machine learning application for incident prostate adenocarcinomas automatic registration in a French regional cancer registry","volume":"139","author":"Fabacher","year":"2020","journal-title":"Int. J. Med. 
Inform."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/S0092-8674(00)81683-9","article-title":"The hallmarks of cancer","volume":"100","author":"Weinberg","year":"2000","journal-title":"Cell"},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"106486","DOI":"10.1016\/j.knosys.2020.106486","article-title":"DECAB-LSTM: Deep Contextualized Attentional Bidirectional LSTM for cancer hallmark classification","volume":"210","author":"Jiang","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_62","unstructured":"Baker, S., Korhonen, A.L., and Pyysalo, S. (2016, January 11\u201316). Cancer hallmark text classification using convolutional neural networks. Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016), Osaka, Japan."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13073-019-0686-y","article-title":"Text-mining clinically relevant cancer biomarkers for curation into the CIViC database","volume":"11","author":"Lever","year":"2019","journal-title":"Genome Med."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1093\/jamia\/ocz153","article-title":"Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks","volume":"27","author":"Alawad","year":"2020","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.ceb.2020.01.005","article-title":"Not all cancers are created equal: Tissue specificity in cancer genes and pathways","volume":"63","author":"Bianchi","year":"2020","journal-title":"Curr. Opin. 
Cell Biol."}],"container-title":["Biomolecules"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-273X\/11\/10\/1430\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:07:32Z","timestamp":1760166452000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-273X\/11\/10\/1430"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,29]]},"references-count":65,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2021,10]]}},"alternative-id":["biom11101430"],"URL":"https:\/\/doi.org\/10.3390\/biom11101430","relation":{},"ISSN":["2218-273X"],"issn-type":[{"type":"electronic","value":"2218-273X"}],"subject":[],"published":{"date-parts":[[2021,9,29]]}}}