{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:24:01Z","timestamp":1766067841319,"version":"build-2065373602"},"reference-count":56,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T00:00:00Z","timestamp":1702252800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of such knowledge are stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has turned to extracting such knowledge automatically using pre-trained Large Language Models (LLMs) and deep-learning algorithms. However, the complex syntactic structure of biological sentences, with their nested entities and domain-specific terminology, together with insufficient annotated training corpora, poses major challenges to accurately capturing entity relationships from unstructured data. To address these issues, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS accurately captures the relational context among the various binary relations within a sentence while preventing any change in the meaning of the sentences it simplifies. Experiments on the Learning Language in Logic (LLL) dataset, evaluated with well-known performance metrics, show that the proposed technique yields a 21% increase in precision while simplifying only 25% of the sentences. 
Combined with BioBERT, a popular pre-trained LLM, the proposed method outperformed other state-of-the-art methods.<\/jats:p>","DOI":"10.3390\/informatics10040089","type":"journal-article","created":{"date-parts":[[2023,12,11]],"date-time":"2023-12-11T14:12:51Z","timestamp":1702303971000},"page":"89","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Knowledge-Based Intelligent Text Simplification for Biological Relation Extraction"],"prefix":"10.3390","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4829-5971","authenticated-orcid":false,"given":"Jaskaran","family":"Gill","sequence":"first","affiliation":[{"name":"Health Innovation and Transformation Centre, Federation University, Ballarat, VIC 3842, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7052-0413","authenticated-orcid":false,"given":"Madhu","family":"Chetty","sequence":"additional","affiliation":[{"name":"Health Innovation and Transformation Centre, Federation University, Ballarat, VIC 3842, Australia"}]},{"given":"Suryani","family":"Lim","sequence":"additional","affiliation":[{"name":"Health Innovation and Transformation Centre, Federation University, Ballarat, VIC 3842, Australia"}]},{"given":"Jennifer","family":"Hallinan","sequence":"additional","affiliation":[{"name":"Health Innovation and Transformation Centre, Federation University, Ballarat, VIC 3842, Australia"},{"name":"BioThink Pty Ltd., Brisbane, QLD 4020, Australia"}]}],"member":"1968","published-online":{"date-parts":[[2023,12,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Naseem, U., Khushi, M., Khan, S.K., Shaukat, K., and Moni, M.A. (2021). A Comparative Analysis of Active Learning for Biomedical Text Mining. Appl. Syst. Innov., 4.","DOI":"10.3390\/asi4010023"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Simon, C., Davidsen, K., Hansen, C., Seymour, E., Barnkob, M.B., and Olsen, L.R. 
(2019). BioReader: A text mining tool for performing classification of biomedical literature. BMC Bioinform., 19.","DOI":"10.1186\/s12859-019-2607-x"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gamage, H.N., Chetty, M., Shatte, A., and Hallinan, J. (2022, January 15\u201317). Ensemble Regression Modelling for Genetic Network Inference. Proceedings of the 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Ottawa, ON, Canada.","DOI":"10.1109\/CIBCB55180.2022.9863017"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"2449","DOI":"10.1039\/C5MB00122F","article-title":"Improving gene regulatory network inference using network topology information","volume":"11","author":"Nair","year":"2015","journal-title":"Mol. BioSystems"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Morshed, N., Chetty, M., and Vinh, N.X. (2012). Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique. BMC Syst. Biol., 6.","DOI":"10.1186\/1752-0509-6-62"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"102155","DOI":"10.1016\/j.isci.2021.102155","article-title":"Opportunities and challenges of text mining in materials research","volume":"24","author":"Kononova","year":"2021","journal-title":"iScience"},{"key":"ref_7","unstructured":"Corlan, A.D. (2023, February 14). Medline Trend: Automated Yearly Statistics of PubMed Results for Any Query. Available online: http:\/\/dan.corlan.net\/medline-trend.html."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Mercatellia, D., Scalambra, L., Triboli, L., Ray, F., and Giorgi, F.M. (2020). Gene regulatory network inference resources: A practical overview. Biochim. Et Biophys. Acta (BBA)-Gene Regul. 
Mech., 1863.","DOI":"10.1016\/j.bbagrm.2019.194430"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"103294","DOI":"10.1016\/j.jbi.2019.103294","article-title":"Neural network-based approaches for biomedical relation classification: A review","volume":"99","author":"Zhang","year":"2019","journal-title":"J. Biomed. Inform."},{"key":"ref_10","unstructured":"BioCreative (2023, November 12). BioCreative VI Challenge and Workshop. Available online: https:\/\/biocreative.bioinformatics.udel.edu\/events\/biocreative-vi\/biocreative-vi-challenge\/."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Peng, Y., Rios, A., Kavuluru, R., and Lu, Z. (2018). Extracting chemical\u2013protein relations with ensembles of SVM and deep learning models. Database J. Biol. Databases Curation, 2018.","DOI":"10.1093\/database\/bay073"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"4781","DOI":"10.1007\/s00521-021-06667-3","article-title":"Deep neural network-based relation extraction: An overview","volume":"34","author":"Wang","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zhao, S., Lu, C.S.Z., and Wang, F. (2021). Recent advances in biomedical literature mining. Brief. Bioinform., 22.","DOI":"10.1093\/bib\/bbaa057"},{"key":"ref_14","first-page":"1400","article-title":"Biomedical text mining for research rigor and integrity: Tasks, challenges, directions","volume":"19","author":"Kilicoglu","year":"2018","journal-title":"Brief. Bioinform."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.ymeth.2015.01.015","article-title":"Application of text mining in the biomedical domain","volume":"75","author":"Fleuren","year":"2015","journal-title":"Methods"},{"key":"ref_16","unstructured":"N\u00e9dellec, C. (2005, January 1). Learning language in logic\u2014Genic interaction extraction challenge. 
Proceedings of the Learning Language in Logic Workshop (LLL05), Bonn, Germany."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1093\/bib\/bbv024","article-title":"Community challenges in biomedical text mining over 10 years: Success, failure and the future","volume":"17","author":"Huang","year":"2016","journal-title":"Brief. Bioinform."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"baw161","DOI":"10.1093\/database\/baw161","article-title":"Pressing needs of biomedical text mining in biocuration and beyond: Opportunities and challenges","volume":"2016","author":"Singhal","year":"2016","journal-title":"Database"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Peng, Y., Torii, M., Wu, C.H., and Vijay-Shanker, K. (2014). A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems. BMC Bioinform., 15.","DOI":"10.1186\/1471-2105-15-285"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Jonnalagadda, S., Tari, L., Hakenberg, J., Baral, C., and Gonzalez, G. (2010). Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. arXiv.","DOI":"10.3115\/1620853.1620902"},{"key":"ref_21","unstructured":"Bach, N., Gao, Q., Vogel, S., and Waibel, A. (2011, January 2). TriS: A Statistical Sentence Simplifier with Log-linear Models and Margin-based Discriminative Training. Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1109\/TCBB.2010.51","article-title":"Efficient extraction of protein-protein interactions from Full-Text Articles","volume":"7","author":"Hakenberg","year":"2010","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_23","unstructured":"Miao, Q., Zhang, S., Zhang, B., and Yu, H. (2012, January 7\u201310). 
Extracting and Visualizing Semantic Relationships from Chinese Biomedical Text. Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, Bali, Indonesia."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1093\/jamia\/ocac149","article-title":"A survey of automated methods for biomedical text simplification","volume":"29","author":"Ondov","year":"2022","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Devaraj, A., Marshall, I.J., Wallace, B.C., and Li, J.J. (2021, January 6\u201311). Paragraph-level Simplification of Medical Texts. Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2021.naacl-main.395"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, T., Chen, P., Rochford, J., and Qiang, J. (2016, January 12\u201317). Text Simplification Using Neural Machine Translation. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.9933"},{"key":"ref_27","unstructured":"Siddharthan, A. (2011, January 28\u201331). Text Simplification using Typed Dependencies: A Comparison of the robustness of different generation strategies. Proceedings of the 13th European Workshop on Natural Language Generation, Nancy, France."},{"key":"ref_28","unstructured":"Siddharthan, A. (2011, January 28\u201331). Hybrid text simplification using synchronous dependency grammars with hand-written and automatically harvested rules. Proceedings of the 13th European Workshop on Natural Language Generation, Nancy, France."},{"key":"ref_29","unstructured":"Chatterjee, N., and Agarwal, R. (2021, January 21\u201324). DEPSYM: A Lightweight Syntactic Text Simplification Approach using Dependency Trees. 
Proceedings of the CTTS@ SEPLN, M\u00e1laga, Spain."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2614","DOI":"10.1093\/bioinformatics\/bty114","article-title":"A global network of biomedical relationships derived from text","volume":"34","author":"Percha","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/bioinformatics\/btl616","article-title":"RelEx\u2014Relation extraction using dependency parse trees","volume":"23","author":"Fundel","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"298473","DOI":"10.1155\/2014\/298473","article-title":"Biomedical Relation Extraction: From Binary to Complex","volume":"2014","author":"Zhou","year":"2014","journal-title":"Comput. Math. Methods Med."},{"key":"ref_33","unstructured":"Yang, X., Yu, Z., Guo, Y., Bian, J., and Wu, Y. (2021). Clinical Relation Extraction Using Transformer-based Models. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1109\/TKDE.2020.2981314","article-title":"A Survey on Deep Learning for Named Entity Recognition","volume":"34","author":"Li","year":"2022","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.cosrev.2018.06.001","article-title":"Recent Named Entity Recognition and Classification techniques: A systematic review","volume":"29","author":"Goyal","year":"2018","journal-title":"Comput. Sci. Rev."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"i37","DOI":"10.1093\/bioinformatics\/btx228","article-title":"Deep learning with word embeddings improves biomedical named entity recognition","volume":"33","author":"Habibi","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_37","unstructured":"Raul Garreta, G.M.T.H.G.H. (2017). 
Scikit-Learn: Machine Learning Simplified: Implement Scikit-Learn into Every Step of the Data Science Pipeline, Packt Publishing Ltd."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: A pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"4837","DOI":"10.1093\/bioinformatics\/btac598","article-title":"BERN2: An advanced neural biomedical named entity recognition and normalization tool","volume":"38","author":"Sung","year":"2022","journal-title":"Bioinformatics"},{"key":"ref_40","unstructured":"Vacariu, A.V. (2023, September 04). A High-Throughput Dependency Parser. Available online: https:\/\/summit.sfu.ca\/item\/17739."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1075\/itl.165.2.06sid","article-title":"A survey of research on text simplification","volume":"165","author":"Siddharthan","year":"2014","journal-title":"ITL-Int. J. Appl. Linguist."},{"key":"ref_42","unstructured":"Millstein, F. (2023, September 04). NLTK, Natural Language Processing with Python: Natural Language Processing Using. Available online: https:\/\/scholar.google.com.hk\/scholar?hl=zh-TW&as_sdt=0%2C5&q=NLTK%2C+Natural+Language+Processing+with+Python%3A+Natural+Language+Processing+Using&btnG=#d=gs_cit&t=1702266004906&u=%2Fscholar%3Fq%3Dinfo%3ARrd7HVVyN8IJ%3Ascholar.google.com%2F%26output%3Dcite%26scirp%3D0%26hl%3Dzh-TW."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Nazaruka, E., Osis, J., and Griberman, V. (2019, January 4\u20135). Using Stanford CoreNLP Capabilities for Semantic Information Extraction from Textual Descriptions. 
Proceedings of the International Conference on Evaluation of Novel Approaches to Software Engineering, Heraklion, Greece.","DOI":"10.1007\/978-3-030-40223-5_1"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Okhapkin, V.P., Okhapkina, E.P., Iskhakova, A.O., and Iskhakov, A.Y. (2020, January 14\u201316). Constructing of Semantically Dependent Patterns Based on SpaCy and StanfordNLP Libraries. Proceedings of the Futuristic Trends in Network and Communication Technologies: Third International Conference, FTNCT 2020, Taganrog, Russia.","DOI":"10.1007\/978-981-16-1480-4_45"},{"key":"ref_45","unstructured":"Vasiliev, Y. (2020). Natural Language Processing with Python and spaCy: A Practical Introduction, No Starch Press."},{"key":"ref_46","unstructured":"Honnibal, M., Montani, I., Landeghem, S.V., and Boyd, A. (2023, September 04). spaCy: Industrial-strength Natural Language Processing in Python. Available online: https:\/\/github.com\/explosion\/spaCy."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Ramesh, S., Tiwari, A., Choubey, P., Kashyap, S., Khose, S., Lakara, K., Singh, N., and Verma, U. (2021, January 10). BERT based Transformers lead the way in Extraction of Health Information from Social Media. Proceedings of the Sixth Social Media Mining for Health Workshop, Mexico City, Mexico.","DOI":"10.18653\/v1\/2021.smm4h-1.5"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Algamdi, S., Albanyan, A., Shah, S.K., and Tariq, Z. (2022, January 17\u201320). Twitter Accounts Suggestion: Pipeline Technique SpaCy Entity Recognition. Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan.","DOI":"10.1109\/BigData55660.2022.10020570"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Kandji, A.K., and Ndiaye, S. (2023, January 23\u201325). Design and realization of an NLP application for the massive processing of large volumes of resumes. 
Proceedings of the IEEE Multi-conference on Natural and Engineering Sciences for Sahel\u2019s Sustainable Development (MNE3SD), Bobo-Dioulasso, Burkina Faso.","DOI":"10.1109\/MNE3SD53781.2022.9723408"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Pyysalo, S., Ginter, F., Heimonen, J., Bj\u00f6rne, J., Boberg, J., J\u00e4rvinen, J., and Salakoski, T. (2007). BioInfer: A corpus for information extraction in the biomedical domain. BMC Bioinform., 8.","DOI":"10.1186\/1471-2105-8-50"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Panyam, K.V.N.C., Cohn, T., and Ramamohanarao, K. (2018). Exploiting graph kernels for high performance biomedical relation extraction. J. Biomed. Semant., 9.","DOI":"10.1186\/s13326-017-0168-3"},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chang, Y.-C., Chu, C.-H., Su, Y.-C., Chen, C.C., and Hsu, W.-L. (2016). PIPE: A protein-protein interaction passage extraction module for BioCreative challenge. Database J. Biol. Databases Curation, 2016.","DOI":"10.1093\/database\/baw101"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"89354","DOI":"10.1109\/ACCESS.2019.2927253","article-title":"A protein-protein interaction extraction approach based on deep neural network","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.jbi.2018.03.011","article-title":"A hybrid model based on neural networks for biomedical relation","volume":"81","author":"Zhang","year":"2018","journal-title":"J. Biomed. Inform."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Ahmed, M., Islam, J., Samee, M.R., and Mercer, R.E. (February, January 30). Identifying Protein-Protein Interaction using Tree LSTM and Structured Attention. 
Proceedings of the 2019 IEEE 13th international conference on semantic computing (ICSC), Newport Beach, CA, USA.","DOI":"10.1109\/ICOSC.2019.8665584"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Park, G., McCorkle, S., Soto, C., Blaby, I., and Yoo, S. (2022, January 17\u201320). Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information. Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan.","DOI":"10.1109\/BigData55660.2022.10021099"}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/10\/4\/89\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:36:36Z","timestamp":1760132196000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/10\/4\/89"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,11]]},"references-count":56,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["informatics10040089"],"URL":"https:\/\/doi.org\/10.3390\/informatics10040089","relation":{},"ISSN":["2227-9709"],"issn-type":[{"type":"electronic","value":"2227-9709"}],"subject":[],"published":{"date-parts":[[2023,12,11]]}}}