{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T22:28:54Z","timestamp":1775255334802,"version":"3.50.1"},"reference-count":29,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T00:00:00Z","timestamp":1717372800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Regulatory agencies generate a vast amount of textual data in the review process. For example, drug labeling serves as a valuable resource for regulatory agencies, such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA), to communicate drug safety and effectiveness information to healthcare professionals and patients. Drug labeling also serves as a resource for pharmacovigilance and drug safety research. Automated text classification would significantly improve the analysis of drug labeling documents and conserve reviewer resources.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We utilized artificial intelligence in this study to classify drug-induced liver injury (DILI)-related content from drug labeling documents based on FDA\u2019s DILIrank dataset. We employed text mining and XGBoost models and utilized the Preferred Terms of Medical queries for adverse event standards to simplify the elimination of common words and phrases while retaining medical standard terms for FDA and EMA drug label datasets. 
Then, we constructed a document term matrix using weights computed by Term Frequency-Inverse Document Frequency (TF-IDF) for each included word\/term\/token.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>The automatic text classification model exhibited robust performance in predicting DILI, achieving cross-validation AUC scores exceeding 0.90 for both drug labels from FDA and EMA and literature abstracts from the Critical Assessment of Massive Data Analysis (CAMDA).<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Moreover, the text mining and XGBoost functions demonstrated in this study can be applied to other text processing and classification tasks.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2024.1401810","type":"journal-article","created":{"date-parts":[[2024,6,3]],"date-time":"2024-06-03T13:48:56Z","timestamp":1717422536000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Automatic text classification of drug-induced liver injury using document-term matrix and XGBoost"],"prefix":"10.3389","volume":"7","author":[{"given":"Minjun","family":"Chen","sequence":"first","affiliation":[]},{"given":"Yue","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Byron","family":"Wingerd","sequence":"additional","affiliation":[]},{"given":"Zhichao","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Joshua","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Shraddha","family":"Thakkar","sequence":"additional","affiliation":[]},{"given":"Thomas J.","family":"Pedersen","sequence":"additional","affiliation":[]},{"given":"Tom","family":"Donnelly","sequence":"additional","affiliation":[]},{"given":"Nicholas","family":"Mann","sequence":"additional","affiliation":[]},{"given":"Weida","family":"Tong","sequence":"additional","affiliation":[]},{"given":"Russell 
D.","family":"Wolfinger","sequence":"additional","affiliation":[]},{"given":"Wenjun","family":"Bao","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,6,3]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1038\/s41572-019-0105-0","article-title":"Drug-induced liver injury","volume":"5","author":"Andrade","year":"2019","journal-title":"Nat. Rev. Dis. Prim."},{"key":"ref2","article-title":"Summary of product characteristics","volume-title":"Committee for Proprietary Medicinal Products","author":"Annex","year":"1999"},{"key":"ref3","doi-asserted-by":"publisher","first-page":"5552","DOI":"10.3748\/wjg.v13.i42.5552","article-title":"Acute renal dysfunction in liver diseases","volume":"13","author":"Betrosian","year":"2007","journal-title":"World J. Gastroenterol."},{"key":"ref4","doi-asserted-by":"publisher","first-page":"815","DOI":"10.1016\/j.jbusres.2020.10.043","article-title":"Exploring healthcare\/health-product ecommerce satisfaction: a text mining and machine learning application","volume":"131","author":"Chatterjee","year":"2021","journal-title":"J. Bus. Res."},{"key":"ref5","doi-asserted-by":"crossref","DOI":"10.1145\/2939672.2939785","article-title":"XGBoost: a scalable tree boosting system","author":"Chen","year":"2016"},{"key":"ref6","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1016\/j.drudis.2016.02.015","article-title":"DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans","volume":"21","author":"Chen","year":"2016","journal-title":"Drug Discov. Today"},{"key":"ref7","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1016\/j.drudis.2011.05.007","article-title":"FDA-approved drug labeling for the study of drug-induced liver injury","volume":"16","author":"Chen","year":"2011","journal-title":"Drug Discov. 
Today"},{"key":"ref8","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/978-3-030-58721-5_8","article-title":"Natural language processing for health-related texts","volume-title":"Biomedical informatics: Computer applications in health care and biomedicine","author":"Demner-Fushman","year":"2021"},{"key":"ref9","doi-asserted-by":"publisher","first-page":"1378","DOI":"10.1038\/s41587-020-00751-0","article-title":"FDALabel for drug repurposing studies and beyond","volume":"38","author":"Fang","year":"2020","journal-title":"Nat. Biotechnol."},{"key":"ref10","unstructured":"2022"},{"key":"ref11","unstructured":"2023"},{"key":"ref12","volume-title":"Warnings and precautions, contraindications, and boxed warning sections of labeling for human prescription drug and biological products\u2013content and format. 2011","year":""},{"key":"ref13","volume-title":"Adverse reactions section of labeling for human prescription drug and biological products\u2014Content and format","year":""},{"key":"ref14","doi-asserted-by":"publisher","first-page":"561","DOI":"10.1007\/s40264-016-0409-x","article-title":"A pharmacovigilance signaling system based on FDA regulatory action and post-marketing adverse event reports","volume":"39","author":"Hoffman","year":"2016","journal-title":"Drug Saf."},{"key":"ref15","article-title":"LightGBM: a highly efficient gradient boosting decision tree","volume-title":"Advances in neural information processing systems","author":"Ke","year":"2017"},{"key":"ref16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12911-020-01266-z","article-title":"Using machine learning of clinical data to diagnose COVID-19: a systematic review and meta-analysis","volume":"20","author":"Li","year":"2020","journal-title":"BMC Med. Inform. Decis. 
Mak."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"284","DOI":"10.1097\/01.pra.0000452565.83039.20","article-title":"The package insert: who writes it and why, what are its implications, and how well does medical school explain it?","volume":"20","author":"McMahon","year":"2014","journal-title":"J. Psychiatr. Pract."},{"key":"ref18","doi-asserted-by":"publisher","first-page":"602030","DOI":"10.3389\/fphar.2020.602030","article-title":"A novel text-mining approach for retrieving pharmacogenomics associations from the literature","volume":"11","author":"Pandi","year":"2020","journal-title":"Front. Pharmacol."},{"key":"ref19","article-title":"CatBoost: unbiased boosting with categorical features","volume":"31","author":"Prokhorenkova","year":"2018","journal-title":"Advances in neural information processing systems"},{"key":"ref20","doi-asserted-by":"publisher","first-page":"104285","DOI":"10.1016\/j.jbi.2023.104285","article-title":"Fine-tuning BERT for automatic ADME semantic labeling in FDA drug labeling to enhance product-specific guidance assessment","volume":"138","author":"Shi","year":"2023","journal-title":"J. Biomed. Inform."},{"key":"ref21","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1016\/j.inffus.2021.11.011","article-title":"Tabular data: deep learning is not all you need","volume":"81","author":"Shwartz-Ziv","year":"2022","journal-title":"Inf. Fusion"},{"key":"ref22","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1097\/RLI.0000000000000518","article-title":"Impact of machine learning with multiparametric magnetic resonance imaging of the breast for early prediction of response to neoadjuvant chemotherapy and survival outcomes in breast cancer patients","volume":"54","author":"Tahmassebi","year":"2019","journal-title":"Investig. 
Radiol."},{"key":"ref23","doi-asserted-by":"publisher","first-page":"1249","DOI":"10.1289\/txg.7125","article-title":"Assessment of prediction confidence and domain extrapolation of two structure\u2013activity relationship models for predicting estrogen receptor binding activity","volume":"112","author":"Tong","year":"2004","journal-title":"Environ. Health Perspect."},{"key":"ref24","doi-asserted-by":"publisher","first-page":"bbad226","DOI":"10.1093\/bib\/bbad226","article-title":"PharmBERT: a domain-specific BERT model for drug labels","volume":"24","author":"ValizadehAslani","year":"2023","journal-title":"Brief. Bioinform."},{"key":"ref25","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1213\/ane.0b013e31818c1b27","article-title":"The new Food and Drug Administration drug package insert: implications for patient safety and clinical care","volume":"108","author":"Watson","year":"2009","journal-title":"Anesth. Analg."},{"key":"ref26","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1186\/s12859-019-2628-5","article-title":"Study of serious adverse drug reactions using FDA-approved drug labeling and MedDRA","volume":"20","author":"Wu","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"ref27","doi-asserted-by":"publisher","first-page":"729834","DOI":"10.3389\/frai.2021.729834","article-title":"BERT-based natural language processing of drug labeling documents: a case study for classifying drug-induced liver injury risk","volume":"4","author":"Wu","year":"2021","journal-title":"Front Artif Intell"},{"key":"ref28","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1016\/j.drudis.2021.09.009","article-title":"A systematic comparison of hepatobiliary adverse drug reactions in FDA and EMA drug labeling reveals discrepancies","volume":"27","author":"Wu","year":"2022","journal-title":"Drug Discov. 
Today"},{"key":"ref29","doi-asserted-by":"crossref","DOI":"10.1109\/BDCAT.2018.00021","article-title":"Development of a radiology decision support system for the classification of MRI brain scans","author":"Zhang","year":"2018"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1401810\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,19]],"date-time":"2024-07-19T10:18:42Z","timestamp":1721384322000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2024.1401810\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,3]]},"references-count":29,"alternative-id":["10.3389\/frai.2024.1401810"],"URL":"https:\/\/doi.org\/10.3389\/frai.2024.1401810","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,3]]},"article-number":"1401810"}}