{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T16:37:44Z","timestamp":1780591064589,"version":"3.54.1"},"reference-count":23,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T00:00:00Z","timestamp":1753056000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Background<\/jats:title><jats:p>The body of toxicological knowledge and literature is expanding at an accelerating pace. This rapid growth presents significant challenges for researchers, who must stay abreast of the latest studies while also synthesizing the vast amount of published information.<\/jats:p><\/jats:sec><jats:sec><jats:title>Goal<\/jats:title><jats:p>Our goal is to automatically identify potential hepatotoxicants from over 50,000 compounds using the wealth of scientific publications and knowledge.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We employ and compare three distinct methods for automatic information extraction from unstructured text: (1) text mining, (2) word embeddings, and (3) large language models. These approaches are combined to calculate a hepatotoxicity score for over 50,000 compounds. We assess the performance of the different methods with a use case on Drug-Induced Liver Injury (DILI).<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We evaluated hepatotoxicity for over 50,000 compounds and calculated a hepatotoxicity score for each compound. Our results indicate that text mining is effective for this purpose, achieving an Area Under the Curve (AUC) of 0.8 in DILI validation. 
Large language models performed even better, with an AUC of 0.85, thanks to their ability to interpret the semantic context accurately. Combining these methods further improved performance, yielding an AUC of 0.87 in DILI validation. All findings are available for download to support further research on toxicity assessment.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>We demonstrated that automated text mining is able to successfully assess the toxicity of compounds. A text mining approach seems to be superior to word embeddings. However, the application of a large language model with prompt engineering showed the best performance.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2025.1561292","type":"journal-article","created":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T05:21:49Z","timestamp":1753075309000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Systematic analysis of hepatotoxicity: combining literature mining and AI language models"],"prefix":"10.3389","volume":"8","author":[{"given":"Chris","family":"Bauer","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Long Tran","family":"Duc Dang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Twan","family":"van den Beucken","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Johannes","family":"Schuchhardt","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ralf","family":"Herwig","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1965","published-online":{"date-parts":[[2025,7,21]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2402.14762","article-title":"Mt-bench-101: A fine-grained benchmark for evaluating large 
language models in multi-turn dialogues","author":"Bai","year":"2024","journal-title":"arXiv"},{"key":"B2","doi-asserted-by":"publisher","first-page":"274","DOI":"10.1186\/s12967-021-02941-z","article-title":"Large-scale literature mining to assess the relation between anti-cancer drugs and cancer types","volume":"19","author":"Bauer","year":"2021","journal-title":"J. Transl. Med"},{"key":"B3","doi-asserted-by":"publisher","first-page":"W484","DOI":"10.1093\/nar\/gkx462","article-title":"LimTox: a web tool for applied text mining of adverse event and toxicity associations of compounds, drugs and genes","volume":"45","author":"Canada","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"B4","doi-asserted-by":"publisher","first-page":"47005","DOI":"10.1289\/EHP4200","article-title":"Linking bisphenol s to adverse outcome pathways using a combined text mining and systems biology approach","volume":"127","author":"Carvaillo","year":"2019","journal-title":"Environ. Health Perspect"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1340","DOI":"10.1053\/j.gastro.2015.03.006","article-title":"Features and outcomes of 899 patients with drug-induced liver injury: the DILIN prospective study","volume":"148","author":"Chalasani","year":"2015","journal-title":"Gastroenterology"},{"key":"B6","doi-asserted-by":"publisher","first-page":"648","DOI":"10.1016\/j.drudis.2016.02.015","article-title":"DILIrank: the largest reference drug list ranked by the risk for developing drug-induced liver injury in humans","volume":"21","author":"Chen","year":"2016","journal-title":"Drug Discov. 
Today"},{"key":"B7","volume-title":"Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference","author":"Chiang","year":"2024"},{"key":"B8","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1016\/B978-0-12-387817-5.00040-6","article-title":"\u201cChapter 40 - livertox: A website on drug-induced liver injury,\u201d","volume-title":"Drug-Induced Liver Disease","author":"Hoofnagle","year":"2013"},{"key":"B9","doi-asserted-by":"publisher","first-page":"108017","DOI":"10.1016\/j.envint.2023.108017","article-title":"AOP-helpFinder 2.0: Integration of an event-event searches module","volume":"177","author":"Jaylet","year":"2023","journal-title":"Environ. Int"},{"key":"B10","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.1093\/bioinformatics\/btab750","article-title":"AOP-helpFinder webserver: a tool for comprehensive analysis of the literature to support adverse outcome pathways development","volume":"38","author":"Jornod","year":"2022","journal-title":"Bioinformatics"},{"key":"B11","doi-asserted-by":"publisher","first-page":"2839","DOI":"10.1093\/bioinformatics\/btw343","article-title":"TaggerOne: joint named entity recognition and normalization with semi-Markov Models","volume":"32","author":"Leaman","year":"2016","journal-title":"Bioinformatics"},{"key":"B12","doi-asserted-by":"publisher","first-page":"1234","DOI":"10.1093\/bioinformatics\/btz682","article-title":"BioBERT: a pre-trained biomedical language representation model for biomedical text mining","volume":"36","author":"Lee","year":"2020","journal-title":"Bioinformatics"},{"key":"B13","doi-asserted-by":"publisher","first-page":"1524","DOI":"10.1021\/acs.chemrestox.4c00134","article-title":"Toward an explainable large language model for the automatic identification of the drug-induced liver injury literature","volume":"37","author":"Ma","year":"2024","journal-title":"Chem. Res. 
Toxicol"},{"key":"B14","doi-asserted-by":"publisher","author":"Mikolov","year":"2013","DOI":"10.48550\/arXiv.1310.4546"},{"key":"B15","volume-title":"Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine","author":"Nori","year":"2023"},{"key":"B16","doi-asserted-by":"crossref","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"\u201cGlove: Global vectors for word representation,\u201d","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"key":"B17","article-title":"\u201cHow context affects language models' factual predictions,\u201d","volume-title":"Automated Knowledge Base Construction","author":"Petroni","year":"2020"},{"key":"B18","doi-asserted-by":"publisher","first-page":"3486","DOI":"10.1002\/hep.31999","article-title":"Key characteristics of human hepatotoxicants as a basis for identification and characterization of the causes of liver toxicity","volume":"74","author":"Rusyn","year":"2021","journal-title":"Hepatology"},{"key":"B19","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","article-title":"Large language models encode clinical knowledge","volume":"620","author":"Singhal","year":"2023","journal-title":"Nature"},{"key":"B20","doi-asserted-by":"publisher","first-page":"4837","DOI":"10.1093\/bioinformatics\/btac598","article-title":"BERN2: an advanced neural biomedical named entity recognition and normalization tool","volume":"38","author":"Sung","year":"2022","journal-title":"Bioinformatics"},{"key":"B21","doi-asserted-by":"publisher","first-page":"2628","DOI":"10.1021\/acs.jcim.3c00200","article-title":"Artificial intelligence in drug toxicity prediction: recent advances, challenges, and future perspectives","volume":"63","author":"Tran","year":"2023","journal-title":"J. Chem. Inf. 
Model"},{"key":"B22","doi-asserted-by":"publisher","first-page":"W587","DOI":"10.1093\/nar\/gkz389","article-title":"PubTator central: automated concept annotation for biomedical full text articles","volume":"47","author":"Wei","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"B23","doi-asserted-by":"publisher","first-page":"918710","DOI":"10.1155\/2015\/918710","article-title":"GNormPlus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"Biomed Res. Int"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1561292\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,11]],"date-time":"2025-08-11T11:17:01Z","timestamp":1754911021000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1561292\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,21]]},"references-count":23,"alternative-id":["10.3389\/frai.2025.1561292"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1561292","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,21]]},"article-number":"1561292"}}