{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:10:52Z","timestamp":1776100252639,"version":"3.50.1"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["956832"],"award-info":[{"award-number":["956832"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100023293","name":"Finnish Center for Artificial Intelligence","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100023293","id-type":"DOI","asserted-by":"publisher"}]},{"name":"BASF SE,Germany"},{"name":"Christian Doppler Research Association,Austria"},{"name":"Austrian National Foundation for Research, Technology and Development, Austria"},{"name":"Federal Ministry of Labour and Economy,Austria"},{"name":"Boehringer-Ingelheim RCV GmbH & Co KG,Austria"},{"DOI":"10.13039\/100014013","name":"UK Research and Innovation","doi-asserted-by":"publisher","award":["EP\/W002973\/1"],"award-info":[{"award-number":["EP\/W002973\/1"]}],"id":[{"id":"10.13039\/100014013","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003065","name":"University of Vienna","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003065","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:sec>\n         
           <jats:title>Abstract<\/jats:title>\n                    <jats:p>Assay interference caused by small organic compounds continues to pose formidable challenges to early drug discovery. Various computational methods have been developed to identify compounds likely to cause assay interference. However, due to the scarcity of data available for model development, the predictive accuracy and applicability of these approaches are limited. In this work, we present E-GuARD, a novel framework seeking to address data scarcity and imbalance by integrating self-distillation, active learning, and expert-guided molecular generation. E-GuARD iteratively enriches the training data with interference-relevant molecules, resulting in quantitative structure-interference relationship (QSIR) models with superior performance. We demonstrate the utility of E-GuARD with the examples of four high-quality data sets on thiol reactivity, redox reactivity, nanoluciferase inhibition, and firefly luciferase inhibition. Our models reached MCC values of up to 0.47 for these data sets, with two-fold or higher improvements in enrichment factors compared to models trained without E-GuARD data augmentation. These results highlight the potential of E-GuARD as a scalable solution to mitigating assay interference in early drug discovery.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Scientific contribution<\/jats:title>\n                    <jats:p>We present E-GuARD, an innovative framework that combines iterative self-distillation with guided molecular augmentation to enhance the predictive performance of QSAR models. By allowing models to learn from newly generated, informative compounds through iterations, E-GuARD facilitates the understanding of underrepresented structural patterns and improves performance on unseen data. When applied across different interference mechanisms, E-GuARD consistently outperformed standard approaches. 
E-GuARD establishes the foundation for further research into dynamic data enrichment and more robust molecular modeling.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s13321-025-01014-3","type":"journal-article","created":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T09:31:14Z","timestamp":1745919074000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["E-GuARD: expert-guided augmentation for the robust detection of compounds interfering with biological assays"],"prefix":"10.1186","volume":"17","author":[{"given":"Vincenzo","family":"Palmacci","sequence":"first","affiliation":[]},{"given":"Yasmine","family":"Nahal","sequence":"additional","affiliation":[]},{"given":"Matthias","family":"Welsch","sequence":"additional","affiliation":[]},{"given":"Ola","family":"Engkvist","sequence":"additional","affiliation":[]},{"given":"Samuel","family":"Kaski","sequence":"additional","affiliation":[]},{"given":"Johannes","family":"Kirchmair","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,29]]},"reference":[{"key":"1014_CR1","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1038\/nrd.2017.232","volume":"17","author":"G Schneider","year":"2018","unstructured":"Schneider G (2018) Automating drug discovery. Nat Rev Drug Discov 17:97\u2013113","journal-title":"Nat Rev Drug Discov"},{"key":"1014_CR2","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1038\/s41570-024-00593-3","volume":"8","author":"L Tan","year":"2024","unstructured":"Tan L, Hirte S, Palmacci V, Stork C, Kirchmair J (2024) Tackling assay interference associated with small molecules. 
Nat Rev Chem 8:319\u2013339","journal-title":"Nat Rev Chem"},{"key":"1014_CR3","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1016\/j.cbpa.2010.03.020","volume":"14","author":"N Thorne","year":"2010","unstructured":"Thorne N, Auld DS, Inglese J (2010) Apparent activity in high-throughput screening: origins of compound-dependent assay interference. Curr Opin Chem Biol 14:315\u2013324","journal-title":"Curr Opin Chem Biol"},{"key":"1014_CR4","doi-asserted-by":"publisher","first-page":"481","DOI":"10.1038\/513481a","volume":"513","author":"J Baell","year":"2014","unstructured":"Baell J, Walters MA (2014) Chemistry: chemical con artists foil drug discovery. Nature 513:481\u2013483","journal-title":"Nature"},{"key":"1014_CR5","first-page":"100007","volume":"1","author":"C Stork","year":"2021","unstructured":"Stork C, Mathai N, Kirchmair J (2021) Computational prediction of frequent hitters in target-based and cell-based assays. Artif Intell Life Sci 1:100007","journal-title":"Artif Intell Life Sci"},{"key":"1014_CR6","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1093\/bioinformatics\/btz695","volume":"36","author":"C Stork","year":"2020","unstructured":"Stork C et al (2020) NERDD: a web portal providing access to in silico tools for drug discovery. Bioinformatics 36:1291\u20131292","journal-title":"Bioinformatics"},{"key":"1014_CR7","first-page":"100099","volume":"5","author":"V Palmacci","year":"2024","unstructured":"Palmacci V, Hirte S, Hern\u00e1ndez Gonz\u00e1lez JE, Montanari F, Kirchmair J (2024) Statistical approaches enabling technology-specific assay interference prediction from large screening data sets. 
Artif Intell Life Sci 5:100099","journal-title":"Artif Intell Life Sci"},{"key":"1014_CR8","doi-asserted-by":"publisher","first-page":"bbaa282","DOI":"10.1093\/bib\/bbaa282","volume":"22","author":"Z-Y Yang","year":"2021","unstructured":"Yang Z-Y et al (2021) ChemFLuo: a web-server for structure analysis and identification of fluorescent compounds. Brief Bioinform 22:bbaa282","journal-title":"Brief Bioinform"},{"key":"1014_CR9","doi-asserted-by":"publisher","first-page":"3714","DOI":"10.1021\/acs.jcim.9b00541","volume":"59","author":"Z-Y Yang","year":"2019","unstructured":"Yang Z-Y et al (2019) Structural analysis and identification of colloidal aggregators in drug discovery. J Chem Inf Model 59:3714\u20133726","journal-title":"J Chem Inf Model"},{"key":"1014_CR10","doi-asserted-by":"publisher","first-page":"1795","DOI":"10.1002\/cmdc.201900395","volume":"14","author":"L David","year":"2019","unstructured":"David L et al (2019) Identification of compounds that interfere with high-throughput screening assay technologies. ChemMedChem 14:1795\u20131802","journal-title":"ChemMedChem"},{"key":"1014_CR11","doi-asserted-by":"publisher","first-page":"12828","DOI":"10.1021\/acs.jmedchem.3c00482","volume":"66","author":"VM Alves","year":"2023","unstructured":"Alves VM et al (2023) Lies and liabilities: computational assessment of high-throughput screening hits to identify artifact compounds. J Med Chem 66:12828\u201312839","journal-title":"J Med Chem"},{"key":"1014_CR12","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1016\/j.neuroimage.2013.10.005","volume":"87","author":"R Dubey","year":"2014","unstructured":"Dubey R, Zhou J, Wang Y, Thompson PM, Ye J (2014) Analysis of sampling techniques for imbalanced data: an n = 648 ADNI study. Neuroimage 87:220\u2013241","journal-title":"Neuroimage"},{"key":"1014_CR13","doi-asserted-by":"publisher","unstructured":"Lin T-Y, Goyal P, Girshick R, He K, Doll\u00e1r P. Focal loss for dense object detection. 2018. 
Preprint at https:\/\/doi.org\/10.48550\/arXiv.1708.02002.","DOI":"10.48550\/arXiv.1708.02002"},{"key":"1014_CR14","doi-asserted-by":"publisher","unstructured":"Bjerrum EJ. SMILES enumeration as data augmentation for neural network modeling of molecules. 2017. Preprint at https:\/\/doi.org\/10.48550\/arXiv.1703.07076.","DOI":"10.48550\/arXiv.1703.07076"},{"key":"1014_CR15","doi-asserted-by":"publisher","first-page":"18299","DOI":"10.1038\/s41598-023-45532-2","volume":"13","author":"D Schaudt","year":"2023","unstructured":"Schaudt D et al (2023) Augmentation strategies for an imbalanced learning problem on a novel COVID-19 severity dataset. Sci Rep 13:18299","journal-title":"Sci Rep"},{"key":"1014_CR16","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321\u2013357","journal-title":"J Artif Intell Res"},{"key":"1014_CR17","doi-asserted-by":"crossref","unstructured":"Xie Q, Luong M-T, Hovy E, Le QV. Self-training with noisy student improves ImageNet classification. 2020. Preprint at http:\/\/arxiv.org\/abs\/1911.04252.","DOI":"10.1109\/CVPR42600.2020.01070"},{"key":"1014_CR18","doi-asserted-by":"publisher","unstructured":"Zhang L et al. Be your own teacher: improve the performance of convolutional neural networks via self distillation. 2019. Preprint at https:\/\/doi.org\/10.48550\/arXiv.1905.08094.","DOI":"10.48550\/arXiv.1905.08094"},{"key":"1014_CR19","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","volume":"596","author":"J Jumper","year":"2021","unstructured":"Jumper J et al (2021) Highly accurate protein structure prediction with AlphaFold. 
Nature 596:583\u2013589","journal-title":"Nature"},{"key":"1014_CR20","doi-asserted-by":"publisher","first-page":"158","DOI":"10.1186\/s12859-022-04681-3","volume":"23","author":"Y Liu","year":"2022","unstructured":"Liu Y, Lim H, Xie L (2022) Exploration of chemical space with partial labeled noisy student self-training and self-supervised graph embedding. BMC Bioinform 23:158","journal-title":"BMC Bioinform"},{"key":"1014_CR21","doi-asserted-by":"publisher","DOI":"10.3389\/fenvs.2015.00085","author":"R Huang","year":"2016","unstructured":"Huang R et al (2016) Tox21Challenge to build predictive models of nuclear receptor and stress response pathways as mediated by exposure to environmental chemicals and drugs. Front Environ Sci. https:\/\/doi.org\/10.3389\/fenvs.2015.00085","journal-title":"Front Environ Sci"},{"key":"1014_CR22","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1038\/s43588-024-00704-6","volume":"4","author":"Z Fralish","year":"2024","unstructured":"Fralish Z, Reker D (2024) Taking a deep dive with active learning for drug discovery. Nat Comput Sci 4:727\u2013728","journal-title":"Nat Comput Sci"},{"key":"1014_CR23","doi-asserted-by":"publisher","unstructured":"Nahal Y et al. Human-in-the-loop active learning for goal-oriented molecule generation. 2024. Preprint at https:\/\/doi.org\/10.1186\/s13321-024-00924-y.","DOI":"10.1186\/s13321-024-00924-y"},{"key":"1014_CR24","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-024-00812-5","volume":"16","author":"HH Loeffler","year":"2024","unstructured":"Loeffler HH et al (2024) Reinvent 4: modern AI\u2013driven generative molecule design. 
J Cheminformatics 16:20","journal-title":"J Cheminformatics"},{"key":"1014_CR25","doi-asserted-by":"publisher","first-page":"6651","DOI":"10.1038\/s41467-023-42242-1","volume":"14","author":"O-H Choung","year":"2023","unstructured":"Choung O-H, Vianello R, Segler M, Stiefl N, Jim\u00e9nez-Luna J (2023) Extracting medicinal chemistry intuition via preference machine learning. Nat Commun 14:6651","journal-title":"Nat Commun"},{"key":"1014_CR26","doi-asserted-by":"publisher","first-page":"933","DOI":"10.1021\/acs.jcim.7b00574","volume":"58","author":"D Ghosh","year":"2018","unstructured":"Ghosh D, Koch U, Hadian K, Sattler M, Tetko IV (2018) Luciferase Advisor: high-accuracy model to flag false positive hits in luciferase HTS assays. J Chem Inf Model 58:933\u2013942","journal-title":"J Chem Inf Model"},{"key":"1014_CR27","unstructured":"Lemaitre G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. 2016. arXiv.org https:\/\/arxiv.org\/abs\/1609.06570v1."},{"key":"1014_CR28","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: a next-generation hyperparameter optimization framework. 2019. arXiv.org https:\/\/arxiv.org\/abs\/1907.10902v1.","DOI":"10.1145\/3292500.3330701"},{"key":"1014_CR29","unstructured":"RDKit. https:\/\/www.rdkit.org\/."},{"key":"1014_CR30","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1038\/nchem.1243","volume":"4","author":"GR Bickerton","year":"2012","unstructured":"Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL (2012) Quantifying the chemical beauty of drugs. Nat Chem 4:90\u201398","journal-title":"Nat Chem"},{"key":"1014_CR31","doi-asserted-by":"publisher","unstructured":"Smith FB et al. Prediction-oriented Bayesian active learning. 2023. 
Preprint at https:\/\/doi.org\/10.48550\/arXiv.2304.08151.","DOI":"10.48550\/arXiv.2304.08151"},{"key":"1014_CR32","doi-asserted-by":"publisher","first-page":"383","DOI":"10.1021\/acs.molpharmaceut.2c00680","volume":"20","author":"R Rodr\u00edguez-P\u00e9rez","year":"2023","unstructured":"Rodr\u00edguez-P\u00e9rez R, Trunzer M, Schneider N, Faller B, Gerebtzoff G (2023) Multispecies machine learning predictions of in vitro intrinsic clearance with uncertainty quantification analyses. Mol Pharm 20:383\u2013394","journal-title":"Mol Pharm"},{"key":"1014_CR33","doi-asserted-by":"publisher","first-page":"2719","DOI":"10.1021\/jm901137j","volume":"53","author":"JB Baell","year":"2010","unstructured":"Baell JB, Holloway GA (2010) New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J Med Chem 53:2719\u20132740","journal-title":"J Med Chem"},{"key":"1014_CR34","unstructured":"Kornblith S, Norouzi M, Lee H, Hinton G. Similarity of Neural Network Representations Revisited. In Proceedings of the 36th International Conference on Machine Learning, (PMLR). 2019; p.3519\u20133529"},{"key":"1014_CR35","doi-asserted-by":"publisher","first-page":"7303","DOI":"10.1021\/acs.jcim.4c00837","volume":"64","author":"M Welsch","year":"2024","unstructured":"Welsch M, Hirte S, Kirchmair J (2024) Deciphering molecular embeddings with centered kernel alignment. J Chem Inf Model 64:7303\u20137312","journal-title":"J Chem Inf Model"},{"key":"1014_CR36","doi-asserted-by":"publisher","unstructured":"Davies A, and Ghahramani Z. The random forest kernel and other kernels for big data from random partitions. 2014. Preprint at https:\/\/doi.org\/10.48550\/arXiv.1402.4293.","DOI":"10.48550\/arXiv.1402.4293"},{"key":"1014_CR37","doi-asserted-by":"publisher","unstructured":"Abdullah BM, Zaitova I, Avgustinova T, M\u00f6bius B, Klakow D. How familiar does that sound? 
Cross-lingual representational similarity analysis of acoustic word embeddings. 2021. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2109.10179.","DOI":"10.48550\/arXiv.2109.10179"},{"key":"1014_CR38","volume":"3","author":"M Vogt","year":"2023","unstructured":"Vogt M (2023) Exploring chemical space\u2014Generative models and their evaluation. Artif Intell Life Sci 3:100064","journal-title":"Artif Intell Life Sci"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01014-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01014-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01014-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T09:31:22Z","timestamp":1745919082000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01014-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":38,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1014"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01014-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-5740531\/v1","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,29]]},"assertion":[{"value":"31 December 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 April 
2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"64"}}