{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T16:05:57Z","timestamp":1774541157085,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T00:00:00Z","timestamp":1745366400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T00:00:00Z","timestamp":1745366400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"European Union\u2019s Horizon 2020","award":["956832"],"award-info":[{"award-number":["956832"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:sec>\n            <jats:title>Abstract<\/jats:title>\n            <jats:p>In drug discovery, prioritizing compounds for experimental testing is a critical task that can be optimized through active learning by strategically selecting informative molecules. Active learning typically trains models on labeled examples alone, while unlabeled data is only used for acquisition. This fully supervised approach neglects valuable information present in unlabeled molecular data, impairing both predictive performance and the molecule selection process. We address this limitation by integrating a transformer-based BERT model, pretrained on 1.26 million compounds, into the active learning pipeline. This effectively disentangles representation learning and uncertainty estimation, leading to more reliable molecule selection. Experiments on Tox21 and ClinTox datasets demonstrate that our approach achieves equivalent toxic compound identification with 50% fewer iterations compared to conventional active learning. 
Analysis reveals that pretrained BERT representations generate a structured embedding space enabling reliable uncertainty estimation despite limited labeled data, confirmed through Expected Calibration Error measurements. This work establishes that combining pretrained molecular representations with active learning significantly improves both model performance and acquisition efficiency in drug discovery, providing a scalable framework for compound prioritization.\n<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Scientific Contribution<\/jats:title>\n            <jats:p>We demonstrate that high-quality molecular representations fundamentally determine active learning success in drug discovery, outweighing acquisition strategy selection. We provide a framework that integrates pretrained transformer models with Bayesian active learning to separate representation learning from uncertainty estimation\u2014a critical distinction in low-data scenarios. This approach establishes a foundation for more efficient screening workflows across diverse pharmaceutical applications.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/s13321-025-00986-6","type":"journal-article","created":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T14:51:34Z","timestamp":1745419894000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Molecular property prediction using pretrained-BERT and Bayesian active learning: a data-efficient approach to drug design"],"prefix":"10.1186","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9190-3023","authenticated-orcid":false,"given":"Muhammad 
Arslan","family":"Masood","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1925-9154","authenticated-orcid":false,"given":"Samuel","family":"Kaski","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2392-0689","authenticated-orcid":false,"given":"Tianyu","family":"Cui","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,23]]},"reference":[{"issue":"2","key":"986_CR1","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1007\/BF00993277","volume":"15","author":"D Cohn","year":"1994","unstructured":"Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15(2):201\u2013221. https:\/\/doi.org\/10.1007\/BF00993277","journal-title":"Mach Learn"},{"issue":"4","key":"986_CR2","doi-asserted-by":"publisher","first-page":"381","DOI":"10.4155\/fmc-2016-0197","volume":"9","author":"D Reker","year":"2017","unstructured":"Reker D, Schneider P, Schneider G, Brown JB (2017) Active learning for computational chemogenomics. Future Med Chem 9(4):381\u2013402. https:\/\/doi.org\/10.4155\/fmc-2016-0197","journal-title":"Future Med Chem"},{"issue":"4","key":"986_CR3","doi-asserted-by":"publisher","first-page":"458","DOI":"10.1016\/j.drudis.2014.12.004","volume":"20","author":"D Reker","year":"2015","unstructured":"Reker D, Schneider G (2015) Active-learning strategies in computer-assisted drug discovery. Drug Discov Today 20(4):458\u2013465. https:\/\/doi.org\/10.1016\/j.drudis.2014.12.004","journal-title":"Drug Discov Today"},{"issue":"22","key":"986_CR4","doi-asserted-by":"publisher","first-page":"16838","DOI":"10.1021\/acs.jmedchem.1c01683","volume":"64","author":"X Ding","year":"2021","unstructured":"Ding X, Cui R, Yu J, Liu T, Zhu T, Wang D, Chang J, Fan Z, Liu X, Chen K, Jiang H, Li X, Luo X, Zheng M (2021) Active learning for drug design: a case study on the plasma exposure of orally administered drugs. 
J Med Chem 64(22):16838\u201316853. https:\/\/doi.org\/10.1021\/acs.jmedchem.1c01683. (Publisher: American Chemical Society)","journal-title":"J Med Chem"},{"issue":"3","key":"986_CR5","doi-asserted-by":"publisher","first-page":"653","DOI":"10.1021\/acs.jcim.3c01456","volume":"64","author":"GW Kyro","year":"2024","unstructured":"Kyro GW, Morgunov A, Brent RI, Batista VS (2024) ChemSpaceAL: an efficient active learning methodology applied to protein-specific molecular generation. J Chem Inf Model 64(3):653\u2013665. https:\/\/doi.org\/10.1021\/acs.jcim.3c01456. (Publisher: American Chemical Society)","journal-title":"J Chem Inf Model"},{"issue":"1","key":"986_CR6","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1186\/s13321-024-00924-y","volume":"16","author":"Y Nahal","year":"2024","unstructured":"Nahal Y, Menke J, Martinelli J, Heinonen M, Kabeshov M, Janet JP, Nittinger E, Engkvist O, Kaski S (2024) Human-in-the-loop active learning for goal-oriented molecule generation. J Cheminform 16(1):138. https:\/\/doi.org\/10.1186\/s13321-024-00924-y","journal-title":"J Cheminform"},{"issue":"1","key":"986_CR7","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1021\/acs.jcim.8b00597","volume":"59","author":"R Liu","year":"2019","unstructured":"Liu R, Wallqvist A (2019) Molecular similarity-based domain applicability metric efficiently identifies out-of-domain compounds. J Chem Inf Model 59(1):181\u2013189. https:\/\/doi.org\/10.1021\/acs.jcim.8b00597. (Publisher: American Chemical Society)","journal-title":"J Chem Inf Model"},{"issue":"6","key":"986_CR8","doi-asserted-by":"publisher","first-page":"1912","DOI":"10.1021\/ci049782w","volume":"44","author":"P Sheridan","year":"2004","unstructured":"Sheridan P, Feuston BP, Maiorov VN, Kearsley SK (2004) Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. J Chem Inf Comput Sci 44(6):1912\u20131928. https:\/\/doi.org\/10.1021\/ci049782w. 
(ISSN 0095-2338)","journal-title":"J Chem Inf Comput Sci"},{"key":"986_CR9","unstructured":"Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. arXiv:1612.01474 [stat]"},{"key":"986_CR10","doi-asserted-by":"publisher","unstructured":"H\u00fcllermeier E, Waegeman W (2021) Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach Learn 110(3):457\u2013506. ISSN 0885-6125, 1573-0565. https:\/\/doi.org\/10.1007\/s10994-021-05946-3. arXiv:1910.09457 [cs]","DOI":"10.1007\/s10994-021-05946-3"},{"issue":"8","key":"986_CR11","doi-asserted-by":"publisher","first-page":"3770","DOI":"10.1021\/acs.jcim.0c00502","volume":"60","author":"L Hirschfeld","year":"2020","unstructured":"Hirschfeld L, Swanson K, Yang K, Barzilay R, Coley CW (2020) Uncertainty quantification using neural networks for molecular property prediction. J Chem Inf Model 60(8):3770\u20133780. https:\/\/doi.org\/10.1021\/acs.jcim.0c00502","journal-title":"J Chem Inf Model"},{"key":"986_CR12","unstructured":"Kendall A, Gal Y (2017) What uncertainties do we need in bayesian deep learning for computer vision?. arXiv:1703.04977 [cs]"},{"issue":"35","key":"986_CR13","doi-asserted-by":"publisher","first-page":"8154","DOI":"10.1039\/C9SC00616H","volume":"10","author":"Y Zhang","year":"2019","unstructured":"Zhang Y, Lee AA (2019) Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem Sci 10(35):8154\u20138163. https:\/\/doi.org\/10.1039\/C9SC00616H","journal-title":"Chem Sci"},{"key":"986_CR14","unstructured":"Houlsby N, Husz\u00e1r F, Ghahramani Z, Lengyel M (2011) Bayesian active learning for classification and preference learning. arXiv:1112.5745"},{"key":"986_CR15","unstructured":"Smith FB, Kirsch A, Farquhar S, Gal Y, Foster A, Rainforth T (2023) Prediction-oriented bayesian active learning. 
In: International conference on artificial intelligence and statistics, pp 7331\u20137348. PMLR"},{"issue":"12","key":"986_CR16","doi-asserted-by":"publisher","first-page":"4977","DOI":"10.1021\/jm4004285","volume":"57","author":"A Cherkasov","year":"2014","unstructured":"Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R, Consonni V, Kuzmin VE, Cramer R, Benigni R, Yang C, Rathman J, Terfloth L, Gasteiger J, Richard A, Tropsha A (2014) QSAR Modeling: Where Have You Been? Where Are You Going To? J Med Chem 57(12):4977\u20135010. https:\/\/doi.org\/10.1021\/jm4004285. (Publisher: American Chemical Society)","journal-title":"J Med Chem"},{"issue":"1","key":"986_CR17","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1021\/acs.jcim.3c01250","volume":"64","author":"E Heid","year":"2024","unstructured":"Heid E, Greenman KP, Chung Y, Li SC, Graff DE, Vermeire FH, Wu H, Green WH, McGill CJ (2024) Chemprop: a machine learning package for chemical property prediction. J Chem Inf Model 64(1):9\u201317. https:\/\/doi.org\/10.1021\/acs.jcim.3c01250. (Publisher: American Chemical Society)","journal-title":"J Chem Inf Model"},{"key":"986_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpha.2024.101081","author":"J Jiang","year":"2024","unstructured":"Jiang J, Chen L, Ke L, Dou B, Zhang C, Feng H, Zhu Y, Qiu H, Zhang B, Wei G (2024) A review of transformers in drug discovery and beyond. J Pharm Anal. https:\/\/doi.org\/10.1016\/j.jpha.2024.101081","journal-title":"J Pharm Anal"},{"key":"986_CR19","unstructured":"Smith FB, Foster A, Rainforth T (2024) Making better use of unlabelled data in bayesian active learning. In: International conference on artificial intelligence and statistics, pp 847\u2013855. 
PMLR"},{"key":"986_CR20","unstructured":"Fabian B, Edlich T, Gaspar H, Segler M, Meyers J, Fiscato M, Ahmed M (2020) Molecular representation learning with language models and domain-relevant auxiliary tasks. arXiv:2011.13230 [cs]"},{"key":"986_CR21","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs]"},{"issue":"6","key":"986_CR22","doi-asserted-by":"publisher","first-page":"1882","DOI":"10.1021\/acs.jcim.3c01938","volume":"64","author":"C Zhonglin","year":"2024","unstructured":"Zhonglin C, Simone S, Ye W (2024) Large-scale pretraining improves sample efficiency of active learning-based virtual screening. J Chem Inf Model 64(6):1882\u20131891. https:\/\/doi.org\/10.1021\/acs.jcim.3c01938. (Publisher: American Chemical Society)","journal-title":"J Chem Inf Model"},{"issue":"2","key":"986_CR23","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1021\/acs.chemrestox.0c00264","volume":"34","author":"AM Richard","year":"2021","unstructured":"Richard AM, Huang R, Waidyanatha S, Shinn P, Collins BJ, Thillainadarajah I, Grulke CM, Williams AJ, Lougee RR, Judson RS, Houck KA, Shobair M, Yang C, Rathman JF, Yasgar A, Fitzpatrick SC, Simeonov A, Thomas RS, Crofton KM, Paules RS, Bucher JR, Austin CP, Kavlock RJ, Tice RR (2021) The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem Res Toxicol 34(2):189\u2013216. https:\/\/doi.org\/10.1021\/acs.chemrestox.0c00264","journal-title":"Chem Res Toxicol"},{"issue":"10","key":"986_CR24","doi-asserted-by":"publisher","first-page":"1294","DOI":"10.1016\/j.chembiol.2016.07.023","volume":"23","author":"KM Gayvert","year":"2016","unstructured":"Gayvert KM, Madhukar NS, Elemento O (2016) A data-driven approach to predicting successes and failures of clinical trials. Cell Chem Biol 23(10):1294\u20131301. 
https:\/\/doi.org\/10.1016\/j.chembiol.2016.07.023","journal-title":"Cell Chem Biol"},{"issue":"15","key":"986_CR25","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"W Bemis Guy","year":"1996","unstructured":"Bemis Guy W, Murcko Mark A (1996) The properties of known drugs. 1. Molecular frameworks. J Med Chem 39(15):2887\u20132893. https:\/\/doi.org\/10.1021\/jm9602928. (Publisher: American Chemical Society)","journal-title":"J Med Chem"},{"issue":"1","key":"986_CR26","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1214\/23-STS915","volume":"39","author":"T Rainforth","year":"2024","unstructured":"Rainforth T, Foster A, Ivanova DR, Smith FB (2024) Modern bayesian experimental design. Stat Sci 39(1):100\u2013114","journal-title":"Stat Sci"},{"key":"986_CR27","unstructured":"Kendall A, Gal Y (2017) What uncertainties do we need in Bayesian deep learning for computer vision? Adv Neural Inf Process Syst, 30"},{"key":"986_CR28","unstructured":"Gal Y, Ghahramani Z (2016) Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the 34th international conference on machine learning, pp. 1050\u20131059"},{"issue":"1","key":"986_CR29","doi-asserted-by":"publisher","first-page":"10752","DOI":"10.1038\/s41598-019-47148-x","volume":"9","author":"Z Zhou","year":"2019","unstructured":"Zhou Z, Kearnes S, Li L, Zare RN, Riley P (2019) Optimization of molecules via deep reinforcement learning. Sci Rep 9(1):10752","journal-title":"Sci Rep"},{"key":"986_CR30","doi-asserted-by":"crossref","unstructured":"Hao Z, Lu C, Huang Z, Wang H, Hu Z, Liu Q, Chen E, Lee C (2020) Asgn: an active semi-supervised graph neural network for molecular property prediction. 
In: Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery and data mining, pp 731\u2013752","DOI":"10.1145\/3394486.3403117"},{"issue":"35","key":"986_CR31","doi-asserted-by":"publisher","first-page":"8154","DOI":"10.1039\/C9SC00616H","volume":"10","author":"Yao Zhang","year":"2019","unstructured":"Zhang Yao et al (2019) Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem Sci 10(35):8154\u20138163","journal-title":"Chem Sci"},{"issue":"518","key":"986_CR32","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","volume":"112","author":"DM Blei","year":"2017","unstructured":"Blei DM, Kucukelbir Alp, McAuliffe JD (2017) Variational inference: a review for statisticians. J Am Stat Assoc 112(518):859\u2013877","journal-title":"J Am Stat Assoc"},{"key":"986_CR33","unstructured":"Gal Y, Islam R, Ghahramani Z (2017) Deep bayesian active learning with image data. In: International conference on machine learning, pp 1183\u20131192. PMLR"},{"key":"986_CR34","doi-asserted-by":"crossref","unstructured":"Rakesh V, Jain S (2021) Efficacy of Bayesian neural networks in active learning. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 2601\u20132609","DOI":"10.1109\/CVPRW53098.2021.00294"},{"key":"986_CR35","doi-asserted-by":"publisher","unstructured":"Davies David L, Bouldin Donald W (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell , PAMI-1(2):224\u2013227. ISSN 1939-3539. https:\/\/doi.org\/10.1109\/TPAMI.1979.4766909. https:\/\/ieeexplore.ieee.org\/document\/4766909. Conference Name: IEEE Transactions on Pattern Analysis and Machine Intelligence","DOI":"10.1109\/TPAMI.1979.4766909"},{"key":"986_CR36","doi-asserted-by":"publisher","unstructured":"Theodoridis S (2020) Classification: a tour of the classics. In: Machine Learning, pages 301\u2013350. Elsevier. ISBN 978-0-12-818803-3. 
https:\/\/doi.org\/10.1016\/B978-0-12-818803-3.00016-7. https:\/\/linkinghub.elsevier.com\/retrieve\/pii\/B9780128188033000167","DOI":"10.1016\/B978-0-12-818803-3.00016-7"},{"key":"986_CR37","unstructured":"Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. arXiv:1706.04599"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00986-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-00986-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00986-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T14:51:42Z","timestamp":1745419902000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-00986-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,23]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["986"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-00986-6","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,23]]},"assertion":[{"value":"31 December 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declaration"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"58"}}