{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T14:40:25Z","timestamp":1776696025384,"version":"3.51.2"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T00:00:00Z","timestamp":1742947200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T00:00:00Z","timestamp":1742947200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","award":["101120466"],"award-info":[{"award-number":["101120466"]}],"id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","award":["101120466"],"award-info":[{"award-number":["101120466"]}],"id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100019180","name":"HORIZON EUROPE European Research Council","doi-asserted-by":"publisher","award":["101120466"],"award-info":[{"award-number":["101120466"]}],"id":[{"id":"10.13039\/100019180","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001858","name":"VINNOVA","doi-asserted-by":"publisher","award":["2023-03000"],"award-info":[{"award-number":["2023-03000"]}],"id":[{"id":"10.13039\/501100001858","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>This study investigates the risks of exposing confidential chemical structures when machine learning models 
trained on these structures are made publicly available. We use membership inference attacks, a common method to assess privacy that is largely unexplored in the context of drug discovery, to examine neural networks for molecular property prediction in a black-box setting. Our results reveal significant privacy risks across all evaluated datasets and neural network architectures. Combining multiple attacks increases these risks. Molecules from minority classes, often the most valuable in drug discovery, are particularly vulnerable. We also found that representing molecules as graphs and using message-passing neural networks may mitigate these risks. We provide a framework to assess privacy risks of classification models and molecular representations, available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/FabianKruger\/molprivacy\" ext-link-type=\"uri\">https:\/\/github.com\/FabianKruger\/molprivacy<\/jats:ext-link>. Our findings highlight the need for careful consideration when sharing neural networks trained on proprietary chemical structures, informing organisations and researchers about the trade-offs between data confidentiality and model openness.<\/jats:p>","DOI":"10.1186\/s13321-025-00982-w","type":"journal-article","created":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T22:14:07Z","timestamp":1743027247000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Publishing neural networks in drug discovery might compromise training data privacy"],"prefix":"10.1186","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-6420-2175","authenticated-orcid":false,"given":"Fabian 
P.","family":"Kr\u00fcger","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4138-0508","authenticated-orcid":false,"given":"Johan","family":"\u00d6stman","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7271-0824","authenticated-orcid":false,"given":"Lewis","family":"Mervin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6855-0012","authenticated-orcid":false,"given":"Igor V.","family":"Tetko","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4970-6461","authenticated-orcid":false,"given":"Ola","family":"Engkvist","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,26]]},"reference":[{"issue":"6","key":"982_CR1","doi-asserted-by":"crossref","first-page":"1241","DOI":"10.1016\/j.drudis.2018.01.039","volume":"23","author":"Hongming Chen","year":"2018","unstructured":"Chen Hongming, Engkvist Ola, Wang Yinhai, Olivecrona Marcus, Blaschke Thomas (2018) The rise of deep learning in drug discovery. Drug Dis Today 23(6):1241\u20131250","journal-title":"Drug Dis Today"},{"issue":"11","key":"982_CR2","doi-asserted-by":"crossref","first-page":"3525","DOI":"10.1039\/D0CS00098A","volume":"49","author":"Eugene N Muratov","year":"2020","unstructured":"Muratov Eugene N, Bajorath J\u00fcrgen, Sheridan Robert P, Tetko Igor V, Filimonov Dmitry, Poroikov Vladimir, Oprea Tudor I, Baskin Igor I, Varnek Alexandre, Roitberg Adrian et al (2020) Qsar without borders. Cheml Soc Rev 49(11):3525\u20133564","journal-title":"Cheml Soc Rev"},{"issue":"3","key":"982_CR3","doi-asserted-by":"crossref","first-page":"1947","DOI":"10.1007\/s10462-021-10058-4","volume":"55","author":"Suresh Dara","year":"2022","unstructured":"Dara Suresh, Dhamercherla Swetha, Jadav Surender Singh, Madhu Babu CH, Ahsan Mohamed Jawed (2022) Machine learning in drug discovery: a review. 
Artif Intell Rev 55(3):1947\u20131999","journal-title":"Artif Intell Rev"},{"issue":"6","key":"982_CR4","doi-asserted-by":"crossref","first-page":"463","DOI":"10.1038\/s41573-019-0024-5","volume":"18","author":"Jessica Vamathevan","year":"2019","unstructured":"Vamathevan Jessica, Clark Dominic, Czodrowski Paul, Dunham Ian, Ferran Edgardo, Lee George, Li Bin, Madabhushi Anant, Shah Parantu, Spitzer Michaela et al (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Dis 18(6):463\u2013477","journal-title":"Nat Rev Drug Dis"},{"key":"982_CR5","first-page":"15576","volume":"37","author":"Martijn Oldenhof","year":"2023","unstructured":"Oldenhof Martijn, \u00c1cs Gergely, Pej\u00f3 Bal\u00e1zs, Schuffenhauer Ansgar, Holway Nicholas, Sturm No\u00e9, Dieckmann Arne, Fortmeier Oliver, Boniface Eric, Mayer Cl\u00e9ment et al (2023) Industry-scale orchestrated federated learning for drug discovery. Proc AAAI Conf Artif Intell 37:15576\u201315584","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"982_CR6","unstructured":"Zuckerberg Mark (2024) Open-source ai is the path forward, July 2024. URL https:\/\/about.fb.com\/news\/2024\/07\/open-source-ai-is-the-path-forward\/. Accessed: 25-09-2025"},{"issue":"11","key":"982_CR7","doi-asserted-by":"crossref","first-page":"908","DOI":"10.1038\/s43588-023-00540-0","volume":"3","author":"Yash Raj Shrestha","year":"2023","unstructured":"Yash Raj Shrestha (2023) Georg von Krogh, and Stefan Feuerriegel Building open-source ai. Nat Comput Sci 3(11):908\u2013911","journal-title":"Nat Comput Sci"},{"key":"982_CR8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12910-020-00568-1","volume":"22","author":"Blake Murdoch","year":"2021","unstructured":"Murdoch Blake (2021) Privacy and artificial intelligence: challenges for protecting health information in a new era. 
BMC Med Ethics 22:1\u20135","journal-title":"BMC Med Ethics"},{"key":"982_CR9","doi-asserted-by":"crossref","unstructured":"Shokri Reza, Stronati Marco, Song Congzheng, Shmatikov Vitaly (2017) Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP), pages 3\u201318. IEEE","DOI":"10.1109\/SP.2017.41"},{"key":"982_CR10","unstructured":"Murakonda Sasi Kumar, Shokri Reza (2020) Ml privacy meter: Aiding regulatory compliance by quantifying the privacy risks of machine learning. arXiv preprint arXiv:2007.09339"},{"key":"982_CR11","doi-asserted-by":"crossref","unstructured":"Carlini Nicholas, Chien Steve, Nasr Milad, Song Shuang, Terzis Andreas, Tramer Florian (2022) Membership inference attacks from first principles. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1897\u20131914. IEEE","DOI":"10.1109\/SP46214.2022.9833649"},{"key":"982_CR12","doi-asserted-by":"crossref","unstructured":"Salem Ahmed, Cherubin Giovanni, Evans David, K\u00f6pf Boris, Paverd Andrew, Suri Anshuman, Tople Shruti, Zanella-B\u00e9guelin Santiago (2023) Sok: Let the privacy games begin! a unified treatment of data inference privacy in machine learning. In 2023 IEEE Symposium on Security and Privacy (SP), pages 327\u2013345. IEEE","DOI":"10.1109\/SP46215.2023.10179281"},{"issue":"11","key":"982_CR13","first-page":"1","volume":"54","author":"Hu Hongsheng","year":"2022","unstructured":"Hongsheng Hu, Zoran Salcic, Lichao Sun, Dobbie Gillian Yu, Philip S, Xuyun Zhang (2022) Membership inference attacks on machine learning: A survey. ACM Comput Surv (CSUR) 54(11):1\u201337","journal-title":"ACM Comput Surv (CSUR)"},{"key":"982_CR14","unstructured":"Zarifzadeh Sajjad, Liu Philippe, Shokri Reza (2024) Low-cost high-power membership inference attacks. 
In Forty-first International Conference on Machine Learning"},{"key":"982_CR15","unstructured":"Pejo Balazs, Remeli Mina, Arany Adam, Galtier Mathieu, Acs Gergely (2022) Collaborative drug discovery: Inference-level data protection perspective. arXiv preprint arXiv:2205.06506"},{"key":"982_CR16","unstructured":"Bergstra James, Bardenet R\u00e9mi, Bengio Yoshua, K\u00e9gl Bal\u00e1zs (2011) Algorithms for hyper-parameter optimization. Advances in neural information processing systems 24"},{"issue":"6","key":"982_CR17","doi-asserted-by":"crossref","first-page":"1686","DOI":"10.1021\/ci300124c","volume":"52","author":"Ines Filipa Martins","year":"2012","unstructured":"Martins Ines Filipa, Teixeira Ana L, Pinheiro Luis, Falcao Andre O (2012) A bayesian approach to in silico blood-brain barrier penetration modeling. J Chem Inf Model 52(6):1686\u20131697","journal-title":"J Chem Inf Model"},{"issue":"9","key":"982_CR18","doi-asserted-by":"crossref","first-page":"2077","DOI":"10.1021\/ci900161g","volume":"49","author":"Katja Hansen","year":"2009","unstructured":"Hansen Katja, Mika Sebastian, Schroeter Timon, Sutter Andreas, Ter Laak Antonius, Steger-Hartmann Thomas, Heinrich Nikolaus, Muller Klaus-Robert (2009) Benchmark data set for in silico prediction of ames mutagenicity. J Chem Inf Model 49(9):2077\u20132081","journal-title":"J Chem Inf Model"},{"issue":"11","key":"982_CR19","doi-asserted-by":"crossref","first-page":"2840","DOI":"10.1021\/ci300400a","volume":"52","author":"Xu Congying","year":"2012","unstructured":"Congying Xu, Cheng Feixiong, Chen Lei, Zheng Du, Li Weihua, Liu Guixia, Lee Philip W, Tang Yun (2012) In silico prediction of chemical ames mutagenicity. 
J Chem Inf Model 52(11):2840\u20132847","journal-title":"J Chem Inf Model"},{"issue":"10","key":"982_CR20","doi-asserted-by":"crossref","first-page":"2316","DOI":"10.1021\/acs.jcim.2c00041","volume":"62","author":"Katherine S Lim","year":"2022","unstructured":"Lim Katherine S, Reidenbach Andrew G, Hua Bruce K, Mason Jeremy W, Gerry Christopher J, Clemons Paul A, Coley Connor W (2022) Machine learning on dna-encoded library count data using an uncertainty-aware probabilistic loss function. J Chem Inf Model 62(10):2316\u20132331","journal-title":"J Chem Inf Model"},{"issue":"6","key":"982_CR21","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1089\/adt.2011.0425","volume":"9","author":"Du Fang","year":"2011","unstructured":"Fang Du, Haibo Yu, Zou Beiyan, Babcock Joseph, Long Shunyou, Li Min (2011) Hergcentral: a large database to store, retrieve, and analyze compound-human ether-a-go-go related gene channel interactions to facilitate cardiotoxicity assessment in drug development. Assay Drug Dev Technol 9(6):580\u2013588","journal-title":"Assay Drug Dev Technol"},{"key":"982_CR22","unstructured":"Huang Kexin, Tianfan Fu, Gao Wenhao, Zhao Yue, Roohani Yusuf, Leskovec Jure, Coley Connor W, Xiao Cao, Sun Jimeng, Zitnik Marinka (2021) Therapeutics data commons: Machine learning datasets and tasks for drug discovery and development. arXiv preprint arXiv:2102.09548"},{"key":"982_CR23","doi-asserted-by":"crossref","unstructured":"Wu Bang,Yang Xiangwen, Pan Shirui, Yuan Xingliang (2021) Adapting membership inference attacks to gnn for graph classification: Approaches and implications. In 2021 IEEE International Conference on Data Mining (ICDM), pages 1421\u20131426. IEEE","DOI":"10.1109\/ICDM51629.2021.00182"},{"key":"982_CR24","doi-asserted-by":"crossref","unstructured":"Ye Jiayuan, Maddi Aadyaa, Murakonda Sasi\u00a0Kumar, Bindschaedler Vincent, Shokri Reza (2022) Enhanced membership inference attacks against machine learning models. 
In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, pages 3093\u20133106,","DOI":"10.1145\/3548606.3560675"},{"key":"982_CR25","unstructured":"Jain Prateek, Kulkarni Vivek, Thakurta Abhradeep, Williams Oliver (2015) To drop or not to drop: Robustness, consistency and differential privacy properties of dropout. arXiv preprint arXiv:1503.02031"},{"issue":"1","key":"982_CR26","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1038\/s43586-021-00084-5","volume":"2","author":"Alexander L Satz","year":"2022","unstructured":"Satz Alexander L, Brunschweiger Andreas, Flanagan Mark E, Gloger Andreas, Hansen Nils JV, Kuai Letian, Kunig Verena BK, Xiaojie Lu, Madsen Daniel, Marcaurelle Lisa A et al (2022) DNA-encoded chemical libraries. Nature Rev Methods Primers 2(1):3","journal-title":"Nature Rev Methods Primers"},{"issue":"21","key":"982_CR27","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1016\/j.drudis.2013.07.001","volume":"18","author":"Zheng Wei","year":"2013","unstructured":"Wei Zheng, Natasha Thorne, McKew John C (2013) Phenotypic screens as a renewed approach for drug discovery. Drug Disc Today 18(21):1067\u20131073","journal-title":"Drug Disc Today"},{"key":"982_CR28","first-page":"2881","volume":"33","author":"Vitaly Feldman","year":"2020","unstructured":"Feldman Vitaly, Zhang Chiyuan (2020) What neural networks memorize and why: discovering the long tail via influence estimation. Adv Neural Inf Process Syst 33:2881\u20132891","journal-title":"Adv Neural Inf Process Syst"},{"issue":"1","key":"982_CR29","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"Weininger David","year":"1988","unstructured":"David Weininger (1988) Smiles, a chemical language and information system 1 introduction to methodology and encoding rules. 
J Chem Inf Comput Sci 28(1):31\u201336","journal-title":"J Chem Inf Comput Sci"},{"issue":"5","key":"982_CR30","doi-asserted-by":"crossref","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"David Rogers","year":"2010","unstructured":"Rogers David, Hahn Mathew (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742\u2013754","journal-title":"J Chem Inf Model"},{"issue":"6","key":"982_CR31","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1021\/ci010132r","volume":"42","author":"Joseph L Durant","year":"2002","unstructured":"Durant Joseph L, Leland Burton A, Henry Douglas R, Nourse James G (2002) Reoptimization of mdl keys for use in drug discovery. J Chem Inf Comput Sci 42(6):1273\u20131280","journal-title":"J Chem Inf Comput Sci"},{"key":"982_CR32","unstructured":"Greg Landrum, Paolo Tosco, Brian Kelley, Ricardo Rodriguez, David Cosgrove, Riccardo Vianello et al (2024) rdkit\/rdkit: 2024_09_1 (q3 2024) release. https:\/\/www.rdkit.org"},{"issue":"8","key":"982_CR33","doi-asserted-by":"crossref","first-page":"3370","DOI":"10.1021\/acs.jcim.9b00237","volume":"59","author":"Kevin Yang","year":"2019","unstructured":"Yang Kevin, Swanson Kyle, Jin Wengong, Coley Connor, Eiden Philipp, Gao Hua, Guzman-Perez Angel, Hopper Timothy, Kelley Brian, Mathea Miriam et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59(8):3370\u20133388","journal-title":"J Chem Inf Model"},{"key":"982_CR34","first-page":"1","volume":"12","author":"Pavel Karpov","year":"2020","unstructured":"Karpov Pavel, Godin Guillaume, Tetko Igor V (2020) Swiss knife for QSAR modeling and interpretation Transformer-CNN. 
J Chem 12:1\u201312","journal-title":"J Chem"},{"key":"982_CR35","first-page":"87654","volume":"32","author":"Adam Paszke","year":"2019","unstructured":"Paszke Adam, Gross Sam, Massa Francisco, Lerer Adam, Bradbury James, Chanan Gregory, Killeen Trevor, Lin Zeming, Gimelshein Natalia, Antiga Luca et al (2019) Pytorch: An imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32:87654","journal-title":"Adv Neural Inf Process Syst"},{"issue":"D1","key":"982_CR36","doi-asserted-by":"crossref","first-page":"D1180","DOI":"10.1093\/nar\/gkad1004","volume":"52","author":"Zdrazil Barbara","year":"2024","unstructured":"Barbara Zdrazil, Eloy Felix, Fiona Hunter, Manners Emma J, James Blackshaw, Sybilla Corbett, de Veij Marleen, Ioannidis Harris, Lopez David Mendez, Mosquera Juan F, et al (2024) The chembl database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res 52(D1):D1180\u2013D1192","journal-title":"Nucleic Acids Res"},{"key":"982_CR37","doi-asserted-by":"crossref","unstructured":"Akiba Takuya, Sano Shotaro, Yanase Toshihiko, Ohta Takeru, Koyama Masanori (2019) Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2623\u20132631","DOI":"10.1145\/3292500.3330701"},{"key":"982_CR38","unstructured":"Loshchilov Ilya, Hutter Frank, et\u00a0al (2017) Fixing weight decay regularization in adam. arXiv preprint arXiv:1711.05101, 5"},{"key":"982_CR39","unstructured":"Kingma Diederik\u00a0P (2014) Adam: A method for stochastic optimization. 
arXiv preprint arXiv:1412.6980"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00982-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-00982-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00982-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T22:15:31Z","timestamp":1743027331000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-00982-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,26]]},"references-count":39,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["982"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-00982-w","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,26]]},"assertion":[{"value":"10 December 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"38"}}