{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,25]],"date-time":"2026-01-25T05:10:40Z","timestamp":1769317840831,"version":"3.49.0"},"reference-count":67,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T00:00:00Z","timestamp":1760486400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T00:00:00Z","timestamp":1760486400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-22-PEBB-0008"],"award-info":[{"award-number":["ANR-22-PEBB-0008"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-22-PEBB-0008"],"award-info":[{"award-number":["ANR-22-PEBB-0008"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-22-PEBB-0008"],"award-info":[{"award-number":["ANR-22-PEBB-0008"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la Recherche","doi-asserted-by":"publisher","award":["ANR-22-PEBB-0008"],"award-info":[{"award-number":["ANR-22-PEBB-0008"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Reverse engineering in molecular 
design aims to identify optimal structures based on activities, or properties, computed through molecular descriptors like fingerprints. This task is known to be particularly difficult for the widely used Extended-Connectivity Fingerprints (ECFPs), due to significant loss of structural information during vectorization. While recent artificial intelligence-based works have raised awareness about the privacy risks associated with ECFP-based data sharing, we contribute a more conclusive demonstration by introducing a deterministic algorithm that reconstructs molecular structures from ECFPs. Using MetaNetX and eMolecules as databases of natural compounds and commercially available chemicals, the deterministic algorithm benchmarks a Transformer-based generative model trained to predict SMILES from ECFPs. The generative model achieves a top-ranked retrieval accuracy of 95.64% but struggles with exhaustive enumeration. Additionally, applying the deterministic method to a drug dataset reveals its potential for de novo drug design, as many of the reverse-engineered structures are found to be patented or supported by bioassay data.<\/jats:p>","DOI":"10.1186\/s13321-025-01074-5","type":"journal-article","created":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T09:24:14Z","timestamp":1760520254000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Reverse engineering molecules from fingerprints through deterministic enumeration and generative 
models"],"prefix":"10.1186","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0618-2947","authenticated-orcid":false,"given":"Philippe","family":"Meyer","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2649-2950","authenticated-orcid":false,"given":"Thomas","family":"Duigou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0143-5535","authenticated-orcid":false,"given":"Guillaume","family":"Gricourt","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4274-2953","authenticated-orcid":false,"given":"Jean-Loup","family":"Faulon","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,10,15]]},"reference":[{"issue":"12","key":"1074_CR1","doi-asserted-by":"publisher","first-page":"4609","DOI":"10.1002\/app.1985.070301208","volume":"30","author":"GC Derringer","year":"1985","unstructured":"Derringer GC, Markham RL (1985) A computer-based methodology for matching polymer structures with required properties. J Appl Polym Sci 30(12):4609\u20134617. https:\/\/doi.org\/10.1002\/app.1985.070301208","journal-title":"J Appl Polym Sci"},{"key":"1074_CR2","doi-asserted-by":"publisher","unstructured":"Joback KG, Stephanopoulos G (1995) Searching spaces of discrete solutions: the design of molecules possessing desired physical properties. In: Advances in chemical engineering, vol 21, Elsevier, pp 257\u2013311. https:\/\/doi.org\/10.1016\/S0065-2377(08)60075-7.","DOI":"10.1016\/S0065-2377(08)60075-7"},{"issue":"6604","key":"1074_CR3","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1038\/384014a0","volume":"384","author":"JR Broach","year":"1996","unstructured":"Broach JR, Thorner J et al (1996) High-throughput screening for drug discovery. 
Nature 384(6604):14\u201316","journal-title":"Nature"},{"issue":"9","key":"1074_CR4","doi-asserted-by":"publisher","first-page":"833","DOI":"10.1016\/0098-1354(93)E0023-3","volume":"18","author":"V Venkatasubramanian","year":"1994","unstructured":"Venkatasubramanian V, Chan K, Caruthers JM (1994) Computer-aided molecular design using genetic algorithms. Comput Chem Eng 18(9):833\u2013844. https:\/\/doi.org\/10.1016\/0098-1354(93)E0023-3","journal-title":"Comput Chem Eng"},{"issue":"2","key":"1074_CR5","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1021\/ci600383v","volume":"48","author":"BB Masek","year":"2008","unstructured":"Masek BB, Shen L, Smith KM, Pearlman RS (2008) Sharing chemical information without sharing chemical structure. J Chem Inf Model 48(2):256\u2013261. https:\/\/doi.org\/10.1021\/ci600383v","journal-title":"J Chem Inf Model"},{"key":"1074_CR6","doi-asserted-by":"publisher","unstructured":"Cayley A (1875) On the analytic forms called trees, with applications to the theory of chemical combinations. In: Report of the British Association for the Advancement of Science, vol 45. British Association for the Advancement of Science, London, pp 257\u2013305. https:\/\/doi.org\/10.1017\/cbo9780511703751.056","DOI":"10.1017\/cbo9780511703751.056"},{"key":"1074_CR7","doi-asserted-by":"publisher","unstructured":"Skvortsova MI, Baskin II, Slovokhotova OL, Palyulin VA, Zefirov NS (1993) Inverse problem in QSAR\/QSPR studies for the case of topological indexes characterizing molecular shape (Kier indices). 
J Chem Inf Comput Sci 33(4):630\u2013634. https:\/\/doi.org\/10.1021\/ci00014a017","DOI":"10.1021\/ci00014a017"},{"issue":"7","key":"1074_CR8","doi-asserted-by":"publisher","first-page":"1345","DOI":"10.1021\/ci700385a","volume":"48","author":"H Fujiwara","year":"2008","unstructured":"Fujiwara H, Wang J, Zhao L, Nagamochi H, Akutsu T (2008) Enumerating treelike chemical graphs with given path frequency. J Chem Inf Model 48(7):1345\u20131357. https:\/\/doi.org\/10.1021\/ci700385a","journal-title":"J Chem Inf Model"},{"issue":"2","key":"1074_CR9","doi-asserted-by":"publisher","first-page":"633","DOI":"10.1109\/TCBB.2016.2628888","volume":"15","author":"J Li","year":"2018","unstructured":"Li J, Nagamochi H, Akutsu T (2018) Enumerating substituted benzene isomers of tree-like chemical graphs. IEEE ACM Trans Comput Biol Bioinform 15(2):633\u2013646. https:\/\/doi.org\/10.1109\/TCBB.2016.2628888","journal-title":"IEEE ACM Trans Comput Biol Bioinform"},{"issue":"1","key":"1074_CR10","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1008504","volume":"17","author":"MA Yirik","year":"2021","unstructured":"Yirik MA, Steinbeck C (2021) Chemical graph generators. PLoS Comput Biol 17(1):e1008504. https:\/\/doi.org\/10.1371\/journal.pcbi.1008504","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"1074_CR11","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s13321-021-00529-9","volume":"13","author":"MA Yirik","year":"2021","unstructured":"Yirik MA, Sorokina M, Steinbeck C (2021) MAYGEN: an open-source chemical structure generator for constitutional isomers based on the orderly generation principle. J Cheminform 13(1):48. 
https:\/\/doi.org\/10.1186\/s13321-021-00529-9","journal-title":"J Cheminform"},{"issue":"1","key":"1074_CR12","doi-asserted-by":"publisher","first-page":"24","DOI":"10.1186\/s13321-022-00604-9","volume":"14","author":"BD McKay","year":"2022","unstructured":"McKay BD, Yirik MA, Steinbeck C (2022) Surge: a fast open-source chemical graph generator. J Cheminform 14(1):24. https:\/\/doi.org\/10.1186\/s13321-022-00604-9","journal-title":"J Cheminform"},{"issue":"25","key":"1074_CR13","doi-asserted-by":"publisher","first-page":"8732","DOI":"10.1021\/ja902302h","volume":"131","author":"LC Blum","year":"2009","unstructured":"Blum LC, Reymond J-L (2009) 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J Am Chem Soc 131(25):8732\u20138733. https:\/\/doi.org\/10.1021\/ja902302h","journal-title":"J Am Chem Soc"},{"issue":"11","key":"1074_CR14","doi-asserted-by":"publisher","first-page":"2864","DOI":"10.1021\/ci300415d","volume":"52","author":"L Ruddigkeit","year":"2012","unstructured":"Ruddigkeit L, Van Deursen R, Blum LC, Reymond J-L (2012) Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J Chem Inf Model 52(11):2864\u20132875. https:\/\/doi.org\/10.1021\/ci300415d","journal-title":"J Chem Inf Model"},{"issue":"3","key":"1074_CR15","doi-asserted-by":"publisher","first-page":"707","DOI":"10.1021\/ci020345w","volume":"43","author":"J-L Faulon","year":"2003","unstructured":"Faulon J-L, Visco DP, Pophale RS (2003) The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J Chem Inf Comput Sci 43(3):707\u2013720. https:\/\/doi.org\/10.1021\/ci020345w","journal-title":"J Chem Inf Comput Sci"},{"key":"1074_CR16","unstructured":"Daylight Theory Manual (2024) https:\/\/www.daylight.com\/dayhtml\/doc\/theory\/theory.smarts.html. 
Accessed 30 Oct 2024"},{"issue":"3","key":"1074_CR17","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1021\/ci020346o","volume":"43","author":"J-L Faulon","year":"2003","unstructured":"Faulon J-L, Churchwell CJ, Visco DP (2003) The signature molecular descriptor. 2. Enumerating molecules from their extended valence sequences. J Chem Inf Comput Sci 43(3):721\u2013734. https:\/\/doi.org\/10.1021\/ci020346o","journal-title":"J Chem Inf Comput Sci"},{"issue":"4","key":"1074_CR18","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1016\/j.jmgm.2003.10.002","volume":"22","author":"CJ Churchwell","year":"2004","unstructured":"Churchwell CJ, Rintoul MD, Martin S, Visco DP, Kotu A, Larson RS, Sillerud LO, Brown DC, Faulon J-L (2004) The signature molecular descriptor. 3. Inverse-quantitative structure-activity relationship of ICAM-1 inhibitory peptides. J Mol Graph Model 22(4):263\u2013273. https:\/\/doi.org\/10.1016\/j.jmgm.2003.10.002","journal-title":"J Mol Graph Model"},{"key":"1074_CR19","doi-asserted-by":"publisher","unstructured":"Martin S, Brown WM, Faulon JL, Weis D, Visco D (2005) Inverse design of large molecules using linear diophantine equations. In: 2005 IEEE computational systems bioinformatics conference\u2014workshops (CSBW\u201905). IEEE, Stanford, CA, pp 11\u201316. https:\/\/doi.org\/10.1109\/CSBW.2005.79.","DOI":"10.1109\/CSBW.2005.79"},{"issue":"7","key":"1074_CR20","doi-asserted-by":"publisher","first-page":"1787","DOI":"10.1021\/ci3001748","volume":"52","author":"S Martin","year":"2012","unstructured":"Martin S (2012) Lattice enumeration for inverse molecular design using the signature descriptor. J Chem Inf Model 52(7):1787\u20131797. 
https:\/\/doi.org\/10.1021\/ci3001748","journal-title":"J Chem Inf Model"},{"issue":"8","key":"1074_CR21","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1038\/nrd1799","volume":"4","author":"G Schneider","year":"2005","unstructured":"Schneider G, Fechner U (2005) Computer-based de novo design of drug-like molecules. Nat Rev Drug Discov 4(8):649\u2013663. https:\/\/doi.org\/10.1038\/nrd1799","journal-title":"Nat Rev Drug Discov"},{"issue":"1","key":"1074_CR22","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1002\/(SICI)1098-1128(199601)16:1<3::AID-MED1>3.0.CO;2-6","volume":"16","author":"RS Bohacek","year":"1996","unstructured":"Bohacek RS, McMartin C, Guida WC (1996) The art and practice of structure-based drug design: a molecular modeling perspective. Med Res Rev 16(1):3\u201350. https:\/\/doi.org\/10.1002\/(SICI)1098-1128(199601)16:1%3c3::AID-MED1%3e3.0.CO;2-6","journal-title":"Med Res Rev"},{"issue":"3","key":"1074_CR23","doi-asserted-by":"publisher","first-page":"672","DOI":"10.1038\/s41596-021-00659-2","volume":"17","author":"F Gentile","year":"2022","unstructured":"Gentile F, Yaacoub JC, Gleave J, Fernandez M, Ton A-T, Ban F, Stern A, Cherkasov A (2022) Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat Protoc 17(3):672\u2013697. https:\/\/doi.org\/10.1038\/s41596-021-00659-2","journal-title":"Nat Protoc"},{"issue":"4","key":"1074_CR24","doi-asserted-by":"publisher","first-page":"390","DOI":"10.1039\/d2dd00003b","volume":"1","author":"A Nigam","year":"2022","unstructured":"Nigam A, Pollice R, Aspuru-Guzik A (2022) Parallel tempered genetic algorithm guided by deep neural networks for inverse molecular design. Digit Discov 1(4):390\u2013404. 
https:\/\/doi.org\/10.1039\/d2dd00003b","journal-title":"Digit Discov"},{"issue":"2","key":"1074_CR25","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1038\/s42256-023-00788-1","volume":"6","author":"KM Jablonka","year":"2024","unstructured":"Jablonka KM, Schwaller P, Ortega-Guerrero A, Smit B (2024) Leveraging large language models for predictive chemistry. Nat Mach Intell 6(2):161\u2013169. https:\/\/doi.org\/10.1038\/s42256-023-00788-1","journal-title":"Nat Mach Intell"},{"issue":"5","key":"1074_CR26","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1038\/s42256-020-0174-5","volume":"2","author":"P-C Kotsias","year":"2020","unstructured":"Kotsias P-C, Ar\u00fas-Pous J, Chen H, Engkvist O, Tyrchan C, Bjerrum EJ (2020) Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nat Mach Intell 2(5):254\u2013265. https:\/\/doi.org\/10.1038\/s42256-020-0174-5","journal-title":"Nat Mach Intell"},{"issue":"2","key":"1074_CR27","doi-asserted-by":"publisher","first-page":"268","DOI":"10.1021\/acscentsci.7b00572","volume":"4","author":"R G\u00f3mez-Bombarelli","year":"2018","unstructured":"G\u00f3mez-Bombarelli R, Wei JN, Duvenaud D, Hern\u00e1ndez-Lobato JM, S\u00e1nchez-Lengeling B, Sheberla D, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4(2):268\u2013276. https:\/\/doi.org\/10.1021\/acscentsci.7b00572","journal-title":"ACS Cent Sci"},{"issue":"1","key":"1074_CR28","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1021\/acscentsci.7b00512","volume":"4","author":"MHS Segler","year":"2018","unstructured":"Segler MHS, Kogej T, Tyrchan C, Waller MP (2018) Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent Sci 4(1):120\u2013131. 
https:\/\/doi.org\/10.1021\/acscentsci.7b00512","journal-title":"ACS Cent Sci"},{"issue":"1","key":"1074_CR29","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-023-00694-z","volume":"15","author":"X Liu","year":"2023","unstructured":"Liu X, Ye K, Van Vlijmen HWT, IJzerman AP, Van Westen GJP (2023) Drugex v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J Cheminform 15(1):24. https:\/\/doi.org\/10.1186\/s13321-023-00694-z","journal-title":"J Cheminform"},{"issue":"1","key":"1074_CR30","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-024-00812-5","volume":"16","author":"HH Loeffler","year":"2024","unstructured":"Loeffler HH, He J, Tibo A, Janet JP, Voronov A, Mervin LH, Engkvist O (2024) Reinvent 4: modern AI\u2013driven generative molecule design. J Cheminform 16(1):20. https:\/\/doi.org\/10.1186\/s13321-024-00812-5","journal-title":"J Cheminform"},{"issue":"4","key":"1074_CR31","doi-asserted-by":"publisher","first-page":"437","DOI":"10.1038\/s42256-024-00821-x","volume":"6","author":"MA Skinnider","year":"2024","unstructured":"Skinnider MA (2024) Invalid SMILES are beneficial rather than detrimental to chemical language models. Nat Mach Intell 6(4):437\u2013448. https:\/\/doi.org\/10.1038\/s42256-024-00821-x","journal-title":"Nat Mach Intell"},{"issue":"5","key":"1074_CR32","doi-asserted-by":"publisher","first-page":"742","DOI":"10.1021\/ci100050t","volume":"50","author":"D Rogers","year":"2010","unstructured":"Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742\u2013754. https:\/\/doi.org\/10.1021\/ci100050t","journal-title":"J Chem Inf Model"},{"key":"1074_CR33","unstructured":"Ramsundar B, Kearnes S, Riley P, Webster D, Konerding D, Pande V (2015) Massively multitask networks for drug discovery. arXiv February 6, 2015. http:\/\/arxiv.org\/abs\/1502.02072. 
Accessed 29 Oct 2024"},{"issue":"3","key":"1074_CR34","doi-asserted-by":"publisher","first-page":"542","DOI":"10.1021\/ci700372s","volume":"48","author":"R Liu","year":"2008","unstructured":"Liu R, Zhou D (2008) Using molecular fingerprint as descriptors in the QSPR study of lipophilicity. J Chem Inf Model 48(3):542\u2013549. https:\/\/doi.org\/10.1021\/ci700372s","journal-title":"J Chem Inf Model"},{"issue":"7698","key":"1074_CR35","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/nature25978","volume":"555","author":"MHS Segler","year":"2018","unstructured":"Segler MHS, Preuss M, Waller MP (2018) Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555(7698):604\u2013610. https:\/\/doi.org\/10.1038\/nature25978","journal-title":"Nature"},{"key":"1074_CR36","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2020.606668","volume":"11","author":"L Xie","year":"2020","unstructured":"Xie L, Xu L, Kong R, Chang S, Xu X (2020) Improvement of prediction performance with conjoint molecular fingerprint in deep learning. Front Pharmacol 11:606668. https:\/\/doi.org\/10.3389\/fphar.2020.606668","journal-title":"Front Pharmacol"},{"issue":"6","key":"1074_CR37","doi-asserted-by":"publisher","first-page":"1379","DOI":"10.1016\/j.chempr.2020.02.017","volume":"6","author":"F Sandfort","year":"2020","unstructured":"Sandfort F, Strieth-Kalthoff F, K\u00fchnemund M, Beecks C, Glorius F (2020) A structure-based platform for predicting chemical reactivity. Chem 6(6):1379\u20131390. https:\/\/doi.org\/10.1016\/j.chempr.2020.02.017","journal-title":"Chem"},{"issue":"2","key":"1074_CR38","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The generation of a unique machine description for chemical structures-a technique developed at Chemical Abstracts Service. J Chem Doc 5(2):107\u2013113. 
https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J Chem Doc"},{"key":"1074_CR39","doi-asserted-by":"publisher","unstructured":"Xu Z, Wang S, Zhu F, Huang J (2017) Seq2seq fingerprint: an unsupervised deep molecular embedding for drug discovery. In: Proceedings of the 8th ACM international conference on bioinformatics, computational biology, and health informatics. ACM, Boston, MA, pp 285\u2013294. https:\/\/doi.org\/10.1145\/3107411.3107424.","DOI":"10.1145\/3107411.3107424"},{"issue":"19\u201320","key":"1074_CR40","doi-asserted-by":"publisher","first-page":"1014","DOI":"10.1016\/j.drudis.2012.10.011","volume":"18","author":"T Kogej","year":"2013","unstructured":"Kogej T, Blomberg N, Greasley PJ, Mundt S, Vainio MJ, Schamberger J, Schmidt G, H\u00fcser J (2013) Big Pharma screening collections: more of the same or unique libraries? The AstraZeneca\u2013Bayer Pharma AG case. Drug Discov Today 18(19\u201320):1014\u20131024. https:\/\/doi.org\/10.1016\/j.drudis.2012.10.011","journal-title":"Drug Discov Today"},{"issue":"7","key":"1074_CR41","doi-asserted-by":"publisher","first-page":"2331","DOI":"10.1021\/acs.jcim.3c00799","volume":"64","author":"W Heyndrickx","year":"2024","unstructured":"Heyndrickx W, Mervin L, Morawietz T, Sturm N, Friedrich L, Zalewski A, Pentina A, Humbeck L, Oldenhof M, Niwayama R, Schmidtke P, Fechner N, Simm J, Arany A, Drizard N, Jabal R, Afanasyeva A, Loeb R, Verma S, Harnqvist S, Holmes M, Pejo B, Telenczuk M, Holway N, Dieckmann A, Rieke N, Zumsande F, Clevert D-A, Krug M, Luscombe C, Green D, Ertl P, Antal P, Marcus D, Do Huu N, Fuji H, Pickett S, Acs G, Boniface E, Beck B, Sun Y, Gohier A, Rippmann F, Engkvist O, G\u00f6ller AH, Moreau Y, Galtier MN, Schuffenhauer A, Ceulemans H (2024) MELLODDY: cross-pharma federated learning at unprecedented scale unlocks benefits in QSAR without compromising proprietary information. J Chem Inf Model 64(7):2331\u20132344. 
https:\/\/doi.org\/10.1021\/acs.jcim.3c00799","journal-title":"J Chem Inf Model"},{"issue":"10","key":"1074_CR42","doi-asserted-by":"publisher","first-page":"4487","DOI":"10.1021\/acs.jcim.0c00321","volume":"60","author":"P Maragakis","year":"2020","unstructured":"Maragakis P, Nisonoff H, Cole B, Shaw DE (2020) A deep-learning view of chemical space designed to facilitate drug discovery. J Chem Inf Model 60(10):4487\u20134496. https:\/\/doi.org\/10.1021\/acs.jcim.0c00321","journal-title":"J Chem Inf Model"},{"issue":"38","key":"1074_CR43","doi-asserted-by":"publisher","first-page":"10378","DOI":"10.1039\/D0SC03115A","volume":"11","author":"T Le","year":"2020","unstructured":"Le T, Winter R, No\u00e9 F, Clevert D-A (2020) Neuraldecipher\u2014reverse-engineering extended-connectivity fingerprints (ECFPs) to their molecular structures. Chem Sci 11(38):10378\u201310389. https:\/\/doi.org\/10.1039\/D0SC03115A","journal-title":"Chem Sci"},{"issue":"1","key":"1074_CR44","doi-asserted-by":"publisher","DOI":"10.1186\/s13321-023-00693-0","volume":"15","author":"UV Ucak","year":"2023","unstructured":"Ucak UV, Ashyrmamatov I, Lee J (2023) Reconstruction of lossless molecular representations from fingerprints. J Cheminform 15(1):26. https:\/\/doi.org\/10.1186\/s13321-023-00693-0","journal-title":"J Cheminform"},{"key":"1074_CR45","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv preprint. http:\/\/arxiv.org\/abs\/1706.03762"},{"issue":"1","key":"1074_CR46","doi-asserted-by":"publisher","first-page":"17304","DOI":"10.1038\/s41598-021-96812-8","volume":"11","author":"Y Kwon","year":"2021","unstructured":"Kwon Y, Kang S, Choi Y-S, Kim I (2021) Evolutionary design of molecules based on deep learning and a genetic algorithm. Sci Rep 11(1):17304. 
https:\/\/doi.org\/10.1038\/s41598-021-96812-8","journal-title":"Sci Rep"},{"issue":"D1","key":"1074_CR47","doi-asserted-by":"publisher","first-page":"D523","DOI":"10.1093\/nar\/gkv1117","volume":"44","author":"S Moretti","year":"2016","unstructured":"Moretti S, Martin O, Van Du Tran T, Bridge A, Morgat A, Pagni M (2016) Metanetx\/mnxref \u2013 reconciliation of metabolites and biochemical reactions to bring together genome-scale metabolic networks. Nucleic Acids Res 44(D1):D523\u2013D526. https:\/\/doi.org\/10.1093\/nar\/gkv1117","journal-title":"Nucleic Acids Res"},{"key":"1074_CR48","unstructured":"eMolecules. http:\/\/www.emolecules.com\/. Accessed 01 Jul 2024."},{"issue":"1","key":"1074_CR49","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s13321-020-0416-x","volume":"12","author":"D Probst","year":"2020","unstructured":"Probst D, Reymond J-L (2020) Visualization of very large high-dimensional data sets as minimum spanning trees. J Cheminform 12(1):12. https:\/\/doi.org\/10.1186\/s13321-020-0416-x","journal-title":"J Cheminform"},{"key":"1074_CR50","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1016\/0022-5193(66)90013-0","volume":"13","author":"EC Pielou","year":"1966","unstructured":"Pielou EC (1966) The measurement of diversity in different types of biological collections. J Theor Biol 13:131\u2013144. https:\/\/doi.org\/10.1016\/0022-5193(66)90013-0","journal-title":"J Theor Biol"},{"key":"1074_CR51","doi-asserted-by":"publisher","DOI":"10.1021\/acs.jcim.5c00334","author":"Y Buehler","year":"2025","unstructured":"Buehler Y, Reymond J-L (2025) A view on molecular complexity from the GDB chemical space. J Chem Inf Model. 
https:\/\/doi.org\/10.1021\/acs.jcim.5c00334","journal-title":"J Chem Inf Model"},{"issue":"1","key":"1074_CR52","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1007\/BF01164638","volume":"12","author":"D Plav\u0161i\u0107","year":"1993","unstructured":"Plav\u0161i\u0107 D, Nikoli\u0107 S, Trinajsti\u0107 N, Mihali\u0107 Z (1993) On the Harary index for the characterization of chemical graphs. J Math Chem 12(1):235\u2013250. https:\/\/doi.org\/10.1007\/BF01164638","journal-title":"J Math Chem"},{"issue":"9","key":"1074_CR53","doi-asserted-by":"publisher","first-page":"2014","DOI":"10.1016\/j.bmcl.2017.03.008","volume":"27","author":"JR Proudfoot","year":"2017","unstructured":"Proudfoot JR (2017) A path based approach to assessing molecular complexity. Bioorg Med Chem Lett 27(9):2014\u20132017. https:\/\/doi.org\/10.1016\/j.bmcl.2017.03.008","journal-title":"Bioorg Med Chem Lett"},{"issue":"D1","key":"1074_CR54","doi-asserted-by":"publisher","first-page":"D1265","DOI":"10.1093\/nar\/gkad976","volume":"52","author":"C Knox","year":"2024","unstructured":"Knox C, Wilson M, Klinger CM, Franklin M, Oler E, Wilson A, Pon A, Cox J, Chin NE, Strawbridge SA, Klinger C, Strawbridge S, Garcia-Patino M, Kruger R, Sivakumaran A, Sanford S, Doshi R, Khetarpal N, Fatokun O, Doucet D, Zubkowski A, Rayat D, Jackson H, Harford K, Anjum A, Zakir M, Wang F, Tian S, Lee B, Liigand J, Peters H, Wang RQ, Nguyen T, So D, Sharp M, da Silva R, Gabriel C, Scantlebury J, Jasinski M, Ackerman D, Jewison T, Sajed T, Gautam V, Wishart D (2024) DrugBank 6.0: the DrugBank knowledgebase for 2024. 
Nucleic Acids Res 52(D1):D1265\u2013D1275","journal-title":"Nucleic Acids Res"},{"issue":"15","key":"1074_CR55","doi-asserted-by":"publisher","first-page":"4631","DOI":"10.1021\/jm900326c","volume":"52","author":"MJ Gorczynski","year":"2009","unstructured":"Gorczynski MJ, Smitherman PK, Akiyama TE, Wood HB, Berger JP, King SB, Morrow CS (2009) Activation of peroxisome proliferator-activated receptor gamma (PPARgamma) by nitroalkene fatty acids: importance of nitration position and degree of unsaturation. J Med Chem 52(15):4631\u20134639. https:\/\/doi.org\/10.1021\/jm900326c","journal-title":"J Med Chem"},{"issue":"17","key":"1074_CR56","doi-asserted-by":"publisher","first-page":"5285","DOI":"10.1021\/jm800321h","volume":"51","author":"RD Richardson","year":"2008","unstructured":"Richardson RD, Ma G, Oyola Y, Zancanella M, Knowles LM, Cieplak P, Romo D, Smith JW (2008) Synthesis of novel beta-lactone inhibitors of fatty acid synthase. J Med Chem 51(17):5285\u20135296. https:\/\/doi.org\/10.1021\/jm800321h","journal-title":"J Med Chem"},{"key":"1074_CR57","unstructured":"Landrum G, et al (2016) RDKit: open-source cheminformatics software. http:\/\/www.rdkit.org\/"},{"issue":"1","key":"1074_CR58","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31\u201336. https:\/\/doi.org\/10.1021\/ci00057a005","journal-title":"J Chem Inf Comput Sci"},{"issue":"D1","key":"1074_CR59","doi-asserted-by":"publisher","first-page":"D1100","DOI":"10.1093\/nar\/gkr777","volume":"40","author":"A Gaulton","year":"2012","unstructured":"Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, Overington JP (2012) ChEMBL: a large-scale bioactivity database for drug discovery. 
Nucleic Acids Res 40(D1):D1100\u2013D1107. https:\/\/doi.org\/10.1093\/nar\/gkr777","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"1074_CR60","doi-asserted-by":"publisher","first-page":"2978","DOI":"10.1137\/090778043","volume":"20","author":"I Aliev","year":"2010","unstructured":"Aliev I, Henk M (2010) Feasibility of integer knapsacks. SIAM J Optim 20(6):2978\u20132993. https:\/\/doi.org\/10.1137\/090778043","journal-title":"SIAM J Optim"},{"issue":"1","key":"1074_CR61","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1006\/inco.1994.1067","volume":"113","author":"E Contejean","year":"1994","unstructured":"Contejean E, Devie H (1994) An efficient incremental algorithm for solving systems of linear Diophantine equations. Inf Comput 113(1):143\u2013172. https:\/\/doi.org\/10.1006\/inco.1994.1067","journal-title":"Inf Comput"},{"key":"1074_CR62","doi-asserted-by":"publisher","unstructured":"Domenjoud E (1991) Solving systems of linear diophantine equations: an algebraic approach. In: Tarlecki A (ed) Mathematical foundations of computer science. Goos G, Hartmanis J, (Series eds) Lecture notes in computer science, vol. 520, Springer Berlin Heidelberg, Berlin, Heidelberg, pp 141\u2013150. https:\/\/doi.org\/10.1007\/3-540-54345-7_57.","DOI":"10.1007\/3-540-54345-7_57"},{"key":"1074_CR63","volume-title":"The theory of partitions, 1st","author":"GE Andrews","year":"1998","unstructured":"Andrews GE (1998) The theory of partitions, 1st, paperback. Cambridge mathematical library; Cambridge University Press, Cambridge","edition":"paperback"},{"key":"1074_CR64","volume-title":"Algebra","author":"S Lang","year":"2012","unstructured":"Lang S (2012) Algebra, vol 211. 
Springer Science & Business Media"},{"issue":"28","key":"1074_CR65","doi-asserted-by":"publisher","first-page":"6091","DOI":"10.1039\/C8SC02339E","volume":"9","author":"P Schwaller","year":"2018","unstructured":"Schwaller P, Gaudin T, L\u00e1nyi D, Bekas C, Laino T (2018) \u201cFound in Translation\u201d: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem Sci 9(28):6091\u20136098. https:\/\/doi.org\/10.1039\/C8SC02339E","journal-title":"Chem Sci"},{"key":"1074_CR66","unstructured":"Loshchilov I (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101."},{"key":"1074_CR67","unstructured":"ReduceLROnPlateau. https:\/\/pytorch.org\/docs\/stable\/generated\/torch.optim.lr_scheduler.ReduceLROnPlateau.html."}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01074-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01074-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01074-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,15]],"date-time":"2025-10-15T09:24:19Z","timestamp":1760520259000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01074-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,15]]},"references-count":67,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1074"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01074-5","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subje
ct":[],"published":{"date-parts":[[2025,10,15]]},"assertion":[{"value":"26 March 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 October 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"157"}}