{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,9]],"date-time":"2026-06-09T14:50:08Z","timestamp":1781016608080,"version":"3.54.1"},"reference-count":78,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T00:00:00Z","timestamp":1715299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Comput Sci"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule\u2013ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein\u2013ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein\u2013ligand complexes in explicit water is included, accumulating over 170\u2009\u03bcs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.<\/jats:p>","DOI":"10.1038\/s43588-024-00627-2","type":"journal-article","created":{"date-parts":[[2024,5,10]],"date-time":"2024-05-10T06:01:59Z","timestamp":1715320919000},"page":"367-378","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":64,"title":["MISATO: machine learning dataset of protein\u2013ligand complexes for structure-based drug discovery"],"prefix":"10.1038","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-5160-8100","authenticated-orcid":false,"given":"Till","family":"Siebenmorgen","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7630-5447","authenticated-orcid":false,"given":"Filipe","family":"Menezes","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sabrina","family":"Benassou","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Erinc","family":"Merdivan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6839-3320","authenticated-orcid":false,"given":"Kieran","family":"Didi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andr\u00e9 Santos Dias","family":"Mour\u00e3o","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rados\u0142aw","family":"Kitel","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pietro","family":"Li\u00f2","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0940-5752","authenticated-orcid":false,"given":"Stefan","family":"Kesselheim","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Marie","family":"Piraud","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2419-1943","authenticated-orcid":false,"given":"Fabian J.","family":"Theis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1594-0527","authenticated-orcid":false,"given":"Michael","family":"Sattler","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2818-7498","authenticated-orcid":false,"given":"Grzegorz M.","family":"Popowicz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,5,10]]},"reference":[{"key":"627_CR1","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","volume":"596","author":"J Jumper","year":"2021","unstructured":"Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583\u2013589 (2021).","journal-title":"Nature"},{"key":"627_CR2","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1038\/nsb1203-980","volume":"10","author":"H Berman","year":"2003","unstructured":"Berman, H., Henrick, K. & Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Mol. Biol. 10, 980 (2003).","journal-title":"Nat. Struct. Mol. Biol."},{"key":"627_CR3","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.trci.2017.10.005","volume":"3","author":"RC Mohs","year":"2017","unstructured":"Mohs, R. C. & Greig, N. H. Drug discovery and development: role of basic biological research. Alzheimer\u2019s Dement. Transl. Res. Clin. Interv. 3, 651\u2013657 (2017).","journal-title":"Alzheimer\u2019s Dement. Transl. Res. Clin. Interv."},{"key":"627_CR4","doi-asserted-by":"publisher","first-page":"334","DOI":"10.1124\/pr.112.007336","volume":"66","author":"G Sliwoski","year":"2014","unstructured":"Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational methods in drug discovery. Pharm. Rev. 66, 334\u2013395 (2014).","journal-title":"Pharm. Rev."},{"key":"627_CR5","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1002\/wcms.1161","volume":"4","author":"W Thiel","year":"2014","unstructured":"Thiel, W. Semiempirical quantum-chemical methods. WIREs Comput. Mol. Sci. 4, 145\u2013157 (2014).","journal-title":"WIREs Comput. Mol. Sci."},{"key":"627_CR6","doi-asserted-by":"publisher","first-page":"1129","DOI":"10.1016\/j.neuron.2018.08.011","volume":"99","author":"SA Hollingsworth","year":"2018","unstructured":"Hollingsworth, S. A. & Dror, R. O. Molecular dynamics simulation for all. Neuron 99, 1129\u20131143 (2018).","journal-title":"Neuron"},{"key":"627_CR7","doi-asserted-by":"publisher","first-page":"e1448","DOI":"10.1002\/wcms.1448","volume":"10","author":"T Siebenmorgen","year":"2020","unstructured":"Siebenmorgen, T. & Zacharias, M. Computational prediction of protein\u2013protein binding affinities. WIREs Comput. Mol. Sci. 10, e1448 (2020).","journal-title":"WIREs Comput. Mol. Sci."},{"key":"627_CR8","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1002\/jcc.21334","volume":"31","author":"O Trott","year":"2010","unstructured":"Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455\u2013461 (2010).","journal-title":"J. Comput. Chem."},{"key":"627_CR9","doi-asserted-by":"publisher","first-page":"7898","DOI":"10.1021\/acs.chemrev.6b00163","volume":"116","author":"S Kmiecik","year":"2016","unstructured":"Kmiecik, S. et al. Coarse-grained protein models and their applications. Chem. Rev. 116, 7898\u20137936 (2016).","journal-title":"Chem. Rev."},{"key":"627_CR10","doi-asserted-by":"publisher","first-page":"15665","DOI":"10.1002\/anie.202004239","volume":"59","author":"S Spicher","year":"2020","unstructured":"Spicher, S. & Grimme, S. Robust atomistic modeling of materials, organometallic, and biochemical systems. Angew. Chem. Int. Ed. 59, 15665\u201315673 (2020).","journal-title":"Angew. Chem. Int. Ed."},{"key":"627_CR11","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1021\/acs.jctc.6b00969","volume":"13","author":"S Vandenbrande","year":"2017","unstructured":"Vandenbrande, S., Waroquier, M., Speybroeck, V. V. & Verstraelen, T. The monomer electron density force field (MEDFF): a physically inspired model for noncovalent interactions. J. Chem. Theory Comput. 13, 161\u2013179 (2017).","journal-title":"J. Chem. Theory Comput."},{"key":"627_CR12","doi-asserted-by":"publisher","first-page":"463","DOI":"10.1021\/acs.jcim.1c01531","volume":"62","author":"J Wang","year":"2022","unstructured":"Wang, J. & Dokholyan, N. V. Yuel: improving the generalizability of structure-free compound\u2013protein interaction prediction. J. Chem. Inf. Model. 62, 463\u2013471 (2022).","journal-title":"J. Chem. Inf. Model."},{"key":"627_CR13","doi-asserted-by":"publisher","first-page":"2549","DOI":"10.1021\/jp910674d","volume":"114","author":"JW Ponder","year":"2010","unstructured":"Ponder, J. W. et al. Current status of the AMOEBA polarizable force field. J. Phys. Chem. B 114, 2549\u20132564 (2010).","journal-title":"J. Phys. Chem. B"},{"key":"627_CR14","doi-asserted-by":"publisher","first-page":"433","DOI":"10.1038\/s43588-022-00281-6","volume":"2","author":"B Chen","year":"2022","unstructured":"Chen, B. et al. Automated discovery of fundamental variables hidden in experimental data. Nat. Comput Sci. 2, 433\u2013442 (2022).","journal-title":"Nat. Comput Sci."},{"key":"627_CR15","doi-asserted-by":"publisher","first-page":"1865","DOI":"10.1021\/ci100244v","volume":"50","author":"JD Durrant","year":"2010","unstructured":"Durrant, J. D. & McCammon, J. A. NNScore: a neural-network-based scoring function for the characterization of protein\u2212ligand complexes. J. Chem. Inf. Model. 50, 1865\u20131871 (2010).","journal-title":"J. Chem. Inf. Model."},{"key":"627_CR16","doi-asserted-by":"publisher","first-page":"2113","DOI":"10.1093\/bioinformatics\/btz870","volume":"36","author":"X Wang","year":"2020","unstructured":"Wang, X., Terashi, G., Christoffer, C. W., Zhu, M. & Kihara, D. Protein docking model evaluation by 3D deep convolutional neural networks. Bioinformatics 36, 2113\u20132118 (2020).","journal-title":"Bioinformatics"},{"key":"627_CR17","doi-asserted-by":"publisher","first-page":"763","DOI":"10.1021\/acs.jcim.5b00642","volume":"56","author":"N-N Wang","year":"2016","unstructured":"Wang, N.-N. et al. ADME properties evaluation in drug discovery: prediction of Caco-2 cell permeability using a combination of NSGA-II and boosting. J. Chem. Inf. Model. 56, 763\u2013773 (2016).","journal-title":"J. Chem. Inf. Model."},{"key":"627_CR18","doi-asserted-by":"publisher","first-page":"1357","DOI":"10.1021\/acs.jcim.1c01074","volume":"62","author":"S Ishida","year":"2022","unstructured":"Ishida, S., Terayama, K., Kojima, R., Takasu, K. & Okuno, Y. AI-driven synthetic route design incorporated with retrosynthesis knowledge. J. Chem. Inf. Model. 62, 1357\u20131367 (2022).","journal-title":"J. Chem. Inf. Model."},{"key":"627_CR19","doi-asserted-by":"crossref","unstructured":"Karpov, P., Godin, G. & Tetko, I. V. A transformer model for retrosynthesis. In Artificial Neural Networks and Machine Learning\u2014ICANN 2019: Workshop and Special Sessions (eds Tetko, I. V. et al.) 817\u2013830 (Springer, 2019).","DOI":"10.1007\/978-3-030-30493-5_78"},{"key":"627_CR20","doi-asserted-by":"publisher","first-page":"i821","DOI":"10.1093\/bioinformatics\/bty593","volume":"34","author":"H \u00d6zt\u00fcrk","year":"2018","unstructured":"\u00d6zt\u00fcrk, H., \u00d6zg\u00fcr, A. & Ozkirimli, E. DeepDTA: deep drug\u2013target binding affinity prediction. Bioinformatics 34, i821\u2013i829 (2018).","journal-title":"Bioinformatics"},{"key":"627_CR21","doi-asserted-by":"publisher","first-page":"3329","DOI":"10.1093\/bioinformatics\/btz111","volume":"35","author":"M Karimi","year":"2019","unstructured":"Karimi, M., Wu, D., Wang, Z. & Shen, Y. DeepAffinity: interpretable deep learning of compound\u2013protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 35, 3329\u20133338 (2019).","journal-title":"Bioinformatics"},{"key":"627_CR22","doi-asserted-by":"publisher","first-page":"2791","DOI":"10.1021\/acs.jcim.0c00075","volume":"60","author":"H Hassan-Harrirou","year":"2020","unstructured":"Hassan-Harrirou, H., Zhang, C. & Lemmin, T. RosENet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. J. Chem. Inf. Model. 60, 2791\u20132802 (2020).","journal-title":"J. Chem. Inf. Model."},{"key":"627_CR23","doi-asserted-by":"publisher","first-page":"1520","DOI":"10.1021\/acscentsci.8b00507","volume":"4","author":"EN Feinberg","year":"2018","unstructured":"Feinberg, E. N. et al. PotentialNet for molecular property prediction. ACS Cent. Sci. 4, 1520\u20131530 (2018).","journal-title":"ACS Cent. Sci."},{"key":"627_CR24","doi-asserted-by":"crossref","unstructured":"Li, Y., Rezaei, M. A., Li, C. & Li, X. DeepAtom: a framework for protein\u2013ligand binding affinity prediction. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 303\u2013310 (IEEE, 2019).","DOI":"10.1109\/BIBM47256.2019.8982964"},{"key":"627_CR25","doi-asserted-by":"publisher","first-page":"4111","DOI":"10.1021\/jm048957q","volume":"48","author":"R Wang","year":"2005","unstructured":"Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111\u20134119 (2005).","journal-title":"J. Med. Chem."},{"key":"627_CR26","doi-asserted-by":"publisher","first-page":"D198","DOI":"10.1093\/nar\/gkl999","volume":"35","author":"T Liu","year":"2007","unstructured":"Liu, T., Lin, Y., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: a web-accessible database of experimentally determined protein\u2013ligand binding affinities. Nucleic Acids Res. 35, D198\u2013D201 (2007).","journal-title":"Nucleic Acids Res."},{"key":"627_CR27","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1002\/prot.20512","volume":"60","author":"L Hu","year":"2005","unstructured":"Hu, L., Benson, M. L., Smith, R. D., Lerner, M. G. & Carlson, H. A. Binding MOAD (Mother Of All Databases). Proteins Struct. Funct. Bioinform. 60, 333\u2013340 (2005).","journal-title":"Proteins Struct. Funct. Bioinform."},{"key":"627_CR28","doi-asserted-by":"publisher","first-page":"68","DOI":"10.3389\/fchem.2018.00068","volume":"6","author":"N-O Friedrich","year":"2018","unstructured":"Friedrich, N.-O., Simsir, M. & Kirchmair, J. How diverse are the protein-bound conformations of small-molecule drugs and cofactors? Front. Chem. 6, 68 (2018).","journal-title":"Front. Chem."},{"key":"627_CR29","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-022-01631-9","volume":"9","author":"DB Korlepara","year":"2022","unstructured":"Korlepara, D. B. et al. PLAS-5k: dataset of protein\u2013ligand affinities from molecular dynamics for machine learning applications. Sci. Data 9, 548 (2022).","journal-title":"Sci. Data"},{"key":"627_CR30","doi-asserted-by":"publisher","DOI":"10.1038\/s41597-023-02872-y","volume":"11","author":"DB Korlepara","year":"2024","unstructured":"Korlepara, D. B. et al. PLAS-20k: extended dataset of protein\u2013ligand affinities from MD simulations for machine learning applications. Sci. Data 11, 180 (2024).","journal-title":"Sci. Data"},{"key":"627_CR31","doi-asserted-by":"publisher","first-page":"69","DOI":"10.3389\/fphar.2020.00069","volume":"11","author":"J Yang","year":"2020","unstructured":"Yang, J., Shen, C. & Huang, N. Predicting or pretending: artificial intelligence for protein\u2013ligand interactions lack of sufficiently large and unbiased datasets. Front. Pharmacol. 11, 69 (2020).","journal-title":"Front. Pharmacol."},{"key":"627_CR32","doi-asserted-by":"publisher","first-page":"7946","DOI":"10.1021\/acs.jmedchem.2c00487","volume":"65","author":"M Volkov","year":"2022","unstructured":"Volkov, M. et al. On the frustration to predict binding affinities from protein\u2013ligand structures with deep neural networks. J. Med. Chem. 65, 7946\u20137958 (2022).","journal-title":"J. Med. Chem."},{"key":"627_CR33","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cbpa.2018.05.003","volume":"44","author":"S Vajda","year":"2018","unstructured":"Vajda, S., Beglov, D., Wakefield, A. E., Egbert, M. & Whitty, A. Cryptic binding sites on proteins: definition, detection, and druggability. Curr. Opin. Chem. Biol. 44, 1\u20138 (2018).","journal-title":"Curr. Opin. Chem. Biol."},{"key":"627_CR34","doi-asserted-by":"publisher","first-page":"2376","DOI":"10.1021\/ja044885g","volume":"127","author":"L Zeng","year":"2005","unstructured":"Zeng, L. et al. Selective small molecules blocking HIV-1 Tat and coactivator PCAF association. J. Am. Chem. Soc. 127, 2376\u20132377 (2005).","journal-title":"J. Am. Chem. Soc."},{"key":"627_CR35","unstructured":"Johnson, R. D. III (ed). Computational Chemistry Comparison and Benchmark Database Standard Reference Database Number 101 Release 22 (NIST, accessed 12 Jul 2022); http:\/\/cccbdb.nist.gov\/"},{"key":"627_CR36","doi-asserted-by":"publisher","first-page":"2143","DOI":"10.1016\/j.str.2013.09.006","volume":"21","author":"M Bista","year":"2013","unstructured":"Bista, M. et al. Transient protein states in designing inhibitors of the MDM2\u2013p53 interaction. Structure 21, 2143\u20132151 (2013).","journal-title":"Structure"},{"key":"627_CR37","doi-asserted-by":"publisher","first-page":"8731","DOI":"10.1021\/acs.jmedchem.7b00732","volume":"60","author":"M Xie","year":"2017","unstructured":"Xie, M. et al. Structural basis of inhibition of ER\u03b1\u2013coactivator interaction by high-affinity N-terminus isoaspartic acid tethered helical peptides. J. Med. Chem. 60, 8731\u20138740 (2017).","journal-title":"J. Med. Chem."},{"key":"627_CR38","doi-asserted-by":"publisher","first-page":"1623","DOI":"10.1002\/jcc.10128","volume":"23","author":"A Jakalian","year":"2002","unstructured":"Jakalian, A., Jack, D. B. & Bayly, C. I. Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J. Comput. Chem. 23, 1623\u20131641 (2002).","journal-title":"J. Comput. Chem."},{"key":"627_CR39","doi-asserted-by":"publisher","first-page":"3864","DOI":"10.1021\/acs.jpcb.7b00272","volume":"121","author":"LS Dodda","year":"2017","unstructured":"Dodda, L. S., Vilseck, J. Z., Tirado-Rives, J. & Jorgensen, W. L. 1.14*CM1A-LBCC: localized bond-charge corrected CM1A charges for condensed-phase simulations. J. Phys. Chem. B 121, 3864\u20133870 (2017).","journal-title":"J. Phys. Chem. B"},{"key":"627_CR40","doi-asserted-by":"publisher","first-page":"11225","DOI":"10.1021\/ja9621760","volume":"118","author":"WL Jorgensen","year":"1996","unstructured":"Jorgensen, W. L., Maxwell, D. S. & Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 118, 11225\u201311236 (1996).","journal-title":"J. Am. Chem. Soc."},{"key":"627_CR41","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/BF00117280","volume":"9","author":"JW Storer","year":"1995","unstructured":"Storer, J. W., Giesen, D. J., Cramer, C. J. & Truhlar, D. G. Class IV charge models: a new semiempirical approach in quantum chemistry. J. Comput. Aided Mol. Des. 9, 87\u2013110 (1995).","journal-title":"J. Comput. Aided Mol. Des."},{"key":"627_CR42","doi-asserted-by":"publisher","first-page":"1820","DOI":"10.1021\/jp972682r","volume":"102","author":"J Li","year":"1998","unstructured":"Li, J., Zhu, T., Cramer, C. J. & Truhlar, D. G. New class IV charge model for extracting accurate partial charges from wave functions. J. Phys. Chem. A 102, 1820\u20131831 (1998).","journal-title":"J. Phys. Chem. A"},{"key":"627_CR43","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1002\/jcc.10244","volume":"24","author":"JD Thompson","year":"2003","unstructured":"Thompson, J. D., Cramer, C. J. & Truhlar, D. G. Parameterization of charge model 3 for AM1, PM3, BLYP, and B3LYP. J. Comput. Chem. 24, 1291\u20131304 (2003).","journal-title":"J. Comput. Chem."},{"key":"627_CR44","doi-asserted-by":"publisher","first-page":"054103","DOI":"10.1063\/1.4959605","volume":"145","author":"S Grimme","year":"2016","unstructured":"Grimme, S. & Bannwarth, C. Ultra-fast computation of electronic spectra for large systems by tight-binding based simplified Tamm\u2013Dancoff approximation (sTDA-xTB). J. Chem. Phys. 145, 054103 (2016).","journal-title":"J. Chem. Phys."},{"key":"627_CR45","doi-asserted-by":"publisher","first-page":"9478","DOI":"10.1021\/acs.chemrev.9b00055","volume":"119","author":"E Wang","year":"2019","unstructured":"Wang, E. et al. End-point binding free energy calculation with MM\/PBSA and MM\/GBSA: strategies and applications in drug design. Chem. Rev. 119, 9478\u20139508 (2019).","journal-title":"Chem. Rev."},{"key":"627_CR46","doi-asserted-by":"publisher","first-page":"1626","DOI":"10.1021\/acs.chemrev.8b00290","volume":"119","author":"Z Sun","year":"2019","unstructured":"Sun, Z., Liu, Q., Qu, G., Feng, Y. & Reetz, M. T. Utility of B factors in protein science: interpreting rigidity, flexibility, and internal motion and engineering thermostability. Chem. Rev. 119, 1626\u20131665 (2019).","journal-title":"Chem. Rev."},{"key":"627_CR47","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1038\/nsmb.1421","volume":"15","author":"D Guilligay","year":"2008","unstructured":"Guilligay, D. et al. The structural basis for cap binding by influenza virus polymerase subunit PB2. Nat. Struct. Mol. Biol. 15, 500\u2013506 (2008).","journal-title":"Nat. Struct. Mol. Biol."},{"key":"627_CR48","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1139\/cjc-2015-0526","volume":"94","author":"S Rayne","year":"2016","unstructured":"Rayne, S. & Forest, K. Benchmarking semiempirical, Hartree\u2013Fock, DFT, and MP2 methods against the ionization energies and electron affinities of short- through long-chain [n]acenes and [n]phenacenes. Can. J. Chem. 94, 251\u2013258 (2016).","journal-title":"Can. J. Chem."},{"key":"627_CR49","doi-asserted-by":"publisher","first-page":"4184","DOI":"10.1021\/jp0225774","volume":"107","author":"C-G Zhan","year":"2003","unstructured":"Zhan, C.-G., Nichols, J. A. & Dixon, D. A. Ionization potential, electron affinity, electronegativity, hardness, and electron excitation energy: molecular properties from density functional theory orbital energies. J. Phys. Chem. A 107, 4184\u20134195 (2003).","journal-title":"J. Phys. Chem. A"},{"key":"627_CR50","doi-asserted-by":"publisher","first-page":"5184","DOI":"10.1021\/jm020970s","volume":"46","author":"G Lange","year":"2003","unstructured":"Lange, G. et al. Requirements for specific binding of low affinity inhibitor fragments to the SH2 domain of pp60Src are identical to those for high affinity binding of full length inhibitors. J. Med. Chem. 46, 5184\u20135195 (2003).","journal-title":"J. Med. Chem."},{"key":"627_CR51","doi-asserted-by":"publisher","first-page":"1104","DOI":"10.1016\/j.drudis.2015.04.005","volume":"20","author":"L \u00d6ster","year":"2015","unstructured":"\u00d6ster, L., Tapani, S., Xue, Y. & K\u00e4ck, H. Successful generation of structural information for fragment-based drug discovery. Drug Discov. Today 20, 1104\u20131111 (2015).","journal-title":"Drug Discov. Today"},{"key":"627_CR52","doi-asserted-by":"publisher","first-page":"999","DOI":"10.1002\/cmdc.201700217","volume":"12","author":"S Heinzlmeir","year":"2017","unstructured":"Heinzlmeir, S. et al. Chemoproteomics-aided medicinal chemistry for the discovery of EPHA2 inhibitors. ChemMedChem 12, 999\u20131011 (2017).","journal-title":"ChemMedChem"},{"key":"627_CR53","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10822-017-0088-4","volume":"32","author":"Z Gaieb","year":"2018","unstructured":"Gaieb, Z. et al. D3R Grand Challenge 2: blind prediction of protein\u2013ligand poses, affinity rankings, and relative binding free energies. J. Comput. Aided Mol. Des. 32, 1\u201320 (2018).","journal-title":"J. Comput. Aided Mol. Des."},{"key":"627_CR54","doi-asserted-by":"publisher","first-page":"7210","DOI":"10.1021\/acs.jmedchem.9b00809","volume":"62","author":"AJ Whitehouse","year":"2019","unstructured":"Whitehouse, A. J. et al. Development of inhibitors against Mycobacterium abscessus tRNA (m1G37) methyltransferase (TrmD) using fragment-based approaches. J. Med. Chem. 62, 7210\u20137232 (2019).","journal-title":"J. Med. Chem."},{"key":"627_CR55","doi-asserted-by":"publisher","first-page":"3685","DOI":"10.1021\/acs.jcim.2c00757","volume":"62","author":"F Menezes","year":"2022","unstructured":"Menezes, F. & Popowicz, G. M. ULYSSES: an efficient and easy to use semiempirical library for C. J. Chem. Inf. Model. 62, 3685\u20133694 (2022).","journal-title":"J. Chem. Inf. Model."},{"key":"627_CR56","doi-asserted-by":"publisher","first-page":"1652","DOI":"10.1021\/acs.jctc.8b01176","volume":"15","author":"C Bannwarth","year":"2019","unstructured":"Bannwarth, C., Ehlert, S. & Grimme, S. GFN2-xTB\u2014an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J. Chem. Theory Comput. 15, 1652\u20131671 (2019).","journal-title":"J. Chem. Theory Comput."},{"key":"627_CR57","doi-asserted-by":"publisher","first-page":"3902","DOI":"10.1021\/ja00299a024","volume":"107","author":"MJS Dewar","year":"1985","unstructured":"Dewar, M. J. S., Zoebisch, E. G., Healy, E. F. & Stewart, J. J. P. Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model. J. Am. Chem. Soc. 107, 3902\u20133909 (1985).","journal-title":"J. Am. Chem. Soc."},{"key":"627_CR58","doi-asserted-by":"publisher","first-page":"765","DOI":"10.1007\/s00894-008-0420-y","volume":"15","author":"JJP Stewart","year":"2009","unstructured":"Stewart, J. J. P. Application of the PM6 method to modeling proteins. J. Mol. Model. 15, 765\u2013805 (2009).","journal-title":"J. Mol. Model."},{"key":"627_CR59","doi-asserted-by":"publisher","first-page":"124902","DOI":"10.1063\/1.2177251","volume":"124","author":"G Sigalov","year":"2006","unstructured":"Sigalov, G., Fenley, A. & Onufriev, A. Analytical electrostatics for biomolecules: beyond the generalized Born approximation. J. Chem. Phys. 124, 124902 (2006).","journal-title":"J. Chem. Phys."},{"key":"627_CR60","doi-asserted-by":"publisher","first-page":"5301","DOI":"10.1021\/acs.chemrev.5b00584","volume":"116","author":"AS Christensen","year":"2016","unstructured":"Christensen, A. S., Kuba\u0159, T., Cui, Q. & Elstner, M. Semiempirical quantum mechanical methods for noncovalent interactions for chemical and biochemical applications. Chem. Rev. 116, 5301\u20135337 (2016).","journal-title":"Chem. Rev."},{"key":"627_CR61","doi-asserted-by":"publisher","first-page":"879","DOI":"10.1063\/1.474386","volume":"107","author":"SL Dixon","year":"1997","unstructured":"Dixon, S. L. & Merz, K. M. Fast, accurate semiempirical molecular orbital calculations for macromolecules. J. Chem. Phys. 107, 879\u2013893 (1997).","journal-title":"J. Chem. Phys."},{"key":"627_CR62","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-3-33","volume":"3","author":"NM O\u2019Boyle","year":"2011","unstructured":"O\u2019Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminform. 3, 33 (2011).","journal-title":"J. Cheminform."},{"key":"627_CR63","doi-asserted-by":"publisher","first-page":"154122","DOI":"10.1063\/1.5090222","volume":"150","author":"E Caldeweyher","year":"2019","unstructured":"Caldeweyher, E. et al. A generally applicable atomic-charge dependent London dispersion correction. J. Chem. Phys. 150, 154122 (2019).","journal-title":"J. Chem. Phys."},{"key":"627_CR64","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-4-17","volume":"4","author":"MD Hanwell","year":"2012","unstructured":"Hanwell, M. D. et al. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminform. 4, 17 (2012).","journal-title":"J. Cheminform."},{"key":"627_CR65","unstructured":"Case, D. A. et al. Amber 2021 (Univ. of California, San Francisco, 2021)."},{"key":"627_CR66","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1002\/jcc.20035","volume":"25","author":"J Wang","year":"2004","unstructured":"Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general Amber force field. J. Comput. Chem. 25, 1157\u20131174 (2004).","journal-title":"J. Comput. Chem."},{"key":"627_CR67","doi-asserted-by":"publisher","first-page":"3696","DOI":"10.1021\/acs.jctc.5b00255","volume":"11","author":"JA Maier","year":"2015","unstructured":"Maier, J. A. et al. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11, 3696\u20133713 (2015).","journal-title":"J. Chem. Theory Comput."},{"key":"627_CR68","doi-asserted-by":"publisher","first-page":"926","DOI":"10.1063\/1.445869","volume":"79","author":"WL Jorgensen","year":"1983","unstructured":"Jorgensen, W. L., Chandrasekhar, J., Madura, J. D., Impey, R. W. & Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 79, 926\u2013935 (1983).","journal-title":"J. Chem. Phys."},{"key":"627_CR69","doi-asserted-by":"publisher","unstructured":"Townshend, R. J. L. et al. ATOM3D: tasks on molecules in three dimensions. Preprint at https:\/\/doi.org\/10.48550\/arXiv.2012.04035 (2022).","DOI":"10.48550\/arXiv.2012.04035"},{"key":"627_CR70","doi-asserted-by":"publisher","unstructured":"Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. Preprint at https:\/\/doi.org\/10.48550\/arXiv.1609.02907 (2017).","DOI":"10.48550\/arXiv.1609.02907"},{"key":"627_CR71","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1093\/bioinformatics\/btq003","volume":"26","author":"Y Huang","year":"2010","unstructured":"Huang, Y., Niu, B., Gao, Y., Fu, L. & Li, W. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics 26, 680\u2013682 (2010).","journal-title":"Bioinformatics"},{"key":"627_CR72","doi-asserted-by":"publisher","first-page":"905","DOI":"10.1038\/nprot.2016.051","volume":"11","author":"S Forli","year":"2016","unstructured":"Forli, S. et al. Computational protein\u2013ligand docking and virtual drug screening with the AutoDock suite. Nat. Protoc. 11, 905\u2013919 (2016).","journal-title":"Nat. Protoc."},{"key":"627_CR73","doi-asserted-by":"publisher","first-page":"2768","DOI":"10.1093\/bioinformatics\/btl481","volume":"22","author":"Y Zhao","year":"2006","unstructured":"Zhao, Y., Stoffler, D. & Sanner, M. Hierarchical and multi-resolution representation of protein flexibility. Bioinformatics 22, 2768\u20132774 (2006).","journal-title":"Bioinformatics"},{"key":"627_CR74","doi-asserted-by":"publisher","first-page":"e1004586","DOI":"10.1371\/journal.pcbi.1004586","volume":"11","author":"PA Ravindranath","year":"2015","unstructured":"Ravindranath, P. A., Forli, S., Goodsell, D. S., Olson, A. J. & Sanner, M. F. AutoDockFR: advances in protein\u2013ligand docking with explicitly specified binding site flexibility. PLoS Comput. Biol. 11, e1004586 (2015).","journal-title":"PLoS Comput. Biol."},{"key":"627_CR75","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1007\/BF00197809","volume":"6","author":"F Delaglio","year":"1995","unstructured":"Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277\u2013293 (1995).","journal-title":"J. Biomol. NMR"},{"key":"627_CR76","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1007\/BF00404272","volume":"4","author":"BA Johnson","year":"1994","unstructured":"Johnson, B. A. & Blevins, R. A. NMR View: a computer program for the visualization and analysis of NMR data. J. Biomol. NMR 4, 603\u2013614 (1994).","journal-title":"J. Biomol. NMR"},{"key":"627_CR77","doi-asserted-by":"publisher","unstructured":"Siebenmorgen, T. et al. MISATO\u2014machine learning dataset for structure-based drug discovery. Zenodo https:\/\/doi.org\/10.5281\/zenodo.7711953 (2023).","DOI":"10.5281\/zenodo.7711953"},{"key":"627_CR78","doi-asserted-by":"publisher","unstructured":"t7morgen\/misato-dataset: release for publication. Zenodo https:\/\/doi.org\/10.5281\/zenodo.10926008 (2024).","DOI":"10.5281\/zenodo.10926008"}],"container-title":["Nature Computational Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s43588-024-00627-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-024-00627-2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s43588-024-00627-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,29]],"date-time":"2024-05-29T12:11:46Z","timestamp":1716984706000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s43588-024-00627-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,10]]},"references-count":78,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,5]]}},"alternative-id":["627"],"URL":"https:\/\/doi.org\/10.1038\/s43588-024-00627-2","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.05.24.542082","asserted-by":"object"}]},"ISSN":["2662-8457"],"issn-type":[{"value":"2662-8457","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,10]]},"assertion":[{"value":"30 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}