{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T15:03:57Z","timestamp":1750950237817,"version":"3.40.3"},"publisher-location":"Cham","reference-count":28,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031452741"},{"type":"electronic","value":"9783031452758"}],"license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,8]],"date-time":"2023-10-08T00:00:00Z","timestamp":1696723200000},"content-version":"vor","delay-in-days":280,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We propose a new machine learning formulation designed specifically for extrapolation. The textbook way to apply machine learning to drug design is to learn a univariate function that when a drug (structure) is input, the function outputs a real number (the activity): <jats:italic>F<\/jats:italic>(drug)\u2009\u2192\u2009activity. The PubMed server lists around twenty thousand papers doing this. However, experience in real-world drug design suggests that this formulation of the drug design problem is not quite correct. Specifically, what one is really interested in is extrapolation: predicting the activity of new drugs with higher activity than any existing ones. Our new formulation for extrapolation is based around learning a bivariate function that predicts the difference in activities of two drugs: <jats:italic>F<\/jats:italic>(drug1, drug2)\u2009\u2192\u2009signed difference in activity. 
This formulation is general and potentially suitable for problems to find samples with target values beyond the target value range of the training set. We applied the formulation to work with support vector machines (SVMs), random forests (RFs), and Gradient Boosting Machines (XGBs). We compared the formulation with standard regression on thousands of drug design datasets, and hundreds of gene expression datasets. The test set extrapolation metrics use the concept of classification metrics to count the identification of extraordinary examples (with greater values than the training set), and top-performing examples (within the top 10% of the whole dataset). On these metrics our pairwise formulation vastly outperformed standard regression for SVMs, RFs, and XGBs. We expect this success to extrapolate to other extrapolation problems.<\/jats:p>","DOI":"10.1007\/978-3-031-45275-8_19","type":"book-chapter","created":{"date-parts":[[2023,10,7]],"date-time":"2023-10-07T06:01:56Z","timestamp":1696658516000},"page":"277-292","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Extrapolation is Not the Same as Interpolation"],"prefix":"10.1007","author":[{"given":"Yuxuan","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7208-4387","authenticated-orcid":false,"given":"Ross D.","family":"King","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,8]]},"reference":[{"key":"19_CR1","doi-asserted-by":"publisher","DOI":"10.1016\/j.commatsci.2019.109498","volume":"174","author":"SK Kauwe","year":"2020","unstructured":"Kauwe, S.K., Graser, J., Murdock, R., Sparks, T.D.: Can machine learning find extraordinary materials? Comput. Mater. Sci. 174, 109498 (2020). https:\/\/doi.org\/10.1016\/j.commatsci.2019.109498","journal-title":"Comput. Mater. 
Sci."},{"key":"19_CR2","unstructured":"Tong, W., Hong, H., Xie, Q., Shi, L., Fang, H., Perkins, R.: Assessing QSAR Limitations \u2013 A Regulatory Perspective"},{"key":"19_CR3","doi-asserted-by":"publisher","unstructured":"Nicolotti, O. ed: Computational Toxicology: Methods and Protocols. Springer New York (2018). https:\/\/doi.org\/10.1007\/978-1-4939-7899-1","DOI":"10.1007\/978-1-4939-7899-1"},{"key":"19_CR4","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2022.832120","volume":"13","author":"M von Korff","year":"2022","unstructured":"von Korff, M., Sander, T.: Limits of prediction for machine learning in drug discovery. Front. Pharmacol. 13, 832120 (2022). https:\/\/doi.org\/10.3389\/fphar.2022.832120","journal-title":"Front. Pharmacol."},{"key":"19_CR5","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1007\/s10822-011-9495-0","volume":"26","author":"RD Cramer","year":"2012","unstructured":"Cramer, R.D.: The inevitable QSAR renaissance. J. Comput. Aided Mol. Des. 26, 35\u201338 (2012). https:\/\/doi.org\/10.1007\/s10822-011-9495-0","journal-title":"J. Comput. Aided Mol. Des."},{"key":"19_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.commatsci.2019.109203","volume":"171","author":"Z Xiong","year":"2020","unstructured":"Xiong, Z., Cui, Y., Liu, Z., Zhao, Y., Hu, M., Hu, J.: Evaluating explorative prediction power of machine learning algorithms for materials discovery using k -fold forward cross-validation. Comput. Mater. Sci. 171, 109203 (2020). https:\/\/doi.org\/10.1016\/j.commatsci.2019.109203","journal-title":"Comput. Mater. Sci."},{"key":"19_CR7","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1021\/ci9003865","volume":"50","author":"S Agarwal","year":"2010","unstructured":"Agarwal, S., Dugar, D., Sengupta, S.: Ranking chemical structures for drug discovery: a new machine learning approach. J. Chem. Inf. Model. 50, 716\u2013731 (2010). https:\/\/doi.org\/10.1021\/ci9003865","journal-title":"J. Chem. Inf. 
Model."},{"key":"19_CR8","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1021\/ci100308f","volume":"51","author":"F Rathke","year":"2011","unstructured":"Rathke, F., Hansen, K., Brefeld, U., M\u00fcller, K.-R.: StructRank: a new approach for ligand-based virtual screening. J. Chem. Inf. Model. 51, 83\u201392 (2011). https:\/\/doi.org\/10.1021\/ci100308f","journal-title":"J. Chem. Inf. Model."},{"key":"19_CR9","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/s10822-016-0003-4","volume":"31","author":"MM Al-Dabbagh","year":"2017","unstructured":"Al-Dabbagh, M.M., Salim, N., Himmat, M., Ahmed, A., Saeed, F.: Quantum probability ranking principle for ligand-based virtual screening. J. Comput. Aided Mol. Des. 31, 365\u2013378 (2017). https:\/\/doi.org\/10.1007\/s10822-016-0003-4","journal-title":"J. Comput. Aided Mol. Des."},{"key":"19_CR10","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1021\/acs.jcim.6b00737","volume":"57","author":"J Liu","year":"2017","unstructured":"Liu, J., Ning, X.: Multi-assay-based compound prioritization via assistance utilization: a machine learning framework. J. Chem. Inf. Model. 57, 484\u2013498 (2017). https:\/\/doi.org\/10.1021\/acs.jcim.6b00737","journal-title":"J. Chem. Inf. Model."},{"key":"19_CR11","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/s13321-015-0052-z","volume":"7","author":"W Zhang","year":"2015","unstructured":"Zhang, W., et al.: When drug discovery meets web search: learning to rank for ligand-based virtual screening. J Cheminform. 7, 5 (2015). https:\/\/doi.org\/10.1186\/s13321-015-0052-z","journal-title":"J Cheminform."},{"key":"19_CR12","doi-asserted-by":"publisher","first-page":"4656","DOI":"10.1093\/bioinformatics\/btz293","volume":"35","author":"OP Watson","year":"2019","unstructured":"Watson, O.P., Cortes-Ciriano, I., Taylor, A.R., Watson, J.A.: A decision-theoretic approach to the evaluation of machine learning algorithms in computational drug discovery. 
Bioinformatics 35, 4656\u20134663 (2019). https:\/\/doi.org\/10.1093\/bioinformatics\/btz293","journal-title":"Bioinformatics"},{"key":"19_CR13","doi-asserted-by":"publisher","first-page":"819","DOI":"10.1039\/C8ME00012C","volume":"3","author":"B Meredig","year":"2018","unstructured":"Meredig, B., et al.: Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Mol. Syst. Des. Eng. 3, 819\u2013825 (2018). https:\/\/doi.org\/10.1039\/C8ME00012C","journal-title":"Mol. Syst. Des. Eng."},{"key":"19_CR14","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1038\/s42256-021-00332-z","volume":"3","author":"RD King","year":"2021","unstructured":"King, R.D., Orhobor, O.I., Taylor, C.C.: Cross-validation is safe to use. Nat Mach Intell. 3, 276 (2021). https:\/\/doi.org\/10.1038\/s42256-021-00332-z","journal-title":"Nat Mach Intell."},{"key":"19_CR15","doi-asserted-by":"publisher","first-page":"D930","DOI":"10.1093\/nar\/gky1075","volume":"47","author":"D Mendez","year":"2019","unstructured":"Mendez, D., et al.: ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930\u2013D940 (2019). https:\/\/doi.org\/10.1093\/nar\/gky1075","journal-title":"Nucleic Acids Res."},{"key":"19_CR16","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2108013118","volume":"118","author":"I Olier","year":"2021","unstructured":"Olier, I., et al.: Transformational machine learning: Learning how to learn from many related scientific problems. Proc. Natl. Acad. Sci. U.S.A. 118, e2108013118 (2021). https:\/\/doi.org\/10.1073\/pnas.2108013118","journal-title":"Proc. Natl. Acad. Sci. U.S.A."},{"key":"19_CR17","doi-asserted-by":"publisher","first-page":"5441","DOI":"10.1039\/C8SC00148K","volume":"9","author":"A Mayr","year":"2018","unstructured":"Mayr, A., et al.: Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem. Sci. 9, 5441\u20135451 (2018). 
https:\/\/doi.org\/10.1039\/C8SC00148K","journal-title":"Chem. Sci."},{"key":"19_CR18","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan, H.L.: The generation of a unique machine description for chemical structures-a technique developed at chemical abstracts service. J. Chem. Doc. 5, 107\u2013113 (1965). https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J. Chem. Doc."},{"key":"19_CR19","doi-asserted-by":"publisher","first-page":"D558","DOI":"10.1093\/nar\/gkx1063","volume":"46","author":"A Koleti","year":"2018","unstructured":"Koleti, A., et al.: Data portal for the library of integrated network-based cellular signatures (LINCS) program: integrated access to diverse large-scale cellular perturbation response data. Nucleic Acids Res. 46, D558\u2013D566 (2018). https:\/\/doi.org\/10.1093\/nar\/gkx1063","journal-title":"Nucleic Acids Res."},{"key":"19_CR20","unstructured":"Brownlee, J.: Data Preparation for Machine Learning: Data Cleaning, Feature Selection, and Data Transforms in Python. Machine Learning Mastery (2020)"},{"key":"19_CR21","doi-asserted-by":"publisher","unstructured":"Kunanbayev, K., Temirbek, I., Zollanvari, A.: Complex encoding. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1\u20136. IEEE, Shenzhen, China (2021). https:\/\/doi.org\/10.1109\/IJCNN52387.2021.9534094","DOI":"10.1109\/IJCNN52387.2021.9534094"},{"key":"19_CR22","doi-asserted-by":"publisher","first-page":"1134","DOI":"10.1038\/nmeth.2259","volume":"9","author":"Y Park","year":"2012","unstructured":"Park, Y., Marcotte, E.M.: Flaws in evaluation schemes for pair-input computational predictions. Nat. Methods 9, 1134\u20131136 (2012). https:\/\/doi.org\/10.1038\/nmeth.2259","journal-title":"Nat. Methods"},{"key":"19_CR23","doi-asserted-by":"crossref","unstructured":"Herbrich, R., Minka, T., Graepel, T.: TrueSkill(TM): A Bayesian skill rating system. 
In: Presented at the Advances in Neural Information Processing Systems 20 January 1 (2007)","DOI":"10.7551\/mitpress\/7503.003.0076"},{"key":"19_CR24","unstructured":"Elo, A.E.: The Rating of Chessplayers, Past and Present. Arco Pub. (1978)"},{"key":"19_CR25","doi-asserted-by":"publisher","unstructured":"Hub\u00e1\u010dek, O., \u0160ourek, G., \u017delezn\u00fd, F.: Forty years of score-based soccer match outcome prediction: an experimental review. IMA J. Manage. Math. 33, 1\u201318 (2022). https:\/\/doi.org\/10.1093\/imaman\/dpab029","DOI":"10.1093\/imaman\/dpab029"},{"key":"19_CR26","unstructured":"TrueSkill \u2014 trueskill 0.4.5 documentation. https:\/\/trueskill.org\/. Accessed 25 Apr 2023"},{"key":"19_CR27","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825\u20132830 (2011)","journal-title":"J. Mach. Learn. Res."},{"key":"19_CR28","doi-asserted-by":"publisher","first-page":"3846","DOI":"10.1021\/acs.jcim.1c00670","volume":"61","author":"M Tynes","year":"2021","unstructured":"Tynes, M., et al.: Pairwise difference regression: a machine learning meta-algorithm for improved prediction and uncertainty quantification in chemical search. J. Chem. Inf. Model. 61, 3846\u20133857 (2021). https:\/\/doi.org\/10.1021\/acs.jcim.1c00670","journal-title":"J. Chem. Inf. 
Model."}],"container-title":["Lecture Notes in Computer Science","Discovery Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-45275-8_19","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T15:10:19Z","timestamp":1710256219000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-45275-8_19"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"ISBN":["9783031452741","9783031452758"],"references-count":28,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-45275-8_19","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2023]]},"assertion":[{"value":"8 October 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"DS","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on Discovery Science","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Porto","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Portugal","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2023","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"9 October 2023","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"11 October 2023","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"26","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"dis2023","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/ds2023.inesctec.pt\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"CMT","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"133","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"37","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"10","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"28% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole 
number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"4","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}