{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T01:44:37Z","timestamp":1774057477615,"version":"3.50.1"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T00:00:00Z","timestamp":1760313600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T00:00:00Z","timestamp":1760313600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100012331","name":"Flanders Innovation & Entrepreneurship","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100012331","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100007051","name":"Uppsala University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007051","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Aqueous solubility of a compound plays a crucial role throughout various stages of drug discovery and development. Despite numerous efforts using various machine learning models, accurately estimating aqueous solubility remains a challenge. One primary limitation is the absence of a single source, large dataset of druglike compounds for model training. Additionally, studies have highlighted the need for improvements in prediction algorithms and molecular representations. To address these challenges, the Johnson and Johnson (J&amp;J) in-house solubility data was leveraged. Theoretical pH-solubility equations and in-house pKa prediction tools were utilized to calculate intrinsic solubility from J&amp;J data. A multi-task graph transformer model was developed and trained on the calculated intrinsic solubility data of 13,306 compounds along with seven relevant physicochemical properties including solubility at pH 2\/7, logP, and logD at three different pHs. When evaluated making use of high-quality test data, the developed model achieved a root mean square error (RMSE) of 0.61 and coefficient of determination (R<jats:sup>2<\/jats:sup>) of 0.60, demonstrating state-of-the-art performance in estimating intrinsic solubility for drug-like compounds. <\/jats:p>","DOI":"10.1186\/s13321-025-01106-0","type":"journal-article","created":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T07:50:52Z","timestamp":1760341852000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Improved estimation of intrinsic solubility of drug-like molecules through multi-task graph transformer"],"prefix":"10.1186","volume":"17","author":[{"given":"Jiaxi","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Eline","family":"Hermans","sequence":"additional","affiliation":[]},{"given":"Kia","family":"Sepassi","sequence":"additional","affiliation":[]},{"given":"Christophe","family":"Tistaert","sequence":"additional","affiliation":[]},{"given":"Christel A. S.","family":"Bergstr\u00f6m","sequence":"additional","affiliation":[]},{"given":"Mazen","family":"Ahmad","sequence":"additional","affiliation":[]},{"given":"Per","family":"Larsson","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,10,13]]},"reference":[{"issue":"7","key":"1106_CR1","doi-asserted-by":"publisher","first-page":"1289","DOI":"10.1021\/CI800058V\/SUPPL_FILE\/CI800058V-FILE001.PDF","volume":"48","author":"A Llin\u00e0s","year":"2008","unstructured":"Llin\u00e0s A, Glen RC, Goodman JM (2008) Solubility challenge: Can you predict solubilities of 32 molecules using a database of 100 reliable measurements? J Chem Inf Model 48(7):1289\u20131303. https:\/\/doi.org\/10.1021\/CI800058V\/SUPPL_FILE\/CI800058V-FILE001.PDF","journal-title":"J Chem Inf Model"},{"issue":"6","key":"1106_CR2","doi-asserted-by":"publisher","first-page":"3036","DOI":"10.1021\/ACS.JCIM.9B00345\/SUPPL_FILE\/CI9B00345_SI_001.XLSX","volume":"59","author":"A Llinas","year":"2019","unstructured":"Llinas A, Avdeef A (2019) Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD \u223c0.17 log) and Loose (SD \u223c0.62 log) Test Sets. J Chem Inf Model 59(6):3036\u20133040. https:\/\/doi.org\/10.1021\/ACS.JCIM.9B00345\/SUPPL_FILE\/CI9B00345_SI_001.XLSX","journal-title":"J Chem Inf Model"},{"issue":"10","key":"1106_CR3","doi-asserted-by":"publisher","first-page":"4791","DOI":"10.1021\/ACS.JCIM.0C00701\/SUPPL_FILE\/CI0C00701_SI_001.XLSX","volume":"60","author":"A Llinas","year":"2020","unstructured":"Llinas A, Oprisiu I, Avdeef A (2020) Findings of the second challenge to predict aqueous solubility. J Chem Inf Model 60(10):4791\u20134803. https:\/\/doi.org\/10.1021\/ACS.JCIM.0C00701\/SUPPL_FILE\/CI0C00701_SI_001.XLSX","journal-title":"J Chem Inf Model"},{"issue":"8","key":"1106_CR4","doi-asserted-by":"publisher","first-page":"2962","DOI":"10.1021\/MP500103R\/SUPPL_FILE\/MP500103R_SI_002.PDF","volume":"11","author":"DS Palmer","year":"2014","unstructured":"Palmer DS, Mitchell JBO (2014) Is experimental data quality the limiting factor in predicting the aqueous solubility of druglike molecules? Mol Pharm 11(8):2962\u20132972. https:\/\/doi.org\/10.1021\/MP500103R\/SUPPL_FILE\/MP500103R_SI_002.PDF","journal-title":"Mol Pharm"},{"key":"1106_CR5","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1016\/J.NEUCOM.2020.10.081","volume":"429","author":"M Wang","year":"2021","unstructured":"Wang M, Deng W (2021) Deep face recognition: a survey. Neurocomputing 429:215\u2013244. https:\/\/doi.org\/10.1016\/J.NEUCOM.2020.10.081","journal-title":"Neurocomputing"},{"issue":"8018","key":"1106_CR6","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1038\/s41586-024-07335-x","volume":"630","author":"MR Costa-juss\u00e0","year":"2024","unstructured":"Costa-juss\u00e0 MR, Cross J, \u00c7elebi O et al (2024) Scaling neural machine translation to 200 languages. Nature 630(8018):841\u2013846. https:\/\/doi.org\/10.1038\/s41586-024-07335-x","journal-title":"Nature"},{"issue":"3","key":"1106_CR7","doi-asserted-by":"publisher","first-page":"362","DOI":"10.1002\/ROB.21918","volume":"37","author":"S Grigorescu","year":"2020","unstructured":"Grigorescu S, Trasnea B, Cocias T, Macesanu G (2020) A survey of deep learning techniques for autonomous driving. J Field Robot 37(3):362\u2013386. https:\/\/doi.org\/10.1002\/ROB.21918","journal-title":"J Field Robot"},{"issue":"7873","key":"1106_CR8","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","volume":"596","author":"J Jumper","year":"2021","unstructured":"Jumper J, Evans R, Pritzel A et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873):583\u2013589. https:\/\/doi.org\/10.1038\/s41586-021-03819-2","journal-title":"Nature"},{"issue":"8016","key":"1106_CR9","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1038\/s41586-024-07487-w","volume":"630","author":"J Abramson","year":"2024","unstructured":"Abramson J, Adler J, Dunger J et al (2024) Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630(8016):493\u2013500. https:\/\/doi.org\/10.1038\/s41586-024-07487-w","journal-title":"Nature"},{"key":"1106_CR10","unstructured":"ChatGPT. Accessed March 14, 2025. https:\/\/chatgpt.com\/"},{"key":"1106_CR11","unstructured":"DeepSeek-AI, Liu A, Feng B, et al. DeepSeek-V2: a strong, economical, and efficient mixture-of-experts language model. Published online May 7, 2024. Accessed March 14, 2025. https:\/\/arxiv.org\/abs\/2405.04434v5"},{"issue":"16","key":"1106_CR12","doi-asserted-by":"publisher","first-page":"8736","DOI":"10.1021\/JACS.2C13467\/ASSET\/IMAGES\/LARGE\/JA2C13467_0005.JPEG","volume":"145","author":"DM Anstine","year":"2023","unstructured":"Anstine DM, Isayev O (2023) Generative models as an emerging paradigm in the chemical sciences. J Am Chem Soc 145(16):8736\u20138750. https:\/\/doi.org\/10.1021\/JACS.2C13467\/ASSET\/IMAGES\/LARGE\/JA2C13467_0005.JPEG","journal-title":"J Am Chem Soc"},{"issue":"1","key":"1106_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/S13321-024-00812-5\/FIGURES\/5","volume":"16","author":"HH Loeffler","year":"2024","unstructured":"Loeffler HH, He J, Tibo A et al (2024) Reinvent 4: modern AI\u2013driven generative molecule design. J Cheminform 16(1):1\u201316. https:\/\/doi.org\/10.1186\/S13321-024-00812-5\/FIGURES\/5","journal-title":"J Cheminform"},{"issue":"9","key":"1106_CR14","doi-asserted-by":"publisher","first-page":"2046","DOI":"10.1021\/ACS.JCIM.1C00469\/SUPPL_FILE\/CI1C00469_SI_001.PDF","volume":"62","author":"V Fialkov\u00e1","year":"2022","unstructured":"Fialkov\u00e1 V, Zhao J, Papadopoulos K et al (2022) LibINVENT: reaction-based generative scaffold decoration for in silico library design. J Chem Inf Model 62(9):2046\u20132063. https:\/\/doi.org\/10.1021\/ACS.JCIM.1C00469\/SUPPL_FILE\/CI1C00469_SI_001.PDF","journal-title":"J Chem Inf Model"},{"key":"1106_CR15","doi-asserted-by":"publisher","DOI":"10.1093\/BIOINFORMATICS\/BTAE416","author":"K Swanson","year":"2024","unstructured":"Swanson K, Walther P, Leitz J et al (2024) ADMET-AI: a machine learning ADMET platform for evaluation of large-scale chemical libraries. Bioinformatics. https:\/\/doi.org\/10.1093\/BIOINFORMATICS\/BTAE416","journal-title":"Bioinformatics"},{"issue":"W1","key":"1106_CR16","doi-asserted-by":"publisher","first-page":"W5","DOI":"10.1093\/NAR\/GKAB255","volume":"49","author":"G Xiong","year":"2021","unstructured":"Xiong G, Wu Z, Yi J et al (2021) ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res 49(W1):W5\u2013W14. https:\/\/doi.org\/10.1093\/NAR\/GKAB255","journal-title":"Nucleic Acids Res"},{"key":"1106_CR17","unstructured":"St\u00e4rk H, Beaini D, Corso G, et al. 3D Infomax improves GNNs for Molecular Property Prediction. Published online June 28, 2022:20479\u201320502. Accessed April 2, 2025. https:\/\/proceedings.mlr.press\/v162\/stark22a.html"},{"issue":"6","key":"1106_CR18","doi-asserted-by":"publisher","first-page":"642","DOI":"10.1016\/J.MEDJ.2021.04.006\/ASSET\/2B8CA7C0-7021-4409-B5CA-634DAA17D471\/MAIN.ASSETS\/GR1.JPG","volume":"2","author":"L Adlung","year":"2021","unstructured":"Adlung L, Cohen Y, Mor U, Elinav E (2021) Machine learning in clinical decision making. Med 2(6):642\u2013665. https:\/\/doi.org\/10.1016\/J.MEDJ.2021.04.006\/ASSET\/2B8CA7C0-7021-4409-B5CA-634DAA17D471\/MAIN.ASSETS\/GR1.JPG","journal-title":"Med"},{"issue":"1","key":"1106_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1208\/S12248-024-01006-5\/FIGURES\/7","volume":"27","author":"CS Ajmal","year":"2025","unstructured":"Ajmal CS, Yerram S, Abishek V et al (2025) Innovative approaches in regulatory affairs: leveraging artificial intelligence and machine learning for efficient compliance and decision-making. AAPS Journal 27(1):1\u201327. https:\/\/doi.org\/10.1208\/S12248-024-01006-5\/FIGURES\/7","journal-title":"AAPS Journal"},{"issue":"7","key":"1106_CR20","doi-asserted-by":"publisher","first-page":"1563","DOI":"10.1021\/CI400187Y\/SUPPL_FILE\/CI400187Y_SI_002.ZIP","volume":"53","author":"A Lusci","year":"2013","unstructured":"Lusci A, Pollastri G, Baldi P (2013) Deep architectures and deep learning in chemoinformatics: the prediction of aqueous solubility for drug-like molecules. J Chem Inf Model 53(7):1563\u20131575. https:\/\/doi.org\/10.1021\/CI400187Y\/SUPPL_FILE\/CI400187Y_SI_002.ZIP","journal-title":"J Chem Inf Model"},{"issue":"3","key":"1106_CR21","doi-asserted-by":"publisher","first-page":"1000","DOI":"10.1021\/CI034243X\/SUPPL_FILE\/CI034243XSI20040112_053635.TXT","volume":"44","author":"JS Delaney","year":"2004","unstructured":"Delaney JS (2004) ESOL: estimating aqueous solubility directly from molecular structure. J Chem Inf Comput Sci 44(3):1000\u20131005. https:\/\/doi.org\/10.1021\/CI034243X\/SUPPL_FILE\/CI034243XSI20040112_053635.TXT","journal-title":"J Chem Inf Comput Sci"},{"key":"1106_CR22","doi-asserted-by":"publisher","DOI":"10.1016\/J.AILSCI.2021.100021","volume":"1","author":"M Wiercioch","year":"2021","unstructured":"Wiercioch M, Kirchmair J (2021) Dealing with a data-limited regime: combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance. Artif Intell Life Sci 1:100021. https:\/\/doi.org\/10.1016\/J.AILSCI.2021.100021","journal-title":"Artif Intell Life Sci"},{"issue":"6","key":"1106_CR23","doi-asserted-by":"publisher","first-page":"2530","DOI":"10.1021\/ACS.JCIM.1C00331\/ASSET\/IMAGES\/LARGE\/CI1C00331_0002.JPEG","volume":"61","author":"PG Francoeur","year":"2021","unstructured":"Francoeur PG, Koes DR (2021) SolTranNet-A machine learning tool for fast aqueous solubility prediction. J Chem Inf Model 61(6):2530\u20132536. https:\/\/doi.org\/10.1021\/ACS.JCIM.1C00331\/ASSET\/IMAGES\/LARGE\/CI1C00331_0002.JPEG","journal-title":"J Chem Inf Model"},{"issue":"1","key":"1106_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41597-024-03105-6","volume":"11","author":"P Llompart","year":"2024","unstructured":"Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A (2024) Will we ever be able to accurately predict solubility? Sci Data 11(1):1\u201326. https:\/\/doi.org\/10.1038\/s41597-024-03105-6","journal-title":"Sci Data"},{"key":"1106_CR25","doi-asserted-by":"publisher","first-page":"5261","DOI":"10.1021\/ACS.MOLPHARMACEUT.4C00685\/ASSET\/IMAGES\/LARGE\/MP4C00685_0008.JPEG","volume":"21","author":"J Zhao","year":"2024","unstructured":"Zhao J, Hermans E, Sepassi K et al (2024) Effect of data quality and data quantity on the estimation of intrinsic solubility: analysis based on a single-source data set. Mol Pharm 21:5261\u20135271. https:\/\/doi.org\/10.1021\/ACS.MOLPHARMACEUT.4C00685\/ASSET\/IMAGES\/LARGE\/MP4C00685_0008.JPEG","journal-title":"Mol Pharm"},{"key":"1106_CR26","doi-asserted-by":"publisher","unstructured":"Avdeef A. Absorption and Drug Development: Solubility, Permeability, and Charge State. Absorption and Drug Development: Solubility, Permeability, and Charge State. Published online May 7, 2012. https:\/\/doi.org\/10.1002\/9781118286067","DOI":"10.1002\/9781118286067"},{"key":"1106_CR27","doi-asserted-by":"publisher","unstructured":"Liu C, Yao Z, Zhan Y, Ma X, Pan S, Hu W. Gradformer: Graph Transformer with Exponential Decay. Published online April 24, 2024:2171\u20132179. https:\/\/doi.org\/10.24963\/ijcai.2024\/240","DOI":"10.24963\/ijcai.2024\/240"},{"key":"1106_CR28","first-page":"14501","volume":"35","author":"L Ramp\u00e1\u0161ek","year":"2022","unstructured":"Ramp\u00e1\u0161ek L, Galkin M, Dwivedi VP, Luu AT, Wolf G, Beaini D (2022) Recipe for a general, powerful, scalable graph transformer. Adv Neural Inf Process Syst 35:14501\u201314515","journal-title":"Adv Neural Inf Process Syst"},{"issue":"2","key":"1106_CR29","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1038\/s42256-023-00785-4","volume":"6","author":"S Allenspach","year":"2024","unstructured":"Allenspach S, Hiss JA, Schneider G (2024) Neural multi-task learning in drug design. Nat Mach Intell 6(2):124\u2013137. https:\/\/doi.org\/10.1038\/s42256-023-00785-4","journal-title":"Nat Mach Intell"},{"key":"1106_CR30","unstructured":"PyTorch. Accessed June 11, 2025. https:\/\/pytorch.org\/"},{"key":"1106_CR31","unstructured":"PyG Documentation\u2014pytorch_geometric documentation. Accessed March 21, 2025. https:\/\/pytorch-geometric.readthedocs.io\/en\/latest\/"},{"key":"1106_CR32","unstructured":"RDKit. Accessed March 20, 2025. https:\/\/www.rdkit.org\/"},{"key":"1106_CR33","doi-asserted-by":"publisher","unstructured":"Cipolla R, Gal Y, Kendall A. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition. Published online May 19, 2017:7482\u20137491. https:\/\/doi.org\/10.1109\/CVPR.2018.00781","DOI":"10.1109\/CVPR.2018.00781"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01106-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01106-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01106-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T07:50:54Z","timestamp":1760341854000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01106-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,13]]},"references-count":33,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1106"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01106-0","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,13]]},"assertion":[{"value":"30 June 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 October 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"153"}}