{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T10:10:14Z","timestamp":1742983814095,"version":"3.40.3"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T00:00:00Z","timestamp":1742947200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T00:00:00Z","timestamp":1742947200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["22H00532"],"award-info":[{"award-number":["22H00532"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science, Japan","doi-asserted-by":"crossref","award":["22K19830"],"award-info":[{"award-number":["22K19830"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Aqueous solubility\u00a0(AS) is a key physiochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR), and mixed integer linear programming\u00a0(MILP). Selected descriptors based on a forward stepwise procedure enabled the simplest regression model, MLR, to achieve significantly good prediction accuracy compared to the existing approaches, achieving accuracy in the range [0.7191, 0.9377] for 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time range [6,\u00a01166] seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive, and therefore difficult to use for inference. An implementation of the proposed approach is available at\u00a0<jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/ku-dml\/mol-infer\/tree\/master\/AqSol\" ext-link-type=\"uri\">https:\/\/github.com\/ku-dml\/mol-infer\/tree\/master\/AqSol<\/jats:ext-link>.  <\/jats:p>","DOI":"10.1186\/s13321-025-00966-w","type":"journal-article","created":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T09:48:32Z","timestamp":1742982512000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A unified approach to inferring chemical compounds with the desired aqueous solubility"],"prefix":"10.1186","volume":"17","author":[{"given":"Muniba","family":"Batool","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7941-3419","authenticated-orcid":false,"given":"Naveed Ahmed","family":"Azam","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1287-9572","authenticated-orcid":false,"given":"Jianshen","family":"Zhu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2479-3135","authenticated-orcid":false,"given":"Kazuya","family":"Haraguchi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0869-7896","authenticated-orcid":false,"given":"Liang","family":"Zhao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9763-797X","authenticated-orcid":false,"given":"Tatsuya","family":"Akutsu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,26]]},"reference":[{"issue":"2","key":"966_CR1","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1021\/acs.jcim.5b00628","volume":"56","author":"T Miyao","year":"2016","unstructured":"Miyao T, Kaneko H, Funatsu K (2016) Inverse QSPR\/QSAR analysis for chemical structure generation (from y to x). J Chem Inf Model 56(2):286\u2013299","journal-title":"J Chem Inf Model"},{"issue":"1","key":"966_CR2","doi-asserted-by":"crossref","first-page":"5753","DOI":"10.1038\/s41467-020-19594-z","volume":"11","author":"S Boobier","year":"2020","unstructured":"Boobier S, Hose DR, Blacker AJ, Nguyen BN (2020) Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat Commun 11(1):5753","journal-title":"Nat Commun"},{"issue":"1","key":"966_CR3","doi-asserted-by":"crossref","first-page":"150","DOI":"10.1021\/ci060164k","volume":"47","author":"DS Palmer","year":"2007","unstructured":"Palmer DS, O\u2019Boyle NM, Glen RC, Mitchell JB (2007) Random forest models to predict aqueous solubility. J Chem Inf Model 47(1):150","journal-title":"J Chem Inf Model"},{"issue":"8","key":"966_CR4","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1080\/1062936X.2017.1368704","volume":"28","author":"OA Raevsky","year":"2017","unstructured":"Raevsky OA, Grigorev VY, Polianczyk DE, Raevskaja OE, Dearden JC (2017) Six global and local QSPR models of aqueous solubility at pH= 7.4 based on structural similarity and physicochemical descriptors. SAR QSAR Environ Res 28(8):661\u2013676","journal-title":"SAR QSAR Environ Res"},{"issue":"3","key":"966_CR5","doi-asserted-by":"crossref","first-page":"465","DOI":"10.1021\/acs.chemrestox.2c00379","volume":"36","author":"CN Lowe","year":"2023","unstructured":"Lowe CN, Charest N, Ramsland C, Chang DT, Martin TM, Williams AJ (2023) Transparency in modeling through careful application of OECD\u2019s QSAR\/QSPR principles via a curated water solubility data set. Chem Res Toxicol 36(3):465\u2013478","journal-title":"Chem Res Toxicol"},{"issue":"7\u20138","key":"966_CR6","volume":"35","author":"M Lovri\u0107","year":"2021","unstructured":"Lovri\u0107 M, Pavlovi\u0107 K, \u017duvela P, Spataru A, Lu\u010di\u0107 B, Kern R, Wong MW (2021) Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? J Chemom 35(7\u20138):e3349","journal-title":"J Chemom"},{"issue":"1","key":"966_CR7","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1186\/s13321-023-00752-6","volume":"15","author":"A Tayyebi","year":"2023","unstructured":"Tayyebi A, Alshami AS, Rabiei Z, Yu X, Ismail N, Talukder MJ, Power J (2023) Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models. J Cheminform 15(1):99","journal-title":"J Cheminform"},{"issue":"3","key":"966_CR8","doi-asserted-by":"crossref","first-page":"571","DOI":"10.1021\/ci800406y","volume":"49","author":"J Wang","year":"2009","unstructured":"Wang J, Hou T, Xu X (2009) Aqueous solubility prediction based on weighted atom type counts and solvent accessible surface areas. J Chem Inf Model 49(3):571\u2013581","journal-title":"J Chem Inf Model"},{"key":"966_CR9","doi-asserted-by":"crossref","DOI":"10.1016\/j.jmgm.2021.107901","volume":"106","author":"N Meftahi","year":"2021","unstructured":"Meftahi N, Walker ML, Smith BJ (2021) Predicting aqueous solubility by QSPR modeling. J Mol Graph Model 106:107901","journal-title":"J Mol Graph Model"},{"issue":"9","key":"966_CR10","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1002\/cem.1321","volume":"24","author":"DS Cao","year":"2010","unstructured":"Cao DS, Xu QS, Liang YZ, Chen X, Li HD (2010) Prediction of aqueous solubility of druglike organic compounds using partial least squares, back-propagation network and support vector machine. J Chemom 24(9):584\u2013595","journal-title":"J Chemom"},{"issue":"2","key":"966_CR11","doi-asserted-by":"crossref","DOI":"10.1080\/00268976.2019.1600754","volume":"118","author":"T Deng","year":"2020","unstructured":"Deng T, Jia GZ (2020) Prediction of aqueous solubility of compounds based on neural network. Mol Phys 118(2):e1600754","journal-title":"Mol Phys"},{"key":"966_CR12","unstructured":"Panapitiya G, Girard M, Hollas A, Murugesan V, Wang W, Saldanha E (2021) Predicting aqueous solubility of organic molecules using deep learning models with varied molecular representations. arXiv preprint. https:\/\/arxiv.org\/abs\/2105.12638"},{"issue":"5","key":"966_CR13","doi-asserted-by":"crossref","first-page":"1668","DOI":"10.3390\/molecules27051668","volume":"27","author":"Y Hou","year":"2022","unstructured":"Hou Y, Wang S, Bai B, Chan HS, Yuan S (2022) Accurate physical property predictions via deep learning. Molecules 27(5):1668","journal-title":"Molecules"},{"issue":"6","key":"966_CR14","doi-asserted-by":"crossref","first-page":"2530","DOI":"10.1021\/acs.jcim.1c00331","volume":"61","author":"PG Francoeur","year":"2021","unstructured":"Francoeur PG, Koes DR (2021) SolTranNet-A machine learning tool for fast aqueous solubility prediction. J Chem Inf Model 61(6):2530\u20132536","journal-title":"J Chem Inf Model"},{"issue":"4","key":"966_CR15","doi-asserted-by":"crossref","first-page":"1099","DOI":"10.1021\/acs.jcim.2c01189","volume":"63","author":"JG Conn","year":"2023","unstructured":"Conn JG, Carter JW, Conn JJ, Subramanian V, Baxter A, Engkvist O, Llinas A, Ratkova EL, Pickett SD, McDonagh JL, Palmer DS (2023) Blinded predictions and post hoc analysis of the second solubility challenge data: exploring training data and feature set selection for machine and deep learning models. J Chem Inf Model 63(4):1099\u20131113","journal-title":"J Chem Inf Model"},{"issue":"46","key":"966_CR16","doi-asserted-by":"crossref","first-page":"42027","DOI":"10.1021\/acsomega.2c03885","volume":"7","author":"M Li","year":"2022","unstructured":"Li M, Chen H, Zhang H, Zeng M, Chen B, Guan L (2022) Prediction of the aqueous solubility of compounds based on light gradient boosting machines with molecular fingerprints and the cuckoo search algorithm. ACS Omega 7(46):42027\u201342035","journal-title":"ACS Omega"},{"key":"966_CR17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-020-0414-z","volume":"12","author":"B Tang","year":"2020","unstructured":"Tang B, Kramer ST, Fang M, Qiu Y, Wu Z, Xu D (2020) A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility. J Cheminform 12:1\u20139","journal-title":"J Cheminform"},{"issue":"1","key":"966_CR18","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/s13015-021-00197-2","volume":"16","author":"NA Azam","year":"2021","unstructured":"Azam NA, Zhu J, Sun Y, Shi Y, Shurbevski A, Zhao L, Nagamochi H, Akutsu T (2021) A novel method for inference of acyclic chemical compounds with bounded branch-height based on artificial neural networks and integer programming. Algorithms Mol Biol 16(1):18","journal-title":"Algorithms Mol Biol"},{"issue":"6","key":"966_CR19","doi-asserted-by":"crossref","first-page":"2847","DOI":"10.3390\/ijms22062847","volume":"22","author":"Y Shi","year":"2021","unstructured":"Shi Y, Zhu J, Azam NA, Haraguchi K, Zhao L, Nagamochi H, Akutsu T (2021) An inverse QSAR method based on a two-layered model and integer programming. Int J Mol Sci 22(6):2847","journal-title":"Int J Mol Sci"},{"key":"966_CR20","doi-asserted-by":"crossref","unstructured":"Zhu J, Azam NA, Haraguchi K, Zhao L, Nagamochi H, Akutsu T (2022) A method for molecular design based on linear regression and integer programming. In Proceedings of the 2022 12th International Conference on Bioscience, Biochemistry and Bioinformatics (pp. 21-28)","DOI":"10.1145\/3510427.3510431"},{"key":"966_CR21","doi-asserted-by":"publisher","DOI":"10.1109\/TCBB.2024.3447780","author":"R Ido","year":"2024","unstructured":"Ido R, Cao S, Zhu J, Azam NA, Haraguchi K, Zhao L, Nagamochi H, Akutsu T (2024) A method for inferring polymers based on linear regression and integer programming. IEEE\/ACM Trans Comput Bioinform. https:\/\/doi.org\/10.1109\/TCBB.2024.3447780","journal-title":"IEEE\/ACM Trans Comput Bioinform"},{"key":"966_CR22","doi-asserted-by":"crossref","DOI":"10.1002\/9781118625590","volume-title":"Applied regression analysis","author":"NR Draper","year":"1998","unstructured":"Draper NR, Smith H (1998) Applied regression analysis, vol 326. John Wiley & Sons, Hoboken"},{"key":"966_CR23","doi-asserted-by":"crossref","unstructured":"Azam NA, Zhu J, Haraguchi K, Zhao L, Nagamochi H, Akutsu T (2021) Molecular design based on artificial neural networks, integer programming and grid neighbor search. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 360-363)","DOI":"10.1109\/BIBM52615.2021.9669710"},{"issue":"10","key":"966_CR24","doi-asserted-by":"crossref","first-page":"2948","DOI":"10.1021\/acs.jcim.3c00308","volume":"63","author":"X Zhu","year":"2023","unstructured":"Zhu X, Polyakov VR, Bajjuri K, Hu H, Maderna A, Tovee CA, Ward SC (2023) Building machine learning small molecule melting points and solubility models using CCDC melting points dataset. J Chem Inf Model 63(10):2948\u20132959","journal-title":"J Chem Inf Model"},{"issue":"4","key":"966_CR25","doi-asserted-by":"crossref","first-page":"1477","DOI":"10.1021\/ci049909h","volume":"44","author":"CA Bergstr\u00f6m","year":"2004","unstructured":"Bergstr\u00f6m CA, Wassvik CM, Norinder U, Luthman K, Artursson P (2004) Global and local computational models for aqueous solubility prediction of drug-like molecules. J Chem Inf Comput Sci 44(4):1477\u20131488","journal-title":"J Chem Inf Comput Sci"},{"issue":"6\u20137","key":"966_CR26","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1002\/minf.201400144","volume":"34","author":"OA Raevsky","year":"2015","unstructured":"Raevsky OA, Polianczyk DE, Grigorev VY, Raevskaja OE, Dearden JC (2015) In silico prediction of aqueous solubility: a comparative study of local and global predictive models. Mol Inf 34(6\u20137):417\u2013430","journal-title":"Mol Inf"},{"issue":"1","key":"966_CR27","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/s41597-022-01154-3","volume":"9","author":"J Meng","year":"2022","unstructured":"Meng J, Chen P, Wahib M, Yang M, Zheng L, Wei Y, Feng S, Liu W (2022) Boosting the predictive performance with aqueous solubility dataset curation. Sci Data 9(1):71","journal-title":"Sci Data"},{"issue":"15","key":"966_CR28","first-page":"6322","volume":"62","author":"A Soyemi","year":"2023","unstructured":"Soyemi A, Szilv\u00e1si T (2023) Calculated physicochemical properties of glycerol-derived solvents to drive plastic waste recycling. Ind Eng Chem Res 62(15):6322\u20136337","journal-title":"Ind Eng Chem Res"},{"key":"966_CR29","unstructured":"PubChem database (2024) https:\/\/pubchem.ncbi.nlm.nih.gov\/"},{"issue":"1","key":"966_CR30","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","volume":"58","author":"R Tibshirani","year":"1996","unstructured":"Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Stat Methodol 58(1):267\u2013288","journal-title":"J R Stat Soc Ser B Stat Methodol"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00966-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-00966-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-00966-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T09:49:55Z","timestamp":1742982595000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-00966-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,26]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["966"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-00966-w","relation":{},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,26]]},"assertion":[{"value":"22 October 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"37"}}