{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:28:55Z","timestamp":1772166535017,"version":"3.50.1"},"reference-count":23,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T00:00:00Z","timestamp":1724889600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T00:00:00Z","timestamp":1724889600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Ion Mobility coupled with Mass Spectrometry (IM-MS) is a promising analytical technique that enhances molecular characterization by measuring collision cross-section (CCS) values, which are indicative of the molecular size and shape. However, the effective application of CCS values in structural analysis is still constrained by the limited availability of experimental data, necessitating the development of accurate machine learning (ML) models for in silico predictions. In this study, we evaluated state-of-the-art Graph Neural Networks (GNNs), trained to predict CCS values using the largest publicly available dataset to date. Although our results confirm the high accuracy of these models within chemical spaces similar to their training environments, their performance significantly declines when applied to structurally novel regions. This discrepancy raises concerns about the reliability of in silico CCS predictions and underscores the need for releasing further publicly available CCS datasets. To mitigate this, we introduce Mol2CCS which demonstrates how generalization can be partially improved by extending models to account for additional features such as molecular fingerprints, descriptors, and the molecule types. Lastly, we also show how confidence models can support by enhancing the reliability of the CCS estimates.<\/jats:p>\n                  <jats:p>\n                    <jats:bold>Scientific contribution<\/jats:bold>\n                  <\/jats:p>\n                  <jats:p>We have benchmarked state-of-the-art graph neural networks for predicting collision cross section. Our work highlights the accuracy of these models when trained and predicted in similar chemical spaces, but also how their accuracy drops when evaluated in structurally novel regions. Lastly, we conclude by presenting potential approaches to mitigate this issue.<\/jats:p>","DOI":"10.1186\/s13321-024-00899-w","type":"journal-article","created":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T10:02:43Z","timestamp":1724925763000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Evaluating the generalizability of graph neural networks for predicting collision cross section"],"prefix":"10.1186","volume":"16","author":[{"given":"Chloe","family":"Engler Hart","sequence":"first","affiliation":[]},{"given":"Ant\u00f3nio Jos\u00e9","family":"Preto","sequence":"additional","affiliation":[]},{"given":"Shaurya","family":"Chanana","sequence":"additional","affiliation":[]},{"given":"David","family":"Healey","sequence":"additional","affiliation":[]},{"given":"Tobias","family":"Kind","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Domingo-Fern\u00e1ndez","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,8,29]]},"reference":[{"issue":"12","key":"899_CR1","doi-asserted-by":"publisher","first-page":"1836","DOI":"10.1038\/s41592-023-02078-5","volume":"20","author":"ES Baker","year":"2023","unstructured":"Baker ES, Hoang C, Uritboonthai W, Heyman HM, Pratt B, MacCoss M et al (2023) METLIN-CCS: an ion mobility spectrometry collision cross section database. Nat Methods 20(12):1836\u20131837. https:\/\/doi.org\/10.1038\/s41592-023-02078-5","journal-title":"Nat Methods"},{"key":"899_CR2","doi-asserted-by":"publisher","DOI":"10.1038\/s42255-024-01058-z","author":"ES Baker","year":"2024","unstructured":"Baker ES, Uritboonthai W, Aisporna A, Hoang C, Heyman HM, Connell L et al (2024) METLIN-CCS lipid database: an authentic standards resource for lipid classification and identification. Nat Metab. https:\/\/doi.org\/10.1038\/s42255-024-01058-z","journal-title":"Nat Metab"},{"issue":"15","key":"899_CR3","doi-asserted-by":"publisher","first-page":"2887","DOI":"10.1021\/jm9602928","volume":"39","author":"GW Bemis","year":"1996","unstructured":"Bemis GW, Murcko MA (1996) The properties of known drugs. 1. molecular frameworks. J Med Chem 39(15):2887\u20132893","journal-title":"J Med Chem"},{"issue":"5","key":"899_CR4","doi-asserted-by":"publisher","first-page":"750","DOI":"10.1021\/jasms.1c00315","volume":"33","author":"S Das","year":"2022","unstructured":"Das S, Tanemura KA, Dinpazhoh L, Keng M, Schumm C, Leahy L et al (2022) In silico collision cross section calculations to aid metabolite annotation. J Am Soc Mass Spectrom 33(5):750\u2013759. https:\/\/doi.org\/10.1021\/jasms.1c00315","journal-title":"J Am Soc Mass Spectrom"},{"issue":"7","key":"899_CR5","doi-asserted-by":"publisher","first-page":"1762","DOI":"10.1021\/ci9000579","volume":"49","author":"H Dragos","year":"2009","unstructured":"Dragos H, Gilles M, Alexandre V (2009) Predicting the predictability: a unified approach to the applicability domain problem of QSAR models. J Chem Inf Model 49(7):1762\u20131776. https:\/\/doi.org\/10.1021\/ci9000579","journal-title":"J Chem Inf Model"},{"issue":"1","key":"899_CR6","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1038\/s42004-023-00939-w","volume":"6","author":"R Guo","year":"2023","unstructured":"Guo R, Zhang Y, Liao Y, Yang Q, Xie T, Fan X et al (2023) Highly accurate and large-scale collision cross sections prediction with graph neural networks. Commun Chem 6(1):139. https:\/\/doi.org\/10.1038\/s42004-023-00939-w","journal-title":"Commun Chem"},{"issue":"1","key":"899_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1002\/jms.1383","volume":"43","author":"AB Kanu","year":"2008","unstructured":"Kanu AB, Dwivedi P, Tam M, Matz L, Hill HH Jr (2008) Ion mobility\u2013mass spectrometry. J Mass Spectrom 43(1):1\u201322. https:\/\/doi.org\/10.1002\/jms.1383","journal-title":"J Mass Spectrom"},{"key":"899_CR8","doi-asserted-by":"publisher","unstructured":"Landrum G. (2016). RDKit: open-source cheminformatics, http:\/\/www.rdkit.org\/. https:\/\/doi.org\/10.5281\/zenodo.7415128","DOI":"10.5281\/zenodo.7415128"},{"issue":"10","key":"899_CR9","doi-asserted-by":"publisher","first-page":"4050","DOI":"10.3390\/molecules28104050","volume":"28","author":"X Li","year":"2023","unstructured":"Li X, Wang H, Jiang M, Ding M, Xu X, Xu B et al (2023) Collision cross section prediction based on machine learning. Molecules 28(10):4050. https:\/\/doi.org\/10.3390\/molecules28104050","journal-title":"Molecules"},{"issue":"11","key":"899_CR10","doi-asserted-by":"publisher","first-page":"2756","DOI":"10.3390\/molecules23112756","volume":"23","author":"I Luque Ruiz","year":"2018","unstructured":"Luque Ruiz I, G\u00f3mez-Nieto M\u00c1 (2018) Study of the applicability domain of the QSAR classification models by means of the rivality and modelability indexes. Molecules 23(11):2756. https:\/\/doi.org\/10.3390\/molecules23112756","journal-title":"Molecules"},{"issue":"12","key":"899_CR11","doi-asserted-by":"publisher","first-page":"1700076","DOI":"10.1002\/minf.201700076","volume":"36","author":"S Ochi","year":"2017","unstructured":"Ochi S, Miyao T, Funatsu K (2017) Structure modification toward applicability domain of a QSAR\/QSPR model considering activity\/property. Mol Inf 36(12):1700076. https:\/\/doi.org\/10.1002\/minf.201700076","journal-title":"Mol Inf"},{"key":"899_CR12","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M et al (2011) Scikit-learn: machine learning in python. J Machine Learn Res 12:2825\u20132830","journal-title":"J Machine Learn Res"},{"issue":"4","key":"899_CR13","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1039\/C8SC04396E","volume":"10","author":"JA Picache","year":"2019","unstructured":"Picache JA, Rose BS, Balinski A, Leaptrot KL, Sherrod SD, May JC, McLean JA (2019) Collision cross section compendium to annotate and predict multi-omic compound identities. Chem Sci 10(4):983\u2013993. https:\/\/doi.org\/10.1039\/C8SC04396E","journal-title":"Chem Sci"},{"issue":"8","key":"899_CR14","doi-asserted-by":"publisher","first-page":"5191","DOI":"10.1021\/acs.analchem.8b05821","volume":"91","author":"PL Plante","year":"2019","unstructured":"Plante PL, Francovic-Fontaine \u00c9, May JC, McLean JA, Baker ES, Laviolette F et al (2019) Predicting ion mobility collision cross-sections using a deep neural network: DeepCCS. Anal Chem 91(8):5191\u20135199. https:\/\/doi.org\/10.1021\/acs.analchem.8b05821","journal-title":"Anal Chem"},{"issue":"1","key":"899_CR15","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1186\/s13321-022-00649-w","volume":"14","author":"AJ Preto","year":"2022","unstructured":"Preto AJ, Correia PC, Moreira IS (2022) DrugTax: package for drug taxonomy identification and explainable feature extraction. J Cheminform 14(1):73. https:\/\/doi.org\/10.1186\/s13321-022-00649-w","journal-title":"J Cheminform"},{"issue":"50","key":"899_CR16","doi-asserted-by":"publisher","first-page":"17456","DOI":"10.1021\/acs.analchem.2c03491","volume":"94","author":"MA Rainey","year":"2022","unstructured":"Rainey MA, Watson CA, Asef CK, Foster MR, Baker ES, Fern\u00e1ndez FM (2022) CCS Predictor 2.0: an open-source jupyter notebook tool for filtering out false positives in metabolomics. Anal Chem 94(50):17456\u201317466. https:\/\/doi.org\/10.1021\/acs.analchem.2c03491","journal-title":"Anal Chem"},{"issue":"6","key":"899_CR17","doi-asserted-by":"publisher","first-page":"4548","DOI":"10.1021\/acs.analchem.9b05772","volume":"92","author":"DH Ross","year":"2020","unstructured":"Ross DH, Cho JH, Xu L (2020) Breaking down structural diversity for comprehensive prediction of ion-neutral collision cross sections. Anal Chem 92(6):4548\u20134557. https:\/\/doi.org\/10.1021\/acs.analchem.9b05772","journal-title":"Anal Chem"},{"key":"899_CR18","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.chemolab.2015.04.013","volume":"145","author":"K Roy","year":"2015","unstructured":"Roy K, Kar S, Ambure P (2015) On a simple approach for determining applicability domain of QSAR models. Chemom Intell Lab Syst 145:22\u201329. https:\/\/doi.org\/10.1016\/j.chemolab.2015.04.013","journal-title":"Chemom Intell Lab Syst"},{"key":"899_CR19","doi-asserted-by":"crossref","unstructured":"Simonovsky M, Komodakis N. (2017). Dynamic edge-conditioned filters in convolutional neural networks on graphs. Proceedings of the IEEE conference on computer vision and pattern recognition. 3693\u20133702","DOI":"10.1109\/CVPR.2017.11"},{"key":"899_CR20","doi-asserted-by":"publisher","first-page":"503","DOI":"10.1007\/s00216-020-03019-3","volume":"413","author":"T Stricker","year":"2021","unstructured":"Stricker T, Bonner R, Lisacek F, Hopfgartner G (2021) Adduct annotation in liquid chromatography\/high-resolution mass spectrometry to enhance compound identification. Anal Bioanal Chem 413:503\u2013517. https:\/\/doi.org\/10.1007\/s00216-020-03019-3","journal-title":"Anal Bioanal Chem"},{"key":"899_CR21","unstructured":"Xie, T., Yang, Q., Sun, J., Zhang, H., Wang, Y., and Lu, H. Large-scale prediction of collision cross-section with graph convolutional network for compound identification."},{"key":"899_CR22","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae084","author":"J Xue","year":"2024","unstructured":"Xue J, Wang B, Ji H, Li W (2024) RT-transformer: retention time prediction for metabolite annotation to assist in metabolite identification. Bioinformatics. https:\/\/doi.org\/10.1093\/bioinformatics\/btae084","journal-title":"Bioinformatics"},{"issue":"37","key":"899_CR23","doi-asserted-by":"publisher","first-page":"13913","DOI":"10.1021\/acs.analchem.3c02267","volume":"95","author":"H Zhang","year":"2023","unstructured":"Zhang H, Luo M, Wang H, Ren F, Yin Y, Zhu ZJ (2023) AllCCS2: curation of ion mobility collision cross-section atlas for small molecules using comprehensive molecular representations. Anal Chem 95(37):13913\u201313921. https:\/\/doi.org\/10.1021\/acs.analchem.3c02267","journal-title":"Anal Chem"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00899-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-024-00899-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-024-00899-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T10:05:34Z","timestamp":1724925934000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-024-00899-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,29]]},"references-count":23,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["899"],"URL":"https:\/\/doi.org\/10.1186\/s13321-024-00899-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2024-32j2t","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,29]]},"assertion":[{"value":"13 May 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 August 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 August 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors were employees of Enveda Biosciences Inc. during the course of this work and have real or potential ownership interest in the company.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"105"}}