{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:32:30Z","timestamp":1772166750352,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T00:00:00Z","timestamp":1752710400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T00:00:00Z","timestamp":1752710400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Karlsruher Institut f\u00fcr Technologie (KIT)"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:sec>\n                    <jats:title>Abstract<\/jats:title>\n                    <jats:p>Large-scale cheminformatics datasets, such as those used in drug discovery and materials science, are often represented as dense similarity graphs; however, their complexity hinders scalable analysis and interpretability. We propose a novel Inverse Link Prediction (ILP) framework, powered by Graph Neural Networks (GNNs), for knowledge-preserving graph sparsification, using Metal\u2013Organic Framework (MOF) datasets as a case study. The framework comprises four key components: (1) Graph Convolutional Networks (GCNs) to predict edge importance based on node features, (2) ILP to compute inverse weights identifying redundant edges, (3) dual-weight analysis to integrate initial similarity weights with GCN-derived weights, and (4) modularity optimization to prune edges while preserving community structures and domain knowledge. 
Validated on MOF similarity graphs, the sparsified graphs maintain structural integrity and support robust performance across both graph-based (GCN, GraphSAGE) and non-graph-based (Gradient Boosting Trees, Logistic Regression, Na\u00efve Bayes, Deep Neural Networks) machine learning models for tasks such as pore limiting diameter prediction. This Inverse Link Prediction with Graph Convolutional Networks (ILP-GCN) framework offers a scalable and interpretable solution for cheminformatics, with broad applications in material discovery and beyond.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Graphical Abstract<\/jats:title>\n                  <\/jats:sec>","DOI":"10.1186\/s40537-025-01220-8","type":"journal-article","created":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T15:42:38Z","timestamp":1752766958000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Inverse link prediction with graph convolutional networks for knowledge-preserving sparsification in cheminformatics"],"prefix":"10.1186","volume":"12","author":[{"given":"Elnaz","family":"Bangian Tabrizi","sequence":"first","affiliation":[]},{"given":"Mehrdad","family":"Jalali","sequence":"additional","affiliation":[]},{"given":"Mahboobeh","family":"Houshmand","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,7,17]]},"reference":[{"issue":"1","key":"1220_CR1","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/s13321-023-00764-2","volume":"15","author":"M Jalali","year":"2023","unstructured":"Jalali M, Wonanke AD, W\u00f6ll C. MOFGalaxyNet: a social network analysis for predicting guest accessibility in metal\u2013organic frameworks utilizing graph convolutional networks. J Cheminform. 
2023;15(1):94.","journal-title":"J Cheminform"},{"issue":"4","key":"1220_CR2","doi-asserted-by":"publisher","first-page":"704","DOI":"10.3390\/nano12040704","volume":"12","author":"M Jalali","year":"2022","unstructured":"Jalali M, Tsotsalas M, W\u00f6ll C. MOFSocialNet: exploiting metal-organic framework relationships via social network analysis. Nanomaterials. 2022;12(4):704.","journal-title":"Nanomaterials"},{"issue":"5","key":"1220_CR3","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1039\/b200393g","volume":"32","author":"SL James","year":"2003","unstructured":"James SL. Metal-organic frameworks. Chem Soc Rev. 2003;32(5):276\u201388.","journal-title":"Chem Soc Rev"},{"key":"1220_CR4","doi-asserted-by":"publisher","first-page":"2413658","DOI":"10.1002\/adma.202413658","volume":"37","author":"L Chai","year":"2025","unstructured":"Chai L, Li R, Sun Y, Zhou K, Pan J. MOF-derived carbon-based materials for energy-related applications. Adv Mater. 2025;37:2413658.","journal-title":"Adv Mater"},{"issue":"1","key":"1220_CR5","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1039\/D4CS00432A","volume":"54","author":"Z Han","year":"2025","unstructured":"Han Z, et al. Development of the design and synthesis of metal\u2013organic frameworks (MOFs)\u2013from large scale attempts, functional oriented modifications, to artificial intelligence (AI) predictions. Chem Soc Rev. 2025;54(1):367\u201395.","journal-title":"Chem Soc Rev"},{"key":"1220_CR6","doi-asserted-by":"publisher","first-page":"117765","DOI":"10.1016\/j.mseb.2024.117765","volume":"311","author":"Q Ma","year":"2025","unstructured":"Ma Q, et al. Computational design of metal-organic frameworks for sustainable energy and environmental applications: bridging theory and experiment. Mater Sci Eng, B. 
2025;311:117765.","journal-title":"Mater Sci Eng, B"},{"issue":"1","key":"1220_CR7","doi-asserted-by":"publisher","first-page":"015020","DOI":"10.1088\/2632-2153\/ad9fcf","volume":"6","author":"Z Yang","year":"2025","unstructured":"Yang Z, Yu Q, Zhan Y, Liu J. Incorporating edge convolution and correlative self-attention into graph neural network for material properties prediction. Mach Learn Sci Technol. 2025;6(1):015020.","journal-title":"Mach Learn Sci Technol"},{"key":"1220_CR8","unstructured":"Borgatti SP, Agneessens F, Johnson JC, Everett MG. Analyzing social networks. 2024."},{"key":"1220_CR9","doi-asserted-by":"crossref","unstructured":"Spielman DA and Teng S-H. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, 2004; pp. 81\u201390.","DOI":"10.1145\/1007352.1007372"},{"issue":"1","key":"1220_CR10","doi-asserted-by":"publisher","first-page":"26677","DOI":"10.1038\/s41598-024-73268-0","volume":"14","author":"A Jahandoost","year":"2024","unstructured":"Jahandoost A, Dashti R, Houshmand M, Hosseini SA. Utilizing machine learning and molecular dynamics for enhanced drug delivery in nanoparticle systems. Sci Rep. 2024;14(1):26677.","journal-title":"Sci Rep"},{"issue":"2","key":"1220_CR11","first-page":"171","volume":"72","author":"CR Groom","year":"2016","unstructured":"Groom CR, Bruno IJ, Lightfoot MP, Ward SC. The Cambridge structural database. Struct Sci. 2016;72(2):171\u20139.","journal-title":"Struct Sci"},{"key":"1220_CR12","doi-asserted-by":"publisher","DOI":"10.1090\/conm\/588","volume-title":"Graph partitioning and graph clustering","author":"DA Bader","year":"2013","unstructured":"Bader DA, Meyerhenke H, Sanders P, Wagner D. Graph partitioning and graph clustering. 
Providence: American Mathematical Society; 2013."},{"key":"1220_CR13","doi-asserted-by":"crossref","unstructured":"Satuluri V, Parthasarathy S, Ruan Y. Local graph sparsification for scalable clustering. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011; pp. 721\u2013732.","DOI":"10.1145\/1989323.1989399"},{"issue":"2","key":"1220_CR14","first-page":"1","volume":"8","author":"NK Ahmed","year":"2013","unstructured":"Ahmed NK, Neville J, Kompella R. Network sampling: from static to streaming graphs. ACM Trans Knowl Discov Data (TKDD). 2013;8(2):1\u201356.","journal-title":"ACM Trans Knowl Discov Data (TKDD)"},{"key":"1220_CR15","doi-asserted-by":"publisher","first-page":"118087","DOI":"10.1016\/j.eswa.2022.118087","volume":"208","author":"Y Shao","year":"2022","unstructured":"Shao Y, Chen L, Chen Y, Liu W. Social influence source locating based on network sparsification and stratification. Expert Syst Appl. 2022;208:118087.","journal-title":"Expert Syst Appl"},{"key":"1220_CR16","doi-asserted-by":"crossref","unstructured":"Wu H-Y and Chen Y-L. Graph sparsification with generative adversarial network. In 2020 IEEE international conference on data mining (ICDM), IEEE. 2020; pp. 1328\u20131333","DOI":"10.1109\/ICDM50108.2020.00172"},{"key":"1220_CR17","doi-asserted-by":"crossref","unstructured":"Chen Y et al. Demystifying graph sparsification algorithms in graph properties preservation. 2023. arXiv preprint arXiv:2311.12314.","DOI":"10.14778\/3632093.3632106"},{"key":"1220_CR18","doi-asserted-by":"crossref","unstructured":"Peng H et al. Towards sparsification of graph neural networks. In 2022 IEEE 40th International Conference on Computer Design (ICCD), IEEE. 2022; pp. 272\u2013279","DOI":"10.1109\/ICCD56317.2022.00048"},{"key":"1220_CR19","doi-asserted-by":"crossref","unstructured":"Aghdaei A and Feng Z. inGRASS: incremental graph spectral sparsification via low-resistance-diameter decomposition, 2024. 
arXiv preprint arXiv:2402.16990.","DOI":"10.1145\/3649329.3656520"},{"issue":"3","key":"1220_CR20","doi-asserted-by":"publisher","first-page":"427","DOI":"10.14778\/3632093.3632106","volume":"17","author":"Y Chen","year":"2023","unstructured":"Chen Y, et al. Demystifying graph sparsification algorithms in graph properties preservation. Proc VLDB Endowment. 2023;17(3):427\u201340.","journal-title":"Proc VLDB Endowment"},{"issue":"1","key":"1220_CR21","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1007\/s11235-024-01240-4","volume":"88","author":"P Han-huai","year":"2025","unstructured":"Han-huai P, Lin-wei W, Hao L, Abdollahi M. Identifying influential nodes in complex networks: a semi-local centrality measure based on augmented graph and average shortest path theory. Telecommun Syst. 2025;88(1):25.","journal-title":"Telecommun Syst"},{"key":"1220_CR22","doi-asserted-by":"publisher","first-page":"102473","DOI":"10.1016\/j.jocs.2024.102473","volume":"84","author":"S Esfandiari","year":"2025","unstructured":"Esfandiari S, Moosavi MR. Identifying influential nodes in complex networks through the k-shell index and neighborhood information. J Comput Sci. 2025;84:102473.","journal-title":"J Comput Sci"},{"key":"1220_CR23","doi-asserted-by":"crossref","unstructured":"Ruan Y, Liu S, Tang J, Guo Y, Yu T. GLC: a dual-perspective approach for identifying influential nodes in complex networks. Expert Systems with Applications, 2024; p. 126292","DOI":"10.1016\/j.eswa.2024.126292"},{"issue":"4","key":"1220_CR24","doi-asserted-by":"publisher","first-page":"150","DOI":"10.3390\/fi17040150","volume":"17","author":"D Ahmadzadeh","year":"2025","unstructured":"Ahmadzadeh D, Jalali M, Ghaemi R, Kheirabadi M. GraphDBSCAN: optimized DBSCAN for noise-resistant community detection in graph clustering. Future Internet. 
2025;17(4):150.","journal-title":"Future Internet"},{"issue":"4","key":"1220_CR25","doi-asserted-by":"publisher","first-page":"1951","DOI":"10.1109\/TFUZZ.2023.3338565","volume":"32","author":"Y Yang","year":"2023","unstructured":"Yang Y, et al. Fuzzy-based deep attributed graph clustering. IEEE Trans Fuzzy Syst. 2023;32(4):1951\u201364.","journal-title":"IEEE Trans Fuzzy Syst"},{"key":"1220_CR26","doi-asserted-by":"crossref","unstructured":"Yang Y, Li G, Li D, Zhang J, Hu P, Hu L. Integrating fuzzy clustering and graph convolution network to accurately identify clusters from attributed graph. IEEE Transactions on Network Science and Engineering, 2024.","DOI":"10.1109\/TNSE.2024.3524077"},{"issue":"12","key":"1220_CR27","doi-asserted-by":"publisher","first-page":"2435","DOI":"10.1109\/TKDE.2018.2819651","volume":"30","author":"P Parchas","year":"2018","unstructured":"Parchas P, Papailiou N, Papadias D, Bonchi F. Uncertain graph sparsification. IEEE Trans Knowl Data Eng. 2018;30(12):2435\u201349.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1220_CR28","doi-asserted-by":"publisher","first-page":"3651","DOI":"10.1109\/TSP.2023.3300632","volume":"71","author":"S Rey","year":"2023","unstructured":"Rey S, Tenorio VM, Marqu\u00e9s AG. Robust graph filter identification and graph denoising from signal observations. IEEE Trans Signal Process. 2023;71:3651\u201366.","journal-title":"IEEE Trans Signal Process"},{"key":"1220_CR29","unstructured":"Zhang M and Chen Y. Link prediction based on graph neural networks. Advances in neural information processing systems, 2018; vol. 31"},{"issue":"6","key":"1220_CR30","doi-asserted-by":"publisher","first-page":"1150","DOI":"10.1016\/j.physa.2010.11.027","volume":"390","author":"L L\u00fc","year":"2011","unstructured":"L\u00fc L, Zhou T. Link prediction in complex networks: a survey. Phys A Stat Mech Appl. 
2011;390(6):1150\u201370.","journal-title":"Phys A Stat Mech Appl"},{"key":"1220_CR31","doi-asserted-by":"publisher","first-page":"124289","DOI":"10.1016\/j.physa.2020.124289","volume":"553","author":"A Kumar","year":"2020","unstructured":"Kumar A, Singh SS, Singh K, Biswas B. Link prediction techniques, applications, and performance: a survey. Physica A Stat Mech Appl. 2020;553:124289.","journal-title":"Physica A Stat Mech Appl"},{"issue":"3","key":"1220_CR32","doi-asserted-by":"publisher","first-page":"3902","DOI":"10.1007\/s11227-023-05591-8","volume":"80","author":"D Arrar","year":"2024","unstructured":"Arrar D, Kamel N, Lakhfif A. A comprehensive survey of link prediction methods. J Supercomput. 2024;80(3):3902\u201342.","journal-title":"J Supercomput"},{"issue":"9","key":"1220_CR33","doi-asserted-by":"publisher","DOI":"10.1002\/anie.202114573","volume":"61","author":"R P\u00e9tuya","year":"2022","unstructured":"P\u00e9tuya R, et al. Machine-learning prediction of metal-organic framework guest accessibility from linker and metal chemistry. Angew Chem Int Ed. 2022;61(9): e202114573.","journal-title":"Angew Chem Int Ed"},{"issue":"1","key":"1220_CR34","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1021\/ci00057a005","volume":"28","author":"D Weininger","year":"1988","unstructured":"Weininger D. SMILES, a chemical language and information system. 1. introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31\u20136.","journal-title":"J Chem Inf Comput Sci"},{"key":"1220_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13321-015-0069-3","volume":"7","author":"D Bajusz","year":"2015","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K. Why is tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform. 
2015;7:1\u201313.","journal-title":"J Cheminform"},{"issue":"1","key":"1220_CR36","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","volume":"20","author":"J Cohen","year":"1960","unstructured":"Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Measur. 1960;20(1):37\u201346.","journal-title":"Educ Psychol Measur"},{"issue":"7","key":"1220_CR37","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0022557","volume":"6","author":"J Yang","year":"2011","unstructured":"Yang J, Chen Y. Fast computing betweenness centrality with virtual nodes on large sparse networks. PLoS ONE. 2011;6(7): e22557.","journal-title":"PLoS ONE"},{"issue":"5","key":"1220_CR38","doi-asserted-by":"publisher","first-page":"052808","DOI":"10.1103\/PhysRevE.90.052808","volume":"90","author":"T Martin","year":"2014","unstructured":"Martin T, Zhang X, Newman ME. Localization and centrality in networks. Phys Rev E. 2014;90(5):052808.","journal-title":"Phys Rev E"},{"key":"1220_CR39","doi-asserted-by":"crossref","unstructured":"Zhang J and Luo Y. Degree centrality, betweenness centrality, and closeness centrality in social network. in 2017 2nd international conference on modelling, simulation and applied mathematics (MSAM2017), 2017, pp. 300\u2013303: Atlantis press.","DOI":"10.2991\/msam-17.2017.68"},{"issue":"1","key":"1220_CR40","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/0378-8733(94)00248-9","volume":"17","author":"P Hage","year":"1995","unstructured":"Hage P, Harary F. Eccentricity and centrality in networks. Soc Netw. 1995;17(1):57\u201363.","journal-title":"Soc Netw"},{"issue":"4","key":"1220_CR41","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1016\/j.socnet.2007.04.002","volume":"29","author":"P Bonacich","year":"2007","unstructured":"Bonacich P. Some unique properties of eigenvector centrality. Soc Netw. 
2007;29(4):555\u201364.","journal-title":"Soc Netw"},{"key":"1220_CR42","doi-asserted-by":"publisher","first-page":"121360","DOI":"10.1016\/j.ins.2024.121360","volume":"686","author":"B-W Zhao","year":"2025","unstructured":"Zhao B-W, et al. Regulation-aware graph learning for drug repositioning over heterogeneous biological network. Inf Sci. 2025;686:121360.","journal-title":"Inf Sci"},{"key":"1220_CR43","doi-asserted-by":"crossref","unstructured":"Su X et al. Knowledge graph neural network with spatial-aware capsule for drug-drug interaction prediction. IEEE journal of biomedical and health informatics, 2024.","DOI":"10.1109\/JBHI.2024.3419015"},{"issue":"2","key":"1220_CR44","doi-asserted-by":"publisher","first-page":"1606","DOI":"10.1109\/TCBB.2022.3196336","volume":"20","author":"X Wang","year":"2022","unstructured":"Wang X, et al. PPISB: a novel network-based algorithm of predicting protein-protein interactions with mixed membership stochastic blockmodel. IEEE\/ACM Trans Comput Biol Bioinf. 2022;20(2):1606\u201312.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"issue":"1","key":"1220_CR45","doi-asserted-by":"publisher","first-page":"761","DOI":"10.1007\/s11227-023-05513-8","volume":"80","author":"E Dalirinia","year":"2024","unstructured":"Dalirinia E, Jalali M, Yaghoobi M, Tabatabaee H. Lotus effect optimization algorithm (LEA): a lotus nature-inspired algorithm for engineering design optimization. J Supercomput. 2024;80(1):761\u201399.","journal-title":"J Supercomput"},{"issue":"4","key":"1220_CR46","doi-asserted-by":"publisher","DOI":"10.1002\/eng2.70137","volume":"7","author":"E Dalirinia","year":"2025","unstructured":"Dalirinia E, Yaghoobi M, Tabatabaee H, Chandna S, Jalali M. Multimodal lotus effect algorithm for engineering optimization problems. Eng Rep. 
2025;7(4): e70137.","journal-title":"Eng Rep"}],"updated-by":[{"DOI":"10.1186\/s40537-025-01263-x","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T00:00:00Z","timestamp":1755648000000}}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01220-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01220-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01220-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,7]],"date-time":"2025-09-07T13:36:46Z","timestamp":1757252206000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01220-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,17]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1220"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01220-8","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-5854851\/v1","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,17]]},"assertion":[{"value":"18 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 June 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 July 2025","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 August 2025","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1186\/s40537-025-01263-x","URL":"https:\/\/doi.org\/10.1186\/s40537-025-01263-x","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval\u00a0and consent to participate"}},{"value":"The authors declare no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"176"}}