{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,13]],"date-time":"2026-02-13T23:55:42Z","timestamp":1771026942371,"version":"3.50.1"},"reference-count":51,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,12,25]],"date-time":"2022-12-25T00:00:00Z","timestamp":1671926400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,25]],"date-time":"2022-12-25T00:00:00Z","timestamp":1671926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We present a novel hypergraph-based framework enabling an assessment of the importance of binary classification data elements. Specifically, we apply the hypergraph model to rate data samples\u2019 and categorical feature values\u2019 relevance to classification labels. The proposed Hypergraph-based Importance ratings are theoretically grounded on the hypergraph cut conductance minimization concept. As a result of using hypergraph representation, which is a lossless representation from the perspective of higher-order relationships in data, our approach allows for more precise exploitation of the information on feature and sample coincidences. The solution was tested using two scenarios: undersampling for imbalanced classification data and feature selection. 
The experimentation results have proven the good quality of the new approach when compared with other state-of-the-art and baseline methods for both scenarios measured using the average precision evaluation metric.<\/jats:p>","DOI":"10.1007\/s10115-022-01786-2","type":"journal-article","created":{"date-parts":[[2022,12,25]],"date-time":"2022-12-25T05:02:11Z","timestamp":1671944531000},"page":"1657-1683","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Hypergraph-based importance assessment for binary classification data"],"prefix":"10.1007","volume":"65","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5223-240X","authenticated-orcid":false,"given":"Pawel","family":"Misiorek","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Szymon","family":"Janowski","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,12,25]]},"reference":[{"key":"1786_CR1","doi-asserted-by":"publisher","DOI":"10.1140\/epjds\/s13688-020-00231-0","author":"SG Aksoy","year":"2020","unstructured":"Aksoy SG, Joslyn C, Ortiz Marrero C, Praggastis B, Purvine E (2020) Hypernetwork science via high-order hypergraph walks. EPJ Data Sci. 
https:\/\/doi.org\/10.1140\/epjds\/s13688-020-00231-0","journal-title":"EPJ Data Sci"},{"issue":"1","key":"1786_CR2","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1186\/s12859-021-04197-2","volume":"22","author":"S Feng","year":"2021","unstructured":"Feng S, Heath E, Jefferson BA, Joslyn CA, Kvinge H, Mitchell HD, Praggastis B, Eisfeld AJ, Sims AC, Thackray LB, Fan S, Walters KB, Halfmann PJ, Westhoff-Smith D, Tan Q, Menachery VD, Sheahan TP, Cockrell AS, Kocher JF, Stratton KG, Heller NC, Bramer LM, Diamond MS, Baric RS, Waters KM, Kawaoka Y, McDermott JE, Purvine E (2021) Hypergraph models of biological networks to identify genes critical to pathogenic viral response. BMC Bioinform. 22(1):287. https:\/\/doi.org\/10.1186\/s12859-021-04197-2","journal-title":"BMC Bioinform."},{"issue":"11","key":"1786_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0224307","volume":"14","author":"B Kaminski","year":"2019","unstructured":"Kaminski B, Poulin V, Pralat P, Szufel P, Th\u00e9berge F (2019) Clustering via hypergraph modularity. PLoS One 14(11):1\u201315. https:\/\/doi.org\/10.1371\/journal.pone.0224307","journal-title":"PLoS One"},{"issue":"1","key":"1786_CR4","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1007\/s41109-020-00300-3","volume":"5","author":"T Kumar","year":"2020","unstructured":"Kumar T, Vaidyanathan S, Ananthapadmanabhan H, Parthasarathy S, Ravindran B (2020) Hypergraph clustering by iteratively reweighted modularity maximization. Appl Netw Sci 5(1):52. https:\/\/doi.org\/10.1007\/s41109-020-00300-3","journal-title":"Appl Netw Sci"},{"key":"1786_CR5","doi-asserted-by":"publisher","unstructured":"Li J, He J, Zhu Y (2018) E-tail product return prediction via hypergraph-based local graph cut. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining. KDD \u201918. Association for Computing Machinery: New York, NY, pp. 519\u2013527. 
https:\/\/doi.org\/10.1145\/3219819.3219829","DOI":"10.1145\/3219819.3219829"},{"key":"1786_CR6","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1186\/s40537-018-0151-6","volume":"5","author":"JL Leevy","year":"2018","unstructured":"Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N (2018) A survey on addressing high-class imbalance in big data. J Big Data 5:42","journal-title":"J Big Data"},{"key":"1786_CR7","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s10844-017-0446-7","volume":"50","author":"M Lango","year":"2017","unstructured":"Lango M, Stefanowski J (2017) Multi-class and feature selection extensions of roughly balanced bagging for imbalanced data. J Intell Inf Syst 50:97\u2013127","journal-title":"J Intell Inf Syst"},{"issue":"2","key":"1786_CR8","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1007\/s42452-021-04148-9","volume":"3","author":"M Saarela","year":"2021","unstructured":"Saarela M, Jauhiainen S (2021) Comparison of feature importance measures as explanations for classification models. SN Appl Sci 3(2):272. https:\/\/doi.org\/10.1007\/s42452-021-04148-9","journal-title":"SN Appl Sci"},{"issue":"2","key":"1786_CR9","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1007\/s10115-020-01519-3","volume":"63","author":"A Urkullu","year":"2021","unstructured":"Urkullu A, P\u00e9rez A, Calvo B (2021) Statistical model for reproducibility in ranking-based feature selection. Knowl Inf Syst 63(2):379\u2013410. https:\/\/doi.org\/10.1007\/s10115-020-01519-3","journal-title":"Knowl Inf Syst"},{"key":"1786_CR10","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1007\/978-3-642-23672-3_28","volume-title":"Computer analysis of images and patterns","author":"Z Zhang","year":"2011","unstructured":"Zhang Z, Hancock ER (2011) A hypergraph-based approach to feature selection. In: Real P, Diaz-Pernil D, Molina-Abril H, Berciano A, Kropatsch W (eds) Computer analysis of images and patterns. 
Springer, Berlin, Heidelberg, pp 228\u2013235"},{"key":"1786_CR11","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1007\/978-3-319-18781-5_17","volume-title":"Challenges in computational statistics and data mining","author":"J Stefanowski","year":"2016","unstructured":"Stefanowski J (2016) Dealing with data difficulty factors while learning from imbalanced data. In: Matwin S, Mielniczuk J (eds) Challenges in computational statistics and data mining. Springer, Cham, pp 333\u2013363. https:\/\/doi.org\/10.1007\/978-3-319-18781-5_17"},{"key":"1786_CR12","unstructured":"Yadati N, Nimishakavi M, Yadav P, Nitin V, Louis A, Talukdar PP (2019) Hypergcn: a new method for training graph convolutional networks on hypergraphs. In: Wallach H, Larochelle H, Beygelzimer A, d\u2019Alch\u00e9-Buc F, Fox E, Garnett R (eds.) Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., Red Hook, NY, USA, pp. 1\u201312"},{"issue":"01","key":"1786_CR13","doi-asserted-by":"publisher","first-page":"3558","DOI":"10.1609\/aaai.v33i01.33013558","volume":"33","author":"Y Feng","year":"2019","unstructured":"Feng Y, You H, Zhang Z, Ji R, Gao Y (2019) Hypergraph neural networks. Proceedings of the AAAI conference on artificial intelligence 33(01):3558\u20133565. https:\/\/doi.org\/10.1609\/aaai.v33i01.33013558","journal-title":"Proceedings of the AAAI conference on artificial intelligence"},{"key":"1786_CR14","doi-asserted-by":"publisher","DOI":"10.1145\/3494672","author":"D Pessach","year":"2022","unstructured":"Pessach D, Shmueli E (2022) A review on fairness in machine learning. ACM Comput Surv. https:\/\/doi.org\/10.1145\/3494672","journal-title":"ACM Comput Surv"},{"key":"1786_CR15","unstructured":"Chitra U, Raphael BJ (2019) Random walks on hypergraphs with edge-dependent vertex weights. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. 
Proceedings of machine learning research, vol. 97, pp. 1172\u20131181. PMLR, USA. http:\/\/proceedings.mlr.press\/v97\/chitra19a.html"},{"key":"1786_CR16","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1109\/JIOT.2019.2950213","volume":"7","author":"S Zhang","year":"2020","unstructured":"Zhang S, Ding Z, Cui S (2020) Introducing hypergraph signal processing: theoretical foundation and practical applications. IEEE Int Things J 7:639\u2013660","journal-title":"IEEE Int Things J"},{"key":"1786_CR17","doi-asserted-by":"publisher","unstructured":"Chodrow PS (2020) Configuration models of random hypergraphs. J Complex Netw 8(3) https:\/\/academic.oup.com\/comnet\/article-pdf\/8\/3\/cnaa018\/33559166\/cnaa018.pdf. https:\/\/doi.org\/10.1093\/comnet\/cnaa018","DOI":"10.1093\/comnet\/cnaa018"},{"key":"1786_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-020-00327-4","volume":"7","author":"RC Chen","year":"2020","unstructured":"Chen RC, Dewi C, Huang S-W, Caraka RE (2020) Selecting critical features for data classification based on machine learning methods. J Big Data 7:1\u201326","journal-title":"J Big Data"},{"key":"1786_CR19","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511973000","volume-title":"Machine learning: the art and science of algorithms that make sense of data","author":"P Flach","year":"2012","unstructured":"Flach P (2012) Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press, New York, NY"},{"issue":"6","key":"1786_CR20","doi-asserted-by":"publisher","first-page":"1429","DOI":"10.1007\/s10115-021-01560-w","volume":"63","author":"D Brzezinski","year":"2021","unstructured":"Brzezinski D, Minku LL, Pewinski T, Stefanowski J, Szumaczuk A (2021) The impact of data difficulty factors on classification of imbalanced and concept drifting data streams. Knowl Inf Syst 63(6):1429\u20131469. 
https:\/\/doi.org\/10.1007\/s10115-021-01560-w","journal-title":"Knowl Inf Syst"},{"issue":"5","key":"1786_CR21","doi-asserted-by":"publisher","first-page":"1847","DOI":"10.1016\/j.eswa.2012.09.017","volume":"40","author":"O Kwon","year":"2013","unstructured":"Kwon O, Sim JM (2013) Effects of data set features on the performances of classification algorithms. Expert Syst Appl 40(5):1847\u20131857. https:\/\/doi.org\/10.1016\/j.eswa.2012.09.017","journal-title":"Expert Syst Appl"},{"key":"1786_CR22","unstructured":"Dorogush AV, Gulin A, Gusev G, Kazeev N, Prokhorenkova LO, Vorobev A (2017) Fighting biases with dynamic boosting. CoRR abs\/1706.09516. 1706.09516"},{"key":"1786_CR23","first-page":"3146","volume":"30","author":"G Ke","year":"2017","unstructured":"Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Ye Q, Liu T-Y (2017) Lightgbm: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst 30:3146\u20133154","journal-title":"Adv Neural Inf Process Syst"},{"key":"1786_CR24","doi-asserted-by":"publisher","unstructured":"D\u00f6rpinghaus J, Stefan A, Schultz B, Jacobs M (2022) Context mining and graph queries on giant biomedical knowledge graphs. Knowl Inf Syst 64(5):1239\u20131262. https:\/\/doi.org\/10.1007\/s10115-022-01668-7","DOI":"10.1007\/s10115-022-01668-7"},{"issue":"1","key":"1786_CR25","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1007\/s10115-012-0609-3","volume":"39","author":"T Hu","year":"2014","unstructured":"Hu T, Liu C, Tang Y, Sun J, Xiong H, Sung SY (2014) High-dimensional clustering: a clique-based hypergraph partitioning framework. Knowl Inf Syst 39(1):61\u201388. https:\/\/doi.org\/10.1007\/s10115-012-0609-3","journal-title":"Knowl Inf Syst"},{"key":"1786_CR26","doi-asserted-by":"publisher","first-page":"107637","DOI":"10.1016\/j.patcog.2020.107637","volume":"110","author":"S Bai","year":"2021","unstructured":"Bai S, Zhang F, Torr PHS (2021) Hypergraph convolution and hypergraph attention. 
Pattern Recognit 110:107637. https:\/\/doi.org\/10.1016\/j.patcog.2020.107637","journal-title":"Pattern Recognit"},{"key":"1786_CR27","doi-asserted-by":"publisher","DOI":"10.3390\/sym14030543","author":"R Qu","year":"2022","unstructured":"Qu R, Feng H, Xu C, Hu B (2022) Analysis of hypergraph signals via high-order total variation. Symmetry. https:\/\/doi.org\/10.3390\/sym14030543","journal-title":"Symmetry"},{"key":"1786_CR28","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1016\/j.neucom.2016.10.087","volume":"253","author":"W He","year":"2017","unstructured":"He W, Cheng X, Hu R, Zhu Y, Wen G (2017) Feature self-representation based hypergraph unsupervised feature selection via low-rank representation. Neurocomputing 253:127\u2013134. https:\/\/doi.org\/10.1016\/j.neucom.2016.10.087","journal-title":"Neurocomputing"},{"key":"1786_CR29","unstructured":"University of California, Irvine (UCI), Machine learning repository: statlog (German Credit Data) Dataset. https:\/\/archive.ics.uci.edu\/ml\/datasets\/statlog+(german+credit+data) (2022)"},{"key":"1786_CR30","unstructured":"Kaggle: German credit risk dataset. https:\/\/www.kaggle.com\/kabure\/german-credit-data-with-risk\/ (2022)"},{"key":"1786_CR31","doi-asserted-by":"crossref","unstructured":"Tallis M, Yadav P (2018) Reacting to variations in product demand: an application for conversion rate (cr) prediction in sponsored search. arXiv preprint arXiv:1806.08211","DOI":"10.1109\/BigData.2018.8622223"},{"key":"1786_CR32","unstructured":"Kaggle: banking dataset. https:\/\/www.kaggle.com\/prakharrathi25\/banking-dataset-marketing-targets (2022)"},{"key":"1786_CR33","unstructured":"Kaggle: HR analytics dataset. https:\/\/www.kaggle.com\/arashnic\/hr-analytics-job-change-of-data-scientists (2022)"},{"key":"1786_CR34","unstructured":"Kaggle: phishing dataset. 
https:\/\/www.kaggle.com\/shashwatwork\/phishing-dataset-for-machine-learning (2022)"},{"key":"1786_CR35","unstructured":"University of California, Irvine (UCI), machine learning repository: breast cancer dataset. https:\/\/archive.ics.uci.edu\/ml\/datasets\/breast+cancer (2022)"},{"key":"1786_CR36","unstructured":"Yandex: Catboost - open-source gradient boosting library. https:\/\/catboost.ai\/ (2022)"},{"key":"1786_CR37","unstructured":"CatBoost: Transforming categorical features to numerical features. https:\/\/catboost.ai\/en\/docs\/concepts\/algorithm-main-stages_cat-to-numberic (2022)"},{"key":"1786_CR38","unstructured":"Microsoft corporation: LightGBM. https:\/\/lightgbm.readthedocs.io\/ (2022)"},{"key":"1786_CR39","unstructured":"LightGBM: optimal split for categorical features. https:\/\/lightgbm.readthedocs.io\/en\/latest\/Features.html#optimal-split-for-categorical-features (2022)"},{"key":"1786_CR40","unstructured":"Scikit-learn: machine learning in python. https:\/\/scikit-learn.org (2022)"},{"key":"1786_CR41","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1016\/j.patrec.2020.03.004","volume":"133","author":"R Zhu","year":"2020","unstructured":"Zhu R, Guo Y, Xue J-H (2020) Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recogn Lett 133:217\u2013223. https:\/\/doi.org\/10.1016\/j.patrec.2020.03.004","journal-title":"Pattern Recogn Lett"},{"issue":"2","key":"1786_CR42","first-page":"679","volume":"7","author":"I Tomek","year":"1976","unstructured":"Tomek I (1976) Two Modifications of CNN. IEEE Trans Syst Man Cybern 7(2):679\u2013772","journal-title":"IEEE Trans Syst Man Cybern"},{"key":"1786_CR43","unstructured":"Imbalanced-learn: Tomek links. 
https:\/\/imbalanced-learn.org\/stable\/references\/generated\/imblearn.under_sampling.TomekLinks.html (2022)"},{"issue":"3","key":"1786_CR44","doi-asserted-by":"publisher","first-page":"408","DOI":"10.1109\/TSMC.1972.4309137","volume":"2","author":"DL Wilson","year":"1972","unstructured":"Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. Syst Man Cybern IEEE Trans 2(3):408\u2013421. https:\/\/doi.org\/10.1109\/TSMC.1972.4309137","journal-title":"Syst Man Cybern IEEE Trans"},{"key":"1786_CR45","unstructured":"Imbalanced-learn: edited nearest neighbours. https:\/\/imbalanced-learn.org\/stable\/references\/generated\/imblearn.under_sampling.EditedNearestNeighbours.html (2022)"},{"key":"1786_CR46","unstructured":"Imbalanced-learn: random undersampler. https:\/\/imbalanced-learn.org\/stable\/references\/generated\/imblearn.under_sampling.RandomUnderSampler.html (2022)"},{"key":"1786_CR47","unstructured":"Scikit-learn: random forest classifier. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.RandomForestClassifier.html (2022)"},{"key":"1786_CR48","unstructured":"Scikit-learn: logistic regression. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.LogisticRegression.html (2022)"},{"key":"1786_CR49","first-page":"1","volume":"7","author":"J Dem\u0161ar","year":"2006","unstructured":"Dem\u0161ar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1\u201330","journal-title":"J Mach Learn Res"},{"issue":"2","key":"1786_CR50","doi-asserted-by":"publisher","first-page":"0117844","DOI":"10.1371\/journal.pone.0117844","volume":"10","author":"H Wang","year":"2015","unstructured":"Wang H, Xu Q, Zhou L (2015) Large unbalanced credit scoring using lasso-logistic regression ensemble. PloS One 10(2):0117844","journal-title":"PloS One"},{"key":"1786_CR51","unstructured":"Ng, A.: MLOps: From model-centric to data-centric AI. 
DeepLearning.AI https:\/\/www.deeplearning.ai\/wp-content\/uploads\/2021\/06\/MLOps-From-Model-centric-to-Data-centric-AI.pdf (2021)"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01786-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-022-01786-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-022-01786-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,22]],"date-time":"2023-03-22T03:18:49Z","timestamp":1679455129000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-022-01786-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,25]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["1786"],"URL":"https:\/\/doi.org\/10.1007\/s10115-022-01786-2","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"},{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,25]]},"assertion":[{"value":"25 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 September 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 October 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 December 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}