{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T14:14:54Z","timestamp":1766067294443,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,2,22]],"date-time":"2021-02-22T00:00:00Z","timestamp":1613952000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>In this paper, we present a new framework dedicated to the robust detection of representative variables in high dimensional spaces with a potentially limited number of observations. Representative variables are selected by using an original regularization strategy: they are the center of specific variable clusters, denoted CORE-clusters, which respect fully interpretable constraints. Each CORE-cluster indeed contains more than a predefined amount of variables and each pair of its variables has a coherent behavior in the observed data. The key advantage of our regularization strategy is therefore that it only requires to tune two intuitive parameters: the minimal dimension of the CORE-clusters and the minimum level of similarity which gathers their variables. Interpreting the role played by a selected representative variable is additionally obvious as it has a similar observed behaviour as a controlled number of other variables. After introducing and justifying this variable selection formalism, we propose two algorithmic strategies to detect the CORE-clusters, one of them scaling particularly well to high-dimensional data. 
Results obtained on synthetic as well as real data are finally presented.<\/jats:p>","DOI":"10.3390\/a14020066","type":"journal-article","created":{"date-parts":[[2021,2,22]],"date-time":"2021-02-22T14:23:27Z","timestamp":1614003807000},"page":"66","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Detection of Representative Variables in Complex Systems with Interpretable Rules Using Core-Clusters"],"prefix":"10.3390","volume":"14","author":[{"given":"Camille","family":"Champion","sequence":"first","affiliation":[{"name":"Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2043-1927","authenticated-orcid":false,"given":"Anne-Claire","family":"Brunet","sequence":"additional","affiliation":[{"name":"Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France"}]},{"given":"R\u00e9my","family":"Burcelin","sequence":"additional","affiliation":[{"name":"Institute of Cardiovascular and Metabolic Diseases INSERM, F-31432 Toulouse, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1252-2960","authenticated-orcid":false,"given":"Jean-Michel","family":"Loubes","sequence":"additional","affiliation":[{"name":"Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France"},{"name":"Artificial and Natural Intelligence Toulouse Institute (3IA ANITI), F-31000 Toulouse, France"}]},{"given":"Laurent","family":"Risser","sequence":"additional","affiliation":[{"name":"Toulouse Mathematics Institute (UMR 5219), CNRS, University of Toulouse, F-31062 Toulouse, France"},{"name":"Artificial and Natural Intelligence Toulouse Institute (3IA ANITI), F-31000 Toulouse, 
France"}]}],"member":"1968","published-online":{"date-parts":[[2021,2,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s41109-019-0248-7","article-title":"Graph-based data clustering via multiscale community detection","volume":"5","author":"Liu","year":"2020","journal-title":"Appl. Netw. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.physrep.2005.10.009","article-title":"Complex networks: Structure and dynamics","volume":"424","author":"Boccaletti","year":"2006","journal-title":"Phys. Rep."},{"key":"ref_3","unstructured":"Newman, M. (2009). Networks: An Introduction, Oxford University Press."},{"key":"ref_4","unstructured":"MacQueen, J.B. (1967). Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/0378-8733(83)90028-X","article-title":"Network structure and minimum degree","volume":"5","author":"Seidman","year":"1983","journal-title":"Soc. Netw."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Giatsidis, C., Malliaros, F.D., Thilikos, D.M., and Vazirgiannis, M. (2014, January 27\u201331). CORECLUSTER: A degeneracy based graph clustering framework. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Qu\u00e9bec City, QC, Canada.","DOI":"10.1609\/aaai.v28i1.8731"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1007\/s11634-010-0079-y","article-title":"Fast algorithms for determining (generalized) core groups in social networks","volume":"5","author":"Batagelj","year":"2011","journal-title":"Adv. Data Anal. Classif."},{"key":"ref_8","unstructured":"Agarwal, P.K., Har-Peled, S., and Varadarajan, K.R. (2005). Geometric approximation via coresets. 
Combinatorial and Computational Geometry, MSRI University Press."},{"key":"ref_9","unstructured":"Claici, S., Genevay, A., and Solomon, J. (2020). Wasserstein Measure Coresets. arXiv."},{"key":"ref_10","unstructured":"Mirzasoleiman, B., Cao, K., and Leskovec, J. (2020, January 6\u201312). Coresets for robust training of deep neural networks against noisy labels. In Proceedings of the Neural Information Processing Systems (NeurIPS 2020), Vancouver, BC, Canada."},{"key":"ref_11","unstructured":"Baykal, C., Liebenwein, L., Gilitschenski, I., Feldman, D., and Rus, D. (2018). Data-dependent coresets for compressing neural networks with applications to generalization bounds. arXiv."},{"key":"ref_12","unstructured":"Bachem, O., Lucic, M., and Lattanzi, S. (2018, January 9\u201311). One-shot coresets: The case of k-clustering. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), Playa Blanca, Spain."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_14","unstructured":"Zhao, Z., Morstatter, F., Sharma, S., Alelyani, S., Anand, A., and Liu, H. (2021, January 14). Advancing Feature Selection Research. In ASU Feature Selection Repository, Available online: http:\/\/www.public.asu.edu\/huanliu\/papers\/tr-10-007.pdf."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"94:1","DOI":"10.1145\/3136625","article-title":"Feature Selection: A Data Perspective","volume":"50","author":"Li","year":"2018","journal-title":"ACM Comput. Surv."},{"key":"ref_16","first-page":"115","article-title":"Iterative Spectral Method for Alternative Clustering","volume":"84","author":"Wu","year":"2018","journal-title":"Proc. Mach. Learn. 
Res."},{"key":"ref_17","first-page":"1127","article-title":"Crowdclustering with Partitions Labels","volume":"84","author":"Chen","year":"2018","journal-title":"Proc. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Yu, L., and Liu, H. (2003, January 21\u201324). Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. Proceedings of the International Conference on Machine Learning (ICML-2003), Washington, DC, USA."},{"key":"ref_19","unstructured":"Brunet, A.C., Loubes, J.M., Azais, J.M., and Courtney, M. (2015). Method of Identification of a Relationship between Biological Elements. (App. PCT\/EP2015\/060,779), WO Patent."},{"key":"ref_20","unstructured":"Brunet, A.C., Azais, J.M., Loubes, J.M., Amar, J., and Burcelin, R. (2016). A new gene co-expression network analysis based on Core Structure Detection (CSD). arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"733","DOI":"10.1287\/opre.8.5.733","article-title":"The Maximum Capacity through a Network","volume":"8","author":"Pollack","year":"1960","journal-title":"Oper. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"898","DOI":"10.1287\/opre.9.6.898","article-title":"The Maximum Capacity Route Problem","volume":"9","author":"Hu","year":"1961","journal-title":"Oper. Res."},{"key":"ref_23","unstructured":"Randall, K.H. (1998). Cilk: Efficient Multithreaded Computing. [Ph.D. Thesis, Massachusetts Institute of Technology]."},{"key":"ref_24","unstructured":"Cysouw, M. (2018, February 02). R Function Cor.Sparse. Available online: https:\/\/www.rdocumentation.org\/packages\/qlcMatrix\/versions\/0.9.2\/topics\/cor.sparse."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1090\/S0002-9939-1956-0078686-7","article-title":"On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem","volume":"7","author":"Kruskal","year":"1956","journal-title":"Proc. Am. Math. 
Soc."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1137\/0201010","article-title":"Depth first search and linear graph algorithms","volume":"1","author":"Tarjan","year":"1972","journal-title":"SIAM J. Comput."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Steele, J.M. (2002). Minimal spanning trees for graphs with random edge lengths. Mathematics and Computer Science II, Springer.","DOI":"10.1007\/978-3-0348-8211-8_14"},{"key":"ref_28","unstructured":"Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2001). Introduction to Algorithms, MIT Press."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1287\/trsc.32.1.65","article-title":"Shortest Path Algorithms: An Evaluation Using Real Road Networks","volume":"32","author":"Zan","year":"1998","journal-title":"Transp. Sci."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized cuts and image segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","unstructured":"Ng, A., Jordan, M., and Weiss, Y. (2002, January 9\u201314). On spectral clustering: Analysis and an algorithm. Proceedings of the Advances in Neural Information Processing Systems (NIPS 2002), Vancouver, BC, Canada."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp., 10008.","DOI":"10.1088\/1742-5468\/2008\/10\/P10008"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"3273","DOI":"10.1091\/mbc.9.12.3273","article-title":"Comprehensive Identification of Cell-Cycle-regulated Genes of Yeast Saccharomyces cerevisiae by Microarray Hybridization","volume":"9","author":"Spellman","year":"1998","journal-title":"Mol. Biol. 
Cell"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Gadat, S., Gavra, I., and Risser, L. (2018). How to calculate the barycenter of a weighted graph. Informs, 43.","DOI":"10.1287\/moor.2017.0896"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/2\/66\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:28:03Z","timestamp":1760160483000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/2\/66"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,22]]},"references-count":34,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,2]]}},"alternative-id":["a14020066"],"URL":"https:\/\/doi.org\/10.3390\/a14020066","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2021,2,22]]}}}