{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T02:45:03Z","timestamp":1760237103104,"version":"build-2065373602"},"reference-count":34,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2020,3,2]],"date-time":"2020-03-02T00:00:00Z","timestamp":1583107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["OAC-1920147","DMS-1613035","DMS-1712714"],"award-info":[{"award-number":["OAC-1920147","DMS-1613035","DMS-1712714"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Traditional hypothesis-margin research focuses on obtaining large margins and on feature selection. In this work, we show that the robustness of margins is also critical and can be measured using entropy. In addition, our approach provides clear mathematical formulations and explanations to uncover feature interactions, which are often lacking in large hypothesis-margin based approaches. We design an algorithm, termed IMMIGRATE (Iterative max-min entropy margin-maximization with interaction terms), for training the weights associated with the interaction terms. IMMIGRATE simultaneously utilizes both local and global information and can be used as a base learner in Boosting. We evaluate IMMIGRATE on a wide range of tasks, in which it demonstrates exceptional robustness and achieves state-of-the-art results with high interpretability.<\/jats:p>","DOI":"10.3390\/e22030291","type":"journal-article","created":{"date-parts":[[2020,3,2]],"date-time":"2020-03-02T07:52:35Z","timestamp":1583135555000},"page":"291","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["IMMIGRATE: A Margin-Based Feature Selection Method with Interaction Terms"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3420-5771","authenticated-orcid":false,"given":"Ruzhang","family":"Zhao","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3177-2754","authenticated-orcid":false,"given":"Pengyu","family":"Hong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Brandeis University, Waltham, MA 02453, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4450-7239","authenticated-orcid":false,"given":"Jun S.","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Statistics, Harvard University, Cambridge, MA 02138, USA"}]}],"member":"1968","published-online":{"date-parts":[[2020,3,2]]},"reference":[{"key":"ref_1","unstructured":"Fukunaga, K. (2013). Introduction to Statistical Pattern Recognition, Elsevier."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kira, K., and Rendell, L.A. (1992). A practical approach to feature selection. Machine Learning Proceedings 1992, Morgan Kaufmann.","DOI":"10.1016\/B978-1-55860-247-2.50037-1"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Gilad-Bachrach, R., Navot, A., and Tishby, N. (2004, January 4\u20138). Margin based feature selection-theory and algorithms. Proceedings of the 21st International Conference on Machine Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015352"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kononenko, I. (1994). Estimating attributes: Analysis and extensions of RELIEF. European Conference on Machine Learning, Springer.","DOI":"10.1007\/3-540-57868-4_57"},{"key":"ref_5","first-page":"27","article-title":"A Novel Feature Selection Algorithm Based on Hypothesis-Margin","volume":"3","author":"Yang","year":"2008","journal-title":"JCP"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sun, Y., and Li, J. (2006, January 25\u201329). Iterative RELIEF for feature weighting. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143959"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sun, Y., and Wu, D. (2008, January 24\u201326). A relief based feature extraction algorithm. Proceedings of the 2008 SIAM International Conference on Data Mining, Atlanta, GA, USA.","DOI":"10.1137\/1.9781611972788.17"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Bei, Y., and Hong, P. (2015, January 17\u201320). Maximizing margin quality and quantity. Proceedings of the 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), Boston, MA, USA.","DOI":"10.1109\/MLSP.2015.7324382"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.jbi.2018.07.014","article-title":"Relief-based feature selection: Introduction and review","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J. Biomed. Inform."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00116037","article-title":"The strength of weak learnability","volume":"5","author":"Schapire","year":"1990","journal-title":"Mach. Learn."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Kuhn, H.W., and Tucker, A.W. (2014). Nonlinear programming. Traces and Emergence of Nonlinear Programming, Springer.","DOI":"10.1007\/978-3-0348-0439-4_11"},{"key":"ref_12","first-page":"1","article-title":"Robust variable and interaction selection for logistic regression and general index models","volume":"114","author":"Li","year":"2018","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1111","DOI":"10.1214\/13-AOS1096","article-title":"A lasso for hierarchical interactions","volume":"41","author":"Bien","year":"2013","journal-title":"Ann. Stat."},{"key":"ref_14","first-page":"148","article-title":"Experiments with a new boosting algorithm","volume":"96","author":"Freund","year":"1996","journal-title":"Icml"},{"key":"ref_15","first-page":"124","article-title":"The alternating decision tree learning algorithm","volume":"99","author":"Freund","year":"1999","journal-title":"Icml"},{"key":"ref_16","unstructured":"Soentpiet, R. (1999). Advances in Kernel Methods: Support Vector Learning, MIT Press."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1111\/j.2517-6161.1996.tb02080.x","article-title":"Regression shrinkage and selection via the lasso","volume":"58","author":"Tibshirani","year":"1996","journal-title":"J. R. Stat. Soc."},{"key":"ref_18","unstructured":"John, G.H., and Langley, P. (1995). Estimating continuous distributions in Bayesian classifiers. Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc."},{"key":"ref_19","unstructured":"Haykin, S. (1994). Neural Networks: A Comprehensive Foundation, Prentice Hall PTR."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/BF00153759","article-title":"Instance-based learning algorithms","volume":"6","author":"Aha","year":"1991","journal-title":"Mach. Learn."},{"key":"ref_21","first-page":"207","article-title":"Distance metric learning for large margin nearest neighbor classification","volume":"10","author":"Weinberger","year":"2009","journal-title":"J. Mach. Learn. Res."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1023\/A:1025667309714","article-title":"Theoretical and empirical analysis of ReliefF and RReliefF","volume":"53","author":"Kononenko","year":"2003","journal-title":"Mach. Learn."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1111\/j.1469-1809.1936.tb02137.x","article-title":"The use of multiple measurements in taxonomic problems","volume":"7","author":"Fisher","year":"1936","journal-title":"Ann. Eugen."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"6503","DOI":"10.1158\/0008-5472.CAN-04-0452","article-title":"Gene expression profiling of gliomas strongly predicts survival","volume":"64","author":"Freije","year":"2004","journal-title":"Cancer Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2483","DOI":"10.1056\/NEJMoa030847","article-title":"The role of the Wnt-signaling antagonist DKK1 in the development of osteolytic lesions in multiple myeloma","volume":"349","author":"Tian","year":"2003","journal-title":"N. Engl. J. Med."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"530","DOI":"10.1038\/415530a","article-title":"Gene expression profiling predicts clinical outcome of breast cancer","volume":"415","author":"Dai","year":"2002","journal-title":"Nature"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","article-title":"Gene expression correlates of clinical prostate cancer behavior","volume":"1","author":"Singh","year":"2002","journal-title":"Cancer Cell"},{"key":"ref_31","unstructured":"Frank, A., and Asuncion, A. (2019, August 01). UCI Machine Learning Repository. Available online: http:\/\/archive.ics.uci.edu\/ml."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1109\/TPAMI.2011.142","article-title":"Prototype selection for nearest neighbor classification: Taxonomy and empirical study","volume":"34","author":"Garcia","year":"2012","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yu, S., Giraldo, L.G.S., Jenssen, R., and Principe, J.C. (2019). Multivariate Extension of Matrix-based Renyi\u2019s \u03b1-order Entropy Functional. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2019.2932976"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.patcog.2015.11.007","article-title":"Can high-order dependencies improve mutual information based feature selection?","volume":"53","author":"Vinh","year":"2016","journal-title":"Pattern Recognit."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/3\/291\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:03:18Z","timestamp":1760173398000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/22\/3\/291"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,2]]},"references-count":34,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,3]]}},"alternative-id":["e22030291"],"URL":"https:\/\/doi.org\/10.3390\/e22030291","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2020,3,2]]}}}