{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T02:16:22Z","timestamp":1769912182893,"version":"3.49.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T00:00:00Z","timestamp":1624233600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Office of Science, Office of Advanced Scientific Computing Research","award":["DE-AC02-05CH11231"],"award-info":[{"award-number":["DE-AC02-05CH11231"]}]},{"DOI":"10.13039\/501100010418","name":"Institute for Information & communications Technology Promotion","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Korea government","award":["2016-0-00078"],"award-info":[{"award-number":["2016-0-00078"]}]},{"DOI":"10.13039\/100017223","name":"National Energy Research Scientific Computing Center","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100017223","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Manage. Inf. Syst."],"published-print":{"date-parts":[[2021,9,30]]},"abstract":"<jats:p>\n            Variable selection (also known as\n            <jats:italic>feature selection<\/jats:italic>\n            ) is essential to optimize the learning complexity by prioritizing features, particularly for a massive, high-dimensional dataset like network traffic data. 
In practice, however, performing feature selection effectively is not easy, despite the availability of existing selection techniques. In our initial experiments, we observed that existing selection techniques produce different sets of features even under the same condition (e.g., a fixed size for the resulting set). In addition, individual selection techniques perform inconsistently, sometimes better and sometimes worse than others, so relying on any single one of them is risky when building models from the selected features. More critically, there is a strong need to automate the selection process, since it otherwise requires laborious, intensive analysis by a group of experts. In this article, we explore challenges in automated feature selection, applied to network anomaly detection. We first present an ensemble approach that combines existing feature selection techniques; one of the proposed ensemble techniques, based on greedy search, performs highly consistently and yields results comparable to the existing techniques. We also address the problem of when to stop the feature elimination process and present a set of methods designed to determine the number of features in the reduced feature set. 
Experimental results on two recent network datasets show that the feature sets identified by the presented ensemble and stopping methods consistently yield performance comparable to conventional selection techniques with a smaller number of features.\n          <\/jats:p>","DOI":"10.1145\/3446636","type":"journal-article","created":{"date-parts":[[2021,6,21]],"date-time":"2021-06-21T19:59:45Z","timestamp":1624305585000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":23,"title":["Automated Feature Selection for Anomaly Detection in Network Traffic Data"],"prefix":"10.1145","volume":"12","author":[{"given":"Makiya","family":"Nakashima","sequence":"first","affiliation":[{"name":"Texas A&M University-Commerce, Commerce, TX"}]},{"given":"Alex","family":"Sim","sequence":"additional","affiliation":[{"name":"Lawrence Berkeley National Laboratory, Berkeley, CA"}]},{"given":"Youngsoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute, Daejeon, Korea"}]},{"given":"Jonghyun","family":"Kim","sequence":"additional","affiliation":[{"name":"Electronics and Telecommunications Research Institute, Daejeon, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9835-1866","authenticated-orcid":false,"given":"Jinoh","family":"Kim","sequence":"additional","affiliation":[{"name":"Texas A&M University-Commerce and Lawrence Berkeley National Laboratory, Berkeley, 
CA"}]}],"member":"320","published-online":{"date-parts":[[2021,6,21]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASONAM.2012.72"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-018-0619-4"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSCloud.2017.26"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2015.11.016"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-017-1117-8"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2015.2494502"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MilCIS.2015.7348942"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP\u201918)","author":"Sharafaldin Iman","unstructured":"Iman Sharafaldin , Arash Habibi Lashkari , and Ali A. Ghorbani . 2018. Toward generating a new intrusion detection dataset and intrusion traffic characterization . In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP\u201918) . 108\u2013116. Iman Sharafaldin, Arash Habibi Lashkari, and Ali A. Ghorbani. 2018. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP\u201918). 108\u2013116."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3136625"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3136625"},{"key":"e_1_2_1_11_1","volume-title":"Data Clustering","author":"Alelyani Salem","unstructured":"Salem Alelyani , Jiliang Tang , and Huan Liu . 2018. Feature selection for clustering: A review . In Data Clustering . Chapman & Hall\/CRC , 29\u201360. Salem Alelyani, Jiliang Tang, and Huan Liu. 2018. Feature selection for clustering: A review. In Data Clustering. 
Chapman & Hall\/CRC, 29\u201360."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2018.07.018"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-017-3131-4"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TFUZZ.2019.2892363"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2017.08.018"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-40596-4_27"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-014-5473-9"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2017.07.005"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.10.416"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISIE.2017.8001537"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cose.2017.06.005"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT\u201918)","author":"Patil Gayatri V.","unstructured":"Gayatri V. Patil , K. Vinod Pachghare , and Deepak D. Kshirsagar . 2018. Feature reduction in flow based intrusion detection system . In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT\u201918) . IEEE, 1356\u20131362. Gayatri V. Patil, K. Vinod Pachghare, and Deepak D. Kshirsagar. 2018. Feature reduction in flow based intrusion detection system. In Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT\u201918). 
IEEE, 1356\u20131362."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/1736481.1736489"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944968"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/1777942.1777962"},{"key":"e_1_2_1_26_1","volume-title":"2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO'15)","author":"Jovi\u0107 A.","unstructured":"A. Jovi\u0107, K. Brki\u0107, and N. Bogunovi\u0107. 2015. A review of feature selection methods with applications. In 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO'15). 1200\u20131205. DOI:10.1109\/MIPRO.2015.7160458"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm344"},{"key":"e_1_2_1_28_1","volume-title":"Python Machine Learning","author":"Raschka Sebastian","unstructured":"Sebastian Raschka. 2015. Python Machine Learning. Packt Publishing Ltd."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2007.44"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Annual Conference on Biocomputing. World Scientific, 516\u2013527","author":"Tastan Oznur","year":"2009","unstructured":"Oznur Tastan , Yanjun Qi , Jaime G. Carbonell , and Judith Klein-Seetharaman . 2009 . Prediction of interactions between HIV-1 and human proteins by information integration . In Proceedings of the Annual Conference on Biocomputing. World Scientific, 516\u2013527 . 
Oznur Tastan, Yanjun Qi, Jaime G. Carbonell, and Judith Klein-Seetharaman. 2009. Prediction of interactions between HIV-1 and human proteins by information integration. In Proceedings of the Annual Conference on Biocomputing. World Scientific, 516\u2013527."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1736481.1736489"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Dalwinder Singh and Birmohan Singh. 2019. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. (2019) 105524.  Dalwinder Singh and Birmohan Singh. 2019. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. (2019) 105524.","DOI":"10.1016\/j.asoc.2019.105524"},{"key":"e_1_2_1_33_1","unstructured":"Devansh Arpit and Yoshua Bengio. 2019. The benefits of over-parameterization at initialization in deep ReLU networks. arXiv:1901.03611. Retrieved from https:\/\/arxiv.org\/abs\/1901.03611.  Devansh Arpit and Yoshua Bengio. 2019. The benefits of over-parameterization at initialization in deep ReLU networks. arXiv:1901.03611. Retrieved from https:\/\/arxiv.org\/abs\/1901.03611."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2017.47"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2017.01.048"},{"key":"e_1_2_1_36_1","unstructured":"KDD Cup 1999 Data. Retrieved from http:\/\/kdd.ics.uci.edu\/databases\/kddcup99\/kddcup99.html.  KDD Cup 1999 Data. 
Retrieved from http:\/\/kdd.ics.uci.edu\/databases\/kddcup99\/kddcup99.html."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2010.25"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"}],"container-title":["ACM Transactions on Management Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3446636","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3446636","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:31Z","timestamp":1750193251000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3446636"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,21]]},"references-count":38,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,30]]}},"alternative-id":["10.1145\/3446636"],"URL":"https:\/\/doi.org\/10.1145\/3446636","relation":{},"ISSN":["2158-656X","2158-6578"],"issn-type":[{"value":"2158-656X","type":"print"},{"value":"2158-6578","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,21]]},"assertion":[{"value":"2020-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}