{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T11:16:39Z","timestamp":1769166999319,"version":"3.49.0"},"reference-count":34,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,3,30]],"date-time":"2022-03-30T00:00:00Z","timestamp":1648598400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Ministry of Science and Technology","award":["MOST 110-2628-E-A49-011, and MOST 110-2218-E-A49-011-MBK"],"award-info":[{"award-number":["MOST 110-2628-E-A49-011, and MOST 110-2218-E-A49-011-MBK"]}]},{"name":"Center for Open Intelligent Connectivity"},{"name":"The Featured Areas Research Center Program"},{"name":"Ministry of Education in Taiwan"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Digital Threats"],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            Machine learning has been widely used for solving challenging problems in diverse areas. However, to the best of our knowledge, seldom literature has discussed in-depth how machine learning approaches can be used effectively to \u201chunt\u201d (identify) threats, especially\n            <jats:bold>advanced persistent threats (APTs)<\/jats:bold>\n            , in a monitored environment. In this study, we share our past experiences in building machine learning-based threat-hunting models. Several challenges must be considered when a security team attempts to build such models. These challenges include (1)\u00a0weak signal, (2)\u00a0imbalanced data sets, (3)\u00a0lack of high-quality labels, and (4)\u00a0no storyline. In this study, we propose Fuchikoma and APTEmu to demonstrate how we tackle the above-mentioned challenges. 
The former is a proof-of-concept system for demonstrating the ideas behind autonomous threat-hunting. It is a machine learning-based anomaly detection and threat hunting system which leverages\n            <jats:bold>natural language processing (NLP)<\/jats:bold>\n            and graph algorithms. The latter is an APT emulator, which emulates the behavior of a well-known APT called APT3, which is the target used in the first round of MITRE ATT&amp;CK Evaluations. APTEmu generates attacks on Windows machines in a virtualized environment, and the captured system events can be further used to train and enhance Fuchikoma\u2019s capabilities. We illustrate the steps and experiments we used to build the models, discuss each model\u2019s effectiveness and limitations, and propose countermeasures and solutions to improve the models. Our evaluation results show that machine learning algorithms can effectively assist threat hunting processes and significantly reduce security analysts\u2019 efforts. Fuchikoma correctly identifies malicious commands and achieves high performance in terms of over 80% True Positive Rate and True Negative Rate and over 60% F3. 
We believe our proposed approaches provide valuable experiences in the area and shed light on automated threat-hunting research.\n          <\/jats:p>","DOI":"10.1145\/3491260","type":"journal-article","created":{"date-parts":[[2021,10,15]],"date-time":"2021-10-15T18:44:36Z","timestamp":1634323476000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Building Machine Learning-based Threat Hunting System from Scratch"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6235-7529","authenticated-orcid":false,"given":"Chung-Kuan","family":"Chen","sequence":"first","affiliation":[{"name":"CyCraft Technology Corporation, New Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Si-Chen","family":"Lin","sequence":"additional","affiliation":[{"name":"CyCraft Technology Corporation &amp; National Taiwan University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Szu-Chun","family":"Huang","sequence":"additional","affiliation":[{"name":"National Chiao Tung University &amp; National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yung-Tien","family":"Chu","sequence":"additional","affiliation":[{"name":"National Chiao Tung University &amp; National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chin-Laung","family":"Lei","sequence":"additional","affiliation":[{"name":"National Taiwan University, Taipei, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chun-Ying","family":"Huang","sequence":"additional","affiliation":[{"name":"National Chiao Tung University &amp; National Yang Ming Chiao Tung University, 
Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,3,30]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Abdulellah Alsaheel Yuhong Nan Shiqing Ma Le Yu Gregory Walkup Z. Berkay Celik Xiangyu Zhang and Dongyan Xu. [n.d.]. ATLAS: A sequence-based learning approach for attack investigation. ([n. d.])."},{"key":"e_1_3_2_3_2","unstructured":"Andrea Pierini and Giuseppe Trotta. 2018. Juicy Potato (Abusing the Golden Privileges). https:\/\/github.com\/ohpe\/juicy-potato."},{"key":"e_1_3_2_4_2","unstructured":"MITRE ATT&CK. 2019. APT3 Evaluation: Operational Flow.https:\/\/attackevals.mitre-engenuity.org\/APT3\/operational-flow.html."},{"key":"e_1_3_2_5_2","unstructured":"MITRE ATT&CK. 2021. MITRE ATT&CK Evaluations: Using ATT&CK Evaluations. https:\/\/attackevals.mitre-engenuity.org\/using-attack-evaluations.html."},{"key":"e_1_3_2_6_2","first-page":"242","volume-title":"Proceedings of IEEE 44th Conference on Local Computer Networks (LCN)","author":"Bai Tim","year":"2019","unstructured":"Tim Bai, Haibo Bian, Abbas Abou Daya, Mohammad A. Salahuddin, Noura Limam, and Raouf Boutaba. 2019. A machine learning approach for RDP-based lateral movement detection. In Proceedings of IEEE 44th Conference on Local Computer Networks (LCN). IEEE, 242\u2013245."},{"key":"e_1_3_2_7_2","volume-title":"Proceedings of the 11th International Workshop on Theory and Practice of Provenance (TaPP 2019)","author":"Barre Mathieu","year":"2019","unstructured":"Mathieu Barre, Ashish Gehani, and Vinod Yegneswaran. 2019. Mining data provenance to detect advanced persistent threats. In Proceedings of the 11th International Workshop on Theory and Practice of Provenance (TaPP 2019)."},{"key":"e_1_3_2_8_2","unstructured":"Benjamin Delpy and Vincent Le Toux. 2014. Mimikatz. 
https:\/\/github.com\/gentilkiwi\/mimikatz."},{"key":"e_1_3_2_9_2","volume-title":"Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit","author":"Bird Steven","year":"2009","unstructured":"Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O\u2019Reilly Media, Inc."},{"key":"e_1_3_2_10_2","first-page":"257","volume-title":"Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020)","author":"Bowman Benjamin","year":"2020","unstructured":"Benjamin Bowman, Craig Laprade, Yuede Ji, and H. Howie Huang. 2020. Detecting lateral movement in enterprise computer networks with unsupervised graph AI. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020). 257\u2013268."},{"key":"e_1_3_2_11_2","article-title":"On preempting advanced persistent threats using probabilistic graphical models","author":"Cao Phuong","year":"2019","unstructured":"Phuong Cao. 2019. On preempting advanced persistent threats using probabilistic graphical models. arXiv preprint arXiv:1903.08826 (2019).","journal-title":"arXiv preprint arXiv:1903.08826"},{"key":"e_1_3_2_12_2","first-page":"1285","volume-title":"Proceedings of ACM SIGSAC Conference on Computer and Communications Security","author":"Du Min","year":"2017","unstructured":"Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of ACM SIGSAC Conference on Computer and Communications Security. 
1285\u20131298."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.5555\/3086952"},{"key":"e_1_3_2_14_2","article-title":"Unicorn: Runtime provenance-based detector for advanced persistent threats","author":"Han Xueyuan","year":"2020","unstructured":"Xueyuan Han, Thomas Pasquier, Adam Bates, James Mickens, and Margo Seltzer. 2020. Unicorn: Runtime provenance-based detector for advanced persistent threats. arXiv preprint arXiv:2001.01525 (2020).","journal-title":"arXiv preprint arXiv:2001.01525"},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","first-page":"1172","DOI":"10.1109\/SP40000.2020.00096","volume-title":"Proceedings of 2020 IEEE Symposium on Security and Privacy (SP)","author":"Hassan Wajih Ul","year":"2020","unstructured":"Wajih Ul Hassan, Adam Bates, and Daniel Marino. 2020. Tactical provenance analysis for endpoint detection and response systems. In Proceedings of 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 1172\u20131189."},{"key":"e_1_3_2_16_2","unstructured":"Sistemas Hispasec. 2004. VirusTotal. https:\/\/www.virustotal.com\/."},{"key":"e_1_3_2_17_2","first-page":"238","volume-title":"Proceedings of International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment","author":"Leichtnam Laetitia","year":"2020","unstructured":"Laetitia Leichtnam, Eric Totel, Nicolas Prigent, and Ludovic M\u00e9. 2020. Sec2graph: Network attack detection based on novelty detection on graph structured data. In Proceedings of International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment. Springer, 238\u2013258."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3319535.3363224"},{"key":"e_1_3_2_19_2","unstructured":"Rapid7 LLC. 2013. Penetration Testing Software Pen Testing Security. 
https:\/\/www.metasploit.com\/."},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1145\/3319535.3363217","volume-title":"Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security","author":"Milajerdi Sadegh M.","year":"2019","unstructured":"Sadegh M. Milajerdi, Birhanu Eshete, Rigel Gjomemo, and V. N. Venkatakrishnan. 2019. Poirot: Aligning attack behavior with kernel audit records for cyber threat hunting. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1795\u20131812."},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/SP.2019.00026","volume-title":"2019 IEEE Symposium on Security and Privacy (SP)","author":"Milajerdi Sadegh M.","year":"2019","unstructured":"Sadegh M. Milajerdi, Rigel Gjomemo, Birhanu Eshete, Ramachandran Sekar, and V. N. Venkatakrishnan. 2019. HOLMES: Real-time APT detection through correlation of suspicious information flows. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 1137\u20131152."},{"key":"e_1_3_2_22_2","unstructured":"Neo4j. 2007. Neo4j Graph Platform \u2013 The Leader in Graph Databases. https:\/\/neo4j.com\/."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_2_24_2","unstructured":"Florian Roth. 2017. Sigma: Generic Signature Format for SIEM Systems. https:\/\/github.com\/Neo23x0\/sigma."},{"key":"e_1_3_2_25_2","article-title":"Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats","author":"Schindler Timo","year":"2018","unstructured":"Timo Schindler. 2018. Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats. 
arXiv preprint arXiv:1802.00259 (2018).","journal-title":"arXiv preprint arXiv:1802.00259"},{"key":"e_1_3_2_26_2","first-page":"905","volume-title":"Proceedings of the 28th USENIX Security Symposium (USENIX Security 19)","author":"Shen Yun","year":"2019","unstructured":"Yun Shen and Gianluca Stringhini. 2019. Attack2vec: Leveraging temporal word embeddings to understand the evolution of cyberattacks. In Proceedings of the 28th USENIX Security Symposium (USENIX Security 19). 905\u2013921."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243734.3243829"},{"key":"e_1_3_2_28_2","unstructured":"Salvatore Sinno and Ismael Cervantes. 2019. https:\/\/www.sans.org\/webcasts\/soc-superheroes-win-110655."},{"key":"e_1_3_2_29_2","unstructured":"Stephen Breen. 2017. RottenPotatoNG. https:\/\/github.com\/breenmachine\/RottenPotatoNG."},{"key":"e_1_3_2_30_2","unstructured":"Will Schroeder Justin Warner and Matt Nelson. 2015. PowerShell Empire. https:\/\/www.powershellempire.com\/."},{"key":"e_1_3_2_31_2","article-title":"CONAN: A practical real-time APT detection system with high accuracy and efficiency","author":"Xiong Chunlin","year":"2020","unstructured":"Chunlin Xiong, Tiantian Zhu, Weihao Dong, Linqi Ruan, Runqing Yang, Yan Chen, Yueqiang Cheng, Shuai Cheng, and Xutong Chen. 2020. CONAN: A practical real-time APT detection system with high accuracy and efficiency. IEEE Transactions on Dependable and Secure Computing (2020).","journal-title":"IEEE Transactions on Dependable and Secure Computing"},{"key":"e_1_3_2_32_2","first-page":"1418","volume-title":"Proceedings of IEEE 19th International Conference on Communication Technology (ICCT)","author":"Yu Han","year":"2019","unstructured":"Han Yu, Aiping Li, and Rong Jiang. 2019. Needle in a haystack: Attack detection from large-scale system audit. In Proceedings of IEEE 19th International Conference on Communication Technology (ICCT). 
IEEE, 1418\u20131426."},{"key":"e_1_3_2_33_2","first-page":"2449","volume-title":"Proceedings of IEEE Conference on Computer Communications INFOCOM","author":"Yuan Yali","year":"2020","unstructured":"Yali Yuan, Sripriya Srikant Adhatarao, Mingkai Lin, Yachao Yuan, Zheli Liu, and Xiaoming Fu. 2020. ADA: Adaptive deep log anomaly detector. In Proceedings of IEEE Conference on Computer Communications INFOCOM. IEEE, 2449\u20132458."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1086\/jar.33.4.3629752"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3375708.3380314"}],"container-title":["Digital Threats: Research and Practice"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3491260","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3491260","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:19Z","timestamp":1750183759000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3491260"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,30]]},"references-count":34,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3491260"],"URL":"https:\/\/doi.org\/10.1145\/3491260","relation":{},"ISSN":["2692-1626","2576-5337"],"issn-type":[{"value":"2692-1626","type":"print"},{"value":"2576-5337","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,30]]},"assertion":[{"value":"2021-02-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-09","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2022-03-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}