{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T14:42:12Z","timestamp":1754145732713,"version":"3.41.2"},"reference-count":25,"publisher":"Walter de Gruyter GmbH","issue":"1-2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>In this research, we analyze the effect of lightweight syntactical feature extraction techniques from the field of information retrieval for log abstraction in information security. To this end, we evaluate three feature extraction techniques and three clustering algorithms on four different security datasets for anomaly detection. Results demonstrate that these techniques have a role to play for log abstraction in the form of extracting syntactic features which improves the identification of anomalous minority classes, specifically in homogeneous security datasets.<\/jats:p>","DOI":"10.1515\/itit-2021-0064","type":"journal-article","created":{"date-parts":[[2022,3,22]],"date-time":"2022-03-22T06:53:01Z","timestamp":1647931981000},"page":"15-27","source":"Crossref","is-referenced-by-count":4,"title":["Exploring syntactical features for anomaly detection in application logs"],"prefix":"10.1515","volume":"64","author":[{"given":"Rafael","family":"Copstein","sequence":"first","affiliation":[{"name":"153020 Dalhousie University , Faculty of Computer Science , 6299 South Street , Halifax , NS , Canada"}]},{"given":"Egil","family":"Karlsen","sequence":"additional","affiliation":[{"name":"153020 Dalhousie University , Faculty of Computer Science , 6299 South Street , Halifax , NS , Canada"}]},{"given":"Jeff","family":"Schwartzentruber","sequence":"additional","affiliation":[{"name":"2Keys , 20 Eglinton Ave. W. \u2013 Suite 1500 , Toronto , Ontario , Canada"}]},{"given":"Nur","family":"Zincir-Heywood","sequence":"additional","affiliation":[{"name":"153020 Dalhousie University , Faculty of Computer Science , 6299 South Street , Halifax , NS , Canada"}]},{"given":"Malcolm","family":"Heywood","sequence":"additional","affiliation":[{"name":"153020 Dalhousie University , Faculty of Computer Science , 6299 South Street , Halifax , NS , Canada"}]}],"member":"374","published-online":{"date-parts":[[2022,3,23]]},"reference":[{"key":"2025071712192953906_j_itit-2021-0064_ref_001","doi-asserted-by":"crossref","unstructured":"J. Zhu, S. He, J. Liu, P. He, Q. Xie, Z. Zheng, and M.\u2009R. Lyu, \u201cTools and benchmarks for automated log parsing,\u201d in 2019 IEEE\/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2019, pp.\u2009121\u2013130.","DOI":"10.1109\/ICSE-SEIP.2019.00021"},{"key":"2025071712192953906_j_itit-2021-0064_ref_002","doi-asserted-by":"crossref","unstructured":"D. El-Masri, F. Petrillo, Y.-G. Gu\u00e9h\u00e9neuc, A. Hamou-Lhadj, and A. Bouziane, \u201cA systematic literature review on automated log abstraction techniques,\u201d Information and Software Technology, vol.\u2009122, p.\u2009106276, 2020.","DOI":"10.1016\/j.infsof.2020.106276"},{"key":"2025071712192953906_j_itit-2021-0064_ref_003","doi-asserted-by":"crossref","unstructured":"R. Copstein, J. Schwartzentruber, N. Zincir-Heywood, and M. Heywood, \u201cLog abstraction for information security: Heuristics and reproducibility,\u201d in The 16th International Conference on Availability, Reliability and Security, ser. ARES 2021. New York, NY, USA: Association for Computing Machinery, 2021. [Online]. Available: https:\/\/doi.org\/10.1145\/3465481.3470083.","DOI":"10.1145\/3465481.3470083"},{"key":"2025071712192953906_j_itit-2021-0064_ref_004","doi-asserted-by":"crossref","unstructured":"B. Gallagher and T. Eliassi-Rad, \u201cClassification of http attacks: a study on the ecml\/pkdd 2007 discovery challenge,\u201d Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States), Tech. Rep., 2009.","DOI":"10.2172\/1113394"},{"key":"2025071712192953906_j_itit-2021-0064_ref_005","doi-asserted-by":"crossref","unstructured":"H. Dev and Z. Liu, \u201cIdentifying frequent user tasks from application logs,\u201d in Proceedings of the 22nd International Conference on Intelligent User Interfaces, ser. IUI \u201917. New York, NY, USA: Association for Computing Machinery, 2017, pp.\u2009263\u2013273. [Online]. Available: https:\/\/doi.org\/10.1145\/3025171.3025184.","DOI":"10.1145\/3025171.3025184"},{"key":"2025071712192953906_j_itit-2021-0064_ref_006","doi-asserted-by":"crossref","unstructured":"K. Savitha and M. Vijaya, \u201cMining of web server logs in a distributed cluster using big data technologies,\u201d International Journal of Advanced Computer Science and Applications (IJACSA), vol.\u20095, no.\u20091, 2014.","DOI":"10.14569\/IJACSA.2014.050119"},{"key":"2025071712192953906_j_itit-2021-0064_ref_007","doi-asserted-by":"crossref","unstructured":"C. Lonvick, \u201cRfc3164: The bsd syslog protocol,\u201d 2001.","DOI":"10.17487\/rfc3164"},{"key":"2025071712192953906_j_itit-2021-0064_ref_008","doi-asserted-by":"crossref","unstructured":"A. Makanju, A.\u2009N. Zincir-Heywood, and E.\u2009E. Milios, \u201cA lightweight algorithm for message type extraction in system application logs,\u201d IEEE Trans. Knowl. Data Eng., vol.\u200924, no.\u200911, pp.\u20091921\u20131936, 2012. [Online]. Available: https:\/\/doi.org\/10.1109\/TKDE.2011.138.","DOI":"10.1109\/TKDE.2011.138"},{"key":"2025071712192953906_j_itit-2021-0064_ref_009","doi-asserted-by":"crossref","unstructured":"F. Haddadi and A.\u2009N. Zincir-Heywood, \u201cBenchmarking the effect of flow exporters and protocol filters on botnet traffic classification,\u201d IEEE Syst. J., vol.\u200910, no.\u20094, pp.\u20091390\u20131401, 2016. [Online]. Available: https:\/\/doi.org\/10.1109\/JSYST.2014.2364743.","DOI":"10.1109\/JSYST.2014.2364743"},{"key":"2025071712192953906_j_itit-2021-0064_ref_010","doi-asserted-by":"crossref","unstructured":"R. Alshammari and A.\u2009N. Zincir-Heywood, \u201cThe impact of evasion on the generalization of machine learning algorithms to classify voip traffic,\u201d in 21st International Conference on Computer Communications and Networks, ICCCN 2012, Munich, Germany, July 30 \u2013 August 2, 2012. IEEE, 2012, pp.\u20091\u20138. [Online]. Available: https:\/\/doi.org\/10.1109\/ICCCN.2012.6289243.","DOI":"10.1109\/ICCCN.2012.6289243"},{"key":"2025071712192953906_j_itit-2021-0064_ref_011","doi-asserted-by":"crossref","unstructured":"D.\u2009C. Le and N. Zincir-Heywood, \u201cA frontier: Dependable, reliable and secure machine learning for network\/system management,\u201d J. Netw. Syst. Manag., vol.\u200928, no.\u20094, pp.\u2009827\u2013849, 2020. [Online]. Available: https:\/\/doi.org\/10.1007\/s10922-020-09512-5.","DOI":"10.1007\/s10922-020-09512-5"},{"key":"2025071712192953906_j_itit-2021-0064_ref_012","unstructured":"D. Bhamare, T. Salman, M. Samaka, A. Erbad, and R. Jain, \u201cFeasibility of supervised machine learning for cloud security,\u201d CoRR, vol. abs\/1810.09878, 2018. [Online]. Available: http:\/\/arxiv.org\/abs\/1810.09878."},{"key":"2025071712192953906_j_itit-2021-0064_ref_013","doi-asserted-by":"crossref","unstructured":"B. Andriamanalimanana, A. Tekeoglu, K. Bekiroglu, S. Sengupta, C. Chiang, M. Reale, and J.\u2009E. Novillo, \u201cSymmetric kullback-leibler divergence of softmaxed distributions for anomaly scores,\u201d in Conference on Communications and Network Security. IEEE, 2019, pp.\u20091\u20136.","DOI":"10.1109\/CNS44998.2019.8952588"},{"key":"2025071712192953906_j_itit-2021-0064_ref_014","doi-asserted-by":"crossref","unstructured":"H.\u2009T. Nguyen and K. Franke, \u201cAdaptive intrusion detection system via online machine learning,\u201d in International Conference on Hybrid Intelligent Systems. IEEE, 2012, pp.\u2009271\u2013277.","DOI":"10.1109\/HIS.2012.6421346"},{"key":"2025071712192953906_j_itit-2021-0064_ref_015","unstructured":"C. Raissi, J. Brissaud, G. Dray, P. Poncelet, M. Roche, and M. Teisseire, \u201cWeb analyzing traffic challenge: description and results,\u201d in Proceedings of the ECML\/PKDD, 2007, pp.\u200947\u201352."},{"key":"2025071712192953906_j_itit-2021-0064_ref_016","unstructured":"ECML\/PKDD, \u201cEcml\/pkdd 2007 discovery challenge,\u201d September 2021, https:\/\/gitlab.fing.edu.uy\/gsi\/web-application-attacks-datasets\/-\/tree\/master\/ecml_pkdd."},{"key":"2025071712192953906_j_itit-2021-0064_ref_017","doi-asserted-by":"crossref","unstructured":"A. Aizawa, \u201cAn information-theoretic perspective of tf\u2013idf measures,\u201d Information Processing & Management, vol.\u200939, no.\u20091, pp.\u200945\u201365, 2003.","DOI":"10.1016\/S0306-4573(02)00021-3"},{"key":"2025071712192953906_j_itit-2021-0064_ref_018","unstructured":"University of Victoria, \u201cIsot-cid cloud security,\u201d October 2021, https:\/\/www.uvic.ca\/ecs\/ece\/isot\/datasets\/cloud-security\/index.php?utm_medium=redirect&utm_source=\/engineering\/ece\/isot\/datasets\/cloud-security\/index.php&utm_campaign=redirect-usage."},{"key":"2025071712192953906_j_itit-2021-0064_ref_019","unstructured":"Muhammad Anis Al Hilmi, Kurnia Adi Cahyanto, and Muhamad Mustamiin, Apache Web Server - Access Log Pre-processing for Web Intrusion Detection, IEEE Dataport, 2020, https:\/\/dx.doi.org\/10.21227\/vvvq-6w47."},{"key":"2025071712192953906_j_itit-2021-0064_ref_020","doi-asserted-by":"crossref","unstructured":"H. He and E.\u2009A. Garcia, \u201cLearning from imbalanced data,\u201d IEEE Transactions on Knowledge and Data Engineering, vol.\u200921, no.\u20099, pp.\u20091263\u20131284, 2009.","DOI":"10.1109\/TKDE.2008.239"},{"key":"2025071712192953906_j_itit-2021-0064_ref_021","doi-asserted-by":"crossref","unstructured":"N.\u2009V. Chawla, K.\u2009W. Bowyer, L.\u2009O. Hall, and W.\u2009P. Kegelmeyer, \u201cSmote: synthetic minority over-sampling technique,\u201d Journal of artificial intelligence research, vol.\u200916, pp.\u2009321\u2013357, 2002.","DOI":"10.1613\/jair.953"},{"key":"2025071712192953906_j_itit-2021-0064_ref_022","doi-asserted-by":"crossref","unstructured":"S. Lloyd, \u201cLeast squares quantization in PCM,\u201d IEEE Transactions on Information Theory, vol.\u200928, no.\u20092, pp.\u2009129\u2013137, 1982.","DOI":"10.1109\/TIT.1982.1056489"},{"key":"2025071712192953906_j_itit-2021-0064_ref_023","unstructured":"M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, \u201cA density-based algorithm for discovering clusters in large spatial databases with noise,\u201d in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. AAAI Press, 1996, pp.\u2009226\u2013231."},{"key":"2025071712192953906_j_itit-2021-0064_ref_024","doi-asserted-by":"crossref","unstructured":"A.\u2009P. Dempster, N.\u2009M. Laird, and D.\u2009B. Rubin, \u201cMaximum likelihood from incomplete data via the em algorithm,\u201d Journal of the Royal Statistical Society: Series B (Methodological), vol.\u200939, no.\u20091, pp.\u20091\u201322, 1977.","DOI":"10.1111\/j.2517-6161.1977.tb01600.x"},{"key":"2025071712192953906_j_itit-2021-0064_ref_025","doi-asserted-by":"crossref","unstructured":"M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I.\u2009H. Witten, \u201cThe weka data mining software: an update,\u201d ACM SIGKDD explorations newsletter, vol.\u200911, no.\u20091, pp.\u200910\u201318, 2009.","DOI":"10.1145\/1656274.1656278"}],"container-title":["it - Information Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/itit-2021-0064\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/itit-2021-0064\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,17]],"date-time":"2025-07-17T12:19:40Z","timestamp":1752754780000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/itit-2021-0064\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,23]]},"references-count":25,"journal-issue":{"issue":"1-2","published-online":{"date-parts":[[2022,3,23]]},"published-print":{"date-parts":[[2022,4,26]]}},"alternative-id":["10.1515\/itit-2021-0064"],"URL":"https:\/\/doi.org\/10.1515\/itit-2021-0064","relation":{},"ISSN":["2196-7032","1611-2776"],"issn-type":[{"type":"electronic","value":"2196-7032"},{"type":"print","value":"1611-2776"}],"subject":[],"published":{"date-parts":[[2022,3,23]]}}}