{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T12:37:16Z","timestamp":1740141436226,"version":"3.37.3"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,1,20]],"date-time":"2020-01-20T00:00:00Z","timestamp":1579478400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100012818","name":"Comunidad de Madrid","doi-asserted-by":"publisher","award":["CYNAMON (P2018\/TCS-4566)"],"award-info":[{"award-number":["CYNAMON (P2018\/TCS-4566)"]}],"id":[{"id":"10.13039\/100012818","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Spanish Government","award":["DPI2015-65833-P","TIN2014-54580-R"],"award-info":[{"award-number":["DPI2015-65833-P","TIN2014-54580-R"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,7,24]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In cybersecurity, there is a call for adaptive, accurate and efficient procedures to identifying performance shortcomings and security breaches. The increasing complexity of both Internet services and traffic determines a scenario that in many cases impedes the proper deployment of intrusion detection and prevention systems. Although it is a common practice to monitor network and applications activity, there is not a general methodology to codify and interpret the recorded events. Moreover, this lack of methodology somehow erodes the possibility of diagnosing whether event detection and recording is adequately performed. As a result, there is an urge to construct general codification and classification procedures to be applied on any type of security event in any activity log. This work is focused on defining such a method using the so-called normalized compression distance (NCD). NCD is parameter-free and can be applied to determine the distance between events expressed using strings. As a first step in the concretion of a methodology for the integral interpretation of security events, this work is devoted to the characterization of web logs. On the grounds of the NCD, we propose an anomaly-based procedure for identifying web attacks from web logs. Given a web query as stored in a security log, a NCD-based feature vector is created and classified using a support vector machine. The method is tested using the CSIC-2010 data set, and the results are analyzed with respect to similar proposals.<\/jats:p>","DOI":"10.1093\/jigpal\/jzz062","type":"journal-article","created":{"date-parts":[[2020,1,2]],"date-time":"2020-01-02T04:13:02Z","timestamp":1577938382000},"page":"546-557","source":"Crossref","is-referenced-by-count":9,"title":["On the application of compression-based metrics to identifying anomalous behaviour in web traffic"],"prefix":"10.1093","volume":"28","author":[{"given":"Gonzalo","family":"de la Torre-Abaitua","sequence":"first","affiliation":[{"name":"Departamento de Ingenier\u00eda Inform\u00e1tica, Escuela Polit\u00e9cnica Superior, Universidad Aut\u00f3noma de Madrid, Spain"}]},{"given":"Luis F","family":"Lago-Fern\u00e1ndez","sequence":"additional","affiliation":[{"name":"Departamento de Ingenier\u00eda Inform\u00e1tica, Escuela Polit\u00e9cnica Superior, Universidad Aut\u00f3noma de Madrid, Spain"}]},{"given":"David","family":"Arroyo","sequence":"additional","affiliation":[{"name":"Institute of Physical and Information Technologies (ITEFI), Spanish National Research Council (CSIC) Madrid, Spain"}]}],"member":"286","published-online":{"date-parts":[[2020,1,20]]},"reference":[{"key":"2020080108271656600_ref1","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1109\/SP.2014.21","article-title":"Doppelg\u00e4nger finder: taking stylometry to the underground","volume-title":"Security and Privacy (SP), 2014 IEEE Symposium on","author":"Afroz","year":"2014"},{"key":"2020080108271656600_ref2","first-page":"19","article-title":"Detecting broad length algorithmically generated domains","volume-title":"Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments","author":"Ahluwalia","year":"2007"},{"key":"2020080108271656600_ref3","article-title":"NCD based masquerader detection using enriched command lines","volume":"4397","author":"Bertacchini","year":"2004","journal-title":"Innovations"},{"key":"2020080108271656600_ref4","first-page":"31","article-title":"Preliminary results on masquerader detection using compression based similarity metrics 2 previous work","volume":"7,","author":"Bertacchini","year":"2007","journal-title":"Electronic Journal of SADIO"},{"key":"2020080108271656600_ref5","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1109\/SURV.2013.052213.00046","article-title":"Network anomaly detection: methods, systems and tools","volume":"16","author":"Bhuyan","year":"2014","journal-title":"Communications Surveys & Tutorials, IEEE"},{"author":"Bloom","key":"2020080108271656600_ref6","article-title":"Charles Bloom\u2019s page: source code: PPMZ. [online]"},{"author":"bzip.org","key":"2020080108271656600_ref7","article-title":"bzip2: Home"},{"key":"2020080108271656600_ref8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1541880.1541882","article-title":"Anomaly detection: a survey","volume":"41","author":"Chandola","year":"2009","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"2020080108271656600_ref9","first-page":"107","article-title":"Chaurasia. Comparative study of data mining techniques in intrusion detection","volume":"3","author":"M","year":"2016","journal-title":"International Journal of Current Engineering and Scientific Research (IJCESR)"},{"key":"2020080108271656600_ref10","doi-asserted-by":"crossref","first-page":"2309","DOI":"10.1109\/ISIT.2006.261979","article-title":"Automatic extraction of meaning from the web","volume-title":"2006 IEEE International Symposium on Information Theory","author":"Cilibrasi","year":"2006"},{"key":"2020080108271656600_ref11","doi-asserted-by":"crossref","first-page":"1523","DOI":"10.1109\/TIT.2005.844059","article-title":"Clustering by compression","volume":"51,","author":"Cilibrasi","year":"2005","journal-title":"IEEE Transactions on Information Theory"},{"article-title":"Http dataset CSIC","year":"2010","author":"CSIC-dataset","key":"2020080108271656600_ref12"},{"key":"2020080108271656600_ref13","article-title":"Big data fuels intelligence-driven security","author":"Curry","year":"2013","journal-title":"RSA Security Brief"},{"key":"2020080108271656600_ref14","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1016\/j.infsof.2016.02.005","article-title":"Securing web applications from injection and logic vulnerabilities: approaches and challenges","volume":"74","author":"Deepa","year":"2016","journal-title":"Information and Software Technology"},{"key":"2020080108271656600_ref15","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1007\/978-3-540-74141-1_22","article-title":"Catching the drift: using feature-free case-based reasoning for spam filtering","volume-title":"Case-Based Reasoning Research and Development","author":"Delany","year":"2007"},{"key":"2020080108271656600_ref16","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1016\/j.jss.2013.09.020","article-title":"A formal methodology for integral security design and verification of network protocols","volume":"89","author":"Diaz","year":"2014","journal-title":"Journal of Systems and Software"},{"volume-title":"Adaptively detecting malicious queries in web attacks","year":"2017","author":"Dong","key":"2020080108271656600_ref17"},{"key":"2020080108271656600_ref18","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1016\/j.cose.2008.08.003","article-title":"Anomaly-based network intrusion detection: techniques, systems and challenges","volume":"28","author":"Garc\u00eda-Teodoro","year":"2009","journal-title":"Computers & Security"},{"author":"Gartner","key":"2020080108271656600_ref19","article-title":"Gartner says 6.4 billion connected \u2018things\u2019 will be in use in 2016, up 30 percent from 2015"},{"author":"GNU","key":"2020080108271656600_ref20","article-title":"Gzip - GNU project - free Software Foundation"},{"article-title":"Adversarial perturbations against deep neural networks for malware classification","year":"2016","author":"Grosse","key":"2020080108271656600_ref21"},{"key":"2020080108271656600_ref22","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1145\/3133956.3134012","article-title":"Deep models under the Gan: information leakage from collaborative deep learning","volume-title":"Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security","author":"Hitaj","year":"2017"},{"key":"2020080108271656600_ref23","first-page":"1","article-title":"Shallow and deep networks intrusion detection system: a taxonomy and survey","author":"Hodo","year":"2017","journal-title":"ArXiv e-prints"},{"key":"2020080108271656600_ref24","first-page":"42","article-title":"Comparing anomaly detection techniques for HTTP","volume":"4637","author":"Ingham","year":"2007","journal-title":"Raid"},{"key":"2020080108271656600_ref25","article-title":"Machine learning with personal data","author":"Kamarinou","year":"2016","journal-title":"Queen Mary School of Law Legal Studies"},{"key":"2020080108271656600_ref26","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1145\/1014052.1014077","article-title":"Towards parameter-free data mining","volume-title":"Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Keogh","year":"2004"},{"key":"2020080108271656600_ref27","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1145\/3132847.3132866","article-title":"Crowdsourcing cybersecurity: cyber attack detection using social media","author":"Khandpur","year":"2017","journal-title":"Proceedings of the 2017 ACM on Conference on Information and Knowledge Management"},{"key":"2020080108271656600_ref28","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1109\/IWBIS.2017.8275095","article-title":"Deep learning in intrusion detection perspective: overview and further challenges","volume-title":"2017 International Workshop on Big Data and Information Security (IWBIS)","author":"Kim","year":"2017"},{"volume-title":"Intrusion Detection and Correlation: Challenges and Solutions","year":"2004","author":"Kruegel","key":"2020080108271656600_ref29"},{"key":"2020080108271656600_ref30","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1145\/948109.948144","article-title":"Anomaly detection of web-based attacks","volume-title":"Proceedings of the 10th ACM Conference on Computer and Communications Security","author":"Kruegel","year":"2003"},{"article-title":"Current challenges and future research areas for digital forensic investigation.","year":"2016","author":"Lillis","key":"2020080108271656600_ref31"},{"volume-title":"Applied Security Visualization.","year":"2009","author":"Marty","key":"2020080108271656600_ref32"},{"key":"2020080108271656600_ref33","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/978-3-642-21323-6_4","article-title":"Application of the generic feature selection measure in detection of web attacks","volume-title":"Computational Intelligence in Security for Information Systems","author":"Nguyen","year":"2011"},{"key":"2020080108271656600_ref34","first-page":"2017","article-title":"Practical black-box attacks against machine learning","volume-title":"2017 ACM Asia Conference on Computer and Communications Security, ASIA CCS 2017","author":"Papernot"},{"key":"2020080108271656600_ref35","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1080\/01969722.2013.805110","article-title":"Spam detection using data compression and signatures","volume":"44,","author":"Prilepok","year":"2013","journal-title":"Cybernetics and Systems"},{"key":"2020080108271656600_ref36","first-page":"28","article-title":"Introduction","volume-title":"Core Software Security","author":"Ransome","year":"2013"},{"key":"2020080108271656600_ref37","first-page":"1041","article-title":"Vulnerability disclosure in the age of social media: exploiting twitter for predicting real-world exploits","volume-title":"USENIX Security Symposium","author":"Sabottke","year":"2015"},{"article-title":"Scikit-learn: machine learning in python - scikit-learn 0.18.1 documentation","year":"2015","author":"scikit learn","key":"2020080108271656600_ref38"},{"key":"2020080108271656600_ref39","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1145\/3011077.3011112","article-title":"A method for detecting dga botnet based on semantic and cluster analysis","volume-title":"Proceedings of the Seventh Symposium on Information and Communication Technology, SoICT\u201916","author":"Tong","year":"2016"},{"author":"Tukaani-project","key":"2020080108271656600_ref40","article-title":"Quick benchmark: Gzip vs. Bzip2 vs. LZMA"},{"key":"2020080108271656600_ref41","first-page":"49","article-title":"Ai$^\\wedge$ 2: Training a big data machine to defend","volume-title":"Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 2016 IEEE 2nd International Conference on","author":"Veeramachaneni","year":"2016"},{"key":"2020080108271656600_ref42","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/978-0-387-84816-7_3","article-title":"Normalized information distance","author":"Vit\u00e1nyi","year":"2009","journal-title":"Information Theory and Statistical Learning"},{"key":"2020080108271656600_ref43","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1109\/COMST.2014.2336610","article-title":"A survey of distance and similarity measures used within network intrusion anomaly detection","volume":"17,","author":"Weller-Fahy","year":"2015","journal-title":"IEEE Communications Surveys and Tutorials"},{"key":"2020080108271656600_ref44","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1,","author":"Wolpert","year":"1997","journal-title":"IEEE Transactions on Evolutionary Computation"},{"article-title":"URI anomaly detection using similarity metrics.","year":"2008","author":"Yahalom","key":"2020080108271656600_ref45"},{"key":"2020080108271656600_ref46","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1109\/PacificVis.2014.22","article-title":"Bridging the gap of network management and anomaly detection through interactive visualization","volume-title":"2014 IEEE Pacific Visualization Symposium","author":"Zhang","year":"2014"},{"author":"Ziv","key":"2020080108271656600_ref47","article-title":"The ethereal perimeter"}],"container-title":["Logic Journal of the IGPL"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jigpal\/article-pdf\/28\/4\/546\/33554879\/jzz062.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jigpal\/article-pdf\/28\/4\/546\/33554879\/jzz062.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,24]],"date-time":"2023-09-24T15:51:06Z","timestamp":1695570666000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jigpal\/article\/28\/4\/546\/5709611"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,20]]},"references-count":47,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,1,20]]},"published-print":{"date-parts":[[2020,7,24]]}},"URL":"https:\/\/doi.org\/10.1093\/jigpal\/jzz062","relation":{},"ISSN":["1367-0751","1368-9894"],"issn-type":[{"type":"print","value":"1367-0751"},{"type":"electronic","value":"1368-9894"}],"subject":[],"published-other":{"date-parts":[[2020,8]]},"published":{"date-parts":[[2020,1,20]]}}}