{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T23:29:25Z","timestamp":1740180565908,"version":"3.37.3"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T00:00:00Z","timestamp":1721865600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T00:00:00Z","timestamp":1721865600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62176264"],"award-info":[{"award-number":["62176264"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cybersecurity"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Nowadays, the malicious MS-Office document has already become one of the most effective attacking vectors in APT attacks. Though many protection mechanisms are provided, they have been proved easy to bypass, and the existed detection methods show poor performance when facing malicious documents with unknown vulnerabilities or with few malicious behaviors. In this paper, we first introduce the definition of im-documents, to describe those vulnerable documents which show implicitly malicious behaviors and escape most of public antivirus engines. Then we present GLDOC\u2014a GCN based framework that is aimed at effectively detecting im-documents with dynamic analysis, and improving the possible blind spots of past detection methods. Besides the system call which is the only focus in most researches, we capture all dynamic behaviors in sandbox, take the process tree into consideration and reconstruct both of them into graphs. Using each line to learn each graph, GLDOC trains a 2-channel network as well as a classifier to formulate the malicious document detection problem into a graph learning and classification problem. Experiments show that GLDOC has a comprehensive balance of accuracy rate and false alarm rate\u2009\u2212\u200995.33% and 4.33% respectively, outperforming other detection methods. When further testing in a simulated 5-day attacking scenario, our proposed framework still maintains a stable and high detection accuracy on the unknown vulnerabilities.<\/jats:p>","DOI":"10.1186\/s42400-024-00243-7","type":"journal-article","created":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T02:01:18Z","timestamp":1721872878000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["GLDOC: detection of implicitly malicious MS-Office documents using graph convolutional networks"],"prefix":"10.1186","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4767-8794","authenticated-orcid":false,"given":"Wenbo","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Yi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taotao","family":"Kou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weitao","family":"Han","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chengyu","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,7,25]]},"reference":[{"key":"243_CR1","unstructured":"2020 Global Advanced Persistent Threat APT Research Report. Available at https:\/\/www.freebuf.com\/sectool\/242507.html"},{"key":"243_CR2","unstructured":"A roundup of the world's top 10 APT attacks in 2018. Available at https:\/\/www.freebuf.com\/articles\/193393.html"},{"key":"243_CR3","unstructured":"An update on MD5 poisoning. Available in https:\/\/blog.silentsignal.eu\/2016\/11\/28\/an-update-on-md5-poisoning\/"},{"issue":"4","key":"243_CR4","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1007\/s11416-011-0152-x","volume":"7","author":"B Anderson","year":"2011","unstructured":"Anderson B, Quist D, Neil J, Storlie C, Lane T (2011) Graph-based malware detection using dynamic analysis. J Comput Virol 7(4):247\u2013258","journal-title":"J Comput Virol"},{"key":"243_CR5","unstructured":"Domain takeover report in Hackerone. Available in https:\/\/hackerone.com\/reports\/1253926"},{"issue":"8","key":"243_CR6","doi-asserted-by":"publisher","first-page":"4088","DOI":"10.3390\/app12084088","volume":"12","author":"J Hong","year":"2022","unstructured":"Hong J, Jeong D, Kim SW (2022) Classifying malicious documents on the basis of plain-text features: problem, solution, and experiences. Appl Sci 12(8):4088","journal-title":"Appl Sci"},{"key":"243_CR7","unstructured":"https:\/\/www.fireeye.com\/content\/dam\/fireeye-www\/solutions\/pdfs\/ig-email-security-gap.pdf"},{"key":"243_CR8","unstructured":"Huneault S, Talhi C (2020) P-Code based classification to detect malicious VBA macro. In: 2020 International Symposium on Networks, Computers and Communications (ISNCC), Montreal, Canada, (pp. 20\u201322)"},{"key":"243_CR9","doi-asserted-by":"crossref","unstructured":"Jiang J, Wang C, Yu M, et al. (2021) NFDD: a dynamic malicious document detection method without manual feature dictionary[C]. Lecture Notes in Computer Science, (pp. 147\u2013159)","DOI":"10.1007\/978-3-030-86130-8_12"},{"key":"243_CR10","doi-asserted-by":"crossref","unstructured":"Khan MS, Siddiqui S, Ferens K (2017) Cognitive modeling of polymorphic malware using fractal based semantic characterization. In: Technologies for Homeland Security (HST), 2017 IEEE International Symposium, Waltham, MA, USA, IEEE, (pp. 1\u20137)","DOI":"10.1109\/THS.2017.7943487"},{"key":"243_CR11","unstructured":"Kim G, Yi H, Lee J, et al. (2016) LSTM-based system-call language modeling and robust ensemble method for designing host-based intrusion detection systems. arXiv preprint arXiv:1611.01726"},{"key":"243_CR12","doi-asserted-by":"crossref","unstructured":"Kim S, et al. (2018) Obfuscated VBA macro detection using machine learning. In: 2018 48th annual IEEE\/IFIP international conference on dependable systems and networks (DSN), Luxembourg, (pp. 25\u201328)","DOI":"10.1109\/DSN.2018.00057"},{"key":"243_CR13","unstructured":"Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907"},{"issue":"5","key":"243_CR14","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1109\/52.605929","volume":"14","author":"AP Kosoresow","year":"1997","unstructured":"Kosoresow AP, Hofmeyer S (1997) Intrusion detection via system call traces. IEEE Softw 14(5):35\u201342","journal-title":"IEEE Softw"},{"key":"243_CR15","first-page":"267","volume":"2","author":"W Li","year":"2010","unstructured":"Li W, Su PR, Shi YF (2010) A technique for detecting based on calculation malicious documents of vector spaces. J Grad Sch Chin Acad Sci 2:267\u2013274","journal-title":"J Grad Sch Chin Acad Sci"},{"key":"243_CR16","doi-asserted-by":"crossref","unstructured":"Li WJ, Stolfo S, Stavrou A, et al. (2007) A study of malcode-bearing documents[C]. Lecture Notes in Computer Science, (pp. 231\u2013250)","DOI":"10.1007\/978-3-540-73614-1_14"},{"key":"243_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2019.105598","volume":"82","author":"L Liu","year":"2019","unstructured":"Liu L, He X, Liu L et al (2019) Capturing the symptoms of malicious code in electronic documents by file\u2019s entropy signal combined with machine learning. Appl Soft Comput 82:105598","journal-title":"Appl Soft Comput"},{"key":"243_CR18","unstructured":"Long-lasting exploitation of backdoors. Available at https:\/\/paper.seebug.org\/1007\/"},{"key":"243_CR19","doi-asserted-by":"crossref","unstructured":"Maiorca D, Giacinto G, Corona (2012) A pattern recognition system for malicious PDF files detection. In: International Conference on Machine Learning and Data Mining in Pattern Recognition. (pp. 510\u2013524)","DOI":"10.1007\/978-3-642-31537-4_40"},{"key":"243_CR20","unstructured":"Meng X, Kim T (2017) PlatPal: detecting malicious documents with platform diversity. In: USENIX Security Symposium, (pp. 271\u2013287)"},{"key":"243_CR21","unstructured":"Microsoft bounty program. Available at https:\/\/www.microsoft.com\/en-us\/msrc\/bounty"},{"key":"243_CR22","first-page":"493","volume":"28","author":"M Mimura","year":"2020","unstructured":"Mimura M, Taro O (2020) Using LSI to detect unknown malicious VBA macros. J Inf Process 28:493\u2013501","journal-title":"J Inf Process"},{"key":"243_CR23","first-page":"246","volume":"49","author":"N Nissim","year":"2014","unstructured":"Nissim N, Cohen A, Glezer C, Elovici Y (2014) Detection of malicious PDF files and directions for enhancements: a state-of-the art survey. Comput Secur 49:246\u2013266","journal-title":"Comput Secur"},{"issue":"3","key":"243_CR24","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1109\/TIFS.2016.2631905","volume":"12","author":"N Nissim","year":"2017","unstructured":"Nissim N, Cohen A, Elovici Y (2017) ALDOCX: detection of unknown malicious microsoft office documents using designated active learning methods based on new structural feature extraction methodology. IEEE Trans Inf Forensics Secur 12(3):631\u2013646. https:\/\/doi.org\/10.1109\/TIFS.2016.2631905","journal-title":"IEEE Trans Inf Forensics Secur"},{"key":"243_CR25","unstructured":"Pytorch geometric. Available at https:\/\/pytorch-geometric.readthedocs.io\/en\/latest\/"},{"key":"243_CR26","unstructured":"Pytorch. Available at https:\/\/pytorch.org\/"},{"key":"243_CR27","unstructured":"Report for prepared file in Threatcn. Available at https:\/\/s.threatbook.cn\/report\/file\/05c2c1cdcafcce4e9c64e900298d0bc07ebd4be9af861da74df79ed5ed36ced8"},{"key":"243_CR28","unstructured":"Report for prepared file in VirusTotal. Available at https:\/\/www.virustotal.com\/gui\/file\/05c2c1cdcafcce4e9c64e900298d0bc07ebd4be9af861da74df79ed5ed36ced8"},{"key":"243_CR29","doi-asserted-by":"crossref","unstructured":"Ruaro N, Pagani F, Ortolani S (2022) SYMBEXCEL: Automated analysis and understanding of malicious excel 4.0 macros. In: IEEE Symposium on Security and Privacy (SP), San Francisco, USA, (pp. 23\u201325)","DOI":"10.1109\/SP46214.2022.9833765"},{"key":"243_CR30","doi-asserted-by":"crossref","unstructured":"Scofield D, Miles C, Kuhn S (2017) Fast model learning for the detection of malicious digital documents. In: Proceedings of the 7th Software Security, Protection, and Reverse Engineering, Software Security and Protection Workshop","DOI":"10.1145\/3151137.3151142"},{"key":"243_CR31","unstructured":"Security basis: response and feedback. Available at https:\/\/blog.csdn.net\/wutianxu123\/article\/details\/82940721"},{"key":"243_CR32","unstructured":"Total Security. Available at https:\/\/weishi.360.cn\/?source=homepage\/"},{"key":"243_CR33","unstructured":"Tan Y, Liu Y, Long G, et al. (2022) Federated learning on non-IID Graphs via structural knowledge sharing. arXiv preprint arXiv:2211.13009"},{"issue":"8","key":"243_CR37","first-page":"9953","volume":"37","author":"Y Tan","year":"2023","unstructured":"Tan Y, et al (2023) Federated learning on non-iid graphs via structural knowledge sharing. Proc AAAI Conf Artif Intell 37(8):9953-9961","journal-title":"Proc AAAI Conf Artif Intell"},{"key":"243_CR34","unstructured":"Top 10 vulnerabilities used by APT organizations in recent years. Available at https:\/\/www.freebuf.com\/articles\/network\/168121.html"},{"key":"243_CR35","doi-asserted-by":"crossref","unstructured":"Xu L, Zhang D, Alvarez M A, et al. (2016) Dynamic android malware classification using graph-based representations. In: IEEE 3rd international conference on cyber security and cloud computing (CSCloud). IEEE, (pp. 220\u2013231)","DOI":"10.1109\/CSCloud.2016.27"},{"issue":"3","key":"243_CR36","first-page":"54","volume":"6","author":"M Yu","year":"2021","unstructured":"Yu M, Jianguo J, Gang L et al (2021) A survey of research on malicious document detection. J Cyber Secur. 6(3):54\u201376","journal-title":"J Cyber Secur."}],"container-title":["Cybersecurity"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-024-00243-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42400-024-00243-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42400-024-00243-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,25]],"date-time":"2024-07-25T02:09:09Z","timestamp":1721873349000},"score":1,"resource":{"primary":{"URL":"https:\/\/cybersecurity.springeropen.com\/articles\/10.1186\/s42400-024-00243-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,25]]},"references-count":37,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["243"],"URL":"https:\/\/doi.org\/10.1186\/s42400-024-00243-7","relation":{},"ISSN":["2523-3246"],"issn-type":[{"type":"electronic","value":"2523-3246"}],"subject":[],"published":{"date-parts":[[2024,7,25]]},"assertion":[{"value":"4 September 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 July 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and\/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"48"}}