{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T07:02:50Z","timestamp":1771916570505,"version":"3.50.1"},"reference-count":44,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T00:00:00Z","timestamp":1727654400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"funder":[{"name":"Key JCJQ Program of China","award":["2020-JCJQ-ZD-021-00"],"award-info":[{"award-number":["2020-JCJQ-ZD-021-00"]}]},{"name":"Key JCJQ Program of China","award":["2020-JCJQ-ZD-024-12"],"award-info":[{"award-number":["2020-JCJQ-ZD-024-12"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,2,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Protocol reverse engineering is crucial in normative verification, and malware behavior analysis and vulnerability discovery. However, uncovering the structural features of binary protocols concealed within dense data representations remains a significant challenge. Accurately identifying keyword segments associated with message types is a prerequisite for meaningful semantic analysis and protocol state machine reduction. In this work, we introduce a novel approach for inferring keywords from binary protocols based on probabilistic statistics. Our method in terms of Byte employs heuristic rules to filter offset positions that are clearly unrelated to message types. We further filter candidate Byte-offsets utilizing constraint relations and provide the probabilistic ranking of each offset as the keyword segment. To enhance the reliability of keyword segment inference, we utilize the Monte Carlo algorithm to assess the difference between message clustering with candidate Byte-offset and random message clustering, and reorder candidate offsets according to the results. Then we can observe optimal values from both orderings and present the ultimate inference results. Experimental results demonstrate that our method excels in the accuracy of keyword segments identification compared with previous techniques.<\/jats:p>","DOI":"10.1093\/comjnl\/bxae096","type":"journal-article","created":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T11:32:08Z","timestamp":1727695928000},"page":"109-125","source":"Crossref","is-referenced-by-count":2,"title":["ProInfer: inference of binary protocol keywords based on probabilistic statistics"],"prefix":"10.1093","volume":"68","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7990-4165","authenticated-orcid":false,"given":"Maohua","family":"Guo","sequence":"first","affiliation":[{"name":"Key Laboratory of Cyberspace Security, Ministry of Education , 62 Science Avenue, Zhengzhou 450001,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9559-8783","authenticated-orcid":false,"given":"Yuefei","family":"Zhu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Cyberspace Security, Ministry of Education , 62 Science Avenue, Zhengzhou 450001,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8499-9402","authenticated-orcid":false,"given":"Jinlong","family":"Fei","sequence":"additional","affiliation":[{"name":"Key Laboratory of Cyberspace Security, Ministry of Education , 62 Science Avenue, Zhengzhou 450001,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2024,9,30]]},"reference":[{"key":"2025021705265293200_ref1","doi-asserted-by":"publisher","first-page":"2005","DOI":"10.1109\/TIFS.2023.3262125","article-title":"Find it with a pencil: an efficient approach for vulnerability detection in authentication protocols","volume":"18","author":"Ghahramani","year":"2023","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"2025021705265293200_ref2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/BMSB58369.2023.10211222","volume-title":"Proceedings of IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Beijing, 14\u201316 June","author":"Lv","year":"2023"},{"key":"2025021705265293200_ref3","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1504\/IJCCBS.2018.096190","article-title":"Model-based specification and validation of the dual-mode adaptive MAC protocol","volume":"8","author":"Somappa","year":"2018","journal-title":"Int J Crit Comput-Based Syst"},{"key":"2025021705265293200_ref4","first-page":"2653","volume-title":"Proceedings of the USENIX Security, Anaheim, CA, 9\u201311 August","author":"Wu","year":"2023"},{"key":"2025021705265293200_ref5","doi-asserted-by":"publisher","first-page":"1125","DOI":"10.1007\/s10207-023-00682-2","article-title":"A systematic literature review for network intrusion detection system (IDS)","volume":"22","author":"Abdulganiyu","year":"2023","journal-title":"Int J Inform Security"},{"key":"2025021705265293200_ref6","doi-asserted-by":"publisher","first-page":"460","DOI":"10.1109\/icst46399.2020.00062","volume-title":"Proceedings of IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), Porto, Portugal, 24\u201328 October","author":"Pham","year":"2020"},{"key":"2025021705265293200_ref7","first-page":"1093","volume-title":"Proceedings of the USENIX Security, Vancouver, BC, 16\u201318 August","author":"Antonakakis","year":"2017"},{"key":"2025021705265293200_ref8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/infocom.2017.8057064","volume-title":"Proceedings of IEEE Conference on Computer Communications, Atlanta, GA, 1\u20134 May","author":"De Carli","year":"2017"},{"key":"2025021705265293200_ref9","doi-asserted-by":"publisher","first-page":"238","DOI":"10.1016\/j.comcom.2021.11.009","article-title":"Protocol reverse-engineering methods and tools: a survey","volume":"182","author":"Huang","year":"2022","journal-title":"Comput Commun"},{"key":"2025021705265293200_ref10","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1631\/FITEE.2000709","article-title":"Automatic protocol reverse engineering for industrial control systems with dynamic taint analysis","volume":"23","author":"Ma","year":"2022","journal-title":"Front Inform Technol Electron Engineer"},{"key":"2025021705265293200_ref11","first-page":"167","article-title":"Private protocol reverse engineering based on network traffic: a survey","volume":"60","author":"Junchen","year":"2022","journal-title":"J Comput Res Develop"},{"key":"2025021705265293200_ref12","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1007\/s11416-016-0289-8","article-title":"State of the art of network protocol reverse engineering tools","volume":"14","author":"Duchene","year":"2018","journal-title":"J Comput Virol Hack Tech"},{"key":"2025021705265293200_ref13","volume-title":"Network Protocol Analysis Using Bioinformatics Algorithms","author":"Beddoe","year":"2004"},{"key":"2025021705265293200_ref14","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1145\/2590296.2590346","volume-title":"Proceedings of the ACM symposium on Information, computer and communications security, Kyoto, 4\u20136 June","author":"Bossert","year":"2014"},{"key":"2025021705265293200_ref15","doi-asserted-by":"publisher","first-page":"2243","DOI":"10.1109\/infocom41043.2020.9155275","volume-title":"Proceedings of IEEE Conference on Computer Communications, Virtual Conference, 6\u20139 July","author":"Kleber","year":"2020"},{"key":"2025021705265293200_ref16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/icnp.2012.6459963","volume-title":"Proceedings of IEEE International Conference on Network Protocols (ICNP), Austin, TX, 30 October-2 November","author":"Wang","year":"2012"},{"key":"2025021705265293200_ref17","first-page":"1","article-title":"Analyzing network protocols of application layer using hidden semi-Markov model","volume":"2016","author":"Cai","year":"2016","journal-title":"Math. Probl. Eng."},{"key":"2025021705265293200_ref18","first-page":"1337","article-title":"Keyword mining for private protocols tunneled over websocket","volume":"20","author":"Li","year":"2016","journal-title":"IEEE Commun Lett"},{"key":"2025021705265293200_ref19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.14722\/ndss.2021.24531","volume-title":"Proceedings of ISOC Network and Distributed System Security Symposium, Virtual Conference, 21\u201325 February","author":"Ye","year":"2021"},{"key":"2025021705265293200_ref20","first-page":"1","volume-title":"Proceedings of the USENIX Security, Santa Clara, CA, 18\u201319 Jun2","author":"Cui","year":"2007"},{"key":"2025021705265293200_ref21","first-page":"2200","article-title":"SPFPA: a format parsing approach for unknown security protocols","volume":"52","author":"Zhu","year":"2015","journal-title":"J Comput Res Develop"},{"key":"2025021705265293200_ref22","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1145\/3209914.3209937","volume-title":"Proceedings of International Conference on Information Science and Systems, Jeju Island, 27\u201329 April","author":"Li","year":"2018"},{"key":"2025021705265293200_ref23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.23919\/apnoms.2019.8893038","volume-title":"Proceedings of Asia-Pacific Network Operations and Management Symposium (APNOMS), Matsue, 18\u201320 September: IEEE","author":"Lee","year":"2019"},{"key":"2025021705265293200_ref24","doi-asserted-by":"publisher","first-page":"606","DOI":"10.1007\/978-981-15-9129-7_42","volume-title":"Proceedings of Security and Privacy in Digital Economy, Quzhou, October 30\u2013November 1","author":"Yang","year":"2020"},{"key":"2025021705265293200_ref25","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1145\/3434581.3434686","volume-title":"Proceedings of International Conference on Aviation Safety and Information Technology, Weihai, 14\u201316 October","author":"Zhao","year":"2020"},{"key":"2025021705265293200_ref26","doi-asserted-by":"publisher","first-page":"116255","DOI":"10.1109\/ACCESS.2023.3325391","article-title":"CNNPRE: a CNN-based protocol reverse engineering method","volume":"11","author":"Garshasbi","year":"2023","journal-title":"IEEE Access"},{"key":"2025021705265293200_ref27","first-page":"1","volume-title":"The Needleman-Wunsch Algorithm for Sequence Alignment, Lecture Given at the 7th Melbourne Bioinformatics Course, Bi021 Molecular Science and Biotechnology Institute","author":"Likic","year":"2008"},{"key":"2025021705265293200_ref28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s12539-021-00473-0","article-title":"A review of parallel implementations for the smith\u2013waterman algorithm","volume":"14","author":"Xia","year":"2022","journal-title":"Interdisc Sci: Computational Life Sciences"},{"key":"2025021705265293200_ref29","first-page":"1","volume-title":"Proceedings of USENIX Workshop on Offensive Technologies, Baltimore, MD, 13\u201314 August","author":"Kleber","year":"2018"},{"key":"2025021705265293200_ref30","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.comcom.2019.06.013","article-title":"Unsupervised field segmentation of unknown protocol messages","volume":"146","author":"Sun","year":"2019","journal-title":"Comput Commun"},{"key":"2025021705265293200_ref31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/icnp55882.2022.9940264","volume-title":"Proceedings of International Conference on Network Protocols (ICNP), Lexington, Kentucky, October 30\u2013November 2","author":"Zhao","year":"2022"},{"key":"2025021705265293200_ref32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.14722\/ndss.2023.23131","volume-title":"Proceedings of ISOC Network and Distributed System Security Symposium, San Diego, California, 27 February\u20133 March","author":"Chandler","year":"2023"},{"key":"2025021705265293200_ref33","volume-title":"Consumer Media Capture: Time-Based Analysis and Event Clustering","author":"Gargi","year":"2003"},{"key":"2025021705265293200_ref34","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"2025021705265293200_ref35","first-page":"1","volume-title":"Proceedings of ISOC Network and Distributed System Security Symposium, San Diego, California, 27 February\u20133 March","author":"Chandler","year":"2023"},{"key":"2025021705265293200_ref36","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1016\/j.ipl.2007.07.002","article-title":"Optimal implementations of UPGMA and other common clustering algorithms","volume":"104","author":"Gronau","year":"2007","journal-title":"Inform Process Lett"},{"key":"2025021705265293200_ref37","author":"Smia2011"},{"key":"2025021705265293200_ref38","author":"NetPlier"},{"key":"2025021705265293200_ref39","author":"ICS-pcap"},{"key":"2025021705265293200_ref40","author":"icsmaster"},{"key":"2025021705265293200_ref41","author":"BinaryInferno"},{"key":"2025021705265293200_ref42","author":"ZeroAccess"},{"key":"2025021705265293200_ref43","author":"Tshark"},{"key":"2025021705265293200_ref44","author":"MAVLink Protocol"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/2\/109\/59445561\/bxae096.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/2\/109\/59445561\/bxae096.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,17]],"date-time":"2025-02-17T05:27:06Z","timestamp":1739770026000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/2\/109\/7789868"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,30]]},"references-count":44,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,9,30]]},"published-print":{"date-parts":[[2025,2,9]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxae096","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"value":"0010-4620","type":"print"},{"value":"1460-2067","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,2]]},"published":{"date-parts":[[2024,9,30]]}}}