{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,23]],"date-time":"2025-12-23T10:40:32Z","timestamp":1766486432315,"version":"build-2065373602"},"reference-count":25,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2019,2,10]],"date-time":"2019-02-10T00:00:00Z","timestamp":1549756800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2018YFB1003602"],"award-info":[{"award-number":["2018YFB1003602"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61472439"],"award-info":[{"award-number":["61472439"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Nature Science Foundation of China","award":["61772380"],"award-info":[{"award-number":["61772380"]}]},{"name":"Foundation for Innovative Research Groups of Hubei Province","award":["2017CFA007"],"award-info":[{"award-number":["2017CFA007"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Protocol Reverse Engineering (PRE) is crucial for information security of Internet-of-Things (IoT), and message clustering determines the effectiveness of PRE. However, the quality of services still lags behind the strict requirement of IoT applications as the results of message clustering are often coarse-grained with the intrinsic type information hidden in messages largely ignored. Aiming at this problem, this study proposes a type-aware approach to message clustering guided by type information. The approach regards a message as a combination of n-grams, and it employs the Latent Dirichlet Allocation (LDA) model to characterize messages with types and n-grams via inferring the type distribution of each message. The type distribution is finally used to measure the similarity of messages. According to this similarity, the approach clusters messages and further extracts message formats. Experimental results of the approach against Netzob in terms of a number of protocols indicate that the correctness and conciseness can be significantly improved, e.g., figures 43.86% and 3.87%, respectively for the CoAP protocol.<\/jats:p>","DOI":"10.3390\/s19030716","type":"journal-article","created":{"date-parts":[[2019,2,12]],"date-time":"2019-02-12T03:18:20Z","timestamp":1549941500000},"page":"716","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["A Type-Aware Approach to Message Clustering for Protocol Reverse Engineering"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1641-5713","authenticated-orcid":false,"given":"Xin","family":"Luo","sequence":"first","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Dan","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Cyberspace Security, Hangzhou Dianzi University, Hangzhou 310018, China"}]},{"given":"Yongjun","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410073, China"}]},{"given":"Peidai","family":"Xie","sequence":"additional","affiliation":[{"name":"College of Computer, National University of Defense Technology, Changsha 410073, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,2,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1109\/TETC.2016.2597546","article-title":"Distribution based workload modelling of continuous queries in clouds","volume":"5","author":"Khoshkbarforoushha","year":"2017","journal-title":"IEEE Trans. Emerg. Top. Comput."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"15","DOI":"10.4018\/IJDST.2016010102","article-title":"City Data Fusion: Sensor Data Fusion in the Internet of Things","volume":"7","author":"Wang","year":"2016","journal-title":"Int. J. Distrib. Syst. Technol."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1016\/j.ins.2014.10.006","article-title":"Towards building a data-intensive index for big data computing\u2014A case study of remote sensing data processing","volume":"319","author":"Ma","year":"2015","journal-title":"Inform. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1109\/MCC.2015.36","article-title":"Trustworthy processing of healthcare big data in hybrid clouds","volume":"2","author":"Nepal","year":"2015","journal-title":"IEEE Cloud Comput."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"8370341","DOI":"10.1155\/2018\/8370341","article-title":"A Survey of Automatic Protocol Reverse Engineering Approaches, Methods, and Tools on the Inputs and Outputs View","volume":"2018","author":"Sija","year":"2018","journal-title":"Secur. Commun. Netw."},{"key":"ref_6","unstructured":"Zhang, Z., Zhang, Z., Lee, P.P., Liu, Y., and Xie, G. (May, January 27). Proword: An unsupervised approach to protocol feature word extraction. Proceedings of the INFOCOM, Toronto, ON, Canada."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Xu, T., Wang, Y., Sun, J., and Zhang, X. (2015, January 26\u201329). A Markov Random Field Approach to Automated Protocol Signature Inference. Proceedings of the International Conference on Security and Privacy in Communication Systems, Dallas, TX, USA.","DOI":"10.1007\/978-3-319-28865-9_25"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Luo, J.Z., Shan, C., Cai, J., and Liu, Y. (2018). IoT Application-Layer Protocol Vulnerability Detection using Reverse Engineering. Symmetry, 10.","DOI":"10.3390\/sym10110561"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Meng, F., Zhang, C., and Wu, G. (2018, January 9\u201312). Protocol reverse based on hierarchical clustering and probability alignment from network traces. Proceedings of the 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), Shanghai, China.","DOI":"10.1109\/ICBDA.2018.8367724"},{"key":"ref_10","unstructured":"Beddoe, M.A. (2018, April 21). Network Protocol Analysis using Bioinformatics Algorithms. Available online: http:\/\/www.4tphi.net\/~awalters\/PI\/PI.html."},{"key":"ref_11","first-page":"145","article-title":"Message format extraction of cryptographic protocol based on dynamic binary analysis","volume":"46","author":"Li","year":"2014","journal-title":"J. Res. Pract. Inf. Technol."},{"key":"ref_12","unstructured":"Bossert, G., and Guih\u00e9ry, F. (2012, January 10\u201315). Security evaluation of communication protocols in common criteria. Proceedings of the IEEE International Conference on Communications, Ottowa, ON, Canada."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Esoul, O., and Walkinshaw, N. (2017, January 25\u201329). Using Segment-Based Alignment to Extract Packet Structures from Network Traces. Proceedings of the 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS), Prague, Czech Republic.","DOI":"10.1109\/QRS.2017.49"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, Y., Yun, X., Shafiq, M.Z., Wang, L., Liu, A.X., Zhang, Z., Yao, D., Zhang, Y., and Guo, L. (November, January 30). A semantics aware approach to automated reverse engineering unknown protocols. Proceedings of the 2012 20th IEEE International Conference on Network Protocols (ICNP), Austin, TX, USA.","DOI":"10.1109\/ICNP.2012.6459963"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jnca.2017.10.009","article-title":"Nonparametric approach to the automated protocol fingerprint inference","volume":"99","author":"Wang","year":"2017","journal-title":"J. Netw. Comput. Appl."},{"key":"ref_16","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1108\/eb026526","article-title":"A statistical interpretation of term specificity and its application in retrieval","volume":"28","year":"1972","journal-title":"J. Doc."},{"key":"ref_18","unstructured":"Sokal, R.R., and Michener, C.D. (1958). A Statistical Method of Evaluating Systematic Relationships, The University of Kansas."},{"key":"ref_19","unstructured":"Slonim, N., and Tishby, N. (2018, May 24). Agglomerative Information Bottleneck. Available online: https:\/\/papers.nips.cc\/paper\/1651-agglomerative-information-bottleneck.pdf."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1016\/j.neucom.2018.03.059","article-title":"Large-scale k-means clustering via variance reduction","volume":"307","author":"Zhao","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Tanganelli, G., Vallati, C., and Mingozzi, E. (2015, January 14\u201316). CoAPthon: Easy development of CoAP-based IoT applications with Python. Proceedings of the 2015 IEEE 2nd World Forum on Internet-of-Things (WF-IoT), Milan, Italy.","DOI":"10.1109\/WF-IoT.2015.7389028"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Pang, R., and Paxson, V. (2003, January 25\u201329). A High-level Programming Environment for Packet Trace. Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, Karlsruhe, Germany.","DOI":"10.1145\/863955.863994"},{"key":"ref_23","unstructured":"(2018, July 15). Wireshark \u00b7 Go Deep. Available online: https:\/\/www.wireshark.org\/."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.neucom.2018.08.045","article-title":"Bayesian tensor factorization for multi-way analysis of multi-dimensional EEG","volume":"318","author":"Tang","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1109\/TPDS.2016.2613054","article-title":"H-PARAFAC: Hierarchical parallel factor analysis of multidimensional big data","volume":"28","author":"Chen","year":"2017","journal-title":"IEEE Trans. Parallel Distrib. Syst."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/716\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:30:58Z","timestamp":1760185858000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/3\/716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,2,10]]},"references-count":25,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,2]]}},"alternative-id":["s19030716"],"URL":"https:\/\/doi.org\/10.3390\/s19030716","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2019,2,10]]}}}