{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T12:11:21Z","timestamp":1775477481658,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2020,8,3]],"date-time":"2020-08-03T00:00:00Z","timestamp":1596412800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["NSF grant # 1623276"],"award-info":[{"award-number":["NSF grant # 1623276"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006952","name":"Louisiana Board of Regents","doi-asserted-by":"publisher","award":["Board of Regents Support Fund LEQSF (2016-19)-RD-B-07"],"award-info":[{"award-number":["Board of Regents Support Fund LEQSF (2016-19)-RD-B-07"]}],"id":[{"id":"10.13039\/100006952","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>File fragment classification is an essential problem in digital forensics. Although several attempts had been made to solve this challenging problem, a general solution has not been found. In this work, we propose a hierarchical machine-learning-based approach with optimized support vector machines (SVM) as the base classifiers for file fragment classification. This approach consists of more general classifiers at the top level and more specialized fine-grain classifiers at the lower levels of the hierarchy. We also propose a primitive taxonomy for file types that can be used to perform hierarchical classification. 
We evaluate our model with a dataset of 14 file types, with 1000 fragments measuring 512 bytes from each file type derived from a subset of the publicly available Digital Corpora, the govdocs1 corpus. Our experiment shows comparable results to the present literature, with an average accuracy of 67.78% and an F1-measure of 65% using 10-fold cross-validation. We then improve on the hierarchy and find better results, with an increase in the F1-measure of 1%. Finally, we make our assessment and observations, then conclude the paper by discussing the scope of future research.<\/jats:p>","DOI":"10.3390\/make2030012","type":"journal-article","created":{"date-parts":[[2020,8,3]],"date-time":"2020-08-03T09:02:48Z","timestamp":1596445368000},"page":"216-232","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Hierarchy-Based File Fragment Classification"],"prefix":"10.3390","volume":"2","author":[{"given":"Manish","family":"Bhatt","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA"}]},{"given":"Avdesh","family":"Mishra","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering and Computer Science, Texas A&amp;M University-Kingsville, Kingsville, TX 78363, USA"}]},{"given":"Md Wasi Ul","family":"Kabir","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA"}]},{"given":"S. 
E.","family":"Blake-Gatto","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA"}]},{"given":"Rishav","family":"Rajendra","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0110-2194","authenticated-orcid":false,"given":"Md Tamjidul","family":"Hoque","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of New Orleans, 2000 Lakeshore Dr., New Orleans, LA 70148, USA"}]},{"given":"Irfan","family":"Ahmed","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA"}]}],"member":"1968","published-online":{"date-parts":[[2020,8,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Casey, E., Altheide, C., Daywalt, C., de Donno, A., Forte, D., Holley, J.O., Johnston, A., van der Knijff, R., Kokocinski, A., and Luehr, P.H. (2010). Chapter 2\u2014Forensic Analysis. Handbook of Digital Forensics and Investigation, Academic Press.","DOI":"10.1016\/B978-0-12-374267-4.00002-1"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Roussev, V., and Garfinkel, S.L. (2009, January 21). File Fragment Classification\u2014The Case for Specialized Approaches. Proceedings of the 2009 Fourth International IEEE Workshop on Systematic Approaches to Digital Forensic Engineering, Berkeley, CA, USA.","DOI":"10.1109\/SADFE.2009.21"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"S69","DOI":"10.1016\/j.diin.2013.06.008","article-title":"File fragment encoding classification\u2014An empirical approach","volume":"10","author":"Roussev","year":"2013","journal-title":"Digit. Investig."},{"key":"ref_4","unstructured":"Darwin, I.F. (2020, August 02). Libmagic. 
Available online: ftp:\/\/ftp.astron.com\/pub\/file\/."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"S24","DOI":"10.1016\/j.diin.2010.05.004","article-title":"The Normalised Compression Distance As a File Fragment Classifier","volume":"7","author":"Axelsson","year":"2010","journal-title":"Digit. Investig."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"S3","DOI":"10.1016\/j.diin.2010.05.002","article-title":"Automated mapping of large binary objects using primitive fragment type classification","volume":"7","author":"Conti","year":"2010","journal-title":"Digit. Investig."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ahmed, I., and Lhee, K. (2008, January 4\u20137). Detection of Malcodes by Packet Classification. Proceedings of the 2008 Third International Conference on Availability, Reliability and Security(ARES), Barcelona, Spain.","DOI":"10.1109\/ARES.2008.100"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ahmed, I., Lhee, K.-S., Shin, H., and Hong, M. (2009). On Improving the Accuracy and Performance of Content-Based File Type Identification. Information Security and Privacy, Springer.","DOI":"10.1007\/978-3-642-02620-1_4"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ahmed, I., Lhee, K.-S., Shin, H., and Hong, M. (2010, January 22\u201326). Fast File-type Identification. Proceedings of the 2010 ACM Symposium on Applied Computing, Sierre, Switzerland.","DOI":"10.1145\/1774088.1774431"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ahmed, I., Lhee, K.-S., Shin, H.-J., and Hong, M.-P. (2011). Fast Content-Based File Type Identification. 
Advances in Digital Forensics VII, Springer.","DOI":"10.1007\/978-3-642-24212-0_5"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"465","DOI":"10.4103\/0256-4602.67149","article-title":"Content-Based File-Type Identification Using Cosine Similarity and a Divide-and-Conquer Approach","volume":"27","author":"Ahmed","year":"2010","journal-title":"IETE Tech. Rev."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/s11416-011-0156-6","article-title":"Classification of packet contents for malware detection","volume":"7","author":"Ahmed","year":"2011","journal-title":"J. Comput. Virol."},{"key":"ref_13","unstructured":"Li, W.-J., Wang, K., Stolfo, S.J., and Herzog, B. (2005, January 15\u201317). Fileprints: Identifying file types by n-gram analysis. Proceedings of the Sixth Annual IEEE SMC Information Assurance Workshop, West Point, NY, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1016\/j.diin.2008.05.005","article-title":"Predicting the types of file fragments","volume":"5","author":"Calhoun","year":"2008","journal-title":"Digit. Investig."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"S44","DOI":"10.1016\/j.diin.2012.05.008","article-title":"Using NLP techniques for file fragment classification","volume":"9","author":"Fitzgerald","year":"2012","journal-title":"Digit. Investig."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1863","DOI":"10.4304\/jcp.9.8.1863-1870","article-title":"A File Fragment Classification Method Based on Grayscale Image","volume":"9","author":"Xu","year":"2014","journal-title":"J. Comput."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Dumais, S., and Chen, H. (2000, January 24\u201428). Hierarchical classification of Web content. 
Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece.","DOI":"10.1145\/345508.345593"},{"key":"ref_18","unstructured":"Sun, A., and Lim, E.-P. (December, January 29). Hierarchical text classification and evaluation. Proceedings of the 2011 IEEE International Conference on Data Mining, San Jose, CA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Nakano, F.K., Pinto, W.J., Pappa, G.L., and Cerri, R. (2017, January 14\u201319). Top-down strategies for hierarchical classification of transposable elements with neural networks. Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA.","DOI":"10.1109\/IJCNN.2017.7966165"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1109\/4235.585893","article-title":"No free lunch theorems for optimization","volume":"1","author":"Wolpert","year":"1997","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Vapnik, V.N. (1995). The Nature of Statistical Learning Theory, Springer.","DOI":"10.1007\/978-1-4757-2440-0"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1016\/j.diin.2009.06.016","article-title":"Bringing science to digital forensics with standardized forensic corpora","volume":"6","author":"Garfinkel","year":"2009","journal-title":"Digit. Investig."},{"key":"ref_23","unstructured":"Rennie, J.D.M. (2020, August 02). Derivation of the F-Measure. Other Words, Available online: http:\/\/qwone.com\/~jason\/writing\/fmeasure.pdf."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"McDaniel, M., and Heydari, M.H. (2003, January 6\u20139). Content based file type detection algorithms. 
Proceedings of the 36th Annual Hawaii International Conference on System Sciences, Big Island, HI, USA.","DOI":"10.1109\/HICSS.2003.1174905"},{"key":"ref_25","unstructured":"Karresand, M., and Shahmehri, N. (2006, January 21\u201323). File type identification of data fragments by their binary structure. Proceedings of the 2006 IEEE Information Assurance Workshop, West Point, NY, USA."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A note on the concept of entropy","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Veenman, C.J. (2007, January 29\u201331). Statistical disk cluster classification for file carving. Proceedings of the Third International Symposium on Information Assurance and Security, Manchester, UK.","DOI":"10.1109\/IAS.2007.75"},{"key":"ref_28","unstructured":"Van Asch, V. (2013). Macro- and Micro-Averaged Evaluation Measures [[BASIC DRAFT]], CLiPS, University of Antwerp."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2553","DOI":"10.1109\/TIFS.2018.2823697","article-title":"Sparse Coding for N-Gram Feature Extraction and Training for File Fragment Classification","volume":"13","author":"Wang","year":"2018","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Chen, Q., Liao, Q., Jiang, Z.L., Fang, J., Yiu, S., Xi, G., Li, R., Yi, Z., Wang, X., and Hui, L.C.K. (2018, January 24). File Fragment Classification Using Grayscale Image Conversion and Deep Learning in Digital Forensics. Proceedings of the IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA.","DOI":"10.1109\/SPW.2018.00029"},{"key":"ref_31","unstructured":"Mittal, G., Korus, P., and Memon, N. (2019). FiFTy: Large-Scale File Fragment Type Identification Using Neural Networks. 
arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Stojanova, D., Ceci, M., Appice, A., Malerba, D., and D\u017eeroski, S. (2011). Global and Local Spatial Autocorrelation in Predictive Clustering Trees. Discovery Science, Springer.","DOI":"10.1007\/978-3-642-24477-3_25"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Borges, H.B., and Nievola, J.C. (2012, January 29\u201331). Hierarchical classification using a Competitive Neural Network. Proceedings of the 8th International Conference on Natural Computation, Chongqing, China.","DOI":"10.1109\/ICNC.2012.6234573"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1007\/978-3-319-46279-0_17","article-title":"Data Type Classification: Hierarchical Class-to-Type Modeling","volume":"Volume 484","author":"Beebe","year":"2016","journal-title":"Advances in Digital Forensics XII"},{"key":"ref_35","unstructured":"Vailaya, A., Figueiredo, M., Jain, A., and Zhang, H.J. (1999, January 7\u201311). Content-based hierarchical classification of vacation images. Proceedings of the IEEE International Conference on Multimedia Computing and Systems, Florence, Italy."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Dekel, O., Keshet, J., and Singer, Y. (2004, January 4\u20138). Large margin hierarchical classification. Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada.","DOI":"10.1145\/1015330.1015374"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Arabie, P., and De Soete, G. (1996). Clustering and Classification, World Scientific.","DOI":"10.1142\/1930"},{"key":"ref_38","unstructured":"Cherkassky, V., and Mulier, F. (1998). 
Learning from Data: Concepts, Theory, and Methods, John Wiley & Sons."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"100012","DOI":"10.1016\/j.array.2019.100012","article-title":"Machine learning applications in detecting sand boils from images","volume":"3","author":"Kuchi","year":"2019","journal-title":"Array"}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/2\/3\/12\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:54:01Z","timestamp":1760176441000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/2\/3\/12"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,3]]},"references-count":39,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2020,9]]}},"alternative-id":["make2030012"],"URL":"https:\/\/doi.org\/10.3390\/make2030012","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,3]]}}}