{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T22:53:45Z","timestamp":1776725625399,"version":"3.51.2"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,11,13]],"date-time":"2023-11-13T00:00:00Z","timestamp":1699833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Los Alamos National Laboratory (LANL) Laboratory Directed Research and Development","award":["20190020DR"],"award-info":[{"award-number":["20190020DR"]}]},{"name":"LANL Institutional Computing Program"},{"name":"U.S. Department of Energy National Nuclear Security Administration","award":["89233218CNA000001"],"award-info":[{"award-number":["89233218CNA000001"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2023,11,30]]},"abstract":"<jats:p>\n            Identification of the family to which a malware specimen belongs is essential in understanding the behavior of the malware and developing mitigation strategies. Solutions proposed by prior work, however, are often not practicable due to the lack of realistic evaluation factors. These factors include learning under class imbalance, the ability to identify new malware, and the cost of production-quality labeled data. In practice, deployed models face prominent, rare, and new malware families. At the same time, obtaining a large quantity of up-to-date labeled malware for training a model can be expensive. In this article, we address these problems and propose a novel hierarchical semi-supervised algorithm, which we call the\n            <jats:italic>HNMFk Classifier<\/jats:italic>\n            , that can be used in the early stages of the malware family labeling process. 
Our method is based on non-negative matrix factorization with automatic model selection, that is, with an estimation of the number of clusters. With\n            <jats:italic>HNMFk Classifier<\/jats:italic>\n            , we exploit the hierarchical structure of the malware data together with a semi-supervised setup, which enables us to classify malware families under conditions of extreme class imbalance. Our solution can perform abstaining predictions, or rejection option, which yields promising results in the identification of novel malware families and helps with maintaining the performance of the model when a low quantity of labeled data is used. We perform bulk classification of nearly 2,900 both rare and prominent malware families, through static analysis, using nearly 388,000 samples from the EMBER-2018 corpus. In our experiments, we surpass both supervised and semi-supervised baseline models with an F1 score of 0.80.\n          <\/jats:p>","DOI":"10.1145\/3624567","type":"journal-article","created":{"date-parts":[[2023,9,18]],"date-time":"2023-09-18T11:54:57Z","timestamp":1695038097000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Semi-Supervised Classification of Malware Families Under Extreme Class Imbalance via Hierarchical Non-Negative Matrix Factorization with Automatic Model Selection"],"prefix":"10.1145","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4362-0256","authenticated-orcid":false,"given":"Maksim E.","family":"Eren","sequence":"first","affiliation":[{"name":"Advanced Research in Cyber Systems, Los Alamos National Laboratory, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1421-3643","authenticated-orcid":false,"given":"Manish","family":"Bhattarai","sequence":"additional","affiliation":[{"name":"Theoretical Division, Los Alamos National Laboratory, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7168-1237","authenticated-orcid":false,"given":"Robert J.","family":"Joyce","sequence":"additional","affiliation":[{"name":"Machine Learning Research Group, Booz Allen Hamilton, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9900-1972","authenticated-orcid":false,"given":"Edward","family":"Raff","sequence":"additional","affiliation":[{"name":"Machine Learning Research Group, Booz Allen Hamilton, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9494-7139","authenticated-orcid":false,"given":"Charles","family":"Nicholas","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8636-4603","authenticated-orcid":false,"given":"Boian S.","family":"Alexandrov","sequence":"additional","affiliation":[{"name":"Theoretical Division, Los Alamos National Laboratory, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,11,13]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"183","volume-title":"Proceedings of the 6th ACM Conference on Data and Application Security and Privacy","author":"Ahmadi Mansour","year":"2016","unstructured":"Mansour Ahmadi, Dmitry Ulyanov, Stanislav Semenov, Mikhail Trofimov, and Giorgio Giacinto. 2016. Novel feature extraction, selection and fusion for effective malware family classification. In Proceedings of the 6th ACM Conference on Data and Application Security and Privacy. 
183\u2013194."},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"2623","DOI":"10.1145\/3292500.3330701","volume-title":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","author":"Akiba Takuya","year":"2019","unstructured":"Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2623\u20132631."},{"key":"e_1_3_1_4_2","article-title":"Source identification by non-negative matrix factorization combined with semi-supervised clustering","author":"Alexandrov B. S.","year":"2020","unstructured":"B. S. Alexandrov, L. B. Alexandrov, and V. G. Stanev et al. 2020. Source identification by non-negative matrix factorization combined with semi-supervised clustering. US Patent S10,776,718 (2020).","journal-title":"US Patent S10,776,718"},{"key":"e_1_3_1_5_2","unstructured":"Boian S. Alexandrov Ludmil B. Alexandrov Filip L. Iliev Valentin G. Stanev and Velimir V. Vesselinov. 2020. Source identification by non-negative matrix factorization combined with semi-supervised clustering. US Patent S10 776 718."},{"key":"e_1_3_1_6_2","unstructured":"Ludmil B. Alexandrov Jaegil Kim Nicholas J. Haradhvala Mi Ni Huang Alvin Wei Tian Ng Yang Wu Arnoud Boot Kyle R. Covington Dmitry A. Gordenin Erik N. Bergstrom S. M. Ashiqul Islam Nuria Lopez-Bigas Leszek J. Klimczak John R. McPherson Sandro Morganella Radhakrishnan Sabarinathan David A. Wheeler Ville Mustonen Paul Boutros Kin Chan Akihiro Fujimoto Gad Getz Marat Kazanov Michael Lawrence I\u00f1igo Martincorena Hidewaki Nakagawa Paz Polak Stephenie Prokopec Steven A. Roberts Steven G. Rozen Natalie Saini Tatsuhiro Shibata Yuichi Shiraishi Michael R. Stratton Bin Tean Teh Ignacio V\u00e1zquez-Garc\u00eda Fouad Yousif Willie Yu Lauri A. 
Aaltonen Federico Abascal Adam Abeshouse Hiroyuki Aburatani David J. Adams Nishant Agrawal Keun Soo Ahn Sung-Min Ahn Hiroshi Aikata Rehan Akbani Kadir C. Akdemir Hikmat Al-Ahmadie Sultan T. Al-Sedairy Fatima Al-Shahrour Malik Alawi Monique Albert Kenneth Aldape Adrian Ally Kathryn Alsop Eva G. Alvarez Fernanda Amary Samirkumar B. Amin Brice Aminou Ole Ammerpohl Matthew J. Anderson Yeng Ang Davide Antonello Pavana Anur Samuel Aparicio Elizabeth L. Appelbaum Yasuhito Arai Axel Aretz Koji Arihiro Shun-ichi Ariizumi Joshua Armenia Laurent Arnould Sylvia Asa Yassen Assenov Gurnit Atwal Sietse Aukema J. Todd Auman Miriam R. R. Aure Philip Awadalla Marta Aymerich Gary D. Bader Adrian Baez-Ortega Matthew H. Bailey Peter J. Bailey Miruna Balasundaram Saianand Balu Pratiti Bandopadhayay Rosamonde E. Banks Stefano Barbi Andrew P. Barbour Jonathan Barenboim Jill Barnholtz- Sloan Hugh Barr Elisabet Barrera John Bartlett Javier Bartolome Claudio Bassi Oliver F. Bathe Daniel Baumhoer Prashant Bavi Stephen B. Baylin Wojciech Bazant Duncan Beardsmore Timothy A. Beck Sam Behjati Andreas Behren Beifang Niu Cindy Bell Sergi Beltran Christopher Benz Andrew Berchuck Anke K. Bergmann Benjamin P. Berman Daniel M. Berney Stephan H. Bernhart Rameen Beroukhim Mario Berrios Samantha Bersani Johanna Bertl Miguel Betancourt Vinayak Bhandari Shriram G. Bhosle Andrew V. Biankin Matthias Bieg Darell Bigner Hans Binder Ewan Birney Michael Birrer Nidhan K. Biswas Bodil Bjerkehagen Tom Bodenheimer Lori Boice Giada Bonizzato Johann S. De Bono Moiz S. Bootwalla Ake Borg Arndt Borkhardt Keith A. Boroevich Ivan Borozan Christoph Borst Marcus Bosenberg Mattia Bosio Jacqueline Boultwood Guillaume Bourque Paul C. Boutros G. Steven Bova David T. Bowen Reanne Bowlby David D. L. Bowtell Sandrine Boyault Rich Boyce Jeffrey Boyd Alvis Brazma Paul Brennan Daniel S. Brewer Arie B. Brinkman Robert G. Bristow Russell R. Broaddus Jane E. Brock Malcolm Brock Annegien Broeks Angela N. 
Brooks Denise Brooks Benedikt Brors S\u00f8ren Brunak Timothy J. C. Bruxner Alicia L. Bruzos Alex Buchanan Ivo Buchhalter Christiane Buchholz Susan Bullman Hazel Burke Birgit Burkhardt Kathleen H. Burns John Busanovich Carlos D. Bustamante Adam P. Butler Atul J. Butte Niall J. Byrne Anne-Lise B\u00f8rresen-Dale Samantha J. Caesar-Johnson Andy Cafferkey Declan Cahill Claudia Calabrese Carlos Caldas Fabien Calvo Niedzica Camacho Peter J. Campbell Elias Campo Cinzia Cant\u00f9 Shaolong Cao Thomas E. Carey Joana Carlevaro-Fita Rebecca Carlsen Ivana Cataldo Mario Cazzola Jonathan Cebon Robert Cerfolio Dianne E. Chadwick Dimple Chakravarty Don Chalmers Calvin Wing Yiu Chan Michelle Chan-Seng-Yue Vishal S. Chandan David K. Chang Stephen J. Chanock Lorraine A. Chantrill Aur\u00e9lien Chateigner Nilanjan Chatterjee Kazuaki Chayama Hsiao-Wei Chen Jieming Chen Ken Chen Yiwen Chen Zhaohong Chen Andrew D. Cherniack Jeremy Chien Yoke-Eng Chiew Suet-Feung Chin Juok Cho Sunghoon Cho Jung Kyoon Choi Wan Choi Christine Chomienne Zechen Chong Su Pin Choo Angela Chou Angelika N. Christ Elizabeth L. Christie Eric Chuah Carrie Cibulskis Kristian Cibulskis Sara Cingarlini Peter Clapham Alexander Claviez Sean Cleary Nicole Cloonan Marek Cmero Colin C. Collins Ashton A. Connor Susanna L. Cooke Colin S. Cooper Leslie Cope Vincenzo Corbo Matthew G. Cordes Stephen M. Cordner Isidro Cort\u00e9s-Ciriano Kyle Covington Prue A. Cowin Brian Craft David Craft Chad J. Creighton Yupeng Cun Erin Curley Ioana Cutcutache Karolina Czajka Bogdan Czerniak Rebecca A. Dagg Ludmila Danilova Maria Vittoria Davi Natalie R. Davidson Helen Davies Ian J. Davis Brandi N. Davis-Dusenbery Kevin J. Dawson Francisco M. De La Vega Ricardo De Paoli-Iseppi Timothy Defreitas Angelo P. Dei Tos Olivier Delaneau John A. Demchok PCAWG Mutational Signatures Working Group and P. C. A. W. G. Consortium. 2020. The repertoire of mutational signatures in human cancer. Nature 578 7793 (01 Feb 2020) 94\u2013101. 
10.1038\/s41586-020-1943-3"},{"key":"e_1_3_1_7_2","doi-asserted-by":"crossref","unstructured":"Ludmil B. Alexandrov Serena Nik-Zainal David C. Wedge Samuel A. J. R. Aparicio Sam Behjati Andrew V. Biankin Graham R. Bignell Niccol\u00f2 Bolli Ake Borg Anne-Lise B\u00f8rresen-Dale Sandrine Boyault Birgit Burkhardt Adam P. Butler Carlos Caldas Helen R. Davies Christine Desmedt Roland Eils J\u00f3runn Erla Eyfj\u00f6rd John A. Foekens Mel Greaves Fumie Hosoda Barbara Hutter Tomislav Ilicic Sandrine Imbeaud Marcin Imielinski Natalie J\u00e4ger David T. W. Jones David Jones Stian Knappskog Marcel Kool Sunil R. Lakhani Carlos L\u00f3pez-Ot\u00edn Sancha Martin Nikhil C. Munshi Hiromi Nakamura Paul A. Northcott Marina Pajic Elli Papaemmanuil Angelo Paradiso John V. Pearson Xose S. Puente Keiran Raine Manasa Ramakrishna Andrea L. Richardson Julia Richter Philip Rosenstiel Matthias Schlesner Ton N. Schumacher Paul N. Span Jon W. Teague Yasushi Totoki Andrew N. J. Tutt Rafael Vald\u00e9s-Mas Marit M. van Buuren Laura van \u2019t Veer Anne Vincent-Salomon Nicola Waddell Lucy R. Yates Jessica Zucman-Rossi P. Andrew Futreal Ultan McDermott Peter Lichter Matthew Meyerson Sean M. Grimmond Reiner Siebert El\u00edas Campo Tatsuhiro Shibata Stefan M. Pfister Peter J. Campbell Michael R. Stratton Australian Pancreatic Cancer Genome Initiative ICGC Breast Cancer Consortium ICGC MMML-Seq Consortium and I. C. G. C. PedBrain. 2013. Signatures of mutational processes in human cancer. Nature 500 7463 (01 Aug 2013) 415\u2013421. 10.1038\/nature12477","DOI":"10.1038\/nature12477"},{"issue":"1","key":"e_1_3_1_8_2","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1016\/j.celrep.2012.12.008","article-title":"Deciphering signatures of mutational processes operative in human cancer","volume":"3","author":"Alexandrov Ludmil B.","year":"2013","unstructured":"Ludmil B. Alexandrov, Serena Nik-Zainal, David C. Wedge, Peter J. Campbell, and Michael R. Stratton. 2013. 
Deciphering signatures of mutational processes operative in human cancer. Cell Reports 3, 1 (2013), 246\u2013259.","journal-title":"Cell Reports"},{"key":"e_1_3_1_9_2","unstructured":"H. S. Anderson and P. Roth. 2018. EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models. ArXiv e-prints (April 2018). arXiv:1804.04637 [cs.CR]."},{"key":"e_1_3_1_10_2","unstructured":"Daniel Arp Michael Spreitzenbarth Malte Hubner Hugo Gascon Konrad Rieck and CERT Siemens. 2014. Drebin: Effective and explainable detection of Android malware in your pocket. In NDSS 14. 23\u201326."},{"key":"e_1_3_1_11_2","first-page":"1","volume-title":"Proceedings of the IEEE\/IFIP Network Operations and Management Symposium (NOMS 2020)","author":"Bak M\u00e1rton","year":"2020","unstructured":"M\u00e1rton Bak, Dorottya Papp, Csongor Tam\u00e1s, and Levente Butty\u00e1n. 2020. Clustering IoT malware based on binary similarity. In Proceedings of the IEEE\/IFIP Network Operations and Management Symposium (NOMS 2020). IEEE, 1\u20136."},{"key":"e_1_3_1_12_2","doi-asserted-by":"crossref","first-page":"104709","DOI":"10.1016\/j.jpdc.2023.04.010","article-title":"Distributed non-negative rescal with automatic model selection for exascale data","volume":"179","author":"Bhattarai Manish","year":"2023","unstructured":"Manish Bhattarai, Ismael Boureima, Erik Skau, Benjamin Nebgen, Hristo Djidjev, Sanjay Rajopadhye, James P. Smith, Boian Alexandrov, et\u00a0al. 2023. Distributed non-negative rescal with automatic model selection for exascale data. J. Parallel and Distrib. Comput. 179 (2023), 104709.","journal-title":"J. Parallel and Distrib. Comput."},{"key":"e_1_3_1_13_2","doi-asserted-by":"crossref","unstructured":"Manish Bhattarai Namita Kharat Ismael Boureima Erik Skau Benjamin Nebgen Hristo Djidjev Sanjay Rajopadhye James P. Smith and Boian Alexandrov. 2023. Distributed non-negative RESCAL with automatic model selection for exascale data. J. Parallel and Distrib. Comput. 
179 (2023) 104709. 10.1016\/j.jpdc.2023.04.010","DOI":"10.1016\/j.jpdc.2023.04.010"},{"key":"e_1_3_1_14_2","first-page":"382","article-title":"Bayesian PCA","author":"Bishop Christopher M.","year":"1999","unstructured":"Christopher M. Bishop. 1999. Bayesian PCA. Advances in Neural Information Processing Systems (1999), 382\u2013388.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_15_2","volume-title":"The Cost of Cybercrime","author":"Bissell K.","year":"2019","unstructured":"K. Bissell and L. Ponemon. 2019. The Cost of Cybercrime. Technical Report. Accenture, Ponemon Institute. https:\/\/www.accenture.com\/_acnmedia\/PDF-96\/Accenture-2019-Cost-of-Cybercrime-Study-Final.pdf"},{"key":"e_1_3_1_16_2","article-title":"Distributed out-of-memory NMF on CPU\/GPU architectures","author":"Boureima Ismael","year":"2022","unstructured":"Ismael Boureima, Manish Bhattarai, Maksim Ekin Eren, Erik West Skau, Philip Romero, Stephan Johannes Eidenbenz, and Boian S. Alexandrov. 2022. Distributed out-of-memory NMF on CPU\/GPU architectures. The Journal of Supercomputing (2022). https:\/\/api.semanticscholar.org\/CorpusID:247011761","journal-title":"The Journal of Supercomputing"},{"key":"e_1_3_1_17_2","first-page":"1","volume-title":"2022 IEEE High Performance Extreme Computing Conference (HPEC)","author":"Boureima Ismael","year":"2022","unstructured":"Ismael Boureima, Manish Bhattarai, Maksim E. Eren, Nick Solovyev, Hristo Djidjev, and Boian S. Alexandrov. 2022. Distributed out-of-memory SVD on CPU\/GPU architectures. In 2022 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1\u20138."},{"issue":"12","key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"4164","DOI":"10.1073\/pnas.0308531101","article-title":"Metagenes and molecular pattern discovery using matrix factorization","volume":"101","author":"Brunet Jean-Philippe","year":"2004","unstructured":"Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. 
Mesirov. 2004. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences 101, 12 (2004), 4164\u20134169.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"e_1_3_1_19_2","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1145\/1008992.1009016","volume-title":"Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Canny John","year":"2004","unstructured":"John Canny. 2004. GaP: A factor model for discrete data. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 122\u2013129."},{"key":"e_1_3_1_20_2","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1007\/s11634-020-00398-4","article-title":"Simultaneous dimension reduction and clustering via the NMF-EM algorithm","volume":"15","author":"Carel L\u00e9na","year":"2021","unstructured":"L\u00e9na Carel and Pierre Alquier. 2021. Simultaneous dimension reduction and clustering via the NMF-EM algorithm. Advances in Data Analysis and Classification 15 (2021), 231\u2013260.","journal-title":"Advances in Data Analysis and Classification"},{"key":"e_1_3_1_21_2","series-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","first-page":"785","author":"Chen Tianqi","year":"2016","unstructured":"Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, CA) (KDD \u201916). ACM, New York, 785\u2013794. DOI:10.1145\/2939672.2939785"},{"key":"e_1_3_1_22_2","volume-title":"ESANN","author":"Faleiros Thiago de Paulo","year":"2016","unstructured":"Thiago de Paulo Faleiros and Alneu de Andrade Lopes. 2016. 
On the equivalence between algorithms for non-negative matrix factorization and Latent Dirichlet Allocation. In ESANN."},{"key":"e_1_3_1_23_2","article-title":"MalwareDNA: Simultaneous classification of malware, malware families, and novel malware","author":"Eren Maksim E.","year":"2023","unstructured":"Maksim E. Eren, Manish Bhattarai, Kim Rasmussen, Boian S. Alexandrov, and Charles Nicholas. 2023. MalwareDNA: Simultaneous classification of malware, malware families, and novel malware. arXiv preprint arXiv:2309.01350 (2023).","journal-title":"arXiv preprint arXiv:2309.01350"},{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1109\/ICMLA55696.2022.00107","volume-title":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","author":"Eren Maksim E.","year":"2022","unstructured":"Maksim E. Eren, Manish Bhattarai, Nicholas Solovyev, Luke E. Richards, Roberto Yus, Charles Nicholas, and Boian S. Alexandrov. 2022. One-shot federated group collaborative filtering. In 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA). 647\u2013652. DOI:10.1109\/ICMLA55696.2022.00107"},{"key":"e_1_3_1_25_2","unstructured":"DocEng \u201922 Proceedings of the 22nd ACM Symposium on Document Engineering Maksim E. Eren Nick Solovyev Manish Bhattarai Kim \u00d8. Rasmussen Charles Nicholas Boian S. Alexandrov SeNMFk-SPLIT: Large corpora topic modeling by semantic non-negative matrix factorization with automatic model selection 2022 10 10.1145\/3558100.3563844"},{"key":"e_1_3_1_26_2","unstructured":"External Data Source. 2018. VirusShare Dataset. 
DOI:10.23721\/100\/1504313"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2018.2806891"},{"key":"e_1_3_1_28_2","first-page":"1913","volume-title":"2009 17th European Signal Processing Conference","author":"F\u00e9votte C\u00e9dric","year":"2009","unstructured":"C\u00e9dric F\u00e9votte and A. Taylan Cemgil. 2009. Nonnegative matrix factorizations as probabilistic inference in composite models. In Proceedings of the 2009 17th European Signal Processing Conference. IEEE, 1913\u20131917."},{"issue":"4","key":"e_1_3_1_29_2","doi-asserted-by":"crossref","first-page":"2066","DOI":"10.1109\/TGRS.2014.2352857","article-title":"Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization","volume":"53","author":"Gillis Nicolas","year":"2014","unstructured":"Nicolas Gillis, Da Kuang, and Haesun Park. 2014. Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization. IEEE Transactions on Geoscience and Remote Sensing 53, 4 (2014), 2066\u20132078.","journal-title":"IEEE Transactions on Geoscience and Remote Sensing"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1007\/978-3-662-44848-9_32","volume-title":"Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases","author":"Greene Derek","year":"2014","unstructured":"Derek Greene, Derek O\u2019Callaghan, and P\u00e1draig Cunningham. 2014. How many topics? stability analysis for topic models. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 498\u2013513."},{"key":"e_1_3_1_31_2","article-title":"COVID-19 literature topic-based search via hierarchical NMF","author":"Grotheer Rachel","year":"2020","unstructured":"Rachel Grotheer, Yihuan Huang, Pengyu Li, Elizaveta Rebrova, Deanna Needell, Longxiu Huang, Alona Kryshchenko, Xia Li, Kyung Ha, and Oleksandr Kryshchenko. 2020. 
COVID-19 literature topic-based search via hierarchical NMF. arXiv preprint arXiv:2009.09074 (2020).","journal-title":"arXiv preprint arXiv:2009.09074"},{"key":"e_1_3_1_32_2","first-page":"1","volume-title":"2016 International Conference on Computing, Networking and Communications (ICNC)","author":"Hansen Steven Strandlund","year":"2016","unstructured":"Steven Strandlund Hansen, Thor Mark Tampus Larsen, Matija Stevanovic, and Jens Myrup Pedersen. 2016. An approach for detection and family classification of malware based on behavioral analysis. In Proceedings of the 2016 International Conference on Computing, Networking and Communications (ICNC). 1\u20135. DOI:10.1109\/ICCNC.2016.7440587"},{"key":"e_1_3_1_33_2","volume-title":"Neural Networks: A Comprehensive Foundation","author":"Haykin Simon","year":"1994","unstructured":"Simon Haykin. 1994. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-9863-7_1185"},{"key":"e_1_3_1_35_2","first-page":"399","volume-title":"Proceedings of 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2016)","author":"Huang Wenyi","year":"2016","unstructured":"Wenyi Huang and Jay Stokes. 2016. MtNet: A multi-task neural network for dynamic malware classification. In Proceedings of 13th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2016). Springer, 399\u2013418. https:\/\/www.microsoft.com\/en-us\/research\/publication\/mtnet-multi-task-neural-network-dynamic-malware-classification\/"},{"key":"e_1_3_1_36_2","volume-title":"Cost of a Data Breach Report","year":"2021","unstructured":"IBM. 2021. Cost of a Data Breach Report. Technical Report. IBM. https:\/\/www.ibm.com\/security\/data-breach"},{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"S. M. Ashiqul Islam Marcos Diaz-Gay Yang Wu Mark Barnes Raviteja Vangara Erik N. 
Bergstrom Yudou He Mike Vella Jingwei Wang Jon W. Teague Peter Clapham Sarah Moody Sergey Senkin Yun Rose Li Laura Riva Tongwu Zhang Andreas J. Gruber Christopher D. Steele Burcak Otlu Azhar Khandekar Ammal Abbasi Laura Humphreys Natalia Syulyukina Samuel W. Brady Boian S. Alexandrov Nischalan Pillay Jinghui Zhang David J. Adams Inigo Martincorena David C. Wedge Maria Teresa Landi Paul Brennan Michael R. Stratton Steven G. Rozen and Ludmil B. Alexandrov. 2022. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics 2 11 (2022) 100179. 10.1016\/j.xgen.2022.100179","DOI":"10.1016\/j.xgen.2022.100179"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_3_1_39_2","first-page":"1","volume-title":"Proceedings of the 2019 IEEE Symposium on Computers and Communications (ISCC)","author":"Jiang Jianguo","year":"2019","unstructured":"Jianguo Jiang, Song Li, Min Yu, Gang Li, Chao Liu, Kai Chen, Hui Liu, and Weiqing Huang. 2019. Android malware family classification based on sensitive opcode sequence. In Proceedings of the 2019 IEEE Symposium on Computers and Communications (ISCC). IEEE, 1\u20137."},{"key":"e_1_3_1_40_2","volume-title":"Machine Learning Methods for Malware Detection","year":"2020","unstructured":"Kaspersky. 2020. Machine Learning Methods for Malware Detection. Technical Report."},{"key":"e_1_3_1_41_2","first-page":"3149","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Ke Guolin","year":"2017","unstructured":"Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS\u201917) (Long Beach, CA). 
Curran Associates Inc., Red Hook, NY, 3149\u20133157."},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"739","DOI":"10.1145\/2487575.2487606","volume-title":"Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Kuang Da","year":"2013","unstructured":"Da Kuang and Haesun Park. 2013. Fast rank-2 nonnegative matrix factorization for hierarchical document clustering. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 739\u2013747."},{"issue":"6755","key":"e_1_3_1_43_2","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1038\/44565","article-title":"Learning the parts of objects by non-negative matrix factorization","volume":"401","author":"Lee Daniel D.","year":"1999","unstructured":"Daniel D. Lee and H. Sebastian Seung. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788\u2013791.","journal-title":"Nature"},{"issue":"185","key":"e_1_3_1_44_2","first-page":"1","article-title":"Hyperband: A novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li Lisha","year":"2018","unstructured":"Lisha Li, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2018. Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research 18, 185 (2018), 1\u201352. http:\/\/jmlr.org\/papers\/v18\/16-558.html","journal-title":"Journal of Machine Learning Research"},{"issue":"3","key":"e_1_3_1_45_2","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1007\/s11416-019-00331-0","article-title":"Nonnegative matrix factorization and metamorphic malware detection","volume":"15","author":"Ling Yeong Tyng","year":"2019","unstructured":"Yeong Tyng Ling, Nor Fazlida Mohd Sani, Mohd Taufik Abdullah, and Nor Asilah Wati Abdul Hamid. 2019. Nonnegative matrix factorization and metamorphic malware detection. 
Journal of Computer Virology and Hacking Techniques 15, 3 (2019), 195\u2013208.","journal-title":"Journal of Computer Virology and Hacking Techniques"},{"key":"e_1_3_1_46_2","article-title":"Towards an automated pipeline for detecting and classifying malware through machine learning","author":"Loi Nicola","year":"2021","unstructured":"Nicola Loi, Claudio Borile, and Daniele Ucci. 2021. Towards an automated pipeline for detecting and classifying malware through machine learning. arXiv preprint arXiv:2106.05625 (2021).","journal-title":"arXiv preprint arXiv:2106.05625"},{"issue":"2","key":"e_1_3_1_47_2","first-page":"1053","article-title":"Bayesian nonlinear modeling for the prediction competition","volume":"100","author":"MacKay David J. C.","year":"1994","unstructured":"David J. C. MacKay. 1994. Bayesian nonlinear modeling for the prediction competition. ASHRAE Transactions 100, 2 (1994), 1053\u20131062.","journal-title":"ASHRAE Transactions"},{"key":"e_1_3_1_48_2","volume-title":"Microsoft Researchers Work with Intel Labs to Explore New Deep Learning Approaches for Malware Classification","author":"Team Microsoft 365 Defender Threat Intelligence","year":"2020","unstructured":"Microsoft 365 Defender Threat Intelligence Team. 2020. Microsoft Researchers Work with Intel Labs to Explore New Deep Learning Approaches for Malware Classification. https:\/\/www.microsoft.com\/security\/blog"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cose.2015.04.001"},{"key":"e_1_3_1_50_2","first-page":"1923","volume-title":"Proceedings of the 2009 17th European Signal Processing Conference","author":"M\u00f8rup Morten","year":"2009","unstructured":"Morten M\u00f8rup and Lars Kai Hansen. 2009. Tuning pruning in sparse non-negative matrix factorization. In Proceedings of the 2009 17th European Signal Processing Conference. 
IEEE, 1923\u20131927."},{"key":"e_1_3_1_51_2","volume-title":"Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec \u201911)","author":"Nataraj L.","year":"2011","unstructured":"L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath. 2011. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec \u201911) (Pittsburgh, PA). ACM, New York, Article 4, 7 pages. DOI:10.1145\/2016904.2016908"},{"issue":"2","key":"e_1_3_1_52_2","first-page":"025012","article-title":"A neural network for determination of latent dimensionality in non-negative matrix factorization","volume":"2","author":"Nebgen Benjamin T.","year":"2021","unstructured":"Benjamin T. Nebgen, Raviteja Vangara, Miguel A. Hombrados-Herrera, Svetlana Kuksova, and Boian S. Alexandrov. 2021. A neural network for determination of latent dimensionality in non-negative matrix factorization. Machine Learning: Science and Technology 2, 2 (2021), 025012.","journal-title":"Machine Learning: Science and Technology"},{"key":"e_1_3_1_53_2","article-title":"Leveraging uncertainty for improved static malware detection under extreme false positive constraints","author":"Nguyen Andre T.","year":"2021","unstructured":"Andre T. Nguyen, Edward Raff, Charles Nicholas, and James Holt. 2021. Leveraging uncertainty for improved static malware detection under extreme false positive constraints. arXiv preprint arXiv:2108.04081 (2021).","journal-title":"arXiv preprint arXiv:2108.04081"},{"key":"e_1_3_1_54_2","volume-title":"VirusTotal += Bitdefender Theta","author":"Quintero Bernardo","year":"2019","unstructured":"Bernardo Quintero. 2019. VirusTotal += Bitdefender Theta. https:\/\/blog.virustotal.com\/2019\/10\/virustotal-bitdefender-theta.html"},{"key":"e_1_3_1_55_2","volume-title":"VirusTotal += Sangfor Engine Zero","author":"Quintero Bernardo","year":"2019","unstructured":"Bernardo Quintero. 2019. 
VirusTotal += Sangfor Engine Zero. https:\/\/blog.virustotal.com\/2019\/11\/virustotal-sangfor-engine-zero.html"},{"key":"e_1_3_1_56_2","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1145\/3097983.3098111","volume-title":"Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201917)","author":"Raff Edward","year":"2017","unstructured":"Edward Raff and Charles Nicholas. 2017. An alternative to NCD for large sequences, Lempel-Ziv Jaccard distance. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD \u201917) (Halifax, NS, Canada). ACM, New York, 1007\u20131015. DOI:10.1145\/3097983.3098111"},{"key":"e_1_3_1_57_2","article-title":"A survey of machine learning methods and challenges for windows malware classification","volume":"2006","author":"Raff Edward","year":"2020","unstructured":"Edward Raff and C. Nicholas. 2020. A survey of machine learning methods and challenges for windows malware classification. ArXiv abs\/2006.09271 (2020).","journal-title":"ArXiv"},{"key":"e_1_3_1_58_2","volume-title":"Proceedings of the 34th AAAI Conference on Artificial Intelligence","author":"Raff Edward","year":"2020","unstructured":"Edward Raff, Charles Nicholas, and Mark McLean. 2020. A new burrows wheeler transform Markov distance. In Proceedings of the 34th AAAI Conference on Artificial Intelligence. http:\/\/arxiv.org\/abs\/1912.13046"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.diin.2017.12.004"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"e_1_3_1_61_2","first-page":"1105","volume-title":"Proceedings of the 2018 World Wide Web Conference","author":"Shi Tian","year":"2018","unstructured":"Tian Shi, Kyeongpil Kang, Jaegul Choo, and Chandan K. Reddy. 2018. Short-text topic modeling via non-negative matrix factorization enriched with local word-context correlations. 
In Proceedings of the 2018 World Wide Web Conference. 1105\u20131114."},{"key":"e_1_3_1_62_2","first-page":"507","volume-title":"Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC)","author":"Sun Bowen","year":"2017","unstructured":"Bowen Sun, Qi Li, Yanhui Guo, Qiaokun Wen, Xiaoxi Lin, and Wenhan Liu. 2017. Malware family classification method based on static feature extraction. In Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC). 507\u2013513. DOI:10.1109\/CompComm.2017.8322598"},{"issue":"7","key":"e_1_3_1_63_2","first-page":"1592","article-title":"Automatic relevance determination in nonnegative matrix factorization with the \u03b2-divergence","volume":"35","author":"Tan Vincent Y. F.","year":"2012","unstructured":"Vincent Y. F. Tan and C\u00e9dric F\u00e9votte. 2012. Automatic relevance determination in nonnegative matrix factorization with the \u03b2-divergence. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 7 (2012), 1592\u20131605.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_64_2","unstructured":"The Independent IT Security Institute. 2021. Malware Statistics & Trends Report: AV-TEST. https:\/\/www.av-test.org\/en\/statistics\/malware\/"},{"key":"e_1_3_1_65_2","first-page":"1692","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Trigeorgis George","year":"2014","unstructured":"George Trigeorgis, Konstantinos Bousmalis, Stefanos Zafeiriou, and Bjoern Schuller. 2014. A deep semi-NMF model for learning hidden representations. In Proceedings of the International Conference on Machine Learning. PMLR, 1692\u20131700."},{"key":"e_1_3_1_66_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten Laurens van der","year":"2008","unstructured":"Laurens van der Maaten and Geoffrey Hinton. 2008. 
Visualizing data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579\u20132605. http:\/\/www.jmlr.org\/papers\/v9\/vandermaaten08a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_67_2","doi-asserted-by":"crossref","DOI":"10.1109\/ACCESS.2021.3106879","article-title":"Finding the number of latent topics with semantic non-negative matrix factorization","author":"Vangara Raviteja","year":"2021","unstructured":"Raviteja Vangara, Manish Bhattarai, Erik Skau, Gopinath Chennupati, Hristo Djidjev, Thomas Tierney, James P. Smith, Valentin G Stanev, and Boian S. Alexandrov. 2021. Finding the number of latent topics with semantic non-negative matrix factorization. IEEE Access (2021).","journal-title":"IEEE Access"},{"key":"e_1_3_1_68_2","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/ICMLA51294.2020.00060","volume-title":"2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)","author":"Vangara Raviteja","year":"2020","unstructured":"Raviteja Vangara, Erik Skau, Gopinath Chennupati, Hristo Djidjev, Thomas Tierney, James P. Smith, Manish Bhattarai, Valentin G. Stanev, and Boian S. Alexandrov. 2020. Semantic nonnegative matrix factorization with automatic model determination for topic modeling. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 328\u2013335."},{"key":"e_1_3_1_69_2","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/ICMLA51294.2020.00060","volume-title":"Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)","author":"Vangara Raviteja","year":"2020","unstructured":"Raviteja Vangara, Erik Skau, Gopinath Chennupati, Hristo Djidjev, Thomas Tierney, James P. Smith, Manish Bhattarai, Valentin G. Stanev, and Boian S. Alexandrov. 2020. Semantic nonnegative matrix factorization with automatic model determination for topic modeling. 
In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA). 328\u2013335. DOI:10.1109\/ICMLA51294.2020.00060"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2906934"},{"key":"e_1_3_1_71_2","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1145\/860435.860485","volume-title":"Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval","author":"Xu Wei","year":"2003","unstructured":"Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval. 267\u2013273."},{"key":"e_1_3_1_72_2","doi-asserted-by":"crossref","first-page":"189","DOI":"10.3115\/981658.981684","volume-title":"Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics","author":"Yarowsky David","year":"1995","unstructured":"David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics (Cambridge, Massachusetts) (ACL \u201995). Association for Computational Linguistics, 189\u2013196. DOI:10.3115\/981658.981684"},{"issue":"2","key":"e_1_3_1_73_2","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1007\/s11460-011-0128-0","article-title":"Robust non-negative matrix factorization","volume":"6","author":"Zhang Lijun","year":"2011","unstructured":"Lijun Zhang, Zhengguang Chen, Miao Zheng, and Xiaofei He. 2011. Robust non-negative matrix factorization. 
Frontiers of Electrical and Electronic Engineering in China 6, 2 (2011), 192\u2013200.","journal-title":"Frontiers of Electrical and Electronic Engineering in China"},{"key":"e_1_3_1_74_2","first-page":"81","volume-title":"Proceedings of the 2019 International Conference on Intelligent Computing and its Emerging Applications (ICEA)","author":"Zhang Shao-Huai","year":"2019","unstructured":"Shao-Huai Zhang, Cheng-Chung Kuo, and Chu-Sing Yang. 2019. Static PE malware type classification using machine learning techniques. In Proceedings of the 2019 International Conference on Intelligent Computing and its Emerging Applications (ICEA). 81\u201386. DOI:10.1109\/ICEA.2019.8858297"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2019.2947861"}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3624567","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3624567","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:49:46Z","timestamp":1750268986000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3624567"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,13]]},"references-count":74,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,11,30]]}},"alternative-id":["10.1145\/3624567"],"URL":"https:\/\/doi.org\/10.1145\/3624567","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,13]]},"assertion":[{"value":"2022-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-09-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}