{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,16]],"date-time":"2023-09-16T11:40:05Z","timestamp":1694864405838},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"17","license":[{"start":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T00:00:00Z","timestamp":1679529600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T00:00:00Z","timestamp":1679529600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001656","name":"Helmholtz-Gemeinschaft","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001656","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Forschungszentrum J\u00fclich GmbH"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Sample compression using <jats:italic>\ud835\udf16<\/jats:italic>-net effectively reduces the number of labeled instances required for accurate classification with nearest neighbor algorithms. However, one-shot construction of an <jats:italic>\ud835\udf16<\/jats:italic>-net can be extremely challenging in large-scale distributed data sets. We explore two approaches for distributed sample compression: one where local <jats:italic>\ud835\udf16<\/jats:italic>-net is constructed for each data partition and then merged during an aggregation phase, and one where a single backbone of an <jats:italic>\ud835\udf16<\/jats:italic>-net is constructed from one partition and aggregates target label distributions from other partitions. Both approaches are applied to the problem of malware detection in a complex, real-world data set of Android apps using the nearest neighbor algorithm. Examination of the compression rate, computational efficiency, and predictive power shows that a single backbone of an <jats:italic>\ud835\udf16<\/jats:italic>-net attains favorable performance while achieving a compression rate of 99%.<\/jats:p>","DOI":"10.1007\/s10489-023-04482-y","type":"journal-article","created":{"date-parts":[[2023,3,23]],"date-time":"2023-03-23T11:04:46Z","timestamp":1679569486000},"page":"19976-19989","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["DISCONA: distributed sample compression for nearest neighbor algorithm"],"prefix":"10.1007","volume":"53","author":[{"given":"Jedrzej","family":"Rybicki","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tatiana","family":"Frenklach","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rami","family":"Puzis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,3,23]]},"reference":[{"key":"4482_CR1","doi-asserted-by":"crossref","unstructured":"Allix K, Bissyand\u00e9 TF, Klein J et al (2016) AndroZoo: collecting millions of Android apps for the research community. In: Proceedings of the 13th international conference on mining software repositories (MSR\u201916). ACM, New York, pp 468\u2013471","DOI":"10.1145\/2901739.2903508"},{"key":"4482_CR2","doi-asserted-by":"publisher","unstructured":"Angiulli F (2005) Fast condensed nearest neighbor rule. In: Proceedings 22nd International Conference on Machine Learning (ICML\u201905). https:\/\/doi.org\/10.1145\/1102351.1102355. Association for Computing Machinery, New York, pp 25\u201332","DOI":"10.1145\/1102351.1102355"},{"key":"4482_CR3","unstructured":"AppBrain (2022) Android and Google Play statistics. https:\/\/www.appbrain.com\/stats, last Accessed: 28 Apr 2022"},{"key":"4482_CR4","doi-asserted-by":"publisher","unstructured":"Arp D, Spreitzenbarth M, H\u00fcbner M et al (2014) DREBIN: effective and explainable detection of android malware in your pocket. In: Symposium on network and distributed system security (NDSS). https:\/\/doi.org\/10.14722\/ndss.2014.23247. San Diego, Internet Society, pp 1\u201315","DOI":"10.14722\/ndss.2014.23247"},{"issue":"44","key":"4482_CR5","first-page":"1519","volume":"16","author":"D Berend","year":"2015","unstructured":"Berend D, Kontorovich A (2015) A finite sample analysis of the naive Bayes classifier. J Mach Learn Res 16(44):1519\u20131545","journal-title":"J Mach Learn Res"},{"issue":"6","key":"4482_CR6","doi-asserted-by":"publisher","first-page":"5380","DOI":"10.1109\/TCYB.2020.3031610","volume":"52","author":"Z Bian","year":"2022","unstructured":"Bian Z, Vong CM, Wong PK et al (2022) Fuzzy KNN method with adaptive nearest neighbors. IEEE Trans Cybern 52(6):5380\u20135393. https:\/\/doi.org\/10.1109\/TCYB.2020.3031610","journal-title":"IEEE Trans Cybern"},{"key":"4482_CR7","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1016\/j.engappai.2017.02.006","volume":"60","author":"JR Cano","year":"2017","unstructured":"Cano JR, Aljohani NR, Abbasi RA et al (2017) Prototype selection to improve monotonic nearest neighbor. Eng Appl Artif Intell 60:128\u2013135. https:\/\/doi.org\/10.1016\/j.engappai.2017.02.006","journal-title":"Eng Appl Artif Intell"},{"key":"4482_CR8","first-page":"9","volume":"2018","author":"T Chen","year":"2018","unstructured":"Chen T, Mao Q, Yang Y et al (2018) Tinydroid: a lightweight and efficient model for Android malware detection and classification. Mob Inf Syst 2018:9","journal-title":"Mob Inf Syst"},{"key":"4482_CR9","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1515\/jaiscr-2017-0011","volume":"7","author":"V Devi","year":"2017","unstructured":"Devi V, Meena L (2017) Parallel MCNN (pMCNN) with application to prototype selection on large and streaming data. Journal of Artificial Intelligence and Soft Computing Research 7:155\u2013169. https:\/\/doi.org\/10.1515\/jaiscr-2017-0011","journal-title":"Journal of Artificial Intelligence and Soft Computing Research"},{"key":"4482_CR10","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1016\/j.eswa.2019.06.029","volume":"136","author":"O Dogan","year":"2019","unstructured":"Dogan O, Oztaysi B (2019) Genders prediction from indoor customer paths by Levenshtein-based fuzzy kNN. Expert Syst Appl 136:42\u201349. https:\/\/doi.org\/10.1016\/j.eswa.2019.06.029","journal-title":"Expert Syst Appl"},{"key":"4482_CR11","unstructured":"Flores-Velazco A, Mount DM (2020) Coresets for the nearest-neighbor rule. In: Proceedings of the 28th annual european symposium on algorithms (ESA 2020), Leibniz International Proceedings in Informatics (LIPIcs), vol 173. Schloss Dagstuhl\u2013Leibniz-Zentrum f\u00fcr Informatik, pp 47:1\u201347:19"},{"key":"4482_CR12","doi-asserted-by":"publisher","unstructured":"Flores-Velazco A, Mount DM (2021) Boundary-sensitive approach for approximate nearest-neighbor classification. In: Proceedings of the 29th annual european symposium on algorithms (ESA 2021). https:\/\/doi.org\/10.4230\/LIPIcs.ESA.2021.44, pp 44:1\u201344:15","DOI":"10.4230\/LIPIcs.ESA.2021.44"},{"issue":"1","key":"4482_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.cose.2021.102386","volume":"109","author":"T Frenklach","year":"2021","unstructured":"Frenklach T, Cohen D, Shabtai A et al (2021) Android malware detection via an app similarity graph. Computers & Security 109(1):1\u201332. https:\/\/doi.org\/10.1016\/j.cose.2021.102386","journal-title":"Computers & Security"},{"key":"4482_CR14","unstructured":"Gottlieb LA, Kontorovich A, Nisnevitch P (2016) Nearly optimal classification for semimetrics. In: Artificial Intelligence and Statistics, PMLR, pp 379\u2013388"},{"issue":"5","key":"4482_CR15","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1109\/TIT.1968.1054155","volume":"14","author":"PE Hart","year":"1968","unstructured":"Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory 14(5):515\u2013516","journal-title":"IEEE Trans Inf Theory"},{"key":"4482_CR16","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.neucom.2021.04.112","volume":"459","author":"SC Hoi","year":"2021","unstructured":"Hoi SC, Sahoo D, Lu J et al (2021) Online learning: a comprehensive survey. Neurocomputing 459:249\u2013289. https:\/\/doi.org\/10.1016\/j.neucom.2021.04.112","journal-title":"Neurocomputing"},{"issue":"3","key":"4482_CR17","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1109\/MCSE.2007.55","volume":"9","author":"JD Hunter","year":"2007","unstructured":"Hunter JD (2007) Matplotlib: a 2D graphics environment. Computing In Science & Engineering 9 (3):90\u201395. https:\/\/doi.org\/10.1109\/MCSE.2007.55","journal-title":"Computing In Science & Engineering"},{"key":"4482_CR18","unstructured":"IDC (2020) Smartphone market share. https:\/\/www.idc.com\/promo\/smartphone-market-share\/os, last Accessed: 1 Oct 2021"},{"key":"4482_CR19","first-page":"1","volume":"18","author":"A Kontorovich","year":"2017","unstructured":"Kontorovich A, Sabato S, Urner R (2017a) Active nearest-neighbor learning in metric spaces. J Mach Learn Res 18:1\u2013 38","journal-title":"J Mach Learn Res"},{"key":"4482_CR20","first-page":"1573","volume":"30","author":"A Kontorovich","year":"2017","unstructured":"Kontorovich A, Sabato S, Weiss R (2017b) Nearest-neighbor sample compression: efficiency, consistency, infinite dimensions. Adv Neural Inf Process Syst 30:1573\u20131583","journal-title":"Adv Neural Inf Process Syst"},{"key":"4482_CR21","unstructured":"Krauthgamer R, Lee JR (2004) Navigating Nets: simple algorithms for proximity search. In: Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA\u201904). Society for Industrial and Applied Mathematics, USA, pp 798\u2013807"},{"key":"4482_CR22","doi-asserted-by":"publisher","unstructured":"Kumbure MM, Luukka P (2022) A generalized fuzzy k-nearest neighbor regression model based on Minkowski distance. Granul Comput 7. https:\/\/doi.org\/10.1007\/s41066-021-00288","DOI":"10.1007\/s41066-021-00288"},{"key":"4482_CR23","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.patrec.2017.05.019","volume":"94","author":"T Liang","year":"2017","unstructured":"Liang T, Xu X, Xiao P (2017) A new image classification method based on modified condensed nearest neighbor and convolutional neural networks. Pattern Recogn Lett 94:105\u2013111. https:\/\/doi.org\/10.1016\/j.patrec.2017.05.019","journal-title":"Pattern Recogn Lett"},{"key":"4482_CR24","unstructured":"Littlestone N, Warmuth M (1986) Relating data compression and learnability. Tech. rep., University of California Santa Cruz"},{"key":"4482_CR25","doi-asserted-by":"publisher","first-page":"1261","DOI":"10.1016\/j.neucom.2017.06.084","volume":"275","author":"V Losing","year":"2018","unstructured":"Losing V, Hammer B, Wersing H (2018) Incremental on-line learning: a review and comparison of state of the art algorithms. Neurocomputing 275:1261\u20131274. https:\/\/doi.org\/10.1016\/j.neucom.2017.06.084","journal-title":"Neurocomputing"},{"key":"4482_CR26","doi-asserted-by":"publisher","unstructured":"McKinney W (2010) Data structures for statistical computing in Python. In: Proceedings of the 9th Python in Science Conference (SciPy 2010). https:\/\/doi.org\/10.25080\/Majora-92bf1922-00a, pp 56\u201361","DOI":"10.25080\/Majora-92bf1922-00a"},{"key":"4482_CR27","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1007\/s13218-017-0519-3","volume":"32","author":"A Munteanu","year":"2017","unstructured":"Munteanu A, Schwiegelshohn C (2017) Coresets-methods and history: a theoreticians design pattern for approximation and streaming algorithms. KI - K\u00fcnstliche Intelligenz 32:37\u201353","journal-title":"KI - K\u00fcnstliche Intelligenz"},{"key":"4482_CR28","doi-asserted-by":"publisher","unstructured":"Odusami M, Abayomi-Alli O, Misra S et al (2018) Android malware detection: a survey. In: Proceedings of the international conference on applied informatics (ICAI). https:\/\/doi.org\/10.1007\/978-3-030-01535-0_19. Springer, Cham, pp 255\u2013266","DOI":"10.1007\/978-3-030-01535-0_19"},{"key":"4482_CR29","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825\u20132830. ISSN 1533\u20137928","journal-title":"J Mach Learn Res"},{"key":"4482_CR30","unstructured":"Phillips JM (2017) Coresets and sketches. In: Handbook of discrete and computational geometry. Chapman and Hall\/CRC, pp 1269\u20131288"},{"key":"4482_CR31","doi-asserted-by":"publisher","unstructured":"Qiu J, Zhang J, Luo W et al (2020) A survey of Android malware detection with deep neural models. ACM Comput Surv 53(6). https:\/\/doi.org\/10.1145\/3417978","DOI":"10.1145\/3417978"},{"key":"4482_CR32","doi-asserted-by":"publisher","unstructured":"Shankar VG, Somani G (2016) Anti-hijack: runtime detection of malware initiated hijacking in Android. In: Proceedings of the International Conference on Information Security & Privacy (ICISP2015). https:\/\/doi.org\/10.1016\/j.procs.2016.02.105, vol 78. Elsevier, Amsterdam, pp 587\u2013594","DOI":"10.1016\/j.procs.2016.02.105"},{"key":"4482_CR33","doi-asserted-by":"publisher","first-page":"653","DOI":"10.1016\/j.procs.2022.03.086","volume":"201","author":"AS Shatnawi","year":"2022","unstructured":"Shatnawi AS, Yassen Q, Yateem A (2022) An android malware detection approach based on static feature analysis using machine learning algorithms. Procedia Computer Science 201:653\u2013658. https:\/\/doi.org\/10.1016\/j.procs.2022.03.086","journal-title":"Procedia Computer Science"},{"issue":"4","key":"4482_CR34","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1214\/aos\/1176343886","volume":"5","author":"CJ Stone","year":"1977","unstructured":"Stone CJ (1977) Consistent nonparametric regression. Ann Stat 5(4):595\u2013620. https:\/\/doi.org\/10.1214\/aos\/1176343886","journal-title":"Ann Stat"},{"key":"4482_CR35","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1016\/j.future.2019.11.034","volume":"105","author":"R Taheri","year":"2020","unstructured":"Taheri R, Ghahramani M, Javidan R et al (2020) Similarity-based Android malware detection using hamming distance of static binary features. Futur Gener Comput Syst 105:230\u2013247","journal-title":"Futur Gener Comput Syst"},{"key":"4482_CR36","unstructured":"Turi Developer Team (2022) Turi create. https:\/\/github.com\/apple\/turicreate, last Accessed: 28 Apr 2022"},{"key":"4482_CR37","unstructured":"Virus Total (2020) VT Graph. https:\/\/www.virustotal.com\/gui\/graph-overview, last Accessed: 28 Apr 2022"},{"key":"4482_CR38","first-page":"207","volume":"10","author":"KQ Weinberger","year":"2009","unstructured":"Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207\u2013244","journal-title":"J Mach Learn Res"},{"key":"4482_CR39","doi-asserted-by":"publisher","unstructured":"Wu DJ, Mao CH, Wei TE et al (2012) DroidMat: android malware detection through manifest and API calls tracing. In: Proceedings of the 7th asia joint conference on information security. https:\/\/doi.org\/10.1109\/AsiaJCIS.2012.18. IEEE Computer Society, Washington, pp 62\u201369","DOI":"10.1109\/AsiaJCIS.2012.18"},{"key":"4482_CR40","unstructured":"Yan LK, Yin H (2012) DroidScope: seamlessly reconstructing the OS and dalvik semantic views for dynamic Android malware analysis. In: Proceedings of the 21st USENIX Security Symposium. Bellevue, USENIX Association, pp 569\u2013584"},{"key":"4482_CR41","doi-asserted-by":"publisher","unstructured":"Zhang S, Li X, Zong M et al (2017) Learning k for KNN classification. ACM Trans Intell Syst Technol 8(3). https:\/\/doi.org\/10.1145\/2990508","DOI":"10.1145\/2990508"},{"key":"4482_CR42","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1016\/j.fsidi.2021.301285","volume":"39","author":"X Zhang","year":"2021","unstructured":"Zhang X, Breitinger F, Luechinger E et al (2021) Android application forensics: a survey of obfuscation, obfuscation detection and deobfuscation techniques and their impact on investigations. Forensic Science International: Digital Investigation 39:301\u2013285. https:\/\/doi.org\/10.1016\/j.fsidi.2021.301285","journal-title":"Forensic Science International: Digital Investigation"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-023-04482-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-023-04482-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-023-04482-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T11:33:14Z","timestamp":1694777594000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-023-04482-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,23]]},"references-count":42,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["4482"],"URL":"https:\/\/doi.org\/10.1007\/s10489-023-04482-y","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"value":"0924-669X","type":"print"},{"value":"1573-7497","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,23]]},"assertion":[{"value":"19 January 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2023","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors contributed to the conception and design of the study. All authors read and approved the final manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"<!--Emphasis Type='Bold' removed-->Ethics approval"}},{"value":"The authors declare that they have no conflict of interest.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"<!--Emphasis Type='Bold' removed-->Conflict of Interests"}}]}}