{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T06:29:04Z","timestamp":1778048944004,"version":"3.51.4"},"reference-count":64,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T00:00:00Z","timestamp":1711497600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T00:00:00Z","timestamp":1711497600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100021856","name":"Ministero dell'Universit\u00e0 e della Ricerca","doi-asserted-by":"publisher","award":["Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond (Prot. 2017WR7SHH)"],"award-info":[{"award-number":["Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond (Prot. 2017WR7SHH)"]}],"id":[{"id":"10.13039\/501100021856","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100021856","name":"Ministero dell'Universit\u00e0 e della Ricerca","doi-asserted-by":"publisher","award":["Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond (Prot. 2017WR7SHH)"],"award-info":[{"award-number":["Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond (Prot. 2017WR7SHH)"]}],"id":[{"id":"10.13039\/501100021856","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100021856","name":"Ministero dell'Universit\u00e0 e della Ricerca","doi-asserted-by":"publisher","award":["Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond (Prot. 
2017WR7SHH)"],"award-info":[{"award-number":["Multicriteria data structures and algorithms: from compressed to learned indexes, and beyond (Prot. 2017WR7SHH)"]}],"id":[{"id":"10.13039\/501100021856","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009112","name":"Istituto Nazionale di Alta Matematica \"Francesco Severi\"","doi-asserted-by":"publisher","award":["Analysis and Processing of Big Data based on Graph Models"],"award-info":[{"award-number":["Analysis and Processing of Big Data based on Graph Models"]}],"id":[{"id":"10.13039\/100009112","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Bloom filters, since their introduction over 50 years ago, have become a pillar to handle membership queries in small space, with relevant application in Big Data Mining and Stream Processing. Further improvements have been recently proposed with the use of Machine Learning techniques: learned Bloom filters. Those latter make considerably more complicated the proper parameter setting of this multi-criteria data structure, in particular in regard to the choice of one of its key components (the classifier) and accounting for the classification complexity of the input dataset. Given this State of the Art, our contributions are as follows. (1) A novel methodology, supported by software, for designing, analyzing and implementing learned Bloom filters that account for their own multi-criteria nature, in particular concerning classifier type choice and data classification complexity. Extensive experiments show the validity of the proposed methodology and, being our software public, we offer a valid tool to the practitioners interested in using learned Bloom filters. 
(2) Further contributions to the advancement of the State of the Art that are of great practical relevance are the following: (a) the classifier inference time should not be taken as a proxy for the filter reject time; (b) of the many classifiers we have considered, only two offer good performance; this result is in agreement with and further strengthens early findings in the literature; (c) Sandwiched Bloom filter, which is already known as being one of the references of this area, is further shown here to have the remarkable property of robustness to data complexity and classifier performance variability.<\/jats:p>","DOI":"10.1186\/s40537-024-00906-9","type":"journal-article","created":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T12:02:07Z","timestamp":1711540927000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["The role of classifiers and data complexity in learned Bloom filters: insights and recommendations"],"prefix":"10.1186","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7574-697X","authenticated-orcid":false,"given":"Dario","family":"Malchiodi","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Davide","family":"Raimondi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Giacomo","family":"Fumagalli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Raffaele","family":"Giancarlo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marco","family":"Frasca","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,3,27]]},"reference":[{"key":"906_CR1","doi-asserted-by":"publisher","unstructured":"Kraska T, Beutel A, Chi EH, Dean J, Polyzotis N. 
The case for learned index structures. In: Proceedings of the 2018 international conference on management of data. SIGMOD \u201918. New York: Association for Computing Machinery, 2018. p. 489\u2013504. https:\/\/doi.org\/10.1145\/3183713.3196909.","DOI":"10.1145\/3183713.3196909"},{"key":"906_CR2","doi-asserted-by":"publisher","first-page":"103077","DOI":"10.1016\/j.jnca.2021.103077","volume":"186","author":"Q Wu","year":"2021","unstructured":"Wu Q, Wang Q, Zhang M, Zheng R, Zhu J, Hu J. Learned bloom-filter for the efficient name lookup in information-centric networking. J Netw Comput Appl. 2021;186:103077. https:\/\/doi.org\/10.1016\/j.jnca.2021.103077.","journal-title":"J Netw Comput Appl."},{"issue":"6","key":"906_CR3","doi-asserted-by":"publisher","first-page":"744","DOI":"10.1093\/bioinformatics\/btaa911","volume":"37","author":"M Kirsche","year":"2020","unstructured":"Kirsche M, Das A, Schatz MC. Sapling: accelerating suffix array queries with learned data models. Bioinformatics. 2020;37(6):744\u20139. https:\/\/doi.org\/10.1093\/bioinformatics\/btaa911.","journal-title":"Bioinformatics."},{"key":"906_CR4","doi-asserted-by":"publisher","first-page":"646","DOI":"10.1017\/9781108637435.037","volume-title":"Beyond the worst-case analysis of algorithms","author":"M Mitzenmacher","year":"2021","unstructured":"Mitzenmacher M, Vassilvitskii S. Algorithms with predictions. In: Roughgarden T, editor. Beyond the worst-case analysis of algorithms. Cambridge: Cambridge University Press; 2021. p. 646\u201362. https:\/\/doi.org\/10.1017\/9781108637435.037"},{"key":"906_CR5","volume-title":"Pattern classification","author":"RO Duda","year":"2000","unstructured":"Duda RO, Hart PE, Stork DG. Pattern classification. 2nd ed. New York: Wiley; 2000.","edition":"2"},{"key":"906_CR6","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139165495","volume-title":"Statistical models?: Theory and practice","author":"D Freedman","year":"2005","unstructured":"Freedman D. 
Statistical models?: Theory and practice. Cambridge: Cambridge University Press; 2005."},{"issue":"2","key":"906_CR7","doi-asserted-by":"publisher","first-page":"318","DOI":"10.1002\/spe.3150","volume":"53","author":"D Amato","year":"2023","unstructured":"Amato D, Lo Bosco G, Giancarlo R. Standard versus uniform binary search and their variants in learned static indexing: the case of the searching on sorted data benchmarking software platform. Softw Pract Exp. 2023;53(2):318\u201346. https:\/\/doi.org\/10.1002\/spe.3150.","journal-title":"Softw Pract Exp."},{"issue":"3","key":"906_CR8","doi-asserted-by":"publisher","first-page":"56","DOI":"10.3390\/data8030056","volume":"8","author":"D Amato","year":"2023","unstructured":"Amato D, Giancarlo R, Lo Bosco G. Learned sorted table search and static indexes in small-space data models. Data. 2023;8(3):56. https:\/\/doi.org\/10.3390\/data8030056.","journal-title":"Data."},{"issue":"29","key":"906_CR9","doi-asserted-by":"publisher","first-page":"21399","DOI":"10.1007\/s00521-023-08841-1","volume":"35","author":"D Amato","year":"2023","unstructured":"Amato D, Lo Bosco G, Giancarlo R. Neural networks as building blocks for the design of efficient learned indexes. Neural Comput Appl. 2023;35(29):21399\u2013414. https:\/\/doi.org\/10.1007\/s00521-023-08841-1.","journal-title":"Neural Comput Appl."},{"key":"906_CR10","doi-asserted-by":"publisher","first-page":"74021","DOI":"10.1109\/ACCESS.2023.3295434","volume":"11","author":"P Ferragina","year":"2023","unstructured":"Ferragina P, Frasca M, Marin\u00f2 GC, Vinciguerra G. On nonlinear learned string indexing. IEEE Access. 2023;11:74021\u201334. https:\/\/doi.org\/10.1109\/ACCESS.2023.3295434.","journal-title":"IEEE Access."},{"issue":"8","key":"906_CR11","doi-asserted-by":"publisher","first-page":"1162","DOI":"10.14778\/3389133.3389135","volume":"13","author":"P Ferragina","year":"2020","unstructured":"Ferragina P, Vinciguerra G. 
The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB. 2020;13(8):1162\u201375. https:\/\/doi.org\/10.14778\/3389133.3389135.","journal-title":"PVLDB."},{"key":"906_CR12","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/j.tcs.2021.04.015","volume":"871","author":"P Ferragina","year":"2021","unstructured":"Ferragina P, Lillo F, Vinciguerra G. On the performance of learned data structures. Theor Comput Sci. 2021;871:107\u201320.","journal-title":"Theor Comput Sci."},{"key":"906_CR13","doi-asserted-by":"crossref","unstructured":"Kipf A, Marcus R, van Renen A, Stoian M, Kemper A, Kraska T, Neumann T. Radixspline: a single-pass learned index. In: Proceedings of the third international workshop on exploiting artificial intelligence techniques for data management. aiDM \u201920. New York: Association for Computing Machinery; 2020. p. 1\u20135.","DOI":"10.1145\/3401071.3401659"},{"issue":"6","key":"906_CR14","doi-asserted-by":"publisher","first-page":"744","DOI":"10.1093\/bioinformatics\/btaa911","volume":"37","author":"M Kirsche","year":"2020","unstructured":"Kirsche M, Das A, Schatz MC. Sapling: accelerating suffix array queries with learned data models. Bioinformatics. 2020;37(6):744\u20139.","journal-title":"Bioinformatics."},{"issue":"5","key":"906_CR15","doi-asserted-by":"publisher","first-page":"1079","DOI":"10.14778\/3510397.3510405","volume":"15","author":"M Maltry","year":"2022","unstructured":"Maltry M, Dittrich J. A critical analysis of recursive model indexes. Proc VLDB Endow. 2022;15(5):1079\u201391. https:\/\/doi.org\/10.14778\/3510397.3510405.","journal-title":"Proc VLDB Endow."},{"key":"906_CR16","doi-asserted-by":"crossref","unstructured":"Marcus R, Kipf A, van Renen A, Stoian M, Misra S, Kemper A, Neumann T, Kraska T. Benchmarking learned indexes, vol. 14; 2020. p. 1\u201313. 
arXiv preprint arXiv:2006.12804","DOI":"10.14778\/3421424.3421425"},{"key":"906_CR17","doi-asserted-by":"crossref","unstructured":"Marcus R, Zhang E, Kraska T. CDFShop: Exploring and optimizing learned index structures. In: Proceedings of the 2020 ACM SIGMOD international conference on management of data. SIGMOD \u201920; 2020; p. 2789\u20132792.","DOI":"10.1145\/3318464.3384706"},{"key":"906_CR18","doi-asserted-by":"crossref","unstructured":"Boffa A, Ferragina P, Vinciguerra G. A \u201clearned\u201d approach to quicken and compress rank\/select dictionaries. In: Proceedings of the SIAM symposium on algorithm engineering and experiments (ALENEX); 2021.","DOI":"10.1137\/1.9781611976472.4"},{"issue":"7","key":"906_CR19","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1145\/362686.362692","volume":"13","author":"BH Bloom","year":"1970","unstructured":"Bloom BH. Space\/time trade-offs in hash coding with allowable errors. Commun ACM. 1970;13(7):422\u20136. https:\/\/doi.org\/10.1145\/362686.362692.","journal-title":"Commun ACM."},{"key":"906_CR20","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139924801","volume-title":"Mining of massive data sets","author":"J Leskovec","year":"2014","unstructured":"Leskovec J, Rajaraman A, Ullman JD. Mining of massive data sets. 2nd ed. Cambridge: Cambridge University Press; 2014. https:\/\/doi.org\/10.1017\/CBO9781139924801.","edition":"2"},{"issue":"6","key":"906_CR21","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1016\/j.ipl.2006.10.007","volume":"101","author":"PS Almeida","year":"2007","unstructured":"Almeida PS, Baquero C, Pregui\u00e7a N, Hutchison D. Scalable Bloom filters. Inf Process Lett. 2007;101(6):255\u201361.","journal-title":"Inf Process Lett."},{"issue":"1","key":"906_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-12-333","volume":"12","author":"P Melsted","year":"2011","unstructured":"Melsted P, Pritchard JK. 
Efficient counting of k-mers in DNA sequences using a Bloom filter. BMC Bioinf. 2011;12(1):1\u20137.","journal-title":"BMC Bioinf."},{"issue":"12","key":"906_CR23","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1093\/bioinformatics\/btu288","volume":"30","author":"Z Zhang","year":"2014","unstructured":"Zhang Z, Wang W. RNA-Skim: a rapid method for RNA-Seq quantification at transcript level. Bioinformatics. 2014;30(12):283\u201392. https:\/\/doi.org\/10.1093\/bioinformatics\/btu288.","journal-title":"Bioinformatics."},{"issue":"2","key":"906_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1365815.1365816","volume":"26","author":"F Chang","year":"2008","unstructured":"Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE. Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst. 2008;26(2):1\u201326.","journal-title":"ACM Trans Comput Syst."},{"key":"906_CR25","unstructured":"Broder A, Mitzenmacher M. Network applications of Bloom filters: a survey. In: Internet mathematics, vol. 1, 2002. p. 636\u2013646. http:\/\/citeseerx.ist.psu.edu\/viewdoc\/summary?doi=10.1.1.20.98"},{"key":"906_CR26","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1016\/j.is.2015.01.002","volume":"54","author":"A Crainiceanu","year":"2015","unstructured":"Crainiceanu A, Lemire D. Bloofi: multidimensional Bloom filters. Inf Syst. 2015;54:311\u201324. https:\/\/doi.org\/10.1016\/j.is.2015.01.002.","journal-title":"Inf Syst."},{"key":"906_CR27","doi-asserted-by":"crossref","unstructured":"Zeng M, Zou B, Kui X, Zhu C, Xiao L, Chen Z, Du J, et\u00a0al. Pa-lbf: prefix-based and adaptive learned bloom filter for spatial data. Int J Intell Syst. 2023;2023.","DOI":"10.1155\/2023\/4970776"},{"key":"906_CR28","first-page":"1","volume-title":"Advances in Neural Information Processing Systems","author":"M Mitzenmacher","year":"2018","unstructured":"Mitzenmacher M. 
A model for learned bloom filters and optimizing by sandwiching. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors. Advances in Neural Information Processing Systems, vol. 31. Red Hook: Curran Associates; 2018. p. 1."},{"key":"906_CR29","unstructured":"Dai Z, Shrivastava A. Adaptive Learned Bloom Filter (Ada-BF): Efficient utilization of the classifier with application to real-time information filtering on the web. In: Advances in neural information processing systems, vol. 33, Red Hook: Curran Associates, Inc.; 2020. p. 11700\u201311710. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/86b94dae7c6517ec1ac767fd2c136580-Paper.pdf"},{"key":"906_CR30","unstructured":"Vaidya K, Knorr E, Kraska T, Mitzenmacher M. Partitioned learned Bloom filters. In: International conference on learning representations; 2021. https:\/\/openreview.net\/forum?id=6BRLOfrMhW"},{"issue":"12","key":"906_CR31","doi-asserted-by":"publisher","first-page":"2355","DOI":"10.14778\/3407790.3407830","volume":"13","author":"Q Liu","year":"2020","unstructured":"Liu Q, Zheng L, Shen Y, Chen L. Stable learned Bloom filters for data streams. Proc VLDB Endow. 2020;13(12):2355\u201367. https:\/\/doi.org\/10.14778\/3407790.3407830.","journal-title":"Proc VLDB Endow."},{"key":"906_CR32","doi-asserted-by":"crossref","unstructured":"Fumagalli G, Raimondi D, Giancarlo R, Malchiodi D, Frasca M. On the choice of general purpose classifiers in learned Bloom filters: an initial analysis within basic filters. In: Proceedings of the 11th international conference on pattern recognition applications and methods (ICPRAM); 2022. p. 675\u2013682.","DOI":"10.5220\/0010889000003122"},{"issue":"3","key":"906_CR33","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1109\/LES.2022.3156019","volume":"14","author":"Z Dai","year":"2022","unstructured":"Dai Z, Shrivastava A, Reviriego P, Hern\u00e1ndez JA. Optimizing learned bloom filters: How much should be learned? 
IEEE Embedded Syst Lett. 2022;14(3):123\u20136.","journal-title":"IEEE Embedded Syst Lett."},{"issue":"2","key":"906_CR34","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1016\/j.asoc.2004.12.002","volume":"6","author":"S Ali","year":"2006","unstructured":"Ali S, Smith KA. On learning algorithm selection for classification. Appl Soft Comput. 2006;6(2):119\u201338.","journal-title":"Appl Soft Comput."},{"issue":"12","key":"906_CR35","doi-asserted-by":"publisher","first-page":"4820","DOI":"10.1016\/j.eswa.2013.02.025","volume":"40","author":"J-R Cano","year":"2013","unstructured":"Cano J-R. Analysis of data complexity measures for classification. Expert Syst Appl. 2013;40(12):4820\u201331. https:\/\/doi.org\/10.1016\/j.eswa.2013.02.025.","journal-title":"Expert Syst Appl."},{"key":"906_CR36","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1016\/j.ins.2013.10.007","volume":"260","author":"MJ Flores","year":"2014","unstructured":"Flores MJ, G\u00e1mez JA, Mart\u00ednez AM. Domains of competence of the semi-naive Bayesian network classifiers. Inf Sci. 2014;260:120\u201348.","journal-title":"Inf Sci."},{"issue":"1","key":"906_CR37","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1007\/s10115-013-0700-4","volume":"42","author":"J Luengo","year":"2015","unstructured":"Luengo J, Herrera F. An automatic extraction method of the domains of competence for learning classifiers using data complexity measures. Knowl Inf Syst. 2015;42(1):147\u201380.","journal-title":"Knowl Inf Syst."},{"key":"906_CR38","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1016\/j.comcom.2022.12.027","volume":"200","author":"R Patgiri","year":"2023","unstructured":"Patgiri R, Biswas A, Nayak S. deepbf: Malicious url detection using learned bloom filter and evolutionary deep learning. Comput Commun. 
2023;200:30\u201341.","journal-title":"Comput Commun."},{"key":"906_CR39","doi-asserted-by":"crossref","unstructured":"Malchiodi D, Raimondi D, Fumagalli G, Giancarlo R, Frasca M. A critical analysis of classifier selection in learned bloom filters: the essentials. In: Iliadis, L., Maglogiannis, I., Castro, S., Jayne, C., Pimenidis, E. (eds.) Engineering application of neural networks\u201424th international Conference\u2014EAAAI\/EANN 2023\u2014Le\u00f3n, Spain, June 14-17, 2023\u2014Proceedings. Communications in Computer and Information Science, vol. 1826; 2023, p. 47\u201361. Springer.","DOI":"10.1007\/978-3-031-34204-2_5"},{"issue":"3","key":"906_CR40","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1016\/0022-0000(81)90033-7","volume":"22","author":"MN Wegman","year":"1981","unstructured":"Wegman MN, Carter JL. New hash functions and their use in authentication and set equality. J Comput Syst Sci. 1981;22(3):265\u201379. https:\/\/doi.org\/10.1016\/0022-0000(81)90033-7.","journal-title":"J Comput Syst Sci."},{"issue":"2","key":"906_CR41","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1016\/0022-0000(79)90044-8","volume":"18","author":"JL Carter","year":"1979","unstructured":"Carter JL, Wegman MN. Universal classes of hash functions. J Comput Syst Sci. 1979;18(2):143\u201354. https:\/\/doi.org\/10.1016\/0022-0000(79)90044-8.","journal-title":"J Comput Syst Sci."},{"issue":"4","key":"906_CR42","doi-asserted-by":"publisher","first-page":"485","DOI":"10.1080\/15427951.2004.10129096","volume":"1","author":"A Broder","year":"2004","unstructured":"Broder A, Mitzenmacher M. Network applications of Bloom filters: a survey. Internet Math. 2004;1(4):485\u2013509.","journal-title":"Internet Math."},{"issue":"2","key":"906_CR43","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","volume":"20","author":"DR Cox","year":"1958","unstructured":"Cox DR. The regression analysis of binary sequences. 
J R Stat Soc Ser B (Methodol). 1958;20(2):215\u201332.","journal-title":"J R Stat Soc Ser B (Methodol)."},{"key":"906_CR44","volume-title":"Pattern classification and scene analysis","author":"RO Duda","year":"1973","unstructured":"Duda RO, Hart PE. Pattern classification and scene analysis. New York: Wiley; 1973."},{"key":"906_CR45","doi-asserted-by":"publisher","unstructured":"Cho K, van Merri\u00ebnboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: Encoder\u2013decoder approaches. In: Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation. p. 103\u2013111. Association for Computational Linguistics, Doha, Qatar; 2014. https:\/\/doi.org\/10.3115\/v1\/W14-4012. https:\/\/aclanthology.org\/W14-4012","DOI":"10.3115\/v1\/W14-4012"},{"key":"906_CR46","unstructured":"Morik K, Brockhausen P, Joachims T. Combining statistical learning with a knowledge-based approach: a case study in intensive care monitoring. Technical Report; 1999."},{"key":"906_CR47","unstructured":"Zell A. Simulation neuronaler netze. habilitation, Uni Stuttgart, 1994."},{"key":"906_CR48","volume-title":"Neural networks: a comprehensive foundation","author":"S Haykin","year":"1994","unstructured":"Haykin S. Neural networks: a comprehensive foundation. Upper Saddle River: Prentice Hall PTR; 1994."},{"issue":"11","key":"906_CR49","doi-asserted-by":"publisher","first-page":"1323","DOI":"10.1016\/S0167-8655(97)00109-8","volume":"18","author":"L Bruzzone","year":"1997","unstructured":"Bruzzone L, Serpico SB. Classification of imbalanced remote-sensing data by neural networks. Pattern Recogn Lett. 1997;18(11):1323\u20138. https:\/\/doi.org\/10.1016\/S0167-8655(97)00109-8.","journal-title":"Pattern Recogn Lett."},{"key":"906_CR50","volume-title":"Classification and regression trees","author":"L Breiman","year":"1984","unstructured":"Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. 
Boca Raton: Chapman & Hall\/CRC; 1984."},{"issue":"1","key":"906_CR51","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001;45(1):5\u201332.","journal-title":"Mach Learn."},{"key":"906_CR52","doi-asserted-by":"publisher","unstructured":"Van\u00a0Hulse J, Khoshgoftaar TM, Napolitano A. Experimental perspectives on learning from imbalanced data. In: Proceedings of the 24th International Conference on Machine Learning. ICML \u201907. New York: ACM; 2007. p. 935\u2013942. https:\/\/doi.org\/10.1145\/1273496.1273614","DOI":"10.1145\/1273496.1273614"},{"issue":"1","key":"906_CR53","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1186\/1472-6947-11-51","volume":"11","author":"M Khalilia","year":"2011","unstructured":"Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. BMC Med Inf Decis Mak. 2011;11(1):51.","journal-title":"BMC Med Inf Decis Mak."},{"issue":"5","key":"906_CR54","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3347711","volume":"52","author":"AC Lorena","year":"2019","unstructured":"Lorena AC, Garcia LPF, Lehmann J, Souto MCP, Ho TK. How complex is your classification problem? A survey on measuring classification complexity. ACM Comput Surv. 2019;52(5):1\u201334. https:\/\/doi.org\/10.1145\/3347711.","journal-title":"ACM Comput Surv"},{"issue":"9","key":"906_CR55","doi-asserted-by":"publisher","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","volume":"21","author":"H He","year":"2009","unstructured":"He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9):1263\u201384. 
https:\/\/doi.org\/10.1109\/TKDE.2008.239.","journal-title":"IEEE Trans Knowl Data Eng."},{"issue":"4","key":"906_CR56","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1089\/cmb.2020.0431","volume":"28","author":"A Rahman","year":"2021","unstructured":"Rahman A, Medvedev P. Representation of k-mer sets using spectrum-preserving string sets. J Comput Biol. 2021;28(4):381\u201394. https:\/\/doi.org\/10.1089\/cmb.2020.0431.","journal-title":"J Comput Biol."},{"issue":"3","key":"906_CR57","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1038\/nbt.3442","volume":"34","author":"B Solomon","year":"2016","unstructured":"Solomon B, Kingsford C. Fast search of thousands of short-read sequencing experiments. Nat Biotechnol. 2016;34(3):300\u20132. https:\/\/doi.org\/10.1038\/nbt.3442.","journal-title":"Nat Biotechnol."},{"issue":"10","key":"906_CR58","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1186\/gb-2009-10-10-r108","volume":"10","author":"B Chor","year":"2009","unstructured":"Chor B, Horn D, Goldman N, Levy Y, Massingham T, et al. Genomic DNA k-mer spectra: models and modalities. Genome Biol. 2009;10(10):108.","journal-title":"Genome Biol."},{"issue":"10","key":"906_CR59","doi-asserted-by":"publisher","first-page":"5217","DOI":"10.1093\/nar\/gkaa265","volume":"48","author":"RAL Elworth","year":"2020","unstructured":"Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen TJ. To petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res. 2020;48(10):5217\u201334. https:\/\/doi.org\/10.1093\/nar\/gkaa265.","journal-title":"Nucleic Acids Res."},{"key":"906_CR60","unstructured":"Raimondi D, Fumagalli G. A Critical Analysis of Classifier Selection in Learned Bloom Filters\u2014Supporting Software. https:\/\/github.com\/RaimondiD\/LBF_ADABF_experiment. 
Last checked on May, 2023; 2023."},{"key":"906_CR61","unstructured":"Dai Z. Adaptive Learned Bloom Filter (ADA-BF): Efficient Utilization of the Classifier. https:\/\/github.com\/DAIZHENWEI\/Ada-BF. Last checked on November\u00a08, 2022; 2022."},{"key":"906_CR62","unstructured":"Python Software Foundation: pickle\u2014Python object serialization. https:\/\/docs.python.org\/3\/library\/pickle.html. Last checked on May\u00a017, 2022 (2022)"},{"key":"906_CR63","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.11.072","author":"GC Marin\u00f2","year":"2022","unstructured":"Marin\u00f2 GC, Petrini A, Malchiodi D, Frasca M. Deep neural networks compression: a comparative survey and choice recommendations. Neurocomputing. 2022. https:\/\/doi.org\/10.1016\/j.neucom.2022.11.072.","journal-title":"Neurocomputing."},{"key":"906_CR64","unstructured":"Raudys S. On the problems of sample size in pattern recognition. In: Detection, pattern recognition and experiment design: Vol. 2. Proceedings of the 2nd All-union conference statistical methods in control theory (1970). Publ. 
House \u201cNauka\u201d."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-00906-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-024-00906-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-00906-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T22:36:56Z","timestamp":1731623816000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-024-00906-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,27]]},"references-count":64,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["906"],"URL":"https:\/\/doi.org\/10.1186\/s40537-024-00906-9","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-2919738\/v1","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,27]]},"assertion":[{"value":"11 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"14 March 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 March 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and 
consent to participate"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interest"}}],"article-number":"45"}}