{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:12:30Z","timestamp":1740147150822,"version":"3.37.3"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T00:00:00Z","timestamp":1701648000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T00:00:00Z","timestamp":1701648000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005743","name":"Universit\u00e0 Cattolica del Sacro Cuore","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005743","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Adv Data Anal Classif"],"published-print":{"date-parts":[[2024,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>After being trained on a fully-labeled training set, where the observations are grouped into a certain number of known classes, novelty detection methods aim to classify the instances of an unlabeled test set while allowing for the presence of previously unseen classes. These models are valuable in many areas, ranging from social network and food adulteration analyses to biology, where an evolving population may be present. In this paper, we focus on a two-stage Bayesian semiparametric novelty detector, also known as Brand, recently introduced in the literature. Leveraging on a model-based mixture representation, Brand allows clustering the test observations into known training terms or a single novelty term. Furthermore, the novelty term is modeled with a Dirichlet Process mixture model to flexibly capture any departure from the known patterns. 
Brand was originally estimated using MCMC schemes, which are prohibitively costly when applied to high-dimensional data. To scale up Brand applicability to large datasets, we propose to resort to a variational Bayes approach, providing an efficient algorithm for posterior approximation. We demonstrate a significant gain in efficiency and excellent classification performance with thorough simulation studies. Finally, to showcase its applicability, we perform a novelty detection analysis using the openly-available  dataset, a large collection of satellite imaging spectra, to search for novel soil types.<\/jats:p>","DOI":"10.1007\/s11634-023-00569-z","type":"journal-article","created":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T07:02:46Z","timestamp":1701673366000},"page":"681-703","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Variational inference for semiparametric Bayesian novelty detection in large 
datasets"],"prefix":"10.1007","volume":"18","author":[{"given":"Luca","family":"Benedetti","sequence":"first","affiliation":[]},{"given":"Eric","family":"Boniardi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2491-6290","authenticated-orcid":false,"given":"Leonardo","family":"Chiani","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4181-5797","authenticated-orcid":false,"given":"Jacopo","family":"Ghirri","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3083-9494","authenticated-orcid":false,"given":"Marta","family":"Mastropietro","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9348-710X","authenticated-orcid":false,"given":"Andrea","family":"Cappozzo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2978-4702","authenticated-orcid":false,"given":"Francesco","family":"Denti","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,12,4]]},"reference":[{"key":"569_CR1","doi-asserted-by":"publisher","unstructured":"Aliverti E, Russo M (2022) Stratified stochastic variational inference for high-dimensional network factor model. J Comput Graph Stat 31(2):502\u2013511. https:\/\/doi.org\/10.1080\/10618600.2021.1984929, arXiv:2006.14217","DOI":"10.1080\/10618600.2021.1984929"},{"key":"569_CR2","doi-asserted-by":"crossref","unstructured":"Blei DM, Jordan MI (2006) Variational inference for Dirichlet process mixtures. Bayesian Anal 1(1):121\u2013144. http:\/\/www.cs.berkeley.edu\/$sim$blei\/","DOI":"10.1214\/06-BA104"},{"issue":"518","key":"569_CR3","doi-asserted-by":"publisher","first-page":"859","DOI":"10.1080\/01621459.2017.1285773","volume":"112","author":"DM Blei","year":"2017","unstructured":"Blei DM, Kucukelbir A, McAuliffe JD (2017) Variational inference: a review for statisticians. J Am Stat Assoc 112(518):859\u2013877. 
https:\/\/doi.org\/10.1080\/01621459.2017.1285773","journal-title":"J Am Stat Assoc"},{"key":"569_CR4","doi-asserted-by":"publisher","unstructured":"Boudt K, Rousseeuw PJ, Vanduffel S et\u00a0al (2020) The minimum regularized covariance determinant estimator. Stat Comput 30(1):113\u2013128. https:\/\/doi.org\/10.1007\/s11222-019-09869-x, arXiv:1701.07086","DOI":"10.1007\/s11222-019-09869-x"},{"issue":"1","key":"569_CR5","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1007\/s00357-014-9147-x","volume":"31","author":"C Bouveyron","year":"2014","unstructured":"Bouveyron C (2014) Adaptive mixture discriminant analysis for supervised learning with unobserved classes. J Classif 31(1):49\u201384. https:\/\/doi.org\/10.1007\/s00357-014-9147-x. (link.springer.com\/content\/pdf\/10.1007\/s00357-014-9147-x.pdf)","journal-title":"J Classif"},{"issue":"8","key":"569_CR6","doi-asserted-by":"publisher","first-page":"e1001,127","DOI":"10.1371\/journal.pbio.1001127","volume":"9","author":"M Camilo","year":"2011","unstructured":"Camilo M, Derek PT, Sina A et al (2011) How many species are there on earth and in the ocean? PLoS Biol 9(8):e1001,127","journal-title":"PLoS Biol"},{"issue":"5","key":"569_CR7","doi-asserted-by":"publisher","first-page":"1545","DOI":"10.1007\/s11222-020-09959-1","volume":"30","author":"A Cappozzo","year":"2020","unstructured":"Cappozzo A, Greselin F, Murphy TB (2020) Anomaly and Novelty detection for robust semi-supervised learning. Stat Comput 30(5):1545\u20131571. https:\/\/doi.org\/10.1007\/s11222-020-09959-1. arxiv.org\/abs\/1911.08381 link.springer.com\/10.1007\/s11222-020-09959-1","journal-title":"Stat Comput"},{"issue":"3","key":"569_CR8","doi-asserted-by":"publisher","first-page":"201","DOI":"10.11646\/phytotaxa.261.3.1","volume":"261","author":"MJ Christenhusz","year":"2016","unstructured":"Christenhusz MJ, Byng JW (2016) The number of known plants species in the world and its annual increase. Phytotaxa 261(3):201\u2013217. 
https:\/\/doi.org\/10.11646\/phytotaxa.261.3.1","journal-title":"Phytotaxa"},{"issue":"4","key":"569_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11222-021-10017-7","volume":"31","author":"F Denti","year":"2021","unstructured":"Denti F, Cappozzo A, Greselin F (2021) A two-stage Bayesian semiparametric model for novelty detection with robust prior information. Stat Comput 31(4):1\u201319","journal-title":"Stat Comput"},{"issue":"430","key":"569_CR10","doi-asserted-by":"publisher","first-page":"577","DOI":"10.1080\/01621459.1995.10476550","volume":"90","author":"MD Escobar","year":"1995","unstructured":"Escobar MD, West M (1995) Bayesian density estimation and inference using mixtures. J Am Stat Assoc 90(430):577\u2013588. https:\/\/doi.org\/10.1080\/01621459.1995.10476550","journal-title":"J Am Stat Assoc"},{"key":"569_CR11","doi-asserted-by":"publisher","unstructured":"Ferguson TS (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1(2):209\u2013230. https:\/\/doi.org\/10.1214\/aos\/1176342360, arXiv:arXiv:1011.1669v3","DOI":"10.1214\/aos\/1176342360"},{"issue":"3","key":"569_CR12","doi-asserted-by":"publisher","first-page":"336","DOI":"10.1111\/ele.12731","volume":"20","author":"W Finsinger","year":"2017","unstructured":"Finsinger W, Giesecke T, Brewer S et al (2017) Emergence patterns of novelty in European vegetation assemblages over the past 15000 years. Ecol Lett 20(3):336\u2013346. https:\/\/doi.org\/10.1111\/ele.12731","journal-title":"Ecol Lett"},{"key":"569_CR13","doi-asserted-by":"publisher","unstructured":"Fop M, Mattei PA, Bouveyron C et\u00a0al (2022) Unobserved classes and extra variables in high-dimensional discriminant analysis. Adv Data Anal Classif 16(1):55\u201392. 
https:\/\/doi.org\/10.1007\/s11634-021-00474-3, arXiv:2102.01982","DOI":"10.1007\/s11634-021-00474-3"},{"issue":"383","key":"569_CR14","doi-asserted-by":"publisher","first-page":"553","DOI":"10.1080\/01621459.1983.10478008","volume":"78","author":"EB Fowlkes","year":"1983","unstructured":"Fowlkes EB, Mallows CL (1983) A method for comparing two hierarchical clusterings. J Am Stat Assoc 78(383):553\u2013569. https:\/\/doi.org\/10.1080\/01621459.1983.10478008","journal-title":"J Am Stat Assoc"},{"key":"569_CR15","unstructured":"Hinton G, van\u00a0der Maaten L (2008) Visualizing Data using t-SNE. J Mach Learn Res 9:2579\u20132605. http:\/\/jmlr.org\/papers\/volume9\/vandermaaten08a\/vandermaaten08a.pdf%0Ahttp:\/\/www.jmlr.org\/papers\/v9\/vandermaaten08a.html%5Cnfile:\/\/\/Files\/63\/63E4B948-D809-4073-8CE0-E56194C96FD8.pdf"},{"key":"569_CR16","unstructured":"Hoffman M, Wang C, Paisley J (2003) Stochastic variational inference. J Mach Learn Res 1\u201352. arXiv:arXiv:1206.7051v1"},{"issue":"1","key":"569_CR17","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/BF01908075","volume":"2","author":"L Hubert","year":"1985","unstructured":"Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193\u2013218. https:\/\/doi.org\/10.1007\/BF01908075","journal-title":"J Classif"},{"key":"569_CR18","doi-asserted-by":"publisher","unstructured":"Hubert M, Debruyne M, Rousseeuw PJ (2018) Minimum covariance determinant and extensions. Wiley Interdisciplinary Reviews: Comput Stat 10(3):1\u201311. https:\/\/doi.org\/10.1002\/wics.1421, arXiv:1709.07045","DOI":"10.1002\/wics.1421"},{"issue":"1","key":"569_CR19","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1023\/A:1008932416310","volume":"10","author":"TS Jaakkola","year":"2000","unstructured":"Jaakkola TS, Jordan MI (2000) Bayesian parameter estimation via variational methods. Stat Comput 10(1):25\u201337. 
https:\/\/doi.org\/10.1023\/A:1008932416310","journal-title":"Stat Comput"},{"issue":"2","key":"569_CR20","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1023\/A:1007665907178","volume":"37","author":"MI Jordan","year":"1999","unstructured":"Jordan MI, Ghahramani Z, Jaakkola TS et al (1999) Introduction to variational methods for graphical models. Mach Learn 37(2):183\u2013233. https:\/\/doi.org\/10.1023\/A:1007665907178","journal-title":"Mach Learn"},{"issue":"1","key":"569_CR21","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1007\/s11222-009-9150-y","volume":"21","author":"M Kalli","year":"2011","unstructured":"Kalli M, Griffin JE, Walker SG (2011) Slice sampling mixture models. Stat Comput 21(1):93\u2013105. https:\/\/doi.org\/10.1007\/s11222-009-9150-y","journal-title":"Stat Comput"},{"issue":"12","key":"569_CR22","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1016\/j.sigpro.2003.07.018","volume":"83","author":"M Markou","year":"2003","unstructured":"Markou M, Singh S (2003) Novelty detection: a review-part 1: statistical approaches. Signal Process 83(12):2481\u20132497. https:\/\/doi.org\/10.1016\/j.sigpro.2003.07.018. (linkinghub.elsevier.com\/retrieve\/pii\/S0165168403002020)","journal-title":"Signal Process"},{"key":"569_CR23","doi-asserted-by":"publisher","unstructured":"Markou M, Singh S (2003b) Novelty detection: a review-part 2. Signal Process 83(12):2499\u20132521. https:\/\/doi.org\/10.1016\/j.sigpro.2003.07.019, https:\/\/linkinghub.elsevier.com\/retrieve\/pii\/S0165168403002032","DOI":"10.1016\/j.sigpro.2003.07.019"},{"key":"569_CR24","unstructured":"Nieman D, Szabo B, van Zanten H (2022) Contraction rates for sparse variational approximations in Gaussian process regression. J Mach Learn Res 23:1\u201326. 
arxiv:2109.10755"},{"issue":"2","key":"569_CR25","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1198\/tast.2010.09058","volume":"64","author":"JT Ormerod","year":"2010","unstructured":"Ormerod JT, Wand MP (2010) Explaining variational approximations. Am Stat 64(2):140\u2013153. https:\/\/doi.org\/10.1198\/tast.2010.09058","journal-title":"Am Stat"},{"key":"569_CR26","doi-asserted-by":"publisher","unstructured":"Ray K, Szab\u00f3 B (2022) Variational Bayes for high-dimensional linear regression with sparse priors. J Am Stat Assoc 117(539):1270\u20131281. https:\/\/doi.org\/10.1080\/01621459.2020.1847121, arXiv:1904.07150","DOI":"10.1080\/01621459.2020.1847121"},{"issue":"2","key":"569_CR27","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1002\/asmb.2736","volume":"39","author":"T Rigon","year":"2023","unstructured":"Rigon T (2023) An enriched mixture model for functional clustering. Appl Stoch Models Bus Ind 39(2):232\u2013250","journal-title":"Appl Stoch Models Bus Ind"},{"issue":"3","key":"569_CR28","doi-asserted-by":"publisher","first-page":"212","DOI":"10.1080\/00401706.1999.10485670","volume":"41","author":"PJ Rousseeuw","year":"1999","unstructured":"Rousseeuw PJ, Driessen KV (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41(3):212\u2013223. https:\/\/doi.org\/10.1080\/00401706.1999.10485670","journal-title":"Technometrics"},{"key":"569_CR29","unstructured":"Sethuraman J (1994) A constructive definition of Dirichlet Process prior. Statistica Sinica 4(2):639\u2013650. http:\/\/www.jstor.org\/stable\/24305538"},{"key":"569_CR30","doi-asserted-by":"publisher","unstructured":"Vatanen T, Kuusela M, Malmi E et\u00a0al (2012) Semi-supervised detection of collective anomalies with an application in high energy particle physics. 
In: Proceedings of the international joint conference on neural networks, https:\/\/doi.org\/10.1109\/IJCNN.2012.6252712, http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.223.215 &rep=rep1 &type=pdf","DOI":"10.1109\/IJCNN.2012.6252712"},{"key":"569_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1553374.1553511","volume":"382","author":"NX Vinh","year":"2009","unstructured":"Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary? ACM International Conference Proceeding Series 382:1\u20138. https:\/\/doi.org\/10.1145\/1553374.1553511","journal-title":"ACM International Conference Proceeding Series"},{"key":"569_CR32","unstructured":"Wang B, Titterington D (2012) Convergence and asymptotic normality of variational Bayesian approximations for exponential family models with missing values. In: Chickering M, Halpern J (eds) Proceedings of the 20th conference in uncertainty in Artificial Intelligence. AUAI Press"},{"key":"569_CR33","unstructured":"Wang C (2012) Variational inference in nonconjugate models. J Mach Learn Res arXiv:arXiv:1209.4360v2"},{"issue":"1604","key":"569_CR34","doi-asserted-by":"publisher","first-page":"2864","DOI":"10.1098\/rstb.2011.0354","volume":"367","author":"M Woolhouse","year":"2012","unstructured":"Woolhouse M, Scott F, Hudson Z et al (2012) Human viruses: discovery and emergence. Philos Trans Roy Soc B: Biol Sci 367(1604):2864\u20132871. https:\/\/doi.org\/10.1098\/rstb.2011.0354","journal-title":"Philos Trans Roy Soc B: Biol Sci"},{"key":"569_CR35","doi-asserted-by":"publisher","unstructured":"Zhang C, Butepage J, Kjellstrom H et\u00a0al (2019) Advances in Variational Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(8):2008\u20132026. 
https:\/\/doi.org\/10.1109\/TPAMI.2018.2889774, arXiv:1711.05597","DOI":"10.1109\/TPAMI.2018.2889774"}],"container-title":["Advances in Data Analysis and Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-023-00569-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11634-023-00569-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-023-00569-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T03:41:02Z","timestamp":1726717262000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11634-023-00569-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,4]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,9]]}},"alternative-id":["569"],"URL":"https:\/\/doi.org\/10.1007\/s11634-023-00569-z","relation":{},"ISSN":["1862-5347","1862-5355"],"issn-type":[{"type":"print","value":"1862-5347"},{"type":"electronic","value":"1862-5355"}],"subject":[],"published":{"date-parts":[[2023,12,4]]},"assertion":[{"value":"28 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 October 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 December 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}