{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T06:59:41Z","timestamp":1776754781402,"version":"3.51.2"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T00:00:00Z","timestamp":1663545600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T00:00:00Z","timestamp":1663545600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Australian Centre of Excellence for Mathematical and Statistical Frontiers","award":["CE140100049"],"award-info":[{"award-number":["CE140100049"]}]},{"name":"Australian Centre of Excellence for Mathematical and Statistical Frontiers","award":["CE140100049"],"award-info":[{"award-number":["CE140100049"]}]},{"name":"Australian Centre of Excellence for Mathematical and Statistical Frontiers","award":["CE140100049"],"award-info":[{"award-number":["CE140100049"]}]},{"name":"Australian Research Council Discovery Project Scheme","award":["DP160102544"],"award-info":[{"award-number":["DP160102544"]}]},{"name":"Australian Research Council Discovery Project Scheme","award":["DP160102544"],"award-info":[{"award-number":["DP160102544"]}]},{"name":"Australian Research Council Fellowship","award":["FT170100079"],"award-info":[{"award-number":["FT170100079"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Adv Data Anal Classif"],"published-print":{"date-parts":[[2023,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Symbolic data analysis (SDA) is an emerging area of statistics concerned with understanding and modelling data that takes distributional form (i.e.\u00a0<jats:italic>symbols<\/jats:italic>), such as 
random lists, intervals and histograms. It was developed under the premise that the statistical unit of interest is the symbol, and that inference is required at this level. Here we consider a different perspective, which opens a new research direction in the field of SDA. We assume that, as with a standard statistical analysis, inference is required at the level of individual-level data. However, the individual-level data are unobserved, and are aggregated into observed symbols\u2014group-based distributional-valued summaries\u2014prior to the analysis. We introduce a novel general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries. This approach opens the door for new classes of symbol design and construction, in addition to developing SDA as a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. 
We illustrate this new direction for SDA research through several real and simulated data analyses, including a study of novel classes of multivariate symbol construction techniques.<\/jats:p>","DOI":"10.1007\/s11634-022-00520-8","type":"journal-article","created":{"date-parts":[[2022,9,19]],"date-time":"2022-09-19T19:05:38Z","timestamp":1663614338000},"page":"659-699","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["New models for symbolic data analysis"],"prefix":"10.1007","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7944-3925","authenticated-orcid":false,"given":"Boris","family":"Beranger","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1668-2099","authenticated-orcid":false,"given":"Huan","family":"Lin","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8943-067X","authenticated-orcid":false,"given":"Scott","family":"Sisson","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,9,19]]},"reference":[{"key":"520_CR1","doi-asserted-by":"publisher","first-page":"697","DOI":"10.1214\/07-AOS574","volume":"37","author":"C Andrieu","year":"2009","unstructured":"Andrieu C, Roberts GO (2009) The pseudo-marginal approach for efficient Monte Carlo computations. Ann Stat 37:697\u2013725","journal-title":"Ann Stat"},{"key":"520_CR2","unstructured":"Bardenet R, Doucet A, Holmes C (2014) Towards scaling up Markov chain Monte Carlo: an adaptive subsampling approach. In: Proceedings of the 31st international conference on machine learning (ICML-14), pp 405\u2013413"},{"key":"520_CR3","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1002\/sam.10115","volume":"4","author":"L Billard","year":"2011","unstructured":"Billard L (2011) Brief overview of symbolic data and analytic issues. 
Stat Anal Data Min 4:149\u2013156","journal-title":"Stat Anal Data Min"},{"key":"520_CR4","doi-asserted-by":"publisher","first-page":"470","DOI":"10.1198\/016214503000242","volume":"98","author":"L Billard","year":"2003","unstructured":"Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98:470\u2013487","journal-title":"J Am Stat Assoc"},{"key":"520_CR5","volume-title":"Symbolic data analysis. Wiley Series in Computational Statistics","author":"L Billard","year":"2006","unstructured":"Billard L, Diday E (2006) Symbolic data analysis. Wiley Series in Computational Statistics. Wiley, Chichester"},{"key":"520_CR6","doi-asserted-by":"publisher","first-page":"57","DOI":"10.6000\/1929-6029.2015.04.01.6","volume":"4","author":"M Bland","year":"2015","unstructured":"Bland M (2015) Estimating mean and standard deviation from the sample size, three quartiles, minimum and maximum. Int J Stat Med Res 4:57\u201364","journal-title":"Int J Stat Med Res"},{"key":"520_CR7","volume-title":"Analysis of symbolic data","year":"2000","unstructured":"Bock HH, Diday E (eds) (2000) Analysis of symbolic data. Springer, Berlin"},{"key":"520_CR8","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1080\/02664763.2011.575125","volume":"39","author":"P Brito","year":"2012","unstructured":"Brito P, Duarte Silva AP (2012) Modelling interval data with normal and skew-normal distributions. J Appl Stat 39:3\u201320","journal-title":"J Appl Stat"},{"key":"520_CR9","unstructured":"Cariou V, Billard L (2015) Generalization method when manipulating relational databases. In: Brito P, Venturini G (eds) Symbolic data analysis & visualisation, RNTI-E-29, pp 59\u201388"},{"key":"520_CR10","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1002\/sam.11260","volume":"8","author":"S Dias","year":"2015","unstructured":"Dias S, Brito P (2015) Linear regression model with histogram-valued variables. 
Stat Anal Data Min 8:75\u2013113","journal-title":"Stat Anal Data Min"},{"issue":"3","key":"520_CR11","doi-asserted-by":"publisher","first-page":"1118","DOI":"10.1016\/j.ejor.2016.09.006","volume":"258","author":"S Dias","year":"2017","unstructured":"Dias S, Brito P (2017) Off the beaten track: a new linear model for interval data. Eur J Oper Res 258(3):1118\u20131130","journal-title":"Eur J Oper Res"},{"key":"520_CR12","unstructured":"Diday E (1988) The symbolic approach in clustering and related methods of data analysis: the basic choices. In: Bock HH (ed) Classification and related methods of data analysis, proceedings of IFCS87, pp 673\u2013684"},{"key":"520_CR13","doi-asserted-by":"publisher","first-page":"516","DOI":"10.1007\/s00357-015-9189-8","volume":"32","author":"AP Duarte Silva","year":"2015","unstructured":"Duarte Silva AP, Brito P (2015) Discriminant analysis of interval data: an assessment of parametric and distance-based approaches. J Classif 32:516\u2013541","journal-title":"J Classif"},{"key":"520_CR14","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1016\/j.cub.2014.12.022","volume":"25","author":"R Fisher","year":"2015","unstructured":"Fisher R, O\u2019Leary RA, Low-Choy S, Mengersen K, Knowlton N, Brainard RE, Caley MJ (2015) Species richness on coral reefs and the pursuit of convergent global estimates. Curr Biol 25:500\u2013505","journal-title":"Curr Biol"},{"key":"520_CR15","doi-asserted-by":"publisher","DOI":"10.1201\/b16018","volume-title":"Bayesian data analysis","author":"A Gelman","year":"2013","unstructured":"Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB (2013) Bayesian data analysis, 3rd edn. 
Chapman and Hall, Boca Raton","edition":"3"},{"key":"520_CR16","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1002\/sta4.7","volume":"1","author":"S Guha","year":"2012","unstructured":"Guha S, Hafen R, Rounds J, Xia J, Li J, Xi B, Cleveland WS (2012) Large complex data: divide and recombine (D&R) with RHIPE. Stat 1:53\u201367","journal-title":"Stat"},{"key":"520_CR17","doi-asserted-by":"publisher","first-page":"2244","DOI":"10.1214\/aos\/1176348396","volume":"19","author":"DF Heitjan","year":"1991","unstructured":"Heitjan DF, Rubin DB (1991) Ignorability and coarse data. Ann Stat 19:2244\u20132253","journal-title":"Ann Stat"},{"key":"520_CR18","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1186\/1471-2288-5-13","volume":"5","author":"SP Hozo","year":"2005","unstructured":"Hozo SP, Djulbegovic B, Hozo I (2005) Estimating the mean and variance from the median, range and the size of a sample. BMC Med Res Methodol 5:13","journal-title":"BMC Med Res Methodol"},{"key":"520_CR19","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1007\/s11634-016-0245-y","volume":"11","author":"K Hron","year":"2017","unstructured":"Hron K, Brito P, Filzmoser P (2017) Exploratory data analysis for interval compositional data. Adv Data Anal Class 11:223\u2013241","journal-title":"Adv Data Anal Class"},{"key":"520_CR20","doi-asserted-by":"publisher","first-page":"184","DOI":"10.1002\/sam.10111","volume":"4","author":"M Ichino","year":"2011","unstructured":"Ichino M (2011) The quantile method for symbolic principal component analysis. Stat Anal Data Min 4:184\u2013198","journal-title":"Stat Anal Data Min"},{"key":"520_CR21","doi-asserted-by":"crossref","unstructured":"Ioannidis Y (2003) The history of histograms (abridged). In: Freytag JC, Lockemann P, Abiteboul S, Carey M, Selinger P, Heuer A (eds) Proceedings of the VLDB conferences. 
Morgan Kaufmann, pp 19\u201330","DOI":"10.1016\/B978-012722442-8\/50011-2"},{"key":"520_CR22","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/s11634-015-0197-7","volume":"9","author":"A Irpino","year":"2015","unstructured":"Irpino A, Verde R (2015) Linear regression for numeric symbolic variables: a least squares approach based on Wasserstein distance. Adv Data Anal Classif 9:81\u2013106","journal-title":"Adv Data Anal Classif"},{"key":"520_CR23","doi-asserted-by":"publisher","first-page":"668","DOI":"10.1080\/01621459.2018.1429274","volume":"114","author":"MI Jordan","year":"2019","unstructured":"Jordan MI, Lee JD, Yang Y (2019) Communication-efficient distributed statistical inference. J Am Stat Assoc 114:668\u2013681","journal-title":"J Am Stat Assoc"},{"key":"520_CR24","first-page":"1","volume":"11","author":"K Kosmelj","year":"2014","unstructured":"Kosmelj K, Le-Rademacher J, Billard L (2014) Symbolic covariance matrix for interval-valued variables and its application to principal component analysis: a case study. Metod Zvezki 11:1\u201320","journal-title":"Metod Zvezki"},{"key":"520_CR25","doi-asserted-by":"publisher","first-page":"1593","DOI":"10.1016\/j.jspi.2010.11.016","volume":"141","author":"J Le-Rademacher","year":"2011","unstructured":"Le-Rademacher J, Billard L (2011) Likelihood functions and some maximum likelihood estimators for symbolic data. J Stat Plan Inference 141:1593\u20131602","journal-title":"J Stat Plan Inference"},{"key":"520_CR26","unstructured":"Le-Rademacher J, Billard L (2013) Principal component analysis for histogram-valued data. Advances in data analysis and classification, pp 1\u201325"},{"key":"520_CR27","doi-asserted-by":"publisher","first-page":"e05617","DOI":"10.1111\/ecog.05617","volume":"2022","author":"H Lin","year":"2022","unstructured":"Lin H, Caley MJ, Sisson SA (2022) Estimating global species richness using symbolic data meta-analysis. 
Ecography 2022:e05617","journal-title":"Ecography"},{"key":"520_CR28","doi-asserted-by":"publisher","first-page":"694","DOI":"10.1016\/j.csda.2015.07.008","volume":"100","author":"W Lin","year":"2016","unstructured":"Lin W, Gonz\u00e1lez-Rivera G (2016) Interval-valued time series models: estimation based on order statistics exploring the Agriculture Marketing Service data. Comput Stat Data Anal 100:694\u2013711","journal-title":"Comput Stat Data Anal"},{"key":"520_CR29","doi-asserted-by":"publisher","first-page":"1785","DOI":"10.1177\/0962280216669183","volume":"27","author":"D Luo","year":"2018","unstructured":"Luo D, Wan X, Liu J, Tong T (2018) Optimally estimating the sample mean from the sample size, median, mid-range, and\/or mid-quartile range. Stat Methods Med Res 27:1785\u20131805","journal-title":"Stat Methods Med Res"},{"key":"520_CR30","doi-asserted-by":"publisher","first-page":"571","DOI":"10.2307\/2531869","volume":"44","author":"GJ McLachlan","year":"1988","unstructured":"McLachlan GJ, Jones PN (1988) Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics 44:571\u2013578","journal-title":"Biometrics"},{"key":"520_CR31","doi-asserted-by":"crossref","unstructured":"Mousavi H, Zaniolo C (2011) Fast and accurate computation of equi-depth histograms over data streams. In: Proceedings of the 14th international conference on extending database technology, pp 69\u201380","DOI":"10.1145\/1951365.1951376"},{"key":"520_CR32","doi-asserted-by":"publisher","first-page":"1727","DOI":"10.1080\/00949655.2010.500470","volume":"81","author":"EAL Neto","year":"2011","unstructured":"Neto EAL, Cordeiro GM, de Carvalho FAT (2011) Bivariate symbolic regression models for interval-valued variables. 
J Stat Comput Simul 81:1727\u20131744","journal-title":"J Stat Comput Simul"},{"key":"520_CR33","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1002\/sam.10112","volume":"4","author":"M Noirhomme-Fraiture","year":"2011","unstructured":"Noirhomme-Fraiture M, Brito P (2011) Far beyond the classical data models: symbolic data analysis. Stat Anal Data Min 4:157\u2013170","journal-title":"Stat Anal Data Min"},{"key":"520_CR34","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1080\/10618600.2017.1307117","volume":"27","author":"M Quiroz","year":"2018","unstructured":"Quiroz M, Tran MN, Villani M, Kohn R (2018) Speeding up MCMC by delayed acceptance and data subsampling. J Comput Graph Stat 27:12\u201322","journal-title":"J Comput Graph Stat"},{"issue":"526","key":"520_CR35","doi-asserted-by":"publisher","first-page":"831","DOI":"10.1080\/01621459.2018.1448827","volume":"114","author":"M Quiroz","year":"2019","unstructured":"Quiroz M, Kohn R, Villani M, Tran MN (2019) Speeding up MCMC by efficient data subsampling. J Am Stat Assoc 114(526):831\u2013843","journal-title":"J Am Stat Assoc"},{"key":"520_CR36","doi-asserted-by":"publisher","first-page":"571","DOI":"10.1109\/TSIPN.2022.3188457","volume":"8","author":"P Rahman","year":"2022","unstructured":"Rahman P, Beranger B, Sisson S, Roughan M (2022) Likelihood-based inference for modelling packet transit from thinned flow summaries. IEEE Trans Signal Inf Process Netw 8:571\u2013583. https:\/\/doi.org\/10.1109\/TSIPN.2022.3188457","journal-title":"IEEE Trans Signal Inf Process Netw"},{"key":"520_CR37","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11222-019-09855-3","volume":"30","author":"LJ Rendell","year":"2020","unstructured":"Rendell LJ, Johansen AM, Lee A, Whiteley N (2020) Global consensus Monte Carlo. 
J Comput Graph Stat 30:1\u201329","journal-title":"J Comput Graph Stat"},{"key":"520_CR38","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1016\/j.csda.2016.05.009","volume":"103","author":"GS Rodrigues","year":"2016","unstructured":"Rodrigues GS, Nott DJ, Sisson SA (2016) Functional regression approximate Bayesian computation for Gaussian process density estimation. Comput Stat Data Anal 103:229\u2013241","journal-title":"Comput Stat Data Anal"},{"key":"520_CR39","doi-asserted-by":"publisher","first-page":"377","DOI":"10.3102\/10769986006004377","volume":"6","author":"DB Rubin","year":"1981","unstructured":"Rubin DB (1981) Estimation in parallel randomised experiments. J Educ Stat 6:377\u2013401","journal-title":"J Educ Stat"},{"key":"520_CR40","unstructured":"Schweizer B (1984) Distributions are the numbers of the future. In: Proceedings of the mathematics of fuzzy systems, pp 137\u2013149"},{"key":"520_CR41","unstructured":"Shi J, Luo D, Weng H, Zeng XT, Lin L, Tong T (2018) How to estimate the sample mean and standard deviation from the five number summary? arXiv:1801.01267"},{"key":"520_CR42","volume-title":"Handbook of approximate bayesian computation","year":"2018","unstructured":"Sisson SA, Fan Y, Beaumont MA (eds) (2018) Handbook of approximate bayesian computation. Chapman & Hall, Boca Raton"},{"key":"520_CR43","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1109\/TIM.2004.838912","volume":"54","author":"SB Vardeman","year":"2005","unstructured":"Vardeman SB, Lee CS (2005) Likelihood-based statistical estimation from quantised data. IEEE Trans Instrum Meas 54:409\u2013414","journal-title":"IEEE Trans Instrum Meas"},{"issue":"6","key":"520_CR44","doi-asserted-by":"publisher","first-page":"1648","DOI":"10.1109\/TSP.2019.2894825","volume":"67","author":"M Vono","year":"2019","unstructured":"Vono M, Dobigeon N, Chainais P (2019) Split-and-augmented Gibbs sampler\u2014application to large-scale inference problems. 
IEEE Trans Signal Process 67(6):1648\u20131661","journal-title":"IEEE Trans Signal Process"},{"key":"520_CR45","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1186\/1471-2288-14-135","volume":"14","author":"X Wan","year":"2014","unstructured":"Wan X, Wang W, Liu J, Tong T (2014) Estimating the sample mean and standard deviation from the sample size, median, range and\/or interquartile range. BMC Med Res Methodol 14:135","journal-title":"BMC Med Res Methodol"},{"key":"520_CR46","doi-asserted-by":"publisher","first-page":"1459","DOI":"10.1007\/s11222-020-09955-5","volume":"30","author":"T Whitaker","year":"2020","unstructured":"Whitaker T, Beranger B, Sisson SA (2020) Composite likelihood methods for histogram-valued random variables. Stat Comput 30:1459\u20131477","journal-title":"Stat Comput"},{"key":"520_CR47","doi-asserted-by":"publisher","first-page":"1049","DOI":"10.1080\/10618600.2021.1895816","volume":"30","author":"T Whitaker","year":"2021","unstructured":"Whitaker T, Beranger B, Sisson SA (2021) Logistic regression models for aggregated data. J Comput Graph Stat 30:1049\u20131067","journal-title":"J Comput Graph Stat"},{"issue":"1","key":"520_CR48","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1111\/sjos.12395","volume":"47","author":"X Zhang","year":"2020","unstructured":"Zhang X, Beranger B, Sisson SA (2020) Constructing likelihood functions for interval-valued random variables. 
Scand J Stat 47(1):1\u201335","journal-title":"Scand J Stat"}],"container-title":["Advances in Data Analysis and Classification"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-022-00520-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11634-022-00520-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11634-022-00520-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,7]],"date-time":"2023-08-07T17:25:51Z","timestamp":1691429151000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11634-022-00520-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,19]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["520"],"URL":"https:\/\/doi.org\/10.1007\/s11634-022-00520-8","relation":{},"ISSN":["1862-5347","1862-5355"],"issn-type":[{"value":"1862-5347","type":"print"},{"value":"1862-5355","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,19]]},"assertion":[{"value":"7 April 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 August 2022","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 August 2022","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 September 2022","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}