{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:35:19Z","timestamp":1760146519331,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T00:00:00Z","timestamp":1731628800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2105571"],"award-info":[{"award-number":["2105571"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>With the exponential growth of data across diverse fields, applying conventional statistical methods directly to large-scale datasets has become computationally infeasible. To overcome this challenge, subsampling algorithms are widely used to perform statistical analyses on smaller, more manageable subsets of the data. The effectiveness of these methods depends on their ability to identify and select data points that improve the estimation efficiency according to some optimality criteria. While much of the existing research has focused on subsampling techniques for independent data, there is considerable potential for developing methods tailored to dependent data, particularly in time-dependent contexts. In this study, we extend subsampling techniques to irregularly spaced time series data which are modeled by irregularly spaced autoregressive models. We present frameworks for various subsampling approaches, including optimal subsampling under A-optimality, information-based optimal subdata selection, and sequential thinning on streaming data. 
These methods use A-optimality or D-optimality criteria to assess the usefulness of each data point and prioritize the inclusion of the most informative ones. We then assess the performance of these subsampling methods using numerical simulations, providing insights into their suitability and effectiveness for handling irregularly spaced long time series. Numerical results show that our algorithms have promising performance. Their estimation efficiency can be ten times as high as that of the uniform sampling estimator. They also significantly reduce the computational time and can be up to forty times faster than the full-data estimator.<\/jats:p>","DOI":"10.3390\/a17110524","type":"journal-article","created":{"date-parts":[[2024,11,15]],"date-time":"2024-11-15T04:47:17Z","timestamp":1731646037000},"page":"524","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Subsampling Algorithms for Irregularly Spaced Autoregressive Models"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-3731-0167","authenticated-orcid":false,"given":"Jiaqi","family":"Liu","sequence":"first","affiliation":[{"name":"Department of Statistics, University of Connecticut, Storrs, CT 06269, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ziyang","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Connecticut, Storrs, CT 06269, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7729-0243","authenticated-orcid":false,"given":"HaiYing","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Connecticut, Storrs, CT 06269, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2028-4771","authenticated-orcid":false,"given":"Nalini","family":"Ravishanker","sequence":"additional","affiliation":[{"name":"Department of Statistics, University of Connecticut, Storrs, CT 06269, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,11,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"A120","DOI":"10.1051\/0004-6361\/201935560","article-title":"Discrete-time autoregressive model for unequally spaced time-series observations","volume":"627","author":"Elorrieta","year":"2019","journal-title":"Astron. Astrophys."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1016\/j.earscirev.2018.12.005","article-title":"Trend analysis of climate time series: A review of methods","volume":"190","author":"Mudelsee","year":"2019","journal-title":"Earth-Sci. Rev."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s13571-022-00280-7","article-title":"Review of statistical approaches for modeling high-frequency trading data","volume":"85","author":"Dutta","year":"2023","journal-title":"Sankhya B"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1214\/10-AOAS380","article-title":"An autoregressive approach to house price modeling","volume":"5","author":"Nagaraja","year":"2011","journal-title":"Ann. Appl. Stat."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Erdogan, E., Ma, S., Beygelzimer, A., and Rish, I. (2005, January 21\u201323). Statistical models for unequally spaced time series. Proceedings of the 2005 SIAM International Conference on Data Mining. 
SIAM, Newport Beach, CA, USA.","DOI":"10.1137\/1.9781611972757.74"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"e692","DOI":"10.1002\/sta4.692","article-title":"Hierarchical modeling of irregularly spaced financial returns","volume":"13","author":"Anantharaman","year":"2024","journal-title":"Stat"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1171","DOI":"10.1214\/aos\/1176350057","article-title":"The use of subseries values for estimating the variance of a general statistic from a stationary sequence","volume":"14","author":"Carlstein","year":"1986","journal-title":"Ann. Stat."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1093\/biomet\/86.3.591","article-title":"Subsampling and model selection in time series analysis","volume":"86","author":"Fukuchi","year":"1999","journal-title":"Biometrika"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1093\/biomet\/asad021","article-title":"Scalable subsampling: Computation, aggregation and inference","volume":"111","author":"Politis","year":"2023","journal-title":"Biometrika"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Shumway, R. (2000). Time Series Analysis and Its Applications, Springer.","DOI":"10.1007\/978-1-4757-3261-0"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1017\/S1743921317000448","article-title":"An autoregressive model for irregular time series of variable stars","volume":"12","author":"Eyheramendy","year":"2016","journal-title":"Proc. Int. Astron. Union"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1105","DOI":"10.1093\/mnras\/stab1216","article-title":"A novel bivariate autoregressive model for predicting and forecasting irregularly observed time series","volume":"505","author":"Elorrieta","year":"2021","journal-title":"Mon. Not. R. Astron. 
Soc."},{"key":"ref_13","first-page":"1","article-title":"GARCH for irregularly spaced financial data: The ACD-GARCH model","volume":"2","author":"Ghysels","year":"1998","journal-title":"Stud. Nonlinear Dyn. Econom."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1016\/j.econlet.2005.07.027","article-title":"GARCH and irregularly spaced data","volume":"90","author":"Meddahi","year":"2006","journal-title":"Econ. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1127","DOI":"10.2307\/2999632","article-title":"Autoregressive conditional duration: A new model for irregularly spaced transaction data","volume":"66","author":"Engle","year":"1998","journal-title":"Econometrica"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"519","DOI":"10.3150\/07-BEJ6189","article-title":"GARCH modelling in continuous time for irregularly spaced time series data","volume":"14","author":"Maller","year":"2008","journal-title":"Bernoulli"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"920","DOI":"10.1080\/07350015.2020.1739530","article-title":"A score-driven conditional correlation model for noisy and asynchronous data: An application to high-frequency covariance dynamics","volume":"39","author":"Buccheri","year":"2021","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_18","unstructured":"Dutta, C. (2022). Modeling Multiple Irregularly Spaced High-Frequency Financial Time Series. [Ph.D. Thesis, University of Connecticut]."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Drineas, P., Mahoney, M.W., and Muthukrishnan, S. (2006, January 22\u201324). Sampling algorithms for l2 regression and applications. Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, USA.","DOI":"10.1145\/1109557.1109682"},{"key":"ref_20","unstructured":"Yang, T., Zhang, L., Jin, R., and Zhu, S. (2015, January 7\u20139). An explicit sampling dependent spectral error bound for column subset selection. 
Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_21","first-page":"861","article-title":"A statistical perspective on algorithmic leveraging","volume":"16","author":"Ma","year":"2015","journal-title":"J. Mach. Learn. Res."},{"key":"ref_22","unstructured":"Xie, R., Wang, Z., Bai, S., Ma, P., and Zhong, W. (2019, January 16\u201318). Online decentralized leverage score sampling for streaming multidimensional time series. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Okinawa, Japan."},{"key":"ref_23","first-page":"406","article-title":"Gradient-based sampling: An adaptive importance sampling for least-squares","volume":"29","author":"Zhu","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"829","DOI":"10.1080\/01621459.2017.1292914","article-title":"Optimal subsampling for large sample logistic regression","volume":"113","author":"Wang","year":"2018","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Teng, G., Tian, B., Zhang, Y., and Fu, S. (2022). Asymptotics of Subsampling for Generalized Linear Regression Models under Unbounded Design. Entropy, 25.","DOI":"10.3390\/e25010084"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"6605","DOI":"10.1109\/TIT.2022.3176955","article-title":"Sampling with replacement vs Poisson sampling: A comparative study in optimal subsampling","volume":"68","author":"Wang","year":"2022","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1080\/01621459.2017.1408468","article-title":"Information-based optimal subdata selection for big data linear regression","volume":"114","author":"Wang","year":"2019","journal-title":"J. Am. Stat. 
Assoc."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/j.jspi.2020.08.001","article-title":"Sequential online subsampling for thinning experimental designs","volume":"212","author":"Pronzato","year":"2021","journal-title":"J. Stat. Plan. Inference"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Casella, G., and Berger, R. (2024). Statistical Inference, CRC Press.","DOI":"10.1201\/9781003456285"},{"key":"ref_30","unstructured":"Kleinberg, J., and Tardos, E. (2006). Algorithm Design, Pearson\/Addison-Wesley."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wynn, H. (1982). Optimum Submeasures with Applications to Finite Population Sampling, Academic Press.","DOI":"10.1016\/B978-0-12-307502-4.50033-7"},{"key":"ref_32","unstructured":"Fedorov, V.V., and Hackl, P. (2012). Model-Oriented Design of Experiments, Springer Science & Business Media."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/17\/11\/524\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T16:32:45Z","timestamp":1760113965000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/17\/11\/524"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,15]]},"references-count":32,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2024,11]]}},"alternative-id":["a17110524"],"URL":"https:\/\/doi.org\/10.3390\/a17110524","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2024,11,15]]}}}