{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T08:28:53Z","timestamp":1761553733487,"version":"build-2065373602"},"reference-count":72,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T00:00:00Z","timestamp":1761264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"JSPS KAKENHI","award":["JP24K14859"],"award-info":[{"award-number":["JP24K14859"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Forecasting high-volume, univariate, and multivariate longitudinal data streams is a critical challenge in Big Data systems, especially with constrained computational resources and pronounced data variability. However, existing approaches often neglect multivariate statistical complexity (e.g., covariance, skewness, kurtosis) of multivariate time series or rely on recency-only windowing that discards informative historical fluctuation patterns, limiting robustness under strict resource budgets. This work makes two core contributions to big data forecasting. First, we establish a formal, multi-dimensional framework for quantifying \u201cdata bigness\u201d across statistical, computational, and algorithmic complexities, providing a rigorous foundation for analyzing resource-constrained problems. Second, guided by this framework, we extend and validate the Adaptive High-Fluctuation Recursive Segmentation (AHFRS) algorithm for multivariate time series. By incorporating higher-order statistics such as covariance, skewness, and kurtosis, AHFRS improves predictive accuracy under strict computational budgets. We validate the approach in two stages. 
First, a real-world case study on a univariate Bitcoin time series provides a practical stress test using a Long Short-Term Memory (LSTM) network as a robust baseline. This validation reveals a significant increase in forecasting robustness, with our method reducing the Root Mean Squared Error (RMSE) by more than 76% in a challenging scenario. Second, its generalizability is established on synthetic multivariate data sets in Finance, Retail, and Healthcare using standard statistical models. Across domains, AHFRS consistently outperforms baselines; RMSE decreases by up to 62.5% in the multivariate Finance simulation, and Mean Absolute Percentage Error (MAPE) drops by more than 10 percentage points in Healthcare. These results demonstrate that the proposed framework and AHFRS advance the theoretical modeling of data complexity and the design of adaptive, resource-efficient forecasting pipelines for real-world, high-volume data ecosystems.<\/jats:p>","DOI":"10.3390\/bdcc9110268","type":"journal-article","created":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T07:31:58Z","timestamp":1761550318000},"page":"268","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Adaptive Segmentation and Statistical Analysis for Multivariate Big Data Forecasting"],"prefix":"10.3390","volume":"9","author":[{"given":"Desmond","family":"Fomo","sequence":"first","affiliation":[{"name":"Graduate School of Data Science, Yokohama City University, Kanazawa-Hakkei Campus, 22-2 Seto, Kanazawa Ward, Yokohama 236-0027, Kanagawa, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7410-6324","authenticated-orcid":false,"given":"Aki-Hiro","family":"Sato","sequence":"additional","affiliation":[{"name":"Graduate School of Data Science, Yokohama City University, Kanazawa-Hakkei Campus, 22-2 Seto, Kanazawa Ward, Yokohama 236-0027, Kanagawa, 
Japan"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1108\/LR-06-2015-0061","article-title":"A Formal Definition of Big Data Based on its Essential Features","volume":"65","author":"Greco","year":"2016","journal-title":"Libr. Rev."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Ajah, I.A., and Nweke, H.F. (2019). Big Data and Business Analytics: Trends, Platforms, Success Factors and Applications. Big Data Cogn. Comput., 3.","DOI":"10.3390\/bdcc3020032"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1016\/j.bushor.2017.01.004","article-title":"Big Data: Dimensions, evolution, impacts, and challenges","volume":"60","author":"Lee","year":"2017","journal-title":"Bus. Horiz."},{"key":"ref_4","unstructured":"Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety, META Group, Inc."},{"key":"ref_5","first-page":"75","article-title":"Mean Vector and Covariance Matrix Estimation for Big Data","volume":"3","author":"Wang","year":"2017","journal-title":"IEEE Trans. Big Data"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1093\/biomet\/57.3.519","article-title":"Measures of Multivariate Skewness and Kurtosis with Applications","volume":"57","author":"Mardia","year":"1970","journal-title":"Biometrika"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Fomo, D., and Sato, A.-H. (2024, January 8\u201310). High Fluctuation Based Recursive Segmentation for Big Data. Proceedings of the 2024 9th International Conference on Big Data Analytics (ICBDA), Tokyo, Japan.","DOI":"10.1109\/ICBDA61153.2024.10607308"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"De Mauro, A., Greco, M., and Grimaldi, M. (2014, January 1\u20134). What is Big Data? A Consensual Definition and a Review of Key Research Topics. 
Proceedings of the 4th International Conference on Integrated Information, Madrid, Spain.","DOI":"10.1063\/1.4907823"},{"key":"ref_9","unstructured":"(2024). Editorial: Rethinking Big Data: From 3Vs to Operational Complexity. Front. Big Data, 7."},{"key":"ref_10","first-page":"137","article-title":"Beyond the hype: Big Data concepts, methods, and analytics","volume":"35","author":"Gandomi","year":"2015","journal-title":"Int. J. Inf. Manag."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1038\/s41591-019-0414-6","article-title":"A Longitudinal Big Data Approach for Precision Health","volume":"25","author":"Contrepois","year":"2019","journal-title":"Nat. Med."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1186\/s40537-020-00329-2","article-title":"Predictive Big Data analytics for supply chain demand forecasting: Methods, applications, and research opportunities","volume":"7","author":"Seyedan","year":"2020","journal-title":"J. Big Data"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1175\/1520-0477(1998)079<0061:APGTWA>2.0.CO;2","article-title":"A Practical Guide to Wavelet Analysis","volume":"79","author":"Torrence","year":"1998","journal-title":"Bull. Am. Meteorol. Soc."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/0304-4076(86)90063-1","article-title":"Generalized Autoregressive Conditional Heteroskedasticity","volume":"31","author":"Bollerslev","year":"1986","journal-title":"J. Econom."},{"key":"ref_15","unstructured":"Bhandari, A., and Rahman, S. (2021). Big Data in Financial Markets: Algorithms, Analytics, and Applications, Springer Nature."},{"key":"ref_16","first-page":"1","article-title":"A Review Paper on Big Data and Hadoop","volume":"4","author":"Bhosale","year":"2014","journal-title":"Int. J. Sci. Res. Publ."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Fomo, D., and Sato, A.-H. (2024, January 15\u201318). 
Enhancing Big Data Analysis: A Recursive Window Segmentation Strategy for Multivariate Longitudinal Data. Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA.","DOI":"10.1109\/BigData62323.2024.10825192"},{"key":"ref_18","unstructured":"Jolliffe, I.T. (2002). Principal Component Analysis, Springer. [2nd ed.]."},{"key":"ref_19","first-page":"77","article-title":"Portfolio selection","volume":"7","author":"Markowitz","year":"1952","journal-title":"J. Financ."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.is.2014.07.006","article-title":"The rise of Big Data on cloud computing: Review and open research issues","volume":"47","author":"Hashem","year":"2015","journal-title":"Inf. Syst."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hirsa, A. (2016). Computational Methods in Finance, CRC Press. [2nd ed.].","DOI":"10.1201\/b12755"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/S1544-6123(03)00003-5","article-title":"On more robust estimation of skewness and kurtosis: Simulation and application to the S&P 500 index","volume":"1","author":"Kim","year":"2004","journal-title":"Financ. Res. Lett."},{"key":"ref_23","unstructured":"Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2009). Introduction to Algorithms, MIT Press. [3rd ed.]."},{"key":"ref_24","unstructured":"Garey, M.R., and Johnson, D.S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman."},{"key":"ref_25","unstructured":"Sipser, M. (2012). Introduction to the Theory of Computation, Cengage Learning. [3rd ed.]."},{"key":"ref_26","first-page":"1233","article-title":"Computational complexity of analyzing credit risk","volume":"20","author":"Bienstock","year":"1996","journal-title":"J. Bank. Financ."},{"key":"ref_27","unstructured":"Sabbirul, H. (2023). Retail Demand Forecasting: A Comparative Study for Multivariate Time Series. 
arXiv."},{"key":"ref_28","unstructured":"Hillier, F.S., and Lieberman, G.J. (2014). Introduction to Operations Research, McGraw-Hill. [10th ed.]."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press.","DOI":"10.1017\/CBO9780511574931"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1198\/106186004X12632","article-title":"A robust measure of skewness","volume":"13","author":"Brys","year":"2004","journal-title":"J. Comput. Graph. Stat."},{"key":"ref_31","unstructured":"Vazirani, V.V. (2001). Approximation Algorithms, Springer."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/0375-9601(95)00867-5","article-title":"A statistical measure of complexity","volume":"209","author":"Mancini","year":"1995","journal-title":"Phys. Lett. A"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1016\/S0375-9601(97)00855-4","article-title":"Measures of statistical complexity: Why?","volume":"238","author":"Feldman","year":"1998","journal-title":"Phys. Lett. A"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1103\/PhysRevLett.63.105","article-title":"Inferring statistical complexity","volume":"63","author":"Crutchfield","year":"1989","journal-title":"Phys. Rev. Lett."},{"key":"ref_35","first-page":"1","article-title":"Three approaches to the quantitative definition of information","volume":"1","author":"Kolmogorov","year":"1965","journal-title":"Probl. Inf. Transm."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1109\/TIT.1976.1055501","article-title":"On the complexity of finite sequences","volume":"22","author":"Lempel","year":"1976","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"ref_37","first-page":"1","article-title":"The Statistical Complexity of Interactive Decision Making","volume":"24","author":"Foster","year":"2023","journal-title":"J. Mach. Learn. Res."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"5033","DOI":"10.1073\/pnas.91.11.5033","article-title":"A measure for brain complexity: Relating functional segregation and integration in the nervous system","volume":"91","author":"Tononi","year":"1994","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_39","unstructured":"Tableau (2023, January 31). Big Data Analytics: What it is, How it Works, Benefits, and Challenges. Available online: https:\/\/www.tableau.com\/learn\/articles\/big-data-analytics."},{"key":"ref_40","unstructured":"Simplilearn (2023, July 17). Challenges of Big Data: Basic Concepts, Case Study, and More. Available online: https:\/\/www.simplilearn.com\/challenges-of-big-data-article."},{"key":"ref_41","unstructured":"GeeksforGeeks (2023, July 17). Big Challenges with Big Data. Available online: https:\/\/www.geeksforgeeks.org\/big-challenges-with-big-data\/."},{"key":"ref_42","first-page":"13","article-title":"Exploring the Intersection of Machine Learning and Big Data: A Survey","volume":"7","author":"Hasan","year":"2024","journal-title":"Sensors"},{"key":"ref_43","unstructured":"ADA Asia (2024, January 19). Big Data Analytics: Challenges and Opportunities. Available online: https:\/\/www.adaglobal.com\/resources\/insights\/big-data-analytics-challenges-and-opportunities."},{"key":"ref_44","unstructured":"Datamation (2024, January 31). Top 7 Challenges of Big Data and Solutions. Available online: https:\/\/www.datamation.com\/big-data\/big-data-challenges\/."},{"key":"ref_45","unstructured":"Yusuf, I., Adams, C., and Abdullah, N.A. (2024, January 19\u201320). Current Challenges of Big Data Quality Management in Big Data Governance: A Literature Review. 
Proceedings of the Future Technologies Conference (FTC) 2024, Vancouver, BC, Canada."},{"key":"ref_46","first-page":"1","article-title":"Big Data Analytics: Challenges, Tools","volume":"3","author":"Kumar","year":"2015","journal-title":"Int. J. Innov. Res. Comput. Sci. Technol."},{"key":"ref_47","first-page":"421","article-title":"A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools","volume":"7","author":"Rathore","year":"2016","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_48","first-page":"23","article-title":"Operational NoSQL Systems: What\u2019s New and What\u2019s Next?","volume":"49","author":"Cattell","year":"2016","journal-title":"Computer"},{"key":"ref_49","unstructured":"3Pillar Global (2024, January 19). Current Issues and Challenges in Big Data Analytics. Available online: https:\/\/www.3pillarglobal.com\/insights\/current-issues-and-challenges-in-big-data-analytics\/."},{"key":"ref_50","first-page":"1","article-title":"A Challenging Tool for Research Questions in Big Data Analytics","volume":"3","author":"Sharma","year":"2022","journal-title":"Int. J. Res. Publ. Semin."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Bifet, A., and Gavald\u00e0, R. (2007, January 26\u201328). Learning from Time-Changing Data with Adaptive Windowing. Proceedings of the SIAM International Conference on Data Mining, Minneapolis, MN, USA.","DOI":"10.1137\/1.9781611972771.42"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1198\/073500104000000271","article-title":"Tests for Skewness, Kurtosis, and Normality for Time Series Data","volume":"23","author":"Bai","year":"2005","journal-title":"J. Bus. Econ. Stat."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Sato, A.-H. (2014). Segmentation Study of Foreign Exchange Market. 
Applied Data-Centric Social Sciences, Springer.","DOI":"10.1007\/978-4-431-54974-1"},{"key":"ref_54","unstructured":"JMP Statistical Discovery LLC (2024, July 24). Statistical Details for Change Point Detection. Available online: https:\/\/www.jmp.com\/support\/help\/en\/17.2\/index.shtml#page\/jmp\/change-point-detection.shtml."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1007\/s10115-016-0987-z","article-title":"A Survey of Methods for Time Series Change Point Detection","volume":"51","author":"Aminikhanghahi","year":"2017","journal-title":"Knowl. Inf. Syst."},{"key":"ref_56","unstructured":"Jordon, J., Szpruch, L., Horel, F., and Wiese, M. (2022). Synthetic Data\u2014What, Why and How?, The Alan Turing Institute."},{"key":"ref_57","unstructured":"Shadish, W.R., Cook, T.D., and Campbell, D.T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin."},{"key":"ref_58","unstructured":"U.S. Bureau of Labor Statistics (2025, March 23). Employment Cost Index Historical Data, Available online: https:\/\/www.bls.gov\/ncs\/ect\/."},{"key":"ref_59","unstructured":"Board of Governors of the Federal Reserve System (2025, March 23). Consumer Credit\u2014G.19. Monthly Statistical Release, Available online: https:\/\/www.federalreserve.gov\/releases\/g19\/."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1509\/jmkr.2005.42.4.415","article-title":"RFM and CLV: Using Iso-Value Curves for Customer Base Analysis","volume":"42","author":"Fader","year":"2005","journal-title":"J. Mark. Res."},{"key":"ref_61","unstructured":"Centers for Disease Control and Prevention (2025, March 19). About Adult BMI. 
Healthy Weight, Nutrition, and Physical Activity, Available online: https:\/\/www.cdc.gov\/bmi\/adult-calculator\/bmi-categories.html."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"e127","DOI":"10.1016\/j.jacc.2017.11.006","article-title":"2017 ACC\/AHA\/AAPA\/ABC\/ACPM\/AGS\/APhA\/ASH\/ASPC\/NMA\/PCNA Guideline for the Prevention, Detection, Evaluation, and Management of High Blood Pressure in Adults","volume":"71","author":"Whelton","year":"2018","journal-title":"J. Am. Coll. Cardiol."},{"key":"ref_63","unstructured":"Xu, L., Skoularidou, M., Cuesta-Infante, A., and Veeramachaneni, K. (2019, January 8\u201314). Modeling Tabular Data using Conditional GAN. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada."},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"1449","DOI":"10.1080\/14697688.2019.1622295","article-title":"Universal features of price formation in financial markets","volume":"19","author":"Sirignano","year":"2019","journal-title":"Quant. Financ."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1002\/asmb.2209","article-title":"Deep learning in finance","volume":"33","author":"Heaton","year":"2017","journal-title":"Appl. Stoch. Model. Bus. Ind."},{"key":"ref_68","first-page":"276","article-title":"Forecasting individualized disease trajectories","volume":"9","author":"Alaa","year":"2018","journal-title":"Nat. 
Commun."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1038\/s41746-018-0029-1","article-title":"Scalable and accurate deep learning with electronic health records","volume":"1","author":"Rajkomar","year":"2018","journal-title":"NPJ Digit. Med."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Liu, Q., Chen, E., Ge, Y., and Zhao, J.L. (2014, January 16\u201318). Time Series Classification Using Multi-Channels Deep Convolutional Neural Networks. Proceedings of the International Conference on Web-Age Information Management, Macau, China.","DOI":"10.1007\/978-3-319-08010-9_33"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Chu, W., and Park, S. (2009, January 20\u201324). Personalized recommendation on dynamic content. Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain.","DOI":"10.1145\/1526709.1526802"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1145\/2934664","article-title":"Apache Spark: A Unified Engine for Big Data Processing","volume":"59","author":"Zaharia","year":"2016","journal-title":"Commun. ACM"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/11\/268\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T08:13:15Z","timestamp":1761552795000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/11\/268"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,24]]},"references-count":72,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11]]}},"alternative-id":["bdcc9110268"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9110268","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,10,24]]}}}