{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T07:57:16Z","timestamp":1761983836333,"version":"build-2065373602"},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T00:00:00Z","timestamp":1761955200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","award":["Destination Earth"],"award-info":[{"award-number":["Destination Earth"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Data extraction algorithms on data hypercubes, or datacubes, are traditionally only capable of cutting boxes of data along the datacube axes. For many use cases however, this returns much more data than users actually need, leading to an unnecessary consumption of I\/O resources. In this paper, we propose an alternative feature extraction technique, which carefully computes the indices of data points contained within user-requested shapes. This enables data storage systems to only read and return bytes useful to user applications from the datacube. Our main algorithm is based on high-dimensional computational geometry concepts and operates by successively reducing polytopes down to the points contained within them. We analyse this algorithm in detail before providing results about its performance and scalability. In particular, we show it is possible to achieve data reductions of up to 99% using this algorithm instead of current state of practice data extraction methods, such as meteorological field extractions from ECMWF\u2019s FDB data store, where feature shapes are extracted a posteriori as a post-processing step. As we discuss later on, this novel extraction method will considerably help scale access to large petabyte size data hypercubes in a variety of scientific fields.<\/jats:p>","DOI":"10.1186\/s40537-025-01306-3","type":"journal-article","created":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T07:54:49Z","timestamp":1761983689000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Polytope: an algorithm for efficient feature extraction on hypercubes"],"prefix":"10.1186","volume":"12","author":[{"given":"Mathilde","family":"Leuridan","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"James","family":"Hawkes","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Simon","family":"Smart","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emanuele","family":"Danovaro","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Martin","family":"Schultz","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tiago","family":"Quintino","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,11,1]]},"reference":[{"issue":"6","key":"1306_CR1","first-page":"1231","volume":"36","author":"I Yaqoob","year":"2016","unstructured":"Yaqoob I, Hashem IAT, Gani A, Mokhtar S, Ahmed E, Anuar NB, et al. Big data: from beginning to future. Int J Inf Manage. 2016;36(6):1231\u201347.","journal-title":"Int J Inf Manage"},{"key":"1306_CR2","unstructured":"Bauer P, Quintino T, Wedi N, Bonanni A, Chrust M, Deconinck W, Diamantakis M, D\u00fcben P, English S, Flemming J et al. The ECMWF Scalability Programme: Progress and Plans. ECMWF 2020. https:\/\/www.ecmwf.int\/node\/19380"},{"key":"1306_CR3","unstructured":"Data: a small four-letter word which has grown exponentially to such a big value. https:\/\/www.deloitte.com\/cy\/en\/Industries\/technology\/perspectives\/data-grown-big-value.html. Accessed on 3 Oct 2024"},{"key":"1306_CR4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-020-00399-2","volume":"8","author":"P Baumann","year":"2021","unstructured":"Baumann P, Misev D, Merticariu V, Huu BP. Array databases: concepts, standards, implementations. J Big Data. 2021;8:1\u201361.","journal-title":"J Big Data"},{"key":"1306_CR5","doi-asserted-by":"crossref","unstructured":"Baumann P, Misev D, Merticariu V, Huu, BP. Datacubes: Towards space\/time analysis-ready data. Service-Oriented Mapping: Changing Paradigm in Map Production and Geoinformation Management, 2019;269\u2013299","DOI":"10.1007\/978-3-319-72434-8_14"},{"key":"1306_CR6","doi-asserted-by":"crossref","unstructured":"Killough B. Overview of the open data cube initiative. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, 2018;pp. 8629\u20138632 . IEEE","DOI":"10.1109\/IGARSS.2018.8517694"},{"key":"1306_CR7","doi-asserted-by":"crossref","unstructured":"Baumann P. Datacube standards and their contribution to analysis-ready data. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, 2018;pp. 2051\u20132053 . IEEE","DOI":"10.1109\/IGARSS.2018.8518994"},{"key":"1306_CR8","unstructured":"Wang Z, Chu Y, Tan K-L, Agrawal D, Abbadi AE, Xu X. Scalable data cube analysis over big data. arXiv preprint arXiv:1311.5663 2013."},{"key":"1306_CR9","unstructured":"Higham DJ, Higham NJ. MATLAB Guide. vol. 150. SIAM 2016."},{"issue":"1","key":"1306_CR10","doi-asserted-by":"publisher","DOI":"10.5334\/jors.148","volume":"5","author":"S Hoyer","year":"2017","unstructured":"Hoyer S, Hamman J. Xarray: ND labeled arrays and datasets in Python. J Open Res Softw. 2017;5(1):10.","journal-title":"J Open Res Softw"},{"key":"1306_CR11","unstructured":"Xtensor Stack: Xtensor Documentation. In Xtensor (Version 0.24.6). Retrieved from Read the Docs: https:\/\/xtensor.readthedocs.io\/en\/latest\/. Accessed on 10 Feb 2023"},{"key":"1306_CR12","unstructured":"Melton J, Simon AR. SQL: 1999: understanding relational language components. Elsevier 2001."},{"issue":"2","key":"1306_CR13","doi-asserted-by":"publisher","first-page":"247","DOI":"10.1145\/22952.22956","volume":"12","author":"R Snodgrass","year":"1987","unstructured":"Snodgrass R. The temporal query language TQuel. ACM Trans Database Syst. 1987;12(2):247\u201398.","journal-title":"ACM Trans Database Syst"},{"key":"1306_CR14","volume-title":"PostgreSQL: up and running: a practical guide to the advanced open source database","author":"RO Obe","year":"2017","unstructured":"Obe RO, Hsu LS. PostgreSQL: up and running: a practical guide to the advanced open source database. O\u2019Reilly Media: Inc; 2017."},{"key":"1306_CR15","doi-asserted-by":"crossref","unstructured":"Baumann P, Furtado P, Ritsch R, Widmann N. The RasDaMan approach to multidimensional database management. In: Proceedings of the 1997 ACM Symposium on Applied Computing, 1997;pp. 166\u2013173","DOI":"10.1145\/331697.331732"},{"key":"1306_CR16","doi-asserted-by":"crossref","unstructured":"Fiore S, D\u2019Anca A, Elia D, Palazzo C, Williams D, Foster I, Aloisio G. Ophidia: a full software stack for scientific data analytics. In: 2014 International Conference on High Performance Computing & Simulation (HPCS), 2014;pp. 343\u2013350 . IEEE","DOI":"10.1109\/HPCSim.2014.6903706"},{"key":"1306_CR17","doi-asserted-by":"crossref","unstructured":"Gosink L, Shalf J, Stockinger K, Wu K, Bethel W. HDF5-FastQuery: Accelerating complex queries on HDF datasets using fast bitmap indices. In: 18th International Conference on Scientific and Statistical Database Management (SSDBM\u201906), 2006;pp. 149\u2013158 . IEEE","DOI":"10.1109\/SSDBM.2006.27"},{"key":"1306_CR18","doi-asserted-by":"crossref","unstructured":"Taylor KE. Ezget: A library of fortran subroutines to facilitate data retrieval. Technical report, Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States) 1996.","DOI":"10.2172\/269008"},{"key":"1306_CR19","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1023\/A:1009726021843","volume":"1","author":"J Gray","year":"1997","unstructured":"Gray J, Chaudhuri S, Bosworth A, Layman A, Reichart D, Venkatrao M, et al. Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. Data Min Knowl Discov. 1997;1:29\u201353.","journal-title":"Data Min Knowl Discov"},{"issue":"1","key":"1306_CR20","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1559\/152304004773112749","volume":"31","author":"A Frihida","year":"2004","unstructured":"Frihida A, Marceau DJ, Th\u00e9riault M. Extracting and visualizing individual space-time paths: an integration of GIS and KDD in transport demand modeling. Cartogr Geogr Inf Sci. 2004;31(1):19\u201328.","journal-title":"Cartogr Geogr Inf Sci"},{"issue":"4","key":"1306_CR21","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1007\/BF01231603","volume":"3","author":"P Baumann","year":"1994","unstructured":"Baumann P. Management of multidimensional discrete data. VLDB J. 1994;3(4):401\u201344.","journal-title":"VLDB J"},{"key":"1306_CR22","doi-asserted-by":"publisher","unstructured":"Leuridan M, Warde A, Hawkes J, Varndell J, Tsrunchev P, Figala D. ecmwf\/polytope: 1.0.21. https:\/\/doi.org\/10.5281\/zenodo.14537049 .","DOI":"10.5281\/zenodo.14537049"},{"key":"1306_CR23","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1016\/j.procs.2024.07.013","volume":"240","author":"T Geenen","year":"2024","unstructured":"Geenen T, Wedi N, Milinski S, Hadade I, Reuter B, Smart S, et al. Digital twins, the journey of an operational weather system into the heart of Destination Earth. Procedia Computer Science. 2024;240:99\u2013108.","journal-title":"Procedia Computer Science"},{"key":"1306_CR24","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1007\/BF01580381","volume":"11","author":"P Wolfe","year":"1976","unstructured":"Wolfe P. Finding the nearest point in a polytope. Math Program. 1976;11:128\u201349.","journal-title":"Math Program"},{"key":"1306_CR25","doi-asserted-by":"publisher","unstructured":"Thompson AC. Convex sets. In: Meyers, R.A. (ed.) Encyclopedia of Physical Science and Technology (Third Edition), Third edition edn., 2003;pp. 717\u2013737. Academic Press, New York. https:\/\/doi.org\/10.1016\/B0-12-227410-5\/00146-0. https:\/\/www.sciencedirect.com\/science\/article\/pii\/B0122274105001460","DOI":"10.1016\/B0-12-227410-5\/00146-0"},{"key":"1306_CR26","doi-asserted-by":"crossref","unstructured":"Chazelle B, Palios L. Decomposition algorithms in geometry. In: Algebraic Geometry and Its Applications: Collections of Papers from Shreeram S. Abhyankar\u2019s 60th Birthday Conference, 1994;pp. 419\u2013447 . Springer","DOI":"10.1007\/978-1-4612-2628-4_27"},{"key":"1306_CR27","first-page":"267","volume":"239","author":"SJ Owen","year":"1998","unstructured":"Owen SJ. A survey of unstructured mesh generation technology. IMR. 1998;239:267.","journal-title":"IMR"},{"key":"1306_CR28","doi-asserted-by":"crossref","unstructured":"Guttman A. R-trees: A dynamic index structure for spatial searching. In: Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, 1984;pp. 47\u201357","DOI":"10.1145\/602259.602266"},{"key":"1306_CR29","unstructured":"Obe R, Hsu L. PostGIS in action. Simon and Schuster 2021."},{"issue":"2","key":"1306_CR30","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1109\/TVCG.2003.1196005","volume":"9","author":"C Stolte","year":"2003","unstructured":"Stolte C, Tang D, Hanrahan P. Multiscale visualization using data cubes. IEEE Trans Vis Comput Graph. 2003;9(2):176\u201387.","journal-title":"IEEE Trans Vis Comput Graph"},{"issue":"1","key":"1306_CR31","doi-asserted-by":"publisher","first-page":"63","DOI":"10.3138\/cart.54.1.2018-0017","volume":"54","author":"MB Purss","year":"2019","unstructured":"Purss MB, Peterson PR, Strobl P, Dow C, Sabeur ZA, Gibb RG, et al. Datacubes: a discrete global grid systems perspective. Cartographica. 2019;54(1):63\u201371.","journal-title":"Cartographica"},{"key":"1306_CR32","unstructured":"Harzheim E. Ordered sets. vol. 7. Springer Science & Business Media 2005."},{"key":"1306_CR33","doi-asserted-by":"crossref","unstructured":"Otoo EJ, Wang H, Nimako G. Multidimensional Sparse Array Storage for Data Analytics. In: 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC\/SmartCity\/DSS), 2016;pp. 1520\u20131529 . IEEE","DOI":"10.1109\/HPCC-SmartCity-DSS.2016.0216"},{"key":"1306_CR34","unstructured":"Makris A, Tserpes K, Spiliopoulos G, Anagnostopoulos D. Performance Evaluation of MongoDB and PostgreSQL for Spatio-temporal Data. In: EDBT\/ICDT Workshops 2019."},{"key":"1306_CR35","doi-asserted-by":"crossref","unstructured":"Su Y, Agrawal G. Supporting user-defined subsetting and aggregation over parallel NetCDF datasets. In: 2012 12th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), 2012;pp. 212\u2013219 . IEEE","DOI":"10.1109\/CCGrid.2012.45"},{"key":"1306_CR36","doi-asserted-by":"crossref","unstructured":"Su Y, Agrawal G, Woodring J. Indexing and parallel query processing support for visualizing climate datasets. In: 2012 41st International Conference on Parallel Processing, 2012;pp. 249\u2013258 . IEEE","DOI":"10.1109\/ICPP.2012.33"},{"issue":"4","key":"1306_CR37","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1145\/235815.235821","volume":"22","author":"CB Barber","year":"1996","unstructured":"Barber CB, Dobkin DP, Huhdanpaa H. The quickhull algorithm for convex hulls. ACM Trans Math Softw. 1996;22(4):469\u201383.","journal-title":"ACM Trans Math Softw"},{"key":"1306_CR38","unstructured":"Burgoyne M, Blodgett D, Heazel C, Little C. OGC API-Environmental Data Retrieval Standard. Open Geospatial Consortium Inc., Wayland, MA, USA, OpenGIS\u00ae Implementation Specification OGC"},{"issue":"2","key":"1306_CR39","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1093\/comjnl\/16.2.157","volume":"16","author":"A Ricci","year":"1973","unstructured":"Ricci A. A constructive geometry for computer graphics. Comput J. 1973;16(2):157\u201360.","journal-title":"Comput J"},{"issue":"4","key":"1306_CR40","doi-asserted-by":"publisher","first-page":"223","DOI":"10.1016\/0010-4485(90)90051-D","volume":"22","author":"RR Martin","year":"1990","unstructured":"Martin RR, Stephenson P. Sweeping of three-dimensional objects. Computer-Aided Design. 1990;22(4):223\u201334.","journal-title":"Computer-Aided Design"},{"key":"1306_CR41","unstructured":"ECMWF: Key facts and figures. https:\/\/www.ecmwf.int\/en\/about\/media-centre\/key-facts-and-figures. Accessed on 29 Sep 2024"},{"key":"1306_CR42","unstructured":"ECMWF: MARS Catalogue. https:\/\/apps.ecmwf.int\/mars-catalogue\/. Accessed on 29 Sep 2024"},{"key":"1306_CR43","volume-title":"Destination Earth: Digital Twins of the Earth System","author":"N Wedi","year":"2022","unstructured":"Wedi N, Quintino T, Modigliani U, Baousis V, Geenen T, Sandu I, et al. Destination Earth: Digital Twins of the Earth System. Copernicus Meetings: Technical report; 2022."},{"issue":"6","key":"1306_CR44","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1109\/MCSE.2023.3260519","volume":"24","author":"N Wedi","year":"2022","unstructured":"Wedi N, Bauer P, Sandu I, Hoffmann J, Sheridan S, Cereceda R, et al. Destination Earth: High-Performance Computing for Weather and Climate. Computing in Science & Engineering. 2022;24(6):29\u201337.","journal-title":"Computing in Science & Engineering"},{"key":"1306_CR45","unstructured":"Raoult B. Architecture of the new MARS server. https:\/\/www.ecmwf.int\/sites\/default\/files\/elibrary\/1997\/11839-architecture-new-mars-server.pdf. Accessed on 11 Feb 2023"},{"key":"1306_CR46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1476-069X-11-36","volume":"11","author":"Y Guo","year":"2012","unstructured":"Guo Y, Punnasiri K, Tong S. Effects of temperature on mortality in Chiang Mai city, Thailand: a time series study. Environ Health. 2012;11:1\u20139.","journal-title":"Environ Health"},{"key":"1306_CR47","doi-asserted-by":"crossref","unstructured":"Chen R, Yin P, Wang L, Liu C, Niu Y, Wang W, Jiang Y, Liu Y, Liu J, Qi J et al. Association between ambient temperature and mortality risk and burden: time series study in 272 main Chinese cities. BMJ 2018;363","DOI":"10.1136\/bmj.k4306"},{"issue":"4","key":"1306_CR48","doi-asserted-by":"publisher","first-page":"613","DOI":"10.3233\/IDA-2011-0485","volume":"15","author":"C Ordonez","year":"2011","unstructured":"Ordonez C. Data set preprocessing and transformation in a database system. Intell Data Anal. 2011;15(4):613\u201331.","journal-title":"Intell Data Anal"},{"key":"1306_CR49","unstructured":"4 ways data is improving healthcare. https:\/\/www.weforum.org\/agenda\/2019\/12\/four-ways-data-is-improving-healthcare. Accessed on 24 Apr 2023"},{"issue":"2","key":"1306_CR50","doi-asserted-by":"publisher","first-page":"655","DOI":"10.1002\/mrm.26152","volume":"77","author":"O Viessmann","year":"2017","unstructured":"Viessmann O, Li L, Benjamin P, Jezzard P. T2-weighted intracranial vessel wall imaging at 7 tesla using a DANTE-prepared variable flip angle turbo spin echo readout (DANTE-SPACE). Magn Reson Med. 2017;77(2):655\u201363.","journal-title":"Magn Reson Med"},{"issue":"5","key":"1306_CR51","doi-asserted-by":"publisher","first-page":"2112","DOI":"10.1002\/mrm.30203","volume":"92","author":"MH Buck","year":"2024","unstructured":"Buck MH, Hess AT, Jezzard P. Simulation-based optimization and experimental comparison of intracranial T2-weighted DANTE-SPACE vessel wall imaging at 3T and 7T. Magn Reson Med. 2024;92(5):2112\u201326.","journal-title":"Magn Reson Med"},{"key":"1306_CR52","doi-asserted-by":"crossref","unstructured":"Leuridan M, Bradley C, Hawkes J, Quintino T, Schultz M. Performance analysis of an efficient algorithm for feature extraction from large scale meteorological data stores. In: Proceedings of the Platform for Advanced Scientific Computing Conference, 2025;pp. 1\u20139","DOI":"10.1145\/3732775.3733573"},{"key":"1306_CR53","doi-asserted-by":"crossref","unstructured":"Smart SD, Quintino T, Raoult B. A high-performance distributed object-store for exascale numerical weather prediction and climate. In: Proceedings of the Platform for Advanced Scientific Computing Conference, 2019;pp. 1\u201311","DOI":"10.1145\/3324989.3325726"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01306-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01306-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01306-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,1]],"date-time":"2025-11-01T07:54:55Z","timestamp":1761983695000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01306-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,1]]},"references-count":53,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1306"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01306-3","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,1]]},"assertion":[{"value":"21 December 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"243"}}