{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T03:02:15Z","timestamp":1760151735384,"version":"build-2065373602"},"reference-count":26,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2022,4,14]],"date-time":"2022-04-14T00:00:00Z","timestamp":1649894400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Union","doi-asserted-by":"publisher","award":["825258."],"award-info":[{"award-number":["825258."]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>This paper introduces the Hopsworks platform to the entire Earth Observation (EO) data community and the Copernicus programme. Hopsworks is a scalable data-intensive open-source Artificial Intelligence (AI) platform that was jointly developed by Logical Clocks and the KTH Royal Institute of Technology for building end-to-end Machine Learning (ML)\/Deep Learning (DL) pipelines for EO data. It provides the full stack of services needed to manage the entire life cycle of data in ML. In particular, Hopsworks supports the development of horizontally scalable DL applications in notebooks and the operation of workflows to support those applications, including parallel data processing, model training, and model deployment at scale. To the best of our knowledge, this is the first work that demonstrates the services and features of the Hopsworks platform, which provide users with the means to build scalable end-to-end ML\/DL pipelines for EO data, as well as support for the discovery and search for EO metadata. This paper serves as a demonstration and walkthrough of the stages of building a production-level model that includes data ingestion, data preparation, feature extraction, model training, model serving, and monitoring. To this end, we provide a practical example that demonstrates the aforementioned stages with real-world EO data and includes source code that implements the functionality of the platform. We also perform an experimental evaluation of two frameworks built on top of Hopsworks, namely Maggy and AutoAblation. We show that using Maggy for hyperparameter tuning results in roughly half the wall-clock time required to execute the same number of hyperparameter tuning trials using Spark while providing linear scalability as more workers are added. Furthermore, we demonstrate how AutoAblation facilitates the definition of ablation studies and enables the asynchronous parallel execution of ablation trials.<\/jats:p>","DOI":"10.3390\/rs14081889","type":"journal-article","created":{"date-parts":[[2022,4,19]],"date-time":"2022-04-19T02:39:31Z","timestamp":1650335971000},"page":"1889","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2014-2749","authenticated-orcid":false,"given":"Desta Haileselassie","family":"Hagos","sequence":"first","affiliation":[{"name":"Division of Software and Computer Systems, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden"}]},{"given":"Theofilos","family":"Kakantousis","sequence":"additional","affiliation":[{"name":"Logical Clocks AB, 118 72 Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7236-4637","authenticated-orcid":false,"given":"Sina","family":"Sheikholeslami","sequence":"additional","affiliation":[{"name":"Division of Software and Computer Systems, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0422-6560","authenticated-orcid":false,"given":"Tianze","family":"Wang","sequence":"additional","affiliation":[{"name":"Division of Software and Computer Systems, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6779-7435","authenticated-orcid":false,"given":"Vladimir","family":"Vlassov","sequence":"additional","affiliation":[{"name":"Division of Software and Computer Systems, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2748-8929","authenticated-orcid":false,"given":"Amir Hossein","family":"Payberah","sequence":"additional","affiliation":[{"name":"Division of Software and Computer Systems, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden"}]},{"given":"Moritz","family":"Meister","sequence":"additional","affiliation":[{"name":"Logical Clocks AB, 118 72 Stockholm, Sweden"}]},{"given":"Robin","family":"Andersson","sequence":"additional","affiliation":[{"name":"Logical Clocks AB, 118 72 Stockholm, Sweden"}]},{"given":"Jim","family":"Dowling","sequence":"additional","affiliation":[{"name":"Division of Software and Computer Systems, KTH Royal Institute of Technology, 100 44 Stockholm, Sweden"},{"name":"Logical Clocks AB, 118 72 Stockholm, Sweden"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,14]]},"reference":[{"key":"ref_1","unstructured":"Hagos, D.H., Kakantousis, T., Vlassov, V., Sheikholeslami, S., Wang, T., Dowling, J., Fleming, A., Cziferszky, A., Muerth, M., and Appel, F. (2021, January 18\u201320). The ExtremeEarth Software Architecture for Copernicus Earth Observation Data. Proceedings of the 2021 Conference on Big Data from Space, Publications Office of the European Union, Virtual Event."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"9038","DOI":"10.1109\/JSTARS.2021.3107982","article-title":"ExtremeEarth Meets Satellite Data From Space","volume":"14","author":"Hagos","year":"2021","journal-title":"IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens."},{"key":"ref_3","unstructured":"Kakantousis, T., Kouzoupis, A., Buso, F., Berthou, G., Dowling, J., and Haridi, S. (April, January 31). Horizontally Scalable ML Pipelines with a Feature Store. Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Meister, M., Sheikholeslami, S., Payberah, A.H., Vlassov, V., and Dowling, J. (2020, January 1). Maggy: Scalable Asynchronous Parallel Hyperparameter Search. Proceedings of the 1st Workshop on Distributed Machine Learning, Barcelona, Spain.","DOI":"10.1145\/3426745.3431338"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ismail, M., Gebremeskel, E., Kakantousis, T., Berthou, G., and Dowling, J. (2017, January 5\u20138). Hopsworks: Improving User Experience and Development on Hadoop with Scalable, Strongly Consistent Metadata. Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA.","DOI":"10.1109\/ICDCS.2017.41"},{"key":"ref_6","unstructured":"Niazi, S., Ismail, M., Haridi, S., Dowling, J., Grohsschmiedt, S., and Ronstr\u00f6m, M. (March, January 27). HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST 17), Santa Clara, CA, USA."},{"key":"ref_7","unstructured":"Andersson, R. (2017). GPU integration for Deep Learning on YARN. [Master\u2019s Thesis, KTH Royal Institute of Technology]."},{"key":"ref_8","unstructured":"Robbie, G., Owen, C., and Yevgeni, L. (2022, April 09). Introducing Petastorm: Uber ATG\u2019s Data Access Library for Deep Learning. Available online: https:\/\/eng.uber.com\/petastorm\/."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.visinf.2017.01.006","article-title":"Towards better analysis of machine learning models: A visual analytics perspective","volume":"1","author":"Liu","year":"2017","journal-title":"Vis. Inform."},{"key":"ref_10","unstructured":"Garg, N. (2013). Apache Kafka, Packt Publishing Ltd."},{"key":"ref_11","unstructured":"De la R\u00faa Mart\u00ednez, J. (2020). Scalable Architecture for Automating Machine Learning Model Monitoring. [Master\u2019s Thesis, KTH Royal Institute of Technology]."},{"key":"ref_12","first-page":"26","article-title":"Hyperparameter optimization for machine learning models based on Bayesian optimization","volume":"17","author":"Wu","year":"2019","journal-title":"J. Electron. Sci. Technol."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bergstra, J., Yamins, D., and Cox, D.D. (2013, January 24\u201329). Hyperopt: A python library for optimizing the hyperparameters of machine learning algorithms. Proceedings of the 12th Python in Science Conference, Austin, TX, USA.","DOI":"10.25080\/Majora-8b375195-003"},{"key":"ref_14","unstructured":"Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Hardt, M., Recht, B., and Talwalkar, A. (2018). A system for massively parallel hyperparameter tuning. arXiv."},{"key":"ref_15","unstructured":"Meister, M., Sheikholeslami, S., Andersson, R., Ormenisan, A.A., and Dowling, J. (2020, January 2\u20134). Towards Distribution Transparency for Supervised ML with Oblivious Training Functions. Proceedings of the Workshop on MLOps Systems, Austin, TX, USA."},{"key":"ref_16","unstructured":"Bergstra, J., Yamins, D., and Cox, D. (2013, January 17\u201319). Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_17","unstructured":"Ginsbourger, D., Janusevskis, J., and Le Riche, R. (2011). Dealing with Asynchronicity in Parallel Gaussian Process Based Global Optimization. [Ph.D. Thesis, Mines Saint-Etienne]."},{"key":"ref_18","first-page":"6765","article-title":"Hyperband: A novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Prechelt, L. (1998). Early stopping-but when?. Neural Networks: Tricks of the Trade, Springer.","DOI":"10.1007\/3-540-49430-8_3"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Sheikholeslami, S., Meister, M., Wang, T., Payberah, A.H., Vlassov, V., and Dowling, J. (2021, January 26). AutoAblation: Automated Parallel Ablation Studies for Deep Learning. Proceedings of the 1st Workshop on Machine Learning and Systems, Online.","DOI":"10.1145\/3437984.3458834"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016, January 27\u201330). Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.308"},{"key":"ref_22","unstructured":"Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv."},{"key":"ref_23","first-page":"281","article-title":"Random search for hyper-parameter optimization","volume":"13","author":"Bergstra","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_24","unstructured":"LeCun, Y. (2022, April 09). The MNIST Database of Handwritten Digits. Available online: http:\/\/yann.lecun.com\/exdb\/mnist\/."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1002\/gdj3.73","article-title":"A labelled ocean SAR imagery dataset of ten geophysical phenomena from Sentinel-1 wave mode","volume":"6","author":"Wang","year":"2019","journal-title":"Geosci. Data J."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L., Li, K., and Fei-Fei, L. (2009, January 20\u201325). Imagenet: A Large-Scale Hierarchical Image Database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/8\/1889\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:54:11Z","timestamp":1760136851000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/8\/1889"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,14]]},"references-count":26,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["rs14081889"],"URL":"https:\/\/doi.org\/10.3390\/rs14081889","relation":{},"ISSN":["2072-4292"],"issn-type":[{"type":"electronic","value":"2072-4292"}],"subject":[],"published":{"date-parts":[[2022,4,14]]}}}