{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:15:33Z","timestamp":1773843333760,"version":"3.50.1"},"reference-count":69,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2020,6,3]],"date-time":"2020-06-03T00:00:00Z","timestamp":1591142400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In recent years, the application and wide adoption of Internet of Things (IoT)-based technologies have increased the proliferation of monitoring systems, which has consequently exponentially increased the amounts of heterogeneous data generated. Processing and analysing the massive amount of data produced is cumbersome and gradually moving from classical \u2018batch\u2019 processing\u2014extract, transform, load (ETL) technique to real-time processing. For instance, in environmental monitoring and management domain, time-series data and historical dataset are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicates the real-time analysis of the essential data, integration with big data platforms and reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems irrespective of the data types to Apache Kafka topics using Kafka Connect APIs for processing by the Kafka streaming processing engine. The stream processing engine executes the predictive numerical models and algorithms represented in event processing (EP) languages for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that could be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. As a conclusion of this study, a performance evaluation of the distributed stream processing middleware infrastructure is calculated to determine the real-time effectiveness of the framework.<\/jats:p>","DOI":"10.3390\/s20113166","type":"journal-article","created":{"date-parts":[[2020,6,4]],"date-time":"2020-06-04T04:36:09Z","timestamp":1591245369000},"page":"3166","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":36,"title":["A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8796-0674","authenticated-orcid":false,"given":"Adeyinka","family":"Akanbi","sequence":"first","affiliation":[{"name":"Centre for Sustainable Smart Cities, Central University of Technology, Free State 9300, South Africa"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8914-0055","authenticated-orcid":false,"given":"Muthoni","family":"Masinde","sequence":"additional","affiliation":[{"name":"Centre for Sustainable Smart Cities, Central University of Technology, Free State 9300, South Africa"}]}],"member":"1968","published-online":{"date-parts":[[2020,6,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"516","DOI":"10.1016\/j.chb.2016.04.023","article-title":"An empirical examination of consumer adoption of Internet of Things services: Network externalities and concern for information privacy perspectives","volume":"62","author":"Hsu","year":"2016","journal-title":"Comput. Hum. Behav."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s10708-013-9516-8","article-title":"The real-time city? Big data and smart urbanism","volume":"79","author":"Kitchin","year":"2014","journal-title":"GeoJournal"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1186\/s40537-019-0271-7","article-title":"A new Internet of Things architecture for real-time prediction of various diseases using machine learning on big data environment","volume":"6","author":"Maalmi","year":"2019","journal-title":"J. Big Data"},{"key":"ref_4","unstructured":"Marcu, O.C., Costan, A., Antoniu, G., P\u00e9rez-Hern\u00e1ndez, M., Tudoran, R., Bortoli, S., and Nicolae, B. (2018). Storage and Ingestion Systems in Support of Stream Processing: A Survey, HAL."},{"key":"ref_5","first-page":"4","article-title":"Apache flink: Stream and batch processing in a single engine","volume":"36","author":"Carbone","year":"2015","journal-title":"Bull. IEEE Comput. Soc. Tech. Comm. Data Eng."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Ari, I., Olmezogullari, E., and \u00c7elebi, \u00d6.F. (2012, January 3\u20136). Data stream analytics and mining in the cloud. Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, Taipei, Taiwan.","DOI":"10.1109\/CloudCom.2012.6427563"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Strohbach, M., Ziekow, H., Gazis, V., and Akiva, N. (2015). Towards a big data analytics framework for IoT and smart city applications. Modeling and Processing for Next-Generation Big-Data Technologies, Springer.","DOI":"10.1007\/978-3-319-09177-8_11"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1004","DOI":"10.1016\/j.procs.2015.05.093","article-title":"Stream processing of healthcare sensor data: Studying user traces to identify challenges from a big data perspective","volume":"52","author":"Bonnaire","year":"2015","journal-title":"Procedia Comput. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kaisler, S., Armour, F., Espinosa, J.A., and Money, W. (2013, January 7\u201310). Big data: Issues and challenges moving forward. Proceedings of the 2013 46th Hawaii International Conference on System Sciences, Wailea, HI, USA.","DOI":"10.1109\/HICSS.2013.645"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1016\/j.procs.2015.04.188","article-title":"A brief introduction on Big Data 5Vs characteristics and Hadoop technology","volume":"48","author":"Ishwarappa","year":"2015","journal-title":"Procedia Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cumbane, S.P., and Gid\u00f3falvi, G. (2019). Review of Big Data and Processing Frameworks for Disaster Response Applications. ISPRS Int. J. Geo-Inf., 8.","DOI":"10.3390\/ijgi8090387"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Habiba, M., and Akhter, S. (2013, January 9\u201311). A cloud based natural disaster management system. Proceedings of the International Conference on Grid and Pervasive Computing, Seoul, Korea.","DOI":"10.1007\/978-3-642-38027-3_16"},{"key":"ref_13","unstructured":"Akanbi, A.K., and Masinde, M. (2018, January 9\u201311). Semantic interoperability middleware architecture for heterogeneous environmental data sources. Proceedings of the 2018 IST-Africa Week Conference (IST-Africa), Gaborone, Botswana."},{"key":"ref_14","unstructured":"(2020, January 14). Apache Storm. Available online: https:\/\/storm.apache.org\/."},{"key":"ref_15","unstructured":"(2019, October 06). Apache Flink. Available online: https:\/\/flink.apache.org."},{"key":"ref_16","unstructured":"(2020, January 12). Apache Kafka. Available online: https:\/\/kafka.apache.org\/."},{"key":"ref_17","unstructured":"(2019, November 10). Apache Spark. Available online: https:\/\/spark.apache.org."},{"key":"ref_18","unstructured":"(2019, November 21). Amazon EC2. Available online: https:\/\/aws.amazon.com\/ec2\/."},{"key":"ref_19","unstructured":"(2019, December 15). Microsoft Azure. Available online: https:\/\/azure.microsoft.com\/en-us\/."},{"key":"ref_20","unstructured":"(2019, December 15). Google Cloud. Available online: https:\/\/cloud.google.com."},{"key":"ref_21","unstructured":"(2019, November 23). Confluent. Available online: https:\/\/www.confluent.io."},{"key":"ref_22","unstructured":"Garg, N. (2013). Apache Kafka, Packt Publishing Ltd."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1109\/MCOM.2002.1031837","article-title":"Channel islands in a reflective ocean: Large-scale event distribution in heterogeneous networks","volume":"40","author":"Crowcroft","year":"2002","journal-title":"IEEE Commun. Mag."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Akanbi, A.K., and Masinde, M. (2015, January 7\u201311). Towards semantic integration of heterogeneous sensor data with indigenous knowledge for drought forecasting. Proceedings of the Doctoral Symposium of the 16th International Middleware Conference, Vancouver, BC, Canada.","DOI":"10.1145\/2843966.2843968"},{"key":"ref_25","unstructured":"Akanbi, A.K., and Masinde, M. (2015, January 6\u20137). A Framework for Accurate Drought Forecasting System Using Semantics-Based Data Integration Middleware. Proceedings of the International Conference on e-Infrastructure and e-Services for Developing Countries, Ouagadougou, Burkina Faso."},{"key":"ref_26","unstructured":"Henricksen, K., Indulska, J., McFadden, T., and Balasubramaniam, S. (November, January 31). Middleware for distributed context-aware systems. Proceedings of the OTM Confederated International Conferences On the Move to Meaningful Internet Systems, Agia Napa, Cyprus."},{"key":"ref_27","unstructured":"Yu, X., Niyogi, K., Mehrotra, S., and Venkatasubramanian, N. (2003). Adaptive middleware for distributed sensor environments. IEEE Distrib. Syst. Online, 5."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Clemente, P.J., and Lozano-Tello, A. (2018). Model driven development applied to complex event processing for near real-time open data. Sensors, 18.","DOI":"10.3390\/s18124125"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3337065","article-title":"Big data analytics for large-scale wireless networks: Challenges and opportunities","volume":"52","author":"Dai","year":"2019","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_30","unstructured":"Ballard, C., Farrell, D.M., Lee, M., Stone, P.D., Thibault, S., and Tucker, S. (2010). IBM Infosphere Streams Harnessing Data in Motion, IBM Redbooks."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1634","DOI":"10.14778\/3137765.3137770","article-title":"Samza: Stateful scalable stream processing at LinkedIn","volume":"10","author":"Noghabi","year":"2017","journal-title":"Proc. VLDB Endow."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Chintapalli, S., Dagit, D., Evans, B., Farivar, R., Graves, T., Holderbaugh, M., Liu, Z., Nusbaum, K., Patil, K., and Peng, B.J. (2016, January 23\u201327). Benchmarking streaming computation engines: Storm, flink and spark streaming. Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Chicago, IL, USA.","DOI":"10.1109\/IPDPSW.2016.138"},{"key":"ref_33","unstructured":"Inoubli, W., Aridhi, S., Mezni, H., Maddouri, M., and Nguifo, E. (2018). A Comparative Study on Streaming Frameworks for Big Data, HAL."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3170432","article-title":"Recent advancements in event processing","volume":"51","author":"Dayarathna","year":"2018","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Tun, M.T., Nyaung, D.E., and Phyu, M.P. (2019, January 30\u201331). Performance Evaluation of Intrusion Detection Streaming Transactions Using Apache Kafka and Spark Streaming. Proceedings of the 2019 International Conference on Advanced Information Technologies (ICAIT), Zurich, Switzerland.","DOI":"10.1109\/AITC.2019.8920960"},{"key":"ref_36","unstructured":"Jafarpour, H., and Desai, R. (2019, January 26\u201329). KSQL: Streaming SQL Engine for Apache Kafka. Proceedings of the EDBT, Lisbon, Portugal."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1109\/SURV.2013.042313.00197","article-title":"Context aware computing for the internet of things: A survey","volume":"16","author":"Perera","year":"2013","journal-title":"IEEE Commun. Surv. Tutor."},{"key":"ref_38","unstructured":"Cao, Y., Chen, S., Hou, P., and Brown, D. (2015, January 6\u20137). FAST: A fog computing assisted distributed analytics system to monitor fall for stroke mitigation. Proceedings of the 2015 IEEE International Conference on Networking, Architecture and Storage (NAS), Boston, MA, USA."},{"key":"ref_39","unstructured":"Korhonen, T. (2019). Using Kafka to Build Scalable and Fault Tolerant Systems. [Bachelor\u2019s Thesis, Laurea University of Applied Sciences]."},{"key":"ref_40","unstructured":"Garg, N. (2015). Learning Apache Kafka, Packt Publishing Ltd."},{"key":"ref_41","unstructured":"Narkhede, N., Shapira, G., and Palino, T. (2017). Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale, O\u2019Reilly Media, Inc."},{"key":"ref_42","unstructured":"Russom, P. (2011). Big Data Analytics. TDWI Best Practices Report, TDWI. Fourth Quarter 2011."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Tin, P., Zin, T.T., Toriu, T., and Hama, H. (2013, January 16\u201318). An integrated framework for disaster event analysis in big data environments. Proceedings of the 2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Beijing, China.","DOI":"10.1109\/IIH-MSP.2013.72"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Wang, Q., and Shang, Y. (2019). A Distributed Complex Event Processing System Based on Publish\/Subscribe. Recent Developments in Intelligent Computing, Communication and Devices, Springer.","DOI":"10.1007\/978-981-10-8944-2_113"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Bartolini, I., and Patella, M. (2019). Real-Time Stream Processing in Social Networks with RAM3S. Future Internet, 11.","DOI":"10.3390\/fi11120249"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Amarasinghe, G., de Assun\u00e7\u00e3o, M.D., Harwood, A., and Karunasekera, S. (2019). ECSNeT++: A simulator for distributed stream processing on edge and cloud environments. Future Gener. Comput. Syst.","DOI":"10.1016\/j.future.2019.11.014"},{"key":"ref_47","unstructured":"Lindquist, K.G., Vernon, F.L., Harvey, D., Quinlan, D., Orcutt, J., Rajasekar, A., Hansen, T.S., and Foley, S. (2007, January 25\u201327). The Data Acquisition Core of the ROADNet Real-Time Monitoring System. Proceedings of the Data Sharing and Interoperability on the World-Wide Sensor Web, Cambridge, MA, USA."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/j.envsoft.2014.10.007","article-title":"Web technologies for environmental Big Data","volume":"63","author":"Vitolo","year":"2015","journal-title":"Environ. Model. Softw."},{"key":"ref_49","unstructured":"(2020, March 14). REAP Project. Available online: http:\/\/reap.ecoinformatics.org\/."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Wang, J., Crawl, D., and Altintas, I. (2009, January 16). Kepler+ Hadoop: A general architecture facilitating data-intensive applications in scientific workflow systems. Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science, Portland, OR, USA.","DOI":"10.1145\/1645164.1645176"},{"key":"ref_51","first-page":"1","article-title":"Architecting the IoT Paradigm: A Middleware for Autonomous Distributed Sensor Networks","volume":"2015","author":"Eleftherakis","year":"2015","journal-title":"Int. J. Distrib. Sens. Netw."},{"key":"ref_52","unstructured":"Wetz, P., Trinh, T.D., Do, B.L., Anjomshoaa, A., Kiesling, E., and Tjoa, A.M. (2014, January 10\u201312). Towards an Environmental Information System for Semantic Stream Data. Proceedings of the EnviroInfo, Oldenburg, Germany."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Dia, A.F., Kazi-Aoul, Z., Boly, A., and Chabchoub, Y. (2018). C-SPARQL extension for sampling RDF graphs streams. Advances in Knowledge Discovery and Management, Springer.","DOI":"10.1007\/978-3-319-65406-5_2"},{"key":"ref_54","unstructured":"Anicic, D., Fodor, P., Rudolph, S., and Stojanovic, N. (April, January 28). EP-SPARQL: A unified language for event processing and stream reasoning. Proceedings of the 20th International Conference on World Wide Web, Hyderabad, India."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Rjoub, G., Bentahar, J., and Wahab, O.A. (2019). BigTrustScheduling: Trust-aware big data task scheduling approach in cloud computing environments. Future Gener. Comput. Syst.","DOI":"10.1016\/j.future.2019.11.019"},{"key":"ref_56","unstructured":"Kalim, F., Xu, L., Bathey, S., Meherwal, R., and Gupta, I. (2018, January 11\u201313). Henge: Intent-driven multi-tenant stream processing. Proceedings of the ACM Symposium on Cloud Computing, Carlsbad, CA, USA."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Yang, Z., Nguyen, P., Jin, H., and Nahrstedt, K. (2019, January 7\u20139). MIRAS: Model-based Reinforcement Learning for Microservice Resource Allocation over Scientific Workflows. Proceedings of the 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), Richardson, TX, USA.","DOI":"10.1109\/ICDCS.2019.00021"},{"key":"ref_58","first-page":"9478","article-title":"Apache kafka: Next generation distributed messaging system","volume":"3","author":"Thein","year":"2014","journal-title":"Int. J. Sci. Eng. Technol. Res."},{"key":"ref_59","unstructured":"Kafka, A. (2020, May 18). A High-Throughput Distributed Messaging System. Available online: https:\/\/blog.csdn.net\/macyang\/article\/details\/8546941."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1016\/j.future.2018.10.058","article-title":"An edge-stream computing infrastructure for real-time analysis of wearable sensors data","volume":"93","author":"Greco","year":"2019","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_61","unstructured":"Reelsen, A. (2020, March 10). Using Elasticsearch, Logstash and Kibana to Create Realtime Dashboards. Dostupn\u00e9 z. Available online: https:\/\/speakerdeck.com\/elasticsearch\/using-elasticsearch-logstash-and-kibana-to-create-realtime-dashboards?slide=8."},{"key":"ref_62","unstructured":"(2019, November 21). MongoDB. Available online: https:\/\/www.mongodb.com."},{"key":"ref_63","unstructured":"(2019, November 21). Cassandra. Available online: http:\/\/cassandra.apache.org."},{"key":"ref_64","unstructured":"(2019, November 12). Akka. Available online: https:\/\/akka.io."},{"key":"ref_65","unstructured":"(2019, November 12). Zeppelin. Available online: https:\/\/zeppelin.apache.org."},{"key":"ref_66","first-page":"1181","article-title":"Daily quantification of drought severity and duration","volume":"5","author":"Byun","year":"1996","journal-title":"J. Clim."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"2747","DOI":"10.1175\/1520-0442(1999)012<2747:OQODSA>2.0.CO;2","article-title":"Objective quantification of drought severity and duration","volume":"12","author":"Byun","year":"1999","journal-title":"J. Clim."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Shree, R., Choudhury, T., Gupta, S.C., and Kumar, P. (2017, January 10\u201311). KAFKA: The modern platform for data management and analysis in big data domain. Proceedings of the 2017 2nd International Conference on Telecommunication and Networks (TEL-NET), Noida, India.","DOI":"10.1109\/TEL-NET.2017.8343593"},{"key":"ref_69","unstructured":"(2019, October 23). Kafka Connect. Available online: https:\/\/www.confluent.io\/connectors."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/11\/3166\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:35:14Z","timestamp":1760175314000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/20\/11\/3166"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,3]]},"references-count":69,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["s20113166"],"URL":"https:\/\/doi.org\/10.3390\/s20113166","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,3]]}}}