{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:39:12Z","timestamp":1740123552959,"version":"3.37.3"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2024,2,2]],"date-time":"2024-02-02T00:00:00Z","timestamp":1706832000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,2,2]],"date-time":"2024-02-02T00:00:00Z","timestamp":1706832000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"European Commission's Horizon 2020 Framework program and  the European High-Performance Computing Joint Undertaking","award":["955558","955558","955558","955558","955558","955558","955558","955558"],"award-info":[{"award-number":["955558","955558","955558","955558","955558","955558","955558","955558"]}]},{"name":"National Centre for HPC, Big Data and Quantum Computing","award":["CN00000013 - CUP H23C22000360005","CN00000013 - CUP H23C22000360005","CN00000013 - CUP H23C22000360005","CN00000013 - CUP H23C22000360005"],"award-info":[{"award-number":["CN00000013 - CUP H23C22000360005","CN00000013 - CUP H23C22000360005","CN00000013 - CUP H23C22000360005","CN00000013 - CUP H23C22000360005"]}]},{"name":"FAIR \u2013 Future Artificial Intelligence Research","award":["CUP H23C22000860006","CUP H23C22000860006"],"award-info":[{"award-number":["CUP H23C22000860006","CUP H23C22000860006"]}]},{"name":"Spanish Government","award":["PID2019-107255GB","PID2019-107255GB","PID2019-107255GB"],"award-info":[{"award-number":["PID2019-107255GB","PID2019-107255GB","PID2019-107255GB"]}]},{"name":"Departament de Recerca i Universitats de la Generalitat de Catalunya","award":["2021 SGR 00412","2021 SGR 00412","2021 SGR 00412"],"award-info":[{"award-number":["2021 SGR 00412","2021 SGR 00412","2021 SGR 00412"]}]},{"DOI":"10.13039\/501100007069","name":"Universit\u00e0 della Calabria","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007069","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2024,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Developing and executing large-scale data analysis applications in parallel and distributed environments can be a complex and time-consuming task. Developers often find themselves diverted from their application logic to handle technical details about the underlying runtime and related issues. To simplify this process, ParSoDA, a Java library, has been proposed to facilitate the development of parallel data mining applications executed on HPC systems. It simplifies the process by providing built-in scalability mechanisms relying on the Hadoop and Spark frameworks. This paper presents ParSoDA-Py, the Python version of the ParSoDA library, which allows for further support of commonly used runtimes and libraries for big data analysis. After a complete library redesign, ParSoDA can be now easily integrated with other Python-based distributed runtimes for HPC systems, such as COMPSs and Apache Spark, and with the large ecosystem of Python-based data processing libraries. The paper discusses the adaptation process, which takes into consideration the new technical requirements, and evaluates both usability and scalability through some case study applications.<\/jats:p>","DOI":"10.1007\/s11227-023-05883-z","type":"journal-article","created":{"date-parts":[[2024,2,2]],"date-time":"2024-02-02T16:02:24Z","timestamp":1706889744000},"page":"11741-11761","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Boosting HPC data analysis performance with the ParSoDA-Py library"],"prefix":"10.1007","volume":"80","author":[{"given":"Loris","family":"Belcastro","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Salvatore","family":"Giamp\u00e0","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fabrizio","family":"Marozzo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Domenico","family":"Talia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paolo","family":"Trunfio","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rosa M.","family":"Badia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jorge","family":"Ejarque","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nihad","family":"Mammadli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,2,2]]},"reference":[{"key":"5883_CR1","volume-title":"Data analysis in the cloud: models. Techniques and Applications","author":"D Talia","year":"2015","unstructured":"Talia D, Trunfio P, Marozzo F (2015) Data analysis in the cloud: models. Techniques and Applications. Elsevier, Amsterdam, The Netherlands"},{"issue":"1","key":"5883_CR2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s13278-018-0547-5","volume":"9","author":"L Belcastro","year":"2019","unstructured":"Belcastro L, Marozzo F, Talia D, Trunfio P (2019) Parsoda: high-level parallel programming for social data mining. Soc Netw Anal Min 9(1):1","journal-title":"Soc Netw Anal Min"},{"issue":"4","key":"5883_CR3","first-page":"1","volume":"9","author":"L Belcastro","year":"2022","unstructured":"Belcastro L, Cantini R, Marozzo F, Orsino A, Talia D, Trunfio P (2022) Programming big data analysis: principles and solutions. J Big Data 9(4):1","journal-title":"J Big Data"},{"key":"5883_CR4","doi-asserted-by":"publisher","first-page":"546","DOI":"10.1016\/j.future.2018.04.032","volume":"86","author":"W Inoubli","year":"2018","unstructured":"Inoubli W, Aridhi S, Mezni H, Maddouri M, Mephu Nguifo E (2018) An experimental survey on big data frameworks. Futur Gener Comput Syst 86:546\u2013564","journal-title":"Futur Gener Comput Syst"},{"issue":"2","key":"5883_CR5","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1145\/3484622.3484626","volume":"50","author":"C Doulkeridis","year":"2021","unstructured":"Doulkeridis C, Vlachou A, Pelekis N, Theodoridis Y (2021) A survey on big data processing frameworks for mobility analytics. SIGMOD Rec 50(2):18\u201329","journal-title":"SIGMOD Rec"},{"key":"5883_CR6","doi-asserted-by":"publisher","DOI":"10.1142\/q0444","volume-title":"Programming big data applications","author":"D Talia","year":"2024","unstructured":"Talia D, Trunfio P, Marozzo F, Belcastro L, Cantini R, Orsino A (2024) Programming big data applications. World Scientific (Europe), Munich, Germany"},{"issue":"3","key":"5883_CR7","doi-asserted-by":"publisher","first-page":"49","DOI":"10.3166\/isi.19.3.49-72","volume":"19","author":"S Amer-Yahia","year":"2014","unstructured":"Amer-Yahia S, Ibrahim N, Kengne CK, Ulliana F, Rousset M-C (2014) Socle: towards a framework for data preparation in social applications. Ing\u00e9nierie des Syst\u00e8mes d Inf 19(3):49\u201372","journal-title":"Ing\u00e9nierie des Syst\u00e8mes d Inf"},{"issue":"1","key":"5883_CR8","first-page":"50","volume":"27","author":"\u00c3 Cuesta","year":"2014","unstructured":"Cuesta \u00c3, Barrero DF, R-Doreno MD (2014) A framework for massive twitter data extraction and analysis. Malay J Comput Sci 27(1):50\u201367","journal-title":"Malay J Comput Sci"},{"doi-asserted-by":"crossref","unstructured":"Hussain A, Vatrapu R, Hardt D, Jaffari ZA (2014) Social data analytics tool: a demonstrative case study of methodology and software. In: Analyzing Social Media Data and Web Networks, pp. 99\u2013118. Springer, Amsterdam, The Netherlands","key":"5883_CR9","DOI":"10.1057\/9781137276773_5"},{"doi-asserted-by":"crossref","unstructured":"Zhou D, Chen L, He Y (2015) An unsupervised framework of exploring events on twitter: Filtering, extraction and categorization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29","key":"5883_CR10","DOI":"10.1609\/aaai.v29i1.9526"},{"doi-asserted-by":"crossref","unstructured":"You L, Motta G, Sacco D, Ma T (2014) Social data analysis framework in cloud and mobility analyzer for smarter cities. In: Proceedings of 2014 IEEE International Conference on Service Operations and Logistics, and Informatics, pp. 96\u2013101 . IEEE","key":"5883_CR11","DOI":"10.1109\/SOLI.2014.6960700"},{"key":"5883_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.softx.2023.101538","volume":"24","author":"D Elia","year":"2023","unstructured":"Elia D, Palazzo C, Fiore S, D\u2019Anca A, Mariello A, Aloisio G (2023) Pyophidia: a python library for high performance data analytics at scale. SoftwareX 24:101538","journal-title":"SoftwareX"},{"doi-asserted-by":"crossref","unstructured":"Fiore S, Palazzo C, D\u2019Anca A, Foster I, Williams DN, Aloisio G (2013) A big data analytics framework for scientific data management. In: 2013 IEEE International Conference on Big Data, pp. 1\u20138","key":"5883_CR13","DOI":"10.1109\/BigData.2013.6691720"},{"doi-asserted-by":"crossref","unstructured":"Aldinucci M, Danelutto M, Kilpatrick P, Torquati M (2017) Fastflow: high-level and efficient streaming on multicore. Programming multi-core and many-core computing systems, 261\u2013280","key":"5883_CR14","DOI":"10.1002\/9781119332015.ch13"},{"issue":"5\u20136","key":"5883_CR15","doi-asserted-by":"publisher","first-page":"454","DOI":"10.1007\/s10766-022-00737-2","volume":"50","author":"J L\u00f6ff","year":"2022","unstructured":"L\u00f6ff J, Hoffmann RB, Pieper R, Griebler D, Fernandes LG (2022) Dsparlib: A c++ template library for distributed stream parallelism. Int J Parallel Prog 50(5\u20136):454\u2013485","journal-title":"Int J Parallel Prog"},{"issue":"24","key":"5883_CR16","doi-asserted-by":"publisher","first-page":"4175","DOI":"10.1002\/cpe.4175","volume":"29","author":"D Rio Astorga","year":"2017","unstructured":"Rio Astorga D, Dolz MF, Fern\u00e1ndez J, Garc\u00eda JD (2017) A generic parallel pattern interface for stream and data processing. Concurr Comput: Pract Exp 29(24):4175","journal-title":"Concurr Comput: Pract Exp"},{"doi-asserted-by":"crossref","unstructured":"Belcastro L, Marozzo F, Talia D, Trunfio P (2017) A parallel library for social media analytics. In: The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017), Genoa, Italy, pp. 683\u2013690 . ISBN 978-1-5386-3250-5","key":"5883_CR17","DOI":"10.1109\/HPCS.2017.105"},{"doi-asserted-by":"crossref","unstructured":"Belcastro L, Marozzo F, Talia D, Trunfio P (2017) Appraising spark on large-scale social media analysis. In: Euro-Par Workshops. Lecture Notes in Computer Science, pp. 483\u2013495, Santiago de Compostela, Spain . ISBN: 978-3-319-75178-8","key":"5883_CR18","DOI":"10.1007\/978-3-319-75178-8_39"},{"unstructured":"Martin RC (1996) The open-closed principle. More C++ gems 19(96), 9","key":"5883_CR19"},{"issue":"1","key":"5883_CR20","first-page":"66","volume":"31","author":"E Tejedor","year":"2017","unstructured":"Tejedor E, Becerra Y, Alomar G, Queralt A, Badia RM, Torres J, Cortes T, Labarta J (2017) Pycompss: parallel computational workflows in python. IJHPCA 31(1):66\u201382","journal-title":"IJHPCA"},{"doi-asserted-by":"crossref","unstructured":"Mammadli N, Ejarque\u00a0Artigas J, \u00c1lvarez\u00a0Cid-Fuentes J, Badia\u00a0Sala RM (2022) Dds: integrating data analytics transformations in task-based workflows [version 1; peer review: 1 approved, 2 approved with reservations]. Open Research Europe 2(article 66), 1\u201316","key":"5883_CR21","DOI":"10.12688\/openreseurope.14569.1"},{"key":"5883_CR22","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.115733","volume":"186","author":"L Belcastro","year":"2021","unstructured":"Belcastro L, Marozzo F, Perrella E (2021) Automatic detection of user trajectories from social media posts. Expert Syst Appl 186:115733","journal-title":"Expert Syst Appl"},{"doi-asserted-by":"crossref","unstructured":"Li C et al. (2008) Efficiently mining closed subsequences with gap constraints","key":"5883_CR23","DOI":"10.1137\/1.9781611972788.28"},{"unstructured":"Chin D, Zappone A, Zhao J (2016) Analyzing twitter sentiment of the 2016 presidential candidates. Am J Sci Res","key":"5883_CR24"},{"issue":"12","key":"5883_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0144296","volume":"10","author":"P Kralj Novak","year":"2015","unstructured":"Kralj Novak P, Smailovi\u0107 J, Sluban B, Mozeti\u010d I (2015) Sentiment of emojis. PLOS ONE 10(12):1\u201322","journal-title":"PLOS ONE"},{"issue":"1","key":"5883_CR26","doi-asserted-by":"publisher","first-page":"47177","DOI":"10.1109\/ACCESS.2020.2978950","volume":"8","author":"L Belcastro","year":"2020","unstructured":"Belcastro L, Cantini R, Marozzo F, Talia D, Trunfio P (2020) Learning political polarization on social media using neural networks. IEEE Access 8(1):47177\u201347187","journal-title":"IEEE Access"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-023-05883-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-023-05883-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-023-05883-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,6]],"date-time":"2024-05-06T11:12:22Z","timestamp":1714993942000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-023-05883-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,2,2]]},"references-count":26,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,5]]}},"alternative-id":["5883"],"URL":"https:\/\/doi.org\/10.1007\/s11227-023-05883-z","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"type":"print","value":"0920-8542"},{"type":"electronic","value":"1573-0484"}],"subject":[],"published":{"date-parts":[[2024,2,2]]},"assertion":[{"value":"23 December 2023","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 February 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}