{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:21:09Z","timestamp":1760059269892,"version":"build-2065373602"},"reference-count":33,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T00:00:00Z","timestamp":1749081600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Italian Ministry of University and Research (MUR) National Innovation Ecosystem grant","award":["ECS00000041\u2014VITALITY\u2014CUP J13C22000430001"],"award-info":[{"award-number":["ECS00000041\u2014VITALITY\u2014CUP J13C22000430001"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>With the advent of Big Data, data mining techniques have become crucial for improving decision-making across diverse sectors, yet their employment demands significant resources and time. Time is critical in industrial contexts, as delays can lead to increased costs, missed opportunities, and reduced competitive advantage. To address this, systems for analyzing data can help prototype data mining pipelines, mitigating the risks of failure and resource wastage, especially when experimenting with novel techniques. Moreover, business experts often lack deep technical expertise and need robust support to validate their pipeline designs quickly. This paper presents Rainfall, a novel framework for rapidly prototyping data mining pipelines, developed through collaborative projects with industry. The framework\u2019s requirements stem from a combination of literature review findings, iterative industry engagement, and analysis of existing tools. Rainfall enables the visual programming, execution, monitoring, and management of data mining pipelines, lowering the barrier for non-technical users. Pipelines are composed of configurable nodes that encapsulate functionalities from popular libraries or custom user-defined code, fostering experimentation. The framework is evaluated through a case study and SWOT analysis with INGKA, a large-scale industry partner, alongside usability testing with real users and validation against scenarios from the literature. The paper then underscores the value of industry\u2013academia collaboration in bridging theoretical innovation with practical application.<\/jats:p>","DOI":"10.3390\/bdcc9060150","type":"journal-article","created":{"date-parts":[[2025,6,5]],"date-time":"2025-06-05T09:43:18Z","timestamp":1749116598000},"page":"150","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Framework for Rapidly Prototyping Data Mining Pipelines"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6767-2184","authenticated-orcid":false,"given":"Flavio","family":"Corradini","sequence":"first","affiliation":[{"name":"Computer Science Department, School of Science and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032 Camerino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-9639-0762","authenticated-orcid":false,"given":"Luca","family":"Mozzoni","sequence":"additional","affiliation":[{"name":"Computer Science Department, School of Science and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032 Camerino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8545-3740","authenticated-orcid":false,"given":"Marco","family":"Piangerelli","sequence":"additional","affiliation":[{"name":"Computer Science Department, School of Science and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032 Camerino, Italy"},{"name":"Vici & C. S.p.A., Via J. Gutenberg, 5, 47822 Santarcangelo di Romagna, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5374-2364","authenticated-orcid":false,"given":"Barbara","family":"Re","sequence":"additional","affiliation":[{"name":"Computer Science Department, School of Science and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032 Camerino, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6872-0616","authenticated-orcid":false,"given":"Lorenzo","family":"Rossi","sequence":"additional","affiliation":[{"name":"Computer Science Department, School of Science and Technology, University of Camerino, Via Madonna delle Carceri 7, 62032 Camerino, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.2307\/41703503","article-title":"Business intelligence and analytics: From big data to big impact","volume":"36","author":"Chen","year":"2012","journal-title":"MIS Q."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kitchin, R. (2014). The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, Sage.","DOI":"10.4135\/9781473909472"},{"key":"ref_3","unstructured":"Ackermann, L., K\u00e4ppel, M., Marcus, L., Moder, L., Dunzer, S., Hornsteiner, M., Liessmann, A., Zisgen, Y., Empl, P., and Herm, L.V. (2024). Recent Advances in Data-Driven Business Process Management. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"799","DOI":"10.1016\/j.ymssp.2017.11.016","article-title":"Machinery health prognostics: A systematic review from data acquisition to RUL prediction","volume":"104","author":"Lei","year":"2018","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1016\/j.cie.2018.09.034","article-title":"Learning-based scheduling of flexible manufacturing systems using ensemble methods","volume":"126","author":"Priore","year":"2018","journal-title":"Comput. Ind. Eng."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"3469","DOI":"10.1007\/s10845-022-02014-y","article-title":"SPECTRE: A deep learning network for posture recognition in manufacturing","volume":"34","author":"Ciccarelli","year":"2023","journal-title":"J. Intell. Manuf."},{"key":"ref_7","first-page":"200071","article-title":"Stable heuristic miner: Applying statistical stability to discover the common patient pathways from location event logs","volume":"14","author":"Araghi","year":"2022","journal-title":"Intell. Syst. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1007\/s41019-023-00234-7","article-title":"Anomaly Detection with Sub-Extreme Values: Health Provider Billing","volume":"9","author":"Muspratt","year":"2024","journal-title":"Data Sci. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Iatropoulos, D., Sarlis, V., and Tjortjis, C. (2025). A Data Mining Approach to Identify NBA Player Quarter-by-Quarter Performance Patterns. Big Data Cogn. Comput., 9.","DOI":"10.3390\/bdcc9040074"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1007\/s41019-019-00115-y","article-title":"Deep learning for user interest and response prediction in online display advertising","volume":"5","author":"Gharibshah","year":"2020","journal-title":"Data Sci. Eng."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1007\/s10766-007-0068-8","article-title":"A run-time system for efficient execution of scientific workflows on distributed environments","volume":"36","author":"Teodoro","year":"2008","journal-title":"Int. J. Parallel Program."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"02002","DOI":"10.1051\/e3sconf\/202132502002","article-title":"Comparison of data mining algorithms for pressure prediction of crude oil pipeline to identify congeal","volume":"Volume 325","author":"Santoso","year":"2021","journal-title":"Proceedings of the E3S Web of Conferences"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1700552","DOI":"10.1002\/adem.201700552","article-title":"Fused deposition modeling for unmanned aerial vehicles (UAVs): A review","volume":"20","author":"Klippstein","year":"2018","journal-title":"Adv. Eng. Mater."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"683","DOI":"10.1007\/s00170-007-1129-2","article-title":"Qualification of rapid prototyping tools: Proposition of a procedure and a test part","volume":"38","author":"Scaravetti","year":"2008","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref_15","first-page":"3208","article-title":"Application of Rapid Prototyping Technology in Product Design","volume":"989","author":"Li","year":"2014","journal-title":"Adv. Mater. Res."},{"key":"ref_16","first-page":"185","article-title":"Determining factors hindering university-industry collaboration: An analysis from the perspective of academicians in the context of entrepreneurial science paradigm","volume":"4","author":"Kaymaz","year":"2011","journal-title":"Int. J. Soc. Inq."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., and Euler, T. (2006, January 20\u201323). Yale: Rapid prototyping for complex data mining tasks. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA.","DOI":"10.1145\/1150402.1150531"},{"key":"ref_18","unstructured":"Wirth, R., and Hipp, J. (2000, January 11\u201313). CRISP-DM: Towards a standard process model for data mining. Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, Manchester, UK."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1108\/eb026584","article-title":"Foundation of evaluation","volume":"30","year":"1974","journal-title":"J. Doc."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1013208.1013209","article-title":"Advances in dataflow programming languages","volume":"36","author":"Johnston","year":"2004","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"del Amo, I.G., Torres, M.G., Batista, B.M., P\u00e9rez, J.M., Vega, J.M., and Mart\u00edn, R.R. (2005, January 7\u201311). Data mining with scatter search. Proceedings of the International Conference on Computer Aided Systems Theory, Las Palmas de Gran Canaria, Spain.","DOI":"10.1007\/11556985_25"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Van Der Aalst, W. (2016). Data Science in Action, Springer.","DOI":"10.1007\/978-3-662-49851-4_1"},{"key":"ref_24","unstructured":"Frank, E., Hall, M.A., and Witten, I.H. (2016). The WEKA Workbench. Data Mining: Practical Machine Learning Tools and Techniques, Elsevier."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cal\u00f2, T., and De Russis, L. (2023, January 27\u201330). Towards A Visual Programming Tool to Create Deep Learning Models. Proceedings of the Companion Proceedings of the 2023 ACM SIGCHI Symposium on Engineering Interactive Computing Systems, Swansea, UK.","DOI":"10.1145\/3596454.3597181"},{"key":"ref_26","first-page":"1","article-title":"Introduction to BPMN","volume":"2","author":"White","year":"2004","journal-title":"Ibm Cooperation"},{"key":"ref_27","first-page":"1","article-title":"Xes-standard definition","volume":"1409","author":"Gunther","year":"2014","journal-title":"BPM Rep."},{"key":"ref_28","first-page":"1","article-title":"SWOT analysis","volume":"12","author":"Galea","year":"2015","journal-title":"Wiley Encycl. Manag."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"2","author":"Davies","year":"1979","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1002\/widm.1045","article-title":"Replaying history on process models for conformance checking and performance analysis","volume":"2","author":"Adriansyah","year":"2012","journal-title":"Wiley Interdiscip. Rev. Data Min. Knowl. Discov."},{"key":"ref_31","first-page":"4","article-title":"SUS\u2014A quick and dirty usability scale","volume":"189","author":"Brooke","year":"1996","journal-title":"Usability Eval. Ind."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1016\/j.datak.2004.12.009","article-title":"Complexity and clarity in conceptual modeling: Comparison of mandatory and optional properties","volume":"55","author":"Gemino","year":"2005","journal-title":"Data Knowl. Eng."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1016\/j.is.2009.03.009","article-title":"Activity labeling in process modeling: Empirical insights and recommendations","volume":"35","author":"Mendling","year":"2010","journal-title":"Inf. Syst."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/6\/150\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:47:12Z","timestamp":1760032032000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/6\/150"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,5]]},"references-count":33,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["bdcc9060150"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9060150","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,6,5]]}}}