{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,29]],"date-time":"2025-06-29T04:04:42Z","timestamp":1751169882365,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>\n            Real-life business process event logs may suffer from significant data quality problems negatively influencing process mining analysis. Over time, a range of approaches has been developed to detect and repair these quality problems. Validation of these approaches tends to be challenging due to the lack of a ground truth. Moreover, the identification and definition of event log quality problems have been tackled mainly through a pattern-based approach, with systematic and extensible methods currently lacking. In this article, we present\n            <jats:italic toggle=\"yes\">FLAWD<\/jats:italic>\n            , a formal language for describing event log data quality issues that enables solutions addressing the shortcomings of process mining data quality research identified above.\n            <jats:italic toggle=\"yes\">FLAWD<\/jats:italic>\n            can be used to formally describe and possibly reason over event log data quality errors, as well as to guide the development of tools for controlled and sophisticated \u201cpolluting\u201d of event logs through which benchmark datasets may be systematically created. We present the abstract syntax grammar of\n            <jats:italic toggle=\"yes\">FLAWD<\/jats:italic>\n            and an open-source software tool based on it that allows for the insertion of all so-called event log imperfection patterns in a stochastic manner. 
We show how\n            <jats:italic toggle=\"yes\">FLAWD<\/jats:italic>\n            has been used in our research to generate benchmark datasets and how it can be used to formally describe and replicate a range of errors found in real-life event logs.\n          <\/jats:p>","DOI":"10.1145\/3743144","type":"journal-article","created":{"date-parts":[[2025,6,7]],"date-time":"2025-06-07T09:13:56Z","timestamp":1749287636000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Language to Model and Simulate Data Quality Issues in Process Mining"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6944-4705","authenticated-orcid":false,"given":"Marco","family":"Comuzzi","sequence":"first","affiliation":[{"name":"Ulsan National Institute of Science and Technology, UNIST","place":["Ulsan, Korea (the Republic of)"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8322-8056","authenticated-orcid":false,"given":"Jonghyeon","family":"Ko","sequence":"additional","affiliation":[{"name":"Queensland University of Technology","place":["Brisbane, Australia"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9089-6896","authenticated-orcid":false,"given":"Fabrizio","family":"Maggi","sequence":"additional","affiliation":[{"name":"Free University of Bozen-Bolzano","place":["Bolzano, Italy"]}]}],"member":"320","published-online":{"date-parts":[[2025,6,28]]},"reference":[{"key":"e_1_3_3_2_2","first-page":"289","volume-title":"Proceedings of the International Conference on Business Process Management","author":"Alman Anti","year":"2023","unstructured":"Anti Alman, Fabrizio Maria Maggi, Marco Montali, and Andrey Rivkin. 2023. Generating event logs from hybrid process models. In Proceedings of the International Conference on Business Process Management. 
Springer, 289\u2013301."},{"key":"e_1_3_3_3_2","first-page":"116","volume-title":"Proceedings of the On the Move to Meaningful Internet Systems. OTM 2018 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2018, Valletta, Malta, October 22\u201326, 2018","author":"Andrews Robert","year":"2018","unstructured":"Robert Andrews, Suriadi Suriadi, Chun Ouyang, and Erik Poppe. 2018. Towards event log querying for data quality: Let\u2019s start with detecting log imperfections. In Proceedings of the On the Move to Meaningful Internet Systems. OTM 2018 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2018, Valletta, Malta, October 22\u201326, 2018. Springer, 116\u2013134."},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2020.113265"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jsis.2022.101745"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3630025"},{"key":"e_1_3_3_7_2","first-page":"127","volume-title":"Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2013, Singapore, 16\u201319 April, 2013","author":"Bose Jagadeesh Chandra J. C.","year":"2013","unstructured":"Jagadeesh Chandra J. C. Bose, R. S. Mans, and Wil M. P. van der Aalst. 2013. Wanna improve process mining results? It\u2019s high time we consider data quality issues seriously. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2013, Singapore, 16\u201319 April, 2013. 127\u2013134."},{"key":"e_1_3_3_8_2","first-page":"1","article-title":"Plg2: Multiperspective process randomization with online and offline simulations.","volume":"1789","author":"Burattin Andrea","year":"2016","unstructured":"Andrea Burattin. 2016. 
Plg2: Multiperspective process randomization with online and offline simulations.BPM (Demos) 1789 (2016), 1\u20136.","journal-title":"BPM (Demos)"},{"key":"e_1_3_3_9_2","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1007\/978-3-031-16103-2_14","volume-title":"Proceedings of the International Conference on Business Process Management","author":"Burattin Andrea","year":"2022","unstructured":"Andrea Burattin, Barbara Re, Lorenzo Rossi, and Francesco Tiezzi. 2022. A purpose-guided log generation framework. In Proceedings of the International Conference on Business Process Management. Springer, 181\u2013198."},{"issue":"1","key":"e_1_3_3_10_2","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1007\/s44311-024-00002-4","article-title":"Predictive process monitoring: Concepts, challenges, and future research directions","volume":"1","author":"Ceravolo Paolo","year":"2024","unstructured":"Paolo Ceravolo, Marco Comuzzi, Jochen De Weerdt, Chiara Di Francescomarino, and Fabrizio Maria Maggi. 2024. Predictive process monitoring: Concepts, challenges, and future research directions. Process Science 1, 1 (2024), 2.","journal-title":"Process Science"},{"key":"e_1_3_3_11_2","volume-title":"Proceedings of the ICPM 2024 Workshops","author":"Comuzzi Marco","year":"2025","unstructured":"Marco Comuzzi, Sungkyu Kim, Ko Jonghyeon, Musa Salamov, Cinzia Cappiello, and Barbara Pernici. 2025. On the impact of low-quality activity labels in predictive process monitoring. In Proceedings of the ICPM 2024 Workshops. Springer. https:\/\/ml4pm.di.unimi.it\/preproceedings\/ICPM_2024_paper_175.pdf"},{"key":"e_1_3_3_12_2","first-page":"117","article-title":"Declare4Py: A Python library for declarative process mining.","volume":"3216","author":"Donadello Ivan","year":"2022","unstructured":"Ivan Donadello, Francesco Riva, Fabrizio Maria Maggi, and Aladdin Shikhizada. 2022. 
Declare4Py: A Python library for declarative process mining.BPM (PhD\/Demos) 3216 (2022), 117\u2013121.","journal-title":"BPM (PhD\/Demos)"},{"key":"e_1_3_3_13_2","series-title":"CEUR Workshop Proceedings","first-page":"117","volume-title":"Proceedings of the BPM (PhD\/Demos)","volume":"3216","author":"Donadello Ivan","year":"2022","unstructured":"Ivan Donadello, Francesco Riva, Fabrizio Maria Maggi, and Aladdin Shikhizada. 2022. Declare4Py: A Python library for declarative process mining. In Proceedings of the BPM (PhD\/Demos)(CEUR Workshop Proceedings, Vol. 3216). 117\u2013121."},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2022.102039"},{"issue":"4","key":"e_1_3_3_15_2","doi-asserted-by":"crossref","first-page":"85","DOI":"10.2308\/HORIZONS-2022-153","article-title":"The development of the process mining event log generator (PMELG) tool","volume":"37","author":"Hawkins Steven R.","year":"2023","unstructured":"Steven R. Hawkins, Jeffrey Pickerd, Scott L. Summers, and David A. Wood. 2023. The development of the process mining event log generator (PMELG) tool. Accounting Horizons 37, 4 (2023), 85\u201395.","journal-title":"Accounting Horizons"},{"key":"e_1_3_3_16_2","series-title":"CEUR Workshop Proceedings","first-page":"23","volume-title":"Proceedings of the BPM Demo Track 2016 Co-located with the 14th International Conference on Business Process Management (BPM 2016), Rio de Janeiro, Brazil, September 21, 2016.","volume":"1789","author":"Jouck Toon","year":"2016","unstructured":"Toon Jouck and B. Depaire. 2016. PTandLogGenerator: A generator for artificial event data. In Proceedings of the BPM Demo Track 2016 Co-located with the 14th International Conference on Business Process Management (BPM 2016), Rio de Janeiro, Brazil, September 21, 2016.Leonardo Azevedo and Cristina Cabanillas (Eds.), CEUR Workshop Proceedings, Vol. 
1789, 23\u201327."},{"key":"e_1_3_3_17_2","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1007\/s12599-018-0541-5","article-title":"Generating artificial data for empirical analysis of control-flow discovery algorithms: A process tree and log generator","volume":"61","author":"Jouck Toon","year":"2019","unstructured":"Toon Jouck and B. Depaire. 2019. Generating artificial data for empirical analysis of control-flow discovery algorithms: A process tree and log generator. Business and Information Systems Engineering 61, 6 (2019), 695\u2013712.","journal-title":"Business and Information Systems Engineering"},{"key":"e_1_3_3_18_2","first-page":"1","volume-title":"Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI\u201916)","author":"Kherbouche Mohammed Oussama","year":"2016","unstructured":"Mohammed Oussama Kherbouche, Nassim Laga, and Pierre-Aymeric Masse. 2016. Towards a better assessment of event logs quality. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI\u201916). IEEE, 1\u20138."},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12599-023-00794-y"},{"key":"e_1_3_3_20_2","series-title":"CEUR Workshop Proceedings","first-page":"35","volume-title":"Proceedings of the ICPM Doctoral Consortium and Tool Demonstration Track 2020 Co-located with the 2nd International Conference on Process Mining (ICPM 2020).","volume":"2703","author":"Ko Jonghyeon","year":"2020","unstructured":"Jonghyeon Ko, Jongyup Lee, and Marco Comuzzi. 2020. AIR-BAGEL: An interactive root cause-based anomaly generator for event logs. In Proceedings of the ICPM Doctoral Consortium and Tool Demonstration Track 2020 Co-located with the 2nd International Conference on Process Mining (ICPM 2020).Claudio Di Ciccio, Beno\u00eet Depaire, Jochen De Weerdt, Chiara Di Francescomarino, and Jorge Munoz-Gama (Eds.), CEUR Workshop Proceedings, Vol. 
2703, 35\u201338."},{"key":"e_1_3_3_21_2","first-page":"123","volume-title":"Proceedings of the International Conference on Business Process Management","author":"Koschmider Agnes","year":"2021","unstructured":"Agnes Koschmider, Kay Kaczmarek, Mathias Krause, and Sebastiaan J. van Zelst. 2021. Demystifying noise and outliers in event logs: Review and future directions. In Proceedings of the International Conference on Business Process Management. Springer, 123\u2013135."},{"key":"e_1_3_3_22_2","volume-title":"Proceedings of the BPM 2015","author":"Leontjeva Anna","year":"2015","unstructured":"Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, and Fabrizio Maria Maggi. 2015. Complex symbolic sequence encodings for predictive monitoring of business processes. In Proceedings of the BPM 2015."},{"key":"e_1_3_3_23_2","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1007\/978-3-031-70396-6_13","volume-title":"Proceedings of the International Conference on Business Process Management","author":"Maldonado Andrea","year":"2024","unstructured":"Andrea Maldonado, Christian M. M. Frey, Gabriel Marques Tavares, Nikolina Rehwald, and Thomas Seidl. 2024. GEDI: Generating event data with intentional features for benchmarking process mining. In Proceedings of the International Conference on Business Process Management. Springer, 221\u2013237."},{"key":"e_1_3_3_24_2","doi-asserted-by":"crossref","first-page":"373","DOI":"10.1007\/978-3-031-08848-3_12","volume-title":"Proceedings of the Process Mining Handbook","author":"Mannhardt Felix","year":"2022","unstructured":"Felix Mannhardt. 2022. Responsible process mining. In Proceedings of the Process Mining Handbook. 
Springer, Cham, 373\u2013401."},{"key":"e_1_3_3_25_2","series-title":"Springer, Communications in Computer and Information Science","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1007\/978-3-540-92219-3_32","volume-title":"Proceedings of the Biomedical Engineering Systems and Technologies, International Joint Conference, BIOSTEC 2008, Funchal, Madeira, Portugal, January 28\u201331, 2008, Revised Selected Papers.","volume":"25","author":"Mans R. S.","year":"2008","unstructured":"R. S. Mans, Helen Schonenberg, Minseok Song, Wil M. P. van der Aalst, and Piet J. M. Bakker. 2008. Application of process mining in healthcare - A case study in a dutch hospital. In Proceedings of the Biomedical Engineering Systems and Technologies, International Joint Conference, BIOSTEC 2008, Funchal, Madeira, Portugal, January 28\u201331, 2008, Revised Selected Papers.Ana L. N. Fred, Joaquim Filipe, and Hugo Gamboa (Eds.), Springer, Communications in Computer and Information Science, Vol. 25, 425\u2013438."},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.5555\/91414"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.04.052"},{"key":"e_1_3_3_28_2","series-title":"CEUR Workshop Proceedings","volume-title":"Proceedings of the Workshop AI for Public Administration, Ital-IA, Pisa, Italy","volume":"3486","author":"Pernici Barbara","year":"2023","unstructured":"Barbara Pernici, Alessandro Campi, Marco Dilettis, and Paolo Gerosa. 2023. Why are Italian trials taking so long? A process mining approach. In Proceedings of the Workshop AI for Public Administration, Ital-IA, Pisa, Italy(CEUR Workshop Proceedings, Vol. 3486)."},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2023.102210"},{"key":"e_1_3_3_30_2","first-page":"76","volume-title":"Proceedings of the International Conference on Cooperative Information Systems","author":"Sadeghianasl Sareh","year":"2019","unstructured":"Sareh Sadeghianasl, Arthur H. M. 
ter Hofstede, Moe T. Wynn, and Suriadi Suriadi. 2019. A contextual approach to detecting synonymous and polluted activity labels in process event logs. In Proceedings of the International Conference on Cooperative Information Systems. 76\u201394."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2023.102246"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2023.102246"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2016.07.011"},{"issue":"3","key":"e_1_3_3_34_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3613247","article-title":"Process-data quality: The true frontier of process mining","volume":"15","author":"Hofstede Arthur H. M. ter","year":"2023","unstructured":"Arthur H. M. ter Hofstede, Agnes Koschmider, Andrea Marrella, Robert Andrews, Dominik A. Fischer, Sareh Sadeghianasl, Moe Thandar Wynn, Marco Comuzzi, Jochen De Weerdt, Kanika Goel, et\u00a0al. 2023. Process-data quality: The true frontier of process mining. ACM Journal of Data and Information Quality 15, 3 (2023), 1\u201321.","journal-title":"ACM Journal of Data and Information Quality"},{"key":"e_1_3_3_35_2","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1007\/978-3-642-28108-2_19","volume-title":"Proceedings of the Business Process Management Workshops: BPM 2011 International Workshops, Clermont-Ferrand, France, August 29, 2011, Revised Selected Papers, Part I 9","author":"Aalst Wil van Der","year":"2012","unstructured":"Wil van Der Aalst, Arya Adriansyah, Ana Karla Alves De Medeiros, Franco Arcieri, Thomas Baier, Tobias Blickle, Jagadeesh Chandra Bose, Peter Van Den Brand, Ronald Brandtjen, Joos Buijs, et\u00a0al. 2012. Process mining manifesto. In Proceedings of the Business Process Management Workshops: BPM 2011 International Workshops, Clermont-Ferrand, France, August 29, 2011, Revised Selected Papers, Part I 9. 
Springer, 169\u2013194."},{"key":"e_1_3_3_36_2","volume-title":"Process Mining - Data Science in Action, Second Edition","author":"Aalst Wil M. P. van der","year":"2016","unstructured":"Wil M. P. van der Aalst. 2016. Process Mining - Data Science in Action, Second Edition. Springer, Cham."},{"key":"e_1_3_3_37_2","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1007\/978-3-030-26619-6_2","volume-title":"Proceedings of the Business Process Management: 17th International Conference, BPM 2019, Vienna, Austria, September 1\u20136, 2019","author":"Wynn Moe Thandar","year":"2019","unstructured":"Moe Thandar Wynn and Shazia Sadiq. 2019. Responsible process mining-a data quality perspective. In Proceedings of the Business Process Management: 17th International Conference, BPM 2019, Vienna, Austria, September 1\u20136, 2019. Springer, 10\u201315."},{"key":"e_1_3_3_38_2","first-page":"130","volume-title":"Proceedings of the International Conference on Advanced Information Systems Engineering","author":"Zisgen Yorck","year":"2022","unstructured":"Yorck Zisgen, Dominik Janssen, and Agnes Koschmider. 2022. Generating synthetic sensor event logs for process mining. In Proceedings of the International Conference on Advanced Information Systems Engineering. 
Springer, 130\u2013137."}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3743144","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,28]],"date-time":"2025-06-28T11:13:47Z","timestamp":1751109227000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3743144"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,28]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3743144"],"URL":"https:\/\/doi.org\/10.1145\/3743144","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2025,6,28]]},"assertion":[{"value":"2024-09-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-06-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}