{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,18]],"date-time":"2026-05-18T10:44:28Z","timestamp":1779101068980,"version":"3.51.4"},"reference-count":50,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,6,9]],"date-time":"2022-06-09T00:00:00Z","timestamp":1654732800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Process mining aims to gain knowledge of business processes via the discovery of process models from event logs generated by information systems. The insights revealed from process mining heavily rely on the quality of the event logs. Activities extracted from different data sources or the free-text nature within the same system may lead to inconsistent labels. Such inconsistency would then lead to redundancy in activity labels, which refer to labels that have different syntax but share the same behaviours. Redundant activity labels can introduce unnecessary complexities to the event logs. The identification of these labels from data-driven process discovery are difficult and rely heavily on human intervention. Neither existing process discovery algorithms nor event data preprocessing techniques can solve such redundancy efficiently. In this paper, we propose a multi-view approach to automatically detect redundant activity labels by using not only context-aware features such as control\u2013flow relations and attribute values but also semantic features from the event logs. Our evaluation of several publicly available datasets and a real-life case study demonstrate that our approach can efficiently detect redundant activity labels even with low-occurrence frequencies. The proposed approach can add value to the preprocessing step to generate more representative event logs.<\/jats:p>","DOI":"10.3390\/fi14060181","type":"journal-article","created":{"date-parts":[[2022,6,10]],"date-time":"2022-06-10T00:22:39Z","timestamp":1654820559000},"page":"181","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["A Multi-View Framework to Detect Redundant Activity Labels for More Representative Event Logs in Process Mining"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1068-6408","authenticated-orcid":false,"given":"Qifan","family":"Chen","sequence":"first","affiliation":[{"name":"School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9002-8650","authenticated-orcid":false,"given":"Yang","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Charmaine S.","family":"Tam","sequence":"additional","affiliation":[{"name":"Centre for Translational Data Science and Northern Clinical School, The University of Sydney, Sydney, NSW 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2726-9109","authenticated-orcid":false,"given":"Simon K.","family":"Poon","sequence":"additional","affiliation":[{"name":"School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Van Der Aalst, W. (2016). Data science in action. Process Mining, Springer.","DOI":"10.1007\/978-3-662-49851-4"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Marin-Castro, H.M., and Tello-Leal, E. (2021). Event Log Preprocessing for Process Mining: A Review. Appl. Sci., 11.","DOI":"10.3390\/app112210556"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1016\/j.datak.2010.06.001","article-title":"Mining process models with prime invisible tasks","volume":"69","author":"Wen","year":"2010","journal-title":"Data Knowl. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Maggi, F.M., Bose, R., and van der Aalst, W.M. (2012, January 25\u201329). Efficient discovery of understandable declarative process models from event logs. Proceedings of the International Conference on Advanced Information Systems Engineering, Gdansk, Poland.","DOI":"10.1007\/978-3-642-31095-9_18"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Mans, R.S., Van der Aalst, W.M., and Vanwersch, R.J. (2015). Process Mining in Healthcare: Evaluating and Exploiting Operational Healthcare Processes, Springer.","DOI":"10.1007\/978-3-319-16071-9"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Weijters, A., and Ribeiro, J. (2011, January 11\u201315). Flexible heuristics miner (FHM). Proceedings of the 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Paris, France.","DOI":"10.1109\/CIDM.2011.5949453"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/s10115-018-1214-x","article-title":"Split miner: Automated discovery of accurate and simple business process models from event logs","volume":"59","author":"Augusto","year":"2019","journal-title":"Knowl. Inf. Syst."},{"key":"ref_8","unstructured":"Chen, Q., Lu, Y., Tam, C., and Poon, S. (2021, January 6\u201310). Process Mining to Discover and Preserve Infrequent Relations in Event Logs: An Application to Understand the Laboratory Test Ordering Process Using the MIMIC-III Dataset. Proceedings of the Australasian Conference on Information Systems (ACIS), Sydney, Australia."},{"key":"ref_9","unstructured":"Van Der Aalst, W., Adriansyah, A., De Medeiros, A.K.A., Arcieri, F., Baier, T., Blickle, T., Bose, J.C., Van Den Brand, P., Brandtjen, R., and Buijs, J. (September, January 30). Process mining manifesto. Proceedings of the International Conference on Business Process Management, Clermont-Ferrand, France."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/j.is.2016.07.011","article-title":"Event log imperfection patterns for process mining: Towards a systematic approach to cleaning event logs","volume":"64","author":"Suriadi","year":"2017","journal-title":"Inf. Syst."},{"key":"ref_11","unstructured":"Sadeghianasl, S., ter Hofstede, A.H., Wynn, M.T., and Suriadi, S. (2012, January 10\u201314). A contextual approach to detecting synonymous and polluted activity labels in process event logs. Proceedings of the OTM Confederated International Conferences On the Move to Meaningful Internet Systems, Rome, Italy."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.35","article-title":"MIMIC-III, a freely accessible critical care database","volume":"3","author":"Johnson","year":"2016","journal-title":"Sci. Data"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Sadeghianasl, S., ter Hofstede, A.H., Suriadi, S., and Turkay, S. (2020, January 5\u20138). Collaborative and interactive detection and repair of activity labels in process event logs. Proceedings of the 2020 2nd International Conference on Process Mining (ICPM), Padua, Italy.","DOI":"10.1109\/ICPM49681.2020.00017"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"165865","DOI":"10.1109\/ACCESS.2021.3134915","article-title":"Process Activity Ontology Learning From Event Logs Through Gamification","volume":"9","author":"Sadeghianasl","year":"2021","journal-title":"IEEE Access"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lu, Y., Chen, Q., and Poon, S.K. (2022). A Deep Learning Approach for Repairing Missing Activity Labels in Event Logs for Process Mining. Information, 13.","DOI":"10.3390\/info13050234"},{"key":"ref_16","first-page":"40","article-title":"Disco: Discover Your Processes","volume":"940","author":"Rozinat","year":"2012","journal-title":"BPM (Demos)"},{"key":"ref_17","unstructured":"Mannhardt, F., and Blinde, D. (2017). Analyzing the Trajectories of Patients with Sepsis Using Process Mining, RADAR+ EMISA@ CAiSE."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Tam, C.S., Gullick, J., Saavedra, A., Vernon, S.T., Figtree, G.A., Chow, C.K., Cretikos, M., Morris, R.W., William, M., and Morris, J. (2021). Combining structured and unstructured data in EMRs to create clinically-defined EMR-derived cohorts. BMC Med Inform. Decis. Mak., 21.","DOI":"10.1186\/s12911-021-01441-w"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.1109\/TKDE.2004.47","article-title":"Workflow mining: Discovering process models from event logs","volume":"16","author":"Weijters","year":"2004","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/s10618-007-0065-y","article-title":"Mining process models with non-free-choice constructs","volume":"15","author":"Wen","year":"2007","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Leemans, S.J., Fahland, D., and van der Aalst, W.M. (2013, January 26\u201330). Discovering block-structured process models from event logs containing infrequent behaviour. Proceedings of the International Conference on Business Process Management, Beijing, China.","DOI":"10.1007\/978-3-319-06257-0_6"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1016\/j.is.2015.07.004","article-title":"BPMN Miner: Automated discovery of BPMN process models with hierarchical structure","volume":"56","author":"Conforti","year":"2016","journal-title":"Inf. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Buijs, J.C., Van Dongen, B.F., and van Der Aalst, W.M. (2012, January 10\u201314). On the role of fitness, precision, generalization and simplicity in process discovery. Proceedings of the OTM Confederated International Conferences On the Move to Meaningful Internet Systems, Rome, Italy.","DOI":"10.1007\/978-3-642-33606-5_19"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Fox, F., Aggarwal, V.R., Whelton, H., and Johnson, O. (2018, January 4\u20137). A data quality framework for process mining of electronic health record data. Proceedings of the 2018 IEEE International Conference on Healthcare Informatics (ICHI), New York, NY, USA.","DOI":"10.1109\/ICHI.2018.00009"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Mans, R.S., van der Aalst, W.M., Vanwersch, R.J., and Moleman, A.J. (2012). Process mining in healthcare: Data challenges when answering frequently posed questions. Process Support and Knowledge Representation in Health Care, Springer.","DOI":"10.1007\/978-3-642-36438-9_10"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bose, R.J.C., Mans, R.S., and van der Aalst, W.M. (2013, January 16\u201319). Wanna improve process mining results?. Proceedings of the 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), Singapore.","DOI":"10.1109\/CIDM.2013.6597227"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2229156.2229157","article-title":"Process mining: Overview and opportunities","volume":"3","year":"2012","journal-title":"ACM Trans. Manag. Inf. Syst. (TMIS)"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Conforti, R., La Rosa, M., Ter Hofstede, A.H., and Augusto, A. (2020, January 13\u201318). Automatic repair of same-timestamp errors in business process event logs. Proceedings of the International Conference on Business Process Management, Seville, Spain.","DOI":"10.1007\/978-3-030-58666-9_19"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sim, S., Bae, H., and Choi, Y. (2019, January 24\u201326). Likelihood-based multiple imputation by event chain methodology for repair of imperfect event logs with missing data. Proceedings of the 2019 International Conference on Process Mining (ICPM), Aachen, Germany.","DOI":"10.1109\/ICPM.2019.00013"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Alharbi, A., Bulpitt, A., and Johnson, O. (2017, January 10\u201315). Improving pattern detection in healthcare process mining using an interval-based event selection method. Proceedings of the International Conference on Business Process Management, Barcelona, Spain.","DOI":"10.1007\/978-3-319-65015-9_6"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"van der Aa, H., Gal, A., Leopold, H., Reijers, H.A., Sagi, T., and Shraga, R. (2017, January 12\u201316). Instance-based process matching using event-log information. Proceedings of the International Conference on Advanced Information Systems Engineering, Essen, Germany.","DOI":"10.1007\/978-3-319-59536-8_18"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Klinkm\u00fcller, C., Weber, I., Mendling, J., Leopold, H., and Ludwig, A. (2013). Increasing recall of process model matching by improved\nactivity label matching. Business Process Management, Springer.","DOI":"10.1007\/978-3-642-40176-3_17"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1016\/j.is.2010.09.006","article-title":"Similarity of business process models: Metrics and evaluation","volume":"36","author":"Dijkman","year":"2011","journal-title":"Inf. Syst."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Richter, F., Zellner, L., Azaiz, I., Winkel, D., and Seidl, T. (2019, January 1\u20136). LIProMa: Label-independent process matching. Proceedings of the International Conference on Business Process Management, Vienna, Austria.","DOI":"10.1007\/978-3-030-37453-2_16"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Koschmider, A., Ullrich, M., Heine, A., and Oberweis, A. (2015). Revising the Vocabulary of Business Process Element Labels. International Conference on Advanced Information Systems Engineering, Springer.","DOI":"10.1007\/978-3-319-19069-3_5"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1016\/j.is.2009.03.009","article-title":"Activity labeling in process modeling: Empirical insights and recommendations","volume":"35","author":"Mendling","year":"2010","journal-title":"Inf. Syst."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1023\/A:1026543900054","article-title":"The earth mover\u2019s distance as a metric for image retrieval","volume":"40","author":"Rubner","year":"2000","journal-title":"Int. J. Comput. Vis."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Assent, I., Wenning, A., and Seidl, T. (2006, January 3\u20137). Approximation techniques for indexing the earth mover\u2019s distance in multimedia databases. Proceedings of the 22nd International Conference on Data Engineering (ICDE\u201906), Atlanta, GA, USA.","DOI":"10.1109\/ICDE.2006.25"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, M., Liu, Y., Luan, H., Sun, M., Izuha, T., and Hao, J. (2016, January 12\u201317). Building earth mover\u2019s distance on bilingual word embeddings for machine translation. Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10351"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Brockhoff, T., Uysal, M.S., and van der Aalst, W.M. (2020, January 5\u20138). Time-aware Concept Drift Detection Using the Earth Mover\u2019s Distance. Proceedings of the 2020 2nd International Conference on Process Mining (ICPM), Padua, Italy.","DOI":"10.1109\/ICPM49681.2020.00016"},{"key":"ref_41","unstructured":"Guo, Q., Wen, L., Wang, J., Yan, Z., and Philip, S.Y. (2016, January 18\u201322). Mining invisible tasks in non-free-choice constructs. Proceedings of the International Conference on Business Process Management, Rio de Janeiro, Brazil."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1007\/BF02289588","article-title":"Hierarchical clustering schemes","volume":"32","author":"Johnson","year":"1967","journal-title":"Psychometrika"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1145\/191843.191925","article-title":"Fast subsequence matching in time-series databases","volume":"23","author":"Faloutsos","year":"1994","journal-title":"Acm Sigmod Rec."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1080\/01621459.1926.10502161","article-title":"The choice of a class interval","volume":"21","author":"Sturges","year":"1926","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_45","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Sov. Phys. Dokl."},{"key":"ref_46","first-page":"411","article-title":"spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing","volume":"7","author":"Honnibal","year":"2017","journal-title":"Appear"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.inffus.2004.04.008","article-title":"Classifier selection for majority voting","volume":"6","author":"Ruta","year":"2005","journal-title":"Inf. Fusion"},{"key":"ref_48","unstructured":"Berti, A., Van Zelst, S.J., and van der Aalst, W. (2019). Process mining for python (PM4Py): Bridging the gap between process-and data science. arXiv."},{"key":"ref_49","first-page":"232","article-title":"Acute coronary syndrome: Current treatment","volume":"95","author":"Switaj","year":"2017","journal-title":"Am. Fam. Physician"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"128","DOI":"10.5694\/mja16.00368","article-title":"National Heart Foundation of Australia and Cardiac Society of Australia and New Zealand: Australian clinical guidelines for the management of acute coronary syndromes 2016","volume":"205","author":"Chew","year":"2016","journal-title":"Med. J. Aust."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/6\/181\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:27:22Z","timestamp":1760138842000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/6\/181"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,9]]},"references-count":50,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["fi14060181"],"URL":"https:\/\/doi.org\/10.3390\/fi14060181","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,9]]}}}