{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T14:36:50Z","timestamp":1774449410156,"version":"3.50.1"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T00:00:00Z","timestamp":1692662400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>In the big data domain, data quality assessment operations are often complex and must be implementable in a distributed and timely manner. This article tries to generalize the quality assessment operations by providing a new ISO-based declarative data quality assessment framework (BIGQA). BIGQA is a flexible solution that supports data quality assessment in different domains and contexts. It facilitates the planning and execution of big data quality assessment operations for data domain experts and data management specialists at any phase in the data life cycle. This work implements BIGQA to demonstrate its ability to produce customized data quality reports while running efficiently on parallel or distributed computing frameworks. BIGQA generates data quality assessment plans using straightforward operators designed to handle big data and guarantee a high degree of parallelism when executed. Moreover, it allows incremental data quality assessment to avoid reading the whole dataset each time the quality assessment operation is required. The result was validated using radiation wireless sensor data and Stack Overflow users\u2019 data to show that it can be implemented within different contexts. 
The experiments show a 71% performance improvement over a 1 GB flat file on a single processing machine compared with a non-parallel application and a 75% performance improvement over a 25 GB flat file within a distributed environment compared to a non-distributed application.<\/jats:p>","DOI":"10.1145\/3603706","type":"journal-article","created":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T02:39:31Z","timestamp":1686710371000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":14,"title":["BIGQA: Declarative Big Data Quality Assessment"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1160-5980","authenticated-orcid":false,"given":"Hadi","family":"Fadlallah","sequence":"first","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5710-6901","authenticated-orcid":false,"given":"Rima","family":"Kilany","sequence":"additional","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7476-4754","authenticated-orcid":false,"given":"Houssein","family":"Dhayne","sequence":"additional","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6285-0279","authenticated-orcid":false,"given":"Rami","family":"El Haddad","sequence":"additional","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5705-3427","authenticated-orcid":false,"given":"Rafiqul","family":"Haque","sequence":"additional","affiliation":[{"name":"Intelligencia R & D, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8706-8889","authenticated-orcid":false,"given":"Yehia","family":"Taher","sequence":"additional","affiliation":[{"name":"University of Versailles Saint-Quentin-en-Yvelines (UVSQ), 
France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6976-133X","authenticated-orcid":false,"given":"Ali","family":"Jaber","sequence":"additional","affiliation":[{"name":"Lebanese University, Lebanon"}]}],"member":"320","published-online":{"date-parts":[[2023,8,22]]},"reference":[{"key":"e_1_3_2_2_2","first-page":"104","volume-title":"Proceedings of the International Conference on Software Development (SWDC-REK\u201905)","author":"Abran Alain","year":"2005","unstructured":"Alain Abran, Rafa E. Al-Qutaish, Jean-Marc Desharnais, and Naji Habra. 2005. An information model for software quality measurement with ISO standards. In Proceedings of the International Conference on Software Development (SWDC-REK\u201905), Reykjavik, 104\u2013116."},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"548","DOI":"10.1016\/j.future.2018.07.014","article-title":"Context-aware data quality assessment for big data","volume":"89","author":"Ardagna D.","year":"2018","unstructured":"D. Ardagna, C. Cappiello, Walter Sam\u00e1, and M. Vitali. 2018. Context-aware data quality assessment for big data. Future Gener. Comput. Syst. 89 (2018), 548\u2013562.","journal-title":"Future Gener. Comput. Syst."},{"key":"e_1_3_2_4_2","volume-title":"MDDAUI","author":"Barzdins J.","year":"2007","unstructured":"J. Barzdins, A. Zarins, Karlis Cerans, A. Kalnins, Edgars Rencis, L. Lace, Renars Liepins, and A. Sprogis. 2007. GrTP: Transformation based graphical tool building platform. In MDDAUI."},{"key":"e_1_3_2_5_2","doi-asserted-by":"crossref","first-page":"60","DOI":"10.4018\/JDM.2015010103","article-title":"From data quality to big data quality","volume":"26","author":"Batini C.","year":"2015","unstructured":"C. Batini, A. Rula, M. Scannapieco, and G. Viscusi. 2015. From data quality to big data quality. J. Database Manag. 26, 1 (2015), 60\u201382.","journal-title":"J. 
Database Manag."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","first-page":"695","DOI":"10.14778\/3402707.3402710","article-title":"Generic schema matching, ten years later","volume":"4","author":"Bernstein Philip A.","year":"2011","unstructured":"Philip A. Bernstein, Jayant Madhavan, and Erhard Rahm. 2011. Generic schema matching, ten years later. Proceedings of the VLDB Endowment 4, 11 (2011), 695\u2013701.","journal-title":"Proceedings of the VLDB Endowment"},{"key":"e_1_3_2_7_2","unstructured":"L. Bertossi. 2017. Some Declarative Approaches to Data Quality. Retrieved February 3 2021 from http:\/\/people.scs.carleton.ca\/bertossi\/talks\/tutBicod17.pdf."},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","unstructured":"L. Bertossi and L. Bravo. 2013. Generic and declarative approaches to data quality management. Handbook of Data Quality: Research and Practice 181\u2013211.","DOI":"10.1007\/978-3-642-36257-6_9"},{"key":"e_1_3_2_9_2","first-page":"999","article-title":"Domain-specific characteristics of data quality","author":"Bicevska Zane","year":"2017","unstructured":"Zane Bicevska, Janis Bicevskis, and Ivo Oditis. 2017. Domain-specific characteristics of data quality. In 2017 Federated Conference on Computer Science and Information Systems (FedCSIS\u201917)999\u20131003.","journal-title":"2017 Federated Conference on Computer Science and Information Systems (FedCSIS\u201917)"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Zane Bicevska J. Bicevskis and Ivo Oditis. 2017. Models of data quality. In Information Technology for Management. Ongoing Research and Development: 15th Conference AITM 2017 and 12th Conference (ISM\u201917 Held as Part of FedCSIS Prague Czech Republic September 3-6 2017) Extended Selected Papers 15. 
Springer 194\u2013211.","DOI":"10.1007\/978-3-319-77721-4_11"},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1016\/j.procs.2017.01.087","article-title":"Executable data quality models","volume":"104","author":"Bicevskis J.","year":"2017","unstructured":"J. Bicevskis, Zane Bicevska, and G. Karnitis. 2017. Executable data quality models. Procedia Computer Science 104 (2017), 138\u2013145.","journal-title":"Procedia Computer Science"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1109\/SNAMS.2018.8554915","article-title":"An approach to data quality evaluation","author":"Bicevskis J.","year":"2018","unstructured":"J. Bicevskis, Zane Bicevska, A. Nikiforova, and Ivo Oditis. 2018. An approach to data quality evaluation. In 2018 5th International Conference on Social Networks Analysis, Management and Security (SNAMS\u201918)196\u2013201.","journal-title":"2018 5th International Conference on Social Networks Analysis, Management and Security (SNAMS\u201918)"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"Amit Chandel Oktie Hassanzadeh Nick Koudas Mohammad Sadoghi and Divesh Srivastava. 2007. Benchmarking declarative approximate selection predicates. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data 353\u2013364.","DOI":"10.1145\/1247480.1247521"},{"key":"e_1_3_2_14_2","unstructured":"Roger Clarke. 2014. Quality Factors in Big Data and Big Data Analytics. Retrieved December 30 2019 from http:\/\/www.rogerclarke.com\/EC\/BDQF.html."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","DOI":"10.1145\/2623330.2630811","article-title":"Sampling for big data: A tutorial","author":"Cormode Graham","year":"2014","unstructured":"Graham Cormode and Nick G. Duffield. 2014. Sampling for big data: A tutorial. 
In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1975\u20131975.","journal-title":"Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining"},{"key":"e_1_3_2_16_2","unstructured":"Microsoft Corporation. 2018. SQL Server Integration Services. Retrieved August 25 2020 from https:\/\/docs.microsoft.com\/en-us\/sql\/integration-services\/sql-server-integration-services?view=sql-server-ver15."},{"key":"e_1_3_2_17_2","volume-title":"Comprehensive Data Quality with Oracle Data Integrator and Oracle Enterprise Data Quality [White Paper]","author":"Corporation Oracle","year":"2013","unstructured":"Oracle Corporation. 2013. Comprehensive Data Quality with Oracle Data Integrator and Oracle Enterprise Data Quality [White Paper]. Technical Report. Oracle Corporation. Retrieved August 1, 2020 from https:\/\/www.oracle.com\/technetwork\/middleware\/data-integrator\/overview\/oracledi-comprehensive-quality-131748.pdf."},{"key":"e_1_3_2_18_2","volume-title":"Data Quality Evaluation in Data Integration Systems","author":"Costabel Peralta","year":"2006","unstructured":"Peralta Costabel and V. Carmen. 2006. Data Quality Evaluation in Data Integration Systems. Ph. D. Dissertation. Universit\u00e9 de Versailles-Saint Quentin en Yvelines; Universit\u00e9 de la R\u00e9publique d\u00daruguay."},{"key":"e_1_3_2_19_2","volume-title":"BDCSIntell","author":"Fadlallah Hadi","year":"2019","unstructured":"Hadi Fadlallah, Yehia Taher, Rafiqul Haque, and Ali H. Jaber. 2019. ORADIEX: A big data driven smart framework for real-time surveillance and analysis of individual exposure to radioactive pollution. In BDCSIntell, 52\u201356."},{"key":"e_1_3_2_20_2","volume-title":"BDCSIntell","author":"Fadlallah Hadi","year":"2018","unstructured":"Hadi Fadlallah, Yehia Taher, and Ali H. Jaber. 2018. RaDEn: A scalable and efficient radiation data engineering. 
In BDCSIntell, 89\u201393."},{"key":"e_1_3_2_21_2","doi-asserted-by":"crossref","unstructured":"Jenny Rose Finkel Trond Grenager and Christopher D. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL\u201905) . 363\u2013370.","DOI":"10.3115\/1219840.1219885"},{"key":"e_1_3_2_22_2","volume-title":"Mastering Regular Expressions (3rd ed.)","author":"Friedl Jeffrey E. F.","year":"2006","unstructured":"Jeffrey E. F. Friedl. 2006. Mastering Regular Expressions (3rd ed.). O\u2019Reilly Media, Inc., Sebastopol, CA."},{"key":"e_1_3_2_23_2","volume-title":"VLDB","author":"Galhardas H.","year":"2001","unstructured":"H. Galhardas, D. Florescu, D. Shasha, E. Simon, and Cristian-Augustin Saita. 2001. Declarative data cleaning: Language, model, and algorithms. In VLDB."},{"key":"e_1_3_2_24_2","first-page":"433","article-title":"Big data validation and quality assurance \u2013 Issues, challenges, and needs","author":"Gao Jerry Zeyu","year":"2016","unstructured":"Jerry Zeyu Gao, Chunli Xie, and Chuanqi Tao. 2016. Big data validation and quality assurance \u2013 Issues, challenges, and needs. In 2016 IEEE Symposium on Service-Oriented System Engineering (SOSE\u201916)433\u2013441.","journal-title":"2016 IEEE Symposium on Service-Oriented System Engineering (SOSE\u201916)"},{"key":"e_1_3_2_25_2","unstructured":"Gartner. 2017. How to Create a Business Case for Data Quality Improvement. Retrieved May 1 2021 from https:\/\/www.gartner.com\/smarterwithgartner\/how-to-create-a-business-case-for-data-quality-improvement\/."},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"Mouzhi Ge and Markus Helfert. 2007. A review of information quality research-develop a research agenda. In The International Conference on Information Quality (ICIQ\u201907) . 
Citeseer 76\u201391.","DOI":"10.1049\/cp:20070800"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","DOI":"10.1145\/2939672.2939754","article-title":"node2vec: Scalable feature learning for networks","author":"Grover Aditya","year":"2016","unstructured":"Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855\u2013864.","journal-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1016\/j.jpdc.2021.05.012","article-title":"SparkDQ: Efficient generic big data quality management on distributed data-parallel computation","volume":"156","author":"Gu Rong","year":"2021","unstructured":"Rong Gu, Yang Qi, Tongyu Wu, Zhaokang Wang, Xiaolong Xu, C. Yuan, and Yihua Huang. 2021. SparkDQ: Efficient generic big data quality management on distributed data-parallel computation. J. Parallel Distributed Comput. 156 (2021), 132\u2013147.","journal-title":"J. Parallel Distributed Comput."},{"issue":"1","key":"e_1_3_2_29_2","first-page":"1","article-title":"Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations","volume":"10","author":"Gudivada Venkat","year":"2017","unstructured":"Venkat Gudivada, Amy Apon, and Junhua Ding. 2017. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. International Journal on Advances in Software 10, 1 (2017), 1\u201320.","journal-title":"International Journal on Advances in Software"},{"key":"e_1_3_2_30_2","volume-title":"The 2nd International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA\u201916)","author":"Gudivada Venkat N.","year":"2016","unstructured":"Venkat N. Gudivada, Dhana Rao, and William I. Grosky. 2016. 
Data quality centric application framework for big data. In The 2nd International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA\u201916), 33."},{"key":"e_1_3_2_31_2","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.fss.2014.01.016","article-title":"Parallel sampling from big data with uncertainty distribution","volume":"258","author":"He Qing","year":"2015","unstructured":"Qing He, Haocheng Wang, Fuzhen Zhuang, Tianfeng Shang, and Zhongzhi Shi. 2015. Parallel sampling from big data with uncertainty distribution. Fuzzy Sets Syst. 258 (2015), 117\u2013133.","journal-title":"Fuzzy Sets Syst."},{"key":"e_1_3_2_32_2","first-page":"187","article-title":"A context aware information quality framework","author":"Helfert Markus","year":"2009","unstructured":"Markus Helfert and Owen Foley. 2009. A context aware information quality framework. 2009 4th International Conference on Cooperation and Promotion of Information Resources in Science and Technology. 187\u2013193.","journal-title":"2009 4th International Conference on Cooperation and Promotion of Information Resources in Science and Technology"},{"key":"e_1_3_2_33_2","volume-title":"CAiSE","author":"Herschel Melanie","year":"2007","unstructured":"Melanie Herschel and I. Manolescu. 2007. Declarative XML data cleaning with XClean. In CAiSE."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","unstructured":"Kasra Hosseini Federico Nanni and Mariona Coll Ardanuy. 2020. DeezyMatch: A flexible deep learning approach to fuzzy string matching. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations . 62\u201369.","DOI":"10.18653\/v1\/2020.emnlp-demos.9"},{"key":"e_1_3_2_35_2","unstructured":"IBM. 2020. The Four V\u2019s of Big Data. Retrieved May 1 2021 from http:\/\/www.ibmbigdatahub.com\/infographic\/four-vs-big-data. 
Accessed May 1 2021."},{"key":"e_1_3_2_36_2","volume-title":"Informatica Data Quality Data Sheet","year":"2018","unstructured":"Informatica. 2018. Informatica Data Quality Data Sheet. Technical Report. Informatica. Retrieved August 25, 2020 from https:\/\/www.informatica.com\/content\/dam\/informatica-com\/en\/collateral\/data-sheet\/en_informatica-data-quality_data-sheet_6710.pdf."},{"key":"e_1_3_2_37_2","volume-title":"25012:2008 Software Engineering\u2013Software Product Quality Requirements and Evaluation (SQuaRE)\u2014Data Quality Model","year":"2008","unstructured":"ISO\/IEC. 2008. 25012:2008 Software Engineering\u2013Software Product Quality Requirements and Evaluation (SQuaRE)\u2014Data Quality Model. Standard. ISO\/IEC."},{"key":"e_1_3_2_38_2","volume-title":"ISO\/IEC 25021:2012 Systems and Software Engineering\u2013Systems and Software Quality Requirements and Evaluation (SQuaRE) \u2013Quality Measure Elements","year":"2012","unstructured":"ISO\/IEC. 2012. ISO\/IEC 25021:2012 Systems and Software Engineering\u2013Systems and Software Quality Requirements and Evaluation (SQuaRE) \u2013Quality Measure Elements. Standard. ISO\/IEC."},{"key":"e_1_3_2_39_2","volume-title":"ISO\/IEC 25000:2014. Systems and Software Engineering \u2013 System and Software Quality Requirements and Evaluation (SQuaRE) \u2013 Guide to SQuaRE","year":"2014","unstructured":"ISO\/IEC. 2014. ISO\/IEC 25000:2014. Systems and Software Engineering \u2013 System and Software Quality Requirements and Evaluation (SQuaRE) \u2013 Guide to SQuaRE. Standard. ISO\/IEC."},{"key":"e_1_3_2_40_2","volume-title":"ISO\/IEC 25024:2015 Systems and Software Engineering\u2013Systems and Software Quality Requirements and Evaluation (SQuaRE)\u2013Measurement of Data Quality","year":"2015","unstructured":"ISO\/IEC. 2015. ISO\/IEC 25024:2015 Systems and Software Engineering\u2013Systems and Software Quality Requirements and Evaluation (SQuaRE)\u2013Measurement of Data Quality. Standard. 
ISO\/IEC."},{"key":"e_1_3_2_41_2","volume-title":"ISO\/IEC 15939:2017 Systems and Software Engineering\u2013Measurement Process","year":"2017","unstructured":"ISO\/IEC. 2017. ISO\/IEC 15939:2017 Systems and Software Engineering\u2013Measurement Process. Standard. ISO\/IEC."},{"key":"e_1_3_2_42_2","volume-title":"ISO\/IEC 20547-3:2020 Big Data Reference Architecture - Part 3: Reference Architecture","year":"2020","unstructured":"ISO\/IEC. 2020. ISO\/IEC 20547-3:2020 Big Data Reference Architecture - Part 3: Reference Architecture. Standard. ISO\/IEC."},{"key":"e_1_3_2_43_2","doi-asserted-by":"crossref","unstructured":"Shawn R. Jeffery Gustavo Alonso Michael J. Franklin Wei Hong and Jennifer Widom. 2006. Declarative support for sensor data cleaning. In Pervasive Computing: 4th International Conference (PERVASIVE\u201906 Dublin Ireland May 7-10 2006) . Proceedings 4. Springer 83\u2013100.","DOI":"10.1007\/11748625_6"},{"key":"e_1_3_2_44_2","doi-asserted-by":"crossref","unstructured":"Zuhair Khayyat Ihab F. Ilyas Alekh Jindal S. Madden M. Ouzzani Paolo Papotti Jorge-Arnulfo Quian\u00e9-Ruiz Nan Tang and Si Yin. 2015. BigDansing: A system for big data cleansing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data . 1215\u20131230.","DOI":"10.1145\/2723372.2747646"},{"key":"e_1_3_2_45_2","article-title":"Sampling techniques for big data analysis","author":"Kim Jae Kwang","year":"2019","unstructured":"Jae Kwang Kim and Zhonglei Wang. 2019. Sampling techniques for big data analysis. International Statistical Review 87 (2019), S177\u2013S191.","journal-title":"International Statistical Review"},{"key":"e_1_3_2_46_2","doi-asserted-by":"crossref","unstructured":"Won Y. Kim Byoungju Choi E. Hong S. Kim and D. Lee. 2003. A taxonomy of dirty data. 
Data Mining and Knowledge Discovery 7 (2003) 81\u201399.","DOI":"10.1023\/A:1021564703268"},{"key":"e_1_3_2_47_2","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/S0378-7206(00)00060-4","article-title":"Quality metrics for intranet applications","volume":"38","author":"Leung H.","year":"2001","unstructured":"H. Leung. 2001. Quality metrics for intranet applications. Inf. Manag. 38, 3 (2001), 137\u2013152.","journal-title":"Inf. Manag."},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.future.2015.11.024","article-title":"A data quality in use model for big data","volume":"63","author":"Merino Jorge","year":"2016","unstructured":"Jorge Merino, I. Caballero, Bibiano Rivas, M. Serrano, and M. Piattini. 2016. A data quality in use model for big data. Future Gener. Comput. Syst. 63 (2016), 123\u2013130.","journal-title":"Future Gener. Comput. Syst."},{"key":"e_1_3_2_49_2","unstructured":"Microsoft. 2012. Introduction to Data Quality Services. Retrieved March 8 2021 from https:\/\/docs.microsoft.com\/en-us\/sql\/data-quality-services\/introduction-to-data-quality-services?view=sql-server-ver15."},{"key":"e_1_3_2_50_2","unstructured":"Heiko M\u00fcller and Johann Christoph Freytag. 2005. Problems Methods and Challenges in Comprehensive Data Cleansing . Professoren des Inst. F\u00fcr Informatik."},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","unstructured":"A. Nikiforova and J. Bicevskis. 2019. An extended data object-driven approach to data quality evaluation: Contextual data quality analysis. In ICEIS (1) . 274\u2013281.","DOI":"10.5220\/0007838602740281"},{"key":"e_1_3_2_52_2","first-page":"107","article-title":"User-oriented approach to data quality evaluation","volume":"26","author":"Nikiforova A.","year":"2020","unstructured":"A. Nikiforova, J. Bicevskis, Zane Bicevska, and Ivo Oditis. 2020. User-oriented approach to data quality evaluation. J. UCS 26, 1 (2020), 107\u2013126.","journal-title":"J. 
UCS"},{"key":"e_1_3_2_53_2","volume-title":"ICIQ","author":"Oliveira Paulo","year":"2005","unstructured":"Paulo Oliveira, F\u00e1tima Rodrigues, and P. Henriques. 2005. A formal definition of data quality problems. In ICIQ."},{"key":"e_1_3_2_54_2","first-page":"219","volume-title":"2nd Int. Workshop on Data and Information Quality","author":"Oliveira Paulo","year":"2005","unstructured":"Paulo Oliveira, F\u00e1tima Rodrigues, Pedro Henriques, and Helena Galhardas. 2005. A taxonomy of data quality problems. In 2nd Int. Workshop on Data and Information Quality. 219\u2013233."},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/978-3-642-16518-4_1","volume-title":"Schema Matching and Mapping","author":"Rahm Erhard","year":"2011","unstructured":"Erhard Rahm. 2011. Towards large-scale schema and ontology matching. In Schema Matching and Mapping. Springer, 3\u201327."},{"key":"e_1_3_2_56_2","first-page":"3","article-title":"Data cleaning: Problems and current approaches","volume":"23","author":"Rahm E.","year":"2000","unstructured":"E. Rahm and H. Do. 2000. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull. 23, 4 (2000), 3\u201313.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_3_2_57_2","first-page":"1","volume-title":"Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics","author":"Ritze Dominique","year":"2015","unstructured":"Dominique Ritze, Oliver Lehmberg, and Christian Bizer. 2015. Matching html tables to dbpedia. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics. 1\u20136."},{"issue":"5","key":"e_1_3_2_58_2","doi-asserted-by":"crossref","first-page":"532","DOI":"10.21817\/indjcse\/2020\/v11i5\/201105116","article-title":"Sampling based join-aggregate query processing technique for big data","volume":"11","author":"Sadineni Praveen Kumar","year":"2020","unstructured":"Praveen Kumar Sadineni. 2020. 
Sampling based join-aggregate query processing technique for big data. Indian Journal of Computer Science and Engineering 11, 5 (2020), 532\u2013546.","journal-title":"Indian Journal of Computer Science and Engineering"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","first-page":"1294","DOI":"10.1109\/ICDE.2014.6816764","article-title":"Data quality: The other face of Big Data","author":"Saha B.","year":"2014","unstructured":"B. Saha and D. Srivastava. 2014. Data quality: The other face of Big Data. In 2014 IEEE 30th International Conference on Data Engineering. 1294\u20131297.","journal-title":"2014 IEEE 30th International Conference on Data Engineering"},{"issue":"3","key":"e_1_3_2_60_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3157734","article-title":"Efficient parallel random sampling\u2014Vectorized, cache-efficient, and online","volume":"44","author":"Sanders Peter","year":"2018","unstructured":"Peter Sanders, Sebastian Lamm, Lorenz H\u00fcbschle-Schneider, Emanuel Schrade, and Carsten Dachsbacher. 2018. Efficient parallel random sampling\u2014Vectorized, cache-efficient, and online. ACM Transactions on Mathematical Software (TOMS) 44, 3 (2018), 1\u201314.","journal-title":"ACM Transactions on Mathematical Software (TOMS)"},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","first-page":"1781","DOI":"10.14778\/3229863.3229867","article-title":"Automating large-scale data quality verification","volume":"11","author":"Schelter Sebastian","year":"2018","unstructured":"Sebastian Schelter, Dustin Lange, Philipp Schmidt, Meltem Celikel, Felix Biessmann, and Andreas Grafberger. 2018. Automating large-scale data quality verification. Proc. VLDB Endow. 11, 12 (2018), 1781\u20131794.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_3_2_62_2","unstructured":"Calidad Software. 2022. ISO\/IEC 25012. 
Retrieved March 22 2020 from https:\/\/iso25000.com\/index.php\/en\/iso-25000-standards\/iso-25012."},{"key":"e_1_3_2_63_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/253769.253804","article-title":"Data quality in context","volume":"40","author":"Strong D.","year":"1997","unstructured":"D. Strong, Y. Lee, and R. Wang. 1997. Data quality in context. Commun. ACM 40, 5 (1997), 103\u2013110.","journal-title":"Commun. ACM"},{"key":"e_1_3_2_64_2","doi-asserted-by":"crossref","unstructured":"Y. Taher Rafiqul Haque Mohammed AlShaer W. Heuvel Mohand-Said Hacid and M. Dbouk. 2016. A context-aware analytics for processing tweets and analysing sentiment in realtime (short paper). In On the Move to Meaningful Internet Systems: OTM 2016 Conferences: Confederated International Conferences: CoopIS C&TC and ODBASE 2016 Rhodes Greece October 24-28 2016 Proceedings . Springer 910\u2013917.","DOI":"10.1007\/978-3-319-48472-3_57"},{"key":"e_1_3_2_65_2","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1109\/BigDataCongress.2015.35","article-title":"Big data pre-processing: A quality framework","author":"Taleb Ikbal","year":"2015","unstructured":"Ikbal Taleb, Rachida Dssouli, and Mohamed Adel Serhani. 2015. Big data pre-processing: A quality framework. In 2015 IEEE International Congress on Big Data. 191\u2013198.","journal-title":"2015 IEEE International Congress on Big Data"},{"key":"e_1_3_2_66_2","first-page":"1","article-title":"Big data quality framework: A holistic approach to continuous quality management","volume":"8","author":"Taleb Ikbal","year":"2021","unstructured":"Ikbal Taleb, Mohamed Adel Serhani, Chafik Bouhaddioui, and Rachida Dssouli. 2021. Big data quality framework: A holistic approach to continuous quality management. 
Journal of Big Data 8, 1 (2021), 1\u201341.","journal-title":"Journal of Big Data"},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1109\/INNOVATIONS.2018.8605945","article-title":"Big data quality assessment model for unstructured data","author":"Taleb Ikbal","year":"2018","unstructured":"Ikbal Taleb, M. A. Serhani, and R. Dssouli. 2018. Big data quality assessment model for unstructured data. In 2018 International Conference on Innovations in Information Technology (IIT\u201918). 69\u201374.","journal-title":"2018 International Conference on Innovations in Information Technology (IIT\u201918)"},{"key":"e_1_3_2_68_2","volume-title":"How to Manage Modern Data Quality [White Paper]","year":"2020","unstructured":"Talend. 2020. How to Manage Modern Data Quality [White Paper]. Technical Report. Talend. Retrieved August 1, 2020 from https:\/\/www.talend.com\/resources\/definitive-guide-data-quality-how-to-manage."},{"key":"e_1_3_2_69_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/07421222.1996.11518099","article-title":"Beyond accuracy: What data quality means to data consumers","volume":"12","author":"Wang R.","year":"1996","unstructured":"R. Wang and D. Strong. 1996. Beyond accuracy: What data quality means to data consumers. J. Manag. Inf. Syst. 12, 4 (1996), 5\u201333.","journal-title":"J. Manag. Inf. Syst."},{"key":"e_1_3_2_70_2","first-page":"298","article-title":"A classification of data quality assessment and improvement methods","volume":"3","author":"Woodall P.","year":"2014","unstructured":"P. Woodall, Martin Oberhofer, and A. Borek. 2014. A classification of data quality assessment and improvement methods. Int. J. Inf. Qual. 3, 4 (2014), 298\u2013321.","journal-title":"Int. J. Inf. 
Qual."},{"key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1109\/BigDataService.2017.42","article-title":"A survey on quality assurance techniques for big data applications","author":"Zhang Pengcheng","year":"2017","unstructured":"Pengcheng Zhang, Xuewu Zhou, Wenrui Li, and Jerry Zeyu Gao. 2017. A survey on quality assurance techniques for big data applications. 2017 IEEE 3rd International Conference on Big Data Computing Service and Applications (BigDataService\u201917). 313\u2013319.","journal-title":"2017 IEEE 3rd International Conference on Big Data Computing Service and Applications (BigDataService\u201917)"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603706","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3603706","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:21Z","timestamp":1750178241000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603706"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,22]]},"references-count":70,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3603706"],"URL":"https:\/\/doi.org\/10.1145\/3603706","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,22]]},"assertion":[{"value":"2022-04-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-12","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-08-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}