{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T09:30:13Z","timestamp":1770543013927,"version":"3.49.0"},"reference-count":143,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T00:00:00Z","timestamp":1692662400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>The term data quality refers to measuring the fitness of data regarding the intended usage. Poor data quality leads to inadequate, inconsistent, and erroneous decisions that could escalate the computational cost, cause a decline in profits, and cause customer churn. Thus, data quality is crucial for researchers and industry practitioners.<\/jats:p><jats:p>Different factors drive the assessment of data quality. Data context is deemed one of the key factors due to the contextual diversity of real-world use cases of various entities such as people and organizations. Data used in a specific context (e.g., an organization policy) may need to be more efficacious for another context. Hence, implementing a data quality assessment solution in different contexts is challenging.<\/jats:p><jats:p>Traditional technologies for data quality assessment reached the pinnacle of maturity. Existing solutions can solve most of the quality issues. The data context in these solutions is defined as validation rules applied within the ETL (extract, transform, load) process, i.e., the data warehousing process. In contrast to traditional data quality management, it is impossible to specify all the data semantics beforehand for big data. 
We need context-aware data quality rules to detect semantic errors in a massive amount of heterogeneous data generated at high speed. While many researchers tackle the quality issues of big data, they define the data context from a specific standpoint. Although data quality is a longstanding research issue in academia and industries, it remains an open issue, especially with the advent of big data, which has fostered the challenge of data quality assessment more than ever.<\/jats:p><jats:p>This article provides a scoping review to study the existing context-aware data quality assessment solutions, starting with the existing big data quality solutions in general and then covering context-aware solutions. The strengths and weaknesses of such solutions are outlined and discussed. The survey showed that none of the existing data quality assessment solutions could guarantee context awareness with the ability to handle big data. Notably, each solution dealt only with a partial view of the context. We compared the existing quality models and solutions to reach a comprehensive view covering the aspects of context awareness when assessing data quality. This led us to a set of recommendations framed in a methodological framework shaping the design and implementation of any context-aware data quality service for big data. 
Open challenges are then identified and discussed.<\/jats:p>","DOI":"10.1145\/3603707","type":"journal-article","created":{"date-parts":[[2023,6,14]],"date-time":"2023-06-14T02:39:31Z","timestamp":1686710371000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Context-aware Big Data Quality Assessment: A Scoping Review"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1160-5980","authenticated-orcid":false,"given":"Hadi","family":"Fadlallah","sequence":"first","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5710-6901","authenticated-orcid":false,"given":"Rima","family":"Kilany","sequence":"additional","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7476-4754","authenticated-orcid":false,"given":"Houssein","family":"Dhayne","sequence":"additional","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6285-0279","authenticated-orcid":false,"given":"Rami","family":"El Haddad","sequence":"additional","affiliation":[{"name":"Saint-Joseph University, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5705-3427","authenticated-orcid":false,"given":"Rafiqul","family":"Haque","sequence":"additional","affiliation":[{"name":"Intelligencia R &amp; D, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8706-8889","authenticated-orcid":false,"given":"Yehia","family":"Taher","sequence":"additional","affiliation":[{"name":"University of Versailles Saint-Quentin-en-Yvelines (UVSQ), France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6976-133X","authenticated-orcid":false,"given":"Ali","family":"Jaber","sequence":"additional","affiliation":[{"name":"Lebanese University, 
Lebanon"}]}],"member":"320","published-online":{"date-parts":[[2023,8,22]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"crossref","unstructured":"Ziawasch Abedjan Lukasz Golab and Felix Naumann. 2017. Data profiling: A tutorial. In Proceedings of the 2017 ACM International Conference on Management of Data (2017) 1747\u20131751.","DOI":"10.1145\/3035918.3054772"},{"issue":"4","key":"e_1_3_2_3_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-031-01865-7","article-title":"Data profiling","volume":"10","author":"Abedjan Ziawasch","year":"2018","unstructured":"Ziawasch Abedjan, Lukasz Golab, Felix Naumann, and Thorsten Papenbrock. 2018. Data profiling. Synthes. Lect. Data Manag. 10, 4 (2018), 1\u2013154.","journal-title":"Synthes. Lect. Data Manag."},{"key":"e_1_3_2_4_2","first-page":"260","volume-title":"The Semantic Web\u2013ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21\u201325, 2013, Proceedings, Part II 12","author":"Acosta Maribel","year":"2013","unstructured":"Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, S\u00f6ren Auer, and Jens Lehmann. 2013. Crowdsourcing linked data quality assessment. In The Semantic Web\u2013ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, October 21\u201325, 2013, Proceedings, Part II 12. Springer, 260\u2013276."},{"key":"e_1_3_2_5_2","volume-title":"Challenges and Opportunities with Big Data [White Paper]","author":"Agrawal Divyakant","year":"2011","unstructured":"Divyakant Agrawal, Philip Bernstein, Elisa Bertino, Susan Davidson, Umeshwas Dayal, Michael Franklin, Johannes Gehrke, Laura Haas, Alon Halevy, Jiawei Han et\u00a0al. 2011. Challenges and Opportunities with Big Data [White Paper]. Technical Report. Computing Research Association. 
Retrieved from http:\/\/cra.org\/ccc\/docs\/init\/bigdatawhitepaper.pdf."},{"key":"e_1_3_2_6_2","doi-asserted-by":"crossref","unstructured":"Jameela Al-Jaroodi and Nader Mohamed. 2018. Service-oriented architecture for big data analytics in smart cities. In 18th IEEE\/ACM International Symposium on Cluster Cloud and Grid Computing (CCGRID\u201918) . 633\u2013640.","DOI":"10.1109\/CCGRID.2018.00052"},{"key":"e_1_3_2_7_2","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1016\/j.future.2019.02.044","article-title":"IBRIDIA: A hybrid solution for processing big logistics data","volume":"97","author":"AlShaer Mohammed","year":"2019","unstructured":"Mohammed AlShaer, Yehia Taher, Rafiqul Haque, Mohand-Sa\u00efd Hacid, and Mohamed Dbouk. 2019. IBRIDIA: A hybrid solution for processing big logistics data. Fut. Gen. Comput. Syst. 97 (2019), 792\u2013804.","journal-title":"Fut. Gen. Comput. Syst."},{"key":"e_1_3_2_8_2","doi-asserted-by":"crossref","first-page":"548","DOI":"10.1016\/j.future.2018.07.014","article-title":"Context-aware data quality assessment for big data","volume":"89","author":"Ardagna Danilo","year":"2018","unstructured":"Danilo Ardagna, Cinzia Cappiello, Walter Sam\u00e1, and Monica Vitali. 2018. Context-aware data quality assessment for big data. Fut. Gen. Comput. Syst. 89 (2018), 548\u2013562.","journal-title":"Fut. Gen. Comput. Syst."},{"key":"e_1_3_2_9_2","article-title":"Improving the data quality in the research information systems","author":"Azeroual Otmane","year":"2019","unstructured":"Otmane Azeroual and Mohammad Abuosba. 2019. Improving the data quality in the research information systems. 
arXiv preprint arXiv:1901.07388 (2019).","journal-title":"arXiv preprint arXiv:1901.07388"},{"key":"e_1_3_2_10_2","volume-title":"10th International Conference on Model-driven Engineering Languages and Systems, Models","author":"B\u0101rzdi\u0146\u0161 J\u0101nis","year":"2007","unstructured":"J\u0101nis B\u0101rzdi\u0146\u0161, Andris Zari\u0146\u0161, K\u0101rlis \u010cer\u0101ns, Audris Kalni\u0146\u0161, Edgars Rencis, Lelde L\u0101ce, Ren\u0101rs Liepi\u0146\u0161, and Art\u016brs Sprog\u0300is. 2007. GrTP: Transformation based graphical tool building platform. In 10th International Conference on Model-driven Engineering Languages and Systems, Models."},{"issue":"3","key":"e_1_3_2_11_2","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1504\/IJICA.2008.019688","article-title":"A comprehensive data quality methodology for web and structured data","volume":"1","author":"Batini Carlo","year":"2008","unstructured":"Carlo Batini, Federico Cabitza, Cinzia Cappiello, and Chiara Francalanci. 2008. A comprehensive data quality methodology for web and structured data. Int. J. Innov. Comput. Applic. 1, 3 (2008), 205\u2013218.","journal-title":"Int. J. Innov. Comput. Applic."},{"issue":"1","key":"e_1_3_2_12_2","doi-asserted-by":"crossref","first-page":"60","DOI":"10.4018\/JDM.2015010103","article-title":"From data quality to big data quality","volume":"26","author":"Batini Carlo","year":"2015","unstructured":"Carlo Batini, Anisa Rula, Monica Scannapieco, and Gianluigi Viscusi. 2015. From data quality to big data quality. J. Datab. Manag. 26, 1 (2015), 60\u201382.","journal-title":"J. Datab. Manag."},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","first-page":"103441","DOI":"10.1016\/j.autcon.2020.103441","article-title":"Cloud computing in construction industry: Use cases, benefits and challenges","volume":"122","author":"Bello Sururah A.","year":"2021","unstructured":"Sururah A. Bello, Lukumon O. Oyedele, Olugbenga O. 
Akinade, Muhammad Bilal, Juan Manuel Davila Delgado, Lukman A. Akanbi, Anuoluwapo O. Ajayi, and Hakeem A. Owolabi. 2021. Cloud computing in construction industry: Use cases, benefits and challenges. Automat. Construct. 122 (2021), 103441.","journal-title":"Automat. Construct."},{"issue":"11","key":"e_1_3_2_14_2","doi-asserted-by":"crossref","first-page":"695","DOI":"10.14778\/3402707.3402710","article-title":"Generic schema matching, ten years later","volume":"4","author":"Bernstein Philip A.","year":"2011","unstructured":"Philip A. Bernstein, Jayant Madhavan, and Erhard Rahm. 2011. Generic schema matching, ten years later. Proc. VLDB Endow. 4, 11 (2011), 695\u2013701.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"Janki Bhimani Ningfang Mi Miriam Leeser and Zhengyu Yang. 2017. FiM: Performance prediction for parallel computation in iterative data processing applications. In IEEE 10th International Conference on Cloud Computing (CLOUD\u201917) . 359\u2013366.","DOI":"10.1109\/CLOUD.2017.53"},{"issue":"3","key":"e_1_3_2_16_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3309684","article-title":"New performance modeling methods for parallel data processing applications","volume":"29","author":"Bhimani Janki","year":"2019","unstructured":"Janki Bhimani, Ningfang Mi, Miriam Leeser, and Zhengyu Yang. 2019. New performance modeling methods for parallel data processing applications. ACM Trans. Model. Comput. Simul. 29, 3 (2019), 1\u201324.","journal-title":"ACM Trans. Model. Comput. Simul."},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","unstructured":"Zane Bicevska Janis Bicevskis and Ivo Oditis. 2017. Domain-specific characteristics of data quality. Federated Conference on Computer Science and Information Systems (FedCSIS\u201917) . 
999\u20131003.","DOI":"10.15439\/2017F279"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1007\/978-3-319-77721-4_11","volume-title":"Information Technology for Management. Ongoing Research and Development: 15th Conference, AITM 2017, and 12th Conference, ISM 2017, Held as Part of FedCSIS, Prague, Czech Republic, September 3\u20136, 2017, Extended Selected Papers 15","author":"Bicevska Zane","year":"2018","unstructured":"Zane Bicevska, Janis Bicevskis, and Ivo Oditis. 2018. Models of data quality. In Information Technology for Management. Ongoing Research and Development: 15th Conference, AITM 2017, and 12th Conference, ISM 2017, Held as Part of FedCSIS, Prague, Czech Republic, September 3\u20136, 2017, Extended Selected Papers 15. Springer, 194\u2013211."},{"key":"e_1_3_2_19_2","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1016\/j.procs.2017.01.087","article-title":"Executable data quality models","volume":"104","author":"Bicevskis Janis","year":"2017","unstructured":"Janis Bicevskis, Zane Bicevska, and Girts Karnitis. 2017. Executable data quality models. Procedia Comput. Sci. 104 (2017), 138\u2013145.","journal-title":"Procedia Comput. Sci."},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","unstructured":"Janis Bicevskis Zane Bicevska Anastasija Nikiforova and Ivo Oditis. 2018. An approach to data quality evaluation. In Fifth International Conference on Social Networks Analysis Management and Security (SNAMS\u201918) . 196\u2013201.","DOI":"10.1109\/SNAMS.2018.8554915"},{"key":"e_1_3_2_21_2","unstructured":"Jacqueline Biscobing. 2018. What Is Data Sampling? Retrieved from https:\/\/www.techtarget.com\/searchbusinessanalytics\/definition\/data-sampling."},{"key":"e_1_3_2_22_2","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1007\/978-3-319-91479-4_43","volume-title":"Information Processing and Management of Uncertainty in Knowledge-Based Systems. 
Applications: 17th International Conference, IPMU 2018, C\u00e1diz, Spain, June 11\u201315, 2018, Proceedings, Part III 17","author":"Bronselaer Antoon","year":"2018","unstructured":"Antoon Bronselaer, Joachim Nielandt, Toon Boeckling, and Guy De Tr\u00e9. 2018. Operational measurement of data quality. In Information Processing and Management of Uncertainty in Knowledge-Based Systems. Applications: 17th International Conference, IPMU 2018, C\u00e1diz, Spain, June 11\u201315, 2018, Proceedings, Part III 17. Springer, 517\u2013528."},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/978-3-642-02184-8_13","article-title":"Using ontologies providing domain knowledge for data quality management","author":"Br\u00fcggemann Stefan","year":"2009","unstructured":"Stefan Br\u00fcggemann and Fabian Gr\u00fcning. 2009. Using ontologies providing domain knowledge for data quality management. Networked Knowledge-Networked Media: Integrating Knowledge Management, New Media Technologies and Semantic Systems. Springer, 187\u2013203.","journal-title":"Networked Knowledge-Networked Media: Integrating Knowledge Management, New Media Technologies and Semantic Systems"},{"key":"e_1_3_2_24_2","first-page":"26","volume-title":"Workshop: Issues and Opportunities for Improving the Quality and Use of Data within the DoD","author":"Buneman Peter","year":"2010","unstructured":"Peter Buneman and Susan B. Davidson. 2010. Data provenance\u2013The foundation of data quality. In Workshop: Issues and Opportunities for Improving the Quality and Use of Data within the DoD, Arlington, 26\u201328."},{"key":"e_1_3_2_25_2","article-title":"The challenges of data quality and data quality assessment in the big data era","volume":"14","author":"Cai Li","year":"2015","unstructured":"Li Cai and Yangyong Zhu. 2015. The challenges of data quality and data quality assessment in the big data era. Data Sci. J. 14 (2015).","journal-title":"Data Sci. 
J."},{"issue":"1","key":"e_1_3_2_26_2","first-page":"60","article-title":"A data quality methodology for heterogeneous data","volume":"3","author":"Carlo Batini","year":"2011","unstructured":"Batini Carlo, Barone Daniele, Cabitza Federico, and Grega Simone. 2011. A data quality methodology for heterogeneous data. Int. J. Datab. Manag. Syst. 3, 1 (2011), 60\u201379.","journal-title":"Int. J. Datab. Manag. Syst."},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"O.-Hoon Choi Jun-Eun Lim Hong-Seok Na and Doo-Kwon Baik. 2008. An efficient method of data quality using quality evaluation ontology. 2008 Third International Conference on Convergence and Hybrid Information Technology 2 (2008) 1058\u20131061.","DOI":"10.1109\/ICCIT.2008.118"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","first-page":"24634","DOI":"10.1109\/ACCESS.2019.2899751","article-title":"An overview of data quality frameworks","volume":"7","author":"Cichy Corinna","year":"2019","unstructured":"Corinna Cichy and Stefan Rass. 2019. An overview of data quality frameworks. IEEE Access 7 (2019), 24634\u201324648.","journal-title":"IEEE Access"},{"key":"e_1_3_2_29_2","unstructured":"Roger Clarke. 2014. Quality Factors in Big Data and Big Data Analytics . Xamax Consultancy Pty Ltd."},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"Graham Cormode and Nick Duffield. 2014. Sampling for big data: A tutorial. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . 1975\u20131975.","DOI":"10.1145\/2623330.2630811"},{"key":"e_1_3_2_31_2","unstructured":"Microsoft Corporation. 2013. Data Quality Services. Retrieved from https:\/\/docs.microsoft.com\/en-us\/sql\/data-quality-services\/data-quality-services?view=sql-server-ver15."},{"key":"e_1_3_2_32_2","unstructured":"Microsoft Corporation. 2018. SQL Server Integration Services. 
Retrieved from https:\/\/docs.microsoft.com\/en-us\/sql\/integration-services\/sql-server-integration-services?view=sql-server-ver15."},{"key":"e_1_3_2_33_2","volume-title":"Comprehensive Data Quality with Oracle Data Integrator and Oracle Enterprise Data Quality [White Paper]","author":"Corporation Oracle","year":"2013","unstructured":"Oracle Corporation. 2013. Comprehensive Data Quality with Oracle Data Integrator and Oracle Enterprise Data Quality [White Paper]. Technical Report. Oracle Corporation. Retrieved from https:\/\/www.oracle.com\/technetwork\/middleware\/data-integrator\/overview\/oracledi-comprehensive-quality-131748.pdf."},{"key":"e_1_3_2_34_2","doi-asserted-by":"crossref","first-page":"439","DOI":"10.1007\/978-3-319-32467-8_39","volume-title":"Information Technology: New Generations: 13th International Conference on Information Technology","author":"Dai Wei","year":"2016","unstructured":"Wei Dai, Isaac Wardlaw, Yu Cui, Kashif Mehdi, Yanyan Li, and Jun Long. 2016. Data profiling technology of data governance regarding big data: Review and rethinking. In Information Technology: New Generations: 13th International Conference on Information Technology. Springer, 439\u2013450."},{"key":"e_1_3_2_35_2","doi-asserted-by":"crossref","unstructured":"Wei Dai Kenji Yoshigoe and William Parsley. 2018. Improving data quality through deep learning and statistical models. In Information Technology-New Generations: 14th International Conference on Information Technology . 515\u2013522.","DOI":"10.1007\/978-3-319-54978-1_66"},{"issue":"1","key":"e_1_3_2_36_2","first-page":"1","article-title":"Big Data management in smart grid: Concepts, requirements and implementation","volume":"4","author":"Daki Houda","year":"2017","unstructured":"Houda Daki, Asmaa El Hannani, Abdelhak Aqqal, Abdelfattah Haidine, and Aziz Dahbi. 2017. Big Data management in smart grid: Concepts, requirements and implementation. J. Big Data 4, 1 (2017), 1\u201319.","journal-title":"J. 
Big Data"},{"issue":"1","key":"e_1_3_2_37_2","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: simplified data processing on large clusters","volume":"51","author":"Dean Jeffrey","year":"2008","unstructured":"Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1, 107\u2013113.","journal-title":"Commun. ACM"},{"key":"e_1_3_2_38_2","doi-asserted-by":"crossref","first-page":"91265","DOI":"10.1109\/ACCESS.2019.2927491","article-title":"In search of big medical data integration solutions\u2014A comprehensive survey","volume":"7","author":"Dhayne Houssein","year":"2019","unstructured":"Houssein Dhayne, Rafiqul Haque, Rima Kilany, and Yehia Taher. 2019. In search of big medical data integration solutions\u2014A comprehensive survey. IEEE Access 7 (2019), 91265\u201391290.","journal-title":"IEEE Access"},{"issue":"3","key":"e_1_3_2_39_2","first-page":"49","volume":"3","author":"Dmitriyev Viktor","year":"2015","unstructured":"Viktor Dmitriyev, Tariq Mahmoud, and Pablo Michel Mar\u00edn-Ortega. 2015. Int. J. Inf. Syst. Proj. Manag. 3, 3 (2015), 49\u201363.","journal-title":"Int. J. Inf. Syst. Proj. Manag."},{"key":"e_1_3_2_40_2","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1007\/978-3-642-36257-6_13","article-title":"Data fusion: Resolving conflicts from multiple sources","author":"Dong Xin Luna","year":"2013","unstructured":"Xin Luna Dong, Laure Berti-Equille, and Divesh Srivastava. 2013. Data fusion: Resolving conflicts from multiple sources. Handbook of Data Quality: Research and Practice. Springer, 293\u2013318.","journal-title":"Handbook of Data Quality: Research and Practice."},{"key":"e_1_3_2_41_2","first-page":"1245","volume-title":"IEEE 29th International Conference on Data Engineering (ICDE\u201913)","author":"Dong Xin Luna","year":"2013","unstructured":"Xin Luna Dong and Divesh Srivastava. 2013. Big data integration. 
In IEEE 29th International Conference on Data Engineering (ICDE\u201913). IEEE, 1245\u20131248."},{"key":"e_1_3_2_42_2","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/978-3-319-74313-4_8","volume-title":"Perspectives of System Informatics: 11th International Andrei P. Ershov Informatics Conference, PSI 2017, Moscow, Russia, June 27\u201329, 2017, Revised Selected Papers 11","author":"Dragoni Nicola","year":"2018","unstructured":"Nicola Dragoni, Ivan Lanese, Stephan Thordal Larsen, Manuel Mazzara, Ruslan Mustafin, and Larisa Safina. 2018. Microservices: How to make your application scale. In Perspectives of System Informatics: 11th International Andrei P. Ershov Informatics Conference, PSI 2017, Moscow, Russia, June 27\u201329, 2017, Revised Selected Papers 11. Springer, 95\u2013104."},{"issue":"1","key":"e_1_3_2_43_2","doi-asserted-by":"crossref","first-page":"112","DOI":"10.51983\/ajcst-2018.7.1.1817","article-title":"Importance of MapReduce for big data applications: A survey","volume":"7","author":"Durairaj M.","year":"2018","unstructured":"M. Durairaj and T. S. Poornappriya. 2018. Importance of MapReduce for big data applications: A survey. Asian J. Comput. Sci. Technol. 7, 1 (2018), 112\u2013118.","journal-title":"Asian J. Comput. Sci. Technol."},{"issue":"3","key":"e_1_3_2_44_2","first-page":"400","article-title":"Automated continuous data quality measurement with QuaIIe","volume":"11","author":"Ehrlinger Lisa","year":"2018","unstructured":"Lisa Ehrlinger, Bernhard Werth, and Wolfram W\u00f6\u00df. 2018. Automated continuous data quality measurement with QuaIIe. Int. J. Advanc. Softw. 11, 3 (2018), 400\u2013417.","journal-title":"Int. J. Advanc. Softw."},{"key":"e_1_3_2_45_2","first-page":"21","volume-title":"10th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA\u201918)","author":"Ehrlinger Lisa","year":"2018","unstructured":"Lisa Ehrlinger, Bernhard Werth, and Wolfram W\u00f6\u00df. 2018. 
QuaIIe: A data quality assessment tool for integrated information systems. In 10th International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA\u201918). 21\u201331."},{"key":"e_1_3_2_46_2","first-page":"15","volume-title":"22nd MIT International Conference on Information Quality (ICIQ\u201917)","author":"Ehrlinger Lisa","year":"2017","unstructured":"Lisa Ehrlinger and Wolfram W\u00f6\u00df. 2017. Automated data quality monitoring. In 22nd MIT International Conference on Information Quality (ICIQ\u201917). 15\u20131."},{"key":"e_1_3_2_47_2","volume-title":"International Conference on Information Quality (ICIQ\u201905)","author":"Even Adir","year":"2005","unstructured":"Adir Even and Ganesan Shankaranarayanan. 2005. Value-driven data quality assessment. In International Conference on Information Quality (ICIQ\u201905)."},{"issue":"2","key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1145\/1240616.1240623","article-title":"Utility-driven assessment of data quality","volume":"38","author":"Even Adir","year":"2007","unstructured":"Adir Even and Ganesan Shankaranarayanan. 2007. Utility-driven assessment of data quality. ACM SIGMIS Datab.: DATAB. Adv. Inf. Syst. 38, 2 (2007), 75\u201393.","journal-title":"ACM SIGMIS Datab.: DATAB. Adv. Inf. Syst."},{"key":"e_1_3_2_49_2","first-page":"52","volume-title":"International Conference on Big Data and Cybersecurity Intelligence (BDCSIntell\u201919)","author":"Fadlallah Hadi","year":"2019","unstructured":"Hadi Fadlallah, Yehia Taher, Rafiqul Haque, and Ali Jaber. 2019. ORADIEX: A big data driven smart framework for real-time surveillance and analysis of individual exposure to radioactive pollution. In International Conference on Big Data and Cybersecurity Intelligence (BDCSIntell\u201919). 
52\u201356."},{"key":"e_1_3_2_50_2","first-page":"89","volume-title":"International Conference on Big Data and Cybersecurity Intelligence (BDCSIntell\u201918)","author":"Fadlallah Hadi","year":"2018","unstructured":"Hadi Fadlallah, Yehia Taher, and Ali Jaber. 2018. RaDEn: A scalable and efficient radiation data engineering. In International Conference on Big Data and Cybersecurity Intelligence (BDCSIntell\u201918). 89\u201393."},{"key":"e_1_3_2_51_2","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1145\/2506364.2506366","volume-title":"2nd ACM International Workshop on Crowdsourcing for Multimedia","author":"Salas \u00d3scar Figuerola","year":"2013","unstructured":"\u00d3scar Figuerola Salas, Velibor Adzic, Akash Shah, and Hari Kalva. 2013. Assessing internet video quality using crowdsourcing. In 2nd ACM International Workshop on Crowdsourcing for Multimedia. 23\u201328."},{"key":"e_1_3_2_52_2","first-page":"363","volume-title":"43rd Annual Meeting of the Association for Computational Linguistics (ACL\u201905)","author":"Finkel Jenny Rose","year":"2005","unstructured":"Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In 43rd Annual Meeting of the Association for Computational Linguistics (ACL\u201905). 363\u2013370."},{"key":"e_1_3_2_53_2","doi-asserted-by":"crossref","unstructured":"Jerry Gao Chunli Xie and Chuanqi Tao. 2016. Big data validation and quality assuranceIssues challenges and needs. In IEEE symposium on service-oriented system engineering (SOSE16) . 433\u2013441.","DOI":"10.1109\/SOSE.2016.63"},{"key":"e_1_3_2_54_2","first-page":"76","article-title":"A review of information quality research-develop a research agenda","author":"Ge Mouzhi","year":"2007","unstructured":"Mouzhi Ge and Markus Helfert. 2007. A review of information quality research-develop a research agenda. In International Conference on Information Quality (ICIQ\u201907). 
76\u201391.","journal-title":"International Conference on Information Quality (ICIQ\u201907)"},{"key":"e_1_3_2_55_2","first-page":"132","article-title":"SparkDQ: Efficient generic big data quality management on distributed data-parallel computation","volume":"156","author":"Gu Rong","year":"2021","unstructured":"Rong Gu, Yang Qi, Tongyu Wu, Zhaokang Wang, Xiaolong Xu, Chunfeng Yuan, and Yihua Huang. 2021. SparkDQ: Efficient generic big data quality management on distributed data-parallel computation. J. ParallelDistrib. Comput. 156 (2021), 132\u2013147.","journal-title":"J. ParallelDistrib. Comput."},{"issue":"1","key":"e_1_3_2_56_2","first-page":"1","article-title":"Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations","volume":"10","author":"Gudivada Venkat","year":"2017","unstructured":"Venkat Gudivada, Amy Apon, and Junhua Ding. 2017. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Advanc. Softw. 10, 1 (2017), 1\u201320.","journal-title":"Int. J. Advanc. Softw."},{"key":"e_1_3_2_57_2","article-title":"Data quality centric application framework for big data","author":"Gudivada Venkat N.","year":"2016","unstructured":"Venkat N. Gudivada, Dhana Rao, and William I. Grosky. 2016. Data quality centric application framework for big data. In International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA\u201916).","journal-title":"International Conference on Big Data, Small Data, Linked Data and Open Data (ALLDATA\u201916)"},{"issue":"1","key":"e_1_3_2_58_2","first-page":"1","article-title":"Uncertainty in big data analytics: Survey, opportunities, and challenges","volume":"6","author":"Hariri Reihaneh H.","year":"2019","unstructured":"Reihaneh H. Hariri, Erik M. Fredericks, and Kate M. Bowers. 2019. Uncertainty in big data analytics: Survey, opportunities, and challenges. J. 
Big Data 6, 1 (2019), 1\u201316.","journal-title":"J. Big Data"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","unstructured":"Wilhelm Hasselbring. 2016. Microservices for scalability: Keynote talk abstract. In Proceedings of the 7th ACM\/SPEC on International Conference on Performance Engineering . 133\u2013134.","DOI":"10.1145\/2851553.2858659"},{"key":"e_1_3_2_60_2","doi-asserted-by":"crossref","unstructured":"Brian Hay Kara Nance and Matt Bishop. 2011. Storm clouds rising: Security challenges for IaaS cloud computing. In 2011 44th Hawaii International Conference on System Sciences . 1\u20137.","DOI":"10.1109\/HICSS.2011.386"},{"key":"e_1_3_2_61_2","doi-asserted-by":"crossref","unstructured":"Qinlu He Zhanhuai Li and Xiao Zhang. 2010. Data deduplication techniques. In 2010 International Conference on Future Information Technology and Management Engineering 1 (2010) 430\u2013433.","DOI":"10.1109\/FITME.2010.5656539"},{"key":"e_1_3_2_62_2","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.fss.2014.01.016","article-title":"Parallel sampling from big data with uncertainty distribution","volume":"258","author":"He Qing","year":"2015","unstructured":"Qing He, Haocheng Wang, Fuzhen Zhuang, Tianfeng Shang, and Zhongzhi Shi. 2015. Parallel sampling from big data with uncertainty distribution. Fuzzy Sets Syst. 258 (2015), 117\u2013133.","journal-title":"Fuzzy Sets Syst."},{"key":"e_1_3_2_63_2","doi-asserted-by":"crossref","unstructured":"Markus Helfert and Owen Foley. 2009. A context aware information quality framework. In 2009 Fourth International Conference on Cooperation and Promotion of Information Resources in Science and Technology . 
187\u2013193.","DOI":"10.1109\/COINFO.2009.65"},{"issue":"4","key":"e_1_3_2_64_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3447772","article-title":"Knowledge graphs","volume":"54","author":"Hogan Aidan","year":"2021","unstructured":"Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d\u2019Amato, Gerard de Melo, Claudio Gutierrez, Sabrina Kirrane, Jos\u00e9 Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, et\u00a0al. 2021. Knowledge graphs. ACM Comput. Surv. 54, 4 (2021), 1\u201337.","journal-title":"ACM Comput. Surv."},{"key":"e_1_3_2_65_2","first-page":"62","volume-title":"Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Hosseini Kasra","year":"2020","unstructured":"Kasra Hosseini, Federico Nanni, and Mariona Coll Ardanuy. 2020. DeezyMatch: A flexible deep learning approach to fuzzy string matching. In Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 62\u201369."},{"key":"e_1_3_2_66_2","doi-asserted-by":"crossref","unstructured":"Tobias Ho\u00dffeld Matthias Hirth Pavel Korshunov Philippe Hanhart Bruno Gardlo Christian Keimel and Christian Timmerer. 2014. Survey of web-based crowdsourcing frameworks for subjective quality assessment. In IEEE 16th International Workshop on Multimedia Signal Processing (MMSP\u201914) . 1\u20136.","DOI":"10.1109\/MMSP.2014.6958831"},{"key":"e_1_3_2_67_2","doi-asserted-by":"crossref","DOI":"10.1145\/3310205","volume-title":"Data Cleaning","author":"Ilyas Ihab F.","year":"2019","unstructured":"Ihab F. Ilyas and Xu Chu. 2019. Data Cleaning. ACM New York, NY."},{"key":"e_1_3_2_68_2","doi-asserted-by":"crossref","first-page":"2028","DOI":"10.1109\/ACCESS.2015.2490723","article-title":"Evaluating the quality of social media data in big data architecture","volume":"3","author":"Immonen Anne","year":"2015","unstructured":"Anne Immonen, Pekka P\u00e4\u00e4kk\u00f6nen, and Eila Ovaska. 2015. 
Evaluating the quality of social media data in big data architecture. IEEE Access 3 (2015), 2028\u20132043.","journal-title":"IEEE Access"},{"key":"e_1_3_2_69_2","unstructured":"Talend Inc. 2022. Data Quality and Machine Learning: What\u2019s the Connection? Retrieved from https:\/\/www.talend.com\/resources\/machine-learning-data-quality\/."},{"key":"e_1_3_2_70_2","volume-title":"Informatica Data Quality Data Sheet","year":"2018","unstructured":"Informatica. 2018. Informatica Data Quality Data Sheet. Technical Report. Informatica. Retrieved from https:\/\/www.informatica.com\/content\/dam\/informatica-com\/en\/collateral\/data-sheet\/en_informatica-data-quality_data-sheet_6710.pdf."},{"issue":"1","key":"e_1_3_2_71_2","doi-asserted-by":"crossref","first-page":"9","DOI":"10.14445\/22312803\/IJCTT-V19P103","article-title":"Big data analysis: Apache Storm perspective","volume":"19","author":"Iqbal Muhammad Hussain","year":"2015","unstructured":"Muhammad Hussain Iqbal, Tariq Rahim Soomro et\u00a0al. 2015. Big data analysis: Apache Storm perspective. Int. J. Comput. Trends Technol. 19, 1 (2015), 9\u201314.","journal-title":"Int. J. Comput. Trends Technol."},{"key":"e_1_3_2_72_2","volume-title":"ISO\/IEC 9126-1:2001. Software Engineering \u2013 Product Quality \u2013 Part 1: Quality Model","year":"2001","unstructured":"ISO\/IEC. 2001. ISO\/IEC 9126-1:2001. Software Engineering \u2013 Product Quality \u2013 Part 1: Quality Model. Standard. ISO\/IEC. Retrieved from https:\/\/www.iso.org\/standard\/22749.html."},{"key":"e_1_3_2_73_2","volume-title":"25012:2008 Software Engineering \u2013 Software Product Quality Requirements and Evaluation (SQuaRE) \u2013 Data Quality Model","year":"2008","unstructured":"ISO\/IEC. 2008. 25012:2008 Software Engineering \u2013 Software Product Quality Requirements and Evaluation (SQuaRE) \u2013 Data Quality Model. Standard. ISO\/IEC. 
Retrieved from https:\/\/www.iso.org\/standard\/35736.html."},{"key":"e_1_3_2_74_2","volume-title":"ISO\/IEC 25000:2014. Systems and Software Engineering \u2013 System and Software Quality Requirements and Evaluation (SQuaRE) \u2013 Guide to SQuaRE","year":"2014","unstructured":"ISO\/IEC. 2014. ISO\/IEC 25000:2014. Systems and Software Engineering \u2013 System and Software Quality Requirements and Evaluation (SQuaRE) \u2013 Guide to SQuaRE. Standard. ISO\/IEC. Retrieved from https:\/\/www.iso.org\/standard\/64764.html."},{"key":"e_1_3_2_75_2","volume-title":"ISO\/IEC 25024:2015 Systems and Software Engineering \u2013 Systems and Software Quality Requirements and Evaluation (SQuaRE) \u2013 Measurement of Data Quality","year":"2015","unstructured":"ISO\/IEC. 2015. ISO\/IEC 25024:2015 Systems and Software Engineering \u2013 Systems and Software Quality Requirements and Evaluation (SQuaRE) \u2013 Measurement of Data Quality. Standard. ISO\/IEC. Retrieved from https:\/\/www.iso.org\/standard\/35749.html."},{"key":"e_1_3_2_76_2","volume-title":"ISO\/IEC 15939:2017 Systems and Software Engineering \u2013 Measurement Process","year":"2017","unstructured":"ISO\/IEC. 2017. ISO\/IEC 15939:2017 Systems and Software Engineering \u2013 Measurement Process. Standard. ISO\/IEC. Retrieved from https:\/\/www.iso.org\/standard\/71197.html."},{"key":"e_1_3_2_77_2","volume-title":"ISO\/IEC 20547-3:2020 Big Data Reference Architecture - Part 3: Reference Architecture","year":"2020","unstructured":"ISO\/IEC. 2020. ISO\/IEC 20547-3:2020 Big Data Reference Architecture - Part 3: Reference Architecture. Standard. ISO\/IEC. Retrieved from https:\/\/www.iso.org\/standard\/71277.html."},{"key":"e_1_3_2_78_2","volume-title":"ISO\/IEC AWI 5259-1 Artificial Intelligence \u2013 Data Quality for Analytics and Machine Learning (ML) \u2013 Part 1: Overview, Terminology, and Examples","year":"2022","unstructured":"ISO\/IEC. 2022. 
ISO\/IEC AWI 5259-1 Artificial Intelligence \u2013 Data Quality for Analytics and Machine Learning (ML) \u2013 Part 1: Overview, Terminology, and Examples. Standard. ISO\/IEC. Retrieved from https:\/\/www.iso.org\/standard\/81088.html."},{"key":"e_1_3_2_79_2","volume-title":"ISO\/TS 8000-1:2011 - Data Quality - Part 1: Overview","year":"2011","unstructured":"ISO\/TS. 2011. ISO\/TS 8000-1:2011 - Data Quality - Part 1: Overview. Standard. ISO\/TS. Retrieved from https:\/\/www.iso.org\/standard\/50798.html."},{"key":"e_1_3_2_80_2","doi-asserted-by":"crossref","unstructured":"Michael A. Iverson Fusun Ozguner and Lee C. Potter. 1999. Statistical prediction of task execution times through analytic benchmarking for scheduling in a heterogeneous environment. In Proceedings Eighth Heterogeneous Computing Workshop (HCW\u201999) . 99\u2013111.","DOI":"10.1109\/HCW.1999.765115"},{"key":"e_1_3_2_81_2","unstructured":"Changqing Ji Yu Li Wenming Qiu Uchechukwu Awada and Keqiu Li. 2012. Big data processing in cloud computing environments. In 2012 12th International Symposium on Pervasive Systems Algorithms and Networks (2012) 17\u201323."},{"key":"e_1_3_2_82_2","doi-asserted-by":"crossref","unstructured":"Anirudh Kadadi Rajeev Agrawal Christopher Nyamful and Rahman Atiq. 2014. Challenges of data integration and interoperability in big data. In 2014 IEEE International Conference on Big Data (big data) (2014) 38\u201340.","DOI":"10.1109\/BigData.2014.7004486"},{"issue":"1","key":"e_1_3_2_83_2","article-title":"Dealing with missing values in data.","volume":"5","author":"Kaiser Ji\u0159\u00ed","year":"2014","unstructured":"Ji\u0159\u00ed Kaiser. 2014. Dealing with missing values in data. J. Syst. Integr. 5, 1 (2014) 42\u201351.","journal-title":"J. Syst. 
Integr."},{"key":"e_1_3_2_84_2","article-title":"A fuzzy approach model for uncovering hidden latent semantic structure in medical text collections","author":"Karami Amir","year":"2015","unstructured":"Amir Karami, Aryya Gangopadhyay, Bin Zhou, and Hadi Kharrazi. 2015. A fuzzy approach model for uncovering hidden latent semantic structure in medical text collections. In iConference 2015.","journal-title":"iConference 2015"},{"key":"e_1_3_2_85_2","first-page":"1284","volume-title":"International Conference on Sustainable Computing and Data Communication Systems (ICSCDS\u201922)","author":"Karmakar Anurag","year":"2022","unstructured":"Anurag Karmakar, Anaswara Raghuthaman, Om Sudhakar Kote, and N. Jayapandian. 2022. Cloud computing application: Research challenges and opportunity. In International Conference on Sustainable Computing and Data Communication Systems (ICSCDS\u201922). IEEE, 1284\u20131289."},{"key":"e_1_3_2_86_2","volume-title":"SIGMOD Conference","author":"Khayyat Zuhair","year":"2015","unstructured":"Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, S. Madden, M. Ouzzani, Paolo Papotti, Jorge-Arnulfo Quian\u00e9-Ruiz, Nan Tang, and Si Yin. 2015. BigDansing: A system for big data cleansing. In SIGMOD Conference."},{"key":"e_1_3_2_87_2","first-page":"S177\u2013S191","article-title":"Sampling techniques for big data analysis","volume":"87","author":"Kim Jae Kwang","year":"2019","unstructured":"Jae Kwang Kim and Zhonglei Wang. 2019. Sampling techniques for big data analysis. Int. Statist. Rev. 87 (2019), S177\u2013S191.","journal-title":"Int. Statist. Rev."},{"key":"e_1_3_2_88_2","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1007\/978-3-642-41360-5_22","volume-title":"Knowledge Engineering and the Semantic Web: 4th International Conference, KESW 2013, St. Petersburg, Russia, October 7\u20139, 2013. Proceedings 4","author":"Kontokostas Dimitris","year":"2013","unstructured":"Dimitris Kontokostas, Amrapali Zaveri, S\u00f6ren Auer, and Jens Lehmann. 
2013. TripleCheckMate: A tool for crowdsourcing the quality assessment of linked data. In Knowledge Engineering and the Semantic Web: 4th International Conference, KESW 2013, St. Petersburg, Russia, October 7\u20139, 2013. Proceedings 4. Springer, 265\u2013272."},{"key":"e_1_3_2_89_2","doi-asserted-by":"crossref","unstructured":"Pradeep Kumar Roheet Bhatnagar Kuntal Gaur and Anurag Bhatnagar. 2021. Classification of imbalanced data: Review of methods and applications. IOP Conference Series: Materials Science and Engineering 1099 1 (2021) 012077.","DOI":"10.1088\/1757-899X\/1099\/1\/012077"},{"key":"e_1_3_2_90_2","doi-asserted-by":"crossref","unstructured":"Tien Fabrianti Kusumasari et\u00a0al. 2016. Data profiling for data quality improvement with OpenRefine. In International Conference on Information Technology Systems and Innovation (ICITSI\u201916) . 1\u20136.","DOI":"10.1109\/ICITSI.2016.7858197"},{"issue":"3","key":"e_1_3_2_91_2","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/S0378-7206(00)00060-4","article-title":"Quality metrics for intranet applications","volume":"38","author":"Leung Hareton K. N.","year":"2001","unstructured":"Hareton K. N. Leung. 2001. Quality metrics for intranet applications. Inf. Manag. 38, 3 (2001), 137\u2013152.","journal-title":"Inf. Manag."},{"key":"e_1_3_2_92_2","doi-asserted-by":"crossref","first-page":"72713","DOI":"10.1109\/ACCESS.2020.2988120","article-title":"Sampling for big data profiling: A survey","volume":"8","author":"Liu Zhicheng","year":"2020","unstructured":"Zhicheng Liu and Aoqian Zhang. 2020. Sampling for big data profiling: A survey. 
IEEE Access 8 (2020), 72713\u201372726.","journal-title":"IEEE Access"},{"key":"e_1_3_2_93_2","doi-asserted-by":"crossref","first-page":"7776","DOI":"10.1109\/ACCESS.2017.2696365","article-title":"Machine learning with big data: Challenges and approaches","volume":"5","author":"L\u2019Heureux Alexandra","year":"2017","unstructured":"Alexandra L\u2019Heureux, Katarina Grolinger, Hany F. Elyamany, and Miriam A. M. Capretz. 2017. Machine learning with big data: Challenges and approaches. IEEE Access 5 (2017), 7776\u20137797.","journal-title":"IEEE Access"},{"key":"e_1_3_2_94_2","doi-asserted-by":"crossref","unstructured":"Jyoti Malhotra and Jagdish Bakal. 2015. A survey and comparative study of data deduplication techniques. In International Conference on Pervasive Computing (ICPC\u201915) . 1\u20135.","DOI":"10.1109\/PERVASIVE.2015.7087116"},{"key":"e_1_3_2_95_2","doi-asserted-by":"publisher","DOI":"10.4018\/978-1-5225-0182-4.ch005"},{"issue":"4","key":"e_1_3_2_96_2","doi-asserted-by":"crossref","first-page":"448","DOI":"10.25122\/jml-2021-0100","article-title":"Security challenges and solutions using healthcare cloud computing","volume":"14","author":"Mehrtak Mohammad","year":"2021","unstructured":"Mohammad Mehrtak, SeyedAhmad SeyedAlinaghi, Mehrzad MohsseniPour, Tayebeh Noori, Amirali Karimi, Ahmadreza Shamsabadi, Mohammad Heydari, Alireza Barzegary, Pegah Mirzapour, Mahdi Soleymanzadeh, et\u00a0al. 2021. Security challenges and solutions using healthcare cloud computing. J. Med. Life 14, 4 (2021), 448.","journal-title":"J. Med. Life"},{"key":"e_1_3_2_97_2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1016\/j.future.2015.11.024","article-title":"A data quality in use model for big data","volume":"63","author":"Merino Jorge","year":"2016","unstructured":"Jorge Merino, Ismael Caballero, Bibiano Rivas, Manuel Serrano, and Mario Piattini. 2016. A data quality in use model for big data. Fut. Gen. Comput. Syst. 63 (2016), 123\u2013130.","journal-title":"Fut. 
Gen. Comput. Syst."},{"key":"e_1_3_2_98_2","first-page":"335","volume-title":"The Semantic Web: ESWC 2017 Satellite Events: ESWC 2017 Satellite Events, Portoro\u017e, Slovenia, May 28\u2013June 1, 2017, Revised Selected Papers 14","author":"Mihindukulasooriya Nandana","year":"2017","unstructured":"Nandana Mihindukulasooriya, Ra\u00fal Garc\u00eda-Castro, Freddy Priyatna, Edna Ruckhaus, and Nelson Saturno. 2017. A linked data profiling service for quality assessment. In The Semantic Web: ESWC 2017 Satellite Events: ESWC 2017 Satellite Events, Portoro\u017e, Slovenia, May 28\u2013June 1, 2017, Revised Selected Papers 14. Springer, 335\u2013340."},{"key":"e_1_3_2_99_2","volume-title":"International Conference on Very Large Data Bases.","author":"Missier Paolo","year":"2006","unstructured":"Paolo Missier, Suzanne Embury, Mark Greenwood, Alun Preece, and Binling Jin. 2006. Quality views: Capturing and exploiting the user perspective on data quality. In International Conference on Very Large Data Bases."},{"key":"e_1_3_2_100_2","doi-asserted-by":"crossref","unstructured":"Hajar Mousannif Hasna Sabah Yasmina Douiji and Younes Oulad Sayad. 2014. From big data to big projects: A step-by-step roadmap. In 2014 International Conference on Future Internet of Things and Cloud . 373\u2013378.","DOI":"10.1109\/FiCloud.2014.66"},{"key":"e_1_3_2_101_2","first-page":"1","article-title":"Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach","volume":"18","author":"Munn Zachary","year":"2018","unstructured":"Zachary Munn, Micah D. J. Peters, Cindy Stern, Catalin Tufanaru, Alexa McArthur, and Edoardo Aromataris. 2018. Systematic review or scoping review? Guidance for authors when choosing between a systematic or scoping review approach. BMC Med. Res. Methodol. 18 (2018), 1\u20137.","journal-title":"BMC Med. Res. Methodol."},{"key":"e_1_3_2_102_2","doi-asserted-by":"crossref","unstructured":"Goutam Mylavarapu Johnson P. 
Thomas and K. Ashwin Viswanathan. 2019. An automated big data accuracy assessment tool. In IEEE 4th International Conference on Big Data Analytics (ICBDA\u201919) . 193\u2013197.","DOI":"10.1109\/ICBDA.2019.8713218"},{"key":"e_1_3_2_103_2","doi-asserted-by":"crossref","unstructured":"Goutam Mylavarapu K. Ashwin Viswanathan and Johnson P. Thomas. 2019. Assessing context-aware data consistency. In IEEE\/ACS 16th International Conference on Computer Systems and Applications (AICCSA\u201919) . 1\u20136.","DOI":"10.1109\/AICCSA47632.2019.9035250"},{"issue":"1","key":"e_1_3_2_104_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-014-0007-7","article-title":"Deep learning applications and challenges in big data analytics","volume":"2","author":"Najafabadi Maryam M.","year":"2015","unstructured":"Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald, and Edin Muharemagic. 2015. Deep learning applications and challenges in big data analytics. J. Big Data 2, 1 (2015), 1\u201321.","journal-title":"J. Big Data"},{"issue":"12","key":"e_1_3_2_105_2","doi-asserted-by":"crossref","first-page":"1986","DOI":"10.14778\/3352063.3352116","article-title":"Data lake management: Challenges and opportunities","volume":"12","author":"Nargesian Fatemeh","year":"2019","unstructured":"Fatemeh Nargesian, Erkang Zhu, Ren\u00e9e J. Miller, Ken Q. Pu, and Patricia C. Arocena. 2019. Data lake management: Challenges and opportunities. Proc. VLDB Endow. 12, 12 (2019), 1986\u20131989.","journal-title":"Proc. VLDB Endow."},{"issue":"4","key":"e_1_3_2_106_2","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1145\/2590989.2590995","article-title":"Data profiling revisited","volume":"42","author":"Naumann Felix","year":"2014","unstructured":"Felix Naumann. 2014. Data profiling revisited. ACM SIGMOD Rec. 
42, 4 (2014), 40\u201349.","journal-title":"ACM SIGMOD Rec."},{"key":"e_1_3_2_107_2","first-page":"169","volume-title":"International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE\u201908)","author":"Niemel\u00e4 Eila","year":"2008","unstructured":"Eila Niemel\u00e4, Antti Evesti, and Pekka Savolainen. 2008. Modeling quality attribute variability. In International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE\u201908). 169\u2013176."},{"key":"e_1_3_2_108_2","first-page":"274","volume-title":"International Conference on Enterprise Information Systems (ICEIS\u201919)","author":"Nikiforova Anastasija","year":"2019","unstructured":"Anastasija Nikiforova and Janis Bicevskis. 2019. An extended data object-driven approach to data quality evaluation: Contextual data quality analysis. In International Conference on Enterprise Information Systems (ICEIS\u201919). 274\u2013281."},{"issue":"1","key":"e_1_3_2_109_2","doi-asserted-by":"crossref","first-page":"107","DOI":"10.3897\/jucs.2020.007","article-title":"User-oriented approach to data quality evaluation.","volume":"26","author":"Nikiforova Anastasija","year":"2020","unstructured":"Anastasija Nikiforova, Janis Bicevskis, Zane Bicevska, and Ivo Oditis. 2020. User-oriented approach to data quality evaluation. J. Univers. Comput. Sci. 26, 1 (2020), 107\u2013126.","journal-title":"J. Univers. Comput. Sci."},{"issue":"4","key":"e_1_3_2_110_2","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1016\/j.bdr.2015.01.001","article-title":"Reference architecture and classification of technologies, products and services for big data systems","volume":"2","author":"P\u00e4\u00e4kk\u00f6nen Pekka","year":"2015","unstructured":"Pekka P\u00e4\u00e4kk\u00f6nen and Daniel Pakkala. 2015. Reference architecture and classification of technologies, products and services for big data systems. Big Data Res. 
2, 4 (2015), 166\u2013186.","journal-title":"Big Data Res."},{"key":"e_1_3_2_111_2","unstructured":"Peter F. Patel-Schneider. 2015. Towards large-scale schema and ontology matching. Retrieved from https:\/\/www.semanticscholar.org\/paper\/Towards-Large-scale-Schema-And-Ontology-Matching-Patel-Schneider\/ceee2bdaef83a0f09480fa6fb191cf3372137152."},{"key":"e_1_3_2_112_2","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1007\/s10115-018-1164-3","article-title":"A systematic review of provenance systems","volume":"57","author":"P\u00e9rez Beatriz","year":"2018","unstructured":"Beatriz P\u00e9rez, Julio Rubio, and Carlos S\u00e1enz-Ad\u00e1n. 2018. A systematic review of provenance systems. Knowl. Inf. Syst. 57 (2018), 495\u2013543.","journal-title":"Knowl. Inf. Syst."},{"issue":"4","key":"e_1_3_2_113_2","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1145\/505248.506010","article-title":"Data quality assessment","volume":"45","author":"Pipino Leo L.","year":"2002","unstructured":"Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. 2002. Data quality assessment. Commun. ACM 45, 4 (2002), 211\u2013218.","journal-title":"Commun. ACM"},{"issue":"1","key":"e_1_3_2_114_2","first-page":"3","article-title":"Developing a measurement instrument for subjective aspects of information quality","volume":"22","author":"Price Rosanne","year":"2008","unstructured":"Rosanne Price, Dina Neiger, and Graeme Shanks. 2008. Developing a measurement instrument for subjective aspects of information quality. Commun. Assoc. Inf. Syst. 22, 1 (2008), 3.","journal-title":"Commun. Assoc. Inf. Syst."},{"key":"e_1_3_2_115_2","unstructured":"Kumar Rahul and R. K. Banyal. 2019. Data cleaning mechanism for big data and cloud computing. In 6th International Conference on Computing for Sustainable Global Development (INDIACom\u201919) . 195\u2013198."},{"key":"e_1_3_2_116_2","doi-asserted-by":"crossref","unstructured":"Lakshmish Ramaswamy Victor Lawson and Siva Venkat Gogineni. 2013. 
Towards a quality-centric big data architecture for federated sensor services. In 2013 IEEE International Congress on Big Data . 86\u201393.","DOI":"10.1109\/BigData.Congress.2013.21"},{"key":"e_1_3_2_117_2","doi-asserted-by":"crossref","unstructured":"R. Rawat and R. Yadav. 2021. Big data: Big data analysis issues and challenges and technologies. IOP Conference Series: Materials Science and Engineering 1022 1 (2021) 012014.","DOI":"10.1088\/1757-899X\/1022\/1\/012014"},{"issue":"5","key":"e_1_3_2_118_2","doi-asserted-by":"crossref","first-page":"532","DOI":"10.21817\/indjcse\/2020\/v11i5\/201105116","article-title":"Sampling based join-aggregate query processing technique for big data","volume":"11","author":"Sadineni Praveen Kumar","year":"2020","unstructured":"Praveen Kumar Sadineni. 2020. Sampling based join-aggregate query processing technique for big data. Indian J. Comput. Sci. Eng. 11, 5, 532\u2013546.","journal-title":"Indian J. Comput. Sci. Eng."},{"key":"e_1_3_2_119_2","doi-asserted-by":"crossref","unstructured":"Barna Saha and Divesh Srivastava. 2014. Data quality: The other face of big data. In 2014 IEEE 30th International Conference on Data Engineering . 1294\u20131297.","DOI":"10.1109\/ICDE.2014.6816764"},{"issue":"12","key":"e_1_3_2_120_2","doi-asserted-by":"crossref","first-page":"1781","DOI":"10.14778\/3229863.3229867","article-title":"Automating large-scale data quality verification","volume":"11","author":"Schelter Sebastian","year":"2018","unstructured":"Sebastian Schelter, Dustin Lange, Philipp Schmidt, Meltem Celikel, Felix Biessmann, and Andreas Grafberger. 2018. Automating large-scale data quality verification. Proc. VLDB Endow. 11, 12 (2018), 1781\u20131794.","journal-title":"Proc. VLDB Endow."},{"key":"e_1_3_2_121_2","volume-title":"Data Quality","author":"Sharma Gaurav","year":"2021","unstructured":"Gaurav Sharma. 2021. Data Quality. 
Retrieved from https:\/\/www.computer.org\/publications\/tech-news\/trends\/big-data-and-cloud-computing."},{"key":"e_1_3_2_122_2","unstructured":"Norbert Siegmund Marko Rosenm\u00fcller Martin Kuhlemann Christian K\u00e4stner Sven Apel Fabien Duchateau and Justin Fagnan. 2015. Schema matching bibtex. In Proceedings of the VLDB Endowment. Retrieved from https:\/\/www.semanticscholar.org\/paper\/Schema-Matching-Bibtex-Siegmund-Rosenm%C3%BCller\/a4d94ddaab429e5874386dd29822e470b57d6ee4."},{"key":"e_1_3_2_123_2","unstructured":"Calidad Software. 2022. ISO\/IEC 25012. Retrieved from https:\/\/iso25000.com\/index.php\/en\/iso-25000-standards\/iso-25012."},{"key":"e_1_3_2_124_2","doi-asserted-by":"crossref","unstructured":"Dragan Stojanovi\u0107 Natalija Stojanovi\u0107 and Jovan Turanjanin. 2015. Processing big trajectory and Twitter data streams using Apache STORM. (2015) 301\u2013304.","DOI":"10.1109\/TELSKS.2015.7357792"},{"issue":"5","key":"e_1_3_2_125_2","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/253769.253804","article-title":"Data quality in context","volume":"40","author":"Strong Diane M.","year":"1997","unstructured":"Diane M. Strong, Yang W. Lee, and Richard Y. Wang. 1997. Data quality in context. Commun. ACM 40, 5 (1997), 103\u2013110.","journal-title":"Commun. ACM"},{"key":"e_1_3_2_126_2","first-page":"910","volume-title":"On the Move to Meaningful Internet Systems: OTM 2016 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, October 24\u201328, 2016, Proceedings","author":"Taher Yehia","year":"2016","unstructured":"Yehia Taher, Rafiqul Haque, Mohammed AlShaer, Willem Jan van den Heuvel, Mohand-Sa\u00efd Hacid, and Mohamed Dbouk. 2016. A context-aware analytics for processing tweets and analysing sentiment in realtime (short paper). 
In On the Move to Meaningful Internet Systems: OTM 2016 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2016, Rhodes, Greece, October 24\u201328, 2016, Proceedings. Springer, 910\u2013917."},{"key":"e_1_3_2_127_2","doi-asserted-by":"crossref","unstructured":"Yehia Taher Rafiqul Haque and Mohand-Said Hacid. 2017. BDLaaS: Big data lab as a service for experimenting big data solution. In IEEE 2nd International Workshops on Foundations and Applications of Self* Systems (FAS* W\u201917) . 155\u2013159.","DOI":"10.1109\/FAS-W.2017.140"},{"key":"e_1_3_2_128_2","doi-asserted-by":"crossref","unstructured":"Ikbal Taleb Rachida Dssouli and Mohamed Adel Serhani. 2015. Big data pre-processing: A quality framework. (2015) 191\u2013198.","DOI":"10.1109\/BigDataCongress.2015.35"},{"key":"e_1_3_2_129_2","doi-asserted-by":"crossref","unstructured":"Ikbal Taleb Mohamed Adel Serhani and Rachida Dssouli. 2018. Big data quality assessment model for unstructured data. In International Conference on Innovations in Information Technology (IIT\u201918) . 69\u201374.","DOI":"10.1109\/INNOVATIONS.2018.8605945"},{"key":"e_1_3_2_130_2","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1007\/978-3-030-23381-5_5","volume-title":"Services\u2013SERVICES 2019: 15th World Congress, Held as Part of the Services Conference Federation, SCF 2019, San Diego, CA, USA, June 25\u201330, 2019, Proceedings 15","author":"Taleb Ikbal","year":"2019","unstructured":"Ikbal Taleb, Mohamed Adel Serhani, and Rachida Dssouli. 2019. Big data quality: A data quality profiling model. In Services\u2013SERVICES 2019: 15th World Congress, Held as Part of the Services Conference Federation, SCF 2019, San Diego, CA, USA, June 25\u201330, 2019, Proceedings 15. Springer, 61\u201377."},{"key":"e_1_3_2_131_2","volume-title":"How to Manage Modern Data Quality [White Paper]","year":"2020","unstructured":"Talend. 2020. How to Manage Modern Data Quality [White Paper]. Technical Report. Talend. 
Retrieved from https:\/\/www.talend.com\/resources\/definitive-guide-data-quality-how-to-manage."},{"issue":"2","key":"e_1_3_2_132_2","article-title":"Towards a powerful solution for data accuracy assessment in the big data context","volume":"11","author":"Talha Mohamed","year":"2020","unstructured":"Mohamed Talha, Nabil Elmarzouqi, and Anas Abou El Kalam. 2020. Towards a powerful solution for data accuracy assessment in the big data context. Int. J. Advanc. Comput. Sci. Applic. 11, 2 (2020).","journal-title":"Int. J. Advanc. Comput. Sci. Applic."},{"key":"e_1_3_2_133_2","first-page":"363","volume-title":"13th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201916)","author":"Venkataraman Shivaram","year":"2016","unstructured":"Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica. 2016. Ernest: Efficient performance prediction for large-scale advanced analytics. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201916). 363\u2013378."},{"issue":"2","key":"e_1_3_2_134_2","first-page":"52","article-title":"Machine learning in big data","volume":"1","author":"Wang Lidong","year":"2016","unstructured":"Lidong Wang and Cheryl Ann Alexander. 2016. Machine learning in big data. Int. J. Math., Eng. Manag. Sci. 1, 2 (2016), 52\u201361.","journal-title":"Int. J. Math., Eng. Manag. Sci."},{"issue":"2","key":"e_1_3_2_135_2","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1145\/269012.269022","article-title":"A product perspective on total data quality management","volume":"41","author":"Wang Richard Y.","year":"1998","unstructured":"Richard Y. Wang. 1998. A product perspective on total data quality management. Commun. ACM 41, 2 (1998), 58\u201365.","journal-title":"Commun. 
ACM"},{"key":"e_1_3_2_136_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1080\/07421222.1996.11518099","article-title":"Beyond accuracy: What data quality means to data consumers","volume":"12","author":"Wang Richard Y.","year":"1996","unstructured":"Richard Y. Wang and Diane Strong. 1996. Beyond accuracy: What data quality means to data consumers. J. Manag. Inf. Syst. 12 (1996), 5\u201333.","journal-title":"J. Manag. Inf. Syst."},{"key":"e_1_3_2_137_2","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1016\/j.future.2020.01.010","article-title":"Evaluating the crowd quality for subjective questions based on a Spark computing environment","volume":"106","author":"Wang Xinxin","year":"2020","unstructured":"Xinxin Wang, Depeng Dang, and Zixian Guo. 2020. Evaluating the crowd quality for subjective questions based on a Spark computing environment. Fut. Gen. Comput. Syst. 106 (2020), 426\u2013437.","journal-title":"Fut. Gen. Comput. Syst."},{"key":"e_1_3_2_138_2","doi-asserted-by":"crossref","unstructured":"Chen Wei-Liang Zhang Shi-Dong and Gao Xiang. 2009. Anchoring the consistency dimension of data quality using ontology in data integration. (2009) 201\u2013205.","DOI":"10.1109\/WISA.2009.32"},{"issue":"4","key":"e_1_3_2_139_2","first-page":"298","article-title":"A classification of data quality assessment and improvement methods","volume":"3","author":"Woodall Philip","year":"2014","unstructured":"Philip Woodall, Martin Oberhofer, and Alexander Borek. 2014. A classification of data quality assessment and improvement methods. Int. J. Inf. Qual. 3, 4 (2014), 298\u2013321.","journal-title":"Int. J. Inf. Qual."},{"key":"e_1_3_2_140_2","article-title":"Sensing as a service and big data","author":"Zaslavsky Arkady","year":"2013","unstructured":"Arkady Zaslavsky, Charith Perera, and Dimitrios Georgakopoulos. 2013. Sensing as a service and big data. 
arXiv preprint arXiv:1301.0159 (2013).","journal-title":"arXiv preprint arXiv:1301.0159"},{"key":"e_1_3_2_141_2","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1145\/2506182.2506195","volume-title":"9th International Conference on Semantic Systems","author":"Zaveri Amrapali","year":"2013","unstructured":"Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz B\u00fchmann, Mohamed Morsey, S\u00f6ren Auer, and Jens Lehmann. 2013. User-driven quality evaluation of DBpedia. In 9th International Conference on Semantic Systems. 97\u2013104."},{"key":"e_1_3_2_142_2","doi-asserted-by":"crossref","unstructured":"Pengcheng Zhang Xuewu Zhou Wenrui Li and Jerry Gao. 2017. A survey on quality assurance techniques for big data applications. (2017) 313\u2013319.","DOI":"10.1109\/BigDataService.2017.42"},{"key":"e_1_3_2_143_2","doi-asserted-by":"crossref","first-page":"108565","DOI":"10.1016\/j.patcog.2022.108565","article-title":"Split, embed and merge: An accurate table structure recognizer","volume":"126","author":"Zhang Zhenrong","year":"2022","unstructured":"Zhenrong Zhang, Jianshu Zhang, Jun Du, and Fengren Wang. 2022. Split, embed and merge: An accurate table structure recognizer. Pattern Recognit. 126 (2022), 108565.","journal-title":"Pattern Recognit."},{"key":"e_1_3_2_144_2","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1016\/j.neucom.2017.01.026","article-title":"Machine learning on big data: Opportunities and challenges","volume":"237","author":"Zhou Lina","year":"2017","unstructured":"Lina Zhou, Shimei Pan, Jianwu Wang, and Athanasios V. Vasilakos. 2017. Machine learning on big data: Opportunities and challenges. 
Neurocomputing 237 (2017), 350\u2013361.","journal-title":"Neurocomputing"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603707","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3603707","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:37:21Z","timestamp":1750178241000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3603707"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,22]]},"references-count":143,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3603707"],"URL":"https:\/\/doi.org\/10.1145\/3603707","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"value":"1936-1955","type":"print"},{"value":"1936-1963","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,22]]},"assertion":[{"value":"2022-04-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-05-08","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}