{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T07:47:41Z","timestamp":1761896861591,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2018,10,16]],"date-time":"2018-10-16T00:00:00Z","timestamp":1539648000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Education Ministry's New Century Excellent Talents Supporting Plan in China","award":["B43451914"],"award-info":[{"award-number":["B43451914"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>In banks, governments, and Internet companies, inconsistent data problems may often arise when various information systems are collecting, processing, and updating data due to human or equipment reasons. The emergence of inconsistent data makes it impossible to obtain correct information from the data and reduces its availability. Such problems may be fatal in data-intensive enterprises, which causes huge economic losses. Moreover, it is very difficult to clean inconsistent data in databases, especially for data containing conditional functional dependencies with built-in predicates (CFDPs), because it tends to contain more candidate repair values. For the inconsistent data containing CFDPs to detect incomplete and repair difficult problems in databases, we propose a dependency lifting algorithm (DLA) based on the maximum dependency set (MDS) and a reparation algorithm (C-Repair) based on integrating the minimum cost and attribute correlation, respectively. In detection, we find recessive dependencies from the original dependency set to obtain the MDS and improve the original algorithm by dynamic domain adjustment, which extends the applicability to continuous attributes and improves the detection accuracy. 
In reparation, we first set up a priority queue (PQ) for elements to be repaired based on the minimum cost idea to select a candidate element; then, we treat the corresponding conflict-free instance ($I_{nv}$) as the training set to learn the correlation among attributes and compute the weighted distance (WDis) between the tuple of the candidate element and other tuples in $I_{nv}$ according to the correlation; and, lastly, we perform reparation based on the WDis and re-compute the PQ after each reparation round to improve the efficiency, and use a label, flag, to mark the repaired elements to ensure the convergence at the same time. By setting up a contrast experiment, we compare the DLA with the CFDPs based algorithm, and the C-Repair with a cost-based, interpolation-based algorithm on a simulated instance and a real instance. From the experimental results, the DLA and C-Repair algorithms have better detection and repair ability at a higher time cost.<\/jats:p>","DOI":"10.3390\/sym10100516","type":"journal-article","created":{"date-parts":[[2018,10,16]],"date-time":"2018-10-16T11:07:51Z","timestamp":1539688071000},"page":"516","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Inconsistent Data Cleaning Based on the Maximum Dependency Set and Attribute Correlation"],"prefix":"10.3390","volume":"10","author":[{"given":"Pei","family":"Li","sequence":"first","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chaofan","family":"Dai","sequence":"additional","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wenqian","family":"Wang","sequence":"additional","affiliation":[{"name":"Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,10,16]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1145\/2854006.2854008","article-title":"Data Quality: From Theory to Practice","volume":"44","author":"Fan","year":"2015","journal-title":"ACM SIGMOD Rec."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Tu, S., and Huang, M. (2016, January 20\u201322). Scalable Functional Dependencies Discovery from Big Data. Proceedings of the IEEE Second International Conference on Multimedia Big Data, Taipei, Taiwan.","DOI":"10.1109\/BigMM.2016.63"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1587\/transinf.2015EDL8170","article-title":"An Optimization Strategy for CFDMiner: An Algorithm of Discovering Constant Conditional Functional Dependencies","volume":"E99-D","author":"Zhou","year":"2016","journal-title":"IEICE Trans. Inf. Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"3274","DOI":"10.1109\/TKDE.2015.2451632","article-title":"Extending Conditional Dependencies with Built-in Predicates","volume":"27","author":"Ma","year":"2015","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_5","first-page":"1727","article-title":"Consistent Estimation of Query Result in Inconsistent Data","volume":"38","author":"Liu","year":"2015","journal-title":"Chin. J. Comput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1016\/j.ijar.2017.02.003","article-title":"On repairing and querying inconsistent probabilistic spatio-temporal databases","volume":"84","author":"Parisi","year":"2017","journal-title":"Int. J. Approx. 
Reason."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bohannon, P., Fan, W., Flaster, M., and Rastogi, R. (2005, January 14\u201316). A cost-based model and effective heuristic for repairing constraints by value modification. Proceedings of the ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA.","DOI":"10.1145\/1066157.1066175"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1080\/13658816.2014.965711","article-title":"Rank-based strategies for cleaning inconsistent spatial databases","volume":"29","author":"Brisaboa","year":"2015","journal-title":"Int. J. Geogr. Inf. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Bai, L., Shao, Z., Lin, Z., and Cheng, S. (2017). Fixing inconsistencies of fuzzy spatiotemporal XML data. Appl. Intell., 1\u201319.","DOI":"10.1007\/s10489-016-0888-6"},{"key":"ref_10","first-page":"232","article-title":"Study on Data Repair and Consistency Query Processing","volume":"43","author":"Liu","year":"2016","journal-title":"Comput. Sci."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Arieli, O., and Zamansky, A. (2016). A Graded Approach to Database Repair by Context-Aware Distance Semantics, Elsevier North-Holland, Inc.","DOI":"10.1016\/j.fss.2015.06.007"},{"key":"ref_12","first-page":"1685","article-title":"Repairing Inconsistent Relational Data Based on Possible Word Model","volume":"27","author":"Xu","year":"2016","journal-title":"J. Softw."},{"key":"ref_13","first-page":"2664","article-title":"Data consistency repair method for enterprise information integration","volume":"10","author":"Liu","year":"2013","journal-title":"Comput. Integr. Manuf. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kim, J., Jang, G.J., and Lee, M. (2016). Investigation of the Efficiency of Unsupervised Learning for Multi-task Classification in Convolutional Neural Network. 
Neural Information Processing, Springer International Publishing.","DOI":"10.1007\/978-3-319-46675-0_60"},{"key":"ref_15","first-page":"1","article-title":"A generalized Fellegi-Holt paradigm for automatic error localization","volume":"42","author":"Scholtus","year":"2016","journal-title":"Surv. Methodol."},{"key":"ref_16","unstructured":"Zhang, C., and Diao, Y. (2016, January 13\u201315). Conditional functional dependency discovery and data repair based on decision tree. Proceedings of the International Conference on Fuzzy Systems and Knowledge Discovery, Changsha, China."},{"key":"ref_17","unstructured":"Cao, J., and Diao, X. (2017). Introduction to Data Quality, National Defense Industry Press."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11760-016-0938-x","article-title":"Improving retrieval framework using information gain models","volume":"11","author":"Le","year":"2017","journal-title":"Signal Image Video Process."},{"key":"ref_19","first-page":"429","article-title":"Informative Gene Selection Method Based on Symmetric Uncertainly and SVM Recursive Feature Elimination","volume":"30","author":"Ye","year":"2017","journal-title":"Pattern Recognit. Artif. Intell."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1007\/s10619-018-7240-6","article-title":"An effective weighted rule-based method for entity resolution","volume":"36","author":"Ahmad","year":"2018","journal-title":"Distrib. Parallel Databases"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1109\/TEVC.2013.2285016","article-title":"A New Multiobjective Evolutionary Algorithm for Mining a Reduced Set of Interesting Positive and Negative Quantitative Association Rules","volume":"18","author":"Martin","year":"2014","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_22","unstructured":"Zhang, X.J., Wang, M., and Meng, X.F. (2014). An Accurate Method for Mining top-k Frequent Pattern Under Differential Privacy. J. 
Comput. Res. Dev., 104\u2013114."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"157","DOI":"10.2147\/CLEP.S129785","article-title":"Missing data and multiple imputation in clinical epidemiological research","volume":"9","author":"Pedersen","year":"2017","journal-title":"Clin. Epidemiol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Diao, Y., Liu, K.Y., Meng, X., Ye, X., and He, K. (2015, January 12\u201314). A Big Data Online Cleaning Algorithm Based on Dynamic Outlier Detection. Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Nanjing, China.","DOI":"10.1109\/CyberC.2015.68"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/10\/516\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:26:06Z","timestamp":1760196366000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/10\/10\/516"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,10,16]]},"references-count":24,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2018,10]]}},"alternative-id":["sym10100516"],"URL":"https:\/\/doi.org\/10.3390\/sym10100516","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2018,10,16]]}}}