{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T10:59:16Z","timestamp":1777546756316,"version":"3.51.4"},"reference-count":57,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,8,21]],"date-time":"2021-08-21T00:00:00Z","timestamp":1629504000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,21]],"date-time":"2021-08-21T00:00:00Z","timestamp":1629504000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Duplicate record is a common problem within data sets especially in huge volume databases. The accuracy of duplicate detection determines the efficiency of duplicate removal process. However, duplicate detection has become more challenging due to the presence of missing values within the records where during the clustering and matching process, missing values can cause records deemed similar to be inserted into the wrong group, hence, leading to undetected duplicates. In this paper, duplicate detection improvement was proposed despite the presence of missing values within a data set through Duplicate Detection within the Incomplete Data set (DDID) method. The missing values were hypothetically added to the key attributes of three data sets under study, using an arbitrary pattern to simulate both complete and incomplete data sets. The results were analyzed, then, the performance of duplicate detection was evaluated by using the Hot Deck method to compensate for the missing values in the key attributes. It was hypothesized that by using Hot Deck, duplicate detection performance would be improved. Furthermore, the DDID performance was compared to an early duplicate detection method namely DuDe, in terms of its accuracy and speed. The findings yielded that even though the data sets were incomplete, DDID was able to offer a better accuracy and faster duplicate detection as compared to DuDe. The results of this study offer insights into constraints of duplicate detection within incomplete data sets.<\/jats:p>","DOI":"10.1186\/s40537-021-00502-1","type":"journal-article","created":{"date-parts":[[2021,8,21]],"date-time":"2021-08-21T05:03:05Z","timestamp":1629522185000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Missing values compensation in duplicates detection using hot deck method"],"prefix":"10.1186","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8004-6791","authenticated-orcid":false,"given":"Abdulrazzak","family":"Ali","sequence":"first","affiliation":[]},{"given":"Nurul A.","family":"Emran","sequence":"additional","affiliation":[]},{"given":"Siti A.","family":"Asmai","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,8,21]]},"reference":[{"issue":"3","key":"502_CR1","first-page":"463","volume":"26","author":"RW Griffeth","year":"2000","unstructured":"Griffeth RW, Hom PW, Gaertner S. A meta-analysis of antecedents and correlates of employee turnover: update, moderator tests, and research implications for the next millennium. J Manag. 2000;26(3):463\u201388.","journal-title":"J Manag"},{"key":"502_CR2","unstructured":"Shilane P, Chitloor R, Jonnala UK. 99 deduplication problems. In: 8th USENIX workshop on hot topics in storage and file systems (HotStorage 16), USENIX association, Denver, CO. 2016. p. 1\u20135."},{"issue":"9","key":"502_CR3","doi-asserted-by":"publisher","first-page":"1681","DOI":"10.1109\/JPROC.2016.2571298","volume":"104","author":"W Xia","year":"2016","unstructured":"Xia W, Jiang H, Feng D, Douglis F, Shilane P, Hua Y, Fu M, Zhang Y, Zhou Y. A comprehensive study of the past, present, and future of data deduplication. Proc IEEE. 2016;104(9):1681\u2013710.","journal-title":"Proc IEEE"},{"key":"502_CR4","doi-asserted-by":"crossref","unstructured":"Chernov I, Ivashko E, Rumiantsev A, Ponomarev V, Shabaev A. Survey on deduplication techniques in flash-based storage. In: 2018 22nd conference of open innovations association (FRUCT). IEEE, Jyvaskyla, Finland. 2018.","DOI":"10.23919\/FRUCT.2018.8468295"},{"key":"502_CR5","doi-asserted-by":"crossref","unstructured":"Xu L, Pavlo A, Sengupta S, Ganger GR. Online deduplication for databases. In: proceedings of the 2017 ACM international conference on management of data. ACM, Chicago Illinois USA. 2017; p. 1355\u201368.","DOI":"10.1145\/3035918.3035938"},{"issue":"21","key":"502_CR6","first-page":"18","volume":"116","author":"V Wandhekar","year":"2015","unstructured":"Wandhekar V. Validation of deduplication in data using similarity measure. Int J Comput Appl. 2015;116(21):18\u201322.","journal-title":"Int J Comput Appl"},{"key":"502_CR7","doi-asserted-by":"crossref","unstructured":"Menestrina D, Whang SE, Garcia-molina H. Evaluating entity resolution results. In: proceedings of the VLDB endowment. VLDB Endowment, Singapore. 2010;3:208\u201319","DOI":"10.14778\/1920841.1920871"},{"key":"502_CR8","unstructured":"Panse F. Duplicate detection in probabilistic relational databases. University of Hambur, PhD thesis. 2015."},{"issue":"2","key":"502_CR9","first-page":"1225","volume":"6","author":"VH Umathe","year":"2015","unstructured":"Umathe VH, Chaudhary G. A review on incomplete data and clustering. Int J Comput Sci Inf Technol. 2015;6(2):1225\u20137.","journal-title":"Int J Comput Sci Inf Technol"},{"key":"502_CR10","doi-asserted-by":"publisher","first-page":"69162","DOI":"10.1109\/ACCESS.2019.2910287","volume":"7","author":"S Wang","year":"2019","unstructured":"Wang S, Li M, Hu N, Zhu E, Hu J, Liu X, Yin J. K-means clustering with incomplete data. IEEE Access. 2019;7:69162\u201371.","journal-title":"IEEE Access"},{"issue":"8","key":"502_CR11","doi-asserted-by":"publisher","first-page":"941","DOI":"10.3923\/itj.2012.941.945","volume":"11","author":"V Subramaniyaswamy","year":"2012","unstructured":"Subramaniyaswamy V, Pandian C. A complete survey of duplicate record detection using data mining techniques. Inf Technol J. 2012;11(8):941\u20135.","journal-title":"Inf Technol J"},{"issue":"4","key":"502_CR12","doi-asserted-by":"publisher","first-page":"2404","DOI":"10.1214\/14-AOAS779","volume":"8","author":"M Sadinle","year":"2014","unstructured":"Sadinle M. Detecting duplicates in a homicide registry using a Bayesian partitioning approach. Ann Appl Stat. 2014;8(4):2404\u201334.","journal-title":"Ann Appl Stat"},{"issue":"8","key":"502_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0159644","volume":"11","author":"Q Chen","year":"2016","unstructured":"Chen Q, Zobel J, Zhang X, Verspoor K. Supervised learning for detection of duplicates in genomic sequence databases. PLoS ONE. 2016;11(8):1\u201320.","journal-title":"PLoS ONE"},{"key":"502_CR14","unstructured":"Huang Y, Chiang F. Refining duplicate detection for improved data quality. In: TDDL\/MDQual\/Futurity@ TPDL. 2017."},{"key":"502_CR15","doi-asserted-by":"crossref","unstructured":"Ali A, Emran NA, Asmai SA, Thabet A. Duplicates detection within incomplete data sets using blocking and dynamic sorting key methods. Int J Adv Comput Sci Appl. 2018;9(9).","DOI":"10.14569\/IJACSA.2018.090979"},{"issue":"1","key":"502_CR16","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1186\/s40537-017-0099-y","volume":"4","author":"HT Wubetie","year":"2017","unstructured":"Wubetie HT. Missing data management and statistical measurement of socio-economic status: application of big data. J Big Data. 2017;4(1):47.","journal-title":"J Big Data"},{"key":"502_CR17","doi-asserted-by":"crossref","unstructured":"Lazar A, Jin L, Spurlock CA, Wu K, Alex S. Data quality challenges with missing values and mixed types in joint sequence analysis. In: data quality challenges with missing values and mixed types in joint sequence analysis. In: 2017 IEEE international conference on big data (Big Data). Boston, MA, USA: IEEE. 2017; p. 2620\u20137.","DOI":"10.1109\/BigData.2017.8258222"},{"issue":"1","key":"502_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TKDE.2007.250581","volume":"19","author":"AK Elmagarmid","year":"2007","unstructured":"Elmagarmid AK, Ipeirotis PG, Verykios VS. Duplicate record detection: a survey. IEEE Trans Knowl Data Eng. 2007;19(1):1\u201316.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"502_CR19","unstructured":"Monge AE, Elkan CP. An efficient domain-independent algorithm for detecting approximately duplicate database records. In: DMKD; 1997."},{"key":"502_CR20","unstructured":"Bilenko M, Mooney RJ. Learning to combine trained distance metrics for duplicate detection in databases. Technical report, Department of Computer Sciences University of Texas at Austin. 2002."},{"key":"502_CR21","doi-asserted-by":"publisher","first-page":"729","DOI":"10.1007\/978-3-642-14246-8_69","volume-title":"Web-Age Information Management","author":"J Song","year":"2010","unstructured":"Chen L, Tang C, Yang J, Gao Y. A multilevel and domain-independent duplicate detection model for scientific database. In: Tang C, Yang J, Chen L, Gao Y, editors. Web-Age Information Management. Berlin: Springer; 2010. p. 729\u201341."},{"issue":"1","key":"502_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-031-01835-0","volume":"2","author":"F Naumann","year":"2010","unstructured":"Naumann F, Herschel M. An introduction to duplicate detection. Synth Lect Data Manag. 2010;2(1):1\u201387.","journal-title":"Synth Lect Data Manag"},{"issue":"2","key":"502_CR23","first-page":"145","volume":"5","author":"J Tamilselvi","year":"2009","unstructured":"Tamilselvi J, Saravanan V. Detection and elimination of duplicate data using token-based method for a data warehouse: a clustering based approach. Int J Dyn Fluids. 2009;5(2):145\u201364.","journal-title":"Int J Dyn Fluids"},{"issue":"2","key":"502_CR24","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1016\/j.datak.2009.10.003","volume":"69","author":"H K\u00f6pcke","year":"2010","unstructured":"K\u00f6pcke H, Rahm E. Frameworks for entity matching: a comparison. Data Knowl Eng. 2010;69(2):197\u2013210.","journal-title":"Data Knowl Eng"},{"issue":"1","key":"502_CR25","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1186\/s40537-018-0123-x","volume":"5","author":"H Alrehamy","year":"2018","unstructured":"Alrehamy H, Walker C. SemLinker: automating big data integration for casual users. J Big Data. 2018;5(1):14.","journal-title":"J Big Data"},{"issue":"74","key":"502_CR26","first-page":"1","volume":"6","author":"N Konstantinou","year":"2019","unstructured":"Konstantinou N, Abel E, Bellomarini L, Bogatu A, Civili C, Irfanie E, Koehler M, Mazilu L, Sallinger E, Fernandes AAA, Gottlob G, Keane JA, Paton NW. VADA: an architecture for end user informed data preparation. J Big Data. 2019;6(74):1\u201332.","journal-title":"J Big Data"},{"issue":"8","key":"502_CR27","first-page":"1","volume":"8","author":"S Haque","year":"2021","unstructured":"Haque S, Mengersen K, Stern S. Assessing the accuracy of record linkages with Markov chain based Monte Carlo simulation approach. J Big Data. 2021;8(8):1\u201325.","journal-title":"J Big Data"},{"key":"502_CR28","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1007\/11890591_5","volume-title":"Journal on data semantics VII","author":"P Lehti","year":"2006","unstructured":"Lehti P, Fankhauser P. Unsupervised duplicate detection using sample non-duplicates. In: Spaccapietra S, editor. Journal on data semantics VII. Berlin, Heidelberg: Springer; 2006. p. 136\u201364."},{"issue":"5","key":"502_CR29","doi-asserted-by":"publisher","first-page":"1330","DOI":"10.1109\/TKDE.2014.2365807","volume":"27","author":"A Bronselaer","year":"2015","unstructured":"Bronselaer A, Van Britsom D, De Tr\u00e9 G. Propagation of data fusion. IEEE Trans Knowl Data Eng. 2015;27(5):1330\u201343.","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"5","key":"502_CR30","first-page":"266","volume":"8","author":"B Bharathi","year":"2017","unstructured":"Bharathi B, Reddy CS. Duplicate record deletion in relational database management systems. Int J Sci Eng Res. 2017;8(5):266\u201371.","journal-title":"Int J Sci Eng Res"},{"issue":"6","key":"502_CR31","first-page":"1893","volume":"10","author":"SA Babu","year":"2017","unstructured":"Babu SA. Duplicate record detection and replacement within a relational database. Adv Comput Sci Technol. 2017;10(6):1893\u2013901.","journal-title":"Adv Comput Sci Technol"},{"issue":"2","key":"502_CR32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s41060-018-0129-7","volume":"6","author":"Y van Gennip","year":"2018","unstructured":"van Gennip Y, Hunter B, Ma A, Moyer D, de Vera R, Bertozzi LA. Unsupervised record matching with noisy and incomplete data. Int J Data Sci Anal. 2018;6(2):1\u201321.","journal-title":"Int J Data Sci Anal"},{"key":"502_CR33","doi-asserted-by":"crossref","unstructured":"Sitaram D, Dalwani A, Narang A, Das M, Auradkar P. A measure of similarity of time series containing missing data using the mahalanobis distance. In: advances in computing and communication engineering (ICACCE). 2015 second international conference. Dehradun, India: IEEE. 2015; p. 622\u20137.","DOI":"10.1109\/ICACCE.2015.14"},{"key":"502_CR34","unstructured":"Abdallah L, Shimshoni I. A distance function for data with missing values and its application. Int J Comput Sci Eng. 2013; p. 7."},{"key":"502_CR35","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1007\/978-3-319-17398-6_11","volume-title":"Pattern analysis. Intelligent security and the internet of things","author":"NA Emran","year":"2015","unstructured":"Emran NA. Data completeness measures. In: Abraham A, Muda AK, Choo Y-H, editors. Pattern analysis. Intelligent security and the internet of things. Cham: Springer International Publishing; 2015. p. 117\u201330."},{"key":"502_CR36","unstructured":"Emran NA, Embury SM, Missier P. Model-driven component generation for families of completeness. In: QDB\/MUD, CTIT workshop proceedings series, Auckland, New Zealand. 2008; p. 123\u201332."},{"issue":"1","key":"502_CR37","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1198\/000313007X172556","volume":"61","author":"NJ Horton","year":"2007","unstructured":"Horton NJ, Kleinman KP. Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat. 2007;61(1):79\u201390.","journal-title":"Am Stat"},{"issue":"4","key":"502_CR38","doi-asserted-by":"publisher","first-page":"353","DOI":"10.1076\/edre.7.4.353.8937","volume":"7","author":"TD Pigott","year":"2001","unstructured":"Pigott TD. A review of methods for missing data. Educ Res Eval. 2001;7(4):353\u201383.","journal-title":"Educ Res Eval"},{"key":"502_CR39","first-page":"137","volume-title":"Handbook of computational statistics: concepts and methods","author":"SK Ng","year":"2004","unstructured":"Ng SK, Krishnan T, Mclachlan GJ. The EM algorithm. In: James EG, Karl HW, Yuichi M, editors. Handbook of computational statistics: concepts and methods. Berlin: Springer; 2004. p. 137\u201368."},{"key":"502_CR40","unstructured":"Draisbach U, Naumann F. DuDe: the duplicate detection toolkit. In: proceedings of the international workshop on quality in databases (QDB), Singapore. 2010;10000:1000000."},{"key":"502_CR41","unstructured":"Ellis, B. A consolidated, macro for iterative hot deck imputation. document pr\u00e9sent\u00e9 au NorthEast SAS Users Group-2007. 2007."},{"issue":"12","key":"502_CR42","doi-asserted-by":"publisher","first-page":"2361","DOI":"10.1016\/j.jss.2008.05.008","volume":"81","author":"Q Song","year":"2008","unstructured":"Song Q, Shepperd M, Chen X, Liu J. Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation. J Syst Softw. 2008;81(12):2361\u201370.","journal-title":"J Syst Softw"},{"key":"502_CR43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2015\/538613","volume":"2015","author":"J Sim","year":"2015","unstructured":"Sim J, Lee JS, Kwon O. Missing values and optimal selection of an imputation method and classification algorithm to improve the accuracy of ubiquitous computing applications. Mathematical Problems in Engineering. 2015. p. 1\u201314.","journal-title":"Mathematical Problems in Engineering"},{"key":"502_CR44","unstructured":"Bilenko M, Mooney RJ. On evaluation and training-set construction for duplicate detection. In: Proceedings of the KDD-2003 workshop on data cleaning, record linkage, and object consolidation. ACM, Washington, DC. 2003; p. 7\u201312."},{"key":"502_CR45","unstructured":"Ong S, Pei A. A comparative study of record matching algorithms. PhD thesis, University of Edinburgh, Scotland. 2008."},{"key":"502_CR46","doi-asserted-by":"crossref","unstructured":"Ektefa M, Marzanah AJ, Sidi F, Memar S, Ibrahim H, Ramali A. A threshold-based similarity measure for duplicate detection. In: IEEE Conference on open systems, IEEE, Langkawi, Malaysia. 2011; p. 37\u201341.","DOI":"10.1109\/ICOS.2011.6079233"},{"key":"502_CR47","unstructured":"Daggupati B. Unsupervised duplicate detection (UDD) Of query results from multiple web databases. PhD thesis, California State University Channel Islands. 2011."},{"issue":"5","key":"502_CR48","doi-asserted-by":"publisher","first-page":"1028","DOI":"10.1109\/TKDE.2012.60","volume":"25","author":"L Leitao","year":"2013","unstructured":"Leitao L, Calado P, Herschel M. Efficient and effective duplicate detection in hierarchical data. IEEE Trans Knowl Data Eng. 2013;25(5):1028\u201341.","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"6","key":"502_CR49","first-page":"28","volume":"127","author":"A Skandar","year":"2015","unstructured":"Skandar A, Rehman M, Anjum M. An efficient duplication record detection algorithm for data cleansing. Int J Comput Appl. 2015;127(6):28\u201337.","journal-title":"Int J Comput Appl"},{"key":"502_CR50","doi-asserted-by":"crossref","unstructured":"Bo C, Wang K, Fox JJ, Skadron K. Entity resolution acceleration using the automata processor. In: proceedings\u20142016 IEEE international conference on big data, Big Data. 2016; p. 311\u20138.","DOI":"10.1109\/BigData.2016.7840617"},{"issue":"1","key":"502_CR51","first-page":"7","volume":"8","author":"M Priyanka","year":"2017","unstructured":"Priyanka M, Baby A. A survey on various duplicate detection methods. Int J Comput Sci Inf Technol. 2017;8(1):7\u20139.","journal-title":"Int J Comput Sci Inf Technol"},{"issue":"2","key":"502_CR52","doi-asserted-by":"publisher","first-page":"51","DOI":"10.14445\/22312803\/IJCTT-V48P112","volume":"48","author":"MT Meshram","year":"2017","unstructured":"Meshram MT. Duplicate detection with map reduce and deletion procedure. Int J Comput Trends Technol. 2017;48(2):51\u20133.","journal-title":"Int J Comput Trends Technol"},{"key":"502_CR53","unstructured":"Zieger T. Self-adaptive data quality automating duplicate detection. PhD thesis, Potsdam. 2018."},{"issue":"2","key":"502_CR54","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1109\/TBDATA.2016.2637378","volume":"6","author":"K Hildebrandt","year":"2020","unstructured":"Hildebrandt K, Panse F, Wilcke N, Ritter N. Large-scale data pollution with apache spark. IEEE Trans Big Data. 2020;6(2):396\u2013411.","journal-title":"IEEE Trans Big Data"},{"key":"502_CR55","doi-asserted-by":"crossref","unstructured":"Yan S, Lee D, Kan M-Y, Giles LC. Adaptive sorted neighborhood methods for efficient record linkage. In: proceedings of the 7th ACM\/IEEE-CS joint conference on digital libraries. ACM, Vancouver BC Canada. 2007; p. 185\u201394.","DOI":"10.1145\/1255175.1255213"},{"key":"502_CR56","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1145\/568271.223807","volume":"24","author":"MA Hern\u00e1ndez","year":"1995","unstructured":"Hern\u00e1ndez MA, Stolfo SJ. The merge\/purge problem for large databases. ACM SIGMOD Rec. 1995;24:127\u201338.","journal-title":"ACM SIGMOD Rec"},{"key":"502_CR57","unstructured":"Levenshtein VI. Binary codes capable of correcting deletions, insertions, and reversals. Soviet physics doklady. 1966;10(8)."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00502-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-021-00502-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00502-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,11]],"date-time":"2025-09-11T03:30:13Z","timestamp":1757561413000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00502-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,21]]},"references-count":57,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["502"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00502-1","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-390519\/v1","asserted-by":"object"}]},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,8,21]]},"assertion":[{"value":"1 April 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 August 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 August 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 September 2025","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The email address was updated.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"All authors agree to have their work published in Big Data Journal.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"112"}}