{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T18:12:03Z","timestamp":1769883123854,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,3,15]],"date-time":"2022-03-15T00:00:00Z","timestamp":1647302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Not Only SQL (NoSQL) is a critical technology that is scalable and provides flexible schemas, thereby complementing existing relational database technologies. Although NoSQL is flourishing, present solutions lack the features required by enterprises for critical missions. In this paper, we explore solutions to the data recovery issue in NoSQL. Data recovery for any database table entails restoring the table to a prior state or replaying (insert\/update) operations over the table given a time period in the past. Recovery of NoSQL database tables enables applications such as failure recovery, analysis for historical data, debugging, and auditing. Particularly, our study focuses on columnar NoSQL databases. We propose and evaluate two solutions to address the data recovery problem in columnar NoSQL and implement our solutions based on Apache HBase, a popular NoSQL database in the Hadoop ecosystem widely adopted across industries. Our implementations are extensively benchmarked with an industrial NoSQL benchmark under real environments.<\/jats:p>","DOI":"10.3390\/fi14030092","type":"journal-article","created":{"date-parts":[[2022,3,16]],"date-time":"2022-03-16T03:34:13Z","timestamp":1647401653000},"page":"92","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["The Time Machine in Columnar NoSQL Databases: The Case of Apache HBase"],"prefix":"10.3390","volume":"14","author":[{"given":"Chia-Ping","family":"Tsai","sequence":"first","affiliation":[{"name":"Apache HBase and Kafka Project Management Committees, Wilmington, DE 19801, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3927-6613","authenticated-orcid":false,"given":"Che-Wei","family":"Chang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, Chinese Culture University, Taipei 11114, Taiwan"}]},{"given":"Hung-Chang","family":"Hsiao","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 70101, Taiwan"}]},{"given":"Haiying","family":"Shen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Virginia, Charlottesville, VA 22908, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1109\/MITP.2013.1","article-title":"A Survey of Cloud Database Systems","volume":"16","author":"Deka","year":"2014","journal-title":"IT Prof."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1109\/MC.2010.58","article-title":"Will NoSQL Databases Live Up to Their Promise?","volume":"43","author":"Leavitt","year":"2010","journal-title":"IEEE Comput."},{"key":"ref_3","unstructured":"(2022, February 17). Apache Cassandra. Available online: http:\/\/cassandra.apache.org\/."},{"key":"ref_4","unstructured":"(2022, February 17). Couchbase. Available online: http:\/\/www.couchbase.com\/."},{"key":"ref_5","unstructured":"(2022, February 17). Apache HBase. Available online: http:\/\/hbase.apache.org\/."},{"key":"ref_6","unstructured":"(2022, February 17). MongoDB. Available online: http:\/\/www.mongodb.org\/."},{"key":"ref_7","unstructured":"(2022, February 17). Apache HDFS. Available online: http:\/\/hadoop.apache.org\/docs\/r1.2.1\/hdfs_design.html."},{"key":"ref_8","unstructured":"(2022, February 17). MySQL. Available online: http:\/\/www.mysql.com\/."},{"key":"ref_9","unstructured":"(2022, February 17). Apache Phoenix. Available online: http:\/\/phoenix.apache.org\/."},{"key":"ref_10","unstructured":"(2022, February 17). DB-Engines. Available online: https:\/\/db-engines.com\/en\/ranking_trend\/wide+column+store."},{"key":"ref_11","unstructured":"Dean, J., and Ghemawat, S. (2004, January 6\u20138). MapReduce: Simplified Data Processing on Large Clusters. Proceedings of the 6th Symposium Operating System Design and Implementation (OSDI\u201904), San Francisco, CA, USA."},{"key":"ref_12","unstructured":"(2022, February 17). Apache Hadoop. Available online: http:\/\/hadoop.apache.org\/."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R. (2010, January 10\u201311). Benchmarking Cloud Serving Systems with YCSB. Proceedings of the ACM Symposium Cloud Computing (SOCC\u201910), Indianapolis, IN, USA.","DOI":"10.1145\/1807128.1807152"},{"key":"ref_14","unstructured":"(2022, February 17). The Network Time Protocol. Available online: http:\/\/www.ntp.org\/."},{"key":"ref_15","unstructured":"(2022, February 17). HBase Regions. Available online: https:\/\/hbase.apache.org\/book\/regions.arch.html."},{"key":"ref_16","unstructured":"(2022, February 17). HBase APIs. Available online: http:\/\/hbase.apache.org\/0.94\/apidocs\/."},{"key":"ref_17","unstructured":"(2022, February 17). IBM BladeCenter HS23. Available online: http:\/\/www-03.ibm.com\/systems\/bladecenter\/hardware\/servers\/hs23\/."},{"key":"ref_18","unstructured":"Webster, C. (2015). Hadoop Virtualization, O\u2019Reilly."},{"key":"ref_19","unstructured":"(2022, February 17). VMware. Available online: http:\/\/www.vmware.com\/."},{"key":"ref_20","unstructured":"(2022, February 17). HBase Snapshots. Available online: https:\/\/hbase.apache.org\/book\/ops.snapshots.html."},{"key":"ref_21","unstructured":"(2022, February 17). Cloudera Snapshots. Available online: http:\/\/www.cloudera.com\/content\/cloudera-content\/cloudera-docs\/CM5\/latest\/Cloudera-Backup-Disaster-Recovery\/cm5bdr_snapshot_intro.html."},{"key":"ref_22","unstructured":"(2022, February 17). HBase Replication. Available online: http:\/\/blog.cloudera.com\/blog\/2012\/07\/hbase-replication-overview-2\/."},{"key":"ref_23","unstructured":"(2022, February 17). HBase Export. Available online: https:\/\/hbase.apache.org\/book\/ops_mgt.html#export."},{"key":"ref_24","unstructured":"(2022, February 17). HBase CopyTable. Available online: https:\/\/hbase.apache.org\/book\/ops_mgt.htm#copytable."},{"key":"ref_25","unstructured":"(2022, February 17). Oracle Database Backup and Recovery. Available online: http:\/\/docs.oracle.com\/cd\/E11882_01\/backup.112\/e10642\/rcmintro.htm#BRADV8001."},{"key":"ref_26","unstructured":"Matos, D.R., and Correia, M. (November, January 31). NoSQL Undo: Recovering NoSQL Databases by Undoing Operations. Proceedings of the IEEE International Symposium Network Computing and Applications (NCA), Boston, MA, USA."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Abadi, A., Haib, A., Melamed, R., Nassar, A., Shribman, A., and Yasin, H. (2016, January 10\u201313). Holistic Disaster Recovery Approach for Big Data NoSQL Workloads. Proceedings of the IEEE International Conference Big Data (BigData), Atlanta, GA, USA.","DOI":"10.1109\/BigData.2016.7840833"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhou, J., Bruno, N., and Lin, W. (2012, January 20\u201324). Advanced Partitioning Techniques for Massively Distributed Computation. Proceedings of the ACM SIGMOD, Scottsdale, AZ, USA.","DOI":"10.1145\/2213836.2213839"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2804","DOI":"10.1109\/TMC.2019.2934461","article-title":"Improving Urban Crowd Flow Prediction on Flexible Region Partition","volume":"19","author":"Wang","year":"2019","journal-title":"IEEE Trans. Mob. Comput."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1268","DOI":"10.1109\/TCC.2019.2922379","article-title":"A Power Consumption Model for Cloud Servers Based on Elman Neural Network","volume":"9","author":"Wu","year":"2020","journal-title":"IEEE Trans. Cloud Comput. (TCC)"},{"key":"ref_31","unstructured":"Ye, K., Shen, H., Wang, Y., and Xu, C.-Z. (2020). Multi-tier Workload Consolidations in the Cloud: Profiling, Modeling and Optimization. IEEE Trans. Cloud Comput. (TCC), preprint."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"70150","DOI":"10.1109\/ACCESS.2020.2985282","article-title":"Enabling Serverless Deployment of Large-Scale AI Workloads","volume":"8","author":"Christidis","year":"2020","journal-title":"IEEE Access"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"2676","DOI":"10.1109\/TKDE.2014.2302297","article-title":"Consistent Online Backup in Transactional File Systems","volume":"26","author":"Deka","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng. (TKDE)"},{"key":"ref_34","unstructured":"(2022, February 17). Delta Lake. Available online: https:\/\/databricks.com\/blog\/2019\/02\/04\/introducing-delta-time-travel-for-large-scale-data-lakes.html."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/3\/92\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:36:45Z","timestamp":1760135805000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/14\/3\/92"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,15]]},"references-count":34,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["fi14030092"],"URL":"https:\/\/doi.org\/10.3390\/fi14030092","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,15]]}}}