{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,1]],"date-time":"2026-06-01T20:36:33Z","timestamp":1780346193012,"version":"3.54.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>Distributed databases have been widely researched and developed in recent years due to their scalability, availability, and consistency guarantees. The write-ahead logging (WAL) system is one of the most vital components in a database. It is still a non-trivial problem to design a replicated logging system as the foundation of a distributed database with the power of ACID transactions. This paper proposes PALF, a Paxos-backed Append-only Log File System, to address these challenges. The basic idea behind PALF is to co-design the logging system with the entire database for supporting database-specific functions and to abstract the functions as PALF primitives to power other distributed systems. Many database functions, including transaction processing, database restore, and physical standby databases, have been built based on PALF primitives. Evaluation shows that PALF greatly outperforms well-known implementations of consensus protocols and is fully competent for distributed database workloads. 
PALF has been deployed as a component of the OceanBase 4.0 database and has been made open-source along with it.<\/jats:p>","DOI":"10.14778\/3685800.3685803","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"3745-3758","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["PALF: Replicated Write-Ahead Logging for Distributed Databases"],"prefix":"10.14778","volume":"17","author":[{"given":"Fusheng","family":"Han","sequence":"first","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hao","family":"Liu","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Bin","family":"Chen","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Debin","family":"Jia","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jianfeng","family":"Zhou","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xuwang","family":"Teng","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chuanhui","family":"Yang","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Huafeng","family":"Xi","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wei","family":"Tian","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shuning","family":"Tao","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sen
","family":"Wang","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Quanqing","family":"Xu","sequence":"additional","affiliation":[{"name":"OceanBase, Ant Group"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhenkun","family":"Yang","sequence":"additional","affiliation":[{"name":"OceanBase"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3314047"},{"key":"e_1_2_1_2_1","volume-title":"Sep","year":"2015","unstructured":"Baidu. Braft: An industrial-grade c++ implementation of raft consensus algorithm based on brpc., Sep 2015. URL: https:\/\/github.com\/baidu\/braft."},{"key":"e_1_2_1_3_1","first-page":"617","volume-title":"14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)","author":"Balakrishnan Mahesh","year":"2020","unstructured":"Mahesh Balakrishnan, Jason Flinn, Chen Shen, Mihir Dharamshi, Ahmed Jafri, Xiao Shi, Santosh Ghosh, Hazem Hassan, Aaryaman Sagar, Rhed Shi, et al. Virtual consensus in delos. In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), pages 617--632, Virtual Event \/ Portland, OR, USA, 2020. USENIX Association."},{"key":"e_1_2_1_4_1","first-page":"1","volume-title":"9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12)","author":"Balakrishnan Mahesh","year":"2012","unstructured":"Mahesh Balakrishnan, Dahlia Malkhi, Vijayan Prabhakaran, Ted Wobbler, Michael Wei, and John D. Davis. CORFU: A shared log design for flash clusters. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pages 1--14, San Jose, CA, April 2012. USENIX Association. 
URL: https:\/\/www.usenix.org\/conference\/nsdi12\/technical-sessions\/presentation\/balakrishnan."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1298455.1298487"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281100.1281103"},{"key":"e_1_2_1_7_1","volume-title":"Oct","year":"2022","unstructured":"Cockroach. Replication layer of crdb., Oct 2022. URL: https:\/\/www.cockroachlabs.com\/docs\/stable\/architecture\/replication-layer.html#non-voting-replicas."},{"key":"e_1_2_1_8_1","volume-title":"Oct","year":"2022","unstructured":"Cockroach. Transactions of crdb., Oct 2022. URL: https:\/\/www.cockroachlabs.com\/docs\/stable\/transactions."},{"key":"e_1_2_1_9_1","first-page":"261","volume-title":"10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12)","author":"Corbett James C.","year":"2012","unstructured":"James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google's Globally-Distributed database. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pages 261--264, Hollywood, CA, October 2012. USENIX Association."},{"key":"e_1_2_1_10_1","volume-title":"Tpc benchmark","author":"T. P. P. Council","year":"2010","unstructured":"T. P. P. Council. Tpc benchmark, 2010. 
URL: http:\/\/tpc.org\/TPC_Documents_Current_Versions\/pdf\/tpc-c_v5.11.0.pdf."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/5505.5508"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1041680.1041682"},{"key":"e_1_2_1_13_1","first-page":"325","volume-title":"Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, NSDI'20","author":"Ding Cong","year":"2020","unstructured":"Cong Ding, David Chu, Evan Zhao, Xiang Li, Lorenzo Alvisi, and Robbert Van Renesse. Scalog: Seamless reconfiguration and total order in a scalable shared log. In Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation, NSDI'20, pages 325--338, USA, 2020. USENIX Association."},{"key":"e_1_2_1_14_1","volume-title":"Etcd: Distributed reliable key-value store for the most critical data of a distributed system","year":"2013","unstructured":"Etcd-Io. Etcd: Distributed reliable key-value store for the most critical data of a distributed system, 2013. URL: https:\/\/github.com\/etcd-io\/etcd."},{"key":"e_1_2_1_15_1","volume-title":"Jun","year":"2013","unstructured":"Etcd-Io. Etcd raft library, Jun 2013. URL: https:\/\/pkg.go.dev\/go.etcd.io\/etcd\/raft\/v3."},{"key":"e_1_2_1_16_1","first-page":"575","volume-title":"2022 USENIX Annual Technical Conference (USENIX ATC 22)","author":"Fouto Pedro","year":"2022","unstructured":"Pedro Fouto, Nuno Pregui\u00e7a, and Joao Leit\u00e3o. High throughput replication with integrated membership management. In 2022 USENIX Annual Technical Conference (USENIX ATC 22), pages 575--592, Carlsbad, CA, July 2022. USENIX Association. 
URL: https:\/\/www.usenix.org\/conference\/atc22\/presentation\/fouto."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132863.1132867"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2017.163"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565837"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/645575.658332"},{"key":"e_1_2_1_21_1","volume-title":"Nonblocking memory management support for dynamic-sized data structures. ACM Transactions on Computer Systems (TOCS), 23(2):146--196","author":"Herlihy Maurice","year":"2005","unstructured":"Maurice Herlihy, Victor Luchangco, Paul Martin, and Mark Moir. Nonblocking memory management support for dynamic-sized data structures. ACM Transactions on Computer Systems (TOCS), 23(2):146--196, 2005."},{"key":"e_1_2_1_23_1","first-page":"11","volume-title":"Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIX-ATC'10","author":"Hunt Patrick","year":"2010","unstructured":"Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. Zookeeper: Wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIX-ATC'10, page 11, USA, 2010. USENIX Association."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2011.5958223"},{"key":"e_1_2_1_25_1","unstructured":"Alexey Kopytov. Sysbench: a system performance benchmark. http:\/\/sysbench.sourceforge.net\/ 2004."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/279227.279229"},{"key":"e_1_2_1_27_1","first-page":"51","volume-title":"December 2001)","author":"Lamport Leslie","year":"2001","unstructured":"Leslie Lamport. Paxos made simple. 
ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001), pages 51--58, December 2001."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1582716.1582783"},{"key":"e_1_2_1_29_1","volume-title":"Oracle Core: Essential Internals for DBAs and Developers","author":"Lewis Jonathan","year":"2012","unstructured":"Jonathan Lewis. Oracle Core: Essential Internals for DBAs and Developers. Apress, 2012."},{"key":"e_1_2_1_30_1","volume-title":"Viewstamped replication revisited","author":"Liskov Barbara","year":"2012","unstructured":"Barbara Liskov and James Cowling. Viewstamped replication revisited. 2012."},{"key":"e_1_2_1_31_1","first-page":"305","volume-title":"17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20)","author":"Liu Ming","year":"2020","unstructured":"Ming Liu, Arvind Krishnamurthy, Harsha V. Madhyastha, Rishi Bhardwaj, Karan Gupta, Chinmay Kamat, Huapeng Yuan, Aditya Jaltade, Roger Liao, Pavan Konka, and Anoop Jawahar. Fine-Grained replicated state machines for a cluster storage system. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 305--323, Santa Clara, CA, February 2020. USENIX Association."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2010.5544272"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/2643634.2643666"},{"key":"e_1_2_1_35_1","volume-title":"Jul","year":"2013","unstructured":"Oracle. Archived redo logs in oracle, Jul 2013. URL: https:\/\/docs.oracle.com\/cd\/B19306_01\/server.102\/b14231\/archredo.htm."},{"key":"e_1_2_1_36_1","volume-title":"Jul","year":"2013","unstructured":"Oracle. Oracle data guard, Jul 2013. URL: https:\/\/www.oracle.com\/database\/data-guard\/."},{"key":"e_1_2_1_37_1","volume-title":"Jul","year":"2013","unstructured":"Oracle. Oracle recovery manager, Jul 2013. 
URL: https:\/\/www.oracle.com\/database\/technologies\/high-availability\/rman.html."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/98163.98167"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.4230\/LIPIcs.OPODIS.2021.26"},{"key":"e_1_2_1_41_1","volume-title":"Database system concepts","author":"Silberschatz Abraham","year":"2011","unstructured":"Abraham Silberschatz, Henry F Korth, and Shashank Sudarshan. Database system concepts. 2011."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386134"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3514221.3526053"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056101"},{"key":"e_1_2_1_45_1","volume-title":"Quanqing Xu. LCL: A Lock Chain Length-based Distributed Algorithm for Deadlock Detection and Resolution. In Proceeding of the 39th IEEE International Conference on Data Engineering (ICDE)","author":"Yang Zhenkun","year":"2023","unstructured":"Zhenkun Yang, Chen Qian, Xuwang Teng, Fanyu Kong, Fusheng Han, and Quanqing Xu. LCL: A Lock Chain Length-based Distributed Algorithm for Deadlock Detection and Resolution. In Proceeding of the 39th IEEE International Conference on Data Engineering (ICDE), 2023."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554830"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554830"},{"key":"e_1_2_1_48_1","volume-title":"Jan","year":"2016","unstructured":"Yugabyte. Yugabyte-db: The cloud native distributed sql database for mission-critical applications., Jan 2016. URL: https:\/\/github.com\/yugabyte\/yugabyte-db."},{"key":"e_1_2_1_49_1","volume-title":"Oct","year":"2022","unstructured":"Yugabyte. xcluster replication., Oct 2022. 
URL: https:\/\/docs.yugabyte.com\/preview\/architecture\/docdb-replication\/async-replication\/."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457559"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685803","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:25:16Z","timestamp":1735622716000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685803"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":48,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685803"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685803","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}