{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:40:21Z","timestamp":1755790821386,"version":"3.44.0"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T00:00:00Z","timestamp":1710201600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100006374","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2021R1A2B5B03001551"],"award-info":[{"award-number":["NRF-2021R1A2B5B03001551"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2024,3,12]]},"abstract":"<jats:p>A database replay system (DRS) captures workloads on a production system and then replays them in a test system to test various system changes, avoiding any risk before realizing them in production. The dependency graph generation in a DRS is crucial in preserving output determinism while maximizing concurrency. The state-of-the-art dependency graph generation algorithm deployed in a commercial DBMS uses a generate-and-prune strategy. It first generates a dependency graph by performing backward scans for each request in a workload. It then prunes all redundant edges using an expensive, transitive reduction algorithm. However, we notice that this generates a large dependency graph that contains many redundant edges and its worst-case time complexity is quadratic to the number of requests in a workload. In order to solve these challenging problems, we formally propose four classes of dependency graphs for DRSs. We then present a stateful single forward scan algorithm, SSFS, to generate any class of dependency graphs by performing a single scan over all requests while succinctly maintaining states. Here, states refer to information that is stored and maintained for efficient dependency graph generation. We also propose the parallel SSFS to utilize the computation power with multi-core CPUs while balancing the loads. We implemented our DRS in a leading commercial DBMS. Extensive experiments using the TPC-C, SD benchmarks, and a real-world customer workload show that our DRS significantly improves the dependency graph generation time by up to two orders of magnitude, compared to the state-of-the-art.<\/jats:p>","DOI":"10.1145\/3639322","type":"journal-article","created":{"date-parts":[[2024,3,26]],"date-time":"2024-03-26T18:51:32Z","timestamp":1711479092000},"page":"1-26","source":"Crossref","is-referenced-by-count":0,"title":["DoppelGanger++: Towards Fast Dependency Graph Generation for Database Replay"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-3672-5384","authenticated-orcid":false,"given":"Wonseok","family":"Lee","sequence":"first","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-1815-3320","authenticated-orcid":false,"given":"Jaehyun","family":"Ha","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9206-9563","authenticated-orcid":false,"given":"Wook-Shin","family":"Han","sequence":"additional","affiliation":[{"name":"POSTECH, Pohang, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6675-2613","authenticated-orcid":false,"given":"Changgyoo","family":"Park","sequence":"additional","affiliation":[{"name":"SAP Labs Korea, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-6727-5280","authenticated-orcid":false,"given":"Myunggon","family":"Park","sequence":"additional","affiliation":[{"name":"SAP Labs Korea, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5522-2722","authenticated-orcid":false,"given":"Juhyeng","family":"Han","sequence":"additional","affiliation":[{"name":"SAP Labs Korea, Seoul, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2938-3354","authenticated-orcid":false,"given":"Juchang","family":"Lee","sequence":"additional","affiliation":[{"name":"SAP Labs Korea, Seoul, Republic of Korea"}]}],"member":"320","published-online":{"date-parts":[[2024,3,26]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2010. TPC Benchmark C. http:\/\/www.tpc.org\/tpcc\/. Accessed: 2022-06--23."},{"key":"e_1_2_1_2_1","unstructured":"2020. Oracle Database 19c: Real Application Testing Overview. Technical Report."},{"key":"e_1_2_1_3_1","unstructured":"2022. Capturing and Replaying Workloads. https:\/\/help.sap.com\/docs\/SAP_HANA_COCKPIT\/ afa922439b204e9caf22c78b6b69e4f2\/4f3c88249d484b0faba0e6b27b82c2dd.html?locale=en-US"},{"key":"e_1_2_1_4_1","unstructured":"2022. SAP Standard Application Benchmarks. https:\/\/www.sap.com\/about\/benchmark.html."},{"key":"e_1_2_1_5_1","unstructured":"2022. Sql server distributed replay. https:\/\/learn.microsoft.com\/en-us\/sql\/tools\/distributed-replay\/sql-server-distributed-replay?view=sql-server-ver16"},{"key":"e_1_2_1_6_1","unstructured":"2023. The Internals of PostgreSQL. https:\/\/www.interdb.jp\/. Accessed: 2023--10--20."},{"key":"e_1_2_1_7_1","unstructured":"2023. Technical Report. https:\/\/sites.google.com\/view\/systemx-replay"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213556.2213560"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1137\/0201008"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629575.1629594"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2452376.2452392"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554840"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3471485.3471492"},{"key":"e_1_2_1_14_1","volume-title":"Graph theory","author":"Diestel Reinhard","year":"2005","unstructured":"Reinhard Diestel. 2005. Graph theory 3rd ed. Graduate texts in mathematics 173 (2005), 33.","edition":"3"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3282495.3282502"},{"key":"e_1_2_1_16_1","volume-title":"Rethinking serializable multiversion concurrency control. arXiv preprint arXiv:1412.2324","author":"Faleiro Jose M","year":"2014","unstructured":"Jose M Faleiro and Daniel J Abadi. 2014. Rethinking serializable multiversion concurrency control. arXiv preprint arXiv:1412.2324 (2014)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055540.3055553"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993636.1993647"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/582318.582329"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1376616.1376732"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407860"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989330"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544908"},{"key":"e_1_2_1_24_1","unstructured":"Bahman Javadi Isfahani. 2017. Evaluating a modern in-memory columnar data management system with a contemporary OLTP workload. (2017)."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3430915.3430918"},{"key":"e_1_2_1_26_1","unstructured":"Qian Li Peter Kraft Michael Cafarella \u00c7agatay Demiralp Goetz Graefe Christos Kozyrakis Michael Stonebraker Lalith Suresh and Matei Zaharia. 2023. Transactions Make Debugging Easy.. In CIDR."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442197"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402755.3402757"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00076"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2017.2727492"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989399"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(88)90032-1"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007352.1007372"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.2975650"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213838"},{"volume-title":"Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery","author":"Weikum Gerhard","key":"e_1_2_1_36_1","unstructured":"Gerhard Weikum and Gottfried Vossen. 2001. Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery. Elsevier."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064011"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3209950.3209958"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCS.1988.12538"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915208"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639322","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3639322","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T15:16:36Z","timestamp":1755789396000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3639322"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,12]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3,12]]}},"alternative-id":["10.1145\/3639322"],"URL":"https:\/\/doi.org\/10.1145\/3639322","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2024,3,12]]}}}