{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T16:40:12Z","timestamp":1758904812139,"version":"3.44.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"name":"National Key RD Program of China","award":["2022YFB3103401"],"award-info":[{"award-number":["2022YFB3103401"]}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62102352, 62472378, U23A20306"],"award-info":[{"award-number":["62102352, 62472378, U23A20306"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"the Zhejiang Province Pioneer Plan","award":["2024C01074"],"award-info":[{"award-number":["2024C01074"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,9,22]]},"abstract":"<jats:p>\n            Watermarking is an effective technique to prevent illegal copying of datasets in data markets. However, existing techniques on tabular datasets either fall short in the three fundamental goals (detectability, non-intrusiveness, and robustness) or lack the capabilities of being blind and buyer-traceable. In this paper, we propose a blind and buyer-traceable watermarking scheme, B\n            <jats:sup>2<\/jats:sup>\n            Mark, based on statistical hypothesis testing. To our best knowledge, this is the first blind watermarking scheme applicable to both numerical and categorical data. During the embedding process, B\n            <jats:sup>2<\/jats:sup>\n            Mark embeds multi-bit watermark information as a buyer identifier based on the value domain (instead of noise domain) partition and a cryptographic hash function. In the detection phase, B\n            <jats:sup>2<\/jats:sup>\n            Mark employs a hypothesis testing-based approach to extract the embedded buyer identifier without accessing the original dataset, ensuring buyer-traceable and blind watermark detection. Experiments on various datasets confirm that B\n            <jats:sup>2<\/jats:sup>\n            Mark achieves three fundamental goals and can effectively and efficiently trace data leaks in multi-buyer scenarios.\n          <\/jats:p>","DOI":"10.1145\/3749158","type":"journal-article","created":{"date-parts":[[2025,9,23]],"date-time":"2025-09-23T17:17:03Z","timestamp":1758647823000},"page":"1-26","source":"Crossref","is-referenced-by-count":0,"title":["B\n            <sup>2<\/sup>\n            Mark: A Blind and Buyer-Traceable Watermarking Scheme for Tabular Datasets"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0346-3006","authenticated-orcid":false,"given":"Yihao","family":"Zheng","sequence":"first","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2921-2827","authenticated-orcid":false,"given":"Jinfei","family":"Liu","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3441-6277","authenticated-orcid":false,"given":"Kui","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University, Hangzhou, Zhejiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7354-0428","authenticated-orcid":false,"given":"Li","family":"Xiong","sequence":"additional","affiliation":[{"name":"Emory University, Atlanta, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,9,23]]},"reference":[{"key":"e_1_2_1_1_1","article-title":"Fragile Audio Watermark based on Empirical Mode Decomposition for Content Authentication","volume":"8","author":"Abdulmunem Matheel E","year":"2017","unstructured":"Matheel E Abdulmunem and Ameer A Badr. 2017. Fragile Audio Watermark based on Empirical Mode Decomposition for Content Authentication. International Journal of Advanced Research in Computer Science, Vol. 8, 5 (2017).","journal-title":"International Journal of Advanced Research in Computer Science"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","unstructured":"Stefan Aeberhard and M. Forina. 1992. Wine. UCI Machine Learning Repository. DOI: https:\/\/doi.org\/10.24432\/C5PC7J.","DOI":"10.24432\/C5PC7J"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-155860869-6\/50022-6"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/S10489-020-01903-0"},{"volume-title":"AWS Data Exchange. https:\/\/aws.amazon.com\/cn\/data-exchange\/ [Online","year":"2024","key":"e_1_2_1_5_1","unstructured":"Amazon. 2024. AWS Data Exchange. https:\/\/aws.amazon.com\/cn\/data-exchange\/ [Online; accessed 10-January-2024]."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","unstructured":"Jock Blackard. 1998. Covertype. UCI Machine Learning Repository. DOI: https:\/\/doi.org\/10.24432\/C50K5N.","DOI":"10.24432\/C50K5N"},{"key":"e_1_2_1_7_1","volume-title":"The Eleventh International Conference on Learning Representations, ICLR 2023","author":"Borisov Vadim","year":"2023","unstructured":"Vadim Borisov, Kathrin Se\u00dfler, Tobias Leemann, Martin Pawelczyk, and Gjergji Kasneci. 2023. Language Models are Realistic Tabular Data Generators. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net. https:\/\/openreview.net\/forum?id=cEygmQNOeI"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-78119-3_18"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csi.2023.103830"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/S00521-016-2215-X"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_2_1_12_1","first-page":"1125","volume-title":"Undetectable Watermarks for Language Models. In The Thirty Seventh Annual Conference on Learning Theory","volume":"247","author":"Christ Miranda","year":"2024","unstructured":"Miranda Christ, Sam Gunn, and Or Zamir. 2024. Undetectable Watermarks for Language Models. In The Thirty Seventh Annual Conference on Learning Theory, June 30 - July 3, 2023, Edmonton, Canada (Proceedings of Machine Learning Research), Vol. 247. PMLR, 1125-1139. https:\/\/proceedings.mlr.press\/v247\/christ24a.html"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","unstructured":"Paulo Cortez. 2008. Student Performance. UCI Machine Learning Repository. DOI: https:\/\/doi.org\/10.24432\/C5TG7T.","DOI":"10.24432\/C5TG7T"},{"volume-title":"Machine learning in finance","author":"Dixon Matthew F","key":"e_1_2_1_14_1","unstructured":"Matthew F Dixon, Igor Halperin, and Paul Bilokon. 2020. Machine learning in finance. Vol. 1170. Springer."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.24432\/C56C76"},{"key":"e_1_2_1_16_1","unstructured":"Joseph L Fleiss Bruce Levin and Myunghee Cho Paik. 2013. Statistical methods for rates and proportions. john wiley & sons."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.ESWA.2022.118713"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1929934.1929937"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581783.3612015"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/0095-0696(78)90006-2"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2405.14018"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589761"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2851517"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599513"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2349911"},{"key":"e_1_2_1_26_1","volume-title":"Artificial intelligence in healthcare: past, present and future. Stroke and vascular neurology","author":"Jiang Fei","year":"2017","unstructured":"Fei Jiang, Yong Jiang, Hui Zhi, Yi Dong, Hao Li, Sufeng Ma, Yilong Wang, Qiang Dong, Haipeng Shen, and Yongjun Wang. 2017. Artificial intelligence in healthcare: past, present and future. Stroke and vascular neurology, Vol. 2, 4 (2017)."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.COSE.2009.04.001"},{"key":"e_1_2_1_28_1","volume-title":"A comprehensive survey of watermarking relational databases research. arXiv preprint arXiv:1801.08271","author":"Kamran Muhammad","year":"2018","unstructured":"Muhammad Kamran and Muddassar Farooq. 2018. A comprehensive survey of watermarking relational databases research. arXiv preprint arXiv:1801.08271 (2018)."},{"key":"e_1_2_1_29_1","first-page":"17061","volume-title":"International Conference on Machine Learning, ICML 2023","volume":"202","author":"Kirchenbauer John","year":"2023","unstructured":"John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A Watermark for Large Language Models. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learning Research), Vol. 202. PMLR, 17061-17084. https:\/\/proceedings.mlr.press\/v202\/kirchenbauer23a.html"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2022.3194191"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1029146.1029159"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3390\/math8111994"},{"volume-title":"Introduction to linear regression analysis","author":"Montgomery Douglas C","key":"e_1_2_1_33_1","unstructured":"Douglas C Montgomery, Elizabeth A Peck, and G Geoffrey Vining. 2021. Introduction to linear regression analysis. John Wiley & Sons."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2409.14700"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSAA.2016.49"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2023.3324932"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872772"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","unstructured":"Athanasios Tsanas and Angeliki Xifara. 2012. Energy Efficiency. UCI Machine Learning Repository. DOI: https:\/\/doi.org\/10.24432\/C51307.","DOI":"10.24432\/C51307"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.02.083"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/AICAS54282.2022.9869923"},{"key":"e_1_2_1_41_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Xu Lei","year":"2019","unstructured":"Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling Tabular data using Conditional GAN. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 7333-7343. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/254ed7d2de3b23ab10936522dd547b78-Abstract.html"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.3390\/a10010027"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2411.07267"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3658644.3690373"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3006415"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3749158","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T16:20:56Z","timestamp":1758903656000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3749158"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,22]]},"references-count":45,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,9,22]]}},"alternative-id":["10.1145\/3749158"],"URL":"https:\/\/doi.org\/10.1145\/3749158","relation":{},"ISSN":["2836-6573"],"issn-type":[{"type":"electronic","value":"2836-6573"}],"subject":[],"published":{"date-parts":[[2025,9,22]]}}}