{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,1]],"date-time":"2024-06-01T00:27:36Z","timestamp":1717201656488},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:p>Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications continue to proliferate, and their success heavily relies on the quality of training data. While various mechanisms exist to secure data during storage, transmission, and consumption, fewer studies have been developed to detect whether they are already leaked for model training without authorization. This issue is particularly challenging due to the absence of information and control over the training process conducted by potential attackers.<\/jats:p>\n          <jats:p>\n            In this paper, we concentrate on the domain of tabular data and introduce a novel methodology, Local Distribution Shifting Synthesis (LDSS), to detect leaked data that are used to train classification models. The core concept behind LDSS involves injecting a small volume of synthetic data, characterized by local shifts in class distribution, into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone, as the synthetic data injection results in a pronounced disparity in the predictions of models trained on leaked and modified datasets. LDSS is\n            <jats:italic>model-oblivious<\/jats:italic>\n            and hence compatible with a diverse range of classification models. We have conducted extensive experiments on seven types of classification models across five real-world datasets. 
The comprehensive results affirm the reliability, robustness, fidelity, security, and efficiency of LDSS. Extending LDSS to regression tasks further highlights its versatility and efficacy compared with baseline methods.\n          <\/jats:p>","DOI":"10.14778\/3659437.3659446","type":"journal-article","created":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T16:22:27Z","timestamp":1717172547000},"page":"1898-1910","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying"],"prefix":"10.14778","volume":"17","author":[{"given":"Biao","family":"Wu","sequence":"first","affiliation":[{"name":"National University of Singapore"}]},{"given":"Qiang","family":"Huang","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]},{"given":"Anthony K. H.","family":"Tung","sequence":"additional","affiliation":[{"name":"National University of Singapore"}]}],"member":"320","published-online":{"date-parts":[[2024,5,31]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Yossi Adi Carsten Baum Moustapha Cisse Benny Pinkas and Joseph Keshet. 2018. Turning Your Weakness into a Strength: Watermarking Deep Neural Networks by Backdooring. In USENIX Security. 1615--1631."},{"key":"e_1_2_1_2_1","first-page":"1448","article-title":"WikiLeaks and the institutional framework for national security disclosures","volume":"121","author":"Bellia Patricia L","year":"2011","unstructured":"Patricia L Bellia. 2011. WikiLeaks and the institutional framework for national security disclosures. 
Yale LJ 121 (2011), 1448.","journal-title":"Yale LJ"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/s42786-020-00020-3"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2007.901206"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","first-page":"729663","DOI":"10.3389\/fdata.2021.729663","article-title":"A Systematic Review on Model Watermarking for Neural Networks","volume":"4","author":"Boenisch Franziska","year":"2021","unstructured":"Franziska Boenisch. 2021. A Systematic Review on Model Watermarking for Neural Networks. Frontiers in Big Data 4 (2021), 729663.","journal-title":"Frontiers in Big Data"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","first-page":"9512","DOI":"10.1609\/aaai.v36i9.21184","article-title":"Cosine model watermarking against ensemble distillation","volume":"36","author":"Charette Laurent","year":"2022","unstructured":"Laurent Charette, Lingyang Chu, Yizhou Chen, Jian Pei, Lanjun Wang, and Yong Zhang. 2022. Cosine model watermarking against ensemble distillation. In AAAI, Vol. 36. 9512--9520.","journal-title":"AAAI"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_2_1_8_1","volume-title":"Bita Darvish Rouhani, and Farinaz Koushanfar","author":"Chen Huili","year":"2019","unstructured":"Huili Chen, Bita Darvish Rouhani, and Farinaz Koushanfar. 2019. Blackmarks: Blackbox Multibit Watermarking for Deep Neural Networks. arXiv preprint arXiv:1904.00344 (2019)."},{"key":"e_1_2_1_9_1","volume-title":"Targeted Backdoor Attacks on Deep Learning Systems using Data Poisoning. arXiv preprint arXiv:1712.05526","author":"Chen Xinyun","year":"2017","unstructured":"Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. 2017. Targeted Backdoor Attacks on Deep Learning Systems using Data Poisoning. 
arXiv preprint arXiv:1712.05526 (2017)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1002\/widm.1211"},{"key":"e_1_2_1_11_1","unstructured":"Ingemar Cox Matthew Miller Jeffrey Bloom Jessica Fridrich and Ton Kalker. 2007. Digital Watermarking and Steganography. Morgan Kaufmann."},{"key":"e_1_2_1_12_1","volume-title":"Deepsigns: An End-to-End Watermarking Framework for Ownership Protection of Deep Neural Networks. In ASPLOS. 485--497.","author":"Rouhani Bita Darvish","year":"2019","unstructured":"Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. 2019. Deepsigns: An End-to-End Watermarking Framework for Ownership Protection of Deep Neural Networks. In ASPLOS. 485--497."},{"key":"e_1_2_1_13_1","unstructured":"Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"e_1_2_1_14_1","volume-title":"Supervised GAN Watermarking for Intellectual Property Protection. In IEEE International Workshop on Information Forensics and Security. 1--6.","author":"Fei Jianwei","year":"2022","unstructured":"Jianwei Fei, Zhihua Xia, Benedetta Tondi, and Mauro Barni. 2022. Supervised GAN Watermarking for Intellectual Property Protection. In IEEE International Workshop on Information Forensics and Security. 1--6."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1137\/090771806"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/76.867936"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Qiang Huang Yifan Lei and Anthony KH Tung. 2021. Point-to-Hyperplane Nearest Neighbor Search Beyond the Unit Hypersphere. In SIGMOD. 
777--789.","DOI":"10.1145\/3448016.3457240"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2007.07.002"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2019.2921572"},{"key":"e_1_2_1_20_1","volume-title":"The Art of Computer Programming: Seminumerical Algorithms","author":"Knuth Donald E","unstructured":"Donald E Knuth. 2014. The Art of Computer Programming: Seminumerical Algorithms, volume 2. Addison-Wesley Professional."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-019-04434-z"},{"key":"e_1_2_1_22_1","unstructured":"Jong-Seok Lee Taeg-Sang Cho Jiye Lee Myung-Kee Jang Tae-Kwang Jang Dongkyung Nam and Cheol Hoon Park. 2004. A Stochastic Search Approach for the Multidimensional Largest Empty Sphere Problem. (2004) 1--11."},{"key":"e_1_2_1_23_1","unstructured":"Yifan Lei Qiang Huang Mohan Kankanhalli and Anthony Tung. 2019. Sublinear Time Nearest Neighbor Search over Generalized Weighted Space. In ICML. 3773--3781."},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Yifan Lei Qiang Huang Mohan Kankanhalli and Anthony KH Tung. 2020. Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring. In SIGMOD. 2589--2599.","DOI":"10.1145\/3318464.3389778"},{"key":"e_1_2_1_25_1","doi-asserted-by":"crossref","first-page":"14991","DOI":"10.1609\/aaai.v37i12.26750","article-title":"PLMmark: a secure and robust black-box watermarking framework for pre-trained language models","volume":"37","author":"Li Peixuan","year":"2023","unstructured":"Peixuan Li, Pengzhou Cheng, Fangqi Li, Wei Du, Haodong Zhao, and Gongshen Liu. 2023. PLMmark: a secure and robust black-box watermarking framework for pre-trained language models. In AAAI, Vol. 37. 14991--14999.","journal-title":"AAAI"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.051"},{"key":"e_1_2_1_27_1","volume-title":"Open-sourced Dataset Protection via Backdoor Watermarking. 
arXiv preprint arXiv:2010.05821","author":"Li Yiming","year":"2020","unstructured":"Yiming Li, Ziqi Zhang, Jiawang Bai, Baoyuan Wu, Yong Jiang, and Shu-Tao Xia. 2020. Open-sourced Dataset Protection via Backdoor Watermarking. arXiv preprint arXiv:2010.05821 (2020)."},{"key":"e_1_2_1_28_1","volume-title":"Kai Ming Ting, and Zhi-Hua Zhou","author":"Liu Fei Tony","year":"2008","unstructured":"Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation Forest. In ICDM. 413--422."},{"key":"e_1_2_1_29_1","unstructured":"Thomas P Minka. 2000. Automatic Choice of Dimensionality for PCA. In NIPS. 577--583."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Ryota Namba and Jun Sakuma. 2019. Robust Watermarking of Neural Network with Exponential Weighting. In AsiaCCS. 228--240.","DOI":"10.1145\/3321705.3329808"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.1013"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-022-01113-3"},{"key":"e_1_2_1_33_1","volume-title":"IEEE International Conference on Industrial Informatics. 709--716","author":"Potdar Vidyasagar M","year":"2005","unstructured":"Vidyasagar M Potdar, Song Han, and Elizabeth Chang. 2005. A Survey of Digital Image Watermarking Techniques. In IEEE International Conference on Industrial Informatics. 709--716."},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","first-page":"103102","DOI":"10.1016\/j.cose.2023.103102","article-title":"A novel model watermarking for protecting generative adversarial network","volume":"127","author":"Qiao Tong","year":"2023","unstructured":"Tong Qiao, Yuyan Ma, Ning Zheng, Hanzhou Wu, Yanli Chen, Ming Xu, and Xiangyang Luo. 2023. A novel model watermarking for protecting generative adversarial network. 
Computers & Security 127 (2023), 103102.","journal-title":"Computers & Security"},{"key":"e_1_2_1_35_1","first-page":"1852","article-title":"Watermarking deep neural networks in image processing","volume":"32","author":"Quan Yuhui","year":"2020","unstructured":"Yuhui Quan, Huan Teng, Yixin Chen, and Hui Ji. 2020. Watermarking deep neural networks in image processing. TNNLS 32, 5 (2020), 1852--1865.","journal-title":"TNNLS"},{"key":"e_1_2_1_36_1","volume-title":"IEEE 14th International Conference on Machine Learning and Applications. 896--902","author":"Ribeiro Mauro","year":"2015","unstructured":"Mauro Ribeiro, Katarina Grolinger, and Miriam AM Capretz. 2015. Mlaas: Machine learning as a service. In IEEE 14th International Conference on Machine Learning and Applications. 896--902."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6871"},{"key":"e_1_2_1_38_1","unstructured":"Ali Shafahi W Ronny Huang Mahyar Najibi Octavian Suciu Christoph Studer Tudor Dumitras and Tom Goldstein. 2018. Poison Frogs! Targeted Clean-label Poisoning Attacks on Neural Networks. In NeurIPS. 6106--6116."},{"key":"e_1_2_1_39_1","volume-title":"2020 International Conference on Electronics and Sustainable Communication Systems. 490--494","author":"Sheikh Mohammad Ahmad","year":"2020","unstructured":"Mohammad Ahmad Sheikh, Amit Kumar Goel, and Tapas Kumar. 2020. An approach for prediction of loan approval using machine learning algorithm. In 2020 International Conference on Electronics and Sustainable Communication Systems. 490--494."},{"key":"e_1_2_1_40_1","volume-title":"Halla Hrund Sk\u00falad\u00f3ttir, and H\u00f6skuldur Borgthorsson","author":"Shklovski Irina","year":"2014","unstructured":"Irina Shklovski, Scott D Mainwaring, Halla Hrund Sk\u00falad\u00f3ttir, and H\u00f6skuldur Borgthorsson. 2014. Leakiness and creepiness in app space: Perceptions of privacy and mobile app use. In SIGCHI. 
2347--2356."},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Reza Shokri Marco Stronati Congzheng Song and Vitaly Shmatikov. 2017. Membership Inference Attacks Against Machine Learning Models. In S&P. 3--18.","DOI":"10.1109\/SP.2017.41"},{"key":"e_1_2_1_42_1","doi-asserted-by":"crossref","unstructured":"Congzheng Song Thomas Ristenpart and Vitaly Shmatikov. 2017. Machine Learning Models that Remember Too Much. In CCS. 587--601.","DOI":"10.1145\/3133956.3134077"},{"key":"e_1_2_1_43_1","volume-title":"Samuel Marchal, and N Asokan.","author":"Szyller Sebastian","year":"2021","unstructured":"Sebastian Szyller, Buse Gul Atli, Samuel Marchal, and N Asokan. 2021. Dawn: Dynamic adversarial watermarking of neural networks. In MM. 4417--4425."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1162\/089976699300016728"},{"key":"e_1_2_1_45_1","first-page":"2073","article-title":"Demystifying Membership Inference Attacks in Machine Learning As A Service","volume":"14","author":"Truex Stacey","year":"2021","unstructured":"Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei. 2021. Demystifying Membership Inference Attacks in Machine Learning As A Service. TSC 14, 06 (2021), 2073--2089.","journal-title":"TSC"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Yusuke Uchida Yuki Nagai Shigeyuki Sakazawa and Shin'ichi Satoh. 2017. Embedding watermarks into deep neural networks. In ICMR. 269--277.","DOI":"10.1145\/3078971.3078974"},{"key":"e_1_2_1_47_1","first-page":"22","article-title":"Watermarking in Deep Neural Networks via Error Back-Propagation","volume":"2020","author":"Wang Jiangfeng","year":"2020","unstructured":"Jiangfeng Wang, Hanzhou Wu, Xinpeng Zhang, and Yuwei Yao. 2020. Watermarking in Deep Neural Networks via Error Back-Propagation. 
Electronic Imaging 2020, 4 (2020), 22--1.","journal-title":"Electronic Imaging"},{"key":"e_1_2_1_48_1","doi-asserted-by":"crossref","first-page":"619","DOI":"10.3390\/sym14030619","article-title":"Protecting the intellectual property of speaker recognition model by black-box watermarking in the frequency domain","volume":"14","author":"Wang Yumin","year":"2022","unstructured":"Yumin Wang and Hanzhou Wu. 2022. Protecting the intellectual property of speaker recognition model by black-box watermarking in the frequency domain. Symmetry 14, 3 (2022), 619.","journal-title":"Symmetry"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.3030671"},{"key":"e_1_2_1_50_1","first-page":"908","article-title":"Intellectual Property Protection for Deep Learning Models: Taxonomy, Methods, Attacks, and Evaluations","volume":"3","author":"Xue Mingfu","year":"2021","unstructured":"Mingfu Xue, Yushu Zhang, Jian Wang, and Weiqiang Liu. 2021. Intellectual Property Protection for Deep Learning Models: Taxonomy, Methods, Attacks, and Evaluations. TAI 3, 6 (2021), 908--923.","journal-title":"TAI"},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Peng Yang Yingjie Lao and Ping Li. 2021. Robust watermarking for deep neural networks via bi-level optimization. In ICCV. 14841--14850.","DOI":"10.1109\/ICCV48922.2021.01457"},{"key":"e_1_2_1_52_1","volume-title":"Exploring structure consistency for deep model watermarking. arXiv preprint arXiv:2108.02360","author":"Zhang Jie","year":"2021","unstructured":"Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Zehua Ma, Weiming Zhang, Gang Hua, and Nenghai Yu. 2021. Exploring structure consistency for deep model watermarking. 
arXiv preprint arXiv:2108.02360 (2021)."},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","first-page":"12805","DOI":"10.1609\/aaai.v34i07.6976","article-title":"Model Watermarking for Image Processing Networks","volume":"34","author":"Zhang Jie","year":"2020","unstructured":"Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Weiming Zhang, Wenbo Zhou, Hao Cui, and Nenghai Yu. 2020. Model Watermarking for Image Processing Networks. In AAAI, Vol. 34. 12805--12812.","journal-title":"AAAI"},{"key":"e_1_2_1_54_1","first-page":"4005","article-title":"Deep Model Intellectual Property Protection via Deep Watermarking","volume":"44","author":"Zhang Jie","year":"2021","unstructured":"Jie Zhang, Dongdong Chen, Jing Liao, Weiming Zhang, Huamin Feng, Gang Hua, and Nenghai Yu. 2021. Deep Model Intellectual Property Protection via Deep Watermarking. TPAMI 44, 8 (2021), 4005--4020.","journal-title":"TPAMI"},{"key":"e_1_2_1_55_1","volume-title":"Heqing Huang, and Ian Molloy.","author":"Zhang Jialong","year":"2018","unstructured":"Jialong Zhang, Zhongshu Gu, Jiyong Jang, Hui Wu, Marc Ph Stoecklin, Heqing Huang, and Ian Molloy. 2018. Protecting Intellectual Property of Deep Neural Networks with Watermarking. In AsiaCCS. 
159--172."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3659437.3659446","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T16:24:12Z","timestamp":1717172652000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3659437.3659446"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4]]},"references-count":55,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["10.14778\/3659437.3659446"],"URL":"https:\/\/doi.org\/10.14778\/3659437.3659446","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,4]]},"assertion":[{"value":"2024-05-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}