{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T04:40:32Z","timestamp":1771994432983,"version":"3.50.1"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:00:00Z","timestamp":1740182400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"JSPS Kakenhi","award":["22H05001 and JP20A402"],"award-info":[{"award-number":["22H05001 and JP20A402"]}]},{"name":"JST ASPIRE","award":["JPMJAP2302"],"award-info":[{"award-number":["JPMJAP2302"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            Sentence embeddings represent the meaning of a given sentence using a fixed dimensional vector. Different approaches have been proposed in the Natural Language Processing (NLP) community for learning encoders that can produce accurate sentence embeddings that perform well for diverse downstream tasks requiring sentence representations. Despite prior work focusing mainly on creating accurate sentence embeddings, how to keep private the sensitive information contained in the sentences remains an unexplored research problem. In this article, we propose Covering Metric Analytic Gaussian (CMAG), a\n            <jats:italic toggle=\"yes\">covering<\/jats:italic>\n            metric Differential Privacy (DP) mechanism for sentence embeddings such that minimal random noise is added to a set of sentence embeddings produced by an encoder to protect the private information expressed in those sentences. Given a sentence embedding\n            <jats:bold>\n              <jats:italic toggle=\"yes\">s<\/jats:italic>\n            <\/jats:bold>\n            , CMAG considers the Mahalanobis distance between\n            <jats:bold>\n              <jats:italic toggle=\"yes\">s<\/jats:italic>\n            <\/jats:bold>\n            and the other sentence embeddings\n            <jats:bold>\n              <jats:italic toggle=\"yes\">s<\/jats:italic>\n            <\/jats:bold>\n            \u2019 in the local neighbourhood of\n            <jats:bold>\n              <jats:italic toggle=\"yes\">s<\/jats:italic>\n            <\/jats:bold>\n            to determine the minimal amount of random noise that must be added to\n            <jats:bold>\n              <jats:italic toggle=\"yes\">s<\/jats:italic>\n            <\/jats:bold>\n            to obtain provable metric DP guarantees. Experimental results show that the proposed DP mechanism protects private information better than previously proposed DP mechanisms while reporting good performance in a broad range of downstream NLP tasks.\n          <\/jats:p>\n          <jats:p\/>","DOI":"10.1145\/3708321","type":"journal-article","created":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T05:53:06Z","timestamp":1734501186000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["A Metric Differential Privacy Mechanism for Sentence Embeddings"],"prefix":"10.1145","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4476-7003","authenticated-orcid":false,"given":"Danushka","family":"Bollegala","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Liverpool","place":["Liverpool, United Kingdom of Great Britain and Northern Ireland"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-6017-8681","authenticated-orcid":false,"given":"Shuichi","family":"Otake","sequence":"additional","affiliation":[{"name":"National Institute of Informatics","place":["Chiyoda-ku, Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1996-7233","authenticated-orcid":false,"given":"Tomoya","family":"Machide","sequence":"additional","affiliation":[{"name":"International Professional University of Technology in Tokyo","place":["Shinjuku-ku, Japan"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6056-4287","authenticated-orcid":false,"given":"Ken-Ichi","family":"Kawarabayashi","sequence":"additional","affiliation":[{"name":"National Institute of Informatics","place":["Chiyoda-ku, Japan"]}]}],"member":"320","published-online":{"date-parts":[[2025,2,22]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2976749.2978318"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-80387-2_9"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508859.2516735"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00106"},{"key":"e_1_3_3_6_2","series-title":"Proceedings of Machine Learning Research","first-page":"394","volume-title":"ICML","volume":"80","author":"Balle Borja","year":"2018","unstructured":"Borja Balle and Yu-Xiang Wang. 2018. Improving the Gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In ICML(Proceedings of Machine Learning Research, Vol. 80). PMLR, Stockholm, Sweden, 394\u2013403."},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3342220.3344925"},{"key":"e_1_3_3_8_2","unstructured":"David M. Blei Andrew Y. Ng and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3 (2003) 993\u20131022."},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.findings-ijcnlp.7"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12010"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/S17-2001"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-39077-7_5"},{"key":"e_1_3_3_13_2","first-page":"1609","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC\u201918)","author":"Conneau Alexis","year":"2018","unstructured":"Alexis Conneau and Douwe Kiela. 2018. SentEval: An evaluation toolkit for universal sentence representations. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC\u201918), Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, H\u00e9l\u00e8ne Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga (Eds.). European Language Resources Association (ELRA), Miyazaki, Japan, 1609\u20131704."},{"key":"e_1_3_3_14_2","first-page":"9","volume-title":"Proceedings of the 3rd International Workshop on Paraphrasing (IWP\u201905)","author":"Dolan William B.","year":"2005","unstructured":"William B. Dolan and Chris Brockett. 2005. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the 3rd International Workshop on Paraphrasing (IWP\u201905). Association for Computational Linguistics, Jeju Island, Korea, 9\u201316."},{"key":"e_1_3_3_15_2","volume-title":"The Algorithmic Foundations of Differential Privacy","author":"Dwork Cynthia","year":"2014","unstructured":"Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Vol. 9. Foundations and Trends in TCS, Microsoft Research."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3336191.3371856"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.trustnlp-1.3"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.552"},{"key":"e_1_3_3_19_2","series-title":"Proceedings of Machine Learning Research","first-page":"3887","volume-title":"Proceedings of the 37th International Conference on Machine Learning","volume":"119","author":"Guo Ruiqi","year":"2020","unstructured":"Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar. 2020. Accelerating large-scale inference with anisotropic vector quantization. In Proceedings of the 37th International Conference on Machine Learning(Proceedings of Machine Learning Research, Vol. 119), Hal Daum\u00e9 III and Aarti Singh (Eds.). PMLR, Virtual, 3887\u20133896."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.114"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-short.87"},{"issue":"21","key":"e_1_3_3_22_2","first-page":"703","article-title":"Differential privacy for functions and functional data","volume":"14","author":"Hall Rob","year":"2013","unstructured":"Rob Hall, Alessandro Rinaldo, and Larry Wasserman. 2013. Differential privacy for functions and functional data. J. Mach. Learn. Res. 14, 21 (2013), 703\u2013727. Retrieved from https:\/\/jmlr.org\/papers\/v14\/hall13a.html","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_3_23_2","article-title":"LoRA: Low-Rank adaptation of large language models","author":"Hu Edward J.","year":"2021","unstructured":"Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank adaptation of large language models. Int. Conf. Learn. Represent. (June2021). http:\/\/arxiv.org\/abs\/2106.09685","journal-title":"Int. Conf. Learn. Represent."},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/1014052.1014073"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1137\/090756090"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.acl-long.339"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-main.207"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072378"},{"key":"e_1_3_3_29_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2005"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.213"},{"key":"e_1_3_3_31_2","first-page":"216","volume-title":"Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Marelli Marco","year":"2014","unstructured":"Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and Roberto Zamparelli. 2014. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914). European Language Resources Association (ELRA), Reykjavik, Iceland, 216\u2013223."},{"key":"e_1_3_3_32_2","first-page":"1","volume-title":"ICLRCoRR","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, and Jeffrey Dean. 2013. Efficient estimation of word representation in vector space. In ICLR. CoRR 1, 1\u201310."},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218990"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_3_35_2","first-page":"9","volume-title":"NeurIPS","author":"Rogers Ryan M.","year":"2016","unstructured":"Ryan M. Rogers, Aaron Roth, Jonathan Ullman, and Salil Vadhan. 2016. Privacy odometers and filters: Pay-as-you-go composition. In NeurIPS, Vol. 29. Curran Associates, Inc., Barcelona, Spain, 9 pages."},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.448"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D13-1170"},{"key":"e_1_3_3_38_2","volume-title":"Lecture 6.5 \u2013 RMSProp","author":"Tielmand T.","year":"2012","unstructured":"T. Tielmand and G. Hinton. 2012. Divide the gradient by a running average of its recent magnitude. In Lecture 6.5 \u2013 RMSProp. CORSERA: Neural Networks for Machine Learning, Toronto, Canada."},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539417"},{"key":"e_1_3_3_40_2","first-page":"90","volume-title":"ACL","author":"Wang Sida","year":"2012","unstructured":"Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In ACL. Association for Computational Linguistics, Jeju Island, Korea, 90\u201394."},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.87"},{"key":"e_1_3_3_42_2","first-page":"3898","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics","author":"Wu Xing","year":"2022","unstructured":"Xing Wu, Chaochen Gao, Liangjun Zang, Jizhong Han, Zhongyuan Wang, and Songlin Hu. 2022. ESimCSE: Enhanced sample building method for contrastive learning of unsupervised sentence embedding. In Proceedings of the 29th International Conference on Computational Linguistics. International Committee on Computational Linguistics, Gyeongju, Republic of Korea, 3898\u20133907."},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.737"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.privatenlp-1.2"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01187"},{"key":"e_1_3_3_46_2","doi-asserted-by":"crossref","unstructured":"M. Abramowitz and I. A. Stegun. 1965. Handbook of Mathematical Functions: With Formulas Graphs and Mathematical Tables. Dover Publications. https:\/\/books.google.co.uk\/books?id=MtU8uP7XMvoC","DOI":"10.2307\/1266136"}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708321","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3708321","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T13:49:37Z","timestamp":1756475377000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3708321"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,22]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3708321"],"URL":"https:\/\/doi.org\/10.1145\/3708321","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,22]]},"assertion":[{"value":"2024-04-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-19","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}