{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T02:17:13Z","timestamp":1771467433242,"version":"3.50.1"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,2,2]],"date-time":"2021-02-02T00:00:00Z","timestamp":1612224000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100007297","name":"Office of Naval Research","doi-asserted-by":"publisher","award":["N00014-18-1-2670"],"award-info":[{"award-number":["N00014-18-1-2670"]}],"id":[{"id":"10.13039\/100007297","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Manage. Inf. Syst."],"published-print":{"date-parts":[[2021,6,30]]},"abstract":"<jats:p>\n            Theft of intellectual property is a growing problem\u2014one that is exacerbated by the fact that a successful compromise of an enterprise might only become known months after the hack. A recent solution called FORGE addresses this problem by automatically generating\n            <jats:italic>N<\/jats:italic>\n            \u201cfake\u201d versions of any real document so that the attacker has to determine which of the\n            <jats:italic>N<\/jats:italic>\n            + 1 documents that they have exfiltrated from a compromised network is real. In this article, we remove two major drawbacks in FORGE: (i) FORGE requires ontologies in order to generate fake documents\u2014however, in the real world, ontologies, especially good ontologies, are infrequently available. The WE-FORGE system proposed in this article completely eliminates the need for ontologies by using distance metrics on word embeddings instead. 
(ii) FORGE generates fake documents by first identifying \u201ctarget\u201d concepts in the original document and then substituting \u201creplacement\u201d concepts for them. However, we will show that this can lead to sub-optimal results (e.g., as target concepts are selected\n            <jats:italic>without<\/jats:italic>\n            knowing the availability and\/or quality of the replacement concepts, they can sometimes lead to poor results). Our WE-FORGE system addresses this problem in two possible ways by performing a joint optimization to select concepts and replacements simultaneously. We conduct a human study involving both computer science and chemistry documents and show that WE-FORGE successfully deceives adversaries.\n          <\/jats:p>","DOI":"10.1145\/3418289","type":"journal-article","created":{"date-parts":[[2021,2,3]],"date-time":"2021-02-03T05:04:36Z","timestamp":1612328676000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Using Word Embeddings to Deter Intellectual Property Theft through Automated Generation of Fake Documents"],"prefix":"10.1145","volume":"12","author":[{"given":"Almas","family":"Abdibayev","sequence":"first","affiliation":[{"name":"Dartmouth College, Hanover, New Hampshire"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongkai","family":"Chen","sequence":"additional","affiliation":[{"name":"Dartmouth College, Hanover, New Hampshire"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haipeng","family":"Chen","sequence":"additional","affiliation":[{"name":"Dartmouth College, Hanover, New Hampshire"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Deepti","family":"Poluru","sequence":"additional","affiliation":[{"name":"Dartmouth College, Hanover, New Hampshire"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"V. 
S.","family":"Subrahmanian","sequence":"additional","affiliation":[{"name":"Dartmouth College, Hanover, New Hampshire"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,2,2]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2382196.2382284"},{"key":"e_1_2_2_2_1","volume-title":"\u201cO\u2019Reilly Media","author":"Bird Steven","unstructured":"Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. \u201cO\u2019Reilly Media, Inc.\u201d"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_2_2_4_1","volume-title":"Stolfo","author":"Bowen Brian M.","year":"2009","unstructured":"Brian M. Bowen, Shlomo Hershkop, Angelos D. Keromytis, and Salvatore J. Stolfo. 2009. Baiting inside attackers using decoy documents. In Proceedings of the International Conference on Security and Privacy in Communication Systems. Springer, 51--70."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2019.2898661"},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.21512\/comtech.v7i4.3746"},{"key":"e_1_2_2_7_1","unstructured":"Catalin Cimpanu. 2020. FBI is investigating more than 1 000 cases of Chinese theft of US technology. Retrieved from https:\/\/www.zdnet.com\/article\/fbi-is-investigating-more-than-1000-cases-of-chinese-theft-of-us-technology\/."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.polisci.2.1.25"},{"key":"e_1_2_2_9_1","volume-title":"Mining of Massive Data Sets","author":"Leskovec Jure","unstructured":"Jure Leskovec, Anand Rajaraman, and Jeffrey David Ullman. 2020. Mining of Massive Data Sets. 
Cambridge University Press."},{"key":"e_1_2_2_10_1","volume-title":"Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability","volume":"1","author":"MacQueen James","year":"1967","unstructured":"James MacQueen et\u00a0al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Oakland, CA, 281--297."},{"key":"e_1_2_2_11_1","first-page":"521","article-title":"6, 6\u2032-bis-(1-phosphanorbornadiene) diphosphines, their preparation and their uses","volume":"6","author":"Mathey Francois","year":"2003","unstructured":"Francois Mathey, Francois Mercier, Michel Spagnol, Frederic Robin, and Virginie Mouries. 2003. 6, 6\u2032-bis-(1-phosphanorbornadiene) diphosphines, their preparation and their uses. US Patent 6,521,795.","journal-title":"US Patent"},{"key":"e_1_2_2_12_1","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781."},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security. 93--94","author":"Park Younghee","unstructured":"Younghee Park and Salvatore J. Stolfo. 2012. Software decoys for insider threat. In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security. 93--94."},{"key":"e_1_2_2_14_1","unstructured":"Eric Rosenbaum. 2019. 1 in 5 corporations say China has stolen their IP within the last year: CNBC CFO survey. Retrieved from https:\/\/www.cnbc.com\/2019\/02\/28\/1-in-5-companies-say-china-stole-their-ip-within-the-last-year-cnbc.html."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/0377-0427(87)90125-7"},{"key":"e_1_2_2_16_1","volume-title":"Strategic Studies","author":"Schelling Thomas C.","unstructured":"Thomas C. Schelling. 2008. Arms and influence. 
In Strategic Studies. Routledge, 96--114."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/11926078_45"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2008.57"},{"key":"e_1_2_2_19_1","volume-title":"Proceedings of the 2012 IEEE Symposium on Security and Privacy Workshops. IEEE, 129--133","author":"Voris Jonathan","unstructured":"Jonathan Voris, Nathaniel Boggs, and Salvatore J. Stolfo. 2012. Lost in translation: Improving decoy documents via automated translation. In Proceedings of the 2012 IEEE Symposium on Security and Privacy Workshops. IEEE, 129--133."},{"key":"e_1_2_2_20_1","volume-title":"Proceedings of the International Conference on Trustworthy Computing and Services. Springer, 123--129","author":"Wang Lei","year":"2013","unstructured":"Lei Wang, Chenglong Li, QingFeng Tan, and XueBin Wang. 2013. Generation and distribution of decoy document system. In Proceedings of the International Conference on Trustworthy Computing and Services. Springer, 123--129."},{"key":"e_1_2_2_21_1","unstructured":"Jonathan White and Dale Thompson. 2006. Using synthetic decoys to digitally watermark personally-identifying data and to promote data security. In Security and Management. Citeseer 91--99."},{"key":"e_1_2_2_22_1","first-page":"103","article-title":"Automating the generation of fake documents to detect network intruders","volume":"2","author":"Whitham Ben","year":"2013","unstructured":"Ben Whitham. 2013. Automating the generation of fake documents to detect network intruders. International Journal of Cyber-Security and Digital Forensics 2, 1 (2013), 103.","journal-title":"International Journal of Cyber-Security and Digital Forensics"},{"key":"e_1_2_2_23_1","volume-title":"Proceedings in the 15th Australian Information Warfare Conference","author":"Whitham Ben","year":"2014","unstructured":"Ben Whitham. 2014. Design requirements for generating deceptive content to protect document repositories. 
In Proceedings in the 15th Australian Information Warfare Conference, Perth, Australia."},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/IAW.2004.1437806"}],"container-title":["ACM Transactions on Management Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3418289","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3418289","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3418289","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:27Z","timestamp":1750197747000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3418289"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,2]]},"references-count":24,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2021,6,30]]}},"alternative-id":["10.1145\/3418289"],"URL":"https:\/\/doi.org\/10.1145\/3418289","relation":{},"ISSN":["2158-656X","2158-6578"],"issn-type":[{"value":"2158-656X","type":"print"},{"value":"2158-6578","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,2]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-02-02","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}