{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T16:46:46Z","timestamp":1759682806777,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,11,11]],"date-time":"2019-11-11T00:00:00Z","timestamp":1573430400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Data and Information Quality"],"published-print":{"date-parts":[[2020,3,31]]},"abstract":"<jats:p>We are experiencing an amazing data-centered revolution. Incredible amounts of data are collected, integrated, and analyzed, leading to key breakthroughs in science and society. This well of knowledge, however, is at a great risk if we do not dispense with some of the data flood. First, the amount of generated data grows exponentially and already at 2020 is expected to be more than twice the available storage. Second, even disregarding storage constraints, uncontrolled data retention risks privacy and security, as recognized, e.g., by the recent EU Data Protection reform. Data disposal policies must be developed to benefit and protect organizations and individuals.<\/jats:p>\n          <jats:p>Retaining the knowledge hidden in the data while respecting storage, processing, and regulatory constraints is a great challenge. The difficulty stems from the distinct, intricate requirements entailed by each type of constraint, the scale and velocity of data, and the constantly evolving needs. While multiple data sketching, summarization, and deletion techniques were developed to address specific aspects of the problem, we are still very far from a comprehensive solution. 
Every organization has to battle the same tough challenges with ad hoc solutions that are application-specific and rarely sharable.<\/jats:p>\n          <jats:p>In this article, we will discuss the logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraints enforcement and for the development of applications over the retained information. In particular, we will overview relevant related work, highlighting new research challenges and potential reuse of existing techniques.<\/jats:p>","DOI":"10.1145\/3326920","type":"journal-article","created":{"date-parts":[[2019,11,11]],"date-time":"2019-11-11T18:13:30Z","timestamp":1573496010000},"page":"1-7","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Getting Rid of Data"],"prefix":"10.1145","volume":"12","author":[{"given":"Tova","family":"Milo","sequence":"first","affiliation":[{"name":"School of Computer Science, Tel Aviv University, Tel Aviv, Israel"}]}],"member":"320","published-online":{"date-parts":[[2019,11,11]]},"reference":[{"volume-title":"Proceedings of the CIKM. 483--492","author":"Ainy E.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/2095686.2095693"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465318"},{"key":"e_1_2_1_4_1","unstructured":"M. Besta and T. Hoefler. 2018. Survey and taxonomy of lossless graph compression and space-efficient graph representations. Retrieved from CoRR abs\/1806.01799 (2018)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"A. Cal\u00ec D. Calvanese and M. Lenzerini. 2013. Data integration under integrity constraints. 
In Seminal Contributions to Information Systems Engineering 25 Years of CAiSE. 335--352.","DOI":"10.1007\/978-3-642-36926-1_27"},{"volume-title":"Proceedings of the SIGMOD.","author":"Chaudhuri S.","key":"e_1_2_1_6_1"},{"key":"e_1_2_1_7_1","first-page":"10","article-title":"BigGorilla: An open-source ecosystem for data preparation and integration","volume":"41","author":"Chen C.","year":"2018","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1561\/1900000006"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3080008"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-031-01891-6","volume-title":"Business Processes: A Database Perspective. Morgan & Claypool Publishers.","author":"Deutch D.","year":"2012"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2011.09.004"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"A. Doan A. Y. Halevy and Z. G. Ives. 2012. Principles of Data Integration. Morgan Kaufmann.","DOI":"10.1016\/B978-0-12-416044-6.00015-6"},{"key":"e_1_2_1_13_1","unstructured":"GDPR. (2016). General Data Protection Regulation (GDPR). Retrieved from https:\/\/en.wikipedia.org\/wiki\/General_Data_Protection_Regulation."},{"volume-title":"Proceedings of the ICDE. 174--185","author":"Glavic B.","key":"e_1_2_1_14_1"},{"volume-title":"Proceedings of the PODS. 93--99","author":"Green T. 
J.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","first-page":"38","article-title":"Effective data cleaning with continuous evaluation","volume":"39","author":"Ilyas Ihab F.","year":"2016","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2611567"},{"volume-title":"Proceedings of the CIDR.","author":"Kersten M. L.","key":"e_1_2_1_18_1"},{"volume-title":"Proceedings of the PODS. 75--90","author":"Koch C.","key":"e_1_2_1_19_1"},{"volume-title":"Proceedings of the SIGMOD. 489--504","author":"Kraska T.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Y. Liu T. Safavi A. Dighe and D. Koutra. 2018. Graph summarization methods and applications: A survey. ACM Comput. Surv. 51 3 (2018) 62:1--62:34.","DOI":"10.1145\/3186727"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3097983.3105809"},{"key":"e_1_2_1_23_1","first-page":"1","article-title":"The smart crowd\u2014Learning from the ones who know","volume":"3","author":"Milo Tova","year":"2017","journal-title":"Proceedings of the ICDT."},{"key":"e_1_2_1_24_1","first-page":"26","article-title":"Optimizing open-ended crowdsourcing: The next frontier in crowdsourced data management","volume":"39","author":"Parameswaran Aditya G.","year":"2016","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_25_1","first-page":"265","article-title":"Big data reduction methods: A survey","volume":"1","author":"Rehman M. H.","year":"2016","journal-title":"Data Sci. Eng."},{"key":"e_1_2_1_26_1","unstructured":"retention [n.d.]. Data Retention. Retrieved from https:\/\/en.wikipedia.org\/wiki\/Data_retention."},
{"key":"e_1_2_1_27_1","article-title":"Digital data storage is undergoing mind-boggling growth","author":"Rizzatti L.","year":"2016","journal-title":"EETimes Magazine"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2007.06.001"}],"container-title":["Journal of Data and Information Quality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3326920","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3326920","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:25:32Z","timestamp":1750206332000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3326920"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,11]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,3,31]]}},"alternative-id":["10.1145\/3326920"],"URL":"https:\/\/doi.org\/10.1145\/3326920","relation":{},"ISSN":["1936-1955","1936-1963"],"issn-type":[{"type":"print","value":"1936-1955"},{"type":"electronic","value":"1936-1963"}],"subject":[],"published":{"date-parts":[[2019,11,11]]},"assertion":[{"value":"2019-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-11-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}