{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T14:27:00Z","timestamp":1780583220721,"version":"3.54.1"},"reference-count":19,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,8]]},"abstract":"<jats:p>\n            Modern data estates are spread across data located on premises, on the edge and in one or more public clouds, spread across various sources like multiple relational databases, file and storage systems, and no-SQL systems, both operational and analytic; this phenomenon is referred to as\n            <jats:italic toggle=\"yes\">data sprawl.<\/jats:italic>\n            Data administrators who wish to enforce compliance across the entire organization have to inventory their data, identify what parts of it are sensitive, and govern the sensitive data appropriately --- across the entirety of their sprawling data estate. Today, governance of data is completely siloed; each of the data subsystems has its own (and varied) governance features. Policies applied to sensitive data are applied piece-meal by iterating over all the data sources in a custom language specific to each source. This makes data governance cumbersome, error-prone (because a given policy must be manually enforced across different subsystems, inconsistencies can easily arise), and expensive.\n          <\/jats:p>\n          <jats:p>\n            This paper presents\n            <jats:italic toggle=\"yes\">Microsoft Purview<\/jats:italic>\n            , a service for unified governance of the entire data estate of an organization from a single central pane of glass. 
The Purview service consists of three parts: (1) a\n            <jats:italic toggle=\"yes\">Data Map<\/jats:italic>\n            or\n            <jats:italic toggle=\"yes\">metadata catalog<\/jats:italic>\n            that is populated by automated scanning of data sources in the organization, (2) a system to store and manage sensitivity\n            <jats:italic toggle=\"yes\">classification<\/jats:italic>\n            of data, and (3) a\n            <jats:italic toggle=\"yes\">policy<\/jats:italic>\n            system that enables data security officers to author and implement policies that span the entire organization, e.g., a policy that says, \"Non-full-time employees should be denied access to data classified as PII (Personally Identifiable Information).\"\n          <\/jats:p>\n          <jats:p>Purview transforms data governance across a complex data estate by offering the ability to govern centrally and automating data discovery, classification and policy enforcement. While other commercial catalog systems also build a global catalog, Purview is unique in its support for policies. It is also distinguished by covering both structured and unstructured data, thanks to its deep integration with Office 365 and its governance framework; indeed, \"Microsoft Purview\" represents a new unified offering that combines Office 365 governance and what was formerly a service for governing structured data called \"Azure Purview\".<\/jats:p>\n          <jats:p>By integrating with Office 365's Rights Management Service, Purview offers central governance over structured data stored in databases and stores, reports in systems such as Power BI, as well as document data stored in Office 365. 
The Purview vision is to make the metadata in the Data Map increasingly richer through further automation and curation support and to use this 360 degree view of the data estate to support a wide range of governance policies, ranging from access control to lifecycle management (e.g., retention, deletion, restricting data movement). This paper covers the design and implementation challenges in building the Purview service for Attribute-Based Access Control (ABAC) policies, focusing specifically on a detailed description of its integration with Azure SQL Database. We illustrate the power of unifying Office 365 governance with structured data governance through Purview policies that enforce consistent access control even as data flows between Office 365 and structured data engines like Azure SQL Database. We also describe the results of our empirical evaluation of the performance overheads imposed by Purview.<\/jats:p>","DOI":"10.14778\/3611540.3611552","type":"journal-article","created":{"date-parts":[[2023,9,15]],"date-time":"2023-09-15T11:32:37Z","timestamp":1694777557000},"page":"3624-3635","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Microsoft Purview: A System for Central Governance of Data"],"prefix":"10.14778","volume":"16","author":[{"given":"Shafi","family":"Ahmad","sequence":"first","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dillidorai","family":"Arumugam","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Srdan","family":"Bozovic","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Elnata","family":"Degefa","sequence":"additional","affiliation":[{"name":"Microsoft 
Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sailesh","family":"Duvvuri","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Steven","family":"Gott","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nitish","family":"Gupta","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Joachim","family":"Hammer","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nivedita","family":"Kaluskar","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Raghav","family":"Kaushik","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rakesh","family":"Khanduja","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Prasad","family":"Mujumdar","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gaurav","family":"Malhotra","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pankaj","family":"Naik","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Nikolas","family":"Ogg","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Krishna Kumar","family":"Parthasarthy","sequence":"additional","affiliation":[{"name":"Microsoft 
Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Raghu","family":"Ramakrishnan","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vlad","family":"Rodriguez","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Rahul","family":"Sharma","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jakub","family":"Szymaszek","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Andreas","family":"Wolter","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Alation Data Catalog and Data Governance 2021. https:\/\/www.alation.com\/."},{"key":"e_1_2_1_2_1","unstructured":"Amazon Web Services Identity and Access Management 2023. https:\/\/aws.amazon.com\/iam\/."},{"key":"e_1_2_1_3_1","volume-title":"Data Governance and Metadata Framework","author":"Atlas Apache","year":"2021","unstructured":"Apache Atlas: Data Governance and Metadata Framework 2021. https:\/\/atlas.apache.org\/."},{"key":"e_1_2_1_4_1","unstructured":"Apache Ranger 2023. https:\/\/ranger.apache.org\/."},{"key":"e_1_2_1_5_1","unstructured":"Azure Active Directory 2023. https:\/\/azure.microsoft.com\/en-us\/services\/active-directory\/."},{"key":"e_1_2_1_6_1","volume-title":"NoSQL Database","author":"Azure Cosmos","year":"2021","unstructured":"Azure Cosmos DB: NoSQL Database 2021. https:\/\/azure.microsoft.com\/en-us\/services\/cosmos-db\/."},{"key":"e_1_2_1_7_1","unstructured":"Azure Event Hub 2021. 
https:\/\/azure.microsoft.com\/en-us\/services\/event-hubs\/."},{"key":"e_1_2_1_8_1","volume-title":"A unified data governance solution that maximizes the business value of your data","author":"Purview Azure","year":"2021","unstructured":"Azure Purview: A unified data governance solution that maximizes the business value of your data 2021. https:\/\/azure.microsoft.com\/en-us\/services\/purview\/."},{"key":"e_1_2_1_9_1","volume-title":"The Data Intelligence Cloud","author":"Collibra","year":"2021","unstructured":"Collibra: The Data Intelligence Cloud 2021. https:\/\/www.collibra.com\/us\/en."},{"key":"e_1_2_1_10_1","unstructured":"Databricks Unity 2023. https:\/\/www.databricks.com\/product\/unity-catalog."},{"key":"e_1_2_1_11_1","unstructured":"Google Data Catalog 2021. https:\/\/cloud.google.com\/data-catalog."},{"key":"e_1_2_1_12_1","unstructured":"Google Identity and Access Management 2023. https:\/\/cloud.google.com\/iam."},{"key":"e_1_2_1_13_1","volume-title":"Big SQL","author":"IBM","year":"2021","unstructured":"IBM Db2 Big SQL 2021. https:\/\/www.ibm.com\/docs\/en\/db2-big-sql\/7.1?topic=authorization-ranger."},{"key":"e_1_2_1_14_1","unstructured":"Informatica Enterprise Data Catalog 2021. https:\/\/www.informatica.com\/products\/data-catalog\/enterprise-data-catalog.html."},{"key":"e_1_2_1_15_1","unstructured":"Microsoft Excel 2023. https:\/\/www.microsoft.com\/en-us\/microsoft-365\/excel."},{"key":"e_1_2_1_16_1","unstructured":"Microsoft PowerBI 2023. https:\/\/powerbi.microsoft.com\/."},{"key":"e_1_2_1_17_1","first-page":"365","year":"2023","unstructured":"Office 365 2023. https:\/\/www.office.com.","journal-title":"Office"},{"key":"e_1_2_1_18_1","unstructured":"Open Policy Agent 2021. https:\/\/www.openpolicyagent.org\/."},{"key":"e_1_2_1_19_1","unstructured":"TPC-C Benchmark 2019. 
http:\/\/www.tpc.org\/tpcc\/."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3611540.3611552","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T22:33:03Z","timestamp":1757543583000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3611540.3611552"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8]]},"references-count":19,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2023,8]]}},"alternative-id":["10.14778\/3611540.3611552"],"URL":"https:\/\/doi.org\/10.14778\/3611540.3611552","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,8]]},"assertion":[{"value":"2023-08-01","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}