{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T10:35:52Z","timestamp":1777631752101,"version":"3.51.4"},"reference-count":89,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,7]]},"abstract":"<jats:p>The twenty-first century has been dominated by the need for large scale data processing, marking the birth of big data platforms such as Cosmos. This paper describes the evolution of the exabyte-scale Cosmos big data platform at Microsoft; our journey right from scale and reliability all the way to efficiency and usability, and our next steps towards improving security, compliance, and support for heterogeneous analytics scenarios. We discuss how the evolution of Cosmos parallels the evolution of the big data field, and how the changes in the Cosmos workloads over time parallel the changing requirements of users across industry.<\/jats:p>","DOI":"10.14778\/3476311.3476390","type":"journal-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T22:48:43Z","timestamp":1635461323000},"page":"3148-3161","source":"Crossref","is-referenced-by-count":10,"title":["The cosmos big data platform at 
Microsoft"],"prefix":"10.14778","volume":"14","author":[{"given":"Conor","family":"Power","sequence":"first","affiliation":[{"name":"Microsoft"}]},{"given":"Hiren","family":"Patel","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Alekh","family":"Jindal","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Jyoti","family":"Leeka","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Bob","family":"Jenkins","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Michael","family":"Rys","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Ed","family":"Triou","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Dexin","family":"Zhu","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Lucky","family":"Katahanas","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Chakrapani Bhat","family":"Talapady","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Joshua","family":"Rowe","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Fan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Rich","family":"Draves","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Marc","family":"Friedman","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Ivan Santa Maria","family":"Filho","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Amrish","family":"Kumar","sequence":"additional","affiliation":[{"name":"Microsoft"}]}],"member":"320","published-online":{"date-parts":[[2021,10,28]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2845915"},{"key":"e_1_2_1_2_1","volume-title":"Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Kostantinos Karanasos, Subru Krishnan, Brian Kroth, et al.","author":"Agrawal 
Ashvin","year":"2019"},{"key":"e_1_2_1_3_1","unstructured":"Amazon. 2017. Amazon Athena. https:\/\/docs.amazonaws.cn\/en_us\/athena\/latest\/APIReference\/athena-api.pdf."},{"key":"e_1_2_1_4_1","volume-title":"Experiences with using data cleaning technology for bing services. Data Engineering Bulletin","author":"Arasu Arvind","year":"2012"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415560"},{"key":"e_1_2_1_6_1","unstructured":"Michael Armbrust Ali Ghodsi Reynold Xin and Matei Zaharia. [n.d.]. Lake-house: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics. ([n. d.])."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/3485849.3485858"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415573"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2612669.2612702"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1972457.1972472"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685071"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043571"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735503"},{"key":"e_1_2_1_14_1","first-page":"51","article-title":"Trill: Engineering a Library for Diverse Analytics","volume":"38","author":"Chandramouli Badrish","year":"2015","journal-title":"IEEE Data Eng. 
Bull."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.55"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3488766.3488834"},{"key":"e_1_2_1_17_1","volume-title":"Retrieved","author":"Clarke Gavin","year":"2008"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3323234.3323250"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2835776.2835820"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732981"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1251254.1251264"},{"key":"e_1_2_1_22_1","unstructured":"Dremio. 2021. Dremio. https:\/\/www.dremio.com\/data-lake\/."},{"key":"e_1_2_1_23_1","volume-title":"Retrieved","author":"Foley Mary Jo","year":"2009"},{"key":"e_1_2_1_24_1","unstructured":".NET Foundation. 2020. .NET for Apache Spark. https:\/\/github.com\/dotnet\/spark."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2482626.2482660"},{"key":"e_1_2_1_27_1","unstructured":"Google. 2015. Google Cloud Dataflow. https:\/\/cloud.google.com\/dataflow\/."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3026877.3026885"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3269206.3271747"},{"key":"e_1_2_1_30_1","unstructured":"Apache Hadoop. 2005. 
https:\/\/hadoop.apache.org."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2011.113"},{"key":"e_1_2_1_32_1","first-page":"261","article-title":"Starfish: A Self-tuning System for Big Data Analytics","volume":"11","author":"Herodotou Herodotos","year":"2011","journal-title":"Cidr"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1243418.1243426"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272998.1273005"},{"key":"e_1_2_1_35_1","unstructured":"Alekh Jindal K Venkatesh Emani Maureen Daum Olga Poppe Brandon Haynes Anna Pavlenko Ayushi Gupta Karthik Ramachandra Carlo Curino Andreas Mueller et al. [n.d.]. Magpie: Python at Speed and Scale using Cloud Backends. ([n.d.])."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192971"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190656"},{"key":"e_1_2_1_38_1","volume-title":"Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft. In ICDE.","author":"Jindal Alekh","year":"2021"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352130"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368292"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2882940"},{"key":"e_1_2_1_42_1","volume-title":"Sagedb: A learned database system. 
In CIDR.","author":"Kraska Tim","year":"2019"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/2792426"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367516"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368299"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2481244.2481247"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987564"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987564"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183751"},{"key":"e_1_2_1_50_1","unstructured":"Microsoft. 2015. Azure Data Lake. https:\/\/azure.github.io\/AzureDataLake\/."},{"key":"e_1_2_1_51_1","unstructured":"Microsoft. 2016. U-SQL. http:\/\/usql.io."},{"key":"e_1_2_1_52_1","unstructured":"Microsoft. 2016. U-SQL Release Notes. https:\/\/github.com\/Azure\/AzureDataLake\/tree\/master\/docs\/Release_Notes."},{"key":"e_1_2_1_53_1","unstructured":"Microsoft. 2017. U-SQL Data Definition Language. https:\/\/docs.microsoft.com\/en-us\/u-sql\/data-definition-language-ddl-statements."},{"key":"e_1_2_1_54_1","unstructured":"Microsoft. 2017. U-SQL Language Reference. https:\/\/docs.microsoft.com\/en-us\/u-sql\/."},{"key":"e_1_2_1_55_1","unstructured":"Microsoft. 2018. Azure RSL. https:\/\/github.com\/Azure\/RSL."},{"key":"e_1_2_1_56_1","unstructured":"Microsoft. 2018. 
IntelliSense. https:\/\/docs.microsoft.com\/en-us\/visualstudio\/ide\/using-intellisense?view=vs-2019."},{"key":"e_1_2_1_57_1","unstructured":"Microsoft. 2021. Azure Synapse Analytics. https:\/\/azure.microsoft.com\/en-in\/services\/synapse-analytics\/."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303973"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231759"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989444"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415547"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2168836.2168857"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.14778\/3339490.3339495"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056100"},{"key":"e_1_2_1_66_1","volume-title":"Patent 7,840,585","author":"Ramsey W.D.","year":"2010"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/2433396.2433486"},{"key":"e_1_2_1_68_1","unstructured":"Michael Rys. 2015. Introducing U-SQL - A Language that makes Big Data Processing Easy. 
https:\/\/devblogs.microsoft.com\/visualstudio\/introducing-u-sql-a-language-that-makes-big-data-processing-easy\/."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/QRS-C.2017.99"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415554"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357223.3362716"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380584"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.106"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.5555\/3323298.3323321"},{"key":"e_1_2_1_75_1","unstructured":"Snowflake. 2021. Snowflake Data Cloud. https:\/\/www.snowflake.com\/."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463707"},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807278"},{"key":"e_1_2_1_78_1","unstructured":"New York Times. 2009. A Deluge of Data Shapes a New Era in Computing. 
https:\/\/cacm.acm.org\/news\/54396-a-deluge-of-data-shapes-a-new-era-in-computing\/fulltext."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064029"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523616.2523633"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.14778\/3291264.3291267"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304510.2304514"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733022"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.14778\/3192965.3192967"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.5555\/1863103.1863113"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213839"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447802"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457569"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3476311.3476390","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:40:37Z","timestamp":1672227637000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3476311.3476390"}},"subtitle":["over a decade of progress and a decade to look forward"],"short-title":[],"issued":{"date-parts":[[2021,7]]},"references-count":89,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["10.14778\/3476311.3476390"],"URL":"https:\/\/doi.org\/10.14778\/3476311.3476390","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,7]]}}}