{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,25]],"date-time":"2025-06-25T13:33:52Z","timestamp":1750858432005},"reference-count":20,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:p>\n            We demonstrate\n            <jats:italic>Pipemizer<\/jats:italic>\n            , an optimizer and recommender aimed at improving the performance of queries or jobs in pipelines. These job pipelines are ubiquitous in modern data analytics due to jobs reading output files written by other jobs. Given that more than 650k jobs run on Microsoft's SCOPE job service per day and about 70% have inter-job dependencies, identifying optimization opportunities across query jobs is of considerable interest to both cluster operators and users.\n            <jats:italic>Pipemizer<\/jats:italic>\n            addresses this need by providing recommendations to users, allowing users to understand their system, and facilitating automated application of recommendations.\n            <jats:italic>Pipemizer<\/jats:italic>\n            introduces novel optimizations that include holistic pipeline-aware statistics generation, inter-job operator push-up, and job split &amp; merge. 
This demonstration showcases optimizations and recommendations generated by\n            <jats:italic>Pipemizer<\/jats:italic>\n            , enabling users to understand and optimize job pipelines.\n          <\/jats:p>","DOI":"10.14778\/3554821.3554881","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T22:28:39Z","timestamp":1664490519000},"page":"3710-3713","source":"Crossref","is-referenced-by-count":5,"title":["Pipemizer"],"prefix":"10.14778","volume":"15","author":[{"given":"Sunny","family":"Gakhar","sequence":"first","affiliation":[{"name":"Microsoft"}]},{"given":"Joyce","family":"Cahoon","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Wangchao","family":"Le","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Xiangnan","family":"Li","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Kaushik","family":"Ravichandran","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Hiren","family":"Patel","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Marc","family":"Friedman","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Brandon","family":"Haynes","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Shi","family":"Qiao","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Alekh","family":"Jindal","sequence":"additional","affiliation":[{"name":"Microsoft"}]},{"given":"Jyoti","family":"Leeka","sequence":"additional","affiliation":[{"name":"Microsoft"}]}],"member":"320","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Apache Airflow. https:\/\/airflow.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. Asimov Windows Telemetry. https:\/\/mywindowshub.com\/microsoft-uses-real-time-telemetry-asimov-build-test-update-windows-9\/."},
{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. AWS Data Pipeline. https:\/\/aws.amazon.com\/datapipeline\/."},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. Azure Data Factory. https:\/\/metaflow.org\/."},{"key":"e_1_2_1_5_1","volume-title":"CIDR 2021 Keynote by Benoit Dageville, Snowflake Co-Founder President of Products. http:\/\/cidrdb.org\/cidr2021\/keynotespeakers.html.","unstructured":"[n.d.]. CIDR 2021 Keynote by Benoit Dageville, Snowflake Co-Founder President of Products. http:\/\/cidrdb.org\/cidr2021\/keynotespeakers.html."},{"key":"e_1_2_1_6_1","unstructured":"[n.d.]. Dagster. https:\/\/dagster.io\/."},{"key":"e_1_2_1_7_1","unstructured":"[n.d.]. Google Dataflow. https:\/\/cloud.google.com\/dataflow."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Sanjay Agrawal et al. 2006. Automatic physical design tuning: workload as a sequence. In SIGMOD.","DOI":"10.1145\/1142473.1142549"},{"key":"e_1_2_1_9_1","unstructured":"Josep Aguilar-Saborit et al. 2020. POLARIS: the distributed SQL engine in Azure Synapse. PVLDB (2020)."},{"key":"e_1_2_1_10_1","volume-title":"ACM SIGMOD Record","author":"Chaudhuri Surajit","year":"1998","unstructured":"Surajit Chaudhuri and Vivek Narasayya. 1998. AutoAdmin \"what-if\" index analysis utility. ACM SIGMOD Record (1998)."},
{"key":"e_1_2_1_11_1","unstructured":"Andrew Chung et al. 2020. Unearthing inter-job dependencies for better cluster scheduling. In OSDI 20."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(03)00031-X"},{"key":"e_1_2_1_13_1","unstructured":"Per-Olof Fj\u00e4llstr\u00f6m. 1998. Algorithms for graph partitioning: A survey."},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the ACM Symposium on Cloud Computing. 416--427","author":"Alekh","unstructured":"Alekh Jindal et al. 2019. Peregrine: Workload optimization for cloud query engines. In Proceedings of the ACM Symposium on Cloud Computing. 416--427."},{"key":"e_1_2_1_15_1","unstructured":"Alekh Jindal et al. 2021. Production Experiences from Computation Reuse at Microsoft. In EDBT. 623--634."},{"key":"e_1_2_1_16_1","volume-title":"Microsoft: Over a Decade of Progress and a Decade to Look Forward. PVLDB 14, 12","author":"Conor Power","year":"2021","unstructured":"Conor Power et al. 2021. The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward. PVLDB 14, 12 (2021)."},{"key":"e_1_2_1_17_1","unstructured":"Prasan Roy et al. 2000. Efficient and extensible algorithms for multi query optimization. In SIGMOD."},
{"key":"e_1_2_1_18_1","volume-title":"Multi-query optimization in mapreduce framework. PVLDB","author":"Wang Guoping","year":"2013","unstructured":"Guoping Wang and Chee-Yong Chan. 2013. Multi-query optimization in mapreduce framework. PVLDB (2013)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Doris Xin et al. 2021. Production machine learning pipelines: Empirical analysis and optimization opportunities. In SIGMOD.","DOI":"10.1145\/3448016.3457566"},{"key":"e_1_2_1_20_1","unstructured":"Yiwen Zhu et al. 2021. Phoebe: a learning-based checkpoint optimizer. PVLDB (2021)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3554821.3554881","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:35:32Z","timestamp":1672227332000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3554821.3554881"}},"subtitle":["an optimizer for analytics data pipelines"],"short-title":[],"issued":{"date-parts":[[2022,8]]},"references-count":20,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["10.14778\/3554821.3554881"],"URL":"https:\/\/doi.org\/10.14778\/3554821.3554881","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,8]]}}}