{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T05:02:20Z","timestamp":1755838940331,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2019,11,30]]},"abstract":"<jats:p>\n            We present the design, implementation, and evaluation of\n            <jats:italic>INSTalytics<\/jats:italic>\n            , a co-designed stack of a cluster file system and the compute layer, for efficient big-data analytics in large-scale data centers.\n            <jats:italic>INSTalytics<\/jats:italic>\n            amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension,\n            <jats:italic>INSTalytics<\/jats:italic>\n            enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle.\n          <\/jats:p>\n          <jats:p>\n            To achieve this,\n            <jats:italic>INSTalytics<\/jats:italic>\n            uses compute-awareness to customize the three-way replication that the cluster file system employs for availability. 
A new heterogeneous replication layout enables\n            <jats:italic>INSTalytics<\/jats:italic>\n            to preserve the same recovery cost and availability as traditional replication.\n            <jats:italic>INSTalytics<\/jats:italic>\n            also uses compute-awareness to expose a new\n            <jats:italic>sliced-read<\/jats:italic>\n            API that improves performance of joins by enabling multiple compute nodes to read slices of a data block efficiently via co-ordinated request scheduling and selective caching at the storage nodes.\n          <\/jats:p>\n          <jats:p>\n            We have built a prototype implementation of\n            <jats:italic>INSTalytics<\/jats:italic>\n            in a production analytics stack, and we show that recovery performance and availability is similar to physical replication, while providing significant improvements in query performance, suggesting a new approach to designing cloud-scale big-data analytics systems.\n          <\/jats:p>","DOI":"10.1145\/3369738","type":"journal-article","created":{"date-parts":[[2020,4,4]],"date-time":"2020-04-04T10:48:51Z","timestamp":1585997331000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["INSTalytics"],"prefix":"10.1145","volume":"15","author":[{"given":"Muthian","family":"Sivathanu","sequence":"first","affiliation":[{"name":"Microsoft Research India, Bangalore, Karnataka"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Midhul","family":"Vuppalapati","sequence":"additional","affiliation":[{"name":"Microsoft Research India, Bangalore, Karnataka"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bhargav S.","family":"Gulavani","sequence":"additional","affiliation":[{"name":"Microsoft Research India, Bangalore, 
Karnataka"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kaushik","family":"Rajan","sequence":"additional","affiliation":[{"name":"Microsoft Research India, Bangalore, Karnataka"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jyoti","family":"Leeka","sequence":"additional","affiliation":[{"name":"Microsoft Research India, Bangalore, Karnataka"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jayashree","family":"Mohan","sequence":"additional","affiliation":[{"name":"University of Texas-Austin, Austin, Texas"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Piyus","family":"Kedia","sequence":"additional","affiliation":[{"name":"Indraprastha Institute of Information Technology Delhi, New Delhi, Delhi"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,1,16]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"AMPLab. [n.d.]. AMP big-data benchmark. Retrieved from https:\/\/amplab.cs.berkeley.edu\/benchmark\/."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_2_1_3_1","volume-title":"Arpaci-Dusseau","author":"Arpaci-Dusseau Andrea C.","year":"2001","unstructured":"Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau. 2001. Information and control in gray-box systems. In ACM SIGOPS Operating Systems Review, Vol. 35. 
ACM, 43--56."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190532"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.364531"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201914)","author":"Boutin Eric","year":"2014","unstructured":"Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. 2014. Apollo: Scalable and coordinated scheduling for cloud-scale computing. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201914). 285--300."},{"key":"e_1_2_1_7_1","volume-title":"Solom Heddaya et al","author":"Curino Carlo","year":"2019","unstructured":"Carlo Curino, Subru Krishnan, Konstantinos Karanasos, Sriram Rao, Giovanni M. Fumarola, Botong Huang, Kishore Chaliparambil, Arun Suresh, Young Chen, Solom Heddaya et al. 2019. Hydra: A federated resource manager for data-center scale analytics. In Proceedings of the 16th USENIX Conference on Networked Systems Design and Implementation. 
USENIX Association, 177--191."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920908"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350272"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002943"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX. https:\/\/www.usenix.org\/conference\/osdi10\/availability-globally-distributed-storage-systems.","author":"Ford Daniel","year":"2010","unstructured":"Daniel Ford, Fran\u00e7ois Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. 2010. Availability in globally distributed storage systems. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation. USENIX. https:\/\/www.usenix.org\/conference\/osdi10\/availability-globally-distributed-storage-systems."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"volume-title":"Proceedings of the 6th International Conference on Data Engineering. IEEE, 456--465","author":"Hsiao H.-I.","key":"e_1_2_1_14_1","unstructured":"H.-I. Hsiao and David J. DeWitt. 1990. 
Chained declustering: A new availability strategy for multiprocessor database machines. In Proceedings of the 6th International Conference on Data Engineering. IEEE, 456--465."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIXATC\u201912)","author":"Huang Cheng","year":"2012","unstructured":"Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. 2012. Erasure coding in windows azure storage. In Proceedings of the USENIX Annual Technical Conference (USENIXATC\u201912). USENIX, Boston, MA, 15--26. Retrieved from https:\/\/www.usenix.org\/conference\/atc12\/technical-sessions\/presentation\/huang."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2486001.2486019"},{"key":"e_1_2_1_18_1","volume-title":"Thekkath","author":"Lee Edward K.","year":"1996","unstructured":"Edward K. Lee and Chandramohan A. Thekkath. 1996. Petal: Distributed virtual disks. In ACM SIGPLAN Notices, Vol. 31. 
ACM, 84--92."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/1920841.1920886"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1559845.1559865"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056100"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0093-1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2619239.2626325"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/358818.358822"},{"volume-title":"Proceedings of the Symposium on Cloud Computing (SoCC\u201917)","author":"Shanbhag Anil","key":"e_1_2_1_25_1","unstructured":"Anil Shanbhag, Alekh Jindal, Samuel Madden, Jorge Quiane, and Aaron J. Elmore. 2017. A robust partitioning scheme for ad-hoc query workloads. In Proceedings of the Symposium on Cloud Computing (SoCC\u201917). ACM, New York, NY, 229--241. DOI:https:\/\/doi.org\/10.1145\/3127479.3131613"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"volume-title":"Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST'03)","author":"Sivathanu Muthian","key":"e_1_2_1_27_1","unstructured":"Muthian Sivathanu, Vijayan Prabhakaran, Florentina I. Popovici, Timothy E. Denehy, Andrea C. 
Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2003. Semantically smart disk systems. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST'03). USENIX Association, 73--88. http:\/\/dl.acm.org\/citation.cfm?id=1090694.1090702."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733004.2733044"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the USENIX Annual Technical Conference. 337--350","author":"Tai Amy","year":"2016","unstructured":"Amy Tai, Michael Wei, Michael J. Freedman, Ittai Abraham, and Dahlia Malkhi. 2016. Replex: A scalable, highly available multi-index data store. In Proceedings of the USENIX Annual Technical Conference. 337--350."},{"key":"e_1_2_1_30_1","volume-title":"Lee","author":"Thekkath Chandramohan A.","year":"1997","unstructured":"Chandramohan A. Thekkath, Timothy Mann, and Edward K. Lee. 1997. Frangipani: A Scalable Distributed File System. Vol. 31. ACM."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 13th European Conference on Computer Systems (EuroSys\u201918)","author":"Zhang Haoyu","year":"2018","unstructured":"Haoyu Zhang, Brian Cho, Ergin Seyfe, Avery Ching, and Michael J. Freedman. 2018. Riffle: Optimized shuffle service for large-scale data analytics. In Proceedings of the 13th European Conference on Computer Systems (EuroSys\u201918). ACM, New York, NY. 
DOI:https:\/\/doi.org\/10.1145\/3190508.3190534"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-012-0280-z"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the IEEE 26th International Conference on Data Engineering (ICDE\u201910)","author":"Zhou J.","year":"2010","unstructured":"J. Zhou, P. A. Larson, and R. Chaiken. 2010. Incorporating partitioning and parallel plans into the SCOPE optimizer. In Proceedings of the IEEE 26th International Conference on Data Engineering (ICDE\u201910). 1060--1071. 
DOI:https:\/\/doi.org\/10.1109\/ICDE.2010.5447802"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369738","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3369738","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:08Z","timestamp":1750200068000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3369738"}},"subtitle":["Cluster Filesystem Co-design for Big-data Analytics"],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":35,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,11,30]]}},"alternative-id":["10.1145\/3369738"],"URL":"https:\/\/doi.org\/10.1145\/3369738","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"type":"print","value":"1553-3077"},{"type":"electronic","value":"1553-3093"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"2019-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-01-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}