{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T02:51:34Z","timestamp":1774320694638,"version":"3.50.1"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2020,8]]},"abstract":"<jats:p>Helios is a distributed, highly-scalable system used at Microsoft for flexible ingestion, indexing, and aggregation of large streams of real-time data that is designed to plug into relational engines. The system collects close to a quadrillion events indexing approximately 16 trillion search keys per day from hundreds of thousands of machines across tens of data centers around the world. Helios use cases within Microsoft include debugging\/diagnostics in both public and government clouds, workload characterization, cluster health monitoring, deriving business insights and performing impact analysis of incidents in other large-scale systems such as Azure Data Lake and Cosmos. Helios also serves as a reference blueprint for other large-scale systems within Microsoft. We present the simple data model behind Helios, which offers great flexibility and control over costs, and enables the system to asynchronously index massive streams of data. 
We also present our experiences in building and operating Helios over the last five years at Microsoft.<\/jats:p>","DOI":"10.14778\/3415478.3415547","type":"journal-article","created":{"date-parts":[[2020,9,14]],"date-time":"2020-09-14T18:46:46Z","timestamp":1600109206000},"page":"3231-3244","source":"Crossref","is-referenced-by-count":15,"title":["Helios"],"prefix":"10.14778","volume":"13","author":[{"given":"Rahul","family":"Potharaju","sequence":"first","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Terry","family":"Kim","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Wentao","family":"Wu","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Vidip","family":"Acharya","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Steve","family":"Suh","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Andrew","family":"Fogarty","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Apoorve","family":"Dave","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Sinduja","family":"Ramanujam","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Tomas","family":"Talius","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Lev","family":"Novik","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]},{"given":"Raghu","family":"Ramakrishnan","sequence":"additional","affiliation":[{"name":"Microsoft Corporation"}]}],"member":"320","published-online":{"date-parts":[[2020,8]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Azure sql data warehouse. https:\/\/docs.microsoft.com\/en-us\/azure\/sql-data-warehouse\/."},{"key":"e_1_2_1_2_1","unstructured":"Event hubs. https:\/\/azure.microsoft.com\/en-us\/services\/event-hubs\/."},{"key":"e_1_2_1_3_1","unstructured":"Hyperspace. 
https:\/\/github.com\/microsoft\/hyperspace."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0095-z"},{"key":"e_1_2_1_5_1","first-page":"496","volume-title":"VLDB","author":"Agrawal S.","year":"2000","unstructured":"S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated selection of materialized views and indexes in SQL databases. In VLDB, pages 496--505, 2000."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007609"},{"key":"e_1_2_1_7_1","volume-title":"Storage infrastructure behind Facebook messages: Using HBase at scale","author":"Aiyer A. S.","year":"2012","unstructured":"A. S. Aiyer, M. Bautin, G. J. Chen, P. Damania, P. Khemani, K. Muthukkaruppan, K. Ranganathan, N. Spiegelberg, L. Tang, and M. Vaidya. Storage infrastructure behind Facebook messages: Using HBase at scale. IEEE Data Eng. Bull., 2012."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536229"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824076"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190664"},{"key":"e_1_2_1_11_1","first-page":"1383","volume-title":"SIGMOD","author":"Armbrust M.","year":"2015","unstructured":"M. Armbrust et al. Spark sql: Relational data processing in spark. In SIGMOD, pages 1383--1394, 2015."},{"key":"e_1_2_1_12_1","volume-title":"ACM SIGMOD","author":"Bailis P.","year":"2017","unstructured":"P. Bailis, E. Gan, K. Rong, and S. Suri. MacroBase, A Fast Data Analysis Engine. In ACM SIGMOD, 2017."},{"key":"e_1_2_1_13_1","volume-title":"CIDR","author":"Baker J.","year":"2011","unstructured":"J. Baker, C. Bond, J. C. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Leon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In CIDR, 2011."},{"key":"e_1_2_1_14_1","first-page":"9","volume-title":"CIDR","volume":"11","author":"Bernstein P. 
A.","year":"2011","unstructured":"P. A. Bernstein, C. W. Reid, and S. Das. Hyder-a transactional record manager for shared flash. In CIDR, volume 11, pages 9--20, 2011."},{"key":"e_1_2_1_15_1","volume-title":"Thrill: High-performance algorithmic distributed batch data processing with c++","author":"Bingmann T.","year":"2016","unstructured":"T. Bingmann, M. Axtmann, E. J\u00f6bstl, S. Lamm, H. C. Nguyen, A. Noe, S. Schlag, M. Stumpp, T. Sturm, and P. Sanders. Thrill: High-performance algorithmic distributed batch data processing with c++. In IEEE Big Data, 2016."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038916.2038932"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2043556.2043571"},{"issue":"4","key":"e_1_2_1_18_1","first-page":"28","article-title":"Apache flink\u2122: Stream and batch processing in a single engine","volume":"38","author":"Carbone P.","year":"2015","unstructured":"P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas. Apache flink\u2122: Stream and batch processing in a single engine. IEEE Data Eng. Bull., 38(4):28--38, 2015.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454166"},{"issue":"14","key":"e_1_2_1_20_1","first-page":"1623","article-title":"Quill: Efficient, transferable, and rich analytics at scale","volume":"9","author":"Chandramouli B.","year":"2016","unstructured":"B. Chandramouli, R. C. Fernandez, J. Goldstein, A. Eldawy, and A. Quamar. Quill: Efficient, transferable, and rich analytics at scale. 
PVLDB, 9(14):1623--1634, 2016.","journal-title":"PVLDB"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735503"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1365815.1365816"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/69.3.653"},{"key":"e_1_2_1_24_1","first-page":"146","volume-title":"VLDB","author":"Chaudhuri S.","year":"1997","unstructured":"S. Chaudhuri and V. R. Narasayya. An efficient cost-driven index selection tool for microsoft SQL server. In VLDB, pages 146--155, 1997."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICAC.2004.1301345"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-007-0092-4"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/356770.356776"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/1454159.1454167"},{"key":"e_1_2_1_29_1","volume-title":"Foundations and Trends in Databases","author":"Cormode G.","year":"2012","unstructured":"G. Cormode, M. Garofalakis, P. J. Haas, and C. Jermaine. Synopses for massive data: Samples, histograms, wavelets, sketches. Foundations and Trends in Databases, 2012."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2008.4497562"},{"key":"e_1_2_1_31_1","volume-title":"What will we do when the world's data hits 163 zettabytes in 2025? https:\/\/goo.gl\/iwsPww","author":"I. D. Corporation","year":"2017","unstructured":"I. D. Corporation. What will we do when the world's data hits 163 zettabytes in 2025? 
https:\/\/goo.gl\/iwsPww, 2017."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2017.12.001"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1294261.1294281"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/69.50905"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463709"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2463710"},{"key":"e_1_2_1_37_1","volume-title":"How much does 1 hour of downtime cost the average business? https:\/\/goo.gl\/fqqvTW","author":"Fife L.","year":"2017","unstructured":"L. Fife. How much does 1 hour of downtime cost the average business? https:\/\/goo.gl\/fqqvTW, 2017."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/945445.945450"},{"key":"e_1_2_1_39_1","volume-title":"Site reliability engineering. https:\/\/goo.gl\/YwqcQL","year":"2017","unstructured":"Google. Site reliability engineering. https:\/\/goo.gl\/YwqcQL, 2017."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/602259.602266"},{"key":"e_1_2_1_41_1","first-page":"562","volume-title":"VLDB","author":"Hellerstein J. M.","year":"1995","unstructured":"J. M. Hellerstein, J. F. Naughton, and A. Pfeffer. Generalized search trees for database systems. In VLDB, pages 562--573, 1995."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/253260.253272"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_2_1_44_1","volume-title":"Questioning the lambda architecture. https:\/\/goo.gl\/5Es6N9","author":"Kreps J.","year":"2014","unstructured":"J. Kreps. Questioning the lambda architecture. https:\/\/goo.gl\/5Es6N9, 2014."},{"key":"e_1_2_1_45_1","first-page":"1","volume-title":"Proceedings of the NetDB","author":"Kreps J.","year":"2011","unstructured":"J. Kreps, N. Narkhede, J. Rao, et al. Kafka: A distributed messaging system for log processing. 
In Proceedings of the NetDB, pages 1--7, 2011."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1773912.1773922"},{"key":"e_1_2_1_47_1","volume-title":"ACM Transactions on Computer Systems (TOCS), 16(2):133--169","author":"Lamport L.","year":"1998","unstructured":"L. Lamport. The part-time parliament. ACM Transactions on Computer Systems (TOCS), 16(2):133--169, 1998."},{"issue":"4","key":"e_1_2_1_48_1","first-page":"18","article-title":"Paxos made simple","volume":"32","author":"Lamport L.","year":"2001","unstructured":"L. Lamport et al. Paxos made simple. ACM Sigact News, 32(4):18--25, 2001.","journal-title":"ACM Sigact News"},{"key":"e_1_2_1_49_1","volume-title":"CIDR","author":"Levandoski J. J.","year":"2011","unstructured":"J. J. Levandoski, D. B. Lomet, M. F. Mokbel, and K. Zhao. Deuteronomy: Transaction support for cloud data. In CIDR, 2011."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544834"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2674005.2674996"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231765"},{"key":"e_1_2_1_53_1","first-page":"369","volume-title":"CRYPTO","author":"Merkle R. C.","year":"1987","unstructured":"R. C. Merkle. A digital signature based on a conventional encryption function. In CRYPTO, pages 369--378, 1987."},{"key":"e_1_2_1_54_1","volume-title":"https:\/\/goo.gl\/2KkwMv","author":"Compliance GDPR","year":"2017","unstructured":"Microsoft. GDPR Compliance. https:\/\/goo.gl\/2KkwMv, 2017."},{"key":"e_1_2_1_55_1","volume-title":"Rotating devops role improves engineering service quality. https:\/\/goo.gl\/x63caG","year":"2017","unstructured":"Microsoft. Rotating devops role improves engineering service quality. https:\/\/goo.gl\/x63caG, 2017."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2018.2844341"},{"key":"e_1_2_1_57_1","volume-title":"Procs. of SPDC. ACM","author":"Oki B. 
M.","year":"1988","unstructured":"B. M. Oki and B. H. Liskov. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Procs. of SPDC. ACM, 1988."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_59_1","volume-title":"CIDR","author":"Pavlo A.","year":"2017","unstructured":"A. Pavlo et al. Self-driving database management systems. In CIDR, 2017."},{"key":"e_1_2_1_60_1","volume-title":"Database management systems (3. ed.)","author":"Ramakrishnan R.","year":"2003","unstructured":"R. Ramakrishnan and J. Gehrke. Database management systems (3. ed.). McGraw-Hill, 2003."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056100"},{"key":"e_1_2_1_62_1","first-page":"193","volume-title":"SSD","author":"Samet H.","year":"1989","unstructured":"H. Samet. Hierarchical spatial data structures. In SSD, pages 193--212, 1989."},{"key":"e_1_2_1_63_1","first-page":"511","volume-title":"Readings in Artificial Intelligence and Databases","author":"Selinger P. G.","year":"1988","unstructured":"P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, R. A. Lorie, and T. G. Price. Access path selection in a relational database management system. In Readings in Artificial Intelligence and Databases, pages 511--522. Elsevier, 1988."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595641"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595641"},{"key":"e_1_2_1_67_1","first-page":"91","volume-title":"OSDI","volume":"4","author":"Van Renesse R.","year":"2004","unstructured":"R. Van Renesse and F. B. Schneider. Chain replication for supporting high throughput and availability. 
In OSDI, volume 4, pages 91--104, 2004."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3056101"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522737"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3415478.3415547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T02:34:05Z","timestamp":1758076445000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3415478.3415547"}},"subtitle":["hyperscale indexing for the cloud & edge"],"short-title":[],"issued":{"date-parts":[[2020,8]]},"references-count":69,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2020,8]]}},"alternative-id":["10.14778\/3415478.3415547"],"URL":"https:\/\/doi.org\/10.14778\/3415478.3415547","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2020,8]]}}}