{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T05:23:34Z","timestamp":1755926614682,"version":"3.41.0"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"3-4","license":[{"start":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T00:00:00Z","timestamp":1606694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Institute of Information & Communications Technology Planning & Evaluation"},{"name":"Korea government","award":["No.2015-0-00221"],"award-info":[{"award-number":["No.2015-0-00221"]}]},{"name":"BK21 FOUR Intelligence Computing"},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"crossref","award":["4199990214639"],"award-info":[{"award-number":["4199990214639"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2020,11,30]]},"abstract":"<jats:p>Optimizing scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces to apply the optimizations, but do not ensure the maintenance of correct application semantics and thus often require significant effort to use. 
Second, policy interfaces that extend a high-level application programming model ensure correctness, but do not provide sufficient fine control.<\/jats:p>\n          <jats:p>\n            We describe Apache Nemo, an optimization framework for distributed dataflow processing that provides fine control for high performance and also ensures correctness for ease of use. We combine several techniques to achieve this, including an intermediate representation of dataflow, compiler optimization passes, and runtime extensions. Our evaluation results show that Nemo enables composable and reusable optimizations that bring performance improvements on par with existing specialized runtimes tailored for a specific deployment scenario. Apache Nemo is open-sourced at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/nemo.apache.org\">https:\/\/nemo.apache.org<\/jats:ext-link>\n            as an Apache incubator project.\n          <\/jats:p>","DOI":"10.1145\/3468144","type":"journal-article","created":{"date-parts":[[2021,10,16]],"date-time":"2021-10-16T01:25:55Z","timestamp":1634347555000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Apache Nemo: A Framework for Optimizing Distributed Data Processing"],"prefix":"10.1145","volume":"38","author":[{"given":"Won Wook","family":"Song","sequence":"first","affiliation":[{"name":"Seoul National University, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Youngseok","family":"Yang","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jeongyoon","family":"Eo","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Rep. 
of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jangho","family":"Seo","sequence":"additional","affiliation":[{"name":"Naver Corporation, Gyeonggi-do, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joo Yeon","family":"Kim","sequence":"additional","affiliation":[{"name":"Samsung Electronics, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sanha","family":"Lee","sequence":"additional","affiliation":[{"name":"Naver Corporation, Gyeonggi-do, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gyewon","family":"Lee","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Taegeon","family":"Um","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haeyoon","family":"Cho","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Byung-Gon","family":"Chun","sequence":"additional","affiliation":[{"name":"Seoul National University, Seoul, Rep. of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,15]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742797"},{"key":"e_1_3_2_3_2","unstructured":"Bert Hubert. 2020. Linux Traffic Control. Retrieved from https:\/\/lartc.org\/manpages\/tc.txt."},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190532"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.68"},{"key":"e_1_3_2_6_2","unstructured":"CAIDA. 2020. The CAIDA Anonymized Internet Traces 2016 Dataset. 
Retrieved from https:\/\/www.caida.org\/data\/passive\/passive_2016_dataset.xml."},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/69.50905"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2542142.2542143"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595634"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.14778\/2732279.2732281"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/2741948.2741968"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/93597.98720"},{"key":"e_1_3_2_13_2","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation","author":"Guo Zhenyu","year":"2012","unstructured":"Zhenyu Guo, Xuepeng Fan, Rishan Chen, Jiaxing Zhang, Hucheng Zhou, Sean McDirmid, Chang Liu, Wei Lin, Jingren Zhou, and Lidong Zhou. 2012. Spotting code optimizations in data-parallel pipelines through periSCOPE. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201912)."},{"key":"e_1_3_2_14_2","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation","author":"Hindman Benjamin","year":"2011","unstructured":"Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, and Ion Stoica. 2011. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation","author":"Hsieh Kevin","year":"2017","unstructured":"Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, and Onur Mutlu. 2017. Gaia: Geo-distributed machine learning approaching LAN speeds. 
In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/1272996.1273005"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.14778\/1978665.1978670"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/2465351.2465354"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2009.263"},{"key":"e_1_3_2_20_2","volume-title":"Proceedings of the Conference on Innovative Data Systems Research","author":"Kraska Tim","year":"2019","unstructured":"Tim Kraska, Mohammad Alizadeh, Alex Beutel, Ed H. Chi, Jialin Ding, Ani Kristo, Guillaume Leclerc, Samuel Madden, Hongzi Mao, and Vikram Nathan. 2019. SageDB: A learned database system. In Proceedings of the Conference on Innovative Data Systems Research (CIDR\u201919)."},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1807128.1807140"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213840"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.5555\/977395.977673"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/1538788.1538814"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3267814"},{"key":"e_1_3_2_26_2","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation","author":"Mahajan Kshiteej","year":"2018","unstructured":"Kshiteej Mahajan, Mosharaf Chowdhury, Aditya Akella, and Shuchi Chawla. 2018. Dynamic query re-planning using QOOP. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918)."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-92bf1922-00a"},{"key":"e_1_3_2_28_2","unstructured":"Microsoft. 2020. Dryad Research Prototype. 
Retrieved from https:\/\/github.com\/MicrosoftResearch\/Dryad."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522738"},{"key":"e_1_3_2_30_2","volume-title":"Proceedings of the Conference on Innovative Data Systems Research","author":"Palkar Shoumik","year":"2017","unstructured":"Shoumik Palkar, James J. Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia, and Stanford InfoLab. 2017. Weld: A common runtime for high performance data analytics. In Proceedings of the Conference on Innovative Data Systems Research (CIDR\u201917)."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787505"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391245"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391233"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391242"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391236"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1990.10475311"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742790"},{"key":"e_1_3_2_39_2","unstructured":"SciPy.org. 2020. NumPy. Retrieved from https:\/\/www.numpy.org."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2901318.2901319"},{"key":"e_1_3_2_41_2","unstructured":"The Apache Software Foundation. 2020. Apache Beam. Retrieved from https:\/\/beam.apache.org."},{"key":"e_1_3_2_42_2","unstructured":"The Apache Software Foundation. 2020. Apache Crail. Retrieved from https:\/\/crail.apache.org\/."},{"key":"e_1_3_2_43_2","unstructured":"The Apache Software Foundation. 2020. Apache Flink. Retrieved from https:\/\/flink.apache.org\/."},{"key":"e_1_3_2_44_2","unstructured":"The Apache Software Foundation. 2020. Apache Hadoop. 
Retrieved from https:\/\/hadoop.apache.org."},{"key":"e_1_3_2_45_2","unstructured":"The Apache Software Foundation. 2020. Apache Spark. Retrieved from https:\/\/spark.apache.org."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2010.5447738"},{"key":"e_1_3_2_47_2","unstructured":"TPC. 2020. TPC-H. Retrieved from http:\/\/www.tpc.org\/tpch."},{"key":"e_1_3_2_48_2","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation","author":"Viswanathan Raajay","year":"2016","unstructured":"Raajay Viswanathan, Ganesh Ananthanarayanan, and Aditya Akella. 2016. CLARINET: WAN-aware optimization for analytics queries. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201916)."},{"key":"e_1_3_2_49_2","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation","author":"Vulimiri Ashish","year":"2015","unstructured":"Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, and George Varghese. 2015. Global analytics in the face of bandwidth and regulatory constraints. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742793"},{"key":"e_1_3_2_51_2","unstructured":"Wikimedia. 2020. Page view statistics for Wikimedia projects. Retrieved from https:\/\/dumps.wikimedia.org\/other\/pagecounts-raw."},{"key":"e_1_3_2_52_2","unstructured":"Yahoo! 2020. Yahoo! Music User Ratings of Songs with Artist Album and Genre Meta Information v. 1.0. 
Retrieved from https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=r."},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987576"},{"key":"e_1_3_2_54_2","volume-title":"Proceedings of the USENIX Annual Technical Conference","author":"Yang Youngseok","year":"2019","unstructured":"Youngseok Yang, Jeongyoon Eo, Geon-Woo Kim, Joo Yeon Kim, Sanha Lee, Jangho Seo, Won Wook Song, and Byung-Gon Chun. 2019. Apache Nemo: A framework for building distributed dataflow optimization policies. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201919)."},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064181"},{"key":"e_1_3_2_56_2","volume-title":"Proceedings of the USENIX Symposium on Operating Systems Design and Implementation","author":"Yu Yuan","year":"2008","unstructured":"Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, \u00dalfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey. 2008. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201908)."},{"key":"e_1_3_2_57_2","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation","author":"Zaharia Matei","year":"2012","unstructured":"Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. 
In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation."},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190534"},{"key":"e_1_3_2_59_2","volume-title":"Proceedings of the USENIX Symposium on Networked Systems Design and Implementation","author":"Zhang Jiaxing","year":"2012","unstructured":"Jiaxing Zhang, Hucheng Zhou, Rishan Chen, Xuepeng Fan, Zhenyu Guo, Haoxiang Lin, Jack Y. Li, Wei Lin, Jingren Zhou, and Lidong Zhou. 2012. Optimizing data shuffling in data-parallel computation by understanding user-defined functions. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation."}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3468144","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3468144","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:28:07Z","timestamp":1750195687000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3468144"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":57,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[2020,11,30]]}},"alternative-id":["10.1145\/3468144"],"URL":"https:\/\/doi.org\/10.1145\/3468144","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"type":"print","value":"0734-2071"},{"type":"electronic","value":"1557-7333"}],"subject":[],"published":{"date-parts":[[2020,11,30]]},"assertion":[{"value":"2020-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-10-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}