{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,24]],"date-time":"2025-10-24T16:40:05Z","timestamp":1761324005050,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2015,5,11]],"date-time":"2015-05-11T00:00:00Z","timestamp":1431302400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"China 1000-talents program"},{"name":"National High Technology Research and Development Program of China","award":["2012AA010902"],"award-info":[{"award-number":["2012AA010902"]}]},{"name":"International Collaboration Key Program of CAS","award":["171111KYSB20130002"],"award-info":[{"award-number":["171111KYSB20130002"]}]},{"name":"China 10000-talents program"},{"name":"Google Faculty Research Award"},{"name":"European Research Council under the European Community's Seventh Framework Programme (FP7\/2007-2013) \/ ERC","award":["259295"],"award-info":[{"award-number":["259295"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["61100163, 61133004, 61222204, 61221062, 61303158, 61432016, 61472396, and 61473275"],"award-info":[{"award-number":["61100163, 61133004, 61222204, 61221062, 61303158, 61432016, 61472396, and 61473275"]}]},{"name":"National High Technology Research and Development Program of China","award":["2012AA012202"],"award-info":[{"award-number":["2012AA012202"]}]},{"name":"Strategic Priority Research Program of CAS","award":["XDA06010403"],"award-info":[{"award-number":["XDA06010403"]}]},{"name":"NSFC","award":["60873057, 60921002, 60925009, 61033009, 61202055, and 61402445"],"award-info":[{"award-number":["60873057, 60921002, 60925009, 61033009, 61202055, and 61402445"]}]},{"name":"National Basic Research Program of 
China","award":["2011CB302504"],"award-info":[{"award-number":["2011CB302504"]}]},{"name":"Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI)"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,7,8]]},"abstract":"<jats:p>Iterative optimization is a simple but powerful approach that searches the best possible combination of compiler optimizations for a given workload. However, iterative optimization is plagued by several practical issues that prevent it from being widely used in practice: a large number of runs are required to find the best combination, the optimum combination is dataset dependent, and the exploration process incurs significant overhead that needs to be compensated for by performance benefits. Therefore, although iterative optimization has been shown to have a significant performance potential, it seldom is used in production compilers.<\/jats:p>\n          <jats:p>In this article, we propose iterative optimization for the data center (IODC): we show that the data center offers a context in which all of the preceding hurdles can be overcome. The basic idea is to spawn different combinations across workers and recollect performance statistics at the master, which then evolves to the optimum combination of compiler optimizations. IODC carefully manages costs and benefits, and it is transparent to the end user. To bring IODC to practice, we evaluate it in the presence of co-runners to better reflect real-life data center operation with multiple applications co-running per server. We enhance IODC with the capability to find compatible co-runners along with a mechanism to dynamically adjust the level of aggressiveness to improve its robustness in the presence of co-running applications.<\/jats:p>\n          <jats:p>We evaluate IODC using both MapReduce and compute-intensive throughput server applications. 
To reflect the large number of users interacting with the system, we gather a very large collection of datasets (up to hundreds of millions of unique datasets per program), for a total storage of 16.4TB and 850 days of CPU time. We report an average performance improvement of 1.48 \u00d7 and up to 2.08 \u00d7 for five MapReduce applications, and 1.12 \u00d7 and up to 1.39 \u00d7 for nine server applications. Furthermore, our experiments demonstrate that IODC is effective in the presence of co-runners, improving performance by greater than 13% compared to the worst possible co-runner schedule.<\/jats:p>","DOI":"10.1145\/2739048","type":"journal-article","created":{"date-parts":[[2015,5,11]],"date-time":"2015-05-11T16:30:57Z","timestamp":1431361857000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Practical Iterative Optimization for the Data Center"],"prefix":"10.1145","volume":"12","author":[{"given":"Shuangde","family":"Fang","sequence":"first","affiliation":[{"name":"SKLCA, ICT, CAS, China; Graduate School, CAS, Beijing, China"}]},{"given":"Wenwen","family":"Xu","sequence":"additional","affiliation":[{"name":"SKLCA, ICT, CAS, China; Graduate School, CAS, Beijing, China"}]},{"given":"Yang","family":"Chen","sequence":"additional","affiliation":[{"name":"Microsoft Research, Beijing, China"}]},{"given":"Lieven","family":"Eeckhout","sequence":"additional","affiliation":[{"name":"Ghent University, Belgium"}]},{"given":"Olivier","family":"Temam","sequence":"additional","affiliation":[{"name":"INRIA, Saclay, France"}]},{"given":"Yunji","family":"Chen","sequence":"additional","affiliation":[{"name":"SKLCA, ICT, CAS, Beijing, China"}]},{"given":"Chengyong","family":"Wu","sequence":"additional","affiliation":[{"name":"SKLCA, ICT, CAS, Beijing, China"}]},{"given":"Xiaobing","family":"Feng","sequence":"additional","affiliation":[{"name":"SKLCA, ICT, CAS, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2015,5,11]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.37"},{"volume-title":"Proceedings of 11th International Conference on Computer Communications and Networks (ICCCN\u201902)","author":"Aghdaie N.","key":"e_1_2_1_2_1","unstructured":"N. Aghdaie and Y. Tamir . 2002. Implementation and evaluation of transparent fault-tolerant Web service with kernel-level support . In Proceedings of 11th International Conference on Computer Communications and Networks (ICCCN\u201902) . IEEE, Los Alamitos, CA, 63--68. N. Aghdaie and Y. Tamir. 2002. Implementation and evaluation of transparent fault-tolerant Web service with kernel-level support. In Proceedings of 11th International Conference on Computer Communications and Networks (ICCCN\u201902). IEEE, Los Alamitos, CA, 63--68."},{"volume-title":"Proceedings of 20th International Conference on Very Large Data Bases (VLDB\u201994)","author":"Agrawal R.","key":"e_1_2_1_3_1","unstructured":"R. Agrawal and R. Srikant . 1994. Fast algorithms for mining association rules in large databases . In Proceedings of 20th International Conference on Very Large Data Bases (VLDB\u201994) . 487--499. R. Agrawal and R. Srikant. 1994. Fast algorithms for mining association rules in large databases. In Proceedings of 20th International Conference on Very Large Data Bases (VLDB\u201994). 487--499."},{"volume-title":"Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201906)","author":"Berube P.","key":"e_1_2_1_4_1","unstructured":"P. Berube and J. N. Amaral . 2006. Aestimo: A feedback-directed optimization evaluation tool . In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201906) . 251--260. P. Berube and J. N. Amaral. 2006. Aestimo: A feedback-directed optimization evaluation tool. 
In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS\u201906). 251--260."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1454115.1454128"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2007.32"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2150983"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1806596.1806647"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065910.1065921"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/314403.314414"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1242572.1242610"},{"volume-title":"Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI\u201904)","author":"Dean J.","key":"e_1_2_1_12_1","unstructured":"J. Dean and S. Ghemawat . 2004. MapReduce: Simplified data processing on large clusters . In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI\u201904) . 107--113. J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI\u201904). 107--113."},{"volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912)","author":"Dwyer T.","key":"e_1_2_1_13_1","unstructured":"T. Dwyer , A. Fedorova , S. Blagodurov , M. Roth , F. Gaud , and J. Pei . 2012. A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads . In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912) . IEEE, Los Alamitos, CA, 1--11. T. Dwyer, A. Fedorova, S. Blagodurov, M. Roth, F. Gaud, and J. Pei. 2012. 
A practical method for estimating performance degradation on multicore processors, and its application to HPC workloads. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912). IEEE, Los Alamitos, CA, 1--11."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1065910.1065922"},{"volume-title":"Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC\u201907)","author":"Fursin G.","key":"e_1_2_1_15_1","unstructured":"G. Fursin , J. Cavazos , M. O\u2019Boyle , and O. Temam . 2007. MiDataSets: Creating the conditions for a more realistic evaluation of iterative optimization . In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC\u201907) . 245--260. G. Fursin, J. Cavazos, M. O\u2019Boyle, and O. Temam. 2007. MiDataSets: Creating the conditions for a more realistic evaluation of iterative optimization. In Proceedings of the International Conference on High Performance Embedded Architectures and Compilers (HiPEAC\u201907). 245--260."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-92990-1_5"},{"key":"e_1_2_1_18_1","volume-title":"Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructure. 367","author":"Gu Y.","year":"2009","unstructured":"Y. Gu and R. L. Grossman . 2009 . Sector and sphere: The design and implementation of a high performance data cloud. Theme Issue of the Philosophical Transactions of the Royal Society A : Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructure. 367 , 1897, 2429--2445. Y. Gu and R. L. Grossman. 2009. Sector and sphere: The design and implementation of a high performance data cloud. Theme Issue of the Philosophical Transactions of the Royal Society A: Crossing Boundaries: Computational Science, E-Science and Global E-Infrastructure. 
367, 1897, 2429--2445."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/1128020.1128563"},{"volume-title":"Apache Hadoop Home Page. Retrieved","year":"2015","key":"e_1_2_1_20_1","unstructured":"Hadoop. 2014. Apache Hadoop Home Page. Retrieved April 6, 2015 , from http:\/\/hadoop.apache.org. Hadoop. 2014. Apache Hadoop Home Page. Retrieved April 6, 2015, from http:\/\/hadoop.apache.org."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1898699.1898951"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356058.1356080"},{"volume-title":"Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912)","author":"Kambadur M.","key":"e_1_2_1_23_1","unstructured":"M. Kambadur , T. Moseley , R. Hank , and M. A. Kim . 2012. Measuring interference between live datacenter applications . In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912) . 1--12. M. Kambadur, T. Moseley, R. Hank, and M. A. Kim. 2012. Measuring interference between live datacenter applications. In Proceedings of the International Conference on High Performance Computing, Networking, Storage, and Analysis (SC\u201912). 1--12."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996863"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2254756.2254790"},{"volume-title":"Proceedings of the 6th International Conference on Natural Computation (ICNC\u201910)","author":"Liu Z.","key":"e_1_2_1_26_1","unstructured":"Z. Liu , H. Li , and G. Miao . 2010. MapReduce-based backpropagation neural network over large scale mobile data . In Proceedings of the 6th International Conference on Natural Computation (ICNC\u201910) . 1726--1730. Z. Liu, H. Li, and G. Miao. 2010. MapReduce-based backpropagation neural network over large scale mobile data. 
In Proceedings of the 6th International Conference on Natural Computation (ICNC\u201910). 1726--1730."},{"key":"e_1_2_1_27_1","unstructured":"Loongson. 2014. Loongson 2F. Retrieved April 6 2015 from http:\/\/www.loongson.cn\/.  Loongson. 2014. Loongson 2F. Retrieved April 6 2015 from http:\/\/www.loongson.cn\/."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155650"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772991"},{"volume-title":"Proceedings of the IEEE International Conference on Dependable Systems and Networks with FTCS and DCC (DSN\u201908)","author":"Marwah M.","key":"e_1_2_1_30_1","unstructured":"M. Marwah , S. Mishra , and C. Fetzer . 2008. Enhanced server fault-tolerance for improved user experience . In Proceedings of the IEEE International Conference on Dependable Systems and Networks with FTCS and DCC (DSN\u201908) . 167--176. M. Marwah, S. Mishra, and C. Fetzer. 2008. Enhanced server fault-tolerance for improved user experience. In Proceedings of the IEEE International Conference on Dependable Systems and Networks with FTCS and DCC (DSN\u201908). 167--176."},{"volume-title":"MovieLens Data Sets. Retrieved","year":"2015","key":"e_1_2_1_31_1","unstructured":"MovieLens. 2014. MovieLens Data Sets. Retrieved April 6, 2015 , from http:\/\/www.grouplens.org\/node\/73. MovieLens. 2014. MovieLens Data Sets. Retrieved April 6, 2015, from http:\/\/www.grouplens.org\/node\/73."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/0305-0483(83)90088-9"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2006.38"},{"key":"e_1_2_1_34_1","volume-title":"PAPI: Performance Application Programming Interface. Retrieved","author":"PAPI","year":"2013","unstructured":"PAPI 5.1. 2013 . PAPI: Performance Application Programming Interface. Retrieved April 6, 2015, from http:\/\/icl.cs.utk.edu\/papi. PAPI 5.1. 2013. PAPI: Performance Application Programming Interface. 
Retrieved April 6, 2015, from http:\/\/icl.cs.utk.edu\/papi."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/L-CA.2010.1"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-006-7957-2"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2007.346181"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.29"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781141"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2000417.2000419"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISDA.2009.181"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/850941.852890"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2739048","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2739048","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:16:23Z","timestamp":1750227383000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2739048"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,11]]},"references-count":41,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2015,7,8]]}},"alternative-id":["10.1145\/2739048"],"URL":"https:\/\/doi.org\/10.1145\/2739048","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,5,11]]},"assertion":[{"value":"2014-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2015-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-05-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}