{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:24:46Z","timestamp":1750307086563,"version":"3.41.0"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2012,9,1]],"date-time":"2012-09-01T00:00:00Z","timestamp":1346457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000923","name":"Australian Research Council","doi-asserted-by":"publisher","award":["DP0987236"],"award-info":[{"award-number":["DP0987236"]}],"id":[{"id":"10.13039\/501100000923","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["60921002"],"award-info":[{"award-number":["60921002"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science and Technology Major Projects","award":["2011ZX01028-001-002"],"award-info":[{"award-number":["2011ZX01028-001-002"]}]},{"DOI":"10.13039\/501100002855","name":"Ministry of Science and Technology of the People's Republic of China","doi-asserted-by":"publisher","award":["2011CB302504"],"award-info":[{"award-number":["2011CB302504"]}],"id":[{"id":"10.13039\/501100002855","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2012,9]]},"abstract":"<jats:p>Algorithm-specific, that is, semantic-specific optimizations have been observed to bring significant performance gains, especially for a diverse set of multi\/many-core architectures. 
However, current programming models and compiler technologies for the state-of-the-art architectures do not exploit these performance opportunities well. In this article, we propose a pattern-making methodology that enables algorithm-specific optimizations to be encapsulated into \u201coptimization patterns\u201d. Such optimization patterns are expressed in terms of preprocessor directives so that simple annotations can result in significant performance improvements. To validate this new methodology, a framework, named EPOD, is developed to map these directives into the underlying optimization schemes for a particular architecture.<\/jats:p>\n          <jats:p>It is difficult to create an exact performance model to determine an optimal or near-optimal optimization scheme (including which optimizations to apply and in which order) for a specific application, due to the complexity of applications and architectures. However, it is tractable to build individual optimization components and let compiler developers synthesize an optimization scheme from these components. Therefore, our EPOD framework provides an Optimization Programming Interface (OPI) for compiler developers to define new optimization schemes. Thus, new patterns can be integrated into EPOD in a flexible manner.<\/jats:p>\n          <jats:p>We have identified and implemented a number of optimization patterns for three representative computer platforms. Our experimental results show that a pattern-guided compiler can outperform the state-of-the-art compilers and even achieve performance as competitive as hand-tuned code. 
Therefore, such a pattern-making methodology represents an encouraging direction for domain experts' experience and knowledge to be integrated into general-purpose compilers.<\/jats:p>","DOI":"10.1145\/2355585.2355587","type":"journal-article","created":{"date-parts":[[2012,10,2]],"date-time":"2012-10-02T13:50:00Z","timestamp":1349185800000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Extendable pattern-oriented optimization directives"],"prefix":"10.1145","volume":"9","author":[{"given":"Huimin","family":"Cui","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Jingling","family":"Xue","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia"}]},{"given":"Lei","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Yang","family":"Yang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Xiaobing","family":"Feng","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"given":"Dongrui","family":"Fan","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 
China"}]}],"member":"320","published-online":{"date-parts":[[2012,10,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/997163.997196"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/255129.255132"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11970-5_14"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1055531.1055533"},{"volume-title":"Proceedings of the Workshop on Profile Directed Feedback-Compilation (PACT'98)","author":"Bodin F.","key":"e_1_2_1_5_1","unstructured":"Bodin, F., Kisuki, T., Knijnenburg, P., O'Boyle, M., and Rohou, E. 1998. Iterative compilation in a non-linear optimisation space. In Proceedings of the Workshop on Profile Directed Feedback-Compilation (PACT'98)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996864"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1094811.1094852"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.10"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/314403.314414"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1946459.1946479"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/070693199"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing.","author":"Datta K.","key":"e_1_2_1_13_1","unstructured":"Datta, K., Murphy, M., Volkov, V., Williams, S., Carter, J., Oliker, L., Patterson, D. A., Shalf, J., and Yelick, K. A. 2008. Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures. 
In Proceedings of the ACM\/IEEE Conference on Supercomputing."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2010.13"},{"volume-title":"Proceedings of the 17th International Conference on Parallel Processing.","author":"Di P.","key":"e_1_2_1_15_1","unstructured":"Di, P. and Xue, J. 2011. Model-driven tile size selection for DOACROSS loops on GPUs. In Proceedings of the 17th International Conference on Parallel Processing."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69330-7_10"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-009-9295-3"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1188455.1188543"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301661"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/277650.277725"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/11587514_4"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the GCC Developers' Summit.","author":"Fursin G.","year":"2008","unstructured":"Fursin, G., Miranda, C., et al. 2008. MILEPOST GCC: Machine learning based research compiler. 
In Proceedings of the GCC Developers' Summit."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-92990-1_5"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-006-0012-3"},{"volume-title":"Proceedings of the Conference on the State of the Art in Scientific and Parallel Computing.","author":"Gjermundsen A.","key":"e_1_2_1_25_1","unstructured":"Gjermundsen, A. and Elster, A. C. 2010. LBM vs. SOR solvers on GPU for real-time fluid simulations. In Proceedings of the Conference on the State of the Art in Scientific and Parallel Computing."},{"volume-title":"Automatic Parallelization of Loop Programs for Distributed Memory Architectures","author":"Griebl M.","key":"e_1_2_1_26_1","unstructured":"Griebl, M. 2004. Automatic Parallelization of Loop Programs for Distributed Memory Architectures. University of Passau."},{"volume-title":"Proceedings of the IFIP TC2\/WG2.5 Working Conference on the Architecture of Scientific Software. 175-- 192","author":"Guyer S. Z.","key":"e_1_2_1_27_1","unstructured":"Guyer, S. Z. and Lin, C. 2001. Broadway: A software architecture for scientific computing. In Proceedings of the IFIP TC2\/WG2.5 Working Conference on the Architecture of Scientific Software. 
175-- 192."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-13374-9_4"},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the International Parallel and Distributed Processing Symposium.","author":"Kamil S.","year":"2010","unstructured":"Kamil, S., Chan, C., Williams, S., Oliker, L., Shalf, J., Howison, M., Bethel, E. W., and Prabhat. 2010. A generalized framework for auto-tuning stencil computations. In Proceedings of the International Parallel and Distributed Processing Symposium."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250734.1250761"},{"key":"e_1_2_1_31_1","doi-asserted-by":"crossref","unstructured":"Liao C. Quinlan D. Willcock J. and Panas T. 2010. Semantic-aware automatic parallelization of modern applications using high-level abstractions. J. Paral. Program.","DOI":"10.1007\/s10766-010-0139-0"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/11596110_6"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810085.1810124"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1693453.1693483"},{"key":"e_1_2_1_35_1","unstructured":"MGMRES. MGMRES: Restarted GMRES solver for sparse linear systems. http:\/\/people.sc.fsu.edu\/burkardt\/c src\/mgmres\/mgmres.html."},{"key":"e_1_2_1_36_1","unstructured":"
TBB. 2010. Intel Corporation. Intel(R) Threading Building Blocks: Getting Started Guide."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2009.5161054"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1183401.1183448"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542275.1542312"},{"volume-title":"Proceedings of the ACM\/IEEE Conference on Supercomputing.","author":"Volkov V.","key":"e_1_2_1_40_1","unstructured":"Volkov, V. and Demmel, J. 2008. Benchmarking GPUs to tune dense linear algebra. In Proceedings of the ACM\/IEEE Conference on Supercomputing."},{"key":"e_1_2_1_41_1","article-title":"Automated empirical optimizations of software and the ATLAS project","author":"Whaley R. C.","year":"2001","unstructured":"Whaley, R. C., Petitet, A., and Dongarra, J. 2001. Automated empirical optimizations of software and the ATLAS project. J. Paral. Comput.","journal-title":"J. Paral. Comput."},{"volume-title":"Proceedings of the 16th International Conference on Supercomputing.","author":"Wu P.","key":"e_1_2_1_42_1","unstructured":"Wu, P., Feautrier, P., Padua, D., and Sura, Z. 2002. In Proceedings of the 16th International Conference on Supercomputing."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/378795.378860"},{"volume-title":"Loop Tiling for Parallelism","author":"Xue J.","key":"e_1_2_1_44_1","unstructured":"Xue, J. 2000. Loop Tiling for Parallelism. 
Kluwer Academic Publishers, Boston."},{"key":"e_1_2_1_45_1","volume-title":"POET: A scripting language for applying parameterized source-to-source program transformations. Tech. Rep. CS-TR-2010-012, Computer Science","author":"Yi Q.","year":"2010","unstructured":"Yi, Q. 2010. POET: A scripting language for applying parameterized source-to-source program transformations. Tech. Rep. CS-TR-2010-012, Computer Science, University of Texas at San Antonio."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781140"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03869-3_87"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355587","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2355585.2355587","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:34:24Z","timestamp":1750239264000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2355585.2355587"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,9]]},"references-count":47,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2012,9]]}},"alternative-id":["10.1145\/2355585.2355587"],"URL":"https:\/\/doi.org\/10.1145\/2355585.2355587","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2012,9]]},"assertion":[{"value":"2011-04-01","order":0,"name":"received","label":"Received","grou
p":{"name":"publication_history","label":"Publication History"}},{"value":"2012-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}