{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:48:14Z","timestamp":1767340094081,"version":"3.41.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2015,1,9]],"date-time":"2015-01-09T00:00:00Z","timestamp":1420761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e5det","doi-asserted-by":"publisher","award":["EXCESS 611183","621-2009-4449","611183","11.03","OpCoReS"],"award-info":[{"award-number":["EXCESS 611183","621-2009-4449","611183","11.03","OpCoReS"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e5det","doi-asserted-by":"crossref","award":["EXCESS 611183","621-2009-4449","611183","11.03","OpCoReS"],"award-info":[{"award-number":["EXCESS 611183","621-2009-4449","611183","11.03","OpCoReS"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>\n            Exploiting effectively massively parallel architectures is a major challenge that stream programming can help facilitate. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results may be forwarded to other tasks, as in a pipelined task graph. In this article, we consider\n            <jats:italic>crown scheduling<\/jats:italic>\n            , a novel technique for the combined optimization of resource allocation, mapping, and discrete voltage\/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal offline algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic\n            <jats:italic>Longest Task, Lowest Group<\/jats:italic>\n            (LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the\n            <jats:italic>Height<\/jats:italic>\n            heuristic for crown frequency scaling. We use them in feedback loop heuristics based on binary search and simulated annealing to optimize crown allocation.\n          <\/jats:p>\n          <jats:p>Our experimental evaluation of the ILP models for a generic manycore architecture shows that at least for small and medium-sized streaming task collections even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds. Our heuristics produce makespan and energy consumption close to optimality within the limits of the phase-separated crown scheduling technique and the crown structure. Their optimization time is longer than the one of other algorithms we test, but our heuristics consistently produce better solutions.<\/jats:p>","DOI":"10.1145\/2687653","type":"journal-article","created":{"date-parts":[[2015,1,12]],"date-time":"2015-01-12T20:02:10Z","timestamp":1421092930000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Manycore Systems"],"prefix":"10.1145","volume":"11","author":[{"given":"Nicolas","family":"Melot","sequence":"first","affiliation":[{"name":"Link\u00f6ping University, Sweden"}]},{"given":"Christoph","family":"Kessler","sequence":"additional","affiliation":[{"name":"Link\u00f6ping University, Sweden"}]},{"given":"J\u00f6rg","family":"Keller","sequence":"additional","affiliation":[{"name":"FernUniversit\u00e4t in Hagen, Hagen, Germany"}]},{"given":"Patrick","family":"Eitschberger","sequence":"additional","affiliation":[{"name":"FernUniversit\u00e4t in Hagen, Hagen, Germany"}]}],"member":"320","published-online":{"date-parts":[[2015,1,9]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Lecture Notes in Computer Science","volume":"1284","author":"Amoura A. K.","unstructured":"A. K. Amoura , E. Bampis , C. Kenyon , and Y. Manoussakis . 1997. Scheduling independent multiprocessor tasks. In Algorithms -- ESA\u201997, Rainer Burkard and Gerhard Woeginger (Eds.) . Lecture Notes in Computer Science , Vol. 1284 . Springer, New York, NY, 1--12. DOI: http:\/\/dx.doi.org\/10.1007\/3-540-63397-9_1 10.1007\/3-540-63397-9_1 A. K. Amoura, E. Bampis, C. Kenyon, and Y. Manoussakis. 1997. Scheduling independent multiprocessor tasks. In Algorithms -- ESA\u201997, Rainer Burkard and Gerhard Woeginger (Eds.). Lecture Notes in Computer Science, Vol. 1284. Springer, New York, NY, 1--12. DOI: http:\/\/dx.doi.org\/10.1007\/3-540-63397-9_1"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the A4MMC Workshop on Applications for Multi- and Many-Core Processors at ISCA","author":"Avdic K.","year":"2011","unstructured":"K. Avdic , N. Melot , C. Kessler , and J. Keller . 2011. Parallel sorting on Intel single-chip cloud computer . In Proceedings of the A4MMC Workshop on Applications for Multi- and Many-Core Processors at ISCA 2011 . K. Avdic, N. Melot, C. Kessler, and J. Keller. 2011. Parallel sorting on Intel single-chip cloud computer. In Proceedings of the A4MMC Workshop on Applications for Multi- and Many-Core Processors at ISCA 2011."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1137\/0209064"},{"volume-title":"Proceedings of the 1990 International Conference on Parallel Processing. 72--75","author":"Belkhale K. P.","key":"e_1_2_1_4_1","unstructured":"K. P. Belkhale and P. Banerjee . 1990. An approximate algorithm for the partitionable independent task scheduling problem . In Proceedings of the 1990 International Conference on Parallel Processing. 72--75 . K. P. Belkhale and P. Banerjee. 1990. An approximate algorithm for the partitionable independent task scheduling problem. In Proceedings of the 1990 International Conference on Parallel Processing. 72--75."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:ANOR.0000030682.25673.c0"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-011-9349-0"},{"volume-title":"Proceedings of the 6th MARC Symposium. ONERA, 46--51","author":"Cichowski P.","key":"e_1_2_1_7_1","unstructured":"P. Cichowski , J. Keller , and C. Kessler . 2012. Modelling power consumption of the Intel SCC . In Proceedings of the 6th MARC Symposium. ONERA, 46--51 . P. Cichowski, J. Keller, and C. Kessler. 2012. Modelling power consumption of the Intel SCC. In Proceedings of the 6th MARC Symposium. ONERA, 46--51."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/0209062"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2229163.2229172"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2012.01.011"},{"key":"e_1_2_1_11_1","volume-title":"AMPL: A Modeling Language for Mathematical Programming","author":"Fourer R.","year":"1993","unstructured":"R. Fourer , D. M. Gay , and B. W. Kernighan . 1993 . AMPL: A Modeling Language for Mathematical Programming . Scientific Press , South San Francisco. R. Fourer, D. M. Gay, and B. W. Kernighan. 1993. AMPL: A Modeling Language for Mathematical Programming. Scientific Press, South San Francisco."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/0204015"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1168857.1168877"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/0117039"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810479.1810482"},{"key":"e_1_2_1_16_1","volume-title":"IFIP Congress. 471--475","author":"Kahn G.","year":"1974","unstructured":"G. Kahn . 1974 . The semantics of simple language for parallel programming . In IFIP Congress. 471--475 . G. Kahn. 1974. The semantics of simple language for parallel programming. In IFIP Congress. 471--475."},{"key":"e_1_2_1_17_1","first-page":"1987","article-title":"Optimized on-chip-pipelining for memory-intensive computations on multi-core processors with explicit memory hierarchy","volume":"18","author":"Keller J.","year":"2012","unstructured":"J. Keller , C. Kessler , and R. Hulten . 2012 . Optimized on-chip-pipelining for memory-intensive computations on multi-core processors with explicit memory hierarchy . J. Universal Comput. Sci. 18 , 14, 1987 -- 2023 . J. Keller, C. Kessler, and R. Hulten. 2012. Optimized on-chip-pipelining for memory-intensive computations on multi-core processors with explicit memory hierarchy. J. Universal Comput. Sci. 18, 14, 1987--2023.","journal-title":"J. Universal Comput. Sci."},{"volume-title":"Proceedings 25th PARS-Workshop (PARS-Mitteilungen). 37--46","author":"Kessler C.","key":"e_1_2_1_18_1","unstructured":"C. Kessler , P. Eitschberger , and J. Keller . 2013a. Energy-efficient static scheduling of streaming task collections with malleable tasks . In Proceedings 25th PARS-Workshop (PARS-Mitteilungen). 37--46 . C. Kessler, P. Eitschberger, and J. Keller. 2013a. Energy-efficient static scheduling of streaming task collections with malleable tasks. In Proceedings 25th PARS-Workshop (PARS-Mitteilungen). 37--46."},{"volume-title":"Proceedings of the 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS\u201913)","author":"Kessler C.","key":"e_1_2_1_19_1","unstructured":"C. Kessler , N. Melot , P. Eitschberger , and J. Keller . 2013b. Crown scheduling: Energy-efficient resource allocation, mapping and discrete frequency scaling for collections of malleable streaming tasks . In Proceedings of the 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS\u201913) . C. Kessler, N. Melot, P. Eitschberger, and J. Keller. 2013b. Crown scheduling: Energy-efficient resource allocation, mapping and discrete frequency scaling for collections of malleable streaming tasks. In Proceedings of the 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS\u201913)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1987.13876"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2008.122"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-010-0416-0"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-2217(02)00123-6"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00450-013-0240-x"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2010.07.004"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-007-9070-1"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32820-6_18"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/RTCSA.2012.10"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2644865.2541962"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2687653","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2687653","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:12:14Z","timestamp":1750227134000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2687653"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,9]]},"references-count":29,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2687653"],"URL":"https:\/\/doi.org\/10.1145\/2687653","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2015,1,9]]},"assertion":[{"value":"2014-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-01-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}