{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T18:13:07Z","timestamp":1775326387884,"version":"3.50.1"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2017,9,27]],"date-time":"2017-09-27T00:00:00Z","timestamp":1506470400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"DARPA PERFECT","award":["C#: R0011-13-C-0003"],"award-info":[{"award-number":["C#: R0011-13-C-0003"]}]},{"name":"C-FAR","award":["C#: 2013-MA-2384"],"award-info":[{"award-number":["C#: 2013-MA-2384"]}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["A#: 1527821"],"award-info":[{"award-number":["A#: 1527821"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2017,10,31]]},"abstract":"<jats:p>Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a complex hardware accelerator. However, navigating this design space in search of the Pareto-optimal implementations at the system level is a hard optimization task. We present COSMOS, an automatic methodology for the design-space exploration (DSE) of complex accelerators, that coordinates both HLS and memory optimization tools in a compositional way. First, thanks to the co-design of datapath and memory, COSMOS produces a large set of Pareto-optimal implementations for each component of the accelerator. 
Then, COSMOS leverages compositional design techniques to quickly converge to the desired trade-off point between cost and performance at the system level. When applied to the system-level design (SLD) of an accelerator for wide-area motion imagery (WAMI), COSMOS explores the design space as completely as an exhaustive search, but it reduces the number of invocations to the HLS tool by up to 14.6\u00d7.<\/jats:p>","DOI":"10.1145\/3126566","type":"journal-article","created":{"date-parts":[[2017,9,27]],"date-time":"2017-09-27T12:33:53Z","timestamp":1506515633000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["COSMOS"],"prefix":"10.1145","volume":"16","author":[{"given":"Luca","family":"Piccolboni","sequence":"first","affiliation":[{"name":"Columbia University, New York, NY, USA"}]},{"given":"Paolo","family":"Mantovani","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY, USA"}]},{"given":"Giuseppe Di","family":"Guglielmo","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY, USA"}]},{"given":"Luca P.","family":"Carloni","sequence":"additional","affiliation":[{"name":"Columbia University, New York, NY, USA"}]}],"member":"320","published-online":{"date-parts":[[2017,9,27]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/1465482.1465560"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391962.1391969"},{"key":"e_1_2_1_3_1","unstructured":"K. Barker T. Benson D. Campbell D. Ediger R. Gioiosa A. Hoisie D. Kerbyson J. Manzano A. Marquez L. 
Song N. Tallent and A. Tumeo. 2013. PERFECT (Power Efficiency Revolution For Embedded Computing Technologies) Benchmark Suite Manual. Pacific Northwest National Laboratory and Georgia Tech Research Institute. http:\/\/hpc.pnl.gov\/PERFECT\/."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1941487.1941507"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"S. Boyd and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press.","DOI":"10.1017\/CBO9780511804441"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"J. Campos G. Chiola J. M. Colom and M. Silva. 1992. Properties and Performance Bounds for Timed Marked Graphs. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications (1992).","DOI":"10.1109\/81.139289"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2480849"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2897937.2905018"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.58"},{"key":"e_1_2_1_10_1","volume-title":"Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks","author":"Chen Y. H.","year":"2017","unstructured":"Y. H. Chen, T. Krishna, J. S. Emer, and V. Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. 
IEEE Journal of Solid-State Circuits (2017)."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2593069.2596667"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2488491"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062208"},{"key":"e_1_2_1_14_1","volume-title":"Proc. of the ACM\/IEEE International Conference on Computer-Aided Design (ICCAD).","author":"Cong J.","unstructured":"J. Cong, P. Zhang, and Y. Zou. 2011. Combined Loop Transformation and Hierarchy Allocation for Data Reuse Optimization. In Proc. of the ACM\/IEEE International Conference on Computer-Aided Design (ICCAD)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228586"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2744769.2744794"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISVLSI.2008.73"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2009.2026356"},{"key":"e_1_2_1_19_1","volume-title":"Transaction-Level Modeling with SystemC","author":"Ghenassia F.","unstructured":"F. Ghenassia. 2006. Transaction-Level Modeling with SystemC. Springer-Verlag."},{"key":"e_1_2_1_20_1","volume-title":"Proc. of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Ham T. J.","unstructured":"T. J. Ham, L. Wu, N. Sundaram, N. Satish, and M. Martonosi. 2016. 
Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics. In Proc. of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1119772.1119882"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSCC.2014.6757323"},{"key":"e_1_2_1_23_1","volume-title":"DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks","author":"Kim L. W.","year":"2017","unstructured":"L. W. Kim. 2017. DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks. IEEE Transactions on Neural Networks and Learning Systems (2017)."},{"key":"e_1_2_1_24_1","volume-title":"Proc. of the ACM\/IEEE Conference on Design, Automation and Test in Europe (DATE).","author":"Kurra S.","unstructured":"S. Kurra, N. K. Singh, and P. R. Panda. 2007. The Impact of Loop Unrolling on Controller Delay in High Level Synthesis. In Proc. of the ACM\/IEEE Conference on Design, Automation and Test in Europe (DATE)."},{"key":"e_1_2_1_25_1","volume-title":"Proc. of the ACM\/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC).","author":"Li B.","unstructured":"B. Li, Z. Fang, and R. Iyer. 2011. Template-based Memory Access Engine for Accelerators in SoCs. In Proc. 
of the ACM\/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC)."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463209.2488795"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2024724.2024818"},{"key":"e_1_2_1_28_1","volume-title":"Proc. of the ACM\/IEEE Conference on Design, Automation, and Test in Europe (DATE).","author":"Liu H. Y.","unstructured":"H. Y. Liu, M. Petracca, and L. P. Carloni. 2012. Compositional System-Level Design Exploration with Planning of High-Level Synthesis. In Proc. of the ACM\/IEEE Conference on Design, Automation, and Test in Europe (DATE)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2847263.2847274"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2086696.2086727"},{"key":"e_1_2_1_31_1","volume-title":"Proc. of the Electronic System Level Synthesis Conference (ESLsyn).","author":"Mahapatra A.","unstructured":"A. Mahapatra and B. Carrion Schafer. 2014. Machine-learning based Simulated Annealer Method for High Level Synthesis Design Space Exploration. In Proc. of the Electronic System Level Synthesis Conference (ESLsyn)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10617-012-9096-8"},{"key":"e_1_2_1_33_1","volume-title":"Proc. of the IEEE International Symposium on Electronic System Design (ISED).","author":"Mishra V. K.","unstructured":"V. K. Mishra and A. Sengupta. 2014. 
PSDSE: Particle Swarm Driven Design Space Exploration of Architecture and Unrolling Factors for Nested Loops in High Level Synthesis. In Proc. of the IEEE International Symposium on Electronic System Design (ISED)."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.24143"},{"key":"e_1_2_1_35_1","volume-title":"Proc. of the IEEE High Performance Extreme Computing Conference (HPEC).","author":"Piccolboni L.","unstructured":"L. Piccolboni, P. Mantovani, G. Di Guglielmo, and L. P. Carloni. 2017. Broadening the Exploration of the Accelerator Design Space in Embedded Scalable Platforms. In Proc. of the IEEE High Performance Extreme Computing Conference (HPEC)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2656075.2656098"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2016.2611506"},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"R. Porter A. M. Fraser and D. Hush. 2010. Wide-Area Motion Imagery. IEEE Signal Processing Magazine (2010).","DOI":"10.1109\/MSP.2010.937396"},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"A. Qamar F. B. Muslim F. Gregoretti L. Lavagno and M. T. Lazarescu. 2017. 
High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze? IEEE Access (2017).","DOI":"10.1109\/ACCESS.2016.2635378"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1980.230492"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.32"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2006.890107"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2015.2472007"},{"key":"e_1_2_1_44_1","volume-title":"Proc. of the IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT).","author":"Carrion Schafer B.","unstructured":"B. Carrion Schafer, T. Takenaka, and K. Wakabayashi. 2009. Adaptive Simulated Annealer for High Level Synthesis Design Space Exploration. In Proc. of the IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"B. Carrion Schafer and K. Wakabayashi. 2012. Machine Learning Predictive Modelling High-Level Synthesis Design Space Exploration. IET Computers Digital Techniques (2012).","DOI":"10.1049\/iet-cdt.2011.0115"},{"key":"e_1_2_1_46_1","volume-title":"Proc. of the Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD).","author":"Seznec A.","year":"2015","unstructured":"A. Seznec. 2015. 
Bank-interleaved Cache or Memory Indexing Does Not Require Euclidean Division. In Proc. of the Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD)."},{"key":"e_1_2_1_47_1","volume-title":"Proc. of the ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Shao Y. S.","unstructured":"Y. S. Shao, B. Reagen, G. Y. Wei, and D. Brooks. 2014. Aladdin: A Pre-RTL, Power-performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures. In Proc. of the ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126566","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3126566","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3126566","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T19:05:02Z","timestamp":1750273502000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3126566"}},"subtitle":["Coordination of High-Level Synthesis and Memory Optimization for Hardware 
Accelerators"],"short-title":[],"issued":{"date-parts":[[2017,9,27]]},"references-count":48,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2017,10,31]]}},"alternative-id":["10.1145\/3126566"],"URL":"https:\/\/doi.org\/10.1145\/3126566","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,9,27]]},"assertion":[{"value":"2017-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-09-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}