{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,5,24]],"date-time":"2023-05-24T22:49:34Z","timestamp":1684968574924},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2007,3]]},"abstract":"<jats:p>\n            The real-time execution of contemporary complex media applications requires energy-efficient processing capabilities beyond those of current superscalar processors. We observe that the complexity of contemporary media applications requires support for multiple forms of parallelism, including ILP, TLP, and various forms of DLP, such as subword SIMD, short vectors, and streams. Based on our observations, we propose an architecture, called ALP, that efficiently integrates all of these forms of parallelism with evolutionary changes to the programming model and hardware. The novel part of ALP is a DLP technique called\n            <jats:italic>SIMD vectors and streams (SVectors\/SStreams)<\/jats:italic>\n            , which is integrated within a conventional superscalar-based CMP\/SMT architecture with subword SIMD. This technique lies between subword SIMD and vectors, providing significant benefits over the former at a lower cost than the latter. Our evaluations show that each form of parallelism supported by ALP is important. Specifically, SVectors\/SStreams are effective, compared to a system with the other enhancements in ALP. 
They give speedups of 1.1 to 3.4X and energy-delay product improvements of 1.1 to 5.1X for applications with DLP.\n          <\/jats:p>","DOI":"10.1145\/1216544.1216546","type":"journal-article","created":{"date-parts":[[2007,4,5]],"date-time":"2007-04-05T19:20:08Z","timestamp":1175800808000},"page":"3","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["ALP"],"prefix":"10.1145","volume":"4","author":[{"given":"Ruchira","family":"Sasanka","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois"}]},{"given":"Man-Lap","family":"Li","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois"}]},{"given":"Sarita V.","family":"Adve","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign, Urbana, Illinois"}]},{"given":"Yen-Kuang","family":"Chen","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, California"}]},{"given":"Eric","family":"Debes","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, California"}]}],"member":"320","published-online":{"date-parts":[[2007,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proc. of the 31st Annual Intl. Symp. on Comp. Architecture.","author":"Ahn J. H.","unstructured":"Ahn , J. H. , Dally , W. J. , Khailany , B. , Kapasi , U. J. , and Das , A . 2004. Evaluating the imagine stream architecture . In Proc. of the 31st Annual Intl. Symp. on Comp. Architecture. Ahn, J. H., Dally, W. J., Khailany, B., Kapasi, U. J., and Das, A. 2004. Evaluating the imagine stream architecture. In Proc. of the 31st Annual Intl. Symp. on Comp. Architecture."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/320080.320119"},{"key":"e_1_2_1_3_1","volume-title":"thesis","author":"Asanovic K.","unstructured":"Asanovic , K. 1998. Vector Microprocessors . Ph.D. thesis , Univ. 
of California at Berkeley. Asanovic, K. 1998. Vector Microprocessors. Ph.D. thesis, Univ. of California at Berkeley."},{"key":"e_1_2_1_4_1","unstructured":"Beveridge R. and Draper B. 2003. Evaluation of face recognition algorithms. http:\/\/www.cs.colostate.edu\/evalfacerec\/.  Beveridge R. and Draper B. 2003. Evaluation of face recognition algorithms. http:\/\/www.cs.colostate.edu\/evalfacerec\/."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339657"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2004.840415"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331547"},{"key":"e_1_2_1_8_1","unstructured":"Cray Inc. 2005. Cray X1 System Overview. http:\/\/www.cray.com\/products\/x1e\/.  Cray Inc. 2005. Cray X1 System Overview. http:\/\/www.cray.com\/products\/x1e\/."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050187"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.612247"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/514191.514197"},{"key":"e_1_2_1_12_1","volume-title":"Proc. of the 3rd Intl. Symp. on High-Perf. Comp. Architecture.","author":"Espasa R.","unstructured":"Espasa , R. and Valero , M . 1997. Simultaneous multithreaded vector architecture . In Proc. of the 3rd Intl. Symp. on High-Perf. Comp. Architecture. Espasa, R. and Valero, M. 1997. Simultaneous multithreaded vector architecture. In Proc. of the 3rd Intl. Symp. on High-Perf. Comp. Architecture."},{"key":"e_1_2_1_13_1","volume-title":"Proc. of the 25th Annual Intl. Symp. on Comp. Architecture.","author":"Espasa R.","unstructured":"Espasa , R. , Valero , M. , and Smith , J. E . 1997. Out-of-order vector architectures . In Proc. of the 25th Annual Intl. Symp. on Comp. Architecture. Espasa, R., Valero, M., and Smith, J. E. 1997. Out-of-order vector architectures. In Proc. of the 25th Annual Intl. Symp. on Comp. 
Architecture."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/545215.545247"},{"key":"e_1_2_1_15_1","volume-title":"Computer Architecture: A Quantitative Approach. Morgan Kaufmann","author":"Hennessy J. L.","year":"2002","unstructured":"Hennessy , J. L. and Patterson , D. A . 2002 . Computer Architecture: A Quantitative Approach. Morgan Kaufmann , San Mateo, CA . Hennessy, J. L. and Patterson, D. A. 2002. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA."},{"key":"e_1_2_1_16_1","volume-title":"Proc. of Workshop on Computer Architecture Evaluation using Commercial Workloads.","author":"Holliman M.","unstructured":"Holliman , M. and Chen , Y . -K. 2003. MPEG decoding workload characterization . In Proc. of Workshop on Computer Architecture Evaluation using Commercial Workloads. Holliman, M. and Chen, Y.-K. 2003. MPEG decoding workload characterization. In Proc. of Workshop on Computer Architecture Evaluation using Commercial Workloads."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.982915"},{"key":"e_1_2_1_18_1","volume-title":"Scalability, Programmability","author":"Hwang K.","unstructured":"Hwang , K. 1993. Advanced Computer Architecture: Parallelism , Scalability, Programmability . McGraw-Hill , New York . Hwang, K. 1993. Advanced Computer Architecture: Parallelism, Scalability, Programmability. McGraw-Hill, New York."},{"key":"e_1_2_1_19_1","volume-title":"Intel Itanium Architecture Software Developer's Manual","author":"Intel Corporation 2001.","unstructured":"Intel Corporation 2001. Intel Itanium Architecture Software Developer's Manual . Intel Corporation , Santa Clara, CA . Intel Corporation 2001. Intel Itanium Architecture Software Developer's Manual. 
Intel Corporation, Santa Clara, CA."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/70082.68195"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/360128.360145"},{"key":"e_1_2_1_22_1","unstructured":"Kitagawa K. Tagaya S. Hagihara Y. and Kanoh Y. 2002. A hardware overview of SX-6 and SX-7 supercomputer. http:\/\/www.nec.co.jp\/techrep\/en\/r_and_d\/r03\/r03-no1\/rd02.pdf.  Kitagawa K. Tagaya S. Hagihara Y. and Kanoh Y. 2002. A hardware overview of SX-6 and SX-7 supercomputer. http:\/\/www.nec.co.jp\/techrep\/en\/r_and_d\/r03\/r03-no1\/rd02.pdf."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859664"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/998680.1006736"},{"key":"e_1_2_1_26_1","volume-title":"Proc. of the 31st Annual Intl. Symp. on Microarchitecture.","author":"Lee C. G.","unstructured":"Lee , C. G. and Stoodley , M. G . 1999. Simple vector microprocessors for multimedia applications . In Proc. of the 31st Annual Intl. Symp. on Microarchitecture. Lee, C. G. and Stoodley, M. G. 1999. Simple vector microprocessors for multimedia applications. In Proc. of the 31st Annual Intl. Symp. on Microarchitecture."},{"key":"e_1_2_1_29_1","volume-title":"IEEE Intl. Symp. on Workload Characterization.","author":"Li M.-L.","unstructured":"Li , M.-L. , Sasanka , R. , Adve , S. V. , Chen , Y.-K. , and Debes , E . 2005. The ALPBench benchmark suite for multimedia applications . In IEEE Intl. Symp. on Workload Characterization. Li, M.-L., Sasanka, R., Adve, S. V., Chen, Y.-K., and Debes, E. 2005. The ALPBench benchmark suite for multimedia applications. In IEEE Intl. Symp. on Workload Characterization."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339673"},{"key":"e_1_2_1_31_1","volume-title":"Proc. of the 6th Intl. Symp. on High-Perf. Comp. Architecture. 39--48","author":"Mathew B. K.","unstructured":"Mathew , B. K. , McKee , S. A. , Carter , J. B. 
, and Davis , A . 2000. Design of a parallel vector access unit for sdram memories . In Proc. of the 6th Intl. Symp. on High-Perf. Comp. Architecture. 39--48 . Mathew, B. K., McKee, S. A., Carter, J. B., and Davis, A. 2000. Design of a parallel vector access unit for sdram memories. In Proc. of the 6th Intl. Symp. on High-Perf. Comp. Architecture. 39--48."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/237578.237594"},{"key":"e_1_2_1_33_1","unstructured":"MPEG Software Simulation Group. 1994. MSSG MPEG2 encoder and decoder. http:\/\/www.mpeg.org\/MPEG\/MSSG\/.  MPEG Software Simulation Group. 1994. MSSG MPEG2 encoder and decoder. http:\/\/www.mpeg.org\/MPEG\/MSSG\/."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/237090.237142"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/305138.305148"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/339647.339685"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.19820"},{"key":"e_1_2_1_38_1","volume-title":"et al","author":"Reddy R.","year":"2001","unstructured":"Reddy , R. et al . 2001 . CMU SPHINX. http:\/\/www.speech.cs.cmu.edu\/sphinx\/. Reddy, R. et al. 2001. CMU SPHINX. http:\/\/www.speech.cs.cmu.edu\/sphinx\/."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/224056.224078"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2005.1430571"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/859618.859667"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006238"},{"key":"e_1_2_1_43_1","volume-title":"Proc. of the 8th Intl. Symp. on High-Perf. Comp. Architecture.","author":"Skadron K.","unstructured":"Skadron , K. , Abdelzaher , T. , and Stan , M. R . 2002. Control-theoretic techniques and thermal-RC modeling for accurate and localized dynamic thermal management . In Proc. of the 8th Intl. Symp. on High-Perf. Comp. Architecture. 
Skadron, K., Abdelzaher, T., and Stan, M. R. 2002. Control-theoretic techniques and thermal-RC modeling for accurate and localized dynamic thermal management. In Proc. of the 8th Intl. Symp. on High-Perf. Comp. Architecture."},{"key":"e_1_2_1_44_1","unstructured":"Stone J. E. 2003. Tachyon raytracer. http:\/\/jedi.ks.uiuc.edu\/~johns\/raytracer\/.  Stone J. E. 2003. Tachyon raytracer. http:\/\/jedi.ks.uiuc.edu\/~johns\/raytracer\/."},{"key":"e_1_2_1_45_1","volume-title":"Proc. of the 11th Intl. Conf. on Parallel and Distributed Systems.","author":"Tamaki Y.","year":"1999","unstructured":"Tamaki , Y. Sukegawa , N. , Ito , M. , 1999 . Node architecture and performance evaluation of the Hitachi super technical server SR8000 . In Proc. of the 11th Intl. Conf. on Parallel and Distributed Systems. Tamaki, Y. Sukegawa, N., Ito, M., et al. 1999. Node architecture and performance evaluation of the Hitachi super technical server SR8000. In Proc. of the 11th Intl. Conf. on Parallel and Distributed Systems."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1028176.1006733"},{"key":"e_1_2_1_47_1","volume-title":"Fast algorithms for the discrete cosine transform and for the discrete fourier transform","author":"Wang Z.","unstructured":"Wang , Z. 1984. Fast algorithms for the discrete cosine transform and for the discrete fourier transform . In IEEE Transactions in Acoustics, Speech, and Signal Processing . Vol. ASSP-32 . Wang, Z. 1984. Fast algorithms for the discrete cosine transform and for the discrete fourier transform. In IEEE Transactions in Acoustics, Speech, and Signal Processing. Vol. 
ASSP-32."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.966490"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1216544.1216546","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T20:52:24Z","timestamp":1672260744000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1216544.1216546"}},"subtitle":["Efficient support for all levels of parallelism for complex media applications"],"short-title":[],"issued":{"date-parts":[[2007,3]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,3]]}},"alternative-id":["10.1145\/1216544.1216546"],"URL":"https:\/\/doi.org\/10.1145\/1216544.1216546","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,3]]},"assertion":[{"value":"2007-03-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}