{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T09:18:10Z","timestamp":1766135890318},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2007,10,6]],"date-time":"2007-10-06T00:00:00Z","timestamp":1191628800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2008,4]]},"DOI":"10.1007\/s11227-007-0149-x","type":"journal-article","created":{"date-parts":[[2007,10,4]],"date-time":"2007-10-04T22:17:04Z","timestamp":1191536224000},"page":"64-97","source":"Crossref","is-referenced-by-count":11,"title":["Exploring the performance limits of simultaneous multithreading for memory intensive applications"],"prefix":"10.1007","volume":"44","author":[{"given":"Evangelia","family":"Athanasaki","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikos","family":"Anastopoulos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kornilios","family":"Kourtis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nectarios","family":"Koziris","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2007,10,6]]},"reference":[{"key":"149_CR1","unstructured":"Omni OpenMP compiler project (2003) Released in the international conference for high performance computing, networking and storage (SC\u201903), November 2003"},{"key":"149_CR2","doi-asserted-by":"crossref","unstructured":"Athanasaki E, Koziris N (2004) Fast indexing for blocked array layouts to improve multi-level cache locality. 
In: Proceedings of the 8th workshop on interaction between compilers and computer architectures (INTERACT\u201904), held in conjunction with HPCA-10, Madrid, Spain, February 2004, pp\u00a0109\u2013119","DOI":"10.1109\/INTERA.2004.1299515"},{"key":"149_CR3","doi-asserted-by":"crossref","DOI":"10.1137\/1.9781611971538","volume-title":"Templates for the solution of linear systems: building blocks for iterative methods","author":"R Barrett","year":"1994","unstructured":"Barrett R, Berry M, Chan T, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, van der Vorst H (1994) Templates for the solution of linear systems: building blocks for iterative methods. SIAM, Philadelphia"},{"key":"149_CR4","unstructured":"Bulpin J, Pratt I (2004) Multiprogramming performance of the Pentium 4 with hyper-threading. In: Proceedings of the third annual workshop on duplicating, deconstructing and debunking (WDDD 2004) held in conjunction with ISCA 04, Munich, Germany, June 2004, p\u00a05362"},{"key":"149_CR5","doi-asserted-by":"crossref","unstructured":"Collins J, Wang H, Tullsen D, Hughes C, Lee Y-F, Lavery D, Shen J (2001) Speculative precomputation: long-range prefetching of delinquent loads. In Proceedings of the 28th annual international symposium on computer architecture (ISCA \u201901), G\u00f6teborg, Sweden, July 2001, pp\u00a014\u201325","DOI":"10.1145\/379240.379248"},{"key":"149_CR6","volume-title":"Introduction to algorithms","author":"T Cormen","year":"2001","unstructured":"Cormen T, Leiserson C, Rivest R (2001) Introduction to algorithms. MIT Press, Cambridge"},{"key":"149_CR7","doi-asserted-by":"crossref","unstructured":"Curtis-Maury M, Wang T, Antonopoulos C, Nikolopoulos D (2005) Integrating multiple forms of multithreaded execution on multi-SMT systems: a study with scientific applications. In: ICQES","DOI":"10.1109\/QEST.2005.16"},{"key":"149_CR8","unstructured":"Drepper U (2005) Futexes are tricky. 
December 2005"},{"key":"149_CR9","unstructured":"Intel Corporation. IA-32 Intel architecture optimization. Order Number: 248966-011"},{"key":"149_CR10","unstructured":"Intel Corporation (2001) Using spin-loops on Intel Pentium 4 processor and Intel Xeon processor. Order Number: 248674-002, May 2001"},{"key":"149_CR11","unstructured":"Kim D, Liao S-W, Wang P, del Cuvillo J, Tian X, Zou X, Wang H, Yeung D, Girkar M, Shen J (2004) Physical experimentation with prefetching helper threads on Intel\u2019s hyper-threaded processors. In: Proceedings of the 2nd IEEE\/ACM international symposium on code generation and optimization (CGO 2004), San Jose, CA, March 2004, pp\u00a027\u201338"},{"issue":"3","key":"149_CR12","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1145\/263326.263382","volume":"15","author":"J Lo","year":"1997","unstructured":"Lo J, Eggers S, Emer J, Levy H, Stamm R, Tullsen D (1997) Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading. ACM Trans Comput Syst 15(3):322\u2013354","journal-title":"ACM Trans Comput Syst"},{"key":"149_CR13","doi-asserted-by":"crossref","unstructured":"Lo J, Eggers S, Levy H, Parekh S, Tullsen D (1997) Tuning compiler optimizations for simultaneous multithreading. In: Proceedings of the 30th annual ACM\/IEEE international symposium on microarchitecture (MICRO-30), Research Triangle Park, NC, December 1997, pp\u00a0114\u2013124","DOI":"10.1109\/MICRO.1997.645803"},{"key":"149_CR14","doi-asserted-by":"crossref","unstructured":"Luk C-K (2001) Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors. 
In: Proceedings of the 28th annual international symposium on computer architecture (ISCA \u201901), G\u00f6teborg, Sweden, July 2001, pp\u00a040\u201351","DOI":"10.1145\/379240.379250"},{"issue":"6","key":"149_CR15","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1145\/1064978.1065034","volume":"40","author":"C-K Luk","year":"2005","unstructured":"Luk C-K, Cohn R, Muth R, Patil H, Klauser A, Lowney G, Wallace S, Reddi VJ, Hazelwood K (2005) In: Building customized program analysis tools with dynamic instrumentation. SIGPLAN Not 40(6):190\u2013200","journal-title":"SIGPLAN Not"},{"key":"149_CR16","doi-asserted-by":"crossref","unstructured":"Luk C-K, Mowry T (1996) Compiler-based prefetching for recursive data structures. In: Proceedings of the 7th international conference on architectural support for programming languages and operating systems (ASPLOS-VII), Boston, MA, October 1996, pp\u00a0222\u2013233","DOI":"10.1145\/237090.237190"},{"issue":"2","key":"149_CR17","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1109\/12.752654","volume":"48","author":"C-K Luk","year":"1999","unstructured":"Luk C-K, Mowry T (1999) Automatic compiler-inserted prefetching for pointer-based applications. IEEE Trans Comput 48(2):134\u2013141","journal-title":"IEEE Trans Comput"},{"key":"149_CR18","first-page":"4","volume":"6","author":"D Marr","year":"2002","unstructured":"Marr D, Binns F, Hill D, Hinton G, Koufaty D, Miller JA, Upton M (2002) Hyper-threading technology architecture and microarchitecture. Intel Technol J 6:4\u201315","journal-title":"Intel Technol J"},{"key":"149_CR19","doi-asserted-by":"crossref","unstructured":"Mitchell N, Carter L, Ferrante J, Tullsen D (1999) ILP versus TLP on SMT. 
In: Proceedings of the 1999 ACM\/IEEE conference on supercomputing (CDROM), November 1999","DOI":"10.1145\/331532.331569"},{"issue":"1","key":"149_CR20","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1145\/273011.273021","volume":"16","author":"T Mowry","year":"1998","unstructured":"Mowry T (1998) Tolerating latency in multiprocessors through compiler-inserted prefetching. ACM Trans Comput Syst 16(1):55\u201392","journal-title":"ACM Trans Comput Syst"},{"key":"149_CR21","first-page":"62","volume-title":"Design and evaluation of a compiler algorithm for prefetching","author":"T Mowry","year":"1992","unstructured":"Mowry T, Lam M, Gupta A (1992) Design and evaluation of a compiler algorithm for prefetching. In: ASPLOS-V: proceedings of the fifth international conference on architectural support for programming languages and operating systems, New York, NY, USA. ACM Press, New York, pp\u00a062\u201373"},{"key":"149_CR22","unstructured":"Nethercote N, Seward J (2003) Valgrind: a program supervision framework. In: Proceedings of the 3rd workshop on runtime verification (RV\u201903), Boulder, CO, July 2003"},{"key":"149_CR23","volume-title":"Computer architecture. A\u00a0quantitative approach","author":"D Patterson","year":"2003","unstructured":"Patterson D, Hennessy J (2003) Computer architecture. A\u00a0quantitative approach, 3rd edn. Kaufmann, Los Altos","edition":"3"},{"key":"149_CR24","doi-asserted-by":"crossref","unstructured":"Roth A, Sohi G (2001) Speculative data-driven Multithreading. In: Proceedings of the 7th international symposium on high performance computer architecture (HPCA \u201901), Nuevo Leone, Mexico, January 2001, pp\u00a037\u201348","DOI":"10.1109\/HPCA.2001.903250"},{"key":"149_CR25","volume-title":"Database systems concepts","author":"A Silberschatz","year":"2001","unstructured":"Silberschatz A, Korth H, Sudarshan S (2001) Database systems concepts, 4th edn. 
McGraw\u2013Hill\/Higher Education, New York","edition":"4"},{"key":"149_CR26","doi-asserted-by":"crossref","unstructured":"Sundaramoorthy K, Purser Z, Rotenberg E (2000) Slipstream processors: improving both performance and fault tolerance. In: Proceedings of the 9th international conference on architectural support for programming languages and operating systems (ASPLOS IX), Cambridge, MA, November 2000, pp\u00a0257\u2013268","DOI":"10.1145\/378993.379247"},{"key":"149_CR27","doi-asserted-by":"crossref","unstructured":"Temam O, Granston E, Jalby W (1993) To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts. In: Proceedings of the 1993 ACM\/IEEE conference on supercomputing (SC\u201993), Portland, OR, November 1993, pp\u00a0410\u2013419","DOI":"10.1145\/169627.169762"},{"key":"149_CR28","doi-asserted-by":"crossref","unstructured":"Tuck N, Tullsen D (2003) Initial observations of the simultaneous multithreading Pentium 4 processor. In: Proceedings of the 12th international conference on parallel architectures and compilation techniques (PACT \u201903), New Orleans, LA, September 2003","DOI":"10.1109\/PACT.2003.1237999"},{"key":"149_CR29","doi-asserted-by":"crossref","unstructured":"Tullsen D, Eggers S, Emer J, Levy H, Lo J, Stamm R (1996) Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor. In: Proceedings of the 23rd annual international symposium on computer architecture (ISCA \u201996), Philadelphia, PA, May 1996, pp\u00a0191\u2013202","DOI":"10.1145\/232973.232993"},{"key":"149_CR30","doi-asserted-by":"crossref","unstructured":"Tullsen D, Eggers S, Levy H (1995) Simultaneous multithreading: maximizing on-chip parallelism. 
In: Proceedings of the 22nd annual international symposium on computer architecture (ISCA \u201995), Santa Margherita Ligure, Italy, June 1995, pp\u00a0392\u2013403","DOI":"10.1145\/223982.224449"},{"issue":"1","key":"149_CR31","first-page":"22","volume":"6","author":"H Wang","year":"2002","unstructured":"Wang H, Wang P, Weldon RD, Ettinger S, Saito H, Girkar M, Shih S, Liao W, Shen J (2002) Speculative precomputation: exploring the use of multithreading for latency. Intel Technol J 6(1):22\u201335","journal-title":"Intel Technol J"},{"key":"149_CR32","doi-asserted-by":"crossref","unstructured":"Wang T, Blagojevic F, Nikolopoulos D (2004) Runtime support for integrating precomputation and thread-level parallelism on simultaneous multithreaded processors. In: Proceedings of the 7th ACM SIGPLAN workshop on languages, compilers, and runtime support for scalable systems (LCR\u20192004), Houston, TX, October 2004","DOI":"10.1145\/1066650.1066667"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-007-0149-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11227-007-0149-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-007-0149-x","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,6,1]],"date-time":"2019-06-01T10:23:56Z","timestamp":1559384636000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11227-007-0149-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,10,6]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,4]]}},"alternative-id":["149"],"URL":"https:\/\/doi.or
g\/10.1007\/s11227-007-0149-x","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,10,6]]}}}