{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T13:25:37Z","timestamp":1760707537575},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2005,9,1]],"date-time":"2005-09-01T00:00:00Z","timestamp":1125532800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2005,9]]},"DOI":"10.1007\/s11227-005-0298-8","type":"journal-article","created":{"date-parts":[[2005,5,17]],"date-time":"2005-05-17T09:59:45Z","timestamp":1116323985000},"page":"197-226","source":"Crossref","is-referenced-by-count":6,"title":["Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs"],"prefix":"10.1007","volume":"33","author":[{"given":"Maria","family":"Athanasaki","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aristidis","family":"Sotiropoulos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Georgios","family":"Tsoukalas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nectarios","family":"Koziris","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Panayiotis","family":"Tsanakas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","reference":[{"key":"298_CR1","doi-asserted-by":"crossref","unstructured":"C. Ancourt and F. Irigoin. Scanning polyhedra with DO loops. In Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP), Williamsburg, VA, pp. 39\u201350, April 1991.","DOI":"10.1145\/109625.109631"},{"issue":"2","key":"298_CR2","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1006\/jpdc.1999.1530","volume":"57","author":"T. Andronikos","year":"1999","unstructured":"T. Andronikos, N. Koziris, G. Papakonstantinou, and P. Tsanakas. Optimal scheduling for UET\/UET-UCT generalized N-dimensional grid task graphs. Journal of Parallel and Distributed Computing 57(2):140\u2013165, 1999.","journal-title":"Journal of Parallel and Distributed Computing"},{"issue":"3","key":"298_CR3","first-page":"243","volume":"10","author":"H. R. Arabnia","year":"1996","unstructured":"H. R. Arabnia and S. M. Bhandarkar. Parallel stereocorrelation on a reconfigurable multi-ring network. Journal of Supercomputing (Kluwer Academin Publishers), Special Issue on Parallel and Distributed Processing 10(3):243\u2013270, 1996.","journal-title":"Journal of Supercomputing (Kluwer Academin Publishers), Special Issue on Parallel and Distributed Processing"},{"key":"298_CR4","doi-asserted-by":"crossref","unstructured":"S. Araki, A. Bilas, C. Dubnicki, J. Edler, K. Konishi, and J. Philbin. User-space communication: A quantitative study. In Proceedings of the 1998 Supercomputing Conference on High Performance Networking and Computing (SC98) Orlando, Florida, Nov. 1998.","DOI":"10.1109\/SC.1998.10038"},{"key":"298_CR5","doi-asserted-by":"crossref","unstructured":"M. Athanasaki, E. Koukis, and N. Koziris. Scheduling of tiled nested loops onto a cluster with a fixed number of SMP nodes. In Proceedings of the 12-th Euromicro Conference on Parallel, Distributed and Network based Processing (PDP04) IEEE Computer Society Press, A Coruna, Spain, pp. 424\u2013433, 2004.","DOI":"10.1109\/EMPDP.2004.1271475"},{"key":"298_CR6","doi-asserted-by":"crossref","unstructured":"M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, and N. Koziris. A pipelined execution of tiled nested loops on SMPs with computation and communication overlapping. In Proceedings of the Workshop on Compile\/Runtime Techniques for Parallel Computing, in conjunction with 2002 Int\u2019l Conference on Parallel Processing (ICPP-2002) Vancouver, Canada, pp. 559\u2013567, 2002.","DOI":"10.1109\/ICPPW.2002.1039778"},{"key":"298_CR7","doi-asserted-by":"crossref","unstructured":"M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, and N. Koziris. Pipelined scheduling of tiled nested loops onto clusters of SMPs using memory mapped network interfaces. In Proceedings of the 2002 ACM\/IEEE conference on Supercomputing (SC2002) IEEE Computer Society Press, Baltimore, Maryland, Nov. 2002.","DOI":"10.1109\/SC.2002.10008"},{"issue":"3","key":"298_CR8","doi-asserted-by":"crossref","first-page":"292","DOI":"10.1109\/71.584095","volume":"8","author":"S. M. Bhandarkar","year":"1997","unstructured":"S. M. Bhandarkar and H. R. Arabnia. Parallel computer vision on a reconfigurable multiprocessor network. IEEE Trans. on Parallel and Distributed Systems 8(3): 292\u2013310, 1997.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR9","doi-asserted-by":"crossref","unstructured":"A. Bilas, C. Liao, and J. P. Singh. Using network interface support to avoid asynchronous protocol processing in shared virtual memory systems. In Proceedings of the 26th Int\u2019l Symposium on Computer Architecture ISCA-26 Atlanta, GA, pp. 282\u2013293, 1999.","DOI":"10.1109\/ISCA.1999.765958"},{"key":"298_CR10","unstructured":"M. Blumrich. Network Interface for Protected, User-Level Communication PhD thesis, Princeton University, April 1996."},{"issue":"1","key":"298_CR11","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1109\/40.342015","volume":"15","author":"N. J. Boden","year":"1995","unstructured":"N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, and W. Su. Myrinet. A Gigabit-per-second local area network. IEEE Micro 15(1):29\u201336, 1995.","journal-title":"IEEE Micro"},{"key":"298_CR12","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/0167-9260(94)90019-1","volume":"17","author":"P. Boulet","year":"1994","unstructured":"P. Boulet, A. Darte, T. Risset, and Y. Robert. (Pen)-ultimate tiling? INTEGRATION, The VLSI Jounal 17:33\u201351, 1994.","journal-title":"INTEGRATION, The VLSI Jounal"},{"key":"298_CR13","doi-asserted-by":"crossref","unstructured":"F. O. Carroll, H. Tezuka, A. Hori, and Y. Ishikawa. The design and implementation of zero copy MPI using commodity hardware with a high performance network. In Proceedings of the Int\u2019l Conference on Supercomputing Melbourne, Australia, pp. 243\u2013249, 1998.","DOI":"10.1145\/277830.277883"},{"key":"298_CR14","doi-asserted-by":"crossref","unstructured":"F. T. Chong, R. Barua, F. Dahlgren, J. Kubiatowicz, and A. Agarwal. The sensitivity of communication mechanisms to bandwidth and latency. In Proceedings of the HPCA-4 High Performance Communication Architectures, pp. 37\u201346, 1998.","DOI":"10.1109\/HPCA.1998.650544"},{"key":"298_CR15","unstructured":"Compaq, Intel, and Microsoft. Virtual Interface Architecture Specification Dec. 1997."},{"key":"298_CR16","first-page":"167","volume":"14","author":"F. Desprez","year":"1997","unstructured":"F. Desprez, J. Dongarra, and Y. Robert. Determining the idle time of a tiling: New results. Journal of Information Science and Engineering 14:167\u2013190, 1997.","journal-title":"Journal of Information Science and Engineering"},{"key":"298_CR17","doi-asserted-by":"crossref","unstructured":"I. Drossitis, G. Goumas, N. Koziris, G. Papakonstantinou, and P. Tsanakas. Evaluation of loop grouping methods based on orthogonal projection spaces. In Proceedings of the Int\u2019l Conference on Parallel Processing Toronto, Canada, pp. 469\u2013476, Aug. 2000.","DOI":"10.1109\/ICPP.2000.876163"},{"key":"298_CR18","doi-asserted-by":"crossref","unstructured":"T. Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A user-level network interface for parallel and distributed computing. In Proceedings of the 15th ACM Symposium on Operating System Principles Copper Mountain, Colorado, pp. 40\u201353, Dec. 1995.","DOI":"10.1145\/224056.224061"},{"key":"298_CR19","unstructured":"K. Ghouas, K. Omang, and H. Bugge. VIA over SCI\u2014Consequences of a zero copy implementation and comparison with VIA over myrinet. In Proceedings of the Workshop on Communication Architecture for Clusters (CAC\u2019 2001) in Conjunction with Int\u2019l Parallel and Distributed Processing Symposium (IPDPS \u201801) San Francisco, April 2001."},{"key":"298_CR20","unstructured":"F. Giacomini, T. Amundsen, A. Bogaerts, R. Hauser, B. Johnsen, H. Kohmann, R. Nordstrom, and P. Werner. Low Level SCI software functional specification-Software Infrastructure for SCI. ESPRIT Project 23174."},{"issue":"10","key":"298_CR21","doi-asserted-by":"crossref","first-page":"1021","DOI":"10.1109\/TPDS.2003.1239870","volume":"14","author":"G. Goumas","year":"2003","unstructured":"G. Goumas, M. Athanasaki, and N. Koziris. An efficient code generation technique for tiled iteration spaces. IEEE Trans. on Parallel and Distributed Systems 14(10):1021\u20131034, 2003.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR22","doi-asserted-by":"crossref","unstructured":"G. Goumas, A. Sotiropoulos, and N. Koziris. Minimizing completion time for loop tiling with computation and communication overlapping. In Proceedings of IEEE Int\u2019l Parallel and Distributed Processing Symposium (IPDPS\u201901) San Francisco, April 2001.","DOI":"10.1109\/IPDPS.2001.924976"},{"key":"298_CR23","doi-asserted-by":"crossref","unstructured":"H. Hellwagner. The SCI standard and applications of SCI. In H. Hellwagner and A. Reinefield, eds., Scalable Coherent Interface (SCI): Architecture and Software for High-Performance Computer Clusters Springer-Verlag, pp. 3\u201334, Sept. 1999.","DOI":"10.1007\/10704208_2"},{"issue":"5","key":"298_CR24","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1109\/71.679213","volume":"9","author":"E. Hodzic","year":"1998","unstructured":"E. Hodzic and W. Shang. On supernode transformation with minimized total running time. IEEE Trans. on Parallel and Distributed Systems 9(5):417\u2013428, 1998.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"issue":"12","key":"298_CR25","doi-asserted-by":"crossref","first-page":"1220","DOI":"10.1109\/TPDS.2002.1158261","volume":"13","author":"E. Hodzic","year":"2002","unstructured":"E. Hodzic and W. Shang. On time optimal supernode shape. IEEE Trans. on Parallel and Distributed Systems 13(12):1220\u20131233, 2002.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR26","doi-asserted-by":"crossref","unstructured":"K. Hogstedt, L. Carter, and J. Ferrante. Determining the idle time of a tiling. In Principles of Programming Languages (POPL) pp. 160\u2013173, Jan. 1997.","DOI":"10.1145\/263699.263716"},{"key":"298_CR27","doi-asserted-by":"crossref","unstructured":"K. Hogstedt, L. Carter, and J. Ferrante. Selecting tile shape for minimal execution time. In ACM Symposium on Parallel Algorithms and Architectures pp. 201\u2013211, 1999.","DOI":"10.1145\/305619.305641"},{"issue":"3","key":"298_CR28","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1109\/TPDS.2003.1189587","volume":"14","author":"K. Hogstedt","year":"2003","unstructured":"K. Hogstedt, L. Carter, and J. Ferrante. On the parallel execution time of tiled loops. IEEE Trans. on Parallel and Distributed Systems 14(3):307\u2013321, 2003.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR29","doi-asserted-by":"crossref","unstructured":"F. Irigoin and R. Triolet. Supernode partitioning. In Proceedings of the 15th Ann. ACM SIGACT-SIGPLAN Symp. Principles of Programming Languages San Diego, California, pp. 319\u2013329, 1988.","DOI":"10.1145\/73560.73588"},{"issue":"2","key":"298_CR30","first-page":"159","volume":"48","author":"M. Kandemir","year":"1999","unstructured":"M. Kandemir, J. Ramanujam, and A. Choudary. Improving cache locality by a combination of loop and data transformations. IEEE Trans. on Parallel and Distributed Systems 48(2):159\u2013167, 1999.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR31","doi-asserted-by":"crossref","unstructured":"V. Karamcheti and A. Chien. Software overhead in messaging layers: where does the time go? In Proceedings of the 6th Int\u2019l Conference on Architectural Support for Programming Languages and Operating Systems pp. 51\u201360, Oct. 1994.","DOI":"10.1145\/195473.195499"},{"issue":"4","key":"298_CR32","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1109\/71.97900","volume":"2","author":"C.-T. King","year":"1991","unstructured":"C.-T. King, W.-H. Chou, and L. Ni. Pipelined data-parallel algorithms: Part II design. IEEE Trans. on Parallel and Distributed Systems 2(4):430\u2013439, 1991.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR33","doi-asserted-by":"crossref","unstructured":"M. Lam, E. Rothberg, and M. Wolf. The cache performance and optimizations of blocked algorithms. In Second Int\u2019l Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) Santa Clara, California, pp. 63\u201374, April 1991.","DOI":"10.1145\/106972.106981"},{"issue":"3","key":"298_CR34","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1109\/71.914756","volume":"12","author":"N. Manjikian","year":"2001","unstructured":"N. Manjikian and T. S. Abdelrahman. Exploiting wavefront parallelism on large-scale shared-memory multiprocessors. IEEE Trans. on Parallel and Distributed Systems 12(3):259\u2013271, 2001.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR35","doi-asserted-by":"crossref","unstructured":"R. Martin, A. Vahdat, D. Culler, and T. Anderson. Effects of communication latency, overhead, and bandwidth in a cluster architecture. In Proceedings of Int\u2019l Symposium on Computer Architecture Denver, CO, June 1997.","DOI":"10.1145\/264107.264146"},{"issue":"7","key":"298_CR36","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1109\/TPDS.2003.1214317","volume":"14","author":"N. Park","year":"2003","unstructured":"N. Park, B. Hong, and V. Prasanna. Tiling, block data layout and memory hierarchy performance. IEEE Trans. on Parallel and Distributed Systems 14(7):640\u2013654, 2003.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR37","first-page":"364","volume-title":"Computer Organization & Design. The Hardware\/Software Interface","author":"D. Patterson","year":"1994","unstructured":"D. Patterson and J.Hennessy. Computer Organization & Design. The Hardware\/Software Interface Morgan Kaufmann Publishers, San Francisco, CA, pp. 364\u2013367, 1994."},{"key":"298_CR38","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1016\/0743-7315(92)90027-K","volume":"16","author":"J. Ramanujam","year":"1992","unstructured":"J. Ramanujam and P. Sadayappan. Tiling multidimensional iteration spaces for multicomputers. Journal of Parallel and Distributed Computing 16:108\u2013120, 1992.","journal-title":"Journal of Parallel and Distributed Computing"},{"key":"298_CR39","doi-asserted-by":"crossref","unstructured":"L. Renganarayana and S. Rajopadhye. A geometric programming framework for optimal multi-level tiling. In Proceedings of the 2004 ACM\/IEEE conference on Supercomputing (SC2004), Pittsburgh, PA USA, Nov. 2004.","DOI":"10.1109\/SC.2004.3"},{"key":"298_CR40","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1007\/BF01245404","volume":"9","author":"J.-P. Sheu","year":"1995","unstructured":"J.-P. Sheu and T.-S. Chen. Partitioning and mapping nested loops for linear array multicomputers. Journal of Supercomputing 9:183\u2013202, 1995.","journal-title":"Journal of Supercomputing"},{"issue":"4","key":"298_CR41","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1109\/71.97900","volume":"2","author":"J.-P. Sheu","year":"1991","unstructured":"J.-P. Sheu and T.-H. Tai. Partitioning and mapping nested loops on multiprocessor systems. IEEE Trans. on Parallel and Distributed Systems 2(4):430\u2013439, 1991.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR42","doi-asserted-by":"crossref","unstructured":"P. Shivam, P. Wyckoff, and D. Panda. EMP: Zero-copy OS-bypass NIC-driven gigabit ethernet message passing. In Proceedings of the ACM Supercomputing 2001 (SC2001) Denver, CO, USA, Nov. 2001.","DOI":"10.1145\/582034.582091"},{"key":"298_CR43","doi-asserted-by":"crossref","unstructured":"A. Sotiropoulos, G. Tsoukalas, and N. Koziris. Enhancing the performance of tiled loop execution onto clusters using memory mapped network interfaces and pipelined schedules. In Proceedings of the 2002 Workshop on Communication Architecture for Clusters (CAC\u201902), Int\u2019l Parallel and Distributed Processing Symposium (IPDPS\u201902) Fort Lauderdale, Florida, April 2002.","DOI":"10.1109\/IPDPS.2002.1016567"},{"key":"298_CR44","doi-asserted-by":"crossref","unstructured":"H. Tezuka, F. Carroll, A. Hori, and Y. Ishikawa. Pin-down cache: A virtual memory management technique for zero-copy communication. In Proceedings of 12th Int\u2019l Parallel Processing Symposium Orlando, FL, pp. 308\u2013314, March 1998.","DOI":"10.1109\/IPPS.1998.669932"},{"issue":"9","key":"298_CR45","doi-asserted-by":"crossref","first-page":"941","DOI":"10.1109\/71.879777","volume":"11","author":"P. Tsanakas","year":"2000","unstructured":"P. Tsanakas, N. Koziris, and G. Papakonstantinou. Chain grouping: A method for partitioning loops onto mesh-connected processor arrays. IEEE Trans. on Parallel and Distributed Systems 11(9):941\u2013955, 2000.","journal-title":"IEEE Trans. on Parallel and Distributed Systems"},{"key":"298_CR46","doi-asserted-by":"crossref","unstructured":"R. Wang, A. Krishnamurthy, R. Martin, T. Anderson, and D. Culler. Modeling communication pipeline latency. In Proceedings of SIGMETRICS \u201898\/PERFORMANCE \u201898 Conference on the Measurement and Modeling of Computer Systems June 1998.","DOI":"10.1145\/277851.277867"},{"issue":"1","key":"298_CR47","doi-asserted-by":"crossref","first-page":"42","DOI":"10.1006\/jpdc.1997.1310","volume":"42","author":"J. Xue","year":"1997","unstructured":"J. Xue. Communication-minimal tiling of uniform dependence loops. Journal of Parallel and Distributed Computing 42(1):42\u201359, 1997.","journal-title":"Journal of Parallel and Distributed Computing"},{"issue":"4","key":"298_CR48","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1142\/S0129626497000401","volume":"7","author":"J. Xue","year":"1997","unstructured":"J. Xue. On tiling as a loop transformation. Parallel Processing Letters 7(4):409\u2013424, 1997.","journal-title":"Parallel Processing Letters"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-005-0298-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s11227-005-0298-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-005-0298-8","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,4,7]],"date-time":"2020-04-07T10:31:50Z","timestamp":1586255510000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s11227-005-0298-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,9]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2005,9]]}},"alternative-id":["298"],"URL":"https:\/\/doi.org\/10.1007\/s11227-005-0298-8","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"value":"0920-8542","type":"print"},{"value":"1573-0484","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,9]]}}}