{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:39:51Z","timestamp":1750307991298,"version":"3.41.0"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2006,4,1]],"date-time":"2006-04-01T00:00:00Z","timestamp":1143849600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2006,4]]},"abstract":"<jats:p>Traditional full-featured operating systems are known to have properties that limit the scalability of distributed memory parallel programs, the most common programming paradigm utilized in high end computing. Furthermore, as processor counts increase with the most capable systems, the necessary activity to manage the system becomes more of a burden. To make a general purpose operating system scale to such levels, new technology is required for parallel resource management and global system management (including fault management). 
In this paper, we describe the shortcomings of full-featured operating systems and runtime systems and discuss an approach to scale such systems to one hundred thousand processors with both scalable parallel application performance and efficient system management.<\/jats:p>","DOI":"10.1145\/1131322.1131334","type":"journal-article","created":{"date-parts":[[2006,7,24]],"date-time":"2006-07-24T17:00:26Z","timestamp":1153760426000},"page":"43-49","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["HPC-Colony"],"prefix":"10.1145","volume":"40","author":[{"given":"Sayantan","family":"Chakravorty","sequence":"first","affiliation":[{"name":"University of Illinois"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Celso L.","family":"Mendes","sequence":"additional","affiliation":[{"name":"University of Illinois"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Laxmikant V.","family":"Kal\u00e9","sequence":"additional","affiliation":[{"name":"University of Illinois"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Terry","family":"Jones","sequence":"additional","affiliation":[{"name":"Lawrence Livermore National Lab."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Tauferner","sequence":"additional","affiliation":[{"name":"IBM"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Todd","family":"Inglett","sequence":"additional","affiliation":[{"name":"IBM"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jos\u00e9","family":"Moreira","sequence":"additional","affiliation":[{"name":"IBM"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2006,4]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"306","volume-title":"Adaptive MPI,\" in Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC","author":"Huang 
C.","year":"2003","unstructured":"C. Huang , O. Lawlor , and L. V. Kal\u00e9 , \" Adaptive MPI,\" in Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003 ), LNCS 2958, (College Station , Texas), pp. 306 -- 322 , October 2003. C. Huang, O. Lawlor, and L. V. Kal\u00e9, \"Adaptive MPI,\" in Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), LNCS 2958, (College Station, Texas), pp. 306--322, October 2003."},{"key":"e_1_2_1_2_1","volume-title":"NAMD: Biomolecular simulation on thousands of processors,\" in Proceedings of SC","author":"Phillips J. C.","year":"2002","unstructured":"J. C. Phillips , G. Zheng , S. Kumar , and L. V. Kal\u00e9 , \" NAMD: Biomolecular simulation on thousands of processors,\" in Proceedings of SC 2002 , ( Baltimore , MD ), September 2002. J. C. Phillips, G. Zheng, S. Kumar, and L. V. Kal\u00e9, \"NAMD: Biomolecular simulation on thousands of processors,\" in Proceedings of SC 2002, (Baltimore, MD), September 2002."},{"key":"e_1_2_1_3_1","first-page":"167","volume-title":"Handling application-induced load imbalance using parallel objects,\" in Parallel and Distributed Computing for Symbolic and Irregular Applications","author":"Brunner R. K.","year":"2000","unstructured":"R. K. Brunner and L. V. Kal\u00e9 , \" Handling application-induced load imbalance using parallel objects,\" in Parallel and Distributed Computing for Symbolic and Irregular Applications , pp. 167 -- 181 , World Scientific Publishing , 2000 . R. K. Brunner and L. V. Kal\u00e9, \"Handling application-induced load imbalance using parallel objects,\" in Parallel and Distributed Computing for Symbolic and Irregular Applications, pp. 
167--181, World Scientific Publishing, 2000."},{"key":"e_1_2_1_5_1","volume-title":"Topology-aware task mapping for reducing communication contention on large parallel machines,\" in Proceedings of IEEE International Parallel and Distributed Processing Symposium","author":"Agarwal T.","year":"2006","unstructured":"T. Agarwal , A. Sharma , and L. V. Kal\u00e9 , \" Topology-aware task mapping for reducing communication contention on large parallel machines,\" in Proceedings of IEEE International Parallel and Distributed Processing Symposium 2006 , April 2006. T. Agarwal, A. Sharma, and L. V. Kal\u00e9, \"Topology-aware task mapping for reducing communication contention on large parallel machines,\" in Proceedings of IEEE International Parallel and Distributed Processing Symposium 2006, April 2006."},{"key":"e_1_2_1_6_1","volume-title":"of Computer Science","author":"Huang C.","year":"2004","unstructured":"C. Huang , \"System support for checkpoint and restart of charm++ and ampi applications,\" Master's thesis, Dept. of Computer Science , University of Illinois , 2004 . C. Huang, \"System support for checkpoint and restart of charm++ and ampi applications,\" Master's thesis, Dept. of Computer Science, University of Illinois, 2004."},{"key":"e_1_2_1_7_1","volume-title":"CA)","author":"Zheng G.","year":"2004","unstructured":"G. Zheng , L. Shi , and L. V. Kal\u00e9 , \" Ftc-charm++: An in-memory checkpoint-based fault tolerant runtime for charm++ and mpi,\" in 2004 IEEE International Conference on Cluster Computing, (San Diego , CA) , September 2004 . G. Zheng, L. Shi, and L. V. Kal\u00e9, \"Ftc-charm++: An in-memory checkpoint-based fault tolerant runtime for charm++ and mpi,\" in 2004 IEEE International Conference on Cluster Computing, (San Diego, CA), September 2004."},{"key":"e_1_2_1_8_1","volume-title":"A fault tolerant protocol for massively parallel machines,\" in FTPDS Workshop for IPDPS","author":"Chakravorty S.","year":"2004","unstructured":"S. 
Chakravorty and L. V. Kale , \" A fault tolerant protocol for massively parallel machines,\" in FTPDS Workshop for IPDPS 2004 , IEEE Press , 2004. S. Chakravorty and L. V. Kale, \"A fault tolerant protocol for massively parallel machines,\" in FTPDS Workshop for IPDPS 2004, IEEE Press, 2004."},{"key":"e_1_2_1_9_1","volume-title":"October","author":"Apparao P.","year":"2004","unstructured":"P. Apparao and G. Averill , \" Firmware-based platform reliability.\" Intel white paper , October 2004 . P. Apparao and G. Averill, \"Firmware-based platform reliability.\" Intel white paper, October 2004."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956799"},{"key":"e_1_2_1_13_1","first-page":"68","article-title":"Improving application performance on hpc systems with process synchronization","author":"Terry P.","year":"2004","unstructured":"P. Terry , A. Shan , and P. Huttunen , \" Improving application performance on hpc systems with process synchronization ,\" Linux Journal , pp. 68 -- 73 , November 2004 . P. Terry, A. Shan, and P. Huttunen, \"Improving application performance on hpc systems with process synchronization,\" Linux Journal, pp. 68--73, November 2004.","journal-title":"Linux Journal"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050161"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.492.0367"},{"key":"e_1_2_1_17_1","first-page":"667","article-title":"Models for dynamic load balancing in homogeneous multiple processor systems","volume":"36","author":"Chow Y.-C.","year":"1982","unstructured":"Y.-C. Chow and W. H. Kohler , \" Models for dynamic load balancing in homogeneous multiple processor systems ,\" in IEEE Transactions on Computers , vol. c- 36 , pp. 667 -- 679 , May 1982 . Y.-C. Chow and W. H. Kohler, \"Models for dynamic load balancing in homogeneous multiple processor systems,\" in IEEE Transactions on Computers, vol. c-36, pp. 
667--679, May 1982.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1985.232489"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/4434.749133"},{"key":"e_1_2_1_20_1","volume-title":"Conf. on Distributed Computing Systems","author":"Ha\u0107 A.","year":"1987","unstructured":"A. Ha\u0107 and X. Jin , \" Dynamic Load Balancing in Distributed System Using a Decentralized Algorithm,\" in Proc. of 7-th Intl . Conf. on Distributed Computing Systems , April 1987 . A. Ha\u0107 and X. Jin, \"Dynamic Load Balancing in Distributed System Using a Decentralized Algorithm,\" in Proc. of 7-th Intl. Conf. on Distributed Computing Systems, April 1987."},{"key":"e_1_2_1_21_1","first-page":"230","volume-title":"CA.)","author":"Sinha A.","year":"1993","unstructured":"A. Sinha and L. Kal\u00e9 , \" A load balancing strategy for prioritized execution of tasks,\" in International Parallel Processing Symposium, (Newport Beach , CA.) , pp. 230 -- 237 , April 1993 . A. Sinha and L. Kal\u00e9, \"A load balancing strategy for prioritized execution of tasks,\" in International Parallel Processing Symposium, (Newport Beach, CA.), pp. 230--237, April 1993."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.243526"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0307-904X(00)00043-3"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apnum.2004.08.028"},{"key":"e_1_2_1_25_1","unstructured":"P. Colella D. Graves T. Ligocki D. Martin D. Modiano D. Serafini and B. Van Straalen \"Chombo Software Package for AMR Applications Design Document \" 2003. http:\/\/seesar.lbl.gov\/anag\/chombo\/ChomboDesign-1.4. pdf. P. Colella D. Graves T. Ligocki D. Martin D. Modiano D. Serafini and B. Van Straalen \"Chombo Software Package for AMR Applications Design Document \" 2003. http:\/\/seesar.lbl.gov\/anag\/chombo\/ChomboDesign-1.4. 
pdf."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/62297.62323"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1987.1676922"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.16495"},{"key":"e_1_2_1_29_1","first-page":"526","volume-title":"Checkpointing and process migration for MPI,\" in Proceedings of the 10th International Parallel Processing Symposium","author":"Stellner G.","year":"1996","unstructured":"G. Stellner , \"CoCheck : Checkpointing and process migration for MPI,\" in Proceedings of the 10th International Parallel Processing Symposium , pp. 526 -- 531 , 1996 . G. Stellner, \"CoCheck: Checkpointing and process migration for MPI,\" in Proceedings of the 10th International Parallel Processing Symposium, pp. 526--531, 1996."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1023540604208"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/509593.509626"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3959.3962"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004046052"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"R. Batchu A. Skjellum Z. Cui M. Beddhu J. P. Neelamegam Y. Dandass and M. Apte \"Mpi\/fttm: Architecture and taxonomies for fault-tolerant message-passing middleware for performance-portable parallel computing \" in Proceedings of the 1st International Symposium on Cluster Computing and the Grid p. 26 IEEE Computer Society 2001. R. Batchu A. Skjellum Z. Cui M. Beddhu J. P. Neelamegam Y. Dandass and M. Apte \"Mpi\/fttm: Architecture and taxonomies for fault-tolerant message-passing middleware for performance-portable parallel computing \" in Proceedings of the 1st International Symposium on Cluster Computing and the Grid p. 
26 IEEE Computer Society 2001.","DOI":"10.1109\/CCGRID.2001.923171"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626400000342"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050176"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.142678"},{"key":"e_1_2_1_38_1","volume-title":"Proactive fault tolerance in MPI applications via task migration","author":"Chakravorty S.","year":"2006","unstructured":"S. Chakravorty , C. L. Mendes , and L. V. Kal\u00e9 , \" Proactive fault tolerance in MPI applications via task migration ,\" 2006 . Submitted to publication. S. Chakravorty, C. L. Mendes, and L. V. Kal\u00e9, \"Proactive fault tolerance in MPI applications via task migration,\" 2006. Submitted to publication."},{"key":"e_1_2_1_39_1","unstructured":"J. K. Ousterhout \"Scheduling techniques for concurrent systems \" in Third International Conference on Distributed Computing Systems pp. 22--30 May 1982. J. K. Ousterhout \"Scheduling techniques for concurrent systems \" in Third International Conference on Distributed Computing Systems pp. 22--30 May 1982."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050204"},{"key":"e_1_2_1_42_1","volume-title":"Support for simultaneous multiple substrate performance monitoring","author":"London K.","year":"2005","unstructured":"K. London , S. Moore , D. Terpstra , and J. Dongarra , \" Support for simultaneous multiple substrate performance monitoring ,\" October 2005 . Poster Session at LACSI Symposium 2005. K. London, S. Moore, D. Terpstra, and J. Dongarra, \"Support for simultaneous multiple substrate performance monitoring,\" October 2005. 
Poster Session at LACSI Symposium 2005."}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1131322.1131334","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1131322.1131334","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T15:06:16Z","timestamp":1750259176000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1131322.1131334"}},"subtitle":["services and interfaces for very large systems"],"short-title":[],"issued":{"date-parts":[[2006,4]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2006,4]]}},"alternative-id":["10.1145\/1131322.1131334"],"URL":"https:\/\/doi.org\/10.1145\/1131322.1131334","relation":{},"ISSN":["0163-5980"],"issn-type":[{"type":"print","value":"0163-5980"}],"subject":[],"published":{"date-parts":[[2006,4]]},"assertion":[{"value":"2006-04-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}