{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T12:11:33Z","timestamp":1763467893613,"version":"3.41.0"},"reference-count":74,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2008,9,1]],"date-time":"2008-09-01T00:00:00Z","timestamp":1220227200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["ACI-0324974CNS-0305606"],"award-info":[{"award-number":["ACI-0324974CNS-0305606"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007523","name":"Advanced Cyberinfrastructure","doi-asserted-by":"publisher","award":["ACI-0324974CNS-0305606"],"award-info":[{"award-number":["ACI-0324974CNS-0305606"]}],"id":[{"id":"10.13039\/100007523","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Comput. Syst."],"published-print":{"date-parts":[[2008,9]]},"abstract":"<jats:p>Multiprocessor scheduling in a shared multiprogramming environment can be structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler schedules the work of a job on its allotted processors. We present a randomized work-stealing thread scheduler for fork-join multithreaded jobs that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. 
Assuming that the job scheduler never allots a job more processors than requested by the job's thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors.<\/jats:p>\n          <jats:p>\n            We model the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the operating environment as well as to the job scheduler's administrative policies. For example, the job scheduler might make a large number of processors available exactly when the job has little use for them. To analyze the performance of our adaptive thread scheduler under this stringent adversarial assumption, we introduce a new technique called\n            <jats:italic>trim analysis,<\/jats:italic>\n            which allows us to prove that our thread scheduler performs poorly on no more than a small number of time steps, exhibiting near-optimal behavior on the vast majority.\n          <\/jats:p>\n          <jats:p>\n            More precisely, suppose that a job has work\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>1<\/jats:sub>\n            and span\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>\u221e<\/jats:sub>\n            . 
On a machine with\n            <jats:italic>P<\/jats:italic>\n            processors, A-STEAL completes the job in an expected duration of\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>1<\/jats:sub>\n            \/\n            <jats:italic>P\u02dc<\/jats:italic>\n            +\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>\u221e<\/jats:sub>\n            +\n            <jats:italic>L<\/jats:italic>\n            lg\n            <jats:italic>P<\/jats:italic>\n            ) time steps, where\n            <jats:italic>L<\/jats:italic>\n            is the length of a scheduling quantum, and\n            <jats:italic>P\u02dc<\/jats:italic>\n            denotes the\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>\u221e<\/jats:sub>\n            +\n            <jats:italic>L<\/jats:italic>\n            lg\n            <jats:italic>P<\/jats:italic>\n            )-trimmed availability. This quantity is the average of the processor availability over all time steps except the\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>\u221e<\/jats:sub>\n            +\n            <jats:italic>L<\/jats:italic>\n            lg\n            <jats:italic>P<\/jats:italic>\n            ) time steps that have the highest processor availability. When the job's parallelism dominates the trimmed availability, that is,\n            <jats:italic>P\u02dc<\/jats:italic>\n            &lt;\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>1<\/jats:sub>\n            \/\n            <jats:italic>T<\/jats:italic>\n            <jats:sub>\u221e<\/jats:sub>\n            , the job achieves nearly perfect linear speedup. 
Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly the length of its span, which is optimal.\n          <\/jats:p>\n          <jats:p>We measured the performance of A-STEAL on a simulated multiprocessor system using synthetic workloads. For jobs with sufficient parallelism, our experiments confirm that A-STEAL provides almost perfect linear speedup across a variety of processor availability profiles. We compared A-STEAL with the ABP algorithm, an adaptive work-stealing thread scheduler developed by Arora et al. [1998] which does not employ parallelism feedback. On moderately to heavily loaded machines with large numbers of processors, A-STEAL typically completed jobs more than twice as quickly as ABP, despite being allotted the same number or fewer processors on every step, while wasting only 10% of the processor cycles wasted by ABP.<\/jats:p>","DOI":"10.1145\/1394441.1394443","type":"journal-article","created":{"date-parts":[[2008,9,23]],"date-time":"2008-09-23T13:37:59Z","timestamp":1222177079000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":36,"title":["Adaptive work-stealing with parallelism feedback"],"prefix":"10.1145","volume":"26","author":[{"given":"Kunal","family":"Agrawal","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA"}]},{"given":"Charles E.","family":"Leiserson","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, Cambridge, MA"}]},{"given":"Yuxiong","family":"He","sequence":"additional","affiliation":[{"name":"Nanyang Technological University"}]},{"given":"Wen Jing","family":"Hsu","sequence":"additional","affiliation":[{"name":"Nanyang Technological 
University"}]}],"member":"320","published-online":{"date-parts":[[2008,9,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/341800.341801"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1122971.1122988"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2006.14"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1229428.1229448"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/277651.277678"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/185675.185815"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-004-1115-0"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/301970.301974"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/215399.215403"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/232627.232650"},
{"key":"e_1_2_1_11_1","unstructured":"Blumofe R. D. 1995. Executing multithreaded programs efficiently. Ph.D. Thesis. Massachusetts Institute of Technology."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/209936.209958"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1996.0107"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539793259471"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324234"},{"key":"e_1_2_1_16_1","unstructured":"Blumofe R. D. Leiserson C. E. and Song B. 1998. Automatic processor allocation for work-stealing jobs. Unpublished."},{"first-page":"133","volume-title":"Proceedings of the USENIX 1997 Annual Technical Conference (USENIX'97)","author":"Blumofe R. D.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/277851.277939"},{"key":"e_1_2_1_19_1","unstructured":"Blumofe R. D. and Papadopoulos D. 1999. Hood: a user-level threads library for multiprogrammed multiprocessors. Tech. Rep. University of Texas at Austin."},{"first-page":"96","volume-title":"Proceedings of the IEEE International Symposium on High Performance Distributed Computing (HPDC'94)","author":"Blumofe R. D.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/321812.321815"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/800223.806778"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1073970.1073974"},{"volume-title":"Proceedings of the International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP). 200--223","author":"Chiang S.-H.","key":"e_1_2_1_24_1"},{"first-page":"50","volume-title":"Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS'01)","author":"Cirne W.","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","unstructured":"Cormen T. H. Leiserson C. E. Rivest R. L. and Stein C. 2001. Introduction to Algorithms (Second ed). The MIT Press and McGraw-Hill."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/237502.237510"},{"first-page":"159","volume-title":"Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '96)","author":"Deng X.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","unstructured":"DESMOJ. 1999. DESMO-J: a framework for discrete-event modelling and simulation. http:\/\/asi-www.informatik.uni-hamburg.de\/desmoj\/."},
{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1019077214124"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.21127"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/301250.301299"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022952324290"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.55693"},{"key":"e_1_2_1_35_1","unstructured":"Feitelson D. 2005. Parallel workloads archive. http:\/\/www.cs.huji.ac.il\/labs\/parallel\/workload\/."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5555\/646377.689375"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/22719.24067"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/277650.277725"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/32.90447"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1137\/0117039"},{"key":"e_1_2_1_42_1","unstructured":"Gu N. 1995. Competitive analysis of dynamic processor allocation strategies. Masters Thesis. York University."},
{"volume-title":"Proceedings of the International Workshop on Massive Parallelism: Hardware, Software, and Applications.","author":"Halbherr M.","key":"e_1_2_1_43_1"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/800055.802017"},{"volume-title":"Proceedings of the Conference on Applications of Heavy Tailed Distributions in Economics.","year":"1999","author":"Harchol-Balter M.","key":"e_1_2_1_45_1"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/263326.263344"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00446-005-0144-5"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/571825.571876"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.355.0743"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/62212.62240"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/317499.317539"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/98457.98761"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0743-7315(03)00108-4"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.5555\/646381.689673"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/151244.151246"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/91556.91631"},{"first-page":"422","volume-title":"Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'93)","author":"Motwani R.","key":"e_1_2_1_57_1"},{"key":"e_1_2_1_58_1","doi-asserted-by":"crossref","unstructured":"Motwani R. and Raghavan P. 1995. Randomized Algorithms (1st Ed). Cambridge University Press.","DOI":"10.1017\/CBO9780511814075"},
{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/314602.314607"},{"first-page":"463","volume-title":"Proceedings of the 10th International Parallel Processing Symposium (IPPS'96)","author":"Nguyen T. D.","key":"e_1_2_1_60_1"},{"first-page":"155","volume-title":"Proceedings of the International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP)","author":"Nguyen T. D.","key":"e_1_2_1_61_1"},{"volume-title":"Proceedings of the 9th International Parallel Processing Symposium (IPPS '95)","author":"Parsons E. W.","key":"e_1_2_1_62_1"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1016\/0166-5316(94)90037-X"},{"first-page":"165","volume-title":"Proceedings of the 9th International Parallel Processing Symposium (IPPS '95)","author":"Rosti E.","key":"e_1_2_1_64_1"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/113379.113401"},{"key":"e_1_2_1_66_1","unstructured":"Sen S. 2004. Dynamic processor allocation for adaptively parallel jobs. Masters Thesis. Massachusetts Institute of Technology."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/75108.75391"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1016\/0166-5316(94)90036-1"},{"key":"e_1_2_1_69_1","unstructured":"Song B. 1998. Scheduling adaptively parallel jobs. Masters Thesis. Massachusetts Institute of Technology."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.5555\/646376.689357"},{"key":"e_1_2_1_71_1","unstructured":"Supercomputing Technologies Group. 2001. Cilk 5.3.2 Reference Manual. MIT Laboratory for Computer Science."},
{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1016\/0166-5316(96)00030-2"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/74850.74866"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.585"},{"key":"e_1_2_1_75_1","unstructured":"Zipf G. K. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley."}],"container-title":["ACM Transactions on Computer Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1394441.1394443","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1394441.1394443","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T13:56:35Z","timestamp":1750254995000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1394441.1394443"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,9]]},"references-count":74,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2008,9]]}},"alternative-id":["10.1145\/1394441.1394443"],"URL":"https:\/\/doi.org\/10.1145\/1394441.1394443","relation":{},"ISSN":["0734-2071","1557-7333"],"issn-type":[{"type":"print","value":"0734-2071"},{"type":"electronic","value":"1557-7333"}],"subject":[],"published":{"date-parts":[[2008,9]]},"assertion":[{"value":"2007-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2007-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2008-09-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}