{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T19:23:22Z","timestamp":1774121002718,"version":"3.50.1"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,5,25]],"date-time":"2022-05-25T00:00:00Z","timestamp":1653436800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["IIP1361847"],"award-info":[{"award-number":["IIP1361847"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            Modern Chip Multiprocessors (CMPs) are integrating an increasing amount of cores to address the continually growing demand for high-application performance. The cores of a CMP share several components of the memory hierarchy, such as Last-Level Cache (LLC) and main memory. This allows for considerable gains in multithreaded applications while also helping to maintain architectural simplicity. However, sharing resources can also result in performance bottleneck due to contention among concurrently executing applications. In this work, we formulate a fine-grained application characterization methodology that leverages Performance Monitoring Counters (PMCs) and Cache Monitoring Technology (CMT) in Intel processors. We utilize this characterization methodology to develop two contention-aware scheduling policies, one\n            <jats:italic>static<\/jats:italic>\n            and one\n            <jats:italic>dynamic<\/jats:italic>\n            , that co-schedule applications based on their resource-interference profiles. Our approach focuses on minimizing contention on both the main-memory bandwidth and the LLC by monitoring the pressure that each application inflicts on these resources. We achieve performance benefits for diverse workloads, outperforming Linux and three state-of-the-art contention-aware schedulers in terms of system throughput and fairness for both single and multithreaded workloads. Compared with Linux, our policy achieves up to 16% greater throughput for single-threaded and up to 40% greater throughput for multithreaded applications. Additionally, the policies increase fairness by up to 65% for single-threaded and up to 130% for multithreaded ones.\n          <\/jats:p>","DOI":"10.1145\/3524616","type":"journal-article","created":{"date-parts":[[2022,5,25]],"date-time":"2022-05-25T12:45:14Z","timestamp":1653482714000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["A Pressure-Aware Policy for Contention Minimization on Multicore Systems"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1733-6358","authenticated-orcid":false,"given":"Shivam","family":"Kundan","sequence":"first","affiliation":[{"name":"School of Electrical, Computer and Biomedical Engineering, Southern Illinois University, Carbondale, Illinois, U.S.A."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Theodoros","family":"Marinakis","sequence":"additional","affiliation":[{"name":"NVIDIA Corporation, Redmond, Washington, U.S.A"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0985-3045","authenticated-orcid":false,"given":"Iraklis","family":"Anagnostopoulos","sequence":"additional","affiliation":[{"name":"School of Electrical, Computer and Biomedical Engineering, Southern Illinois University, Carbondale, Illinois, U.S.A."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dimitri","family":"Kagaris","sequence":"additional","affiliation":[{"name":"School of Electrical, Computer and Biomedical Engineering, Southern Illinois University, Carbondale, Illinois, U.S.A."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,25]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304005"},{"key":"e_1_3_1_3_2","first-page":"104","volume-title":"High Performance Computer Architecture (HPCA\u201918)","author":"El-Sayed Nosayba","year":"2018","unstructured":"Nosayba El-Sayed, Anurag Mukkara, Po-An Tsai, Harshad Kasture, Xiaosong Ma, and Daniel Sanchez. 2018. KPart: A hybrid cache partitioning-sharing technique for commodity multicores. In IEEE International Symposium onHigh Performance Computer Architecture (HPCA\u201918). IEEE, 104\u2013117."},{"key":"e_1_3_1_4_2","first-page":"187","volume-title":"IEEE International Parallel & Distributed Processing Symposium","author":"Feliu Josu\u00e9","year":"2015","unstructured":"Josu\u00e9 Feliu, Julio Sahuquillo, Salvador Petit, and Jos\u00e9 Duato. 2015. Addressing fairness in SMT multicores with a progress-aware scheduler. In IEEE International Parallel & Distributed Processing Symposium. IEEE, 187\u2013196."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2015.2428694"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2620977"},{"key":"e_1_3_1_7_2","first-page":"19","volume-title":"Architectural Support for Programming Languages and Operating Systems","author":"Gan Yu","year":"2019","unstructured":"Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou. 2019. Seer: Leveraging big data to navigate the complexity of performance debugging in cloud microservices. In Architectural Support for Programming Languages and Operating Systems. ACM, 19\u201333."},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Stefano Gualandi and Michele Lombardi. 2013. A simple and effective decomposition for the multidimensional binpacking constraint. In International Conference on Principles and Practice of Constraint Programming . Springer Berlin Heidelberg 356\u2013364.","DOI":"10.1007\/978-3-642-40627-0_29"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628123"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.14459\/2016md1286948"},{"key":"e_1_3_1_11_2","volume-title":"COSH@HiPEAC","author":"Haritatos Alexandros-Herodotos","year":"2016","unstructured":"Alexandros-Herodotos Haritatos, Nikela Papadopoulou, Konstantinos Nikas, Georgios I. Goumas, and Nectarios Koziris. 2016. Contention-aware scheduling policies for fairness and throughput. In COSH@HiPEAC."},{"key":"e_1_3_1_12_2","first-page":"657","volume-title":"IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916)","author":"Herdrich Andrew","year":"2016","unstructured":"Andrew Herdrich, Edwin Verplanke, Priya Autee, Ramesh Illikkal, Chris Gianos, Ronak Singhal, and Ravi Iyer. 2016. Cache QoS: From concept to reality in the Intel\u00ae Xeon\u00ae processor E5-2600 v3 product family. In IEEE International Symposium on High Performance Computer Architecture (HPCA\u201916). IEEE, 657\u2013668."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/2150976.2151003"},{"key":"e_1_3_1_14_2","unstructured":"Hao-Qiang Jin Michael Frumkin and Jerry Yan. 1999. The OpenMP implementation of NAS parallel benchmarks and its performance. (1999)."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-38171-3_9"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.5555\/1025127.1026001"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE.2019.8715259"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2008.48"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2018.2845841"},{"key":"e_1_3_1_20_2","first-page":"159","volume-title":"2019 IEEE International Symposium on High Performance Computer Architecture","author":"Kulkarni Neeraj","year":"2019","unstructured":"Neeraj Kulkarni, Feng Qi, and Christina Delimitrou. 2019. Pliant: Leveraging approximation to improve datacenter resource efficiency. In 2019 IEEE International Symposium on High Performance Computer Architecture. IEEE, 159\u2013171."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS51556.2021.9401337"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2021.106971"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687670"},{"key":"e_1_3_1_24_2","first-page":"450","volume-title":"ACM SIGARCH Computer Architecture News","author":"Lo David","year":"2015","unstructured":"David Lo, Liqun Cheng, Rama Govindaraju, Parthasarathy Ranganathan, and Christos Kozyrakis. 2015. Heracles: Improving resource efficiency at scale. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 450\u2013462."},{"key":"e_1_3_1_25_2","unstructured":"P. Notebaert M. Berkelaar and K. Eikland. 2015. R Package \u2018lpSolve\u2019. CRAN. https:\/\/cran.r-project.org\/web\/packages\/lpSolve\/lpSolve.pdf."},{"key":"e_1_3_1_26_2","first-page":"1","volume-title":"IEEE International Symposium on Circuits and Systems (ISCAS\u201917)","author":"Marinakis Theodoros","year":"2017","unstructured":"Theodoros Marinakis, Alexandros-Herodotos Haritatos, Konstantinos Nikas, Georgios Goumas, and Iraklis Anagnostopoulos. 2017. An efficient and fair scheduling policy for multiprocessor platforms. In IEEE International Symposium on Circuits and Systems (ISCAS\u201917). IEEE, 1\u20134."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/LES.2019.2956990"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/2155620.2155650"},{"key":"e_1_3_1_29_2","first-page":"19","article-title":"A survey of memory bandwidth and machine balance in current high performance computers","author":"McCalpin John D.","year":"1995","unstructured":"John D. McCalpin. 1995. A survey of memory bandwidth and machine balance in current high performance computers. IEEE TCCA Newsletter (1995), 19\u201325.","journal-title":"IEEE TCCA Newsletter"},{"key":"e_1_3_1_30_2","doi-asserted-by":"crossref","unstructured":"M. D. Moffitt.2013. Multidimensional bin packing revisited. In International Conference on Principles and Practice of Constraint Programming . Springer Berlin Heidelberg 513\u2013528.","DOI":"10.1007\/978-3-642-40627-0_39"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33558-7_56"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2006.24"},{"key":"e_1_3_1_33_2","unstructured":"Douglas Pase. 2008. The pChase Benchmark Page. http:\/\/pchase.org\/."},{"key":"e_1_3_1_34_2","article-title":"Polybench: The polyhedral benchmark suite","author":"Pouchet Louis-No\u00ebl","year":"2012","unstructured":"Louis-No\u00ebl Pouchet. 2012. Polybench: The polyhedral benchmark suite. Retrieved April 13, 2022 from http:\/\/www. cs. ucla. edu\/pouchet\/software\/polybench.","journal-title":"http:\/\/www. cs. ucla. edu\/pouchet\/software\/polybench"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243183"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/3319804"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654066"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/2000417.2000419"},{"key":"e_1_3_1_39_2","first-page":"24","volume-title":"ACM SIGARCH Computer Architecture News","author":"Woo S. C.","year":"1995","unstructured":"S. C. Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta. 1995. The SPLASH-2 programs: Characterization and methodological considerations. In ACM SIGARCH Computer Architecture News, Vol. 23. ACM, 24\u201336."},{"key":"e_1_3_1_40_2","first-page":"174","volume-title":"ACM SIGARCH Computer Architecture News","author":"Xie Yuejian","year":"2009","unstructured":"Yuejian Xie and Gabriel H. Loh. 2009. PIPP: Promotion\/insertion pseudo-partitioning of multi-core shared caches. In ACM SIGARCH Computer Architecture News, Vol. 37. ACM, 174\u2013183."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2318857.2254792"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2015.2425889"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/1736020.1736036"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/2379776.2379780"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524616","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524616","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3524616","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:54Z","timestamp":1750183794000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3524616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,25]]},"references-count":43,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3524616"],"URL":"https:\/\/doi.org\/10.1145\/3524616","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,25]]},"assertion":[{"value":"2021-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}