{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:35:22Z","timestamp":1750221322219,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2017,6,30]],"date-time":"2017-06-30T00:00:00Z","timestamp":1498780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Swiss National Fund Ambizione Fellowship","award":["PZ00P2_161375"],"award-info":[{"award-number":["PZ00P2_161375"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2017,6,30]]},"abstract":"<jats:p>High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. There has been a significant amount of research effort spent investigating designs that minimize contention, and several programming techniques have been proposed to mitigate its effects. However, there are currently few architectural mechanisms to allow scaling contended data structures at high thread counts.<\/jats:p>\n          <jats:p>In this article, we investigate hardware support for scalable contended data structures. We propose Lease\/Release, a simple addition to standard directory-based MESI cache coherence protocols, allowing participants to lease memory, at the granularity of cache lines, by delaying coherence messages for a short, bounded period of time. Our analysis shows that Lease\/Release can significantly reduce the overheads of contention for both non-blocking (lock-free) and lock-based data structure implementations while ensuring that no deadlocks are introduced. We validate Lease\/Release empirically on the Graphite multiprocessor simulator on a range of data structures, including queue, stack, and priority queue implementations, as well as on transactional applications. Results show that Lease\/Release consistently improves both throughput and energy usage, by up to 5x, both for lock-free and lock-based data structure designs.<\/jats:p>","DOI":"10.1145\/3132168","type":"journal-article","created":{"date-parts":[[2017,10,10]],"date-time":"2017-10-10T12:17:08Z","timestamp":1507637828000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Lease\/Release"],"prefix":"10.1145","volume":"4","author":[{"given":"Syed Kamran","family":"Haider","sequence":"first","affiliation":[{"name":"University of Connecticut"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"William","family":"Hasenplaugh","sequence":"additional","affiliation":[{"name":"D. E. Shaw Research"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dan","family":"Alistarh","sequence":"additional","affiliation":[{"name":"IST Austria and ETH Zurich, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,10,10]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00446-013-0185-0"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2015.11"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2597630"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2688500.2688523"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2011.21"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2370036.2145837"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522714"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2694344.2694359"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-40047-6_60"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2686884"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/11864219_14"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1835698.1835736"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/08072646X"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989549"},{"volume-title":"Cambridge University Computer Laboratory","author":"Fraser Keir","key":"e_1_2_1_16_1","unstructured":"Keir Fraser . 2004. Practical Lock-Freedom . Ph.D. Dissertation . Cambridge University Computer Laboratory , Cambridge, UK . Also available as Technical Report UCAM-CL-TR-579. Keir Fraser. 2004. Practical Lock-Freedom. Ph.D. Dissertation. Cambridge University Computer Laboratory, Cambridge, UK. Also available as Technical Report UCAM-CL-TR-579."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/68182.68188"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45414-4_21"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810479.1810540"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2429069.2429109"},{"key":"e_1_2_1_21_1","unstructured":"Maurice Herlihy and Nir Shavit. 2008. The Art of Multiprocessor Programming. Morgan Kaufmann.  Maurice Herlihy and Nir Shavit. 2008. The Art of Multiprocessor Programming. Morgan Kaufmann."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/384286.264166"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 10th ACM SIGPLAN Workshop on Transactional Computing (TRANSACT\u201915)","author":"Leiserson Charles","year":"2015","unstructured":"Charles Leiserson . 2015 . A simple deterministic algorithm for guaranteeing the forward progress of transactions . In Proceedings of the 10th ACM SIGPLAN Workshop on Transactional Computing (TRANSACT\u201915) . Charles Leiserson. 2015. A simple deterministic algorithm for guaranteeing the forward progress of transactions. In Proceedings of the 10th ACM SIGPLAN Workshop on Transactional Computing (TRANSACT\u201915)."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS\u201900)","author":"Lotan Itay","year":"2000","unstructured":"Itay Lotan and Nir Shavit . 2000 . Skiplist-based concurrent priority queues . In Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS\u201900) . IEEE, Los Alamitos, CA, 263--268. Itay Lotan and Nir Shavit. 2000. Skiplist-based concurrent priority queues. In Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS\u201900). IEEE, Los Alamitos, CA, 263--268."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPPS.1994.288305"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/106973.106999"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/564870.564881"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/248052.248106"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2010.5416635"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517327.2442527"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750403"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2692916.2555256"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201913)","author":"Nishtala Rajesh","year":"2013","unstructured":"Rajesh Nishtala , Hans Fugal , Steven Grimm , Marc Kwiatkowski , Herman Lee , Harry C. Li , Ryan McElroy , Mike Paleczny , Daniel Peek , Paul Saab , David Stafford , Tony Tung , and Venkateshwaran Venkataramani . 2013 . Scaling memcache at Facebook . In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201913) . 385--398. https:\/\/www.usenix.org\/conference\/nsdi13\/technical-sessions\/presentation\/nishtala. Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. 2013. Scaling memcache at Facebook. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI\u201913). 385--398. https:\/\/www.usenix.org\/conference\/nsdi13\/technical-sessions\/presentation\/nishtala."},{"volume-title":"Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA\u201900)","author":"Rajwar Ravi","key":"e_1_2_1_35_1","unstructured":"Ravi Rajwar , Alain Kagi , and James R. Goodman . 2000. Improving the throughput of synchronization by insertion of delays . In Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA\u201900) . IEEE, Los Alamitos, CA, 168--179. Ravi Rajwar, Alain Kagi, and James R. Goodman. 2000. Improving the throughput of synchronization by insertion of delays. In Proceedings of the 6th International Symposium on High Performance Computer Architecture (HPCA\u201900). IEEE, Los Alamitos, CA, 168--179."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/782814.782853"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2755573.2755616"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.2200\/S00499ED1V01Y201304CAC023"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/215399.215419"},{"key":"e_1_2_1_41_1","volume-title":"Wood","author":"Sorin Daniel J.","year":"2011","unstructured":"Daniel J. Sorin , Mark D. Hill , and David A . Wood . 2011 . A Primer on Memory Consistency and Cache Coherence. Morgan 8 Claypool . Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2011. A Primer on Memory Consistency and Cache Coherence. Morgan 8 Claypool."},{"key":"e_1_2_1_43_1","volume-title":"TARDIS: Timestamp based coherence algorithm for distributed shared memory. arXiv:1501.04504.","author":"Yu Xiangyao","year":"2015","unstructured":"Xiangyao Yu and Srinivas Devadas . 2015 . TARDIS: Timestamp based coherence algorithm for distributed shared memory. arXiv:1501.04504. Xiangyao Yu and Srinivas Devadas. 2015. TARDIS: Timestamp based coherence algorithm for distributed shared memory. arXiv:1501.04504."}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3132168","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3132168","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:13:56Z","timestamp":1750212836000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3132168"}},"subtitle":["Architectural Support for Scaling Contended Data Structures"],"short-title":[],"issued":{"date-parts":[[2017,6,30]]},"references-count":39,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2017,6,30]]}},"alternative-id":["10.1145\/3132168"],"URL":"https:\/\/doi.org\/10.1145\/3132168","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2017,6,30]]},"assertion":[{"value":"2017-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}