{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T14:24:03Z","timestamp":1760711043336,"version":"3.41.0"},"reference-count":53,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,12,16]],"date-time":"2022-12-16T00:00:00Z","timestamp":1671148800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2022,12,31]]},"abstract":"<jats:p>The producer-consumer communication over shared memory is a critical function of current scalable systems. Queues that provide low latency and high throughput on highly utilized systems can improve the overall performance perceived by the end users. In order to address this demand, we set as priority to achieve both high operation performance and item transfer speed. The Relaxed Concurrent Queues (RCQs) are a family of queues that we have designed and implemented for that purpose. Our key idea is a relaxed ordering model that splits the enqueue and dequeue operations into a stage of sequential assignment to a queue slot and a stage of concurrent execution across the slots. At each slot, we apply no order restrictions among the operations of the same type. We define several variants of the RCQ algorithms with respect to offered concurrency, required hardware instructions, supported operations, occupied memory space, and precondition handling. For specific RCQ algorithms, we provide pseudo-code definitions and reason about their correctness and progress properties. Additionally, we theoretically estimate and experimentally validate the worst-case distance between an RCQ algorithm and a strict first-in-first-out (FIFO) queue. We developed prototype implementations of the RCQ algorithms and experimentally compare them with several representative strict FIFO and relaxed data structures over a range of workload and system settings. The RCQS algorithm is a provably linearizable lock-free member of the RCQ family. We experimentally show that RCQS achieves factors to orders of magnitude advantage over the state-of-the-art strict or relaxed queue algorithms across several latency and throughput statistics of the queue operations and item transfers.<\/jats:p>","DOI":"10.1145\/3565514","type":"journal-article","created":{"date-parts":[[2022,10,4]],"date-time":"2022-10-04T11:06:08Z","timestamp":1664881568000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["A Family of Relaxed Concurrent Queues for Low-Latency Operations and Item Transfers"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0173-3997","authenticated-orcid":false,"given":"Giorgos","family":"Kappes","sequence":"first","affiliation":[{"name":"University of Ioannina, Ioannina, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1542-7878","authenticated-orcid":false,"given":"Stergios V.","family":"Anastasiadis","sequence":"additional","affiliation":[{"name":"University of Ioannina, Ioannina, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,12,16]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"2007. Intel\u00ae 64 Architecture Memory Ordering White Paper. Intel Corporation."},{"key":"e_1_3_2_3_2","unstructured":"2017. Atomic operations library (memory_order). Retrieved October 6 2022 from https:\/\/en.cppreference.com\/w\/c\/atomic\/memory_order."},{"key":"e_1_3_2_4_2","unstructured":"2020. Intel\u00ae 64 and IA-32 Architectures Software Developer\u2019s Manual Volume 2: Instruction Set Reference A-Z . Order Number 325383-072US Intel Corporation."},{"key":"e_1_3_2_5_2","unstructured":"2020. tcmalloc. Retrieved October 6 2022 from https:\/\/google.github.io\/tcmalloc\/."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15291-7_16"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-17653-1_29"},{"key":"e_1_3_2_8_2","first-page":"11","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Alistarh Dan","year":"2015","unstructured":"Dan Alistarh, Justin Kopinsky, Jerry Li, and Nir Shavit. 2015. The spraylist: A scalable relaxed priority queue. In Symposium on Principles and Practice of Parallel Programming (San Francisco, CA). ACM, New York, NY, 11\u201320."},{"key":"e_1_3_2_9_2","first-page":"145","volume-title":"Symposium on Parallel Algorithms and Architectures","author":"Alistarh Dan","year":"2019","unstructured":"Dan Alistarh, Giorgi Nadiradze, and Nikita Koval. 2019. Efficiency guarantees for parallel incremental algorithms under relaxed schedulers. In Symposium on Parallel Algorithms and Architectures (Phoenix, AZ). ACM, New York, NY, 145\u2013154."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1002\/0471791571"},{"key":"e_1_3_2_11_2","doi-asserted-by":"crossref","unstructured":"Daniel Cederman Anders Gidenstam Phuong Ha Hkan Sundell Marina Papatriantafilou and Philippas Tsigas. 2017. Lock-free concurrent data structures. In Programming Multi-Core and Many-Core Computing Systems Sabri Pllana and Fatos Xhafa (Eds.). John Wiley & Sons Ltd Hoboken NJ Chapter 3 59\u201379.","DOI":"10.1002\/9781119332015.ch3"},{"key":"e_1_3_2_12_2","unstructured":"Dave Dice. 2014. PTLQueue : a scalable bounded-capacity MPMC queue. Retrieved October 6 2022 from https:\/\/blogs.oracle.com\/dave\/ptlqueue-:-a-scalable-bounded-cap acity-mpmc-queue."},{"key":"e_1_3_2_13_2","first-page":"313","volume-title":"Symposium on Parallelism in Algorithms and Architectures","author":"Dice David","year":"2011","unstructured":"David Dice and Oleksandr Otenko. 2011. Brief announcement: Multilane \u2014a concurrent blocking multiset. In Symposium on Parallelism in Algorithms and Architectures (San Jose, CA). ACM, New York, NY, 313\u2013314."},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPP.2008.82"},{"key":"e_1_3_2_15_2","first-page":"257","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Fatourou Panagiota","year":"2012","unstructured":"Panagiota Fatourou and Nikolaos D. Kallimanis. 2012. Revisiting the combining synchronization technique. In Symposium on Principles and Practice of Parallel Programming (New Orleans, LA). ACM, New York, NY, 257\u2013266."},{"key":"e_1_3_2_16_2","first-page":"479","volume-title":"Ottawa Linux Symposium","author":"Franke Hubertus","year":"2002","unstructured":"Hubertus Franke, Rusty Russell, and Matthew Kirkwood. 2002. Fuss, futexes and furwocks: Fast userlevel locking in Linux. In Ottawa Linux Symposium (Ottawa, Canada), Ottawa Linux Symposium (OLS), 479\u2013495."},{"key":"e_1_3_2_17_2","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1145\/3357223.3362737","volume-title":"Symposium on Cloud Computing","author":"Golestani Hossein","year":"2019","unstructured":"Hossein Golestani, Amirhossein Mirhosseini, and Thomas F. Wenisch. 2019. Software data planes: You can\u2019t always spin to win. In Symposium on Cloud Computing (Santa Cruz, CA). ACM, New York, NY, 337\u2013350."},{"key":"e_1_3_2_18_2","first-page":"361","volume-title":"Symposium on Parallelism in Algorithms and Architectures","author":"Gruber Jakob","year":"2016","unstructured":"Jakob Gruber, Jesper Larsson Tr\u00e4ff, and Martin Wimmer. 2016. Brief announcement: Benchmarking concurrent priority queues. In Symposium on Parallelism in Algorithms and Architectures (Pacific Grove, CA). ACM, New York, NY, 361\u2013362."},{"key":"e_1_3_2_19_2","first-page":"6:1\u20136:15","volume-title":"International Conference on Concurrency Theory","author":"Haas Andreas","year":"2016","unstructured":"Andreas Haas, Thomas A. Henzinger, Andreas Holzer, Christoph M. Kirsch, Michael Lippautz, Hannes Payer, Ali Sezgin, Ana Sokolova, and Helmut Veith. 2016. Local linearizability for concurrent container-type data structures. In International Conference on Concurrency Theory (Quebec City, Canada). Jos\u00e9e Desharnais and Radha Jagadeesan (Eds.). Leibniz International Proceedings in Informatics, Vol. 59. Dagstuhl Publishing, Saarbr\u00fccken\/Wadern, Germany, 6:1\u20136:15."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-26850-7_1"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/2482767.2482789"},{"key":"e_1_3_2_22_2","first-page":"317","volume-title":"Symposium on Principles of Programming Languages","author":"Henzinger Thomas A.","year":"2013","unstructured":"Thomas A. Henzinger, Christoph M. Kirsch, Hannes Payer, Ali Sezgin, and Ana Sokolova. 2013. Quantitative relaxation of concurrent data structures. In Symposium on Principles of Programming Languages (Rome, Italy). ACM, New York, NY, 317\u2013328."},{"key":"e_1_3_2_23_2","volume-title":"The Art of Multiprocessor Programming (2nd ed.)","author":"Herlihy Maurice","year":"2021","unstructured":"Maurice Herlihy, Nir Shavit, Victor Luchangco, and Michael Spear. 2021. The Art of Multiprocessor Programming (2nd ed.). Morgan Kaufmann, an imprint of Elsevier, Cambridge, MA, Chapter 3, 49\u201374."},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/78969.78972"},{"key":"e_1_3_2_25_2","first-page":"74","volume-title":"Symposium on Cloud Computing","author":"Kappes Giorgos","year":"2020","unstructured":"Giorgos Kappes and Stergios V. Anastasiadis. 2020. A user-level toolkit for storage I\/O isolation on multitenant hosts. In Symposium on Cloud Computing (Virtual Event, USA). ACM, New York, NY, 74\u201389."},{"key":"e_1_3_2_26_2","first-page":"454","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Kappes Giorgos","year":"2021","unstructured":"Giorgos Kappes and Stergios V. Anastasiadis. 2021. POSTER: A lock-free relaxed concurrent queue for fast work distribution. In Symposium on Principles and Practice of Parallel Programming (Virtual Event). ACM, New York, NY, 454\u2013456."},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3205289.3205291"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-39958-9_18"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2767046"},{"key":"e_1_3_2_30_2","volume-title":"The Art of Computer Programming, Volume 1 (3rd ed.)","author":"Knuth Donald Ervin","year":"1997","unstructured":"Donald Ervin Knuth. 1997. The Art of Computer Programming, Volume 1 (3rd ed.). Addison-Wesley, Boston, MA, Chapter 2, 232\u2013465."},{"key":"e_1_3_2_31_2","volume-title":"Relaxed Concurrent Ordering Structures","author":"Kopinsky Justin","year":"2018","unstructured":"Justin Kopinsky. 2018. Relaxed Concurrent Ordering Structures. Ph.D. Dissertation. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, Chapter 1, 11\u201325."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1979.1675439"},{"key":"e_1_3_2_33_2","first-page":"71","volume-title":"Symposium on Operating Systems Principles","author":"Lee Collin","year":"2015","unstructured":"Collin Lee, Seo Jin Park, Ankita Kejriwal, Satoshi Matsushita, and John Ousterhout. 2015. Implementing linearizability at large scale and low latency. In Symposium on Operating Systems Principles (Monterey, CA). ACM, New York, NY, 71\u201386."},{"key":"e_1_3_2_34_2","volume-title":"Handbook of Data Structures and Applications (2nd ed.)","author":"Mehta Dinesh P.","year":"2018","unstructured":"Dinesh P. Mehta. 2018. Handbook of Data Structures and Applications (2nd ed.). Dinesh P. Mehta and Sartaj Sahni (Eds.). CRC Press, Taylor & Francis Group, Boca Raton, FL, Chapter 2, 23\u201334."},{"key":"e_1_3_2_35_2","first-page":"267","volume-title":"Symposium on Principles of Distributed Computing","author":"Michael Maged M.","year":"1996","unstructured":"Maged M. Michael and Michael L. Scott. 1996. Simple, fast and practical non-blocking and blocking concurrent queue algorithms. In Symposium on Principles of Distributed Computing (Philadelphia, PA). ACM, New York, NY, 267\u2013275."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/2980987"},{"key":"e_1_3_2_37_2","first-page":"103","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Morrison Adam","year":"2013","unstructured":"Adam Morrison and Yehuda Afek. 2013. Fast concurrent queues for x86 processors. In Symposium on Principles and Practice of Parallel Programming (Shenzhen, China). ACM, New York, NY, 103\u2013112."},{"key":"e_1_3_2_38_2","first-page":"89","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Ostrovsky Or","year":"2020","unstructured":"Or Ostrovsky and Adam Morrison. 2020. Scaling concurrent queues by using HTM to profit from failed atomic operations. In Symposium on Principles and Practice of Parallel Programming (San Diego, CA). ACM, New York, NY, 89\u2013101."},{"key":"e_1_3_2_39_2","first-page":"80","volume-title":"Symposium on Parallelism in Algorithms and Architectures","author":"Rihani Hamza","year":"2015","unstructured":"Hamza Rihani, Peter Sanders, and Roman Dementiev. 2015. Brief announcement: MultiQueues: Simple relaxed concurrent priority queues. In Symposium on Parallelism in Algorithms and Architectures (Portland, OR). ACM, New York, NY, 80\u201382."},{"key":"e_1_3_2_40_2","first-page":"31:1\u201331:15","volume-title":"International Symposium on Distributed Computing","author":"Rukundo Adones","year":"2019","unstructured":"Adones Rukundo, Aras Atalar, and Philippas Tsigas. 2019. Monotonically relaxing concurrent data-structure semantics for increasing performance: An efficient 2D design framework. In International Symposium on Distributed Computing (Budapest, Hungary). Jukka Suomela (Ed.). Leibniz International Proceedings in Informatics, Vol. 146. Dagstuhl Publishing, Saarbr\u00fccken\/Wadern, Germany, 31:1\u201331:15."},{"key":"e_1_3_2_41_2","first-page":"147","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Scherer William N.","year":"2006","unstructured":"William N. Scherer, Doug Lea, and Michael L. Scott. 2006. Scalable synchronous queues. In Symposium on Principles and Practice of Parallel Programming (New York, NY). ACM, New York, NY, 147\u2013156."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30186-8_13"},{"issue":"7","key":"e_1_3_2_43_2","first-page":"87","article-title":"x86-TSO: A rigorous and usable programmer\u2019s model for x86 multiprocessors","volume":"53","author":"Sewel Peter","year":"2010","unstructured":"Peter Sewel, Susmit Sarkar, Scott Owens, Francesco Zappa Nardelli, and Magnus O. Myreen. 2010. x86-TSO: A rigorous and usable programmer\u2019s model for x86 multiprocessors. Communications of the ACM 53, 7 (May2010), 87\u201397.","journal-title":"Communications of the ACM"},{"key":"e_1_3_2_44_2","first-page":"55","volume-title":"International Conference on Distributed Computing and Networking","author":"Shafiei Niloufar","year":"2009","unstructured":"Niloufar Shafiei. 2009. Non-blocking array-based algorithms for stacks and queues. In International Conference on Distributed Computing and Networking (Hyderabad, India). ACM, New York, NY, 55\u201366."},{"key":"e_1_3_2_45_2","first-page":"470","volume-title":"International Conference on Parallel and Distributed Systems","author":"Shann Chien-Hua","year":"2000","unstructured":"Chien-Hua Shann, Ting-Lu Huang, and Cheng Chen. 2000. A practical nonblocking queue algorithm using compare-and-swap. In International Conference on Parallel and Distributed Systems (Iwate, Japan). IEEE Computer Society, Los Alamitos, CA, 470\u2013476."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00446-016-0272-0"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356161"},{"key":"e_1_3_2_48_2","first-page":"335","volume-title":"Symposium on Parallelism in Algorithms and Architectures","author":"Sundell H\u00e5kan","year":"2011","unstructured":"H\u00e5kan Sundell, Anders Gidenstam, Marina Papatriantafilou, and Philippas Tsigas. 2011. A lock-free algorithm for concurrent bags. In Symposium on Parallelism in Algorithms and Architectures (San Jose, CA). ACM, New York, NY, 335\u2013344."},{"key":"e_1_3_2_49_2","unstructured":"The kernel development community. 2021. Linux Scheduler. Linux version 5.15.0-rc3. Retrieved October 7 2022 from https:\/\/www.kernel.org\/doc\/html\/latest\/scheduler\/index.html."},{"key":"e_1_3_2_50_2","first-page":"134","volume-title":"Symposium on Parallel Algorithms and Architectures","author":"Tsigas Philippas","year":"2001","unstructured":"Philippas Tsigas and Yi Zhang. 2001. A simple, fast and scalable non-blocking concurrent FIFO queue for shared memory multiprocessor systems. In Symposium on Parallel Algorithms and Architectures (Crete Island, Greece). ACM, New York, NY, 134\u2013143."},{"key":"e_1_3_2_51_2","first-page":"277","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Wimmer Martin","year":"2015","unstructured":"Martin Wimmer, Jakob Gruber, Jesper Larsson Tr\u00e4ff, and Philippas Tsigas. 2015. The lock-free k-LSM relaxed priority queue. In Symposium on Principles and Practice of Parallel Programming (San Francisco, CA). ACM, New York, NY, 277\u2013278."},{"key":"e_1_3_2_52_2","unstructured":"Chaoran Yang. 2020. A benchmark framework for concurrent queue implementations. Retrieved October 7 2022 from https:\/\/github.com\/chaoran\/fast-wait-free-queue."},{"key":"e_1_3_2_53_2","first-page":"16:1\u201316:13","volume-title":"Symposium on Principles and Practice of Parallel Programming","author":"Yang Chaoran","year":"2016","unstructured":"Chaoran Yang and John Mellor-Crummey. 2016. A wait-free queue as fast as fetch-and-add. In Symposium on Principles and Practice of Parallel Programming (Barcelona, Spain). ACM, New York, NY, 16:1\u201316:13."},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337911"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3565514","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3565514","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:48:51Z","timestamp":1750182531000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3565514"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,16]]},"references-count":53,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,31]]}},"alternative-id":["10.1145\/3565514"],"URL":"https:\/\/doi.org\/10.1145\/3565514","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2022,12,16]]},"assertion":[{"value":"2021-05-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-29","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-12-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}