{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T02:22:31Z","timestamp":1773886951167,"version":"3.50.1"},"reference-count":61,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2021,2,11]],"date-time":"2021-02-11T00:00:00Z","timestamp":1613001600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,11]],"date-time":"2021-02-11T00:00:00Z","timestamp":1613001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS-1838222, IIS-1527984, IIS-1447826 and IIS-1305253"],"award-info":[{"award-number":["IIS-1838222, IIS-1527984, IIS-1447826 and IIS-1305253"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["The VLDB Journal"],"published-print":{"date-parts":[[2021,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The join and group-by aggregation are two memory intensive operators that are affecting the performance of relational databases. Hashing is a common approach used to implement both operators. Recent paradigm shifts in multi-core processor architectures have reinvigorated research into how the join and group-by aggregation operators can leverage these advances. However, the poor spatial locality of the hashing approach has hindered performance on multi-core processor architectures which rely on using large cache hierarchies for latency mitigation. Multithreaded architectures can better cope with poor spatial locality by masking memory latency with many outstanding requests. Nevertheless, the number of parallel threads, even in the most advanced multithreaded processors, such as UltraSPARC, is not enough to fully cover the main memory access latency. In this paper, we explore the hardware re-configurability of FPGAs to enable deeper execution pipelines that maintain hundreds (instead of tens) of outstanding memory requests across four FPGAs-drastically increasing concurrency and throughput. We present two end-to-end in-memory accelerators for the join and group-by aggregation operators using FPGAs. Both accelerators use massive multithreading to mask long memory delays of traversing linked-list data structures, while concurrently managing hundreds of thread states across four FPGAs locally. We explore how content addressable memories can be intermixed within our multithreaded designs to act as a <jats:italic>synchronizing cache<\/jats:italic>, which enforces locks and merges jobs together before they are written to memory. Throughput results for our hash-join operator accelerator show a speedup between 2<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> and 3.4<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> over the best multi-core approaches with comparable memory bandwidths on uniform and skewed datasets. The accelerator for the hash-based group-by aggregation operator demonstrates that leveraging CAMs achieves average speedup of 3.3<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> with a best case of 9.4<jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mo>\u00d7<\/mml:mo>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> in terms of throughput over CPU implementations across five types of data distributions.<\/jats:p>","DOI":"10.1007\/s00778-020-00642-5","type":"journal-article","created":{"date-parts":[[2021,2,11]],"date-time":"2021-02-11T17:03:49Z","timestamp":1613063029000},"page":"333-359","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Efficient local locking for massively multithreaded in-memory hash-based operators"],"prefix":"10.1007","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4782-4600","authenticated-orcid":false,"given":"Bashar","family":"Romanous","sequence":"first","affiliation":[]},{"given":"Skyler","family":"Windh","sequence":"additional","affiliation":[]},{"given":"Ildar","family":"Absalyamov","sequence":"additional","affiliation":[]},{"given":"Prerna","family":"Budhkar","sequence":"additional","affiliation":[]},{"given":"Robert","family":"Halstead","sequence":"additional","affiliation":[]},{"given":"Walid","family":"Najjar","sequence":"additional","affiliation":[]},{"given":"Vassilis","family":"Tsotras","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,2,11]]},"reference":[{"key":"642_CR1","doi-asserted-by":"crossref","unstructured":"Absalyamov, I., Budhkar, P., Windh, S., Halstead, R.J., Najjar, W.A., Tsotras, V.J.: FPGA-accelerated group-by aggregation using synchronizing caches. In: Proceedings of the 12th International Workshop on Data Management on New Hardware, DaMoN \u201916, pp. 11:1\u201311:9. NY, USA: ACM (2016)","DOI":"10.1145\/2933349.2933360"},{"issue":"10","key":"642_CR2","doi-asserted-by":"publisher","first-page":"1064","DOI":"10.14778\/2336664.2336678","volume":"5","author":"M-C Albutiu","year":"2012","unstructured":"Albutiu, M.-C., Kemper, A., Neumann, T.: Massively parallel sort-merge joins in main memory multi-core database systems. Proc. VLDB Endow. 5(10), 1064\u20131075 (2012)","journal-title":"Proc. VLDB Endow."},{"key":"642_CR3","unstructured":"Alpha Data. http:\/\/www.alpha-data.com\/dcp\/capi.php (2015)"},{"key":"642_CR4","unstructured":"AWS Events. AWS re:invent 2019: Amazon redshift reimagined: RA3 and AQUA (ANT230). https:\/\/youtu.be\/6pZrE_tveLI (2019). Accessed 2020-7-11"},{"issue":"1","key":"642_CR5","doi-asserted-by":"publisher","first-page":"85","DOI":"10.14778\/2732219.2732227","volume":"7","author":"C Balkesen","year":"2013","unstructured":"Balkesen, C., Alonso, G., Teubner, J., \u00d6zsu, M\u00a0.T.: Multi-core, main-memory joins: sort vs. Hash revisited. Proc. VLDB Endow. 7(1), 85\u201396 (2013)","journal-title":"Proc. VLDB Endow."},{"key":"642_CR6","doi-asserted-by":"crossref","unstructured":"Balkesen, C., Teubner, J., Alonso, G., \u00d6zsu, M.: Main-memory hash joins on multi-core CPUs: tuning to the underlying hardware. In: Proceedings of the 2013 IEEE International Conference on Data Engineering, ICDE\u201913, pp. 362\u2013373 (2013)","DOI":"10.1109\/ICDE.2013.6544839"},{"key":"642_CR7","doi-asserted-by":"crossref","unstructured":"Bandi, N., Metwally, A., Agrawal, D., El\u00a0Abbadi, A.: Fast data stream algorithms using associative memories. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, SIGMOD \u201907, pp. 247\u2013256 (2007)","DOI":"10.1145\/1247480.1247510"},{"key":"642_CR8","doi-asserted-by":"crossref","unstructured":"Barber, R., Lohman, G., Raman, V., Sidle, R., Lightstone, S., Schiefer, B.: In-memory BLU acceleration in IBM\u2019s DB2 and dashDB: optimized for modern workloads and hardware architectures. In: 2015 IEEE 31st International Conference on Data Engineering, pp. 1246\u20131252. ieeexplore.ieee.org (2015)","DOI":"10.1109\/ICDE.2015.7113372"},{"key":"642_CR9","doi-asserted-by":"crossref","unstructured":"Blanas, S., Li, Y., Patel, J.M.: Design and evaluation of main memory Hash join algorithms for multi-core CPUs. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD\u201911, pp. 37\u201348 (2011)","DOI":"10.1145\/1989323.1989328"},{"key":"642_CR10","unstructured":"Boncz, P.A., Manegold, S., Kersten, M.L.: Database architecture optimized for the new bottleneck: memory access. In: Proceedings of the 25th International Conference on Very Large Data Bases, VLDB\u201999, pp. 54\u201365 (1999)"},{"issue":"2","key":"642_CR11","doi-asserted-by":"publisher","first-page":"13:1","DOI":"10.1145\/3310229","volume":"16","author":"P Budhkar","year":"2019","unstructured":"Budhkar, P., Absalyamov, I., Zois, V., Windh, S., Najjar, W.A., Tsotras, V.J.: Accelerating in-memory database selections using latency masking hardware threads. ACM Trans. Archit. Code Optim. 16(2), 13:1\u201313:28 (2019)","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"642_CR12","doi-asserted-by":"crossref","unstructured":"Burtscher, M., Nasre, R., Pingali, K.: A quantitative study of irregular programs on GPUs. In: 2012 IEEE International Symposium on Workload Characterization (IISWC), pp. 141\u2013151 (2012)","DOI":"10.1109\/IISWC.2012.6402918"},{"key":"642_CR13","doi-asserted-by":"crossref","unstructured":"Casper, J., Olukotun, K.: Hardware acceleration of database operations. In: Proceedings of the 2014 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 151\u2013160 (2014)","DOI":"10.1145\/2554688.2554787"},{"key":"642_CR14","first-page":"32","volume":"3","author":"S Chen","year":"2007","unstructured":"Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Improving hash join performance through prefetching. ACM Trans. Database Syst. 3, 32\u20133 (2007)","journal-title":"ACM Trans. Database Syst."},{"key":"642_CR15","doi-asserted-by":"crossref","unstructured":"Cheng, X., He, B., Du, X., Lau, C.T.: A study of main-memory hash joins on many-core processor: a case with intel knights landing architecture. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM\u2019 17, pp. 657\u2013666. ACM, NY, USA (2017)","DOI":"10.1145\/3132847.3132916"},{"key":"642_CR16","unstructured":"Cieslewicz, J., Ross, K.A.: Adaptive aggregation on chip multiprocessors. In: International Conference on Very Large Data Bases, VLDB \u201907, pp. 339\u2013350 (2007)"},{"key":"642_CR17","doi-asserted-by":"crossref","unstructured":"Cieslewicz, J., Ross, K.A.: Data partitioning on chip multiprocessors. In: Proceedings of the 4th International Workshop on Data Management on New Hardware, DaMoN \u201908, pp. 25\u201334 (2008)","DOI":"10.1145\/1457150.1457156"},{"key":"642_CR18","unstructured":"Convey Computers. http:\/\/www.conveycomputer.com (2015)"},{"key":"642_CR19","unstructured":"(dbInsight), T.B.: Amazon redshift turns AQUA. https:\/\/www.zdnet.com\/article\/amazon-redshift-turns-aqua\/. Accessed 2020-8-17"},{"issue":"4","key":"642_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2629471","volume":"7","author":"U Dhawan","year":"2015","unstructured":"Dhawan, U., Dehon, A.: Area-efficient near-associative memories on FPGAs. ACM Trans. Reconfigurable Technol. Syst.: TRETS 7(4), 1\u201322 (2015)","journal-title":"ACM Trans. Reconfigurable Technol. Syst.: TRETS"},{"key":"642_CR21","doi-asserted-by":"crossref","unstructured":"Diaconu, C., Freedman, C., Ismert, E., Larson, P.-A., Mittal, P., Stonecipher, R., Verma, N., Zwilling, M.: Hekaton: SQL server\u2019s memory-optimized OLTP engine. In: Proceedings of the ACM International Conference on Management of Data, SIGMOD \u201913, pp. 1243\u20131254. ACM, NY, USA (2013)","DOI":"10.1145\/2463676.2463710"},{"key":"642_CR22","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s00778-019-00581-w","volume":"29","author":"J Fang","year":"2019","unstructured":"Fang, J., Mulder, Y.T.B., Hidders, J., Lee, J., Hofstee, H.P.: In-memory database acceleration on FPGAs: a survey. VLDB J. 29, 33\u201359 (2019)","journal-title":"VLDB J."},{"issue":"1","key":"642_CR23","first-page":"28","volume":"35","author":"F F\u00e4rber","year":"2012","unstructured":"F\u00e4rber, F., May, N., Lehner, W., Gro\u00dfe, P., M\u00fcller, I., Rauhe, H., Dees, J.: The SAP HANA database\u2014an architecture overview. IEEE Data Eng. Bull. 35(1), 28\u201333 (2012)","journal-title":"IEEE Data Eng. Bull."},{"key":"642_CR24","doi-asserted-by":"crossref","unstructured":"Fernandez, E.B., Najjar, W.A., Lonardi, S., Villarreal, J.: Multithreaded FPGA acceleration of DNA sequence mapping. In: 2012 IEEE Conference on High Performance Extreme Computing, pp. 1\u20136. ieeexplore.ieee.org (2012)","DOI":"10.1109\/HPEC.2012.6408669"},{"key":"642_CR25","unstructured":"Francisco, P.: The Netezza data appliance architecture: a platform for high performance data warehousing and analytics. IBM Redbook. REDP-4725-00 (2011). https:\/\/www.ibmbigdatahub.com\/sites\/default\/files\/document\/redguide_2011.pdf"},{"key":"642_CR26","doi-asserted-by":"crossref","unstructured":"Gray, J., Sundaresan, P., Englert, S., Baclawski, K., Weinberger, P.: Quickly generating billion-record synthetic databases. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD\u201994, pp. 243\u2013252 (1994)","DOI":"10.1145\/191839.191886"},{"key":"642_CR27","unstructured":"Gupta, P.K.: Accelerating datacenter workloads. In: 26th International Conference on Field Programmable Logic and Applications (FPL). fpl2016.org (2016)"},{"key":"642_CR28","unstructured":"Halstead, R.J., Absalyamov, I., Najjar, W.A., Tsotras, V.J.: FPGA-based multithreading for in-memory hash joins. In: CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4\u20137, 2015, Online Proceedings, CIDR\u201915 (2015)"},{"key":"642_CR29","first-page":"21","volume":"20","author":"RJ Halstead","year":"2014","unstructured":"Halstead, R.J., Najjar, W.A., Huseini, O.: SpVM acceleration with latency masking threads on FPGAs. Algorithms 20, 21 (2014)","journal-title":"Algorithms"},{"key":"642_CR30","doi-asserted-by":"crossref","unstructured":"Halstead, R.J., Sukhwani, B., Min, H., Thoennes, M., Dube, P., Asaad, S., Iyer, B.: Accelerating join operation for relational databases with FPGAs. In: Proceedings of the 2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM\u201913, pp. 17\u201320 (2013)","DOI":"10.1109\/FCCM.2013.17"},{"issue":"4","key":"642_CR31","doi-asserted-by":"publisher","first-page":"21:1","DOI":"10.1145\/1620585.1620588","volume":"34","author":"B He","year":"2009","unstructured":"He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. ACM Trans. Database Syst. 34(4), 21:1\u201321:39 (2009)","journal-title":"ACM Trans. Database Syst."},{"key":"642_CR32","unstructured":"Ionescu, M., Schauser, K.: Optimizing parallel bitonic sort. In: Proceedings of the 11th International Symposium on Parallel Processing, pp. 303\u2013309 (1997)"},{"key":"642_CR33","doi-asserted-by":"crossref","unstructured":"Jun, S., Xu, S.: Terabyte sort on FPGA-accelerated flash storage. In: 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 17\u201324. ieeexplore.ieee.org (2017)","DOI":"10.1109\/FCCM.2017.53"},{"key":"642_CR34","doi-asserted-by":"crossref","unstructured":"Khattab, O., Hammoud, M., Shekfeh, O.: Polyhj: a polymorphic main-memory hash join paradigm for multi-core machines. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM\u2019 18, pp. 1323\u20131332. ACM, NY, USA (2018)","DOI":"10.1145\/3269206.3271680"},{"issue":"2","key":"642_CR35","doi-asserted-by":"publisher","first-page":"1378","DOI":"10.14778\/1687553.1687564","volume":"2","author":"C Kim","year":"2009","unstructured":"Kim, C., Kaldewey, T., Lee, V.W., Sedlar, E., Nguyen, A.D., Satish, N., Chhugani, J., Di\u00a0Blas, A., Dubey, P.: Sort vs. hash revisited: fast join implementation on modern multi-core CPUs. Proc. VLDB Endow. 2(2), 1378\u20131389 (2009)","journal-title":"Proc. VLDB Endow."},{"key":"642_CR36","unstructured":"Kim, K., Johnson, R., Pandis, I.: BionicDB: fast and power-efficient OLTP on FPGA. In: EDBT (2019)"},{"issue":"11","key":"642_CR37","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1109\/2.330035","volume":"27","author":"A Krikelis","year":"1994","unstructured":"Krikelis, A., Weems, C.C.: Associative processing and processors. Computer 27(11), 12\u201317 (1994)","journal-title":"Computer"},{"key":"642_CR38","unstructured":"Kuehn, J.T., Smith, B.J.: The horizon supercomputing system: architecture and software. In: Proceedings of the 1988 ACM\/IEEE Conference on Supercomputing, Supercomputing \u201988, pp. 28\u201334. IEEE, Los Alamitos, CA, USA (1988)"},{"issue":"3","key":"642_CR39","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1109\/TC.1983.1676217","volume":"100","author":"M Kumar","year":"1983","unstructured":"Kumar, M., Hirschberg, D.: An efficient implementation of batcher\u2019s odd-even merge algorithm and its application in parallel sorting schemes. IEEE Trans. Comput. 100(3), 254\u2013264 (1983)","journal-title":"IEEE Trans. Comput."},{"issue":"2","key":"642_CR40","first-page":"6","volume":"36","author":"T Lahiri","year":"2013","unstructured":"Lahiri, T., Neimat, M.-A., Folkman, S.: Oracle TimesTen: an in-memory database for enterprise applications. IEEE Data Eng. Bull. 36(2), 6\u201313 (2013)","journal-title":"IEEE Data Eng. Bull."},{"issue":"12","key":"642_CR41","doi-asserted-by":"publisher","first-page":"1706","DOI":"10.14778\/3137765.3137776","volume":"10","author":"J Lee","year":"2017","unstructured":"Lee, J., Kim, H., Yoo, S., Choi, K., Hofstee, H.P., Nam, G.-J., Nutter, M.R., Jamsek, D.: Extrav: boosting graph processing near storage with a coherent accelerator. Proc. VLDB Endow. 10(12), 1706\u20131717 (2017)","journal-title":"Proc. VLDB Endow."},{"key":"642_CR42","doi-asserted-by":"crossref","unstructured":"Ma, X., Zhang, D., Chiou, D.: FPGA-accelerated transactional execution of graph workloads. In: Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA\u201917, pp. 227\u2013236. ACM, NY, USA (2017)","DOI":"10.1145\/3020078.3021743"},{"issue":"4","key":"642_CR43","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1109\/TKDE.2002.1019210","volume":"14","author":"S Manegold","year":"2002","unstructured":"Manegold, S., Boncz, P., Kersten, M.: Optimizing main-memory join on modern hardware. IEEE Trans. Knowl. Data Eng. 14(4), 709\u2013730 (2002)","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"642_CR44","unstructured":"MemSQL. https:\/\/www.memsql.com (2013)"},{"issue":"1","key":"642_CR45","doi-asserted-by":"publisher","first-page":"229","DOI":"10.14778\/1687627.1687654","volume":"2","author":"R Mueller","year":"2009","unstructured":"Mueller, R., Teubner, J., Alonso, G.: Streams on wires: a query compiler for FPGAs. Proc. VLDB Endow. 2(1), 229\u2013240 (2009)","journal-title":"Proc. VLDB Endow."},{"key":"642_CR46","doi-asserted-by":"crossref","unstructured":"M\u00fcller, I., Sanders, P., Lacurie, A., Lehner, W., F\u00e4rber, F.: Cache-efficient aggregation: hashing is sorting. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD\u201915, pp. 1123\u20131136. ACM, NY, USA (2015)","DOI":"10.1145\/2723372.2747644"},{"key":"642_CR47","unstructured":"Pavlo, A., Angulo, G., Arulraj, J., Lin, H., Lin, J., Ma, L., Menon, P., Mowry, T.C., Perron, M., Quah, I.: Others. Self-driving database management systems. In: CIDR, vol.\u00a04, p.\u00a01. cc.gatech.edu (2017)"},{"key":"642_CR48","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1007\/s00778-019-00546-z","volume":"29","author":"C Pohl","year":"2020","unstructured":"Pohl, C., Sattler, K.-U., Graefe, G.: Joins on high-bandwidth memory: a new level in the memory hierarchy. VLDB J. 29, 797\u2013817 (2020)","journal-title":"VLDB J."},{"key":"642_CR49","doi-asserted-by":"crossref","unstructured":"Putnam, A., Caulfield, A., Chung, E., Chiou, D., Constantinides, K., Demme, J., Esmaeilzadeh, H., Fowers, J., Gopal, G., Gray, J., Haselman, M., Hauck, S., Heil, S., Hormati, A., Kim, J.-Y., Lanka, S., Larus, J., Peterson, E., Pope, S., Smith, A., Thong, J., Xiao, P., Burger, D.: A reconfigurable fabric for accelerating large-scale datacenter services. In: 2014 ACM\/IEEE 41st International Symposium on Computer Architecture (ISCA), ISCA\u201914, pp. 13\u201324 (2014)","DOI":"10.1109\/ISCA.2014.6853195"},{"key":"642_CR50","doi-asserted-by":"crossref","unstructured":"Sadoghi, M., Javed, R., Tarafdar, N., Singh, H., Palaniappan, R., Jacobsen, H.-A.: Multi-query stream processing on FPGAs. In: Proceedings of the 2012 IEEE International Conference on Data Engineering, ICDE\u201912, pp. 1229\u20131232 (2012)","DOI":"10.1109\/ICDE.2012.39"},{"key":"642_CR51","unstructured":"Sheffield, D.: IvyTown Xeon+ FPGA: the HARP program. In: International Symposium on Computer Architecture (ISCA): Tutorial (2016)"},{"key":"642_CR52","doi-asserted-by":"crossref","unstructured":"Sidler, D., Istvan, Z., Owaida, M., Kara, K., Alonso, G.: doppioDB: a hardware accelerated database. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD \u201917, pp. 1659\u20131662. ACM, NY, USA (2017)","DOI":"10.1145\/3035918.3058746"},{"key":"642_CR53","unstructured":"Thistle, M.R., Smith, B.J.: A processor architecture for Horizon. In: Proceedings of the 1988 ACM\/IEEE Conference on Supercomputing, pp. 35\u201341 (1988)"},{"key":"642_CR54","doi-asserted-by":"crossref","unstructured":"T\u00f6z\u00fcn, P., Gold, B., Ailamaki, A.: OLTP in wonderland: where do cache misses come from in major OLTP components? In: Proceedings of the Ninth International Workshop on Data Management on New Hardware, DaMoN \u201913, pp. 8:1\u20138:6. ACM, NY, USA (2013)","DOI":"10.1145\/2485278.2485286"},{"key":"642_CR55","first-page":"48","volume":"33","author":"A Vahidsafa","year":"2013","unstructured":"Vahidsafa, A., Turullols, S., Smentek, D., Sivaramakrishnan, R., Loewenstein, P., Jairath, S., Feehrer, J.: The Oracle Sparc T5 16-core processor scales to eight sockets. IEEE Micro 33, 48\u201357 (2013)","journal-title":"IEEE Micro"},{"issue":"4","key":"642_CR56","doi-asserted-by":"publisher","first-page":"1071","DOI":"10.1109\/TKDE.2014.2359675","volume":"27","author":"L Wang","year":"2015","unstructured":"Wang, L., Zhou, M., Zhang, Z., Shan, M.-C., Zhou, A.: NUMA-aware scalable and efficient in-memory aggregation on large domains. IEEE Trans. Knowl. Data Eng. 27(4), 1071\u20131084 (2015)","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"642_CR57","doi-asserted-by":"crossref","unstructured":"Windh, S., Budhkar, P., Najjar, W.A.: CAMs as synchronizing caches for multithreaded irregular applications on FPGAs. In: Proceedings of the IEEE\/ACM International Conference on Computer-Aided Design, ICCAD \u201915, pp. 331\u2013336. IEEE, Piscataway, NJ, USA (2015)","DOI":"10.1109\/ICCAD.2015.7372588"},{"key":"642_CR58","doi-asserted-by":"crossref","unstructured":"Wu, L., Lottarini, A., Paine, T.K., Kim, M.A., Ross, K.A.: Q100: the architecture and design of a database processing unit. In: Proceeding of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS \u201914, pp. 255\u2013268 (2014)","DOI":"10.1145\/2541940.2541961"},{"key":"642_CR59","doi-asserted-by":"crossref","unstructured":"Ye, Y., Ross, K.A., Vesdapunt, N.: Scalable aggregation on multicore processors. In: Proceedings of the Seventh International Workshop on Data Management on New Hardware, DaMoN \u201911, pp. 1\u20139 (2011)","DOI":"10.1145\/1995441.1995442"},{"key":"642_CR60","unstructured":"Yiannacouras, P., Rose, J.: A parameterized automatic cache generator for FPGAs. In: Proceedings IEEE International Conference on Field-Programmable Technology (FPT), pp. 324\u2013327 (2003)"},{"key":"642_CR61","unstructured":"Zynq, X. http:\/\/www.xilinx.com\/products\/silicon-devices\/soc\/zynq-7000.html (2015)"}],"container-title":["The VLDB Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-020-00642-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00778-020-00642-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00778-020-00642-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,5,28]],"date-time":"2021-05-28T07:07:40Z","timestamp":1622185660000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00778-020-00642-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,11]]},"references-count":61,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,5]]}},"alternative-id":["642"],"URL":"https:\/\/doi.org\/10.1007\/s00778-020-00642-5","relation":{},"ISSN":["1066-8888","0949-877X"],"issn-type":[{"value":"1066-8888","type":"print"},{"value":"0949-877X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,11]]},"assertion":[{"value":"5 February 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 August 2020","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 October 2020","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 February 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}