{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,27]],"date-time":"2025-03-27T15:15:23Z","timestamp":1743088523822,"version":"3.40.3"},"publisher-location":"Cham","reference-count":22,"publisher":"Springer Nature Switzerland","isbn-type":[{"type":"print","value":"9783031215339"},{"type":"electronic","value":"9783031215346"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,1,18]],"date-time":"2023-01-18T00:00:00Z","timestamp":1674000000000},"content-version":"vor","delay-in-days":382,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Heterogeneous accelerator-enhanced computing architectures are a common solution in embedded computing, mainly due to the constraints in energy and power efficiency. Such accelerator-enhanced systems dispatch data- and computing-intensive tasks to specialized, optimized and thus efficient hardware units, leaving most control flow tasks for the more generic but less efficient central processing units (CPUs). Nowadays, high-performance computing (HPC) systems are also becoming more heterogeneous by incorporating accelerators into the computing nodes.<\/jats:p><jats:p>In this chapter, we introduce the concept of heterogeneous computing and present the design of a hardware accelerator for solving the Link Assessment (LA) problem, introduced in Chapter 3. The hardware accelerator integrates its main dedicated processing units with a customized cache design and light-weight data path. 
We provide detailed area, energy, and timing results for a 28\u00a0nm application-specific integrated circuit (ASIC) process and DDR3 memory devices. Compared to a CPU-based cluster, our proposed solution uses 38x less memory and is 1030x more energy efficient for processing a users-movies dataset with half a million edges.<\/jats:p>","DOI":"10.1007\/978-3-031-21534-6_4","type":"book-chapter","created":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T20:02:53Z","timestamp":1673985773000},"page":"57-75","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Custom Hardware Architecture for\u00a0the\u00a0Link Assessment Problem"],"prefix":"10.1007","author":[{"given":"Andr\u00e9","family":"Chinazzo","sequence":"first","affiliation":[]},{"given":"Christian De","family":"Schryver","sequence":"additional","affiliation":[]},{"given":"Katharina","family":"Zweig","sequence":"additional","affiliation":[]},{"given":"Norbert","family":"Wehn","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,1,18]]},"reference":[{"issue":"10","key":"4_CR1","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1145\/1562764.1562783","volume":"52","author":"K Asanovic","year":"2009","unstructured":"Asanovic, K., et al.: A view of the parallel computing landscape. Commun. ACM 52(10), 56\u201367 (2009). https:\/\/doi.org\/10.1145\/1562764.1562783","journal-title":"Commun. ACM"},{"unstructured":"Brugger, C.: A new approach to efficient heterogeneous computing = Ein neuer Ansatz f\u00fcr effiziente, heterogene Datenverarbeitung. Ph.D. 
thesis, University of Kaiserslautern, Germany (2016)","key":"4_CR2"},{"issue":"1","key":"4_CR3","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1109\/MDAT.2017.2750900","volume":"35","author":"C Brugger","year":"2018","unstructured":"Brugger, C., Grigorovici, V., Jung, M., de Schryver, C., Weis, C., Wehn, N., Zweig, K.A.: A memory centric architecture of the link assessment algorithm in large graphs. IEEE Des. Test 35(1), 7\u201315 (2018). https:\/\/doi.org\/10.1109\/MDAT.2017.2750900","journal-title":"IEEE Des. Test"},{"doi-asserted-by":"publisher","unstructured":"Brugger, C., et al.: A custom computing system for finding similarities in complex networks. In: ISVLSI, pp. 262\u2013267. IEEE Computer Society (2015). https:\/\/doi.org\/10.1109\/ISVLSI.2015.78","key":"4_CR4","DOI":"10.1109\/ISVLSI.2015.78"},{"unstructured":"Duranton, M., et al.: Hipeac vision 2019. European Network of Excellence on High Performance and Embedded Architecture and Compilation (HiPEAC) (2019)","key":"4_CR5"},{"unstructured":"Dutoit, D., et al.: A 0.9 pJ\/bit, 12.8 GByte\/s WideIO memory interface in a 3D-IC NoC-based MPSoC. In: Symposium, VLSIT, pp. C22\u2013C23. IEEE (2013)","key":"4_CR6"},{"issue":"3","key":"4_CR7","doi-asserted-by":"publisher","first-page":"122","DOI":"10.1109\/MM.2012.17","volume":"32","author":"H Esmaeilzadeh","year":"2012","unstructured":"Esmaeilzadeh, H., Blem, E.R., Amant, R.S., Sankaralingam, K., Burger, D.: Dark silicon and the end of multicore scaling. IEEE Micro 32(3), 122\u2013134 (2012). https:\/\/doi.org\/10.1109\/MM.2012.17","journal-title":"IEEE Micro"},{"doi-asserted-by":"publisher","unstructured":"Garraghan, P., Al-Anii, Y., Summers, J., Thompson, H., Kapur, N., Djemame, K.: A unified model for holistic power usage in cloud datacenter servers. In: UCC, pp. 11\u201319. ACM (2016). 
https:\/\/doi.org\/10.1145\/2996890.2996896","key":"4_CR8","DOI":"10.1145\/2996890.2996896"},{"key":"4_CR9","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/978-3-540-77220-0_21","volume-title":"High Performance Computing \u2013 HiPC 2007","author":"P Harish","year":"2007","unstructured":"Harish, P., Narayanan, P.J.: Accelerating large graph algorithms on the GPU using CUDA. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2007. LNCS, vol. 4873, pp. 197\u2013208. Springer, Heidelberg (2007). https:\/\/doi.org\/10.1007\/978-3-540-77220-0_21"},{"doi-asserted-by":"publisher","unstructured":"Howard, J., et al.: A 48-core IA-32 message-passing processor with DVFS in 45 nm CMOS. In: ISSCC, pp. 108\u2013109. IEEE (2010). https:\/\/doi.org\/10.1109\/ISSCC.2010.5434077","key":"4_CR10","DOI":"10.1109\/ISSCC.2010.5434077"},{"unstructured":"Jung, M.: System-level modeling, analysis and optimization of dram memories and controller architectures. Ph.D. thesis, University of Kaiserslautern, Germany (2017)","key":"4_CR11"},{"key":"4_CR12","doi-asserted-by":"publisher","first-page":"63","DOI":"10.2197\/ipsjtsldm.8.63","volume":"8","author":"M Jung","year":"2015","unstructured":"Jung, M., Weis, C., Wehn, N.: Dramsys: a flexible DRAM subsystem design space exploration framework. IPSJ Trans. Syst. LSI Des. Methodol. 8, 63\u201374 (2015). https:\/\/doi.org\/10.2197\/ipsjtsldm.8.63","journal-title":"IPSJ Trans. Syst. LSI Des. Methodol."},{"issue":"3","key":"4_CR13","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1109\/MDAT.2014.2314600","volume":"31","author":"EA Lee","year":"2014","unstructured":"Lee, E.A., et al.: The swarm at the edge of the cloud. IEEE Des. Test 31(3), 8\u201320 (2014). https:\/\/doi.org\/10.1109\/MDAT.2014.2314600","journal-title":"IEEE Des. 
Test"},{"doi-asserted-by":"publisher","unstructured":"Miller, B.A., et al.: A scalable signal processing architecture for massive graph analysis. In: ICASSP, pp. 5329\u20135332. IEEE (2012). https:\/\/doi.org\/10.1109\/ICASSP.2012.6289124","key":"4_CR14","DOI":"10.1109\/ICASSP.2012.6289124"},{"issue":"5594","key":"4_CR15","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1126\/science.298.5594.824","volume":"298","author":"R Milo","year":"2002","unstructured":"Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., Alon, U.: Network motifs: simple building blocks of complex networks. Science 298(5594), 824\u2013827 (2002). https:\/\/doi.org\/10.1126\/science.298.5594.824","journal-title":"Science"},{"issue":"1","key":"4_CR16","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1093\/comjnl\/bxx046","volume":"61","author":"W Mula","year":"2018","unstructured":"Mula, W., Kurz, N., Lemire, D.: Faster population counts using AVX2 instructions. Comput. J. 61(1), 111\u2013120 (2018). https:\/\/doi.org\/10.1093\/comjnl\/bxx046","journal-title":"Comput. J."},{"unstructured":"de Schryver, C.: Design methodologies for hardware accelerated heterogeneous computing systems. Ph.D. thesis, University of Kaiserslautern, Germany (2014)","key":"4_CR17"},{"issue":"5","key":"4_CR18","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/MSP.2014.2327238","volume":"31","author":"K Slavakis","year":"2014","unstructured":"Slavakis, K., Giannakis, G.B., Mateos, G.: Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. IEEE Signal Process. Mag. 31(5), 18\u201331 (2014). https:\/\/doi.org\/10.1109\/MSP.2014.2327238","journal-title":"IEEE Signal Process. 
Mag."},{"issue":"4","key":"4_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pone.0152536","volume":"11","author":"A Spitz","year":"2016","unstructured":"Spitz, A., Gimmler, A., Stoeck, T., Zweig, K.A., Horv\u00e1t, E.: Assessing low-intensity relationships in complex networks. PLoS ONE 11(4), 1\u201317 (2016). https:\/\/doi.org\/10.1371\/journal.pone.0152536","journal-title":"PLoS ONE"},{"issue":"4","key":"4_CR20","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1145\/1498765.1498785","volume":"52","author":"S Williams","year":"2009","unstructured":"Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65\u201376 (2009). https:\/\/doi.org\/10.1145\/1498765.1498785","journal-title":"Commun. ACM"},{"unstructured":"Zweig, K.A., Brugger, C., Grigorovici, V., De Schryver, C., Wehn, N.: Automated determination of network motifs (2015)","key":"4_CR21"},{"issue":"3","key":"4_CR22","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1007\/s13278-011-0021-0","volume":"1","author":"KA Zweig","year":"2011","unstructured":"Zweig, K.A., Kaufmann, M.: A systematic approach to the one-mode projection of bipartite graphs. Soc. Netw. Analys. Min. 1(3), 187\u2013218 (2011). https:\/\/doi.org\/10.1007\/s13278-011-0021-0","journal-title":"Soc. Netw. Analys. 
Min."}],"container-title":["Lecture Notes in Computer Science","Algorithms for Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-21534-6_4","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T20:03:29Z","timestamp":1673985809000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-21534-6_4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783031215339","9783031215346"],"references-count":22,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-21534-6_4","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"18 January 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}}]}}