{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:58:45Z","timestamp":1760151525091,"version":"build-2065373602"},"reference-count":51,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T00:00:00Z","timestamp":1648166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Electronics"],"abstract":"<jats:p>Many practical data-processing algorithms fail to execute efficiently on general-purpose CPUs (Central Processing Units) due to the sequential nature of their operations and memory bandwidth limitations. To achieve desired performance levels, reconfigurable (FPGA (Field-Programmable Gate Array)-based) hardware accelerators are frequently explored that permit the processing units\u2019 architectures to be better adapted to the specific problem\/algorithm requirements. In particular, network-based data-processing algorithms are very well suited to implementation in reconfigurable hardware because several data-independent operations can easily and naturally be executed in parallel over as many processing blocks as actually required and technically possible. GPUs (Graphics Processing Units) have also demonstrated good results in this area but they tend to use significantly more power than FPGAs, which could be a limiting factor in embedded applications. Moreover, GPUs employ a Single Instruction, Multiple Threads (SIMT) execution model and are therefore optimized for SIMD (Single Instruction, Multiple Data) operations, while in FPGAs fully custom datapaths can be built, eliminating much of the control overhead. This review paper aims to analyze, compare, and discuss different approaches to implementing network-based hardware accelerators in FPGAs and programmable SoCs (Systems-on-Chip). 
The performed analysis and the derived recommendations would be useful to hardware designers of future network-based hardware accelerators.<\/jats:p>","DOI":"10.3390\/electronics11071029","type":"journal-article","created":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T15:31:21Z","timestamp":1648222281000},"page":"1029","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["A Survey of Network-Based Hardware Accelerators"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6684-9416","authenticated-orcid":false,"given":"Iouliia","family":"Skliarova","sequence":"first","affiliation":[{"name":"Institute of Electronics and Informatics Engineering of Aveiro (IEETA), Department of Electronics, Telecommunications and Informatics, University of Aveiro, Campus Universit\u00e1rio de Santiago, 3810-193 Aveiro, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,25]]},"reference":[{"key":"ref_1","unstructured":"Oak Ridge National Laboratory (2022, January 08). SUMMIT Oak Ridge National Laboratory\u2019s 200 Petaflop Supercomputer, Available online: https:\/\/www.olcf.ornl.gov\/olcf-resources\/compute-systems\/summit\/."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"072001","DOI":"10.1007\/s11432-016-5588-7","article-title":"The Sunway TaihuLight supercomputer: System and applications","volume":"59","author":"Fu","year":"2019","journal-title":"Sci. China Inf. Sci."},{"key":"ref_3","unstructured":"Fujitsu (2022, January 08). Supercomputer Fugaku Specifications. Available online: https:\/\/www.fujitsu.com\/global\/about\/innovation\/fugaku\/specifications\/."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1016\/j.micpro.2019.05.012","article-title":"Constraint programming in embedded systems design: Considered helpful","volume":"69","author":"Kuchcinski","year":"2019","journal-title":"Microprocess. 
Microsyst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Rodr\u00edguez, A., Valverde, J., Portilla, J., Otero, A., Riesgo, T., and De la Torre, E. (2018). FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo3 Framework. Sensors, 18.","DOI":"10.3390\/s18061877"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"e6055","DOI":"10.1002\/cpe.6055","article-title":"A high-performance FPGA-based multicrossbar prioritized network-on-chip","volume":"33","author":"Alaei","year":"2021","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Podobas, A., Zohouri, H.R., Maruyama, N., and Matsuoka, S. (2017, January 4\u20138). Evaluating high-level design strategies on FPGAs for high-performance computing. Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium.","DOI":"10.23919\/FPL.2017.8056760"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1007\/s11740-020-00964-x","article-title":"Data acquisition and control at the edge: A hardware\/software-reconfigurable approach","volume":"14","author":"Streit","year":"2020","journal-title":"Prod. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Vanderbauwhede, W., and Benkrid, K. (2013). High-Performance Computing Using FPGAs, Springer.","DOI":"10.1007\/978-1-4614-1791-0"},{"key":"ref_10","unstructured":"Zohouri, H.R. (2018). High Performance Computing with FPGAs and OpenCL. [Ph.D. Thesis, Tokyo Institute of Technology]. Available online: https:\/\/arxiv.org\/ftp\/arxiv\/papers\/1810\/1810.09773.pdf."},{"key":"ref_11","unstructured":"Xiong, Q. (2019). FPGA Acceleration of High Performance Computing Communication Middleware. [Ph.D. Thesis, Boston University]. 
Available online: https:\/\/open.bu.edu\/handle\/2144\/38211."},{"key":"ref_12","first-page":"1","article-title":"Real-time high definition license plate localization and recognition accelerator for IoT endpoint system on chip","volume":"25","author":"Huang","year":"2022","journal-title":"J. Appl. Sci. Eng."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1109\/TPDS.2021.3124125","article-title":"FARNN: FPGA-GPU Hybrid Acceleration Platform for Recurrent Neural Networks","volume":"33","author":"Cho","year":"2022","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1109\/TPDS.2021.3104257","article-title":"EXA2PRO: A Framework for High Development Productivity on Heterogeneous Computing Systems","volume":"33","author":"Papadopoulos","year":"2022","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"2944","DOI":"10.1109\/TIP.2014.2311656","article-title":"A distributed canny edge detector: Algorithm and FPGA implementation","volume":"23","author":"Xu","year":"2015","journal-title":"IEEE Trans. Image Process."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"1861","DOI":"10.1109\/TVLSI.2019.2905242","article-title":"A high-throughput and power-efficient FPGA implementation of yolo CNN for object detection","volume":"27","author":"Nguyen","year":"2019","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1109","DOI":"10.1007\/s00521-018-3761-1","article-title":"A survey of FPGA-based accelerators for convolutional neural networks","volume":"32","author":"Mittal","year":"2020","journal-title":"Neural Comput. 
Appl."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3079758","article-title":"Throughput-optimized FPGA accelerator for deep convolutional neural networks","volume":"10","author":"Liu","year":"2017","journal-title":"ACM Trans. Reconfig. Technol. Syst."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1038\/s41928-018-0057-5","article-title":"High-performance parallel computing for next-generation holographic imaging","volume":"1","author":"Sugie","year":"2018","journal-title":"Nat. Electron."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1109\/JPROC.2018.2802438","article-title":"Onboard Processing with Hybrid and Reconfigurable Computing on Small Satellites","volume":"106","author":"George","year":"2018","journal-title":"Proc. IEEE"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Seng, K.P., Lee, P.J., and Ang, L.M. (2021). Embedded intelligence on FPGA: Survey, applications and challenges. Electronics, 10.","DOI":"10.3390\/electronics10080895"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1109\/MCAS.2021.3071609","article-title":"A Survey of FPGA-Based Robotic Computing","volume":"21","author":"Wan","year":"2021","journal-title":"IEEE Circuits Syst. Mag."},{"key":"ref_23","unstructured":"Knuth, D.E. (2011). The Art of Computer Programming. Sorting and Searching, Addison-Wesley. [3rd ed.]."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"3430","DOI":"10.1109\/TCSI.2008.924892","article-title":"Algorithms of Finding the First Two Minimum Values and Their Hardware Implementation","volume":"55","author":"Wey","year":"2008","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Skliarova, I., and Sklyarov, V. (2019). 
FPGA-Based Hardware Accelerators, Springer.","DOI":"10.1007\/978-3-030-20721-2"},{"key":"ref_26","first-page":"557","article-title":"Design and implementation of counting networks","volume":"97","author":"Sklyarov","year":"2015","journal-title":"Comput. J."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00778-011-0232-z","article-title":"Sorting Networks on FPGAs","volume":"21","author":"Mueller","year":"2012","journal-title":"Int. J. Very Large Data Bases"},{"key":"ref_28","unstructured":"Mueller, R. (2010). Data Stream Processing on Embedded Devices. [Ph.D. Thesis, ETH]."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zuluaga, M., Milder, P., and Puschel, M. (2012, January 3\u20137). Computer Generation of Streaming Sorting Networks. Proceedings of the 49th Design Automation Conference, San Francisco, CA, USA.","DOI":"10.1145\/2228360.2228588"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"47","DOI":"10.4316\/AECE.2013.04008","article-title":"Fast Regular Circuits for Network-based Parallel Data Processing","volume":"13","author":"Sklyarov","year":"2013","journal-title":"Adv. Electr. Comput. Eng."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1016\/j.micpro.2014.03.003","article-title":"High-performance implementation of regular and easily scalable sorting networks on an FPGA","volume":"38","author":"Sklyarov","year":"2014","journal-title":"Microprocess. Microsyst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"323","DOI":"10.3176\/proc.2017.3.07","article-title":"Fast Iterative Circuits and RAM-based Mergers to Accelerate Data Sort in Software\/Hardware Systems","volume":"66","author":"Sklyarov","year":"2017","journal-title":"Proc. Est. Acad. 
Sci."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1471","DOI":"10.1109\/TVLSI.2018.2822300","article-title":"Low-Cost Sorting Network Circuits Using Unary Processing","volume":"26","author":"Najafi","year":"2018","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1601","DOI":"10.1109\/TVLSI.2019.2912554","article-title":"RTHS: A Low-Cost High-Performance Real-Time Hardware Sorter, Using a Multidimensional Sorting Algorithm","volume":"27","author":"Norollah","year":"2019","journal-title":"IEEE Trans. Very Large Scale Integr. (VLSI) Syst."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Srivastava, A., Chen, R., Prasanna, V.K., and Chelmis, C. (2015, January 7\u20139). A hybrid design for high performance large-scale sorting on FPGA. Proceedings of the 2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig), Riviera Maya, Mexico.","DOI":"10.1109\/ReConFig.2015.7393322"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Ricco, M., Mathe, L., Monmasson, E., and Teodorescu, R. (2018). FPGA-Based Implementation of MMC Control Based on Sorting Networks. Energies, 11.","DOI":"10.3390\/en11092394"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Mendoza, I.L., Pizano Escalante, J.L., Gonz\u00e1lez, J.C., and Longoria G\u00e1ndara, O.H. (2019, January 5\u20137). Implementation of a parameterizable sorting network for spatial modulation detection on FPGA. Proceedings of the 2019 IEEE Colombian Conference on Communications and Computing (COLCOM), Barranquilla, Colombia.","DOI":"10.1109\/ColComCon.2019.8809112"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Ayoubi, R., Istambouli, S., Abbas, A.W., and Akkad, G. (2019, January 3\u20135). Hardware Architecture For A Shift-Based Parallel Odd-Even Transposition Sorting Network. 
Proceedings of the 2019 Fourth International Conference on Advances in Computational Tools for Engineering Applications (ACTEA), Beirut, Lebanon.","DOI":"10.1109\/ACTEA.2019.8851099"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chen, R., Siriyal, S., and Prasanna, V. (2015, January 22\u201324). Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA. Proceedings of the 2015 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/2684746.2689068"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Farmahini-Farahani, A. (2012). Modular Design of High-Throughput, Low-Latency Sorting Units. [Master\u2019s Thesis, University of Wisconsin\u2013Madison].","DOI":"10.1109\/TC.2012.108"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Tzimpragos, G., Kachris, C., Soudris, D., and Tomkos, I. (2014, January 19\u201323). A Low-Latency Algorithm and FPGA Design for the Min-Search of LDPC Decoders. Proceedings of the IEEE International Parallel & Distributed Processing Symposium Workshop\u2014IPDPSW\u20192014, Phoenix, AZ, USA.","DOI":"10.1109\/IPDPSW.2014.36"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Skliarova, I. (2021). Accelerating Population Count with a Hardware Co-Processor for MicroBlaze. J. Low Power Electron. Appl., 11.","DOI":"10.3390\/jlpea11020020"},{"key":"ref_43","unstructured":"Pedroni, V. (2004, January 23\u201326). Compact Hamming-comparator-based rank order filter for digital VLSI and FPGA implementations. 
Proceedings of the IEEE International Symposium on Circuits and Systems\u2014ISCAS\u20192004, Vancouver, BC, Canada."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1049\/el:20070141","article-title":"Efficient Hamming weight comparators of binary vectors","volume":"43","author":"Piestrak","year":"2007","journal-title":"Electron Lett."},{"key":"ref_45","first-page":"167","article-title":"Efficient Hamming weight comparators for binary vectors based on accumulative and up\/down parallel counters","volume":"56","author":"Parhami","year":"2009","journal-title":"IEEE Trans. Circuits Syst. II Express Briefs"},{"key":"ref_46","first-page":"4825","article-title":"Digital Hamming weight and distance analyzers for binary vectors and matrices","volume":"9","author":"Sklyarov","year":"2013","journal-title":"Int. J. Innov. Comput. Inf. Control"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"8972065","DOI":"10.1155\/2016\/8972065","article-title":"On-chip reconfigurable hardware accelerators for popcount computations","volume":"2016","author":"Sklyarov","year":"2016","journal-title":"Int. J. Reconfig. Comput."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Pilz, S., Porrmann, F., Kaiser, M., Hagemeyer, J., Hogan, J.M., and R\u00fcckert, U. (2020). Accelerating Binary String Comparisons with a Scalable, Streaming-Based System Architecture Based on FPGAs. Algorithms, 13.","DOI":"10.3390\/a13020047"},{"key":"ref_49","first-page":"1","article-title":"Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing","volume":"12","author":"Umuroglu","year":"2019","journal-title":"ACM Trans. Reconfig. Technol. Syst."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Rasoulinezhad, S., Zhou, H., Wang, L., Boland, D., and Leong, P.H.W. (2020, January 26\u201328). LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations. 
Proceedings of the 2020 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3373087.3375303"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1003","DOI":"10.1587\/transinf.2016EDP7383","article-title":"A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism","volume":"100","author":"Kobayashi","year":"2017","journal-title":"IEICE Trans. Inf. Syst."}],"container-title":["Electronics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-9292\/11\/7\/1029\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:43:00Z","timestamp":1760136180000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-9292\/11\/7\/1029"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,25]]},"references-count":51,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["electronics11071029"],"URL":"https:\/\/doi.org\/10.3390\/electronics11071029","relation":{},"ISSN":["2079-9292"],"issn-type":[{"type":"electronic","value":"2079-9292"}],"subject":[],"published":{"date-parts":[[2022,3,25]]}}}