{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,15]],"date-time":"2026-01-15T22:45:50Z","timestamp":1768517150850,"version":"3.49.0"},"reference-count":35,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2021,4,24]],"date-time":"2021-04-24T00:00:00Z","timestamp":1619222400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["JLPEA"],"abstract":"<jats:p>This paper proposes a Field-Programmable Gate Array (FPGA)-based hardware accelerator for assisting the embedded MicroBlaze soft-core processor in calculating population count. The population count is frequently required to be executed in cyber-physical systems and can be applied to large data sets, such as in the case of molecular similarity search in cheminformatics, or assisting with computations performed by binarized neural networks. The MicroBlaze instruction set architecture (ISA) does not support this operation natively, so the count has to be realized as either a sequence of native instructions (in software) or in parallel in a dedicated hardware accelerator. Different hardware accelerator architectures are analyzed and compared to one another and to implementing the population count operation in MicroBlaze. The achieved experimental results with large vector lengths (up to 2<jats:sup>17<\/jats:sup>) demonstrate that the best hardware accelerator with DMA (Direct Memory Access) is ~31 times faster than the best software version running on MicroBlaze. The proposed architectures are scalable and can easily be adjusted to both smaller and bigger input vector lengths. 
The entire system was implemented and tested on a Nexys-4 prototyping board containing a low-cost\/low-power Artix-7 FPGA.<\/jats:p>","DOI":"10.3390\/jlpea11020020","type":"journal-article","created":{"date-parts":[[2021,4,25]],"date-time":"2021-04-25T02:12:57Z","timestamp":1619316777000},"page":"20","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Accelerating Population Count with a Hardware Co-Processor for MicroBlaze"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6684-9416","authenticated-orcid":false,"given":"Iouliia","family":"Skliarova","sequence":"first","affiliation":[{"name":"Department of Electronics, Telecommunications and Informatics, Institute of Electronics and Informatics Engineering of Aveiro (IEETA), Campus Universit\u00e1rio de Santiago, University of Aveiro, 3810-193 Aveiro, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,24]]},"reference":[{"key":"ref_1","first-page":"341","article-title":"An overview and some challenges in cyber-physical systems","volume":"93","author":"Kim","year":"2013","journal-title":"J. Indian Inst. Sci."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/s10270-015-0469-x","article-title":"Cyber-physical systems challenges: A needs analysis for collaborating embedded software systems","volume":"15","author":"Mosterman","year":"2016","journal-title":"Softw. Syst. Model"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Rodr\u00edguez, A., Valverde, J., Portilla, J., Otero, A., Riesgo, T., and de la Torre, E. (2018). FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo3 Framework. Sensors, 18.","DOI":"10.3390\/s18061877"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Qasaimeh, M., Denolf, K., Vissers, J.L.K., Zambreno, J., and Jones, P.H. (2019, January 2\u20133). 
Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels. Proceedings of the 2019 IEEE International Conference on Embedded Software and Systems (ICESS), Las Vegas, NV, USA.","DOI":"10.1109\/ICESS.2019.8782524"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Hong, T., Kang, Y., and Chung, J. (2020). InSight: An FPGA-Based Neuromorphic Computing System for Deep Neural Networks. J. Low Power Electron. Appl., 10.","DOI":"10.3390\/jlpea10040036"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Spagnolo, F., Perri, S., Frustaci, F., and Corsonello, P. (2020). Energy-Efficient Architecture for CNNs Inference on Heterogeneous FPGA. J. Low Power Electron. Appl., 10.","DOI":"10.3390\/jlpea10010001"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Sarwar, I., Turvani, G., Casu, M.R., Tobon, J.A., Vipiana, F., Scapaticci, R., and Crocco, L. (2018). Low-Cost Low-Power Acceleration of a Microwave Imaging Algorithm for Brain Stroke Monitoring. J. Low Power Electron. Appl., 8.","DOI":"10.3390\/jlpea8040043"},{"key":"ref_8","unstructured":"Intel Corp (2021, March 14). Intel\u00ae 64 and IA-32 Architectures Software Developer\u2019s Manual, Volume 2 (2A, 2B, 2C & 2D): Instruction Set Reference, A\u2013Z. Available online: https:\/\/www.intel.com\/content\/dam\/www\/public\/us\/en\/documents\/manuals\/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf."},{"key":"ref_9","unstructured":"(2021, March 14). Arm, Lda., Arm Armv8-A A32\/T32 Instruction Set Architecture. Available online: https:\/\/developer.arm.com\/documentation\/ddi0597\/2020-12\/SIMD-FP-Instructions\/VCNT\u2014Vector-Count-Set-Bits-?lang=en."},{"key":"ref_10","unstructured":"Xilinx, Inc. (2021, March 14). MicroBlaze Processor Reference Guide. UG081 (v9.0). 
Available online: https:\/\/www.xilinx.com\/support\/documentation\/sw_manuals\/mb_ref_guide.pdf."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Nurvitadhi, E., Sheffield, D., Sim, J., Mishra, A., Venkatesh, G., and Marr, D. (2016, January 7\u20139). Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC. Proceedings of the 2016 International Conference on Field-Programmable Technology (FPT), Xi\u2019an, China.","DOI":"10.1109\/FPT.2016.7929192"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Kim, J.H., Lee, J., and Anderson, J.H. (2018, January 10\u201314). FPGA Architecture Enhancements for Efficient BNN Implementation. Proceedings of the 2018 International Conference on Field-Programmable Technology (FPT), Naha, Japan.","DOI":"10.1109\/FPT.2018.00039"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3064","DOI":"10.1109\/TCSI.2019.2907488","article-title":"Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays","volume":"66","author":"Agrawal","year":"2019","journal-title":"IEEE Trans. Circuits Syst. I Regul. Pap."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Huang, C.H., Chen, P.J., Lin, Y.J., Chen, B.W., and Zheng, J.X. (2021). A robot-based intelligent management design for agricultural cyber-physical systems. Comput. Electron. Agric., 181.","DOI":"10.1016\/j.compag.2020.105967"},{"key":"ref_15","unstructured":"Schanck, J. (2020). Improving Post-Quantum Cryptography through Cryptanalysis. [Ph.D. Thesis, University of Waterloo]. Available online: https:\/\/uwspace.uwaterloo.ca\/bitstream\/handle\/10012\/16060\/Schanck_John.pdf?sequence=3&isAllowed=y."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"218","DOI":"10.1515\/jmc-2019-0027","article-title":"Improved cryptanalysis of the AJPS Mersenne based cryptosystem","volume":"14","author":"Coron","year":"2020","journal-title":"J. Math. 
Cryptol."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Mitchell, R., and Chen, I.R. (2014). A Survey of Intrusion Detection Techniques for Cyber-Physical Systems. ACM Comput. Surv., 55.","DOI":"10.1145\/2542049"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"220","DOI":"10.11648\/j.pamj.20160506.17","article-title":"Error Detection and Correction Using Hamming and Cyclic Codes in a Communication Channel","volume":"5","author":"John","year":"2016","journal-title":"Pure Appl. Math. J."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"76","DOI":"10.1186\/s13321-019-0398-8","article-title":"The chemfp project","volume":"11","author":"Dalke","year":"2019","journal-title":"J. Cheminform."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1093\/bioinformatics\/btw038","article-title":"ParDRe: Faster parallel duplicated reads removal tool for sequencing studies","volume":"32","author":"Schmidt","year":"2016","journal-title":"Bioinformatics"},{"key":"ref_21","unstructured":"Anderson, S.E. (2021, March 14). Bit Twiddling Hacks. Available online: http:\/\/graphics.stanford.edu\/~seander\/bithacks.html#CountBitsSetTable."},{"key":"ref_22","first-page":"8972065","article-title":"On-chip reconfigurable hardware accelerators for popcount computations","volume":"2016","author":"Sklyarov","year":"2016","journal-title":"Int. J. Reconfig. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"63","DOI":"10.4316\/AECE.2014.02011","article-title":"Hamming Weight Counters and Comparators based on Embedded DSP Blocks for Implementation in FPGA","volume":"14","author":"Sklyarov","year":"2014","journal-title":"Adv. Electr. Comput. Eng."},{"key":"ref_24","first-page":"167","article-title":"Efficient Hamming weight comparators for binary vectors based on accumulative and up\/down parallel counters","volume":"56","author":"Parhami","year":"2009","journal-title":"IEEE Trans. Circuits Syst. 
II Express Briefs"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1049\/el:20070141","article-title":"Efficient Hamming weight comparators of binary vectors","volume":"43","author":"Piestrak","year":"2007","journal-title":"Electron. Lett."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1007\/s00607-013-0360-y","article-title":"Design and implementation of counting networks","volume":"97","author":"Sklyarov","year":"2015","journal-title":"Computing"},{"key":"ref_27","first-page":"1","article-title":"Beating the Popcount","volume":"9","year":"2003","journal-title":"Int. J. Inf. Technol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1007\/s11265-014-0915-y","article-title":"Multi-core DSP-based vector set bits counters\/comparators","volume":"80","author":"Sklyarov","year":"2015","journal-title":"J. Signal. Process. Syst."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Sklyarov, V., Skliarova, I., Barkalov, A., and Titarenko, L. (2014). Synthesis and Optimization of FPGA-Based Systems, Springer.","DOI":"10.1007\/978-3-319-04708-9"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Pilz, S., Porrmann, F., Kaiser, M., Hagemeyer, J., Hogan, J.M., and R\u00fcckert, U. (2020). Accelerating Binary String Comparisons with a Scalable, Streaming-Based System Architecture Based on FPGAs. Algorithms, 13.","DOI":"10.3390\/a13020047"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3337929","article-title":"Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing","volume":"12","author":"Umuroglu","year":"2019","journal-title":"ACM Trans. Reconfig. Technol. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rasoulinezhad, S., Zhou, H., Wang, L., Boland, D., and Leong, P.H.W. (2020, January 26\u201328). LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations. 
Proceedings of the 2020 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA.","DOI":"10.1145\/3373087.3375303"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Preu\u00dfer, T.B. (2017, January 4\u20138). Generic and Universal Parallel Matrix Summation with a Flexible Compression Goal for Xilinx FPGAs. Proceedings of the 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium.","DOI":"10.23919\/FPL.2017.8056834"},{"key":"ref_34","unstructured":"(2021, March 21). Xilinx, Inc. 7 Series FPGAs Data Sheet: Overview. Available online: https:\/\/www.xilinx.com\/support\/documentation\/data_sheets\/ds180_7Series_Overview.pdf."},{"key":"ref_35","unstructured":"(2021, March 21). Digilent, Nexys 4 Reference Manual. Available online: https:\/\/reference.digilentinc.com\/reference\/programmable-logic\/nexys-4\/reference-manual."}],"container-title":["Journal of Low Power Electronics and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-9268\/11\/2\/20\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:52:15Z","timestamp":1760161935000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-9268\/11\/2\/20"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,24]]},"references-count":35,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["jlpea11020020"],"URL":"https:\/\/doi.org\/10.3390\/jlpea11020020","relation":{},"ISSN":["2079-9268"],"issn-type":[{"value":"2079-9268","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,24]]}}}