{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:43:39Z","timestamp":1760031819585,"version":"build-2065373602"},"reference-count":24,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,3,24]],"date-time":"2025-03-24T00:00:00Z","timestamp":1742774400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSERC Discovery Grants program","award":["RGPIN-2019-05166","RGPIN-2021-03935"],"award-info":[{"award-number":["RGPIN-2019-05166","RGPIN-2021-03935"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>High-Level Synthesis (HLS) tools have transformed FPGA development by streamlining digital design and enhancing efficiency. Meanwhile, advancements in semiconductor technology now support the integration of hundreds of floating-point units on a single chip, enabling more resource-intensive computations. CuFP, an HLS library, facilitates the creation of customized floating-point operators with configurable exponent and mantissa bit widths, providing greater flexibility and resource efficiency. This paper introduces the integration of the self-alignment technique (SAT) into the CuFP library, extending its capability for customized addition-related floating-point operations with enhanced precision and resource utilization. Our findings demonstrate that incorporating SAT into CuFP enables the efficient FPGA deployment of complex floating-point operators, achieving significant reductions in computational latency and improved resource efficiency. Specifically, for a vector size of 64, CuFPSAF reduces execution cycles by 29.4% compared to CuFP and by 81.5% compared to vendor IP while maintaining the same DSP utilization as CuFP and reducing it by 59.7% compared to vendor IP. These results highlight the efficiency of SAT in FPGA-based floating-point computations.<\/jats:p>","DOI":"10.3390\/computers14040118","type":"journal-article","created":{"date-parts":[[2025,3,24]],"date-time":"2025-03-24T12:01:50Z","timestamp":1742817710000},"page":"118","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing CuFP Library with Self-Alignment Technique"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8478-4388","authenticated-orcid":false,"given":"Fahimeh","family":"Hajizadeh","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, Polytechnique Montreal, Montreal, QC H3T 1J4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9000-5467","authenticated-orcid":false,"given":"Tarek","family":"Ould-Bachir","sequence":"additional","affiliation":[{"name":"MOTCE Laboratory, Department of Computer Engineering, Polytechnique Montr\u00e9al, Montreal, QC H3T 1J4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7707-0483","authenticated-orcid":false,"given":"Jean Pierre","family":"David","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Polytechnique Montreal, Montreal, QC H3T 1J4, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Xie, K., Lu, Q., Jiang, H., and Wang, H. (2025). Accurate Sum and Dot Product with New Instruction for High-Precision Computing on ARMv8 Processor. Mathematics, 13.","DOI":"10.3390\/math13020270"},{"key":"ref_2","unstructured":"Xilinx (2023). UG1399: Vitis High-Level Synthesis User Guide, Xilinx."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Hajizadeh, F., Ould-Bachir, T., and David, J.P. (2024). CuFP: An HLS Library for Customized Floating-Point Operators. Electronics, 13.","DOI":"10.20944\/preprints202406.1239.v1"},{"key":"ref_4","unstructured":"Sohn, J., and Swartzlander, E.E. (2013, January 7\u201310). Improved Architectures for a Floating-Point Fused Dot Product Unit. Proceedings of the 2013 IEEE 21st Symposium on Computer Arithmetic, Austin, TX, USA."},{"key":"ref_5","first-page":"1","article-title":"Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators","volume":"6","author":"David","year":"2013","journal-title":"Acm Trans. Reconfigurable Technol. Syst."},{"key":"ref_6","unstructured":"(2008). IEEE Standard for Floating-Point Arithmetic (Standard No. Std 754-2008)."},{"key":"ref_7","first-page":"579","article-title":"Novel architecture for floating point accumulator with cancelation error detection","volume":"66","author":"Jamro","year":"2023","journal-title":"Bull. Pol. Acad. Sci. Tech. Sci."},{"key":"ref_8","unstructured":"Perera, A., Nilsen, R., Haugan, T., and Ljokelsoy, K. (2021, January 3\u20137). A Design Method of an Embedded Real-Time Simulator for Electric Drives using Low-Cost System-on-Chip Platform. Proceedings of the PCIM Europe Digital Days 2021; International Exhibition and Conference for Power Electronics, Intelligent Motion, Renewable Energy and Energy Management, Virtual Event."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zamiri, E., Sanchez, A., Yushkova, M., Mart\u00ednez-Garc\u00eda, M.S., and de Castro, A. (2021). Comparison of Different Design Alternatives for Hardware-in-the-Loop of Power Converters. Electronics, 10.","DOI":"10.3390\/electronics10080926"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Sanchez, A., Todorovich, E., and De Castro, A. (2018). Exploring the Limits of Floating-Point Resolution for Hardware-In-the-Loop Implemented with FPGAs. Electronics, 7.","DOI":"10.3390\/electronics7100219"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mart\u00ednez-Garc\u00eda, M.S., de Castro, A., Sanchez, A., and Garrido, J. (2019). Analysis of Resolution in Feedback Signals for Hardware-in-the-Loop Models of Power Converters. Electronics, 8.","DOI":"10.3390\/electronics8121527"},{"key":"ref_12","unstructured":"Silvano, C., Pilato, C., and Reichenbach, M. (2023, January 2\u20136). TrueFloat: A Templatized Arithmetic Library for HLS Floating-Point Operators. Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece."},{"key":"ref_13","unstructured":"Thomas, D.B. (May, January 28). Templatised Soft Floating-Point for High-Level Synthesis. Proceedings of the IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), San Diego, CA, USA."},{"key":"ref_14","unstructured":"Gao, J., Shen, J., Zhang, Y., Ji, W., and Huang, H. (2024). Precision-Aware Iterative Algorithms Based on Group-Shared Exponents of Floating-Point Numbers. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Filippas, D., Nicopoulos, C., and Dimitrakopoulos, G. (2022). Templatized Fused Vector Floating-Point Dot Product for High-Level Synthesis. J. Low Power Electron. Appl., 12.","DOI":"10.3390\/jlpea12040056"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1109\/MDT.2011.44","article-title":"Designing Custom Arithmetic Data Paths with FloPoCo","volume":"28","author":"Pasca","year":"2011","journal-title":"IEEE Des. Test Comput."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1839480.1839486","article-title":"VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware","volume":"3","author":"Wang","year":"2010","journal-title":"ACM Trans. Reconfigurable Technol. Syst."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1109\/TC.2010.271","article-title":"FFT Implementation with Fused Floating-Point Operations","volume":"61","author":"Swartzlander","year":"2012","journal-title":"IEEE Trans. Comput."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ferrandi, F., Castellana, V.G., Curzel, S., Fezzardi, P., Fiorito, M., Lattuada, M., Minutoli, M., Pilato, C., and Tumeo, A. (2021, January 5\u20139). Invited: Bambu: An Open-Source Research Framework for the High-Level Synthesis of Complex Applications. Proceedings of the ACM\/IEEE Design Automation Conference (DAC), San Francisco, CA, USA.","DOI":"10.1109\/DAC18074.2021.9586110"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3377403","article-title":"Application-Specific Arithmetic in High-Level Synthesis Tools","volume":"17","author":"Uguen","year":"2020","journal-title":"ACM Trans. Archit. Code Optim."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1591","DOI":"10.1109\/TCAD.2015.2513673","article-title":"A Survey and Evaluation of FPGA High-Level Synthesis Tools","volume":"35","author":"Nane","year":"2016","journal-title":"IEEE Trans. Comput. Aided Des. Integr. Circuits Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1123","DOI":"10.1109\/TCAD.2022.3193646","article-title":"Leveraging Modern C++ in High-Level Synthesis","volume":"42","author":"Lahti","year":"2023","journal-title":"IEEE Trans. Comput. Aided Des. Integr. Circuits Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1109\/12.841125","article-title":"Accelerating pipelined integer and floating-point accumulations in configurable hardware with delayed addition techniques","volume":"49","author":"Luo","year":"2000","journal-title":"IEEE Trans. Comput."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2314","DOI":"10.1109\/JSSC.2006.881557","article-title":"A 6.2-GFlops Floating-Point Multiply-Accumulator With Conditional Normalization","volume":"41","author":"Vangal","year":"2006","journal-title":"IEEE J. Solid-State Circuits"}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/4\/118\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:59:15Z","timestamp":1760029155000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/4\/118"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,24]]},"references-count":24,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["computers14040118"],"URL":"https:\/\/doi.org\/10.3390\/computers14040118","relation":{},"ISSN":["2073-431X"],"issn-type":[{"type":"electronic","value":"2073-431X"}],"subject":[],"published":{"date-parts":[[2025,3,24]]}}}