{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:21:30Z","timestamp":1750306890817,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,1,1]],"date-time":"2013-01-01T00:00:00Z","timestamp":1356998400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,1]]},"abstract":"<jats:p>The key to enabling widespread use of FPGAs for algorithm acceleration is to allow programmers to create efficient designs without the time-consuming hardware design process. Programmers are used to developing scientific and mathematical algorithms in high-level languages (C\/C++) using floating point data types. Although easy to implement, the dynamic range provided by floating point is not necessary in many applications; more efficient implementations can be realized using fixed point arithmetic. While this topic has been studied previously [Han et al. 2006; Olson et al. 1999; Gaffar et al. 2004; Aamodt and Chow 1999], the degree of full automation has always been lacking. We present a novel design flow for cases where FPGAs are used to offload computations from a microprocessor. Our LLVM-based algorithm inserts value profiling code into an unmodified C\/C++ application to guide its automatic conversion to fixed point. This allows for fast and accurate design space exploration on a host microprocessor before any accelerators are mapped to the FPGA. Through experimental results, we demonstrate that fixed-point conversion can yield resource savings of up to 2x--3x. 
Embedded RAM usage is minimized, and 13%--22% higher <jats:italic>F<jats:sub>max<\/jats:sub><\/jats:italic> than the original floating-point implementation is observed. In a case study, we show that a 17% reduction in logic and a 24% reduction in register usage can be realized by using our algorithm in conjunction with a High-Level Synthesis (HLS) tool.<\/jats:p>","DOI":"10.1145\/2400682.2400702","type":"journal-article","created":{"date-parts":[[2013,1,22]],"date-time":"2013-01-22T15:28:56Z","timestamp":1358868536000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Profile-guided floating- to fixed-point conversion for hybrid FPGA-processor applications"],"prefix":"10.1145","volume":"9","author":[{"given":"Doris","family":"Chen","sequence":"first","affiliation":[{"name":"Altera Corporation"}]},{"given":"Deshanand","family":"Singh","sequence":"additional","affiliation":[{"name":"Altera Corporation"}]}],"member":"320","published-online":{"date-parts":[[2013,1,20]]},"reference":[{"volume-title":"Proceedings of the 1st Workshop on Media Processors and Digital Signal Processing.","author":"Aamodt T.","unstructured":"Aamodt, T. and Chow, P. 1999. Numerical error minimizing floating-point to fixed-point ANSI C compilation. In Proceedings of the 1st Workshop on Media Processors and Digital Signal Processing.","key":"e_1_2_1_1_1"},{"unstructured":"Altera. 2010. Quartus II Handbook v11.0. 
http:\/\/www.altera.com\/literature\/archives\/lit-archive-index.jsp?doctype=Handbooks&prodCat=Quartus%%20II%20Software","key":"e_1_2_1_2_1"},{"unstructured":"Balough C. 2011. Strategic considerations for emerging SoC FPGA's. Altera whitepaper.","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of the 12th International Conference on Field-Programmable Logic and Applications. Springer, 657--666","author":"Belanovic P.","unstructured":"Belanovic, P. and Lesser, M. 2002. A library of parameterized floating-point modules and their use. In Proceedings of the 12th International Conference on Field-Programmable Logic and Applications. Springer, 657--666.","key":"e_1_2_1_4_1"},{"volume-title":"Proceedings of the Workshop on Reconfigurable Computing at HiPEAC.","author":"Brown A. W.","unstructured":"Brown, A. W., Kelly, P. H., and Luk, W. 2008. Profile-Directed speculative optimization of reconfigurable floating point data paths. In Proceedings of the Workshop on Reconfigurable Computing at HiPEAC.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Compiler Construction (CC'99)","volume":"1575","author":"Cilio A. G. M.","unstructured":"
Cilio, A. G. M. and Corporaal, H. 1999. Floating point to fixed point conversion of c code. In Proceedings of the International Conference on Compiler Construction (CC'99). Lecture Notes in Computer Science, vol. 1575. Springer, 229--243."},{"doi-asserted-by":"crossref","unstructured":"Cope B. Cheung P. Y. K. Luk W. and Witt S. 2005. Have GPU's made FPGA's redundant in the field of video processing? In Proceedings of the IEEE International Conference on Field Programmable Technology (FPT'05). 111--118.","key":"e_1_2_1_7_1","DOI":"10.1109\/FPT.2005.1568533"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 22nd International Conference on Field-Programmable Logic and Applications.","author":"Czajkowski T.","year":"2012","unstructured":"Czajkowski, T., Aydonat, U., Denisenko, D., Freeman, J., Kinser, M., et al. 2012. From openCL to high-performance hardware on FPGA's. In Proceedings of the 22nd International Conference on Field-Programmable Logic and Applications."},{"doi-asserted-by":"publisher","key":"e_1_2_1_9_1","DOI":"10.1145\/1344671.1344717"},{"doi-asserted-by":"publisher","key":"e_1_2_1_10_1","DOI":"10.1145\/2145694.2145704"},{"volume-title":"Proceedings of the Annual International IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'04)","author":"Gaffar A. A.","unstructured":"Gaffar, A. A., Mencer, O., Luk, W., and Cheung, P. Y. K. 2004. Unifying bit-width optimization for fixed-point and floating-point designs. 
In Proceedings of the Annual International IEEE Symposium on Field Programmable Custom Computing Machines (FCCM'04). 79--88.","key":"e_1_2_1_11_1"},{"volume-title":"Proceeding of the 40th ASILOMAR Conference on Signals, Systems and Computers. 79--83","author":"Han K.","unstructured":"Han, K., Olson, A. G., and Evans, B. L. 2006. Automatic floating-point to fixed-point transformations. In Proceeding of the 40th ASILOMAR Conference on Signals, Systems and Computers. 79--83.","key":"e_1_2_1_12_1"},{"volume-title":"Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems. 3--12","author":"Jacobson H. K.","unstructured":"Jacobson, H. K., Kudva, P. N., Bose, P., Cook, P. W. and Schuster, S. E. 2002. Synchronous interlocked pipelines. In Proceedings of the International Symposium on Advanced Research in Asynchronous Circuits and Systems. 3--12.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA).","author":"Johnson J.","unstructured":"Johnson, J., Chagnon, T., Vachranukunkiet, P., Nagvajara, P., and Nwankpa, C. 2008. Sparse LU decomposition using FPGA's. 
In Proceedings of the International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA).","key":"e_1_2_1_14_1"},{"unstructured":"Khronos OpenCL Working Group. 2008. The openCL specification version 1.0.29. http:\/\/www.khronos.org\/registry\/cl\/specs\/opencl-1.0.29.pdf","key":"e_1_2_1_15_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_16_1","DOI":"10.1109\/ARITH.2011.32"},{"unstructured":"Lattner C. 2007 New LLVM C front-end: \u201cclang\u201d. In cfe-dev mailing list. http:\/\/lists.cs.uiuc.edu\/pipermail\/llvmdev\/2007-July\/009817.html","key":"e_1_2_1_17_1"},{"volume-title":"Proceedings of the International Symposium on Code Generation and Optimization (CGO'04)","author":"Lattner C.","unstructured":"Lattner, C. and Adve, V. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'04).","key":"e_1_2_1_18_1"},{"volume-title":"Multimedia Analysis, Processing and Communications. Studies in Computational Intelligence","author":"Lin W.","unstructured":"
Lin, W., Tao, D., Kacprzyk, J., Li, Z., Izquiedo, E., et al. 2011. Multimedia Analysis, Processing and Communications. Studies in Computational Intelligence. Springer.","key":"e_1_2_1_19_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_20_1","DOI":"10.1145\/1513895.1513905"},{"unstructured":"NVIDIA. 2010. NVIDIA openCL SDK code samples. https:\/\/developer.nvidia.com\/opencl","key":"e_1_2_1_21_1"},{"volume-title":"Proceedings of the 17th IEEE Nordic Micro Electronics Event (NORDIC'99)","author":"Olson H.","unstructured":"Olson, H., Jantsch, A., and Tenhunen, H. 1999. Floating-To fixed-point refinement in mathlab with an object-oriented library. In Proceedings of the 17th IEEE Nordic Micro Electronics Event (NORDIC'99).","key":"e_1_2_1_22_1"},{"unstructured":"Singh D. 2011. Implementing FPGA design with the OpenCL standard. Altera whitepaper.","key":"e_1_2_1_23_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_24_1","DOI":"10.1145\/351397.351408"},{"unstructured":"Strickland M. and Langhammer M. 2008. FPGA coprocessing evolution: Sustained performance approaches. Altera whitepaper.","key":"e_1_2_1_25_1"},{"unstructured":"Verma S. and McElheny P. 2012. Introducing innovations at 28 nm to move beyond Moore's law. 
Altera whitepaper.","key":"e_1_2_1_26_1"},{"doi-asserted-by":"publisher","key":"e_1_2_1_27_1","DOI":"10.1109\/JSTSP.2012.2215007"},{"doi-asserted-by":"publisher","key":"e_1_2_1_28_1","DOI":"10.1109\/TBC.2008.2000733"},{"unstructured":"Xilinx. 2011. Zynq 7000 EPP product brief. http:\/\/press.xilinx.com\/phoenix.zhtml?c=212763&p=irol-newsArticle&ID=1678259&highlight=","key":"e_1_2_1_29_1"},{"volume-title":"Proceedings of the International Conference on Field-Programmable Technology.","author":"Zhang Y.","unstructured":"Zhang, Y., Shalabi, Y. H., Jain, R., Nagar, K. K. and Bakos, J. D. 2009. FPGA vs GPU for sparse matrix vector multiply. In Proceedings of the International Conference on Field-Programmable Technology.","key":"e_1_2_1_30_1"}],"container-title":["ACM Transactions on Architecture and Code 
Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2400682.2400702","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2400682.2400702","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T08:18:52Z","timestamp":1750234732000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2400682.2400702"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,1]]},"references-count":30,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,1]]}},"alternative-id":["10.1145\/2400682.2400702"],"URL":"https:\/\/doi.org\/10.1145\/2400682.2400702","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,1]]},"assertion":[{"value":"2012-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-01-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}