{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T22:01:09Z","timestamp":1777500069405,"version":"3.51.4"},"reference-count":46,"publisher":"IOP Publishing","issue":"1","license":[{"start":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T00:00:00Z","timestamp":1737331200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,20]],"date-time":"2025-01-20T00:00:00Z","timestamp":1737331200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"Canada Research Chair in Real-Time Intelligence Embedded for High-Speed Sensors"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Implementing machine learning (ML) models on field-programmable gate arrays (FPGAs) is becoming increasingly popular across various domains as a low-latency and low-power solution that helps manage large data rates generated by continuously improving detectors. However, developing ML models for FPGAs is time-consuming, as optimization requires synthesis to evaluate FPGA area and latency, making the process slow and repetitive. This paper introduces a novel method to predict the resource utilization and inference latency of neural networks (NNs) before their synthesis and implementation on FPGA. We leverage HLS4ML, a tool-flow that helps translate NNs into high-level synthesis (HLS) code, to synthesize a diverse dataset of NN architectures and train resource utilization and inference latency predictors. 
While HLS4ML requires full synthesis to obtain resource and latency insights, our method uses trained regression models for immediate pre-synthesis predictions. The prediction models estimate the usage of block RAM, digital signal processors, flip-flops, and look-up tables, as well as the inference clock cycles. The predictors were evaluated on both synthetic and existing benchmark architectures and demonstrated high accuracy with <jats:italic>R<\/jats:italic>\n                  <jats:sup>2<\/jats:sup> scores ranging between 0.8 and 0.98 on the validation set and sMAPE values between 10% and 30%. Overall, our approach provides valuable preliminary insights, enabling users to quickly assess the feasibility and efficiency of NNs on FPGAs, accelerating the development and deployment processes. The open-source repository can be found at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/IMPETUS-UdeS\/rule4ml\">https:\/\/github.com\/IMPETUS-UdeS\/rule4ml<\/jats:ext-link>, while the datasets are publicly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/borealisdata.ca\/dataverse\/rule4ml\">https:\/\/borealisdata.ca\/dataverse\/rule4ml<\/jats:ext-link>.<\/jats:p>","DOI":"10.1088\/2632-2153\/ada71c","type":"journal-article","created":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T23:01:18Z","timestamp":1736290878000},"page":"015009","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["rule4ml: an open-source tool for resource utilization and latency estimation for ML models on FPGA"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6582-8322","authenticated-orcid":true,"given":"Mohammad 
Mehdi","family":"Rahimifar","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0352-725X","authenticated-orcid":true,"given":"Hamza Ezzaoui","family":"Rahali","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6698-8400","authenticated-orcid":true,"given":"Audrey C","family":"Therrien","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"266","published-online":{"date-parts":[[2025,1,20]]},"reference":[{"key":"mlstada71cbib1","doi-asserted-by":"publisher","DOI":"10.1016\/j.nima.2023.168829","article-title":"Efficient compression at the edge for real-time data acquisition in a billion-pixel x-ray camera","volume":"1058","author":"Rahali","year":"2024","journal-title":"Nucl. Instrum. Methods Phys. Res. A"},{"key":"mlstada71cbib2","doi-asserted-by":"publisher","first-page":"746","DOI":"10.1007\/s11704-016-6159-1","article-title":"A survey of neural network accelerators","volume":"11","author":"Li","year":"2017","journal-title":"Front. Comput. Sci."},{"key":"mlstada71cbib3","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/acc0d7","article-title":"Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml","volume":"4","author":"Khoda","year":"2023","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstada71cbib4","doi-asserted-by":"publisher","first-page":"1930","DOI":"10.1364\/AO.445798","article-title":"Potential of edge machine learning for instrumentation","volume":"61","author":"Therrien","year":"2022","journal-title":"Appl. 
Opt."},{"key":"mlstada71cbib5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ColCACI.2018.8484858","article-title":"A systematic literature review of hardware neural networks","author":"Parra","year":"2018"},{"key":"mlstada71cbib6","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1109\/FPT.2016.7929192","article-title":"Accelerating binarized neural networks: comparison of FPGA, CPU, GPU, and ASIC","author":"Nurvitadhi","year":"2016"},{"key":"mlstada71cbib7","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1145\/3458864.3467882","article-title":"Nn-meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices","author":"Zhang","year":"2021"},{"key":"mlstada71cbib8","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1109\/ICCRD54409.2022.9730377","article-title":"A survey on convolutional neural network accelerators: GPU, FPGA and ASIC","author":"Hu","year":"2022"},{"key":"mlstada71cbib9","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/ISOCC50952.2020.9333063","article-title":"Deep neural network training accelerator designs in ASIC and FPGA","author":"Venkataramanaiah","year":"2020"},{"key":"mlstada71cbib10","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1109\/MM.2020.2985963","article-title":"Maestro: a data-centric approach to understand reuse, performance and hardware cost of DNN mappings","volume":"40","author":"Kwon","year":"2020","journal-title":"IEEE Micro"},{"key":"mlstada71cbib11","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/15\/05\/P05026","article-title":"Fast inference of boosted decision trees in FPGAs for particle physics","volume":"15","author":"Summers","year":"2020","journal-title":"J. Instrum."},{"key":"mlstada71cbib12","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac0ea1","article-title":"Fast convolutional neural networks on FPGAs with hls4ml","volume":"2","author":"Aarrestad","year":"2021","journal-title":"Mach. 
Learn.: Sci. Technol."},{"key":"mlstada71cbib13","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/16\/08\/P08016","article-title":"Nanosecond machine learning event classification with boosted decision trees in FPGA for high energy physics","volume":"16","author":"Hong","year":"2021","journal-title":"J. Instrum."},{"key":"mlstada71cbib14","doi-asserted-by":"publisher","first-page":"898","DOI":"10.1109\/TCAD.2018.2834439","article-title":"Are we there yet? A study on the state of high-level synthesis","volume":"38","author":"Lahti","year":"2018","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"mlstada71cbib15","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ad0d12","article-title":"Exploring machine learning to hardware implementations for large data rate x-ray instrumentation","volume":"4","author":"Rahimifar","year":"2023","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstada71cbib16","article-title":"hls4ml: an open-source codesign workflow to empower scientific low-power machine learning devices","author":"Fahim","year":"2021"},{"key":"mlstada71cbib17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3530775","article-title":"FPGA HLS today: successes, challenges and opportunities","volume":"15","author":"Cong","year":"2022","journal-title":"ACM Trans. 
on Reconfigurable Technology and Systems (TRETS)"},{"key":"mlstada71cbib18","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/FCCM.2018.00029","article-title":"Fast and accurate estimation of quality of results in high-level synthesis with machine learning","author":"Dai","year":"2018"},{"key":"mlstada71cbib19","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1109\/FPL.2019.00069","article-title":"Pyramid: machine learning framework to estimate the optimal timing and resource usage of a high-level synthesis design","author":"Makrani","year":"2019"},{"key":"mlstada71cbib20","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530408","article-title":"High-level synthesis performance prediction using gnns: benchmarking, modeling and advancing","author":"Wu","year":"2022"},{"key":"mlstada71cbib21","doi-asserted-by":"publisher","first-page":"85785","DOI":"10.1109\/ACCESS.2023.3303840","article-title":"A graph neural network model for fast and accurate quality of result estimation for high-level synthesis","volume":"11","author":"Jamal","year":"2023","journal-title":"IEEE Access"},{"key":"mlstada71cbib22","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1109\/ISVLSI51109.2021.00019","article-title":"Resource and performance estimation for CNN models using machine learning","author":"Shahshahani","year":"2021"},{"key":"mlstada71cbib23","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1109\/IPDPSW55747.2022.00022","article-title":"Machine learning aided hardware resource estimation for FPGA DNN implementations","author":"Diaconu","year":"2022"},{"key":"mlstada71cbib24","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac9cb5","article-title":"Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml","volume":"3","author":"Ghielmetti","year":"2022","journal-title":"Mach. Learn.: Sci. 
Technol."},{"key":"mlstada71cbib25","article-title":"HLS4ML GitHub repository","author":"Fahim","year":"2024"},{"key":"mlstada71cbib26","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"mlstada71cbib27","article-title":"Attention is all you need","author":"Vaswani","year":"2017"},{"key":"mlstada71cbib28","doi-asserted-by":"publisher","first-page":"1310","DOI":"10.48550\/arXiv.1211.5063","article-title":"On the difficulty of training recurrent neural networks","volume":"vol 28","author":"Pascanu","year":"2013"},{"key":"mlstada71cbib29","doi-asserted-by":"publisher","first-page":"11121","DOI":"10.1609\/aaai.v37i9.26317","article-title":"Are transformers effective for time series forecasting?","volume":"vol 37","author":"Zeng","year":"2023"},{"key":"mlstada71cbib30","doi-asserted-by":"publisher","first-page":"1763","DOI":"10.1213\/ANE.0000000000002864","article-title":"Correlation coefficients: appropriate use and interpretation","volume":"126","author":"Schober","year":"2018","journal-title":"Anesth. Analg."},{"key":"mlstada71cbib31","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1162\/neco_a_01199","article-title":"A review of recurrent neural networks: LSTM cells and network architectures","volume":"31","author":"Yu","year":"2019","journal-title":"Neural Comput."},{"key":"mlstada71cbib32","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. 
Learn."},{"key":"mlstada71cbib33","doi-asserted-by":"publisher","first-page":"p 30","DOI":"10.5555\/3294996.3295074","article-title":"Lightgbm: a highly efficient gradient boosting decision tree","author":"Ke","year":"2017"},{"key":"mlstada71cbib34","doi-asserted-by":"publisher","first-page":"p.e623","DOI":"10.7717\/peerj-cs.623","article-title":"The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation","volume":"7","author":"Davide","year":"2021","journal-title":"PeerJ Comput. Sci."},{"key":"mlstada71cbib35","first-page":"1","article-title":"Hyperband: a novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li","year":"2018","journal-title":"J. Mach. Learn. Res."},{"key":"mlstada71cbib36","doi-asserted-by":"publisher","first-page":"p 25","DOI":"10.5555\/2999325.2999464","article-title":"Practical bayesian optimization of machine learning algorithms","author":"Snoek","year":"2012"},{"key":"mlstada71cbib37","article-title":"Grid search, random search, genetic algorithm: a big comparison for nas","author":"Liashchynskyi","year":"2019"},{"key":"mlstada71cbib38","doi-asserted-by":"publisher","first-page":"7","DOI":"10.5120\/ijca2017915495","article-title":"A comparative study of categorical variable encoding techniques for neural network classifiers","volume":"175","author":"Potdar","year":"2017","journal-title":"Int. J. Comput. Appl."},{"key":"mlstada71cbib39","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014"},{"key":"mlstada71cbib40","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1016\/0169-2070(93)90079-3","article-title":"Accuracy measures: theoretical and practical concerns","volume":"9","author":"Makridakis","year":"1993","journal-title":"Int. J. 
Forecast."},{"key":"mlstada71cbib41","doi-asserted-by":"publisher","DOI":"10.1088\/1748-0221\/13\/07\/P07027","article-title":"Fast inference of deep neural networks in FPGAs for particle physics","volume":"13","author":"Duarte","year":"2018","journal-title":"J. Instrum."},{"key":"mlstada71cbib42","article-title":"Open-source FPGA-ML codesign for the MLPerf tiny benchmark","author":"Borras","year":"2022"},{"key":"mlstada71cbib43","doi-asserted-by":"publisher","DOI":"10.3389\/fphy.2022.957128","article-title":"Data reduction through optimized scalar quantization for more compact neural networks","volume":"10","author":"Gouin-Ferland","year":"2022","journal-title":"Front. Phys."},{"key":"mlstada71cbib44","article-title":"MNIST handwritten digit database","author":"LeCun","year":"2010"},{"key":"mlstada71cbib45","doi-asserted-by":"publisher","first-page":"1371","DOI":"10.1109\/TBCAS.2023.3299084","article-title":"Automlp: a framework for the acceleration of multi-layer perceptron models on FPGAs for real-time atrial fibrillation disease detection","volume":"17","author":"Chen","year":"2023","journal-title":"IEEE Trans. Biomed. 
Circuits Syst."},{"key":"mlstada71cbib46","article-title":"Charged particle tracking with machine learning on FPGAs","author":"Abidi","year":"2022"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,6]],"date-time":"2025-02-06T12:30:55Z","timestamp":1738845055000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ada71c"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,20]]},"references-count":46,"journal-issue":{"issue":"1","published-online":{"date-part
s":[[2025,1,20]]},"published-print":{"date-parts":[[2025,3,31]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ada71c","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,20]]},"assertion":[{"value":"rule4ml: an open-source tool for resource utilization and latency estimation for ML models on FPGA","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2024-07-15","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-01-07","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-01-20","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}