{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T20:21:04Z","timestamp":1760473264366,"version":"3.41.2"},"reference-count":47,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T00:00:00Z","timestamp":1736121600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Comput. Sci."],"abstract":"<jats:p>Low-latency inference for machine learning models is increasingly becoming a necessary requirement, as these models are used in mission-critical applications such as autonomous driving, military defense (e.g., target recognition), and network traffic analysis. A widely studied and used technique to overcome this challenge is to offload some or all parts of the inference tasks onto specialized hardware such as graphic processing units. More recently, offloading machine learning inference onto programmable network devices, such as programmable network interface cards or a programmable switch, is gaining interest from both industry and academia, especially due to the latency reduction and computational benefits of performing inference directly on the data plane where the network packets are processed. Yet, current approaches are relatively limited in scope, and there is a need to develop more general approaches for mapping offloading machine learning models onto programmable network devices. To fulfill such a need, this work introduces a novel framework, called ML-NIC, for deploying trained machine learning models onto programmable network devices' data planes. ML-NIC deploys models directly into the computational cores of the devices to efficiently leverage the inherent parallelism capabilities of network devices, thus providing huge latency and throughput gains. Our experiments show that ML-NIC reduced inference latency by at least 6 \u00d7 on average and in the 99th percentile and increased throughput by at least 16<jats:italic>x<\/jats:italic> with little to no degradation in model effectiveness compared to the existing CPU solutions. In addition, ML-NIC can provide tighter guaranteed latency bounds in the presence of other network traffic with shorter tail latencies. Furthermore, ML-NIC reduces CPU and host server RAM utilization by 6.65% and 320.80 MB. Finally, ML-NIC can handle machine learning models that are 2.25 \u00d7 larger than the current state-of-the-art network device offloading approaches.<\/jats:p>","DOI":"10.3389\/fcomp.2024.1493399","type":"journal-article","created":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T07:02:24Z","timestamp":1736146944000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["ML-NIC: accelerating machine learning inference using smart network interface cards"],"prefix":"10.3389","volume":"6","author":[{"given":"Raghav","family":"Kapoor","sequence":"first","affiliation":[]},{"given":"David C.","family":"Anastasiu","sequence":"additional","affiliation":[]},{"given":"Sean","family":"Choi","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2025,1,6]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"693","DOI":"10.1145\/3544216.3544263","article-title":"\u201cAggregate-based congestion control for pulse-wave ddos defense,\u201d","author":"Alcoz","year":"2022","journal-title":"Proceedings of the ACM SIGCOMM 2022 Conference"},{"key":"B2","first-page":"992","article-title":"\u201cMultiplying matrices without multiplying,\u201d","volume-title":"International Conference on Machine Learning","author":"Blalock","year":"2021"},{"volume-title":"Classification and Regression Trees","year":"1984","author":"Breiman","key":"B3"},{"key":"B4","article-title":"pforest: In-network inference with random forests","author":"Busse-Grawitz","year":"2019","journal-title":"arXiv preprint arXiv:1909.05680"},{"key":"B5","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1145\/2939672.2939785","article-title":"\u201cXgboost: a scalable tree boosting system,\u201d","author":"Chen","year":"2016","journal-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining"},{"key":"B6","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1109\/MICRO.2014.58","article-title":"\u201cDadiannao: a machine-learning supercomputer,\u201d","volume-title":"2014 47th Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Chen","year":"2014"},{"key":"B7","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1109\/MM.2018.022071134","article-title":"Volta: performance and programmability","volume":"38","author":"Choquette","year":"2018","journal-title":"IEEE Micro"},{"year":"2020","journal-title":"Corigine nfp-4000 flow processor","key":"B8"},{"key":"B9","doi-asserted-by":"publisher","first-page":"1726","DOI":"10.1109\/TNET.2020.2992106","article-title":"P4xos: consensus as a network service","volume":"28","author":"Dang","year":"2020","journal-title":"IEEE\/ACM Trans. Network"},{"key":"B10","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1109\/ATC52653.2021.9598239","article-title":"\u201cDevelopment of lightweight and accurate intrusion detection on programmable data plane,\u201d","volume-title":"2021 International Conference on Advanced Technologies for Communications (ATC)","author":"Dao","year":"2021"},{"key":"B11","doi-asserted-by":"publisher","first-page":"13254","DOI":"10.1109\/TVT.2022.3198266","article-title":"Jointnids: efficient joint traffic management for on-device network intrusion detection","volume":"71","author":"Dao","year":"2022","journal-title":"IEEE Trans. Vehic. Technol"},{"key":"B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/ISCA.2018.00012","article-title":"\u201cA configurable cloud-scale DNN processor for real-time ai,\u201d","volume-title":"2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)","author":"Fowers","year":"2018"},{"key":"B13","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1145\/3482898.3483356","article-title":"\u201cClustreams: data plane clustering,\u201d","author":"Friedman","year":"2021","journal-title":"Proceedings of the ACM SIGCOMM Symposium on SDN Research (SOSR)"},{"key":"B14","doi-asserted-by":"publisher","first-page":"71","DOI":"10.14529\/jsfi170206","article-title":"Beating floating point at its own game: posit arithmetic","volume":"4","author":"Gustafson","year":"2017","journal-title":"Supercomput. Front. Innov"},{"key":"B15","doi-asserted-by":"crossref","first-page":"368","DOI":"10.1109\/FPL.2018.00069","article-title":"\u201cA flexible k-means operator for hybrid databases,\u201d","volume-title":"2018 28th International Conference on Field Programmable Logic and Applications (FPL)","author":"He","year":"2018"},{"key":"B16","first-page":"3429","article-title":"\u201cAccelerating machine learning for trading using programmable switches,\u201d","volume-title":"27th European Conference on Artificial Intelligence (ECAI 2024)","author":"Hong","year":"2024"},{"key":"B17","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1145\/3132747.3132764","article-title":"\u201cNetcache: balancing key-value stores with fast in-network caching,\u201d","author":"Jin","year":"2017","journal-title":"Proceedings of the 26th Symposium on Operating Systems Principles"},{"key":"B18","first-page":"1","article-title":"\u201cIn-datacenter performance analysis of a tensor processing unit,\u201d","author":"Jouppi","year":"2017","journal-title":"Proceedings of the 44th Annual International Symposium on Computer Architecture"},{"key":"B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/ICCCN52240.2021.9522272","article-title":"\u201cMachine learning based flow classification in dCNS using p4 switches,\u201d","volume-title":"2021 International Conference on Computer Communications and Networks (ICCCN)","author":"Kamath","year":"2021"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2774993.2775007","article-title":"\u201cIn-band network telemetry via programmable dataplanes,\u201d","author":"Kim","year":"2015","journal-title":"ACM SIGCOMM"},{"volume-title":"Towards machine learning inference in the data plane","year":"2019","author":"Langlet","key":"B21"},{"volume-title":"Python Machine Learning by Example","year":"2017","author":"Liu","key":"B22"},{"year":"2024","author":"Netronome Systems","journal-title":"Agilio cx 2x25gbe smartnic","key":"B23"},{"key":"B24","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"B25","first-page":"352","article-title":"\u201cLine-speed and scalable intrusion detection at the network edge via federated learning,\u201d","volume-title":"2020 IFIP Networking Conference (Networking)","author":"Qin","year":"2020"},{"key":"B26","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1007\/BF00116251","article-title":"Induction of decision trees","volume":"1","author":"Quinlan","year":"1986","journal-title":"Mach. Learn"},{"key":"B27","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1016\/j.chemolab.2013.10.012","article-title":"On the calibration of sensor arrays for pattern recognition using the minimal number of experiments","volume":"130","author":"Rodr\u00edguez-Luj\u00e1n","year":"2014","journal-title":"Chemometr. Intellig. Lab. Syst"},{"year":"2016","author":"Roth","journal-title":"Decision trees","key":"B28"},{"key":"B29","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1145\/3229591.3229594","article-title":"\u201cCan the network be the ai accelerator?\u201d","author":"Sanvito","year":"2018","journal-title":"Proceedings of the 2018 Morning Workshop on In-Network Computing"},{"key":"B30","doi-asserted-by":"publisher","first-page":"3551","DOI":"10.1109\/LCOMM.2021.3108940","article-title":"Toward in-network intelligence: running distributed artificial neural networks in the data plane","volume":"25","author":"Saquetti","year":"2021","journal-title":"IEEE Commun. Lett"},{"key":"B31","doi-asserted-by":"publisher","first-page":"108","DOI":"10.5220\/0006639801080116","article-title":"Toward generating a new intrusion detection dataset and intrusion traffic characterization","volume":"1","author":"Sharafaldin","year":"2018","journal-title":"ICISSp"},{"key":"B32","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1109\/ICPADS53394.2021.00021","article-title":"\u201cTowards network-accelerated ml-based distributed computer vision systems,\u201d","volume-title":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","author":"Siddique","year":"2021"},{"key":"B33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/NOMS54207.2022.9789930","article-title":"\u201cRevisiting the classics: online RL in the programmable dataplane,\u201d","volume-title":"NOMS 2022\u20132022 IEEE\/IFIP Network Operations and Management Symposium","author":"Simpson","year":"2022"},{"key":"B34","article-title":"In-network neural networks","author":"Siracusano","year":"2018","journal-title":"arXiv preprint arXiv:1801.05731"},{"key":"B35","first-page":"513","article-title":"\u201cRe-architecting traffic analysis with neural network interface cards,\u201d","author":"Siracusano","year":"2022","journal-title":"19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)"},{"key":"B36","article-title":"Statlog (Landsat Satellite)","author":"Srinivasan","year":"1993","journal-title":"UCI Machine Learning Repository"},{"volume-title":"Reinforcement Learning: An Introduction","year":"2018","author":"Sutton","key":"B37"},{"key":"B38","doi-asserted-by":"publisher","first-page":"1099","DOI":"10.1145\/3503222.3507726","article-title":"\u201cTaurus: a data plane architecture for per-packet ml,\u201d","author":"Swamy","year":"2022","journal-title":"Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems"},{"key":"B39","doi-asserted-by":"publisher","first-page":"3046","DOI":"10.1109\/TPDS.2017.2714661","article-title":"Accelerating decision tree based traffic classification on fPGA and multicore platforms","volume":"28","author":"Tong","year":"2017","journal-title":"IEEE Trans. Parallel Distr. Syst"},{"year":"2014","author":"Wray","journal-title":"The joy of micro-c","key":"B40"},{"key":"B41","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/INFOCOM42981.2021.9488840","article-title":"\u201cProgrammable switches for in-networking classification,\u201d","volume-title":"IEEE INFOCOM 2021-IEEE Conference on Computer Communications","author":"Xavier","year":"2021"},{"key":"B42","doi-asserted-by":"publisher","first-page":"1938","DOI":"10.1109\/INFOCOM48880.2022.9796936","article-title":"\u201cMousika: enable general in-network intelligence in programmable switches by knowledge distillation,\u201d","author":"Xie","year":"2022","journal-title":"IEEE INFOCOM 2022-IEEE Conference on Computer Communications"},{"key":"B43","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1145\/3365609.3365864","article-title":"\u201cDo switches dream of machine learning? Toward in-network classification,\u201d","author":"Xiong","year":"2019","journal-title":"Proceedings of the 18th ACM Workshop on Hot Topics in Networks"},{"key":"B44","doi-asserted-by":"publisher","first-page":"47870","DOI":"10.1109\/ACCESS.2018.2866538","article-title":"Passive mine detection and classification method based on hybrid model","volume":"6","author":"Yilmaz","year":"2018","journal-title":"IEEE Access"},{"key":"B45","first-page":"1","article-title":"\u201cAccelerating gnn-based sar automatic target recognition on hbm-enabled fpga,\u201d","volume-title":"2023 IEEE High Performance Extreme Computing Conference (HPEC)","author":"Zhang","year":"2023"},{"key":"B46","doi-asserted-by":"publisher","first-page":"4353","DOI":"10.1109\/TNSM.2021.3094514","article-title":"pheavy: predicting heavy flows in the programmable data plane","volume":"18","author":"Zhang","year":"2021","journal-title":"IEEE Trans. Netw. Serv. Manag"},{"key":"B47","first-page":"12","article-title":"\u201cPlanter: seeding trees within switches,\u201d","volume-title":"Proceedings of the SIGCOMM '21 Poster and Demo Sessions, SIGCOMM '21","author":"Zheng","year":"2021"}],"container-title":["Frontiers in Computer Science"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1493399\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,6]],"date-time":"2025-01-06T07:02:36Z","timestamp":1736146956000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fcomp.2024.1493399\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,6]]},"references-count":47,"alternative-id":["10.3389\/fcomp.2024.1493399"],"URL":"https:\/\/doi.org\/10.3389\/fcomp.2024.1493399","relation":{},"ISSN":["2624-9898"],"issn-type":[{"type":"electronic","value":"2624-9898"}],"subject":[],"published":{"date-parts":[[2025,1,6]]},"article-number":"1493399"}}