{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T09:38:01Z","timestamp":1779356281108,"version":"3.51.4"},"reference-count":57,"publisher":"Academy of Cognitive and Natural Sciences","issue":"1","license":[{"start":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T00:00:00Z","timestamp":1779321600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Edge Comp."],"abstract":"<jats:p>Edge deployments of large language models (LLMs) often suffer from significant latency due to the overhead of high-level client runtimes on resource-constrained hardware. To address this challenge, we conducted a side-by-side performance analysis of four quantised LLMs \u2013 Llama 3.2:1b, Gemma 3:1b, Granite 3.1-MoE:1b, and Qwen 2.5:0.5b \u2013 on a Raspberry Pi 4 Model B (8 GB LPDDR4, quad-core ARM Cortex-A72) using both Python and Rust API clients. Each model was served via a local Ollama inference server, and a fixed suite of twenty prompts \u2013 covering factual retrieval, arithmetic reasoning, translation, code synthesis, and creative generation \u2013 was executed sequentially with a two-second inter-request delay, yielding 160 measurements per client. Rust markedly reduces cold-start delays: mean model load times fall from 1 648.7 ms (Python) to 52.8 ms (Rust) for Llama 3.2:1b, and from 607.0 ms to 171.3 ms for Qwen 2.5:0.5b. Corresponding end-to-end latencies decrease by 1.4-2.0 s across models. In warm-start conditions, both clients deliver nearly identical decoding throughput \u2013 \u22482.7 tokens\/s for Llama 3.2:1b, 4.4 tokens\/s for Gemma 3:1b, 7.4 tokens\/s for Granite 3.1-MoE, and 8.6 tokens\/s for Qwen 2.5:0.5b \u2013 indicating that runtime overhead is negligible once models are loaded. 
Rigorous statistical testing, including paired t-tests, Mann-Whitney U tests, and bootstrap confidence intervals, confirms that Rust\u2019s cold-start advantages are highly significant (p &lt; 0.01). At the same time, throughput differences in steady-state inference are not statistically meaningful. We discuss limitations in platform specificity, quantisation approaches, and prompt diversity, and outline future work on heterogeneous accelerators, adaptive scheduling, and on-device fine-tuning. Finally, we highlight practical applications in smart agriculture, healthcare monitoring, industrial IoT, autonomous robotics, and offline educational tools. This benchmark furnishes actionable guidelines for selecting client languages and quantised models in edge AI scenarios.<\/jats:p>","DOI":"10.55056\/jec.1047","type":"journal-article","created":{"date-parts":[[2025,12,18]],"date-time":"2025-12-18T22:43:16Z","timestamp":1766097796000},"page":"47-89","source":"Crossref","is-referenced-by-count":1,"title":["Performance analysis of localised large language models in resource-constrained edge for Python and Rust APIs"],"prefix":"10.55056","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2306-2792","authenticated-orcid":false,"given":"Partha Pratim","family":"Ray","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-8731-764X","authenticated-orcid":false,"given":"Mohan Pratap","family":"Pradhan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"33647","published-online":{"date-parts":[[2026,5,21]]},"reference":[{"key":"107726","doi-asserted-by":"publisher","DOI":"10.1109\/EDUCON62633.2025.11016377"},{"key":"107727","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2024.acl-long.678"},{"key":"107728","doi-asserted-by":"crossref","unstructured":"Alves, V., Bezerra, C., Machado, I., Rocha, L., Virg\u00ednio, T. and Silva, P., 2025. 
Quality Assessment of Python Tests Generated by Large Language Models. 2506.14297, Available from: https:\/\/doi.org\/10.48550\/arXiv.2506.14297.","DOI":"10.1145\/3756681.3756964"},{"key":"107729","unstructured":"Banerjee, D., Singh, P., Avadhanam, A. and Srivastava, S., 2023. Benchmarking LLM powered Chatbots: Methods and Metrics. 2308.04624, Available from: https:\/\/doi.org\/10.48550\/arXiv.2308.04624."},{"key":"107730","doi-asserted-by":"publisher","DOI":"10.1038\/s41746-024-01330-2"},{"key":"107731","doi-asserted-by":"publisher","DOI":"10.1145\/3578245.3584691"},{"key":"107732","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE55347.2025.00097"},{"key":"107733","unstructured":"Chrono: Date and Time for Rust, 2025. Available from: https:\/\/docs.rs\/chrono."},{"key":"107734","unstructured":"Chu, B., Feng, Y., Liu, K., Shi, H., Nan, Z., Guo, Z. and Xu, B., 2025. Boosting Rust Unit Test Coverage through Hybrid Program Analysis and Large Language Models. 2506.09002, Available from: https:\/\/doi.org\/10.48550\/arXiv.2506.09002."},{"key":"107735","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2025.findings-emnlp.743"},{"key":"107736","unstructured":"CSV \u2013 CSV File Reading and Writing, 2025. Available from: https:\/\/docs.python.org\/3\/library\/csv.html."},{"key":"107737","unstructured":"Datetime \u2013 Basic date and time types, 2025. Available from: https:\/\/docs.python.org\/3\/library\/datetime.html."},{"key":"107738","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE55347.2025.00022"},{"key":"107739","unstructured":"Deligiannis, P., Lal, A., Mehrotra, N. and Rastogi, A., 2023. Fixing Rust Compilation Errors using LLMs. 2308.05177, Available from: https:\/\/doi.org\/10.48550\/arXiv.2308.05177."},{"key":"107740","doi-asserted-by":"publisher","DOI":"10.1145\/3638550.3641126"},{"key":"107741","unstructured":"Emami, Y., Zhou, H., Nabavirazani, S. and Almeida, L., 2025. 
LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks. 2504.14556, Available from: https:\/\/doi.org\/10.48550\/arXiv.2504.14556."},{"key":"107742","unstructured":"Eniser, H.F., Zhang, H., David, C., Wang, M., Christakis, M., Paulsen, B., Dodds, J. and Kroening, D., 2024. Towards translating real-world code with LLMs: A study of translating to Rust. 2405.11514, Available from: https:\/\/doi.org\/10.48550\/arXiv.2405.11514."},{"key":"107743","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE55347.2025.00175"},{"key":"107744","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.8269"},{"key":"107745","doi-asserted-by":"publisher","DOI":"10.1609\/aaaiss.v2i1.27688"},{"key":"107746","doi-asserted-by":"publisher","DOI":"10.1007\/s10664-024-10573-2"},{"key":"107747","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btae350"},{"key":"107748","unstructured":"Kao, C.H., Zhao, W., Revankar, S., Speas, S., Bhagat, S., Datta, R., Phoo, C.P., Mall, U., Vondrick, C., Bala, K. and Hariharan, B., 2025. Towards LLM Agents for Earth Observation. 2504.12110, Available from: https:\/\/doi.org\/10.48550\/arXiv.2504.12110."},{"key":"107749","unstructured":"Lawton, N., Padmakumar, A., Gaspers, J., FitzGerald, J., Kumar, A., Steeg, G.V. and Galstyan, A., 2024. QuAILoRA: Quantization-Aware Initialization for LoRA. 2410.14713, Available from: https:\/\/doi.org\/10.48550\/arXiv.2410.14713."},{"key":"107750","unstructured":"Li, K. and Yuan, Y., 2024. Large language models as test case generators: Performance evaluation and enhancement. 2404.13340, Available from: https:\/\/doi.org\/10.48550\/arXiv.2404.13340."},{"key":"107751","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2025.3596892"},{"key":"107752","unstructured":"Li, Z., Su, Y., Yang, R., Xie, C., Wang, Z., Xie, Z., Wong, N. and Yang, H., 2025. Quantization meets reasoning: Exploring LLM low-bit quantization degradation for mathematical reasoning. 
2501.03035, Available from: https:\/\/doi.org\/10.48550\/arXiv.2501.03035."},{"key":"107753","unstructured":"Liang, L., Gong, J., Liu, M., Wang, C., Ou, G., Wang, Y., Peng, X. and Zheng, Z., 2025. RustEvo2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation. 2503.16922, Available from: https:\/\/doi.org\/10.48550\/arXiv.2503.16922."},{"key":"107754","doi-asserted-by":"publisher","DOI":"10.1145\/3714983.3714987"},{"key":"107755","unstructured":"Luo, Y., Zhou, H., Zhang, M., De La Rosa, D., Ahmed, H., Xu, W. and Xu, D., 2025. HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust. 2503.10793, Available from: https:\/\/doi.org\/10.48550\/arXiv.2503.10793."},{"key":"107756","unstructured":"Martins, E.M., Fa\u00e9, L.G., Hoffmann, R.B., Bianchessi, L.S. and Griebler, D., 2025. NPB-Rust: NAS Parallel Benchmarks in Rust. 2502.15536, Available from: https:\/\/doi.org\/10.48550\/arXiv.2502.15536."},{"key":"107757","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2025.3641486"},{"key":"107758","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640383"},{"key":"107759","unstructured":"Ollama, 2025. Available from: https:\/\/ollama.com\/."},{"key":"107760","unstructured":"OS \u2013 Miscellaneous operating system interfaces, 2025. Available from: https:\/\/docs.python.org\/3\/library\/os.html."},{"key":"107761","unstructured":"Park, J.J. and Choi, S.J., 2024. LLMs for Enhanced Agricultural Meteorological Recommendations. 2408.04640, Available from: https:\/\/doi.org\/10.48550\/arXiv.2408.04640."},{"key":"107762","unstructured":"Prabakar, A. and Kiran, R., 2024. WebAssembly Performance Analysis: A Comparative Study of C++ and Rust Implementations. Available from: https:\/\/www.diva-portal.org\/smash\/get\/diva2:1879948\/FULLTEXT01.pdf."},{"key":"107763","unstructured":"Requests: HTTP for Humans\u2122, 2025. Available from: https:\/\/requests.readthedocs.io\/."},{"key":"107764","unstructured":"Reqwest, 2025. 
Available from: https:\/\/docs.rs\/reqwest\/."},{"key":"107765","doi-asserted-by":"publisher","DOI":"10.36227\/techrxiv.174063060.01215875\/v1"},{"key":"107766","unstructured":"Serde, 2025. Available from: https:\/\/serde.rs\/."},{"key":"107767","unstructured":"Shetty, M., Jain, N., Godbole, A., Seshia, S.A. and Sen, K., 2024. Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis. 2412.14234, Available from: https:\/\/doi.org\/10.48550\/arXiv.2412.14234."},{"key":"107768","doi-asserted-by":"publisher","DOI":"10.1109\/IATMSI64286.2025.10984781"},{"key":"107769","unstructured":"Time \u2013 Time access and conversions, 2025. Available from: https:\/\/docs.python.org\/3\/library\/time.html."},{"key":"107770","doi-asserted-by":"publisher","DOI":"10.1016\/j.smhl.2025.100570"},{"key":"107771","unstructured":"Wu, J., Chen, S., Cao, J., Lo, H.C. and Cheung, S.C., 2025. Isolating language-coding from problem-solving: Benchmarking llms with pseudoeval. 2502.19149, Available from: https:\/\/doi.org\/10.48550\/arXiv.2502.19149."},{"key":"107772","doi-asserted-by":"publisher","DOI":"10.52202\/079017-2034"},{"key":"107773","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2024.3513457"},{"key":"107774","unstructured":"Yang, A.Z., Takashima, Y., Paulsen, B., Dodds, J. and Kroening, D., 2024. VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners. 2404.18852, Available from: https:\/\/doi.org\/10.48550\/arXiv.2404.18852."},{"key":"107775","unstructured":"Yao, J., Zhou, Z., Chen, W. and Cui, W., 2023. Leveraging Large Language Models for Automated Proof Synthesis in Rust. 
2311.03739, Available from: https:\/\/doi.org\/10.48550\/arXiv.2311.03739."},{"key":"107776","doi-asserted-by":"publisher","DOI":"10.1145\/3649329.3658473"},{"key":"107777","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i21.34385"},{"key":"107778","doi-asserted-by":"publisher","DOI":"10.1145\/3729315"},{"key":"107779","doi-asserted-by":"publisher","DOI":"10.1109\/WCNC57260.2024.10571127"},{"key":"107780","doi-asserted-by":"publisher","DOI":"10.1145\/3691620.3695010"},{"key":"107781","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCWorkshops62562.2024.10693742"},{"key":"107782","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Ren, X., Xue, F., Luo, Y., Jiang, X. and You, Y., 2023. Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline. Advances in Neural Information Processing Systems, vol. 36. pp.65517\u201365530. Available from: https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2023\/hash\/ce7ff3405c782f761fac7f849b41ae9a-Abstract-Conference.html.","DOI":"10.52202\/075280-2859"}],"container-title":["Journal of Edge 
Computing"],"original-title":[],"link":[{"URL":"https:\/\/acnsci.org\/journal\/index.php\/jec\/article\/download\/1047\/944","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/acnsci.org\/journal\/index.php\/jec\/article\/download\/1047\/944","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T02:35:47Z","timestamp":1779330947000},"score":1,"resource":{"primary":{"URL":"https:\/\/acnsci.org\/journal\/index.php\/jec\/article\/view\/1047"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,21]]},"references-count":57,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2026,5,21]]}},"URL":"https:\/\/doi.org\/10.55056\/jec.1047","relation":{},"ISSN":["2837-181X"],"issn-type":[{"value":"2837-181X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,21]]}}}