{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T13:17:05Z","timestamp":1772111825517,"version":"3.50.1"},"reference-count":117,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,6]],"date-time":"2025-03-06T00:00:00Z","timestamp":1741219200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Generative AI (GenAI) models are designed to produce realistic and natural data, such as images, audio, or written text. Due to their high computational and memory demands, these models traditionally run on powerful remote compute servers. However, there is growing interest in deploying GenAI models at the edge, on resource-constrained embedded devices. Since 2018, the TinyML community has proven that running fixed-topology AI models on edge devices offers several benefits, including independence from internet connectivity, low-latency processing, and enhanced privacy. Nevertheless, deploying resource-consuming GenAI models on embedded devices is challenging since the latter have limited computational, memory, and energy resources. This review paper aims to evaluate the progress made to date in the field of Edge GenAI, an emerging area of research within the broader domain of EdgeAI which focuses on bringing GenAI to edge devices. Papers released between 2022 and 2024 that address the design and deployment of GenAI models on embedded devices are identified and described. Additionally, their approaches and results are compared. 
This manuscript contributes to understanding the ongoing transition from TinyML to Edge GenAI and provides valuable insights to the AI research community on this emerging, impactful, and still under-explored field.<\/jats:p>","DOI":"10.3390\/bdcc9030061","type":"journal-article","created":{"date-parts":[[2025,3,6]],"date-time":"2025-03-06T06:02:13Z","timestamp":1741240933000},"page":"61","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Transitioning from TinyML to Edge GenAI: A Review"],"prefix":"10.3390","volume":"9","author":[{"given":"Gloria","family":"Giorgetti","sequence":"first","affiliation":[{"name":"STMicroelectronics, Business Center Colleoni, Via Paracelso, 20, Building Andromeda 3, 7th Floor, 20864 Agrate Brianza, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1585-2313","authenticated-orcid":false,"given":"Danilo Pietro","family":"Pau","sequence":"additional","affiliation":[{"name":"STMicroelectronics, Business Center Colleoni, Via Paracelso, 20, Building Andromeda 3, 7th Floor, 20864 Agrate Brianza, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,6]]},"reference":[{"key":"ref_1","unstructured":"Fiorenza, G., Pau, D.P., and Schettini, R. (2024, January 11\u201313). Action Prediction with Edge Generative AI for Mice Pre-clinical Studies. Proceedings of the 2024 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA."},{"key":"ref_2","unstructured":"(2024, December 04). About the EDGE AI FOUNDATION. Available online: https:\/\/www.edgeaifoundation.org\/about."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ancilotto, A., and Farella, E. (2024, January 11\u201315). Painting the Starry Night using XiNets. 
Proceedings of the 2024 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), Biarritz, France.","DOI":"10.1109\/PerComWorkshops59983.2024.10502435"},{"key":"ref_4","first-page":"1","article-title":"Mobivqa: Efficient on-device visual question answering","volume":"6","author":"Cao","year":"2022","journal-title":"Proc. Acm Interact. Mob. Wearable Ubiquitous Technol."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tan, H., and Bansal, M. (2019). Lxmert: Learning cross-modality encoder representations from transformers. arXiv.","DOI":"10.18653\/v1\/D19-1514"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Cho, J., Lu, J., Schwenk, D., Hajishirzi, H., and Kembhavi, A. (2020). X-lxmert: Paint, caption and answer questions with multi-modal transformers. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.707"},{"key":"ref_7","unstructured":"Kim, W., Son, B., and Kim, I. (2021, January 18\u201324). Vilt: Vision-and-language transformer without convolution or region supervision. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"9543","DOI":"10.1109\/TMM.2023.3254205","article-title":"Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering","volume":"25","author":"Yu","year":"2023","journal-title":"IEEE Trans. Multimed."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Yu, Z., Yu, J., Cui, Y., Tao, D., and Tian, Q. (2019, January 15\u201320). Deep modular co-attention networks for visual question answering. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00644"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chen, Y.C., Li, L., Yu, L., El Kholy, A., Ahmed, F., Gan, Z., Cheng, Y., and Liu, J. (2020, January 23\u201328). 
Uniter: Universal image-text representation learning. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58577-8_7"},{"key":"ref_11","unstructured":"Shen, S., Li, L.H., Tan, H., Bansal, M., Rohrbach, A., Chang, K.W., Yao, Z., and Keutzer, K. (2021). How much can clip benefit vision-and-language tasks?. arXiv."},{"key":"ref_12","unstructured":"Rashid, H.A., Sarkar, A., Gangopadhyay, A., Rahnemoonfar, M., and Mohsenin, T. (2024). TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"89644","DOI":"10.1109\/ACCESS.2021.3090981","article-title":"Floodnet: A high resolution aerial imagery dataset for post flood scene understanding","volume":"9","author":"Rahnemoonfar","year":"2021","journal-title":"IEEE Access"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"4702716","DOI":"10.1109\/TGRS.2023.3276293","article-title":"Sam-vqa: Supervised attention-based visual question answering model for post-disaster damage assessment on remote sensing imagery","volume":"61","author":"Sarkar","year":"2023","journal-title":"IEEE Trans. Geosci. Remote. Sens."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Mishra, A., Agarwala, A., Tiwari, U., Rajendiran, V.N., and Miriyala, S.S. (2024, January 27\u201330). Efficient Visual Question Answering on Embedded Devices: Cross-Modality Attention With Evolutionary Quantization. Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICIP51287.2024.10647455"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., and Parikh, D. (2017, January 21\u201326). Making the v in vqa matter: Elevating the role of image understanding in visual question answering. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.670"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Safiya, K., and Pandian, R. (2023, January 6\u20138). Computer Vision and Voice Assisted Image Captioning Framework for Visually Impaired Individuals using Deep Learning Approach. Proceedings of the 2023 4th IEEE Global Conference for Advancement in Technology (GCAT), Bengalore, India.","DOI":"10.1109\/GCAT59970.2023.10353449"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, Y., Lou, S., Wang, K., Wang, Y., Yuan, X., and Liu, H. (2024, January 13\u201317). Automatic Captioning based on Visible and Infrared Images. Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan.","DOI":"10.1109\/ICRA57147.2024.10610654"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Gao, C., Dong, Y., Yuan, X., Han, Y., and Liu, H. (June, January 29). Infrared Image Captioning with Wearable Device. Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK.","DOI":"10.1109\/ICRA48891.2023.10160809"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Arystanbekov, B., Kuzdeuov, A., Nurgaliyev, S., and Varol, H.A. (2023, January 24\u201327). Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages. Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia.","DOI":"10.1109\/EMBC40787.2023.10340575"},{"key":"ref_21","first-page":"610","article-title":"Resnet based deep gated recurrent unit for image captioning on smartphone","volume":"35","author":"Uslu","year":"2022","journal-title":"Avrupa Bilim Teknol. 
Derg."},{"key":"ref_22","first-page":"161","article-title":"Fusion of High-Level Visual Attributes for Image Captioning","volume":"52","year":"2023","journal-title":"Avrupa Bilim Teknol. Derg."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"248","DOI":"10.1108\/JET-03-2024-0024","article-title":"MyUEVision: An application generating image caption for assisting visually impaired people","volume":"18","author":"Nguyen","year":"2024","journal-title":"J. Enabling Technol."},{"key":"ref_24","first-page":"380","article-title":"Sequence-to-sequence video captioning with residual connected gated recurrent units","volume":"35","author":"Onan","year":"2022","journal-title":"Avrupa Bilim Teknol. Derg."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Pezzuto Damaceno, R.J., and Cesar, R.M. (2023, January 27\u201330). An End-to-End Deep Learning Approach for Video Captioning Through Mobile Devices. Proceedings of the Iberoamerican Congress on Pattern Recognition, Coimbra, Portugal.","DOI":"10.1007\/978-3-031-49018-7_51"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1939","DOI":"10.1109\/JSYST.2024.3456864","article-title":"Average Sparse Attention for Dense Video Captioning From Multiperspective Edge-Computing Cameras","volume":"18","author":"Huang","year":"2024","journal-title":"IEEE Syst. 
J."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"13039","DOI":"10.1109\/JIOT.2023.3337287","article-title":"Sequence-Aware Learnable Sparse Mask for Frame-Selectable End-to-End Dense Video Captioning for IoT Smart Cameras","volume":"11","author":"Huang","year":"2023","journal-title":"IEEE Internet Things J."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"4554","DOI":"10.1109\/JIOT.2021.3104289","article-title":"Environment-aware dense video captioning for IoT-enabled edge cameras","volume":"9","author":"Lu","year":"2022","journal-title":"IEEE Internet Things J."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"146","DOI":"10.24237\/djes.2024.17310","article-title":"A Lightweight Visual Understanding System for Enhanced Assistance to the Visually Impaired Using an Embedded Platform","volume":"17","author":"Yousif","year":"2024","journal-title":"Diyala J. Eng. Sci."},{"key":"ref_30","unstructured":"Wang, N., Xie, J., Luo, H., Cheng, Q., Wu, J., Jia, M., and Li, L. (2023, January 7\u201314). Efficient image captioning for edge devices. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Lin, J., Yin, H., Ping, W., Molchanov, P., Shoeybi, M., and Han, S. (2024, January 16\u201322). Vila: On pre-training for visual language models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.02520"},{"key":"ref_32","first-page":"87","article-title":"AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration","volume":"6","author":"Lin","year":"2024","journal-title":"Proc. Mach. Learn. Syst."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Atienza, R. (2023, January 4\u201310). Efficientspeech: An on-device text to speech model. 
Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10094639"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Chevi, R., Prasojo, R.E., Aji, A.F., Tjandra, A., and Sakti, S. (2023, January 9\u201312). Nix-TTS: Lightweight and end-to-end text-to-speech via module-wise distillation. Proceedings of the 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar.","DOI":"10.1109\/SLT54892.2023.10023322"},{"key":"ref_35","unstructured":"Kim, J., Kong, J., and Son, J. (2021, January 18\u201324). Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_36","unstructured":"(2025, January 09). Piper. Available online: https:\/\/github.com\/rhasspy\/piper."},{"key":"ref_37","first-page":"17022","article-title":"Hifi-gan: Generative adversarial networks for efficient and high fidelity speech synthesis","volume":"33","author":"Kong","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_38","unstructured":"Ren, Y., Hu, C., Tan, X., Qin, T., Zhao, S., Zhao, Z., and Liu, T.Y. (2020). Fastspeech 2: Fast and high-quality end-to-end text to speech. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Nguyen, V.T., Pham, H.C., and Mac, D.K. (2023, January 4\u201310). How to Push the Fastest Model 50x Faster: Streaming Non-Autoregressive Speech Synthesis on Resouce-Limited Devices. Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece.","DOI":"10.1109\/ICASSP49357.2023.10096329"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Yang, G., Yang, S., Liu, K., Fang, P., Chen, W., and Xie, L. (2021, January 12\u201322). 
Multi-band melgan: Faster waveform generation for high-quality text-to-speech. Proceedings of the 2021 IEEE Spoken Language Technology Workshop (SLT), Shenzhen, China.","DOI":"10.1109\/SLT48900.2021.9383551"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Ciapponi, S., Paissan, F., Ancilotto, A., and Farella, E. (October, January 30). TinyVocos: Neural Vocoders on MCUs. Proceedings of the 2024 IEEE 5th International Symposium on the Internet of Sounds (IS2), Erlangen, Germany.","DOI":"10.1109\/IS262782.2024.10704173"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Park, S., Choo, K., Lee, J., Porov, A.V., Osipov, K., and Sung, J.S. (2022). Bunched LPCNet2: Efficient neural vocoders covering devices from cloud to edge. arXiv.","DOI":"10.21437\/Interspeech.2022-310"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Chen, S., Weng, J., Hong, S., He, Y., Zou, Y., and Wu, K. (2024, January 23\u201325). TransFiLM: An Efficient and Lightweight Audio Enhancement Network for Low-Cost Wearable Sensors. Proceedings of the 2024 IEEE 21st International Conference on Mobile Ad-Hoc and Smart Systems (MASS), Seoul, Republic of Korea.","DOI":"10.1109\/MASS62177.2024.00030"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"\u0160ljubura, N., \u0160imi\u0107, M., and Bilas, V. (2024, January 23\u201325). Deep Learning Based Speech Enhancement on Edge Devices Applied to Assistive Work Equipment. Proceedings of the 2024 IEEE Sensors Applications Symposium (SAS), Naples, Italy.","DOI":"10.1109\/SAS60918.2024.10636511"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Nossier, S.A., Wall, J.A., Moniri, M., Glackin, C., and Cannings, N. (2022, January 18\u201322). Convolutional Recurrent Smart Speech Enhancement Architecture for Hearing Aids. Proceedings of the INTERSPEECH 2022, Incheon, Republic of Korea.","DOI":"10.21437\/Interspeech.2022-522"},{"key":"ref_46","unstructured":"Vaswani, A. (2025, March 02). 
Attention Is All You Need. Advances in Neural Information Processing Systems. Available online: https:\/\/papers.nips.cc\/paper_files\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1109\/TASLP.2022.3153257","article-title":"Dynamic multi-branch layers for on-device neural machine translation","volume":"30","author":"Tan","year":"2022","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Li, S., Zhang, P., Gan, G., Lv, X., Wang, B., Wei, J., and Jiang, X. (2022, January 7\u201311). Hypoformer: Hybrid decomposition transformer for edge-friendly neural machine translation. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates.","DOI":"10.18653\/v1\/2022.emnlp-main.475"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Kim, Y., and Rush, A.M. (2016). Sequence-level knowledge distillation. arXiv.","DOI":"10.18653\/v1\/D16-1139"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Ancilotto, A., Paissan, F., and Farella, E. (2023, January 2\u20133). Xinet: Efficient neural networks for tinyml. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01556"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"103947","DOI":"10.1016\/j.cviu.2024.103947","article-title":"Towards efficient image and video style transfer via distillation and learnable feature transformation","volume":"241","author":"Huo","year":"2024","journal-title":"Comput. Vis. Image Underst."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Suresh, A.P., Jain, S., Noinongyao, P., Ganguly, A., Watchareeruetai, U., and Samacoits, A. (2024, January 4\u20138). Fastclipstyler: Optimisation-free text-based image style transfer using style representations. 
Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV57701.2024.00715"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Kwon, G., and Ye, J.C. (2022, January 18\u201324). Clipstyler: Image style transfer with a single text condition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01753"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Reimers, N. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Ganugula, P., Kumar, Y., Reddy, N., Chellingi, P., Thakur, A., Kasera, N., and Anand, C.S. (2023, January 1\u20136). MOSAIC: Multi-object segmented arbitrary stylization using CLIP. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCVW60793.2023.00096"},{"key":"ref_56","unstructured":"Xu, Z., Hong, Z., Ding, C., Zhu, Z., Han, J., Liu, J., and Ding, E. (March, January 28). Mobilefaceswap: A lightweight framework for video face swapping. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Ancilotto, A., Paissan, F., and Farella, E. (2023, January 13\u201317). PhiNet-GAN: Bringing real-time face swapping to embedded devices. Proceedings of the 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), Atlanta, GA, USA.","DOI":"10.1109\/PerComWorkshops56833.2023.10150292"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3603173","article-title":"Ximswap: Many-to-many face swapping for tinyml","volume":"23","author":"Ancilotto","year":"2024","journal-title":"ACM Trans. Embed. Comput. 
Syst."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Berger, G., Dhingra, M., Mercier, A., Savani, Y., Panchal, S., and Porikli, F. (2023, January 17\u201324). QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPRW59228.2023.00212"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Conde, M.V., Vasluianu, F., Vazquez-Corral, J., and Timofte, R. (2023, January 2\u20137). Perceptual image enhancement for smartphone real-time applications. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00189"},{"key":"ref_61","unstructured":"LI, H., Guan, J., Rui, L., Ma, S., Gu, L., and Zhu, Z. (2024, January 10\u201315). TinyLUT: Tiny Look-Up Table for Efficient Image Restoration at the Edge. Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_62","unstructured":"Ignatov, A., Timofte, R., Denna, M., Younes, A., Gankhuyag, G., Huh, J., Kim, M.K., Yoon, K., Moon, H.C., and Lee, S. (2022, January 23\u201327). Efficient and accurate quantized image super-resolution on mobile NPUs, mobile AI & AIM 2022 challenge: Report. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Ignatov, A., Timofte, R., Chiang, C.M., Kuo, H.K., Xu, Y.S., Lee, M.Y., Lu, A., Cheng, C.M., Chen, C.C., and Yong, J.Y. (2022, January 23\u201327). Power efficient video super-resolution on mobile npus with deep learning, mobile ai & aim 2022 challenge: Report. 
Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_6"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Ignatov, A., Timofte, R., Liu, S., Feng, C., Bai, F., Wang, X., Lei, L., Yi, Z., Xiang, Y., and Liu, Z. (2022, January 23\u201327). Learned smartphone ISP on mobile GPUs with deep learning, mobile AI & AIM 2022 challenge: Report. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_3"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Ignatov, A., Timofte, R., Zhang, J., Zhang, F., Yu, G., Ma, Z., Wang, H., Kwon, M., Qian, H., and Tong, W. (2022, January 23\u201327). Realistic bokeh effect rendering on mobile gpus, mobile ai & aim 2022 challenge: Report. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_7"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Ignatov, A., Malivenko, G., Timofte, R., Treszczotko, L., Chang, X., Ksiazek, P., Lopuszynski, M., Pioro, M., Rudnicki, R., and Smyl, M. (2022, January 23\u201327). Efficient single-image depth estimation on mobile devices, mobile AI & AIM 2022 challenge: Report. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_4"},{"key":"ref_67","unstructured":"Yang, R., Timofte, R., Li, X., Zhang, Q., Zhang, L., Liu, F., He, D., Li, F., Zheng, H., and Yuan, W. (2022, January 23\u201327). Aim 2022 challenge on super-resolution of compressed image and video: Dataset, methods and results. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel."},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Conde, M.V., Timofte, R., Huang, Y., Peng, J., Chen, C., Li, C., P\u00e9rez-Pellitero, E., Song, F., Bai, F., and Liu, S. (2022, January 23\u201327). Reversed image signal processing and RAW reconstruction. 
AIM 2022 challenge report. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_1"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"K\u0131nl\u0131, F., Mente\u015f, S., \u00d6zcan, B., K\u0131ra\u00e7, F., Timofte, R., Zuo, Y., Wang, Z., Zhang, X., Zhu, Y., and Li, C. (2022, January 23\u201327). AIM 2022 challenge on Instagram filter removal: Methods and results. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25066-8_2"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Sargsyan, A., Navasardyan, S., Xu, X., and Shi, H. (2023, January 1\u20136). Mi-gan: A simple baseline for image inpainting on mobile devices. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.00674"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Verma, S., Sharma, A., Sheshadri, R., and Raman, S. (2024, January 4\u20138). GraphFill: Deep Image Inpainting using Graphs. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV57701.2024.00492"},{"key":"ref_72","doi-asserted-by":"crossref","unstructured":"Ayazoglu, M., and Bilecen, B.B. (2022, January 23\u201327). Xcat-lightweight quantized single image super-resolution using heterogeneous group convolutions and cross concatenation. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_29"},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Gendy, G., Sabor, N., Hou, J., and He, G. (2022, January 23\u201327). Real-time channel mixing net for mobile image super-resolution. 
Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_36"},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Luo, Z., Li, Y., Yu, L., Wu, Q., Wen, Z., Fan, H., and Liu, S. (2022, January 23\u201327). Fast nearest convolution for real-time efficient image super-resolution. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_35"},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"106407","DOI":"10.1016\/j.engappai.2023.106407","article-title":"Generative adversarial super-resolution at the edge with knowledge distillation","volume":"123","author":"Angarano","year":"2023","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Chao, J., Zhou, Z., Gao, H., Gong, J., Yang, Z., Zeng, Z., and Dehbi, L. (2023, January 17\u201324). Equivalent transformation and dual stream network construction for mobile image super-resolution. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.01355"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Deng, W., Yuan, H., Deng, L., and Lu, Z. (2023, January 17\u201324). Reparameterized residual feature network for lightweight image super-resolution. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPRW59228.2023.00172"},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"4972","DOI":"10.1109\/ACCESS.2022.3232258","article-title":"Skip-concatenated image super-resolution network for mobile devices","volume":"11","author":"Gankhuyag","year":"2022","journal-title":"IEEE Access"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"12183","DOI":"10.1007\/s11063-023-11415-w","article-title":"Texture-Enhanced Framework by Differential Filter-Based Re-parameterization for Super-Resolution on PC\/Mobile","volume":"55","author":"Liu","year":"2023","journal-title":"Neural Process. Lett."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"16440","DOI":"10.1109\/JIOT.2023.3268285","article-title":"Two-stage deep single-image super-resolution with multiple blur kernels for Internet of Things","volume":"10","author":"Sun","year":"2023","journal-title":"IEEE Internet Things J."},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Gao, S., Zheng, C., Zhang, X., Liu, S., Wu, B., Lu, K., Zhang, D., and Wang, N. (2022, January 23\u201327). RCBSR: Re-parameterization convolution block for super-resolution. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_33"},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Lian, W., and Lian, W. (2022, January 23\u201327). Sliding window recurrent network for efficient video super-resolution. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_37"},{"key":"ref_83","unstructured":"Xu, T., Jia, Z., Zhang, Y., Bao, L., and Sun, H. (2022). Elsr: Extreme low-power super resolution network for mobile devices. arXiv."},{"key":"ref_84","doi-asserted-by":"crossref","unstructured":"Yue, S., Li, C., Zhuge, Z., and Song, R. (2022, January 23\u201327). 
Eesrnet: A network for energy efficient super-resolution. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_38"},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Gou, W., Yi, Z., Xiang, Y., Li, S., Liu, Z., Kong, D., and Xu, K. (2023, January 17\u201324). SYENet: A simple yet effective network for multiple low-level vision tasks with real-time performance on mobile device. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Vancouver, BC, Canada.","DOI":"10.1109\/ICCV51070.2023.01119"},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"21508","DOI":"10.1007\/s11227-024-06160-3","article-title":"MWformer: A novel low computational cost image restoration algorithm","volume":"80","author":"Liao","year":"2024","journal-title":"J. Supercomput."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1109\/TCSVT.2023.3285014","article-title":"Lightweight neural network for enhancing imaging performance of under-display camera","volume":"34","author":"Li","year":"2023","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Fu, Z., Song, M., Ma, C., Nasti, J., Tyagi, V., Lloyd, G., and Tang, W. (2022, January 18\u201324). An efficient hybrid model for low-light image enhancement in mobile devices. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00345"},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"A Sharif, S., Myrzabekov, A., Khudjaev, N., Tsoy, R., Kim, S., and Lee, J. (2024, January 16\u201322). Learning optimized low-light image enhancement for edge vision tasks. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPRW63382.2024.00639"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Liu, Z., Jin, M., Chen, Y., Liu, H., Yang, C., and Xiong, H. (2023, January 8\u201311). Lightweight network towards real-time image denoising on mobile devices. Proceedings of the 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia.","DOI":"10.1109\/ICIP49359.2023.10222387"},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Flepp, R., Ignatov, A., Timofte, R., and Van Gool, L. (2024, January 16\u201322). Real-World Mobile Image Denoising Dataset with Efficient Baselines. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.02111"},{"key":"ref_92","unstructured":"Xiang, L., Zhou, J., Liu, J., Wang, Z., Huang, H., Hu, J., Han, J., Guo, Y., and Ding, G. (March, January 28). ReMoNet: Recurrent multi-output network for efficient video denoising. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Ignatov, A., Malivenko, G., Timofte, R., Tseng, Y., Xu, Y.S., Yu, P.H., Chiang, C.M., Kuo, H.K., Chen, M.H., and Cheng, C.M. (2022, January 21\u201325). Pynet-v2 mobile: Efficient on-device photo processing with neural networks. Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada.","DOI":"10.1109\/ICPR56361.2022.9956598"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Ignatov, A., Sycheva, A., Timofte, R., Tseng, Y., Xu, Y.S., Yu, P.H., Chiang, C.M., Kuo, H.K., Chen, M.H., and Cheng, C.M. (2022, January 23\u201327). MicroISP: Processing 32mp photos on mobile devices with deep learning. 
Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_46"},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Raimundo, D.W., Ignatov, A., and Timofte, R. (2022, January 18\u201324). LAN: Lightweight attention-based network for RAW-to-RGB smartphone image processing. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00096"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Zheng, J., Fan, Z., Wu, X., Wu, Y., and Zhang, F. (2022, January 23\u201327). Residual Feature Distillation Channel Spatial Attention Network for ISP on Smartphone. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-25063-7_40"},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"20869","DOI":"10.1007\/s00521-023-08852-y","article-title":"Depth-guided deep filtering network for efficient single image bokeh rendering","volume":"35","author":"Chen","year":"2023","journal-title":"Neural Comput. Appl."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Chen, Y.H., Sarokin, R., Lee, J., Tang, J., Chang, C.L., Kulik, A., and Grundmann, M. (2023, January 17\u201324). Speed is all you need: On-device acceleration of large diffusion models via gpu-aware optimizations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPRW59228.2023.00490"},{"key":"ref_99","unstructured":"Choi, J., Kim, M., Ahn, D., Kim, T., Kim, Y., Jo, D., Jeon, H., Kim, J.J., and Kim, H. (2023). Squeezing large-scale diffusion models for mobile. arXiv."},{"key":"ref_100","unstructured":"Castells, T., Song, H.K., Piao, T., Choi, S., Kim, B.K., Yim, H., Lee, C., Kim, J.G., and Kim, T.H. (2024). EdgeFusion: On-Device Text-to-Image Generation. 
arXiv."},{"key":"ref_101","unstructured":"Hu, D., Chen, J., Huang, X., Coskun, H., Sahni, A., Gupta, A., Goyal, A., Lahiri, D., Singh, R., and Idelbayev, Y. (2024). SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training. arXiv."},{"key":"ref_102","first-page":"20662","article-title":"Snapfusion: Text-to-image diffusion model on mobile devices within two seconds","volume":"36","author":"Li","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_103","unstructured":"Kim, B.K., Song, H.K., Castells, T., and Choi, S. (October, January 29). Bk-sdm: A lightweight, fast, and cheap version of stable diffusion. Proceedings of the European Conference on Computer Vision, Milan, Italy."},{"key":"ref_104","unstructured":"Zhao, Y., Xu, Y., Xiao, Z., Jia, H., and Hou, T. (October, January 29). Mobilediffusion: Instant text-to-image generation on mobile devices. Proceedings of the European Conference on Computer Vision, Milan, Italy."},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022, January 18\u201324). High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"ref_106","unstructured":"Orhon, A., Siracusa, M., and Wadhwa, A. (2025, January 02). Stable Diffusion with Core ML on Apple Silicon. Available online: https:\/\/machinelearning.apple.com\/research\/stable-diffusion-coreml-apple-silicon."},{"key":"ref_107","unstructured":"Asghar, Z., and Hou, J. (2025, January 02). World\u2019s First On-Device Demonstration of Stable Diffusion on an Android Phone. 
Available online: https:\/\/www.qualcomm.com\/news\/onq\/2023\/02\/worlds-first-on-device-demonstration-of-stable-diffusion-on-android."},{"key":"ref_108","unstructured":"Xu, J., Li, Z., Chen, W., Wang, Q., Gao, X., Cai, Q., and Ling, Z. (2024). On-device language models: A comprehensive review. arXiv."},{"key":"ref_109","doi-asserted-by":"crossref","unstructured":"Zheng, Y., Chen, Y., Qian, B., Shi, X., Shu, Y., and Chen, J. (2024). A Review on Edge Large Language Models: Design, Execution, and Applications. arXiv.","DOI":"10.1145\/3719664"},{"key":"ref_110","doi-asserted-by":"crossref","unstructured":"Laskaridis, S., Katevas, K., Minto, L., and Haddadi, H. (2024, January 18\u201322). Melting point: Mobile evaluation of language transformers. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, Washington, DC, USA.","DOI":"10.1145\/3636534.3690668"},{"key":"ref_111","unstructured":"Xiao, J., Huang, Q., Chen, X., and Tian, C. (2024). Large language model performance benchmarking on mobile platforms: A thorough evaluation. arXiv."},{"key":"ref_112","unstructured":"Wang, F., Zhang, Z., Zhang, X., Wu, Z., Mo, T., Lu, Q., Wang, W., Li, R., Xu, J., and Tang, X. (2024). A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with llms, and trustworthiness. arXiv."},{"key":"ref_113","unstructured":"Liu, Z., Zhao, C., Iandola, F., Lai, C., Tian, Y., Fedorov, I., Xiong, Y., Chang, E., Shi, Y., and Krishnamoorthi, R. (2024). Mobilellm: Optimizing sub-billion parameter language models for on-device use cases. arXiv."},{"key":"ref_114","unstructured":"Abdin, M., Aneja, J., Awadalla, H., Awadallah, A., Awan, A.A., Bach, N., Bahree, A., Bakhtiari, A., Bao, J., and Behl, H. (2024). Phi-3 technical report: A highly capable language model locally on your phone. 
arXiv."},{"key":"ref_115","unstructured":"Hu, S., Tu, Y., Han, X., He, C., Cui, G., Long, X., Zheng, Z., Fang, Y., Huang, Y., and Zhao, W. (2024). Minicpm: Unveiling the potential of small language models with scalable training strategies. arXiv."},{"key":"ref_116","unstructured":"Thawakar, O., Vayani, A., Khan, S., Cholakal, H., Anwer, R.M., Felsberg, M., Baldwin, T., Xing, E.P., and Khan, F.S. (2024). Mobillama: Towards accurate and lightweight fully transparent gpt. arXiv."},{"key":"ref_117","unstructured":"Yi, R., Li, X., Xie, W., Lu, Z., Wang, C., Zhou, A., Wang, S., Zhang, X., and Xu, M. (2024). Phonelm: An efficient and capable small language model family through principled pre-training. arXiv."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/3\/61\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:48:16Z","timestamp":1760028496000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/3\/61"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,6]]},"references-count":117,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["bdcc9030061"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9030061","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,6]]}}}