{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,15]],"date-time":"2025-11-15T10:31:24Z","timestamp":1763202684278,"version":"build-2065373602"},"reference-count":65,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["51827814","2021289"],"award-info":[{"award-number":["51827814","2021289"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004739","name":"Youth Innovation Promotion Association","doi-asserted-by":"publisher","award":["51827814","2021289"],"award-info":[{"award-number":["51827814","2021289"]}],"id":[{"id":"10.13039\/501100004739","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>The emergence of Transformer has led to the rapid development of video understanding, but it also brings the problem of high computational complexity. Previously, there were methods to divide the feature maps into windows along the spatiotemporal dimensions and then calculate the attention. There are also methods to perform down-sampling during attention computation to reduce the spatiotemporal resolution of features. Although the complexity is effectively reduced, there is still room for further optimization. Thus, we present the Windows and Linear Transformer (WLiT) for efficient video action recognition, by combining Spatial-Windows attention with Linear attention. We first divide the feature maps into multiple windows along the spatial dimensions and calculate the attention separately inside the windows. 
Therefore, our model further reduces the computational complexity compared with previous methods. However, the perceptual field of Spatial-Windows attention is small, and global spatiotemporal information cannot be obtained. To address this problem, we then calculate Linear attention along the channel dimension so that the model can capture complete spatiotemporal information. Our method achieves better recognition accuracy with less computational complexity through this mechanism. We conduct extensive experiments on four public datasets, namely Something-Something V2 (SSV2), Kinetics400 (K400), UCF101, and HMDB51. On the SSV2 dataset, our method reduces the computational complexity by 28% and improves the recognition accuracy by 1.6% compared to the State-Of-The-Art (SOTA) method. On the K400 and two other datasets, our method achieves SOTA-level accuracy while reducing the complexity by about 49%.<\/jats:p>","DOI":"10.3390\/s23031616","type":"journal-article","created":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T01:53:54Z","timestamp":1675302834000},"page":"1616","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["WLiT: Windows and Linear Transformer for Video Action Recognition"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4430-3279","authenticated-orcid":false,"given":"Ruoxi","family":"Sun","sequence":"first","affiliation":[{"name":"Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China"},{"name":"School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China"}]},{"given":"Tianzhao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, 
China"}]},{"given":"Yong","family":"Wan","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Geomechanics and Geotechnical Engineering, Institute of Rock and Soil Mechanics, Chinese Academy of Sciences, Wuhan 430071, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8042-4609","authenticated-orcid":false,"given":"Fuping","family":"Zhang","sequence":"additional","affiliation":[{"name":"Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China"}]},{"given":"Jianming","family":"Wei","sequence":"additional","affiliation":[{"name":"Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,2,2]]},"reference":[{"key":"ref_1","unstructured":"Zhu, Y., Li, X., Liu, C., Zolfaghari, M., Xiong, Y., Wu, C., Zhang, Z., Tighe, J., Manmatha, R., and Li, M. (2020). A comprehensive study of deep video action recognition. arXiv."},{"key":"ref_2","unstructured":"Ulhaq, A., Akhtar, N., Pogrebna, G., and Mian, A. (2022). Vision Transformers for Action Recognition: A Survey. arXiv."},{"key":"ref_3","unstructured":"Bertasius, G., Wang, H., and Torresani, L. (2021, January 18\u201324). Is space-time attention all you need for video understanding?. Proceedings of the International Conference on Machine Learning, Online."},{"key":"ref_4","first-page":"1","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"NIPS"},{"key":"ref_5","first-page":"4898","article-title":"Understanding the effective receptive field in deep convolutional neural networks","volume":"29","author":"Luo","year":"2016","journal-title":"NIPS"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022, January 19\u201324). A convnet for the 2020s. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201312). Learning spatiotemporal features with 3d convolutional networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 22\u201329). Quo vadis, action recognition? a new model and the kinetics dataset. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Venice, Italy.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_9","unstructured":"Feichtenhofer, C., Fan, H., Malik, J., and He, K. (November, January 27). Slowfast networks for video recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea."},{"key":"ref_10","unstructured":"Chen, M., Radford, A., Child, R., Wu, J., Jun, H., Luan, D., and Sutskever, I. (2020, January 13\u201318). Generative pretraining from pixels. Proceedings of the International Conference on Machine Learning (ICML), Online."},{"key":"ref_11","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Sennrich, R., Haddow, B., and Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv.","DOI":"10.18653\/v1\/P16-1162"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). 
Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Online.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Vaswani, A., Ramachandran, P., Srinivas, A., Parmar, N., Hechtman, B., and Shlens, J. (2021, January 19\u201325). Scaling local self-attention for parameter efficient visual backbones. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Online.","DOI":"10.1109\/CVPR46437.2021.01270"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Lin, H., Zhang, Z., Sun, Y., He, T., Mueller, J., and Manmatha, R. (2022, January 19\u201324). Resnest: Split-attention networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPRW56347.2022.00309"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ding, M., Xiao, B., Codella, N., Luo, P., Wang, J., and Yuan, L. (2022). DaViT: Dual Attention Vision Transformers. arXiv.","DOI":"10.1007\/978-3-031-20053-3_5"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wu, H., Xiao, B., Codella, N., Liu, M., Dai, X., Yuan, L., and Zhang, L. (2021, January 11\u201317). Cvt: Introducing convolutions to vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Online.","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1007\/s41095-022-0274-8","article-title":"Pvt v2: Improved baselines with pyramid vision transformer","volume":"8","author":"Wang","year":"2022","journal-title":"Comput. Vis. Media"},{"key":"ref_19","unstructured":"Li, R., Su, J., Duan, C., and Zheng, S. (2020). Linear attention mechanism: An efficient attention for semantic segmentation. 
arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1109\/LRA.2020.3039744","article-title":"Real-time semantic segmentation with fast attention","volume":"6","author":"Hu","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_21","unstructured":"Schlag, I., Irie, K., and Schmidhuber, J. (2021, January 18\u201324). Linear transformers are secretly fast weight programmers. Proceedings of the International Conference on Machine Learning (ICML), Online."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, Z., Ning, J., Cao, Y., Wei, Y., Zhang, Z., Lin, S., and Hu, H. (2022, January 19\u201324). Video swin transformer. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00320"},{"key":"ref_23","first-page":"1","article-title":"A spatio-temporal descriptor based on 3d-gradients","volume":"Volume 275","author":"Klaser","year":"2008","journal-title":"Proceedings of the BMVC 2008-19th British Machine Vision Conference"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1007\/s11263-005-1838-7","article-title":"On space-time interest points","volume":"64","author":"Laptev","year":"2005","journal-title":"Int. J. Comput. Vis."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s11263-012-0594-8","article-title":"Dense trajectories and motion boundary descriptors for action recognition","volume":"103","author":"Wang","year":"2013","journal-title":"Int. J. Comput. Vis."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. 
ACM"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1162\/neco.1989.1.4.541","article-title":"Backpropagation applied to handwritten zip code recognition","volume":"1","author":"LeCun","year":"1989","journal-title":"Neural Comput."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014, January 24\u201327). Large-scale video classification with convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., and Toderici, G. (2015, January 8\u201310). Beyond short snippets: Deep networks for video classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299101"},{"key":"ref_30","unstructured":"Simonyan, K., and Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. arXiv."},{"key":"ref_31","first-page":"3468","article-title":"Spatiotemporal residual networks for video action recognition","volume":"3","author":"Christoph","year":"2016","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C. (2020, January 13\u201319). X3d: Expanding architectures for efficient video recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00028"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sun, L., Jia, K., Yeung, D.-Y., and Shi, B.E. (2015, January 7\u201312). Human action recognition using factorized spatio-temporal convolutional networks. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.522"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and Paluri, M. (2018, January 18\u201322). A closer look at spatiotemporal convolutions for action recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake, UT, USA.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Xie, S., Sun, C., Huang, J., Tu, Z., and Murphy, K. (2018, January 8\u201314). Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01267-0_19"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., and Mei, T. (2017, January 22\u201329). Learning spatio-temporal representation with pseudo-3d residual networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.590"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Van Gool, L. (2016, January 11\u201314). Temporal segment networks: Towards good practices for deep action recognition. Proceedings of the European Conference on Computer Vision, Cham, Switzerland.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_38","unstructured":"Lin, J., Gan, C., and Han, S. (November, January 27). Tsm: Temporal shift module for efficient video understanding. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, Y., Ji, B., Shi, X., Zhang, J., Kang, B., and Wang, L. (2020, January 13\u201319). Tea: Temporal excitation and aggregation for action recognition. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00099"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wang, L., Tong, Z., Ji, B., and Wu, G. (2021, January 19\u201325). Tdn: Temporal difference networks for efficient action recognition. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Online.","DOI":"10.1109\/CVPR46437.2021.00193"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3505244","article-title":"Transformers in vision: A survey","volume":"54","author":"Khan","year":"2022","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Wang, X., Xiong, X., Neumann, M., Piergiovanni, A., Ryoo, M.S., Angelova, A., Kitani, K.M., and Hua, W. (2020, January 23\u201328). Attentionnas: Spatiotemporal attention cell search for video classification. Proceedings of the European Conference on Computer Vision, Online.","DOI":"10.1007\/978-3-030-58598-3_27"},{"key":"ref_43","unstructured":"Sharir, G., Noy, A., and Zelnik-Manor, L. (2021). An image is worth 16x16 words, what is a video worth?. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Neimark, D., Bar, O., Zohar, M., and Asselmann, D. (2021, January 11\u201317). Video transformer network. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCVW54120.2021.00355"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lu\u010di\u0107, M., and Schmid, C. (2021, January 11\u201317). Vivit: A video vision transformer. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00676"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Fan, H., Xiong, B., Mangalam, K., Li, Y., Yan, Z., Malik, J., and Feichtenhofer, C. 
(2021, January 11\u201317). Multiscale vision transformers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Online.","DOI":"10.1109\/ICCV48922.2021.00675"},{"key":"ref_47","unstructured":"Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. (2020, January 13\u201318). Transformers are rnns: Fast autoregressive transformers with linear attention. Proceedings of the International Conference on Machine Learning, Online."},{"key":"ref_48","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Shaw, P., Uszkoreit, J., and Vaswani, A. (2018). Self-attention with relative position representations. arXiv.","DOI":"10.18653\/v1\/N18-2074"},{"key":"ref_50","unstructured":"Islam, M.A., Jia, S., and Bruce, N.D. (2020). How much position information do convolutional neural networks encode?. arXiv."},{"key":"ref_51","unstructured":"Li, K., Wang, Y., Gao, P., Song, G., Liu, Y., Li, H., and Qiao, Y. (2022). Uniformer: Unified transformer for efficient spatiotemporal representation learning. arXiv."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Goyal, R., Ebrahimi Kahou, S., Michalski, V., Materzynska, J., Westphal, S., Kim, H., Haenel, V., Fruend, I., Yianilos, P., and Mueller-Freitag, M. (2017, January 22\u201329). The \u201csomething something\u201d video database for learning and evaluating visual common sense. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.622"},{"key":"ref_53","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre, T. (2011, January 6\u201313). 
HMDB: A large video database for human motion recognition. Proceedings of the 2011 International conference on computer vision, Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"ref_55","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_56","unstructured":"Fan, Q., Chen, C.-F.R., Kuehne, H., Pistoia, M., and Cox, D. (2019). More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation. arXiv."},{"key":"ref_57","unstructured":"Jiang, B., Wang, M., Gan, W., Wu, W., and Yan, J. (November, January 27). Stm: Spatiotemporal and motion encoding for action recognition. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Kwon, H., Kim, M., Kwak, S., and Cho, M. (2020, January 23\u201328). Motionsqueeze: Neural motion feature learning for video understanding. Proceedings of the European Conference on Computer Vision, Online.","DOI":"10.1007\/978-3-030-58517-4_21"},{"key":"ref_59","unstructured":"Li, K., Li, X., Wang, Y., Wang, J., and Qiao, Y. (2021). CT-net: Channel tensorization network for video classification. arXiv."},{"key":"ref_60","first-page":"19594","article-title":"Space-time mixing attention for video transformer","volume":"34","author":"Bulat","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Alfasly, S., Chui, C.K., Jiang, Q., Lu, J., and Xu, C. (2022). An effective video transformer with synchronized spatiotemporal and spatial self-attention for action recognition. IEEE Trans. Neural Netw. Learn. Syst., 1\u201314.","DOI":"10.1109\/TNNLS.2022.3190367"},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Wang, L., Li, W., Li, W., and Van Gool, L. (2018, January 18\u201322). Appearance-and-relation networks for video classification. 
Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake, UT, USA.","DOI":"10.1109\/CVPR.2018.00155"},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Stroud, J., Ross, D., Sun, C., Deng, J., and Sukthankar, R. (2020, January 2\u20135). D3d: Distilled 3d networks for video action recognition. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093274"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Zhu, L., Tran, D., Sevilla-Lara, L., Yang, Y., Feiszli, M., and Wang, H. (2020, January 7\u201312). Faster recurrent networks for efficient video classification. Proceedings of the AAAI Conference on Artificial Intelligence, Hilton New York Midtown, NY, USA.","DOI":"10.1609\/aaai.v34i07.7012"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Zhang, Y.J.S. (2022). MEST: An Action Recognition Network with Motion Encoder and Spatio-Temporal Module. Sensors, 22.","DOI":"10.3390\/s22176595"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1616\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:21:40Z","timestamp":1760120500000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/3\/1616"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,2,2]]},"references-count":65,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["s23031616"],"URL":"https:\/\/doi.org\/10.3390\/s23031616","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,2,2]]}}}