{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T04:46:25Z","timestamp":1774932385092,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T00:00:00Z","timestamp":1672358400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62005269"],"award-info":[{"award-number":["62005269"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>To address the challenge of no-reference image quality assessment (NR-IQA) for authentically and synthetically distorted images, we propose a novel network called the Combining Convolution and Self-Attention for Image Quality Assessment network (Conv-Former). Our model uses a multi-stage transformer architecture similar to that of ResNet-50 to represent appropriate perceptual mechanisms in image quality assessment (IQA) to build an accurate IQA model. We employ adaptive learnable position embedding to handle images with arbitrary resolution. We propose a new transformer block (TB) by taking advantage of transformers to capture long-range dependencies, and of local information perception (LIP) to model local features for enhanced representation learning. The module increases the model\u2019s understanding of the image content. Dual path pooling (DPP) is used to keep more contextual image quality information in feature downsampling. Experimental results verify that Conv-Former not only outperforms the state-of-the-art methods on authentic image databases, but also achieves competing performances on synthetic image databases which demonstrate the strong fitting performance and generalization capability of our proposed model.<\/jats:p>","DOI":"10.3390\/s23010427","type":"journal-article","created":{"date-parts":[[2023,1,2]],"date-time":"2023-01-02T03:08:59Z","timestamp":1672628939000},"page":"427","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Conv-Former: A Novel Network Combining Convolution and Self-Attention for Image Quality Assessment"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7115-6166","authenticated-orcid":false,"given":"Lintao","family":"Han","sequence":"first","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"},{"name":"College of Materials Science and Opto-Electronic Technology, University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3628-0991","authenticated-orcid":false,"given":"Hengyi","family":"Lv","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Yuchen","family":"Zhao","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Hailong","family":"Liu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Guoling","family":"Bi","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Zhiyong","family":"Yin","sequence":"additional","affiliation":[{"name":"Department of Electrical and Optical Engineering, Space Engineering University, Beijing 101416, China"}]},{"given":"Yuqiang","family":"Fang","sequence":"additional","affiliation":[{"name":"Department of Electrical and Optical Engineering, Space Engineering University, Beijing 101416, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Le, Q.-T., Ladret, P., Nguyen, H.-T., and Caplier, A. (2022). Computational Analysis of Correlations between Image Aesthetic and Image Naturalness in the Relation with Image Quality. J. Imaging, 8.","DOI":"10.3390\/jimaging8060166"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"3998","DOI":"10.1109\/TIP.2018.2831899","article-title":"NIMA: Neural Image Assessment","volume":"27","author":"Talebi","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Han, L.T., Zhao, Y.C., Lv, H.Y., Zhang, Y.S., Liu, H.L., and Bi, G.L. (2022). Remote Sensing Image Denoising Based on Deep and Shallow Feature Fusion and Attention Mechanism. Remote Sens., 14.","DOI":"10.3390\/rs14051243"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1109\/TCSVT.2018.2886771","article-title":"Blind Image Quality Assessment Using a Deep Bilinear Convolutional Neural Network","volume":"30","author":"Zhang","year":"2020","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2959","DOI":"10.1109\/26.477498","article-title":"Image quality measures and their performance","volume":"43","author":"Eskicioglu","year":"1995","journal-title":"IEEE Trans. Commun."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1109\/TIP.2003.819861","article-title":"Image quality assessment: From error visibility to structural similarity","volume":"13","author":"Wang","year":"2004","journal-title":"IEEE Trans. Image Process."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hui, Q., Sheng, Y.X., Yang, L.K., Li, Q.M., and Chai, L. (2019, January 3\u20135). Reduced-Reference Image Quality Assessment for Single-Image Super-Resolution Based on Wavelet Domain. Proceedings of the 31st Chinese Control and Decision Conference (CCDC), Nanchang, China.","DOI":"10.1109\/CCDC.2019.8833247"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"4695","DOI":"10.1109\/TIP.2012.2214050","article-title":"No-Reference Image Quality Assessment in the Spatial Domain","volume":"21","author":"Mittal","year":"2012","journal-title":"IEEE Trans. Image Process."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4444","DOI":"10.1109\/TIP.2016.2585880","article-title":"Blind Image Quality Assessment Based on High Order Statistics Aggregation","volume":"25","author":"Xu","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1109\/LSP.2012.2227726","article-title":"Making a \u201cCompletely Blind\u201d Image Quality Analyzer","volume":"20","author":"Mittal","year":"2013","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_11","unstructured":"Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv."},{"key":"ref_12","first-page":"1","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_13","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (July, January 26). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"206","DOI":"10.1109\/JSTSP.2016.2639328","article-title":"Fully Deep Blind Image Quality Predictor","volume":"11","author":"Kim","year":"2017","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_15","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_16","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). Albert: A lite bert for self-supervised learning of language representations. arXiv."},{"key":"ref_17","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","first-page":"1","article-title":"Xlnet: Generalized autoregressive pretraining for language understanding","volume":"32","author":"Yang","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_19","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., and Zagoruyko, S. (2020, January 23\u201328). End-to-end object detection with transformers. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K.M., Zhang, X.Y., Ren, S.Q., and Sun, J. (2016, January 27\u201330). Deep Residual Learning for Image Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Guo, J., Han, K., Wu, H., Tang, Y., Chen, X., Wang, Y., and Xu, C. (2022, January 19\u201320). Cmt: Convolutional neural networks meet vision transformers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"ref_23","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). Efficientnet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Graves, A., Mohamed, A.-r., and Hinton, G. (2013, January 26\u201331). Speech recognition with deep recurrent neural networks. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, Q., Wu, B., Zhu, P., Li, P., and Hu, Q. (2020, January 13\u201319). ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01155"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019, January 15\u201320). Dual attention network for scene segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00326"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"2387","DOI":"10.1109\/LSP.2015.2487369","article-title":"A patch-structure representation method for quality assessment of contrast changed images","volume":"22","author":"Wang","year":"2015","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"84105","DOI":"10.1109\/ACCESS.2020.2991842","article-title":"No-reference quality assessment for contrast-distorted images","volume":"8","author":"Liu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"4559","DOI":"10.1109\/TCYB.2016.2575544","article-title":"No-reference quality metric of contrast-distorted images based on information maximization","volume":"47","author":"Gu","year":"2016","journal-title":"IEEE Trans. Cybern."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1109\/LSP.2010.2043888","article-title":"A Two-Step Framework for Constructing Blind Image Quality Indices","volume":"17","author":"Moorthy","year":"2010","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"3350","DOI":"10.1109\/TIP.2011.2147325","article-title":"Blind Image Quality Assessment: From Natural Scene Statistics to Perceptual Quality","volume":"20","author":"Moorthy","year":"2011","journal-title":"IEEE Trans. Image Process."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/LSP.2010.2045550","article-title":"A DCT Statistics-Based Blind Image Quality Index","volume":"17","author":"Saad","year":"2010","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3339","DOI":"10.1109\/TIP.2012.2191563","article-title":"Blind Image Quality Assessment: A Natural Scene Statistics Approach in the DCT Domain","volume":"21","author":"Saad","year":"2012","journal-title":"IEEE Trans. Image Process."},{"key":"ref_36","unstructured":"Ye, P., Kumar, J., Kang, L., and Doermann, D. (2012, January 16\u201321). Unsupervised Feature Learning Framework for No-reference Image Quality Assessment. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA."},{"key":"ref_37","unstructured":"Zhang, P., Zhou, W.G., Wu, L., and Li, H.Q. (2015, January 7\u201312). SOM: Semantic Obviousness Metric for Image Quality Assessment. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3440","DOI":"10.1109\/TIP.2006.881959","article-title":"A statistical evaluation of recent full reference image quality assessment algorithms","volume":"15","author":"Sheikh","year":"2006","journal-title":"IEEE Trans. Image Process."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/j.image.2014.10.009","article-title":"Image database TID2013: Peculiarities, results and perspectives","volume":"30","author":"Ponomarenko","year":"2015","journal-title":"Signal Process.-Image Commun."},{"key":"ref_40","first-page":"21","article-title":"Most apparent distortion: Full-reference image quality assessment and the role of strategy","volume":"19","author":"Larson","year":"2010","journal-title":"J. Electron. Imaging"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"372","DOI":"10.1109\/TIP.2015.2500021","article-title":"Massive online crowdsourced study of subjective and objective picture quality","volume":"25","author":"Ghadiyaram","year":"2015","journal-title":"IEEE Trans. Image Process."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"4041","DOI":"10.1109\/TIP.2020.2967829","article-title":"KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment","volume":"29","author":"Hosu","year":"2020","journal-title":"IEEE Trans. Image Process."},{"key":"ref_43","first-page":"30","article-title":"TID2008\u2014A database for evaluation of full-reference visual quality assessment metrics","volume":"10","author":"Ponomarenko","year":"2009","journal-title":"Adv. Mod. Radioelectron."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1145\/2812802","article-title":"YFCC100M: The new data in multimedia research","volume":"59","author":"Thomee","year":"2016","journal-title":"Commun. ACM"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/1\/427\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:48:41Z","timestamp":1760147321000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/1\/427"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,30]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["s23010427"],"URL":"https:\/\/doi.org\/10.3390\/s23010427","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,30]]}}}