{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:00:55Z","timestamp":1750309255616,"version":"3.41.0"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,1,11]],"date-time":"2024-01-11T00:00:00Z","timestamp":1704931200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62371350, 62071339, 62171326"],"award-info":[{"award-number":["62371350, 62071339, 62171326"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangdong-Macau Joint Laboratory for Advanced and Intelligent Computing","award":["2020B1212030003"],"award-info":[{"award-number":["2020B1212030003"]}]},{"name":"Guangdong High-Level Innovation Research Institute","award":["2019B0909005"],"award-info":[{"award-number":["2019B0909005"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,4,30]]},"abstract":"<jats:p>Image quality assessment (IQA) is an important problem in computer vision with many applications. We propose a transformer-based multi-task learning framework for the IQA task. Two subtasks: constructing an auxiliary information error map and completing image quality prediction, are jointly optimized using a shared feature extractor. We use visual transformers (ViT) as a feature extractor for feature extraction and guide ViT to focus on image quality-related features by building auxiliary information error map subtask. In particular, we propose a fusion network that includes a channel focus module. Unlike the fusion methods commonly used in previous IQA methods, we use the fusion network, including the channel attention module, to fuse the auxiliary information error map features with the image features, which facilitates the model to mine the image quality features for more accurate image quality assessment. And by jointly optimizing the two subtasks, ViT focuses more on extracting image quality features and building a more precise mapping from feature representation to quality score. With slight adjustments to the model, our approach can be used in both no-reference (NR) and full-reference (FR) IQA environments. We evaluate the proposed method in multiple IQA databases, showing better performance than state-of-the-art FR and NR IQA methods.<\/jats:p>","DOI":"10.1145\/3635716","type":"journal-article","created":{"date-parts":[[2023,12,4]],"date-time":"2023-12-04T12:07:40Z","timestamp":1701691660000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Auxiliary Information Guided Self-attention for Image Quality Assessment"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1334-1827","authenticated-orcid":false,"given":"Jifan","family":"Yang","sequence":"first","affiliation":[{"name":"School of Computer, Wuhan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9796-488X","authenticated-orcid":false,"given":"Zhongyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer, Wuhan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8277-797X","authenticated-orcid":false,"given":"Guangcheng","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Transportation and Civil Engineering, Nantong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4882-5787","authenticated-orcid":false,"given":"Baojin","family":"Huang","sequence":"additional","affiliation":[{"name":"School of Computer, Wuhan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3001-7957","authenticated-orcid":false,"given":"Yuhong","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer, Wuhan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6933-3298","authenticated-orcid":false,"given":"Weiping","family":"Tu","sequence":"additional","affiliation":[{"name":"School of Computer, Wuhan University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,1,11]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW53098.2021.00044"},{"key":"e_1_3_1_3_2","first-page":"109","volume-title":"Human Vision and Electronic Imaging XIII","author":"Ayd\u0131n Tun\u00e7 O.","year":"2008","unstructured":"Tun\u00e7 O. Ayd\u0131n, Rafal Mantiuk, and Hans-Peter Seidel. 2008. Extending quality metrics to full luminance range images. In Human Vision and Electronic Imaging XIII, Vol. 6806. International Society for Optics and Photonics, 109\u2013118."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2760518"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2007.901820"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447393"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413489"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.2991546"},{"key":"e_1_3_1_9_2","article-title":"VTAMIQ: Transformers for attention modulated image quality assessment","author":"Chubarau Andrei","year":"2021","unstructured":"Andrei Chubarau and James Clark. 2021. VTAMIQ: Transformers for attention modulated image quality assessment. Retrieved from https:\/\/arXiv:2110.01655","journal-title":"Retrieved from"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2757139"},{"key":"e_1_3_1_12_2","article-title":"An image is worth 16x16 words: Transformers for image recognition at scale","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR\u201921).","journal-title":"Proceedings of the International Conference on Learning Representations (ICLR\u201921)"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-vision-100419-120301"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00373"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.01.054"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2500021"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV51458.2022.00404"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2703148"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.2967829"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2913372"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3171604"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3569943"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.109142"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58621-8_37"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.224"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00510"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.213"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_3_1_29_2","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1007\/978-3-642-35289-8_3","volume-title":"Neural Networks: Tricks of the Trade, 2nd ed","author":"LeCun Yann A.","year":"2012","unstructured":"Yann A. LeCun, L\u00e9on Bottou, Genevieve B. Orr, and Klaus-Robert M\u00fcller. 2012. Efficient backprop. In Neural Networks: Tricks of the Trade, 2nd ed. Springer, Berlin, 9\u201348."},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2011.01.005"},{"key":"e_1_3_1_31_2","article-title":"Learning from rankings for no-reference image quality assessment by Siamese network","author":"Liu Xialei","year":"2016","unstructured":"Xialei Liu. 2016. Learning from rankings for no-reference image quality assessment by Siamese network. Computer VIsion Center, Master thesis, Universitat Autonoma de Barcelona, Barcelona, Spain.","journal-title":"Computer VIsion Center, Master thesis, Universitat Autonoma de Barcelona, Barcelona, Spain"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3414837"},{"key":"e_1_3_1_33_2","unstructured":"Ilya Loshchilov and Frank Hutter. 2018. Fixing Weight Decay Regularization in Adam. Retrieved from https:\/\/openreview.net\/forum?id=rk6qdGgCZ"},{"key":"e_1_3_1_34_2","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1117\/12.586757","volume-title":"Human Vision and Electronic Imaging X","author":"Mantiuk Rafal","year":"2005","unstructured":"Rafal Mantiuk, Scott J. Daly, Karol Myszkowski, and Hans-Peter Seidel. 2005. Predicting visible differences in high dynamic range images: Model and its calibration. In Human Vision and Electronic Imaging X, Vol. 5666. International Society for Optics and Photonics, SPIE, 204\u2013214."},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/2010324.1964935"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2214050"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2012.2227726"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2011.2147325"},{"key":"e_1_3_1_39_2","article-title":"Pytorch: An imperative style, high-performance deep learning library","volume":"32","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga et\u00a0al. 2019. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Info. Process. Syst. 32 (2019).","journal-title":"Adv. Neural Info. Process. Syst."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2014.10.009"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00194"},{"key":"e_1_3_1_42_2","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/3-540-49430-8_3","volume-title":"Neural Networks: Tricks of the Trade","author":"Prechelt Lutz","year":"1998","unstructured":"Lutz Prechelt. 1998. Early stopping\u2014but when? In Neural Networks: Tricks of the Trade. Springer, Berlin, 55\u201369."},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2017.11.001"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2191563"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2020.3030895"},{"key":"e_1_3_1_46_2","unstructured":"H. R. Sheikh. 2005. LIVE image quality assessment database release 2. http:\/\/live.ece.utexas.edu\/research\/quality (2005)."},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2005.859378"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2005.859389"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2021.3074181"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2945675"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACSSC.2003.1292216"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2014.2304714"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3002478"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2465145"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2023.109552"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350990"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2947352"},{"key":"e_1_3_1_59_2","doi-asserted-by":"crossref","unstructured":"Yang Yang Yingqiu Ding Ming Cheng and Weiming Zhang. 2023. No-reference quality assessment for contrast-distorted images based on gray and color-gray-difference Space. ACM Transactions on Multimedia Computing Communications and Applications 19 2 (2023) 1\u201320.","DOI":"10.1145\/3555355"},{"key":"e_1_3_1_60_2","first-page":"1098","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Ye Peng","year":"2012","unstructured":"Peng Ye, Jayant Kumar, Le Kang, and David Doermann. 2012. Unsupervised feature learning framework for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1098\u20131105."},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00363"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP42928.2021.9506075"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/MMUL.2014.50"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2014.2346028"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2011.2109730"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2461603"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2902484"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2016.2604042"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3635716","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3635716","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T23:56:59Z","timestamp":1750291019000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3635716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,11]]},"references-count":67,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,4,30]]}},"alternative-id":["10.1145\/3635716"],"URL":"https:\/\/doi.org\/10.1145\/3635716","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"type":"print","value":"1551-6857"},{"type":"electronic","value":"1551-6865"}],"subject":[],"published":{"date-parts":[[2024,1,11]]},"assertion":[{"value":"2022-08-30","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-28","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-01-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}