{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T17:32:04Z","timestamp":1771954324037,"version":"3.50.1"},"publisher-location":"California","reference-count":0,"publisher":"International Joint Conferences on Artificial Intelligence Organization","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7]]},"abstract":"<jats:p>Scene text image super-resolution (STISR) has been regarded as an important pre-processing task for text recognition from low-resolution scene text images. Most recent approaches use the recognizer's feedback as clues to guide super-resolution. However, directly using the recognition clue has two problems: 1) Compatibility: it takes the form of a probability distribution, which has an obvious modal gap with STISR, a pixel-level task; 2) Inaccuracy: it often contains wrong information, which misleads the main task and degrades super-resolution performance. In this paper, we present a novel method, C3-STISR, that jointly exploits the recognizer's feedback, visual, and linguistic information as clues to guide super-resolution. Here, the visual clue comes from images of the texts predicted by the recognizer, which are informative and more compatible with the STISR task, while the linguistic clue is generated by a pre-trained character-level language model, which is able to correct the predicted texts. We design effective extraction and fusion mechanisms for the triple cross-modal clues to generate comprehensive and unified guidance for super-resolution. Extensive experiments on TextZoom show that C3-STISR outperforms the SOTA methods in fidelity and recognition performance. Code is available at https:\/\/github.com\/zhaominyiz\/C3-STISR.<\/jats:p>","DOI":"10.24963\/ijcai.2022\/238","type":"proceedings-article","created":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T02:55:56Z","timestamp":1657940156000},"page":"1707-1713","source":"Crossref","is-referenced-by-count":35,"title":["C3-STISR: Scene Text Image Super-resolution with Triple Clues"],"prefix":"10.24963","author":[{"given":"Minyi","family":"Zhao","sequence":"first","affiliation":[{"name":"Fudan University"}]},{"given":"Miao","family":"Wang","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Fan","family":"Bai","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Bingjia","family":"Li","sequence":"additional","affiliation":[{"name":"Fudan University"}]},{"given":"Jie","family":"Wang","sequence":"additional","affiliation":[{"name":"ByteDance"}]},{"given":"Shuigeng","family":"Zhou","sequence":"additional","affiliation":[{"name":"Fudan University"}]}],"member":"10584","event":{"name":"Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}","theme":"Artificial Intelligence","location":"Vienna, Austria","acronym":"IJCAI-2022","number":"31","sponsor":["International Joint Conferences on Artificial Intelligence Organization (IJCAI)"],"start":{"date-parts":[[2022,7,23]]},"end":{"date-parts":[[2022,7,29]]}},"container-title":["Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence"],"original-title":[],"deposited":{"date-parts":[[2022,7,18]],"date-time":"2022-07-18T11:08:32Z","timestamp":1658142512000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ijcai.org\/proceedings\/2022\/238"}},"subtitle":[],"proceedings-subject":"Artificial Intelligence Research Articles","short-title":[],"issued":{"date-parts":[[2022,7]]},"references-count":0,"URL":"https:\/\/doi.org\/10.24963\/ijcai.2022\/238","relation":{},"subject":[],"published":{"date-parts":[[2022,7]]}}}