{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T16:49:18Z","timestamp":1777567758393,"version":"3.51.4"},"publisher-location":"New York, NY, USA","reference-count":41,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,17]],"date-time":"2021-10-17T00:00:00Z","timestamp":1634428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Youth Innovation Promotion Association CAS","award":["2018497"],"award-info":[{"award-number":["2018497"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61836011 and 61632019"],"award-info":[{"award-number":["61836011 and 61632019"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,17]]},"DOI":"10.1145\/3474085.3475388","type":"proceedings-article","created":{"date-parts":[[2021,10,18]],"date-time":"2021-10-18T11:31:01Z","timestamp":1634556661000},"page":"273-281","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":67,"title":["DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction"],"prefix":"10.1145","author":[{"given":"Hao","family":"Feng","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuechen","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wengang","family":"Zhou","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiajun","family":"Deng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Houqiang","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,17]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2001.937649"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"e_1_3_2_1_3_1","volume-title":"2020 b. Pre-Trained Image Processing Transformer. arxiv","author":"Chen Hanting","year":"2012","unstructured":"Hanting Chen , Yunhe Wang , Tianyu Guo , Chang Xu , Yiping Deng , Zhenhua Liu , Siwei Ma , Chunjing Xu , Chao Xu , and Wen Gao . 2020 b. Pre-Trained Image Processing Transformer. arxiv : 2012 .00364 [cs.CV] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. 2020 b. Pre-Trained Image Processing Transformer. arxiv: 2012.00364 [cs.CV]"},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the International Conference on Machine Learning. 1691--1703","author":"Chen Mark","year":"2020","unstructured":"Mark Chen , Alec Radford , Rewon Child , Jeffrey Wu , Heewoo Jun , David Luan , and Ilya Sutskever . 2020 a. Generative pretraining from pixels . In Proceedings of the International Conference on Machine Learning. 1691--1703 . Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, and Ilya Sutskever. 2020 a. Generative pretraining from pixels. In Proceedings of the International Conference on Machine Learning. 1691--1703."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.40"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.461"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00022"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10479-005-5724-z"},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","volume":"1","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2019 . BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding . In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , Volume 1 (Long and Short Papers) . 4171--4186. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) . 4171--4186."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the International Conference on Learning Representations .","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy , Lucas Beyer , Alexander Kolesnikov , Dirk Weissenborn , Xiaohua Zhai , Thomas Unterthiner , Mostafa Dehghani , Matthias Minderer , Georg Heigold , Sylvain Gelly , Jakob Uszkoreit , and Neil Houlsby . 2021 . An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale . Proceedings of the International Conference on Learning Representations . Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations ."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2013.88"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/938979.939197"},{"key":"e_1_3_2_1_13_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba . 2015 . Adam : A Method for Stochastic Optimization. CoRR , Vol. abs\/ 1412 .6980 (2015). Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. CoRR , Vol. abs\/1412.6980 (2015)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2009.2019301"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2001.958227"},{"key":"e_1_3_2_1_16_1","volume-title":"Binary codes capable of correcting deletions, insertions, and reversals","author":"Levenshtein Vladimir I","year":"1966","unstructured":"Vladimir I Levenshtein . 1966. Binary codes capable of correcting deletions, insertions, and reversals . , Vol. 10 ( 1966 ), 707--710. Vladimir I Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. , Vol. 10 (1966), 707--710."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3355089.3356563"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2010.147"},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the International Conference on Learning Representations .","author":"Loshchilov I.","unstructured":"I. Loshchilov and F. Hutter . 2019. Decoupled Weight Decay Regularization . In Proceedings of the International Conference on Learning Representations . I. Loshchilov and F. Hutter. 2019. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations ."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00494"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58610-2_13"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.497"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/11553595_131"},{"key":"e_1_3_2_1_24_1","volume-title":"Image Transformer. In Proceedings of the 35th International Conference on Machine Learning","volume":"80","author":"Parmar Niki","year":"2018","unstructured":"Niki Parmar , Ashish Vaswani , Jakob Uszkoreit , Lukasz Kaiser , Noam Shazeer , Alexander Ku , and Dustin Tran . 2018 . Image Transformer. In Proceedings of the 35th International Conference on Machine Learning , Vol. 80 . 4055--4064. Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. 2018. Image Transformer. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80. 4055--4064."},{"key":"e_1_3_2_1_25_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017).  Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017)."},{"key":"e_1_3_2_1_26_1","volume-title":"Pattern Recognition","volume":"106","author":"Qin Xuebin","year":"2020","unstructured":"Xuebin Qin , Zichen Zhang , Chenyang Huang , Masood Dehghan , Osmar R. Zaiane , and Martin Jagersand . 2020 . U2-Net: Going deeper with nested U-structure for salient object detection . Pattern Recognition , Vol. 106 (Oct 2020). Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, and Martin Jagersand. 2020. U2-Net: Going deeper with nested U-structure for salient object detection. Pattern Recognition , Vol. 106 (Oct 2020)."},{"key":"e_1_3_2_1_27_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Technical report OpenAI.  Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Technical report OpenAI."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24574-4_28"},{"key":"e_1_3_2_1_29_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv: 1409.1556 [cs.CV]  Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. arxiv: 1409.1556 [cs.CV]"},{"key":"e_1_3_2_1_30_1","volume-title":"Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications","author":"Smith Leslie N","year":"2019","unstructured":"Leslie N Smith and Nicholay Topin . 2019 . Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications , Vol. 11006 . International Society for Optics and Photonics , 1100612. Leslie N Smith and Nicholay Topin. 2019. Super-convergence: Very fast training of neural networks using large learning rates. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications , Vol. 11006. International Society for Optics and Photonics, 1100612."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/1304596.1304846"},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00542"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00162"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2003.819861"},{"key":"e_1_3_2_1_36_1","first-page":"1398","article-title":"Multiscale structural similarity for image quality assessment. In Proceedings of the Asilomar Conference on Signals","volume":"2","author":"Wang Zhou","year":"2003","unstructured":"Zhou Wang , Eero P. Simoncelli , and Alan C. Bovik . 2003 . Multiscale structural similarity for image quality assessment. In Proceedings of the Asilomar Conference on Signals , Systems Computers , Vol. 2. 1398 -- 1402 . Zhou Wang, Eero P. Simoncelli, and Alan C. Bovik. 2003. Multiscale structural similarity for image quality assessment. In Proceedings of the Asilomar Conference on Signals, Systems Computers, Vol. 2. 1398--1402.","journal-title":"Systems Computers"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/645890.671100"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-57058-3_10"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3454804"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2675980"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2007.70831"}],"event":{"name":"MM '21: ACM Multimedia Conference","location":"Virtual Event China","acronym":"MM '21","sponsor":["SIGMM ACM Special Interest Group on Multimedia"]},"container-title":["Proceedings of the 29th ACM International Conference on Multimedia"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475388","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474085.3475388","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:32Z","timestamp":1750193312000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474085.3475388"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,17]]},"references-count":41,"alternative-id":["10.1145\/3474085.3475388","10.1145\/3474085"],"URL":"https:\/\/doi.org\/10.1145\/3474085.3475388","relation":{},"subject":[],"published":{"date-parts":[[2021,10,17]]},"assertion":[{"value":"2021-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}