{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:21:22Z","timestamp":1750220482226,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":36,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,10,26]],"date-time":"2021-10-26T00:00:00Z","timestamp":1635206400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,10,26]]},"DOI":"10.1145\/3459637.3481922","type":"proceedings-article","created":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T15:53:43Z","timestamp":1636991623000},"page":"3787-3795","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["From Pixels to Words"],"prefix":"10.1145","author":[{"given":"Pranay","family":"Dugar","sequence":"first","affiliation":[{"name":"Walmart Global Tech, Bangalore, India"}]},{"given":"Rajesh Shreedhar","family":"Bhat","sequence":"additional","affiliation":[{"name":"Walmart Global Tech, Bangalore, India"}]},{"given":"Asit Sharad","family":"Tarsode","sequence":"additional","affiliation":[{"name":"Walmart Global Tech, Bangalore, India"}]},{"given":"Uddipto","family":"Dutta","sequence":"additional","affiliation":[{"name":"Walmart Global Tech, Bangalore, India"}]},{"given":"Kunal","family":"Banerjee","sequence":"additional","affiliation":[{"name":"Walmart Global Tech, Bangalore, India"}]},{"given":"Anirban","family":"Chatterjee","sequence":"additional","affiliation":[{"name":"Walmart Global Tech, Bangalore, India"}]},{"given":"Vijay Srinivas","family":"Agneeswaran","sequence":"additional","affiliation":[{"name":"Walmart Global Tech, Bangalore, 
India"}]}],"member":"320","published-online":{"date-parts":[[2021,10,30]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"SONY Breaks ResNet-50 Training Record with NVIDIA V100 Tensor Core GPUs. https:\/\/developer.nvidia.com\/blog\/sony-breaks-resnet-50-training-record-with-nvidia-v100-tensor-core-gpus\/ Retrieved","author":"Alarcon Nefi","year":"2021","unstructured":"Nefi Alarcon . 2018. SONY Breaks ResNet-50 Training Record with NVIDIA V100 Tensor Core GPUs. https:\/\/developer.nvidia.com\/blog\/sony-breaks-resnet-50-training-record-with-nvidia-v100-tensor-core-gpus\/ Retrieved May 16, 2021 from Nefi Alarcon. 2018. SONY Breaks ResNet-50 Training Record with NVIDIA V100 Tensor Core GPUs. https:\/\/developer.nvidia.com\/blog\/sony-breaks-resnet-50-training-record-with-nvidia-v100-tensor-core-gpus\/ Retrieved May 16, 2021 from"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Jeonghun Baek Geewook Kim Junyeop Lee Sungrae Park Dongyoon Han Sangdoo Yun Seong Joon Oh and Hwalsuk Lee. 2019 a. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis. In ICCV. 4714--4722.  Jeonghun Baek Geewook Kim Junyeop Lee Sungrae Park Dongyoon Han Sangdoo Yun Seong Joon Oh and Hwalsuk Lee. 2019 a. What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis. In ICCV. 4714--4722.","DOI":"10.1109\/ICCV.2019.00481"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Youngmin Baek Bado Lee Dongyoon Han Sangdoo Yun and Hwalsuk Lee. 2019 b. Character Region Awareness for Text Detection. In CVPR. 9365--9374.  Youngmin Baek Bado Lee Dongyoon Han Sangdoo Yun and Hwalsuk Lee. 2019 b. Character Region Awareness for Text Detection. In CVPR. 9365--9374.","DOI":"10.1109\/CVPR.2019.00959"},{"key":"e_1_3_2_1_4_1","volume-title":"The OpenCV Library. Dr. Dobb's Journal of Software Tools","author":"Bradski G.","year":"2000","unstructured":"G. Bradski . 2000. The OpenCV Library. Dr. 
Dobb's Journal of Software Tools ( 2000 ). G. Bradski. 2000. The OpenCV Library. Dr. Dobb's Journal of Software Tools (2000)."},{"key":"e_1_3_2_1_5_1","volume-title":"Imagenet: A large-scale hierarchical image database. In CVPR. 248--255.","author":"Deng Jia","year":"2009","unstructured":"Jia Deng , Wei Dong , Richard Socher , Li-Jia Li , Kai Li , and Li Fei-Fei . 2009 . Imagenet: A large-scale hierarchical image database. In CVPR. 248--255. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In CVPR. 248--255."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Ankush Gupta Andrea Vedaldi and Andrew Zisserman. 2016. Synthetic Data for Text Localisation in Natural Images. In CVPR. 2315--2324.  Ankush Gupta Andrea Vedaldi and Andrew Zisserman. 2016. Synthetic Data for Text Localisation in Natural Images. In CVPR. 2315--2324.","DOI":"10.1109\/CVPR.2016.254"},{"key":"e_1_3_2_1_8_1","unstructured":"Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778.  Kaiming He Xiangyu Zhang Shaoqing Ren and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778."},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0823-z"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969442.2969465"},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2015.7333942"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2013.221"},{"key":"e_1_3_2_1_13_1","unstructured":"Chen-Yu Lee and Simon Osindero. 2016. Recursive Recurrent Nets with Attention Modeling for OCR in the Wild. In CVPR. 2231--2239.  Chen-Yu Lee and Simon Osindero. 2016. Recursive Recurrent Nets with Attention Modeling for OCR in the Wild. In CVPR. 
2231--2239."},{"key":"e_1_3_2_1_14_1","unstructured":"Minghui Liao Baoguang Shi and Xiang Bai. 2018. TextBoxes  Minghui Liao Baoguang Shi and Xiang Bai. 2018. TextBoxes"},{"volume-title":"A Single-Shot Oriented Scene Text Detector","year":"2018","key":"e_1_3_2_1_15_1","unstructured":": A Single-Shot Oriented Scene Text Detector . IEEE Transactions on Image Processing , Vol. 27 , 8 ( 2018 ), 3676--3690. : A Single-Shot Oriented Scene Text Detector. IEEE Transactions on Image Processing, Vol. 27, 8 (2018), 3676--3690."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298023.3298172"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/938980.939531"},{"key":"e_1_3_2_1_18_1","unstructured":"Paulius Micikevicius Sharan Narang Jonah Alben Gregory F. Diamos Erich Elsen David Garc\u00eda Boris Ginsburg Michael Houston Oleksii Kuchaiev Ganesh Venkatesh and Hao Wu. 2018. Mixed Precision Training. In ICLR.  Paulius Micikevicius Sharan Narang Jonah Alben Gregory F. Diamos Erich Elsen David Garc\u00eda Boris Ginsburg Michael Houston Oleksii Kuchaiev Ganesh Venkatesh and Hao Wu. 2018. Mixed Precision Training. In ICLR."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"crossref","unstructured":"Anand Mishra Karteek Alahari and C. V. Jawahar. 2012. Scene Text Recognition using Higher Order Language Priors. In BMVC. 1--11.  Anand Mishra Karteek Alahari and C. V. Jawahar. 2012. Scene Text Recognition using Higher Order Language Priors. In BMVC. 1--11.","DOI":"10.5244\/C.26.127"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2355095"},{"volume-title":"Nvidia Tesla V100 GPU Architecture. https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf Retrieved","year":"2021","key":"e_1_3_2_1_21_1","unstructured":"nvidia.com. 2017. Nvidia Tesla V100 GPU Architecture. 
https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf Retrieved May 19, 2021 from nvidia.com. 2017. Nvidia Tesla V100 GPU Architecture. https:\/\/images.nvidia.com\/content\/volta-architecture\/pdf\/volta-architecture-whitepaper.pdf Retrieved May 19, 2021 from"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/3454287.3455008"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.76"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Jeff Rasley Samyam Rajbhandari Olatunji Ruwase and Yuxiong He. 2020. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. In KDD. 3505--3506.  Jeff Rasley Samyam Rajbhandari Olatunji Ruwase and Yuxiong He. 2020. DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. In KDD. 3505--3506.","DOI":"10.1145\/3394486.3406703"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2014.07.008"},{"key":"e_1_3_2_1_26_1","first-page":"234","article-title":"U-Net: Convolutional Networks for Biomedical Image Segmentation","volume":"9351","author":"Ronneberger Olaf","year":"2015","unstructured":"Olaf Ronneberger , Philipp Fischer , and Thomas Brox . 2015 . U-Net: Convolutional Networks for Biomedical Image Segmentation . In MICCAI , Vol. 9351. 234 -- 241 . Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, Vol. 9351. 234--241.","journal-title":"MICCAI"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2646371"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33715-4_54"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01258-8_36"},{"key":"e_1_3_2_1_30_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. 
Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR.  Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1080\/713821475"},{"volume-title":"https:\/\/www.walmart.com\/ip\/Kraft-America-s-Favorite-Sandwich-Spread-15-fl-oz-Jar\/1465165 Online","year":"2017","key":"e_1_3_2_1_32_1","unstructured":"Walmart.com. 2017. https:\/\/www.walmart.com\/ip\/Kraft-America-s-Favorite-Sandwich-Spread-15-fl-oz-Jar\/1465165 Online ; accessed December 16, 2017 . Walmart.com. 2017. https:\/\/www.walmart.com\/ip\/Kraft-America-s-Favorite-Sandwich-Spread-15-fl-oz-Jar\/1465165 Online; accessed December 16, 2017."},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126402"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045336"},{"key":"e_1_3_2_1_35_1","unstructured":"Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei Guoxin Wang Yijuan Lu Dinei A. F. Flor\u00eancio Cha Zhang Wanxiang Che Min Zhang and Lidong Zhou. 2021. LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding. In ACL\/IJCNLP. 2579--2591.  Yang Xu Yiheng Xu Tengchao Lv Lei Cui Furu Wei Guoxin Wang Yijuan Lu Dinei A. F. Flor\u00eancio Cha Zhang Wanxiang Che Min Zhang and Lidong Zhou. 2021. LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding. In ACL\/IJCNLP. 2579--2591."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Zheng Zhang Wei Shen Cong Yao and Xiang Bai. 2015. Symmetry-based text line detection in natural scenes. In CVPR. 2558--2567.  Zheng Zhang Wei Shen Cong Yao and Xiang Bai. 2015. Symmetry-based text line detection in natural scenes. In CVPR. 
2558--2567.","DOI":"10.1109\/CVPR.2015.7298871"}],"event":{"name":"CIKM '21: The 30th ACM International Conference on Information and Knowledge Management","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"],"location":"Virtual Event Queensland Australia","acronym":"CIKM '21"},"container-title":["Proceedings of the 30th ACM International Conference on Information & Knowledge Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459637.3481922","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3459637.3481922","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:58Z","timestamp":1750193338000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459637.3481922"}},"subtitle":["A Scalable Journey of Text Information from Product Images to Retail Catalog"],"short-title":[],"issued":{"date-parts":[[2021,10,26]]},"references-count":36,"alternative-id":["10.1145\/3459637.3481922","10.1145\/3459637"],"URL":"https:\/\/doi.org\/10.1145\/3459637.3481922","relation":{},"subject":[],"published":{"date-parts":[[2021,10,26]]},"assertion":[{"value":"2021-10-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}