{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T04:12:08Z","timestamp":1754107928571,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":31,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,12,18]],"date-time":"2018-12-18T00:00:00Z","timestamp":1545091200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,12,18]]},"DOI":"10.1145\/3293353.3293391","type":"proceedings-article","created":{"date-parts":[[2020,5,4]],"date-time":"2020-05-04T22:07:32Z","timestamp":1588630052000},"page":"1-9","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["A Bottom-Up and Top-Down Approach for Image Captioning using Transformer"],"prefix":"10.1145","author":[{"given":"Sandeep Narayan","family":"Parameswaran","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Madras, Chennai, Tamil Nadu, India"}]},{"given":"Sukhendu","family":"Das","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Madras, Chennai, Tamil Nadu, India"}]}],"member":"320","published-online":{"date-parts":[[2020,5,3]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46454-1_24"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00636"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"Aneja Jyoti","key":"e_1_3_2_1_3_1","unstructured":"Jyoti Aneja , Aditya Deshpande , and Alexander G. Schwing . 2018. Convolutional Image Captioning . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Jyoti Aneja, Aditya Deshpande, and Alexander G. Schwing. 2018. Convolutional Image Captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_3_2_1_4_1","volume-title":"Proceedings of the Association for Computational Linguistics (ACL) Workshop","volume":"29","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie . 2005 . METEOR: An automatic metric for MT evaluation with improved correlation with human judgments . In Proceedings of the Association for Computational Linguistics (ACL) Workshop , Vol. 29 . 65--72. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the Association for Computational Linguistics (ACL) Workshop, Vol. 29. 65--72."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_1_6_1","volume-title":"Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467","author":"Devlin Jacob","year":"2015","unstructured":"Jacob Devlin , Saurabh Gupta , Ross Girshick , Margaret Mitchell , and C Lawrence Zitnick . 2015. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467 ( 2015 ). Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, and C Lawrence Zitnick. 2015. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467 (2015)."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2625--2634","author":"Donahue J.","key":"e_1_3_2_1_7_1","unstructured":"J. Donahue , L. A. Hendricks , S. Guadarrama , M. Rohrbach , S. Venugopalan , T. Darrell , and K. Saenko . 2015. Long-term recurrent convolutional networks for visual recognition and description . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2625--2634 . J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, T. Darrell, and K. Saenko. 2015. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2625--2634."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1473--1482","author":"Fang H.","key":"e_1_3_2_1_8_1","unstructured":"H. Fang , S. Gupta , F. Iandola , R. K. Srivastava , L. Deng , P. Doll\u00e1r , J. Gao , X. He , M. Mitchell , J. C. Platt , C. L. Zitnick , and G. Zweig . 2015. From captions to visual concepts and back . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1473--1482 . H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Doll\u00e1r, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. 2015. From captions to visual concepts and back. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1473--1482."},{"key":"e_1_3_2_1_9_1","volume-title":"Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122","author":"Gehring Jonas","year":"2017","unstructured":"Jonas Gehring , Michael Auli , David Grangier , Denis Yarats , and Yann N Dauphin . 2017. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122 ( 2017 ). Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. 2017. Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122 (2017)."},{"key":"e_1_3_2_1_10_1","volume-title":"Proceedings of the International Conference on Artificial Intelligence and Statistics (ICAIS). 249--256","author":"Glorot Xavier","year":"2010","unstructured":"Xavier Glorot and Yoshua Bengio . 2010 . Understanding the difficulty of training deep feedforward neural networks . In Proceedings of the International Conference on Artificial Intelligence and Statistics (ICAIS). 249--256 . Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (ICAIS). 249--256."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_12_1","volume-title":"Neural machine translation in linear time. arXiv preprint arXiv:1610.10099","author":"Kalchbrenner Nal","year":"2016","unstructured":"Nal Kalchbrenner , Lasse Espeholt , Karen Simonyan , Aaron van den Oord , Alex Graves , and Koray Kavukcuoglu . 2016. Neural machine translation in linear time. arXiv preprint arXiv:1610.10099 ( 2016 ). Nal Kalchbrenner, Lasse Espeholt, Karen Simonyan, Aaron van den Oord, Alex Graves, and Koray Kavukcuoglu. 2016. Neural machine translation in linear time. arXiv preprint arXiv:1610.10099 (2016)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_3_2_1_14_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"e_1_3_2_1_16_1","volume-title":"Proceedings of the Association for Computational Linguistics (ACL) Workshop","volume":"8","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin . 2004 . Rouge: A package for automatic evaluation of summaries . In Proceedings of the Association for Computational Linguistics (ACL) Workshop , Vol. 8 . Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proceedings of the Association for Computational Linguistics (ACL) Workshop, Vol. 8."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the Annual Meeting on Association for Computational Linguistics (ACL). 311--318","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni , Salim Roukos , Todd Ward , and Wei-Jing Zhu . 2002 . BLEU: A Method for Automatic Evaluation of Machine Translation . In Proceedings of the Annual Meeting on Association for Computational Linguistics (ACL). 311--318 . Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the Annual Meeting on Association for Computational Linguistics (ACL). 311--318."},{"key":"e_1_3_2_1_19_1","unstructured":"Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS). 91--99.  Shaoqing Ren Kaiming He Ross Girshick and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (NIPS). 91--99."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1179--1195","author":"Rennie S. J.","key":"e_1_3_2_1_20_1","unstructured":"S. J. Rennie , E. Marcheret , Y. Mroueh , J. Ross , and V. Goel . 2017. Self-Critical Sequence Training for Image Captioning . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1179--1195 . S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel. 2017. Self-Critical Sequence Training for Image Captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1179--1195."},{"key":"e_1_3_2_1_21_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman . 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_1_22_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS). 6000--6010.  Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N Gomez \u0141ukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems (NIPS). 6000--6010."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299087"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3156--3164","author":"Vinyals O.","key":"e_1_3_2_1_24_1","unstructured":"O. Vinyals , A. Toshev , S. Bengio , and D. Erhan . 2015. Show and tell: A neural image caption generator . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3156--3164 . O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. 2015. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 3156--3164."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7378--7387","author":"Wang Y.","key":"e_1_3_2_1_25_1","unstructured":"Y. Wang , Z. Lin , X. Shen , S. Cohen , and G. W. Cottrell . 2017. Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7378--7387 . Y. Wang, Z. Lin, X. Shen, S. Cohen, and G. W. Cottrell. 2017. Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 7378--7387."},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 203--212","author":"Wu Q.","key":"e_1_3_2_1_26_1","unstructured":"Q. Wu , C. Shen , L. Liu , A. Dick , and A. v. d. Hengel .2016. What Value Do Explicit High Level Concepts Have in Vision to Language Problems? . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 203--212 . Q. Wu, C. Shen, L.Liu, A. Dick, and A. v. d. Hengel.2016. What Value Do Explicit High Level Concepts Have in Vision to Language Problems?. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 203--212."},{"key":"e_1_3_2_1_27_1","volume-title":"Image Captioning and Visual Question Answering Based on Attributes and External Knowledge","author":"Wu Qi","year":"2017","unstructured":"Qi Wu , Chunhua Shen , Peng Wang , Anthony Dick , and Anton van den Hengel . 2017. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge . IEEE Transactions on Pattern Analysis and Machine Intelligence(T-PAMI) ( 2017 ). Qi Wu, Chunhua Shen, Peng Wang, Anthony Dick, and Anton van den Hengel. 2017. Image Captioning and Visual Question Answering Based on Attributes and External Knowledge. IEEE Transactions on Pattern Analysis and Machine Intelligence(T-PAMI) (2017)."},{"key":"e_1_3_2_1_28_1","volume-title":"Proceedings of the International Conference on International Conference on Machine Learning (ICML). 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu , Jimmy Lei Ba , Ryan Kiros , Kyunghyun Cho , Aaron Courville , Ruslan Salakhutdinov , Richard S. Zemel , and Yoshua Bengio . 2015 . Show, Attend and Tell: Neural Image Caption Generation with Visual Attention . In Proceedings of the International Conference on International Conference on Machine Learning (ICML). 2048--2057 . Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the International Conference on International Conference on Machine Learning (ICML). 2048--2057."},{"key":"e_1_3_2_1_29_1","unstructured":"Zhilin Yang Ye Yuan Yuexin Wu William W Cohen and Ruslan R Salakhutdinov. 2016. Review networks for caption generation. In Advances in Neural Information Processing Systems (NIPS). 2361--2369.  Zhilin Yang Ye Yuan Yuexin Wu William W Cohen and Ruslan R Salakhutdinov. 2016. Review networks for caption generation. In Advances in Neural Information Processing Systems (NIPS). 2361--2369."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.503"},{"key":"e_1_3_2_1_31_1","volume-title":"Captioning Transformer with Stacked Attention Modules. Applied Sciences 8, 5","author":"Zhu Xinxin","year":"2018","unstructured":"Xinxin Zhu , Lixiang Li , Jing Liu , Haipeng Peng , and Xinxin Niu . 2018. Captioning Transformer with Stacked Attention Modules. Applied Sciences 8, 5 ( 2018 ). Xinxin Zhu, Lixiang Li, Jing Liu, Haipeng Peng, and Xinxin Niu. 2018. Captioning Transformer with Stacked Attention Modules. Applied Sciences 8, 5 (2018)."}],"event":{"name":"ICVGIP 2018: 11th Indian Conference on Computer Vision, Graphics and Image Processing","acronym":"ICVGIP 2018","location":"Hyderabad India"},"container-title":["Proceedings of the 11th Indian Conference on Computer Vision, Graphics and Image Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3293353.3293391","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3293353.3293391","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:58:08Z","timestamp":1750208288000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3293353.3293391"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,18]]},"references-count":31,"alternative-id":["10.1145\/3293353.3293391","10.1145\/3293353"],"URL":"https:\/\/doi.org\/10.1145\/3293353.3293391","relation":{},"subject":[],"published":{"date-parts":[[2018,12,18]]},"assertion":[{"value":"2020-05-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}