{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,19]],"date-time":"2026-04-19T10:17:36Z","timestamp":1776593856040,"version":"3.51.2"},"reference-count":37,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,1,21]],"date-time":"2022-01-21T00:00:00Z","timestamp":1642723200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001807","name":"S\u00e3o Paulo Research Foundation","doi-asserted-by":"publisher","award":["2019\/24041-4"],"award-info":[{"award-number":["2019\/24041-4"]}],"id":[{"id":"10.13039\/501100001807","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>Automatically describing images using natural sentences is essential to visually impaired people\u2019s inclusion on the Internet. This problem is known as Image Captioning. There are many datasets in the literature, but most contain only English captions, whereas datasets with captions described in other languages are scarce. We introduce the #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese. In contrast to popular datasets, #PraCegoVer has only one reference per image, and both mean and variance of reference sentence length are significantly high, which makes our dataset challenging due to its linguistic aspect. We carry a detailed analysis to find the main classes and topics in our data. We compare #PraCegoVer to MS COCO dataset in terms of sentence length and word frequency. We hope that #PraCegoVer dataset encourages more works addressing the automatic generation of descriptions in Portuguese.<\/jats:p>","DOI":"10.3390\/data7020013","type":"journal-article","created":{"date-parts":[[2022,1,21]],"date-time":"2022-01-21T08:37:18Z","timestamp":1642754238000},"page":"13","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["#PraCegoVer: A Large Dataset for Image Captioning in Portuguese"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2835-1331","authenticated-orcid":false,"given":"Gabriel Oliveira","family":"dos Santos","sequence":"first","affiliation":[{"name":"Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0467-3133","authenticated-orcid":false,"given":"Esther Luna","family":"Colombini","sequence":"additional","affiliation":[{"name":"Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9068-938X","authenticated-orcid":false,"given":"Sandra","family":"Avila","sequence":"additional","affiliation":[{"name":"Institute of Computing, University of Campinas (Unicamp), Campinas 13083-852, Brazil"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,21]]},"reference":[{"key":"ref_1","unstructured":"Web para Todos (2022, January 14). Criadora do Projeto #PraCegoVer Incentiva a Descri\u00e7\u00e3o de Imagens na Web. 
Available online: http:\/\/mwpt.com.br\/criadora-do-projeto-pracegover-incentiva-descricao-de-imagens-na-web."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"853","DOI":"10.1613\/jair.3994","article-title":"Framing image description as a ranking task: Data, models and evaluation metrics","volume":"47","author":"Hodosh","year":"2013","journal-title":"J. Artif. Intell. Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"74","DOI":"10.1007\/s11263-016-0965-7","article-title":"Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models","volume":"123","author":"Plummer","year":"2017","journal-title":"Int. J. Comput. Vis."},{"key":"ref_4","unstructured":"Chen, X., Fang, H., Lin, T.-Y., Vedantam, R., Gupta, S., Dollar, P., and Zitnick, C.L. (2015). Microsoft COCO captions: Data collection and evaluation server. arXiv."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sharma, P., Ding, N., Goodman, S., and Soricut, R. (2018). Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning. Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/P18-1238"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Sidorov, O., Hu, R., Rohrbach, M., and Singh, A. (2020). Textcaps: A dataset for image captioning with reading comprehension. European Conference on Computer Vision, Springer International Publishing.","DOI":"10.1007\/978-3-030-58536-5_44"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Xue, L., Constant, N., Roberts, A., Kale, M., Al-Rfou, R., Siddhant, A., Barua, A., and Raffel, C. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.","DOI":"10.18653\/v1\/2021.naacl-main.41"},{"key":"ref_8","unstructured":"Rosa, G., Bonifacio, L., de Souza, L., Lotufo, R., Nogueira, R., and Melville, J. (2021). A cost-benefit analysis of cross-lingual transfer methods. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision, Springer International Publishing.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_10","unstructured":"Rashtchian, C., Young, P., Hodosh, M., and Hockenmaier, J. (2010). Collecting image annotations using amazon\u2019s mechanical turk. Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk, Association for Computational Linguistics."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Farhadi, A., Hejrati, M., Sadeghi, M., Young, P., Rashtchian, C., Hockenmaier, J., and Forsyth, D. (2010). Every picture tells a story: Generating sentences from images. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15561-1_2"},{"key":"ref_12","unstructured":"Elliott, D., and Keller, F. (2013). Image description using visual dependency representations. Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Zitnick, C., Parikh, D., and Vanderwende, L. (2013, January 1\u20138). Learning the visual interpretation of sentences. 
Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia.","DOI":"10.1109\/ICCV.2013.211"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Kong, C., Lin, D., Bansal, M., Urtasun, R., and Fidler, S. (2014, January 23\u201328). What are you talking about? Text-to-image coreference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.455"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Harwath, D., and Glass, J. (2015, January 13\u201317). Deep multimodal semantic embeddings for speech and images. Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, Scottsdale, AZ, USA.","DOI":"10.1109\/ASRU.2015.7404800"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Gan, C., Gan, Z., He, X., Gao, J., and Deng, L. (2017, January 21\u201326). Stylenet: Generating attractive visual captions with styles. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.108"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1007\/s11263-016-0981-7","article-title":"Visual genome: Connecting language and vision using crowdsourced dense image annotations","volume":"123","author":"Krishna","year":"2017","journal-title":"Int. J. Comput. Vis."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Levinboim, T., Thapliyal, A., Sharma, P., and Soricut, R. Quality Estimation for Image Captions Based on Large-scale Human Evaluations. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.","DOI":"10.18653\/v1\/2021.naacl-main.253"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hsu, T., Giles, C., and Huang, T. (2021). SciCap: Generating Captions for Scientific Figures. Findings of the Association for Computational Linguistics: EMNLP 2021, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2021.findings-emnlp.277"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Lam, Q., Le, Q., Nguyen, V., and Nguyen, N. (2020). UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning. International Conference on Computational Collective Intelligence, Springer International Publishing.","DOI":"10.1007\/978-3-030-63007-2_57"},{"key":"ref_21","unstructured":"Agrawal, H., Desai, K., Wang, Y., Chen, X., Jain, R., Johnson, M., Batra, D., Parikh, D., Lee, S., and Anderson, P. (November, January 27). nocaps: Novel object captioning at scale. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_22","unstructured":"Krasin, I., Duerig, T., Alldrin, N., Ferrari, V., Abu-El-Haija, S., Kuznetsova, A., Rom, H., Uijlings, J., Popov, S., and Kamali, S. (2022, January 14). Openimages: A Public Dataset for Large-Scale Multi-Label and Multi-Class Image Classification. Available online: https:\/\/storage.googleapis.com\/openimages\/web\/index.html."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gurari, D., Zhao, Y., Zhang, M., and Bhattacharya, N. (2020). Captioning images taken by people who are blind. European Conference on Computer Vision, Springer International Publishing.","DOI":"10.1007\/978-3-030-58520-4_25"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Park, C., Kim, B., and Kim, G. (2017, January 21\u201326). 
Attend to you: Personalized image captioning with context sequence memory networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.681"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018, January 18\u201323). MobileNetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1111\/1467-9868.00196","article-title":"Probabilistic principal component analysis","volume":"61","author":"Tipping","year":"1999","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"McInnes, L., Healy, J., and Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv.","DOI":"10.21105\/joss.00861"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"205","DOI":"10.21105\/joss.00205","article-title":"HDBSCAN: Hierarchical density based clustering","volume":"2","author":"McInnes","year":"2017","journal-title":"J. Open Source Softw."},{"key":"ref_29","first-page":"993","article-title":"Latent dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Vedantam, R., Zitnick, C.L., and Parikh, D. (2015, January 7\u201312). CIDEr: Consensus-based image description evaluation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., and Goel, V. (2017, January 21\u201326). Self-critical sequence training for image captioning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.131"},{"key":"ref_32","unstructured":"Santos, G., Colombini, E., and Avila, S. CIDEr-R: Robust Consensus-based Image Description Evaluation. Proceedings of the Seventh Workshop on Noisy User-Generated Text (W-NUT 2021)."},{"key":"ref_33","unstructured":"Huang, L., Wang, W., Chen, J., and Wei, X. (November, January 27). Attention on attention for image captioning. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T., and Zhu, W. (2002, January 7\u201312). BLEU: A method for automatic evaluation of machine translation. Proceedings of the Annual Meeting on Association for Computational Linguistics, Philadelphia, PA, USA.","DOI":"10.3115\/1073083.1073135"},{"key":"ref_35","unstructured":"Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out, Association for Computational Linguistics."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Lavie, A., and Agarwal, A. (2007, January 23). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. 
Proceedings of the Second Workshop on Statistical Machine Translation, Prague, Czech Republic.","DOI":"10.3115\/1626355.1626389"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1145\/3458723","article-title":"Datasheets for datasets","volume":"64","author":"Gebru","year":"2021","journal-title":"Commun. ACM"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/7\/2\/13\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:05:31Z","timestamp":1760133931000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/7\/2\/13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,21]]},"references-count":37,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2]]}},"alternative-id":["data7020013"],"URL":"https:\/\/doi.org\/10.3390\/data7020013","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,21]]}}}
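
The record above is a standard Crossref "work" message, so it can be re-fetched or refreshed programmatically. Below is a minimal sketch (Python, standard library only) that retrieves this record from the public Crossref REST API endpoint https://api.crossref.org/works/{DOI} and reads a few fields; the DOI and the field names are taken from the record above, and the field access assumes the response shape shown there, not a guaranteed schema.

import json
import urllib.request

# DOI taken from the record above.
DOI = "10.3390/data7020013"

# Public Crossref REST API endpoint for a single work.
url = "https://api.crossref.org/works/" + DOI

with urllib.request.urlopen(url) as resp:
    # The payload is wrapped as {"status": ..., "message-type": "work", "message": {...}};
    # the metadata itself lives under "message".
    work = json.load(resp)["message"]

print(work["title"][0])                 # "#PraCegoVer: A Large Dataset for Image Captioning in Portuguese"
print(work["container-title"][0])       # "Data"
print(work["issued"]["date-parts"][0])  # [2022, 1, 21]
print(len(work.get("reference", [])))   # 37 references
for author in work["author"]:
    print(author.get("given", ""), author["family"])

Note that "title" and "container-title" are arrays even when they hold a single value, and "reference" may be absent on works without deposited references, hence the .get() with a default.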