{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:05:02Z","timestamp":1760058302976,"version":"build-2065373602"},"reference-count":46,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T00:00:00Z","timestamp":1743379200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Ministry of Science and Higher Education of the Russian Federation","award":["FEWZ-2024-0016"],"award-info":[{"award-number":["FEWZ-2024-0016"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Detecting mentions of green waste practices on social networks is a crucial tool for environmental monitoring and sustainability analytics. Social media serve as a valuable source of ecological information, enabling researchers to track trends, assess public engagement, and predict the spread of sustainable behaviors. Automatic extraction of mentions of green waste practices facilitates large-scale analysis, but the uneven distribution of such mentions presents a challenge for effective detection. To address this, data augmentation plays a key role in balancing class distribution in green practice detection tasks. In this study, we compared existing data augmentation techniques based on the paraphrasing of original texts. We evaluated the effectiveness of additional explanations in prompts, the Chain-of-Thought prompting, synonym substitution, and text expansion. Experiments were conducted on the GreenRu dataset, which focuses on detecting mentions of green waste practices in Russian social media. Our results, obtained using two instruction-based large language models, demonstrated the effectiveness of the Chain-of-Thought prompting for text augmentation. These findings contribute to advancing sustainability analytics by improving automated detection and analysis of environmental discussions. Furthermore, the results of this study can be applied to other tasks that require augmentation of text data in the context of ecological research and beyond.<\/jats:p>","DOI":"10.3390\/bdcc9040081","type":"journal-article","created":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T08:48:00Z","timestamp":1743410880000},"page":"81","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing Green Practice Detection in Social Media with Paraphrasing-Based Data Augmentation"],"prefix":"10.3390","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8409-6457","authenticated-orcid":false,"given":"Anna","family":"Glazkova","sequence":"first","affiliation":[{"name":"Carbon Measurement Test Area in Tyumen\u2019 Region (FEWZ-2024-0016), University of Tyumen, Tyumen 625003, Russia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1404-4915","authenticated-orcid":false,"given":"Olga","family":"Zakharova","sequence":"additional","affiliation":[{"name":"Carbon Measurement Test Area in Tyumen\u2019 Region (FEWZ-2024-0016), University of Tyumen, Tyumen 625003, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1038\/s41558-018-0121-1","article-title":"Towards demand-side solutions for mitigating climate change","volume":"8","author":"Creutzig","year":"2018","journal-title":"Nat. Clim. Chang."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1016\/j.erss.2019.02.001","article-title":"It starts at home? Climate policies targeting household consumption and behavioral decisions are key to low-carbon futures","volume":"52","author":"Dubois","year":"2019","journal-title":"Energy Res. Soc. Sci."},{"key":"ref_3","unstructured":"Spurling, N., McMeekin, A., Shove, E., Southerton, D., and Welch, D. (2025, February 11). Interventions in Practice: Re-Framing Policy Approaches to Consumer Behaviour. Available online: https:\/\/research.manchester.ac.uk\/en\/publications\/interventions-in-practice-re-framing-policy-approaches-to-consume."},{"key":"ref_4","unstructured":"Creutzig, F., Roy, J., Devine-Wright, P., D\u00edaz-Jos\u00e9, J., Geels, F., Grubler, A., Ma\u00efzi, N., Masanet, E., Mulugetta, Y., and Onyige-Ebeniro, C. (2022). Demand, Services and Social Aspects of Mitigation, Cambridge University Press. Technical Report."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"6414","DOI":"10.1021\/es803496a","article-title":"Carbon footprint of nations: A global, trade-linked analysis","volume":"43","author":"Hertwich","year":"2009","journal-title":"Environ. Sci. Technol."},{"key":"ref_6","unstructured":"Boev, P.A., and Burenko, D.L. (2016). Ecological Footprint of the Subjects of the Russian Federation\u20142016, WWF Russia."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hui, A., Schatzki, T., and Shove, E. (2017). The Nexus of Practices: Connections, Constellations, Practitioners, Routledge.","DOI":"10.4324\/9781315560816"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"145","DOI":"10.21684\/2412-2343-2024-11-4-145-167","article-title":"Green Waste Practices as Climate Adaptation and Mitigation Actions: Grassroots Initiatives in Russia","volume":"11","author":"Zakharova","year":"2024","journal-title":"BRICS Law J."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1013","DOI":"10.1007\/s11266-020-00208-7","article-title":"How and why do social and sustainable initiatives scale? A systematic review of the literature on social entrepreneurship and grassroots innovation","volume":"31","author":"Geuijen","year":"2020","journal-title":"Volunt. Int. J. Volunt. Nonprofit Organ."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1002\/eet.1929","article-title":"Hybrid infrastructures: The role of strategy and compromise in grassroot governance","volume":"31","author":"Schmid","year":"2021","journal-title":"Environ. Policy Gov."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Dai, H., Liu, Z., Liao, W., Huang, X., Cao, Y., Wu, Z., Zhao, L., Xu, S., Zeng, F., and Liu, W. (2025). AugGPT: Leveraging ChatGPT for Text Data Augmentation. IEEE Trans. Big Data, 1\u201312.","DOI":"10.1109\/TBDATA.2025.3536934"},{"key":"ref_12","unstructured":"Sarker, S., Qian, L., and Dong, X. (2023). Medical data augmentation via ChatGPT: A case study on medication identification and medication event classification. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wo\u017aniak, S., and Koco\u0144, J. (2023, January 1\u20134). From Big to Small Without Losing It All: Text Augmentation with ChatGPT for Efficient Sentiment Analysis. Proceedings of the 2023 IEEE International Conference on Data Mining Workshops (ICDMW), Shanghai, China.","DOI":"10.1109\/ICDMW60847.2023.00108"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Chen, W., Qiu, P., and Cauteruccio, F. (2024). MedNER: A Service-Oriented Framework for Chinese Medical Named-Entity Recognition with Real-World Application. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8080086"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Pires, H., Paucar, L., and Carvalho, J.P. (2025). DeB3RTa: A Transformer-Based Model for the Portuguese Financial Domain. Big Data Cogn. Comput., 9.","DOI":"10.3390\/bdcc9030051"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Piedboeuf, F., and Langlais, P. (2023, January 6\u201310). Is ChatGPT the ultimate Data Augmentation Algorithm?. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore.","DOI":"10.18653\/v1\/2023.findings-emnlp.1044"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Zhao, H., Chen, H., Ruggles, T.A., Feng, Y., Singh, D., and Yoon, H.J. (2024). Improving Text Classification with Large Language Model-Based Data Augmentation. Electronics, 13.","DOI":"10.3390\/electronics13132535"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Glazkova, A., and Zakharova, O. (2024, January 11\u201312). Evaluating LLM Prompts for Data Augmentation in Multi-Label Classification of Ecological Texts. Proceedings of the 2024 Ivannikov Ispras Open Conference (ISPRAS), Moscow, Russia.","DOI":"10.1109\/ISPRAS64596.2024.10899128"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Li, Y., Ding, K., Wang, J., and Lee, K. (2024). Empowering Large Language Models for Textual Data Augmentation. arXiv.","DOI":"10.18653\/v1\/2024.findings-acl.756"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1109\/MIS.2024.3508432","article-title":"Exploring ChatGPT-Based Augmentation Strategies for Contrastive Aspect-Based Sentiment Analysis","volume":"40","author":"Xu","year":"2025","journal-title":"IEEE Intell. Syst."},{"key":"ref_21","unstructured":"Chai, Y., Xie, H., and Qin, J.S. (2025). Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Zheng, C., Sabour, S., Wen, J., Zhang, Z., and Huang, M. (2023, January 9\u201314). AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation. Proceedings of the Findings of the Association for Computational Linguistics: ACL, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.findings-acl.99"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yoo, K.M., Park, D., Kang, J., Lee, S.W., and Park, W. (2021, January 7\u201311). GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic.","DOI":"10.18653\/v1\/2021.findings-emnlp.192"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Sahu, G., Vechtomova, O., Bahdanau, D., and Laradji, I. (2023, January 6\u201310). PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore.","DOI":"10.18653\/v1\/2023.emnlp-main.323"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Honovich, O., Scialom, T., Levy, O., and Schick, T. (2023, January 9\u201314). Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.acl-long.806"},{"key":"ref_26","first-page":"65468","article-title":"Post hoc explanations of language models can improve language models","volume":"36","author":"Krishna","year":"2024","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Ye, X., Iyer, S., Celikyilmaz, A., Stoyanov, V., Durrett, G., and Pasunuru, R. (2023, January 9\u201314). Complementary Explanations for Effective In-Context Learning. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.findings-acl.273"},{"key":"ref_28","unstructured":"Cheng, X., Li, J., Zhao, W.X., and Wen, J.R. (2024, January 20\u201325). ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Tan, J.T. (2023, January 7). Causal abstraction for chain-of-thought reasoning in arithmetic word problems. Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, Singapore.","DOI":"10.18653\/v1\/2023.blackboxnlp-1.12"},{"key":"ref_30","unstructured":"Zhao, X., Li, M., Lu, W., Weber, C., Lee, J.H., Chu, K., and Wermter, S. (2024, January 20\u201325). Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Peng, L., Zhang, Y., and Shang, J. (2024, January 11\u201316). Controllable data augmentation for few-shot text mining with chain-of-thought attribute manipulation. Proceedings of the Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand.","DOI":"10.18653\/v1\/2024.findings-acl.1"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wu, D., Zhang, J., and Huang, X. (2023, January 9\u201314). Chain of Thought Prompting Elicits Knowledge Augmentation. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada.","DOI":"10.18653\/v1\/2023.findings-acl.408"},{"key":"ref_33","unstructured":"Li, D., Li, Y., Mekala, D., Li, S., Wang, X., Hogan, W., and Shang, J. (2023). DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase. arXiv."},{"key":"ref_34","unstructured":"Ubani, S., Polat, S.O., and Nielsen, R. (2023). Zeroshotdataaug: Generating and augmenting training data with chatgpt. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"101887","DOI":"10.1016\/j.inffus.2023.101887","article-title":"Enhancing social network hate detection using back translation and GPT-3 augmentations during training and test-time","volume":"99","author":"Cohen","year":"2023","journal-title":"Inf. Fusion"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Shushkevich, E., Alexandrov, M., and Cardiff, J. (2023). Improving multiclass classification of fake news using BERT-based models and ChatGPT-augmented data. Inventions, 8.","DOI":"10.3390\/inventions8050112"},{"key":"ref_37","unstructured":"M\u00f8ller, A.G., Pera, A., Dalsgaard, J., and Aiello, L. (2024, January 17\u201322). The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), St. Julians, Malta."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1007\/s13278-024-01325-7","article-title":"Data augmentation using instruction-tuned models improves emotion analysis in tweets","volume":"14","author":"Yandrapati","year":"2024","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_39","first-page":"48987","article-title":"Evaluation and Analysis of Large Language Models for Clinical Text Augmentation and Generation","volume":"12","author":"Latif","year":"2024","journal-title":"IEEE Access"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zakharova, O., and Glazkova, A. (2024). GreenRu: A Russian Dataset for Detecting Mentions of Green Practices in Social Media Posts. Appl. Sci., 14.","DOI":"10.3390\/app14114466"},{"key":"ref_41","first-page":"884","article-title":"The Importance of Green Practices to Reduce Consumption","volume":"6","author":"Zakharova","year":"2022","journal-title":"Chang. Soc. Personal."},{"key":"ref_42","unstructured":"Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., and Fan, A. (2024). The Llama 3 herd of models. arXiv."},{"key":"ref_43","unstructured":"Zmitrovich, D., Abramov, A., Kalmykov, A., Kadulin, V., Tikhonova, M., Taktasheva, E., Astafurov, D., Baushenko, M., Snegirev, A., and Shavrina, T. (2024, January 20\u201325). A Family of Pretrained Transformer Language Models for Russian. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy."},{"key":"ref_44","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA."},{"key":"ref_45","unstructured":"Clark, K., Luong, M.T., Le, Q.V., and Manning, C.D. (2020, January 26\u201330). ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. Proceedings of the ICLR, Addis Ababa, Ethiopia."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2020, January 16\u201320). Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.emnlp-main.365"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/4\/81\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:06:25Z","timestamp":1760029585000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/9\/4\/81"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,31]]},"references-count":46,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,4]]}},"alternative-id":["bdcc9040081"],"URL":"https:\/\/doi.org\/10.3390\/bdcc9040081","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2025,3,31]]}}}