{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T09:47:38Z","timestamp":1772876858163,"version":"3.50.1"},"reference-count":52,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,5,9]],"date-time":"2025-05-09T00:00:00Z","timestamp":1746748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Nebraska Collaboration Initiative","award":["NRI-47130"],"award-info":[{"award-number":["NRI-47130"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Large language models (LLMs) have rapidly advanced natural language processing, showcasing remarkable effectiveness as automated annotators across various applications. Despite their potential to significantly reduce annotation costs and expedite workflows, annotations produced solely by LLMs can suffer from inaccuracies and inherent biases, highlighting the necessity of maintaining human oversight. In this article, we present a synergistic human\u2013LLM collaboration approach for data annotation enhancement (SYNCode). This framework is designed explicitly to facilitate collaboration between humans and LLMs for annotating complex, code-centric datasets such as Stack Overflow. The proposed approach involves an integrated pipeline that initially employs TF-IDF analysis for quick identification of relevant textual elements. Subsequently, we leverage advanced transformer-based models, specifically NLP Transformer and UniXcoder, to capture nuanced semantic contexts and code structures, generating more accurate preliminary annotations. Human annotators then engage in iterative refinement, validating and adjusting annotations to enhance accuracy and mitigate biases introduced during automated labeling. To operationalize this synergistic workflow, we developed the SYNCode prototype, featuring an interactive graphical interface that supports real-time collaborative annotation between humans and LLMs. This enables annotators to iteratively refine and validate automated suggestions effectively. Our integrated human\u2013LLM collaborative methodology demonstrates considerable promise in achieving high-quality, reliable annotations, particularly for domain-specific and technically demanding datasets, thereby enhancing downstream tasks in software engineering and natural language processing.<\/jats:p>","DOI":"10.3390\/info16050392","type":"journal-article","created":{"date-parts":[[2025,5,9]],"date-time":"2025-05-09T04:13:44Z","timestamp":1746764024000},"page":"392","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["SYNCode: Synergistic Human\u2013LLM Collaboration for Enhanced Data Annotation in Stack Overflow"],"prefix":"10.3390","volume":"16","author":[{"given":"Meng","family":"Xia","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shradha","family":"Maharjan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tammy","family":"Le","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Will","family":"Taylor","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Myoungkyu","family":"Song","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,5,9]]},"reference":[{"key":"ref_1","unstructured":"Devlin, J. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_2","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_3","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Fan, A., Gokkaya, B., Harman, M., Lyubarskiy, M., Sengupta, S., Yoo, S., and Zhang, J.M. (2023). Large language models for software engineering: Survey and open problems. arXiv.","DOI":"10.1109\/ICSE-FoSE59343.2023.00008"},{"key":"ref_5","unstructured":"Zheng, Z., Ning, K., Wang, Y., Zhang, J., Zheng, D., Ye, M., and Chen, J. (2023). A survey of large language models for code: Evolution, benchmarking, and future trends. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Chowdhary, K., and Chowdhary, K. (2020). Natural language processing. Fundamentals of Artificial Intelligence, Springer.","DOI":"10.1007\/978-81-322-3972-7"},{"key":"ref_7","unstructured":"Guo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svyatkovskiy, A., and Fu, S. (2020). Graphcodebert: Pre-training code representations with data flow. arXiv."},{"key":"ref_8","unstructured":"Kanade, A., Maniatis, P., Balakrishnan, G., and Shi, K. (2020, January 3\u201318). Learning and evaluating contextual embedding of source code. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., and Jiang, D. (2020). Codebert: A pre-trained model for programming and natural languages. arXiv.","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"ref_10","unstructured":"Xia, C.S., Wei, Y., and Zhang, L. (2022). Practical program repair in the era of large pre-trained language models. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Niu, C., Li, C., Ng, V., Ge, J., Huang, L., and Luo, B. (2022, January 21\u201329). Spt-code: Sequence-to-sequence pre-training for learning source code representations. Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA.","DOI":"10.1145\/3510003.3510096"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, X., Lin, Z., Gong, Y., Jin, A., Zhang, H., Lin, C., Jiao, J., Yiu, S.M., Duan, N., and Chen, W. (2023). Annollm: Making large language models to be better crowdsourced annotators. arXiv.","DOI":"10.18653\/v1\/2024.naacl-industry.15"},{"key":"ref_13","unstructured":"Zhu, Y., Zhang, P., Haq, E.U., Hui, P., and Tyson, G. (2023). Can chatgpt reproduce human-generated labels? a study of social computing tasks. arXiv."},{"key":"ref_14","unstructured":"Le, T., Taylor, W., Maharjan, S., Xia, M., and Song, M. (2025, January 29\u201331). SYNC: Synergistic Annotation Collaboration between Humans and LLMs for Enhanced Model Training. Proceedings of the 23rd IEEE\/ACIS International Conference on Software Engineering Research, Management and Applications (SERA), Las Vegas, NV, USA."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Saha, S., Hase, P., Rajani, N., and Bansal, M. (2022). Are hard examples also harder to explain? A study with human and model-generated explanations. arXiv.","DOI":"10.18653\/v1\/2022.emnlp-main.137"},{"key":"ref_16","unstructured":"Wang, P., Chan, A., Ilievski, F., Chen, M., and Ren, X. (2022). Pinto: Faithful language reasoning using prompt-generated rationales. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wiegreffe, S., Hessel, J., Swayamdipta, S., Riedl, M., and Choi, Y. (2021). Reframing human-AI collaboration for generating free-text explanations. arXiv.","DOI":"10.18653\/v1\/2022.naacl-main.47"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Bhat, M.M., Sordoni, A., and Mukherjee, S. (2021, January 7\u201311). Self-training with few-shot rationalization. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Virtual.","DOI":"10.18653\/v1\/2021.emnlp-main.836"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Marasovi\u0107, A., Beltagy, I., Downey, D., and Peters, M.E. (2021). Few-shot self-rationalization with natural language prompts. arXiv.","DOI":"10.18653\/v1\/2022.findings-naacl.31"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wang, P., Wang, Z., Li, Z., Gao, Y., Yin, B., and Ren, X. (2023). Scott: Self-consistent chain-of-thought distillation. arXiv.","DOI":"10.18653\/v1\/2023.acl-long.304"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Wiegreffe, S., Marasovi\u0107, A., and Smith, N.A. (2020). Measuring association between labels and free-text rationales. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.804"},{"key":"ref_22","first-page":"74952","article-title":"Language models don\u2019t always say what they think: Unfaithful explanations in chain-of-thought prompting","volume":"36","author":"Turpin","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","unstructured":"Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., and Anadkat, S. (2023). Gpt-4 technical report. arXiv."},{"key":"ref_25","unstructured":"Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., and Chen, Z. (2023). Palm 2 technical report. arXiv."},{"key":"ref_26","unstructured":"Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozi\u00e8re, B., Goyal, N., Hambro, E., and Azhar, F. (2023). Llama: Open and efficient foundation language models. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Wang, S., Liu, Y., Xu, Y., Zhu, C., and Zeng, M. (2021). Want to reduce labeling cost? GPT-3 can help. arXiv.","DOI":"10.18653\/v1\/2021.findings-emnlp.354"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"e2305016120","DOI":"10.1073\/pnas.2305016120","article-title":"ChatGPT outperforms crowd workers for text-annotation tasks","volume":"120","author":"Gilardi","year":"2023","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1162\/coli_a_00502","article-title":"Can large language models transform computational social science?","volume":"50","author":"Ziems","year":"2024","journal-title":"Comput. Linguist."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Wang, H., Hee, M.S., Awal, M., Choo, K., and Lee, R.K.W. (2023). Evaluating GPT-3 Generated Explanations for Hateful Content Moderation. arXiv.","DOI":"10.24963\/ijcai.2023\/694"},{"key":"ref_31","unstructured":"Skeppstedt, M. (2013, January 4\u20139). Annotating named entities in clinical text by combining pre-annotation and active learning. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop, Sofia, Bulgaria."},{"key":"ref_32","unstructured":"Fort, K., and Sagot, B. (2010, January 15\u201316). Influence of pre-annotation on POS-tagged corpus development. Proceedings of the fourth ACL Linguistic Annotation Workshop, Uppsala, Sweden."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"406","DOI":"10.1136\/amiajnl-2013-001837","article-title":"Evaluating the impact of pre-annotation on annotation speed and potential bias: Natural language processing gold standard development for clinical named entity recognition in clinical trial announcements","volume":"21","author":"Lingren","year":"2014","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_34","unstructured":"Mikulov\u00e1, M., Straka, M., \u0160t\u011bp\u00e1nek, J., \u0160t\u011bp\u00e1nkov\u00e1, B., and Haji\u010d, J. (2023). Quality and efficiency of manual annotation: Pre-annotation bias. arXiv."},{"key":"ref_35","unstructured":"Ogren, P.V., Savova, G.K., and Chute, C.G. (June, January 26). Constructing Evaluation Corpora for Automated Clinical Named Entity Recognition. Proceedings of the LREC, Marrakech, Morocco."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"162","DOI":"10.1016\/j.jbi.2014.05.002","article-title":"Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text","volume":"50","author":"South","year":"2014","journal-title":"J. Biomed. Inform."},{"key":"ref_37","first-page":"91","article-title":"PAL, a tool for pre-annotation and active learning","volume":"31","author":"Skeppstedt","year":"2017","journal-title":"J. Lang. Technol. Comput. Linguist."},{"key":"ref_38","unstructured":"Sujoy, S., Krishna, A., and Goyal, P. (2023, January 9\u201313). Pre-annotation based approach for development of a Sanskrit named entity recognition dataset. Proceedings of the Computational Sanskrit & Digital Humanities: Selected Papers Presented at the 18th World Sanskrit Conference, Canberra, Australia."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Andriluka, M., Uijlings, J.R., and Ferrari, V. (2018, January 22\u201326). Fluid annotation: A human-machine collaboration interface for full image annotation. Proceedings of the 26th ACM international conference on Multimedia, Seoul, Republic of Korea.","DOI":"10.1145\/3240508.3241916"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Hernandez, A., Hochheiser, H., Horn, J., Crowley, R., and Boyce, R. (2014, January 2\u20134). Testing pre-annotation to help non-experts identify drug-drug interactions mentioned in drug product labeling. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Pittsburgh, PA, USA.","DOI":"10.1609\/hcomp.v2i1.13213"},{"key":"ref_41","unstructured":"Kuo, T.T., Huh, J., Kim, J., El-Kareh, R., Singh, S., Feupe, S.F., Kuri, V., Lin, G., Day, M.E., and Ohno-Machado, L. (2018). The impact of automatic pre-annotation in Clinical Note Data Element Extraction-the CLEAN Tool. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Ghosh, T., Saha, R.K., Jenamani, M., Routray, A., Singh, S.K., and Mondal, A. (2023, January 16\u201321). SeisLabel: An AI-Assisted Annotation Tool for Seismic Data Labeling. Proceedings of the IGARSS 2023\u20132023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA.","DOI":"10.1109\/IGARSS52108.2023.10283015"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Green, B., and Chen, Y. (2019, January 29\u201331). Disparate interactions: An algorithm-in-the-loop analysis of fairness in risk assessments. Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA.","DOI":"10.1145\/3287560.3287563"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3129669","article-title":"Improving human-machine cooperative visual search with soft highlighting","volume":"15","author":"Kneusel","year":"2017","journal-title":"ACM Trans. Appl. Percept."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Lai, V., and Tan, C. (2019, January 29\u201331). On human predictions with explanations and predictions of machine learning models: A case study on deception detection. Proceedings of the Conference on Fairness, Accountability, and Transparency, Atlanta, GA, USA.","DOI":"10.1145\/3287560.3287590"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Ma, S., Lei, Y., Wang, X., Zheng, C., Shi, C., Yin, M., and Ma, X. (2023, January 23\u201328). Who should i trust: Ai or myself? Leveraging human and ai correctness likelihood to promote appropriate trust in ai-assisted decision-making. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany.","DOI":"10.1145\/3544548.3581058"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Wang, X., Lu, Z., and Yin, M. (2022, January 25\u201329). Will you accept the ai recommendation? Predicting human behavior in ai-assisted decision making. Proceedings of the ACM Web Conference 2022, Lyon, France.","DOI":"10.1145\/3485447.3512240"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Stites, M.C., Nyre-Yu, M., Moss, B., Smutz, C., and Smith, M.R. (2021, January 24\u201329). Sage advice? The impacts of explanations for machine learning models on human decision-making in spam detection. Proceedings of the International Conference on Human-Computer Interaction, Virtual Event.","DOI":"10.2172\/1878725"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Yin, M., Wortman Vaughan, J., and Wallach, H. (2019, January 4\u20139). Understanding the effect of accuracy on trust in machine learning models. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, UK.","DOI":"10.1145\/3290605.3300509"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019, January 3\u20137). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Guo, D., Lu, S., Duan, N., Wang, Y., Zhou, M., and Yin, J. (2022). Unixcoder: Unified cross-modal pre-training for code representation. arXiv.","DOI":"10.18653\/v1\/2022.acl-long.499"},{"key":"ref_52","unstructured":"Husain, H., Wu, H.H., Gazit, T., Allamanis, M., and Brockschmidt, M. (2019). Codesearchnet challenge: Evaluating the state of semantic code search. arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/392\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:29:49Z","timestamp":1760030989000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/5\/392"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,9]]},"references-count":52,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["info16050392"],"URL":"https:\/\/doi.org\/10.3390\/info16050392","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,9]]}}}