{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,5]],"date-time":"2026-05-05T02:54:01Z","timestamp":1777949641530,"version":"3.51.4"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:00:00Z","timestamp":1777852800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T00:00:00Z","timestamp":1777852800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100013000","name":"Politecnico di Torino","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100013000","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Autom Softw Eng"],"published-print":{"date-parts":[[2026,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Modern software development automation is mostly based on AI, covering every aspect of code production and maintenance, throughout the entire software development lifecycle, from requirements and code writing to testing and maintenance. Code commenting is no exception. Automated code comment generation methods rely on static syntactic and lexical features of source code. However, these approaches frequently underperform in data-centric software applications, where understanding the effect of code on data is essential. We explore an execution-aware extension to automatic documentation generation. 
In this exploratory work, we aim to capture post-execution data transformations (i.e.,\n                    <jats:italic>semantic data differences<\/jats:italic>)\n                    that reveal the code\u2019s effect on data, and use them as a complementary signal alongside existing code representations to automate the generation of explanatory comments for data wrangling code. We build a curated dataset of Python notebooks from Kaggle and apply a lightweight execution tracer to extract structured descriptions of runtime data transformations. We define a formal grammar for capturing these effects and integrate them into a multimodal encoder-decoder model using co-attention mechanisms. Multiple training strategies are explored to assess the impact of this new modality on comment generation. Our evaluation reveals that models incorporating this modality performed competitively with code-only baselines. Notably, in cases where no observable data transformation occurred, the presence of symbolic\n                    <jats:inline-formula>\n                      <jats:tex-math>$$\\langle \\mathsf {no\\_diff} \\rangle$$<\/jats:tex-math>\n                    <\/jats:inline-formula>\n                    signals led to improved robustness and higher comment quality, as measured by both automatic and human evaluation metrics. However, we did not observe improvements in comment quality in semantically rich scenarios, suggesting possible directions for future research. Qualitative analysis of generated comments supports this pattern, indicating that the modality helps stabilize comments by reducing unnecessary or speculative details in neutral cases, but does not yet provide consistent guidance when meaningful data transformations occur. These trends are less pronounced on a larger, noisier extended test set, suggesting sensitivity to comment\u2013code alignment. 
Our study demonstrates the feasibility and potential of using execution-derived feedback as a complementary signal in automated comment generation. While the current approach is limited by dataset size and modality noise, it demonstrates that post-execution state changes can guide more context-aware and stable code summarization. This suggests a promising direction for execution-sensitive models in assisting data-centric software development and its documentation.\n                  <\/jats:p>","DOI":"10.1007\/s10515-026-00623-y","type":"journal-article","created":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T03:37:41Z","timestamp":1777865861000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Beyond syntax: enhancing automated documentation with data differences"],"prefix":"10.1007","volume":"33","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-5808-604X","authenticated-orcid":false,"given":"Giacomo","family":"Fantino","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2027-3308","authenticated-orcid":false,"given":"Antonio","family":"Vetro\u2019","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5328-368X","authenticated-orcid":false,"given":"Marco","family":"Torchiano","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4485-9055","authenticated-orcid":false,"given":"Federica","family":"Cappelluti","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2026,5,4]]},"reference":[{"key":"623_CR1","doi-asserted-by":"publisher","unstructured":"Bansal, A., Haque, S., McMillan, C.: Project-Level Encoding for Neural Source Code Summarization of Subroutines 
(2021). https:\/\/doi.org\/10.48550\/arXiv.2103.11599.","DOI":"10.48550\/arXiv.2103.11599"},{"issue":"6","key":"623_CR2","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1145\/3582083","volume":"20","author":"C Bird","year":"2023","unstructured":"Bird, C., Ford, D., Zimmermann, T., et al.: Taking flight with copilot: Early insights and opportunities of ai-powered pair-programming tools. Queue 20(6), 35\u20135 (2023). https:\/\/doi.org\/10.1145\/3582083","journal-title":"Queue"},{"key":"623_CR3","doi-asserted-by":"crossref","unstructured":"Chen, W., Chen, H.: Collaborative co-attention network for session-based recommendation. Mathematics 9(12), (2021). https:\/\/doi.org\/10.3390\/math9121392. https:\/\/www.mdpi.com\/2227-7390\/9\/12\/1392","DOI":"10.3390\/math9121392"},{"key":"623_CR4","doi-asserted-by":"publisher","unstructured":"\u015eim\u015fek, T., G\u00fcl\u015feni, C., Olcay, G.A.: The future of software development with genai: Evolving roles of software personas. IEEE Eng. Manag. Rev. 1\u20138 (2024). https:\/\/doi.org\/10.1109\/EMR.2024.3454112","DOI":"10.1109\/EMR.2024.3454112"},{"issue":"6","key":"623_CR5","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1109\/MS.2024.3428439","volume":"41","author":"N Davila","year":"2024","unstructured":"Davila, N., Melegati, J., Wiese, I.: Tales from the trenches: Expectations and challenges from practice for code review in the generative ai era. IEEE Softw. 41(6), 38\u201345 (2024). https:\/\/doi.org\/10.1109\/MS.2024.3428439","journal-title":"IEEE Softw."},{"key":"623_CR6","doi-asserted-by":"publisher","unstructured":"Dhruv, A., Dubey, A.: Leveraging large language models for code translation and software development in scientific computing. In: Proceedings of the Platform for Advanced Scientific Computing Conference. Association for Computing Machinery, New York, NY, USA, PASC \u201925, pp. 1\u20139 (2025). 
https:\/\/doi.org\/10.1145\/3732775.3733572","DOI":"10.1145\/3732775.3733572"},{"key":"623_CR7","doi-asserted-by":"publisher","unstructured":"Dimitrov, M., Zhou, H.: Unified architectural support for soft-error protection or software bug detection. In: 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp. 73\u201382 (2007). https:\/\/doi.org\/10.1109\/PACT.2007.4336201","DOI":"10.1109\/PACT.2007.4336201"},{"key":"623_CR8","doi-asserted-by":"publisher","unstructured":"Ding, Y., Steenhoek, B., Pei, K., et\u00a0al.: Traced: Execution-aware pre-training for source code. In: Proceedings of the IEEE\/ACM 46th International Conference on Software Engineering. Association for Computing Machinery, New York, NY, USA, ICSE \u201924 (2024). https:\/\/doi.org\/10.1145\/3597503.3608140","DOI":"10.1145\/3597503.3608140"},{"key":"623_CR9","doi-asserted-by":"publisher","unstructured":"Donvir, A., Sharma, G.: Ethical challenges and frameworks in ai-driven software development and testing. In: 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC), pp 00569\u201300576 (2025). https:\/\/doi.org\/10.1109\/CCWC62904.2025.10903892","DOI":"10.1109\/CCWC62904.2025.10903892"},{"key":"623_CR10","doi-asserted-by":"publisher","unstructured":"Dvivedi, S.S., Vijay, V., Pujari, S.L.R., et\u00a0al.: A comparative analysis of large language models for code documentation generation. In: Proceedings of the 1st ACM International Conference on AI-Powered Software. Association for Computing Machinery, New York, NY, USA, AIware 2024, pp. 65\u201373 (2024). https:\/\/doi.org\/10.1145\/3664646.3664765, https:\/\/doi.org\/10.1145\/3664646.3664765","DOI":"10.1145\/3664646.3664765"},{"key":"623_CR11","doi-asserted-by":"crossref","unstructured":"Feng, Z., Guo, D., Tang, D., et\u00a0al.: CodeBERT: A pre-trained model for programming and natural languages. In: Cohn, T., He, Y., Liu, Y. (eds.) 
Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, Online, pp. 1536\u20131547 (2020). https:\/\/doi.org\/10.18653\/v1\/2020.findings-emnlp.139. https:\/\/aclanthology.org\/2020.findings-emnlp.139\/","DOI":"10.18653\/v1\/2020.findings-emnlp.139"},{"key":"623_CR12","unstructured":"Guo, D., Ren, S., Lu, S., et\u00a0al.: Graphcodebert: Pre-training code representations with data flow. In: International Conference on Learning Representations (2021). https:\/\/openreview.net\/forum?id=jLoC4ez43PZ"},{"key":"623_CR13","doi-asserted-by":"publisher","unstructured":"Haque, S., LeClair, A., Wu, L., et\u00a0al.: Improved Automatic Summarization of Subroutines via Attention to File Context. In: Proceedings of the 17th International Conference on Mining Software Repositories, pp. 300\u2013310 (2020). https:\/\/doi.org\/10.1145\/3379597.3387449. https:\/\/arxiv.org\/abs\/2004.04881","DOI":"10.1145\/3379597.3387449"},{"key":"623_CR14","doi-asserted-by":"crossref","unstructured":"Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. (2007)","DOI":"10.1080\/19312450709336664"},{"key":"623_CR15","doi-asserted-by":"publisher","unstructured":"Hu, X., Xia, X., Lo, D., et\u00a0al.: Practitioners\u2019 expectations on automated code comment generation. In: Proceedings of the 44th International Conference on Software Engineering. ACM, pp. 1693\u20131705 (2022). https:\/\/doi.org\/10.1145\/3510003.3510152. https:\/\/dl.acm.org\/doi\/10.1145\/3510003.3510152","DOI":"10.1145\/3510003.3510152"},{"key":"623_CR16","doi-asserted-by":"publisher","unstructured":"Huang, J., Guo, D., Wang, C., et\u00a0al.: Contextualized Data-Wrangling Code Generation in Computational Notebooks. In: Proceedings of the 39th IEEE\/ACM International Conference on Automated Software Engineering. ACM, pp. 1282\u20131294 (2024). https:\/\/doi.org\/10.1145\/3691620.3695503. 
https:\/\/dl.acm.org\/doi\/10.1145\/3691620.3695503","DOI":"10.1145\/3691620.3695503"},{"key":"623_CR17","doi-asserted-by":"publisher","unstructured":"Jackson, V., Vasilescu, B., Russo, D., et\u00a0al.: The impact of generative ai on creativity in software development: A research agenda. ACM Trans. Softw. Eng. Methodol. 34(5) (2025). https:\/\/doi.org\/10.1145\/3708523","DOI":"10.1145\/3708523"},{"key":"623_CR18","doi-asserted-by":"publisher","unstructured":"Kretzer, F., Kolthoff, K., Bartelt, C., et\u00a0al.: Closing the loop between user stories and gui prototypes: An llm-based assistant for cross-functional integration in software development. In: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, CHI \u201925 (2025). https:\/\/doi.org\/10.1145\/3706598.3713932","DOI":"10.1145\/3706598.3713932"},{"key":"623_CR19","doi-asserted-by":"publisher","unstructured":"Kudriavtseva, A., Hotak, N.A., Gadyatskaya, O.: My code is less secure with gen ai: Surveying developers\u2019 perceptions of the impact of code generation tools on security. In: Proceedings of the 40th ACM\/SIGAPP Symposium on Applied Computing. Association for Computing Machinery, New York, NY, USA, SAC \u201925, pp. 1637\u2013164 (2025). https:\/\/doi.org\/10.1145\/3672608.3707778","DOI":"10.1145\/3672608.3707778"},{"key":"623_CR20","doi-asserted-by":"crossref","unstructured":"Liu, M., Zhou, F., Chen, K., et al.: Co-attention networks based on aspect and context for aspect-level sentiment analysis. Knowl.-Based Syst. 217, 10681 (2021). https:\/\/doi.org\/10.1016\/j.knosys.2021.106810, https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0950705121000733","DOI":"10.1016\/j.knosys.2021.106810"},{"key":"623_CR21","unstructured":"Lu, J., Batra, D., Parikh, D., et\u00a0al.: Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. 
In: Wallach, H., Larochelle, H., Beygelzimer, A., et\u00a0al. (eds.) Advances in Neural Information Processing Systems, vol.\u00a032. Curran Associates, Inc. (2019). https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/c74d97b01eae257e44aa9d5bade97baf-Paper.pdf"},{"key":"623_CR22","unstructured":"Lu, S., Guo, D., Ren, S., et\u00a0al.: Codexglue: A machine learning benchmark dataset for code understanding and generation. CoRR arxiv:2102.04664 (2021)"},{"key":"623_CR23","doi-asserted-by":"publisher","unstructured":"Meem, F.N., Johnson, B.: Investigating the impact of ai-assisted tools on software practitioner well-being. In: Adjunct Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work. Association for Computing Machinery, New York, NY, USA, CHIWORK \u201925 Adjunct (2025). https:\/\/doi.org\/10.1145\/3707640.3731915","DOI":"10.1145\/3707640.3731915"},{"key":"623_CR24","doi-asserted-by":"crossref","unstructured":"Mondal, T., Barnett, S., Lal, A., et\u00a0al.: Cell2Doc: ML Pipeline for Generating Documentation in Computational Notebooks. In: 2023 38th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, pp. 384\u2013396 (2023). https:\/\/doi.org\/10.1109\/ASE56229.2023.00200. https:\/\/ieeexplore.ieee.org\/document\/10298542\/","DOI":"10.1109\/ASE56229.2023.00200"},{"key":"623_CR25","unstructured":"Ni, A., Allamanis, M., Cohan, A., et\u00a0al.: Next: teaching large language models to reason about code execution. In: Proceedings of the 41st International Conference on Machine Learning. JMLR.org, ICML\u201924 (2024)"},{"issue":"3","key":"623_CR26","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/MS.2023.3248401","volume":"40","author":"I Ozkaya","year":"2023","unstructured":"Ozkaya, I.: Application of large language models to software engineering tasks: Opportunities, risks, and implications. IEEE Softw. 40(3), 4\u20138 (2023). 
https:\/\/doi.org\/10.1109\/MS.2023.3248401","journal-title":"IEEE Softw."},{"key":"623_CR27","unstructured":"Patterson, E., Baldini, I., Mojsilovic, A., et\u00a0al.: Teaching machines to understand data science code by semantic enrichment of dataflow graphs. arxiv:1807.05691 (2019)"},{"key":"623_CR28","doi-asserted-by":"publisher","unstructured":"Roy, D., Fakhoury, S., Arnaoudova, V.: Reassessing automatic evaluation metrics for code summarization tasks. In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, pp. 1105\u20131116 (2021). https:\/\/doi.org\/10.1145\/3468264.3468588. https:\/\/dl.acm.org\/doi\/10.1145\/3468264.3468588","DOI":"10.1145\/3468264.3468588"},{"key":"623_CR29","unstructured":"Sanh, V., Debut, L., Chaumond, J., et\u00a0al.: Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR arxiv:1910.01108 (2019)"},{"key":"623_CR30","doi-asserted-by":"publisher","unstructured":"Schr\u00f6der M, Kr\u00fcger, F., Spors, S.: Reproducible Research is more than Publishing Research Artefacts: A Systematic Analysis of Jupyter Notebooks from Research Articles (2019).https:\/\/doi.org\/10.48550\/ARXIV.1905.00092.","DOI":"10.48550\/ARXIV.1905.00092"},{"key":"623_CR31","unstructured":"Spacy: Spacy library (2017). https:\/\/spacy.io\/"},{"key":"623_CR32","doi-asserted-by":"publisher","unstructured":"Stalnaker, T., Wintersgill, N., Chaparro, O., et al.: Developer perspectives on licensing and copyright issues arising from generative ai for software development. ACM Trans Softw Eng Method (2025). https:\/\/doi.org\/10.1145\/3743133, just Accepted","DOI":"10.1145\/3743133"},{"key":"623_CR33","doi-asserted-by":"crossref","unstructured":"Sutton, C., Hobson, T., Geddes, J., et\u00a0al.: Data diff: Interpretable, executable summaries of changes in distributions for data wrangling. 
In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2279\u20132288 (2018)","DOI":"10.1145\/3219819.3220057"},{"key":"623_CR34","doi-asserted-by":"publisher","unstructured":"Treshcheva, E., Itkin, I., Yavorskiy, R., et\u00a0al.: Test2text: Ai-based mapping between autogenerated tests and atomic requirements. In: 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), pp. 17\u201320 (2025). https:\/\/doi.org\/10.1109\/ICSTW64639.2025.10962519","DOI":"10.1109\/ICSTW64639.2025.10962519"},{"key":"623_CR35","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., et\u00a0al.: Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, NIPS\u201917, pp. 6000\u20136010 (2017)"},{"key":"623_CR36","doi-asserted-by":"crossref","unstructured":"Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., et al.: The FAIR Guiding Principles for scientific data management and stewardship. Scientif. Data 3(1), 16001 (2016). https:\/\/doi.org\/10.1038\/sdata.2016.18. https:\/\/www.nature.com\/articles\/sdata201618","DOI":"10.1038\/sdata.2016.18"},{"key":"623_CR37","doi-asserted-by":"crossref","unstructured":"Yang, C., Zhou, S., Guo, J.L., et\u00a0al.: Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code. In: 2021 36th IEEE\/ACM International Conference on Automated Software Engineering (ASE). IEEE, pp. 304\u2013316 (2021). https:\/\/doi.org\/10.1109\/ASE51524.2021.9678520. https:\/\/ieeexplore.ieee.org\/document\/9678520\/","DOI":"10.1109\/ASE51524.2021.9678520"},{"key":"623_CR38","doi-asserted-by":"publisher","unstructured":"Zhou, W., Wu, J.: Code Comments Generation with Data Flow-Guided Transformer. In: Zhao, X., Yang, S., Wang, X., et\u00a0al. (eds.) Web Information Systems and Applications, vol. 13579. Springer International Publishing, pp. 
168\u2013180 (2022). https:\/\/doi.org\/10.1007\/978-3-031-20309-1_15. https:\/\/link.springer.com\/10.1007\/978-3-031-20309-1_15","DOI":"10.1007\/978-3-031-20309-1_15"}],"container-title":["Automated Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10515-026-00623-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10515-026-00623-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10515-026-00623-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T03:37:48Z","timestamp":1777865868000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10515-026-00623-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,4]]},"references-count":38,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,11]]}},"alternative-id":["623"],"URL":"https:\/\/doi.org\/10.1007\/s10515-026-00623-y","relation":{},"ISSN":["0928-8910","1573-7535"],"issn-type":[{"value":"0928-8910","type":"print"},{"value":"1573-7535","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,4]]},"assertion":[{"value":"17 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2026","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2026","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing 
interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"77"}}