{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T05:29:47Z","timestamp":1765344587373,"version":"3.46.0"},"reference-count":72,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T00:00:00Z","timestamp":1765152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Software"],"abstract":"<jats:p>Graph generalization is a powerful concept with a wide range of potential applications, while established algorithms exist for generalizing simple graphs, practical approaches for more complex graphs remain elusive. We introduce a novel formal model and algorithm (GGA) that generalizes labeled directed graphs without assuming label identity. We evaluate GGA by focusing on its information preservation relative to its input graphs, its scalability in execution, and its utility for three applications: abstract syntax trees, class graphs, and call graphs. Our findings reveal the superiority of GGA over alternative tools. GGA outperforms ASGard by an average of 5\u201318% on metrics related to information preservation; GGA matches 100% with diffsitter, indicating the correctness of the output. For class graphs, GGA achieves 77.1% in precision at 5, while for call graphs, it exhibits 60% in precision at 5 for a specific application problem. We also test performance for the first two applications: GGA\u2019s execution time scales linearly with respect to the product of vertex count and edge count. Our research demonstrates the ability of GGA to preserve information in diverse applications while performing efficiently, signaling its potential to advance the field.<\/jats:p>","DOI":"10.3390\/software4040033","type":"journal-article","created":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T16:20:29Z","timestamp":1765210829000},"page":"33","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Graph Generalization for Software Engineering"],"prefix":"10.3390","volume":"4","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-7387-8666","authenticated-orcid":false,"given":"Mohammad Reza","family":"Kianifar","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0953-6907","authenticated-orcid":false,"given":"Robert J.","family":"Walker","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Calgary, 2500 University Dr NW, Calgary, AB T2N 1N4, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2025,12,8]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Otala, J., Minard, A., Madraki, G., and Mousavian, S. (2021). Graph-based modeling in shop scheduling problems: Review and extensions. Appl. Sci., 11.","DOI":"10.3390\/app11114741"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"685","DOI":"10.1145\/506315.506316","article-title":"A framework for call graph construction algorithms","volume":"23","author":"Grove","year":"2001","journal-title":"ACM Trans. Program. Lang. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/3-540-44541-2_2","article-title":"User preference of graph layout aesthetics: A UML study","volume":"Volume 1984","author":"Purchase","year":"2001","journal-title":"Proceedings of the 8th International Symposium on Graph Drawing"},{"key":"ref_4","unstructured":"Riaz, F., and Ali, K.M. (2011, January 26\u201328). Applications of graph theory in computer science. Proceedings of the 3rd International Conference on Computational Intelligence, Communication Systems and Networks, Bali, Indonesia."},{"key":"ref_5","unstructured":"Pearson PLC (2025, October 14). Longman Dictionary of Contemporary English Online. Article: Generalization. Available online: https:\/\/www.ldoceonline.com\/dictionary\/generalization."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.scico.2014.11.013","article-title":"Six strategies for generalizing software engineering theories","volume":"101","author":"Wieringa","year":"2015","journal-title":"Sci. Comput. Program."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Jackson, D., and Ladd, D.A. (1994, January 19\u201323). Semantic Diff: A tool for summarizing the effects of modifications. Proceedings of the 1994 International Conference on Software Maintenance, Victoria, BC, Canada.","DOI":"10.1109\/ICSM.1994.336770"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Falleri, J.R., Morandat, F., Blanc, X., Martinez, M., and Monperrus, M. (2014, January 15\u201319). Fine-grained and accurate source code differencing. Proceedings of the 29th ACM\/IEEE International Conference on Automated Software Engineering, V\u00e4ster\u00e5s, Sweden.","DOI":"10.1145\/2642937.2642982"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"725","DOI":"10.1109\/TSE.2007.70731","article-title":"Change distilling: Tree differencing for fine-grained source code change extraction","volume":"33","author":"Fluri","year":"2007","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Dotzler, G., and Philippsen, M. (2016, January 3\u20137). Move-optimized source code tree differencing. Proceedings of the 31st IEEE\/ACM International Conference on Automated Software Engineering, Singapore.","DOI":"10.1145\/2970276.2970315"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Frick, V., Grassauer, T., Beck, F., and Pinzger, M. (2018, January 23\u201329). Generating accurate and compact edit scripts using tree differencing. Proceedings of the 2018 IEEE International Conference on Software Maintenance and Evolution, Madrid, Spain.","DOI":"10.1109\/ICSME.2018.00036"},{"key":"ref_12","unstructured":"Enayet, A. (2025, October 14). diffsitter. Available online: https:\/\/github.com\/afnanenayet\/diffsitter."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1080\/0022250X.1978.9989883","article-title":"A graph-theoretic generalization of the clique concept","volume":"6","author":"Seidman","year":"1978","journal-title":"J. Math. Sociol."},{"key":"ref_14","first-page":"75","article-title":"Graph Unification and Matching","volume":"Volume 1073","author":"Cuny","year":"1994","journal-title":"Proceedings of the Graph Grammars and Their Application to Computer Science"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1002\/(SICI)1520-6610(1999)7:6<395::AID-JCD1>3.0.CO;2-U","article-title":"Deza graphs: A generalization of strongly regular graph","volume":"7","author":"Erickson","year":"1999","journal-title":"J. Comb. Des."},{"key":"ref_16","unstructured":"Baumgartner, A., Kutsia, T., Levy, J., and Villaret, M. (2018, January 9\u201312). Term-graph anti-unification. Proceedings of the 3rd International Conference on Formal Structures for Computation and Deduction, Oxford, UK. Leibniz International Proceedings in Informatics."},{"key":"ref_17","first-page":"17","article-title":"Looking for the bigger picture","volume":"31","author":"Tall","year":"2011","journal-title":"Learn. Math."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1145\/1035570.1035572","article-title":"Inheritance processing and conflicts in structural generalization hierarchies","volume":"36","author":"Formica","year":"2004","journal-title":"ACM Comput. Surv."},{"key":"ref_19","unstructured":"Roy, C.K., and Cordy, J.R. (2007). A Survey on Software Clone Detection Research, Queen\u2019s University, School of Computing. Available online: https:\/\/research.cs.queensu.ca\/TechReports\/Reports\/2007-541.pdf."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Min, H., and Li Ping, Z. (2019, January 12\u201314). Survey on Software Clone Detection Research. Proceedings of the 2019 3rd International Conference on Management Engineering, Software Engineering and Service Sciences, Wuhan, China.","DOI":"10.1145\/3312662.3312707"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1145\/1095430.1081754","article-title":"DynaMine: Finding common error patterns by mining software revision histories","volume":"30","author":"Livshits","year":"2005","journal-title":"ACM SIGSOFT Softw. Eng. Notes"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wasylkowski, A., Zeller, A., and Lindig, C. (2007, January 3\u20137). Detecting object usage anomalies. Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Dubrovnik, Croatia.","DOI":"10.1145\/1287624.1287632"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"111974","DOI":"10.1016\/j.jss.2024.111974","article-title":"API usage templates via structural generalization","volume":"210","author":"Mahmoud","year":"2024","journal-title":"J. Syst. Softw."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1145\/1125944.1125949","article-title":"How UML is used","volume":"49","author":"Dobing","year":"2006","journal-title":"Commun. ACM"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chatzigeorgiou, A., Tsantalis, N., and Stephanides, G. (2006, January 20). Application of graph theory to OO software engineering. Proceedings of the 2006 International Workshop on Interdisciplinary Software Engineering Research, Shanghai, China.","DOI":"10.1145\/1137661.1137669"},{"key":"ref_26","unstructured":"Hejderup, J., van Deursen, A., and Gousios, G. (June, January 27). Software ecosystem call graph for dependency management. Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results, Gothenburg, Sweden."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Horwitz, S., and Reps, T. (1992, January 11\u201315). The use of program dependence graphs in software engineering. Proceedings of the 14th International Conference on Software Engineering, Melbourne, Australia.","DOI":"10.1145\/143062.143156"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1145\/365813.365819","article-title":"SIMULA: An ALGOL-based simulation language","volume":"9","author":"Dahl","year":"1966","journal-title":"Commun. ACM"},{"key":"ref_29","unstructured":"Goldberg, A., and Robson, D. (1983). Smalltalk-80: The Language and Its Implementation, Addison-Wesley Longman Publishing Co., Inc."},{"key":"ref_30","first-page":"40","article-title":"Object-oriented programming: Themes and variations","volume":"6","author":"Stefik","year":"1985","journal-title":"AI Mag."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1145\/320544.320546","article-title":"Database Abstractions: Aggregation and Generalization","volume":"2","author":"Smith","year":"1977","journal-title":"ACM Trans. Database Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Clark, W.A., and Farley, B.G. (1955, January 1\u20133). Generalization of pattern recognition in a self-organizing system. Proceedings of the Western Joint Computer Conference, Los Angeles, CA, USA.","DOI":"10.1145\/1455292.1455309"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1016\/S0743-1066(99)00030-8","article-title":"Conjunctive partial deduction: Foundations, control, algorithms, and experiments","volume":"41","author":"Leuschel","year":"1999","journal-title":"J. Log. Program."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Bonif\u00e1cio, A.L., Moura, A.V., and da Silva Simao, A. (2008, January 10\u201314). A generalized model-based test generation method. Proceedings of the 2008 Sixth IEEE International Conference on Software Engineering and Formal Methods, Cape Town, South Africa.","DOI":"10.1109\/SEFM.2008.17"},{"key":"ref_35","first-page":"413","article-title":"Anti-unification algorithms and their applications in program analysis","volume":"Volume 5947","author":"Bulychev","year":"2010","journal-title":"Proceedings of the 7th International Andrei Ershov Memorial Conference on Perspectives of Systems Informatics"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"957","DOI":"10.1109\/32.177365","article-title":"Seesoft: A tool for visualizing line oriented software statistics","volume":"18","author":"Eick","year":"1992","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_37","unstructured":"Fischer, M., Pinzger, M., and Gall, H. (2003, January 22\u201326). Populating a release history database from version control and bug tracking systems. Proceedings of the 2003 International Conference on Software Maintenance, Amsterdam, The Netherlands."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/32.895984","article-title":"Does code decay? Assessing the evidence from change management data","volume":"27","author":"Eick","year":"2001","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Xing, Z., and Stroulia, E. (2005, January 7\u201311). UMLDiff: An algorithm for object-oriented design differencing. Proceedings of the 20th IEEE\/ACM International Conference on Automated Software Engineering, Long Beach, CA, USA.","DOI":"10.1145\/1101908.1101919"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1109\/TPAMI.1980.4767034","article-title":"Pattern recognition as rule-guided inductive inference","volume":"2","author":"Michalski","year":"1980","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1145\/322290.322295","article-title":"Pattern matching in trees","volume":"29","author":"Hoffmann","year":"1982","journal-title":"J. ACM"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1137\/0218082","article-title":"Simple fast algorithms for the editing distance between trees and related problems","volume":"18","author":"Zhang","year":"1989","journal-title":"SIAM J. Comput."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Schauer, F. (1993). Rules as Generalizations. Playing by the Rules: A Philosophical Examination of Rule-Based Decision-Making in Law and in Life, Oxford University Press.","DOI":"10.1093\/acprof:oso\/9780198258315.001.0001"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Bulychev, P., and Minea, M. (2008, January 29\u201330). Duplicate code detection using anti-unification. Proceedings of the Spring\/Summer Young Researchers\u2019 Colloquium on Software Engineering, St. Petersburg, Russia.","DOI":"10.15514\/SYRCOSE-2008-2-22"},{"key":"ref_45","unstructured":"Baxter, I.D., Yahin, A., Moura, L., Sant\u2019Anna, M., and Bier, L. (1998, January 20). Clone detection using abstract syntax trees. Proceedings of the International Conference on Software Maintenance, Bethesda, MD, USA."},{"key":"ref_46","unstructured":"Cui, B., Li, J., Guo, T., Wang, J., and Ma, D. (2010, January 26\u201328). Code comparison system based on abstract syntax tree. Proceedings of the 2010 3rd IEEE International Conference on Broadband Network and Multimedia Technology, Beijing, China."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1367","DOI":"10.1109\/TPAMI.2004.75","article-title":"A (sub) graph isomorphism algorithm for matching large graphs","volume":"26","author":"Cordella","year":"2004","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"2867","DOI":"10.1016\/S0031-3203(01)00232-1","article-title":"Inexact graph matching using estimation of distribution algorithms","volume":"35","author":"Bengoetxea","year":"2002","journal-title":"Pattern Recognit."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"629","DOI":"10.1016\/0743-1066(94)90035-3","article-title":"Inductive logic programming: Theory and methods","volume":"19","author":"Muggleton","year":"1994","journal-title":"J. Log. Program."},{"key":"ref_50","first-page":"273","article-title":"Restricted higher-order anti-unification for analogy making","volume":"Volume 4830","author":"Krumnack","year":"2007","journal-title":"Proceedings of the 20th Australian Joint Conference on Artificial Intelligence"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.fss.2019.03.019","article-title":"Fuzzy lattice operations on first-order terms over signatures with similar constructors: A constraint-based approach","volume":"391","author":"Pasi","year":"2020","journal-title":"Fuzzy Sets Syst."},{"key":"ref_52","first-page":"153","article-title":"A note on inductive generalization","volume":"5","author":"Plotkin","year":"1970","journal-title":"Mach. Intell."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1007\/s10817-013-9285-6","article-title":"Anti-unification for unranked terms and hedges","volume":"52","author":"Kutsia","year":"2014","journal-title":"J. Autom. Reason."},{"key":"ref_54","first-page":"135","article-title":"Transformational systems and algebraic structure of atomic formulas","volume":"5","author":"Reynolds","year":"1970","journal-title":"Mach. Intell."},{"key":"ref_55","unstructured":"Herbrand, J. (1930). Recherches sur la Th\u00e9orie de la Demonstration. [Ph.D. Thesis, Universit\u00e9 de Paris]. Available online: https:\/\/www.numdam.org\/item\/THESE_1930__110__1_0.pdf."},{"key":"ref_56","first-page":"63","article-title":"Computational Logic: The Unification Computation","volume":"6","author":"Robinson","year":"1971","journal-title":"Mach. Intell."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1007\/s10817-022-09635-1","article-title":"Faster Linear Unification Algorithm","volume":"66","year":"2022","journal-title":"J. Autom. Reason."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Cerna, D.M., and Kutsia, T. (2023, January 19\u201325). Anti-unification and generalization: A survey. Proceedings of the 32nd International Joint Conference on Artificial Intelligence, Cape Town, South Africa.","DOI":"10.24963\/ijcai.2023\/736"},{"key":"ref_59","unstructured":"Smullyan, R.M. (1995). First-Order Logic, Courier Corporation."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"293","DOI":"10.1007\/s10817-016-9383-3","article-title":"Higher-order pattern anti-unification in linear time","volume":"58","author":"Baumgartner","year":"2017","journal-title":"J. Autom. Reason."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Kostylev, E.V., and Zakharov, V.A. (2008). On complexity of the anti-unification problem. Discret. Math. Appl., 18.","DOI":"10.1515\/DMA.2008.007"},{"key":"ref_62","unstructured":"Mahmoud, M. (2023). API Usage Templates via Structural Generalization. [Ph.D. Thesis, University of Calgary]. Available online: https:\/\/prism.ucalgary.ca\/items\/5f7a0eef-fa07-40dd-9dd4-6b41d65e5749."},{"key":"ref_63","unstructured":"GitHub (2025, October 14). Tree-sitter. Available online: https:\/\/tree-sitter.github.io\/tree-sitter\/."},{"key":"ref_64","first-page":"845","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"163","author":"Levenshtein","year":"1965","journal-title":"Dokl. Akad. Nauk SSSR"},{"key":"ref_65","unstructured":"GitHub, Inc. (2025, October 14). CodeSearchNet. Available online: https:\/\/github.com\/github\/CodeSearchNet."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"4814","DOI":"10.1109\/TSE.2023.3315935","article-title":"Hyperparameter optimization for AST differencing","volume":"49","author":"Martinez","year":"2023","journal-title":"IEEE Trans. Softw. Eng."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1145\/235968.233366","article-title":"Change detection in hierarchically structured information","volume":"25","author":"Chawathe","year":"1996","journal-title":"ACM SIGMOD Rec."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"1209","DOI":"10.1007\/s10910-009-9635-0","article-title":"Centrality measure in graphs","volume":"47","author":"Klein","year":"2010","journal-title":"J. Math. Chem."},{"key":"ref_69","unstructured":"Kiss, T. (2025, October 14). CodeMetrics. Available online: https:\/\/plugins.jetbrains.com\/plugin\/12159-codemetrics."},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Walker, R.J., Rawal, S., and Sillito, J. (2012, January 11\u201316). Do crosscutting concerns cause modularity problems?. Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, Cary, NC, USA.","DOI":"10.1145\/2393596.2393654"},{"key":"ref_71","unstructured":"Gousios, G. (2025, October 14). java-callgraph: Java Call Graph Utilities. Available online: https:\/\/github.com\/gousiosg\/java-callgraph."},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"100979","DOI":"10.1016\/j.cola.2020.100979","article-title":"PathPair2Vec: An AST path pair-based code representation method for defect prediction","volume":"59","author":"Shi","year":"2020","journal-title":"J. Comput. Lang."}],"container-title":["Software"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2674-113X\/4\/4\/33\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,10]],"date-time":"2025-12-10T05:25:29Z","timestamp":1765344329000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2674-113X\/4\/4\/33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,8]]},"references-count":72,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["software4040033"],"URL":"https:\/\/doi.org\/10.3390\/software4040033","relation":{},"ISSN":["2674-113X"],"issn-type":[{"type":"electronic","value":"2674-113X"}],"subject":[],"published":{"date-parts":[[2025,12,8]]}}}