{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T11:51:58Z","timestamp":1778500318139,"version":"3.51.4"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T00:00:00Z","timestamp":1724889600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T00:00:00Z","timestamp":1724889600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002241","name":"Japan Science and Technology Agency","doi-asserted-by":"publisher","award":["KAKENHI JP19H0111"],"award-info":[{"award-number":["KAKENHI JP19H0111"]}],"id":[{"id":"10.13039\/501100002241","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002241","name":"Japan Science and Technology Agency","doi-asserted-by":"publisher","award":["KAKENHI JP19H0111"],"award-info":[{"award-number":["KAKENHI JP19H0111"]}],"id":[{"id":"10.13039\/501100002241","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002241","name":"Japan Science and Technology Agency","doi-asserted-by":"publisher","award":["KAKENHI JP19H0111"],"award-info":[{"award-number":["KAKENHI JP19H0111"]}],"id":[{"id":"10.13039\/501100002241","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004721","name":"The University of Tokyo","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004721","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Graph data augmentation (GDA), which 
manipulates graph structure and\/or attributes, has been demonstrated as an effective method for improving the generalization of graph neural networks on semi-supervised node classification. For any data augmentation technique, label preservation is critical; that is, node labels should not change after data manipulation. However, most existing methods overlook this label-preservation requirement. Determining whether a GDA method is label-preserving is highly challenging, owing to the non-Euclidean nature of the graph structure. In this study, for the first time, we formulate a label-preserving problem (LPP) in the context of GDA. The LPP is formulated as an optimization problem in which, given a fixed augmentation budget, the objective is to find an augmented graph with minimal difference in data distribution compared to the original graph. To solve the LPP, we propose GMMDA, a generative data augmentation (DA) method based on Gaussian mixture modeling (GMM) of a graph in a latent space. We designed a novel learning objective that jointly learns a low-dimensional graph representation and estimates the GMM. The learning is followed by sampling from the GMM, and the samples are converted back to the graph as additional nodes. To uphold label preservation, we designed a minimum description length (MDL)-based method to select a set of samples that produces the minimum shift in the data distribution captured by the GMM. 
Through experiments, we demonstrate that GMMDA can improve the performance of graph convolutional network on\n                    <jats:sc>Cora<\/jats:sc>\n                    ,\n                    <jats:sc>Citeseer<\/jats:sc>\n                    and\n                    <jats:sc>Pubmed<\/jats:sc>\n                    by as much as\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$7.75\\%$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mrow>\n                            <mml:mn>7.75<\/mml:mn>\n                            <mml:mo>%<\/mml:mo>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    ,\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$8.75\\%$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mrow>\n                            <mml:mn>8.75<\/mml:mn>\n                            <mml:mo>%<\/mml:mo>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    and\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$5.87\\%$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mrow>\n                            <mml:mn>5.87<\/mml:mn>\n                            <mml:mo>%<\/mml:mo>\n                          <\/mml:mrow>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n    
                <\/jats:inline-formula>\n                    , respectively, significantly outperforming the state-of-the-art methods.\n                  <\/jats:p>","DOI":"10.1007\/s10115-024-02207-2","type":"journal-article","created":{"date-parts":[[2024,8,29]],"date-time":"2024-08-29T03:02:44Z","timestamp":1724900564000},"page":"7667-7695","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["GMMDA: Gaussian mixture modeling of graph in latent space for graph data augmentation"],"prefix":"10.1007","volume":"66","author":[{"given":"Yanjin","family":"Li","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Linchuan","family":"Xu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kenji","family":"Yamanishi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,8,29]]},"reference":[{"key":"2207_CR1","doi-asserted-by":"crossref","unstructured":"Wang Y, Wang W, Liang Y, Cai Y, Liu J, Hooi B (2020) Nodeaug: semi-supervised node classification with data augmentation. In: KDD. ACM, pp 207\u2013217","DOI":"10.1145\/3394486.3403063"},{"key":"2207_CR2","doi-asserted-by":"crossref","unstructured":"Verma V, Qu M, Kawaguchi K, Lamb A, Bengio Y, Kannala J, Tang J (2021) Graphmix: improved training of GNNs for semi-supervised learning. In: AAAI, vol 35. AAAI Press, pp 10024\u201310032","DOI":"10.1609\/aaai.v35i11.17203"},{"key":"2207_CR3","doi-asserted-by":"crossref","unstructured":"Zhao T, Liu Y, Neves L, Woodford O, Jiang M, Shah N (2021) Data augmentation for graph neural networks. In: AAAI, vol 35. 
AAAI Press, pp 11015\u201311023","DOI":"10.1609\/aaai.v35i12.17315"},{"key":"2207_CR4","first-page":"19010","volume":"34","author":"H Park","year":"2021","unstructured":"Park H, Lee S, Kim S, Park J, Jeong J, Kim K-M, Ha J-W, Kim HJ (2021) Metropolis-hastings data augmentation for graph neural networks. NeurIPS 34:19010\u201319020","journal-title":"NeurIPS"},{"key":"2207_CR5","unstructured":"Rong Y, Huang W, Xu T, Huang J (2019) Dropedge: towards deep graph convolutional networks on node classification. arXiv preprint arXiv:1907.10903"},{"key":"2207_CR6","unstructured":"Deng Z, Dong Y, Zhu J (2019) Batch virtual adversarial training for graph convolutional networks. arXiv preprint arXiv:1902.09192"},{"key":"2207_CR7","doi-asserted-by":"crossref","unstructured":"Bo D, Hu B, Wang X, Zhang Z, Shi C, Zhou J (2022) Regularizing graph neural networks via consistency-diversity graph augmentations. In: AAAI, vol 36. AAAI Press, pp 3913\u20133921","DOI":"10.1609\/aaai.v36i4.20307"},{"key":"2207_CR8","unstructured":"Liu S, Ying R, Dong H, Li L, Xu T, Rong Y, Zhao P, Huang J, Wu D (2022) Local augmentation for graph neural networks. In: ICML. PMLR, pp 14054\u201314072"},{"key":"2207_CR9","doi-asserted-by":"crossref","unstructured":"Ding K, Xu Z, Tong H, Liu H (2022) Data augmentation for deep graph learning: a survey. arXiv preprint arXiv:2202.08235","DOI":"10.1145\/3575637.3575646"},{"key":"2207_CR10","unstructured":"Zhou J, Xie C, Wen Z, Zhao X, Xuan Q (2022) Data augmentation on graphs: a survey. arXiv preprint arXiv:2212.09970"},{"key":"2207_CR11","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2022.100527","volume":"47","author":"M Adjeisah","year":"2023","unstructured":"Adjeisah M, Zhu X, Xu H, Ayall TA (2023) Towards data augmentation in graph neural network: an overview and evaluation. 
Comput Sci Rev 47:100527","journal-title":"Comput Sci Rev"},{"key":"2207_CR12","doi-asserted-by":"crossref","unstructured":"Yu S, Huang H, Dao M.N, Xia F (2022) Graph augmentation learning. In: Companion proceedings of TheWebConf. IW3C2, pp 1063\u20131072","DOI":"10.1145\/3487553.3524718"},{"key":"2207_CR13","doi-asserted-by":"crossref","unstructured":"Li Q, Han Z, Wu X.-M (2018) Deeper insights into graph convolutional networks for semi-supervised learning. In: AAAI. AAAI Press","DOI":"10.1609\/aaai.v32i1.11604"},{"key":"2207_CR14","doi-asserted-by":"crossref","unstructured":"Chen D, Lin Y, Li W, Li P, Zhou J, Sun X (2020) Measuring and relieving the over-smoothing problem for graph neural networks from the topological view. In: AAAI, vol 34. AAAI Press, pp 3438\u20133445","DOI":"10.1609\/aaai.v34i04.5747"},{"key":"2207_CR15","unstructured":"Zhao L, Akoglu L (2019) Pairnorm: tackling oversmoothing in GNNs. arXiv preprint arXiv:1909.12223"},{"key":"2207_CR16","unstructured":"Oono K, Suzuki T (2019) Graph neural networks exponentially lose expressive power for node classification. arXiv preprint arXiv:1905.10947"},{"key":"2207_CR17","unstructured":"Ganea O, B\u00e9cigneul G, Hofmann T (2018) Hyperbolic entailment cones for learning hierarchical embeddings. In: ICML. PMLR, pp 1646\u20131655"},{"key":"2207_CR18","first-page":"22092","volume":"33","author":"W Feng","year":"2020","unstructured":"Feng W, Zhang J, Dong Y, Han Y, Luan H, Xu Q, Yang Q, Kharlamov E, Tang J (2020) Graph random neural networks for semi-supervised learning on graphs. NeurIPS 33:22092\u201322103","journal-title":"NeurIPS"},{"issue":"455","key":"2207_CR19","doi-asserted-by":"publisher","first-page":"1077","DOI":"10.1198\/016214501753208735","volume":"96","author":"K Nowicki","year":"2001","unstructured":"Nowicki K, Snijders TAB (2001) Estimation and prediction for stochastic blockstructures. 
J Am Stat Assoc 96(455):1077\u20131087","journal-title":"J Am Stat Assoc"},{"issue":"395","key":"2207_CR20","doi-asserted-by":"publisher","first-page":"832","DOI":"10.1080\/01621459.1986.10478342","volume":"81","author":"O Frank","year":"1986","unstructured":"Frank O, Strauss D (1986) Markov graphs. J Am Stat Assoc 81(395):832\u2013842","journal-title":"J Am Stat Assoc"},{"key":"2207_CR21","unstructured":"Hamilton W.L, Ying R, Leskovec J (2017) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584"},{"issue":"5","key":"2207_CR22","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1016\/0005-1098(78)90005-5","volume":"14","author":"J Rissanen","year":"1978","unstructured":"Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465\u2013471","journal-title":"Automatica"},{"key":"2207_CR23","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-99-1790-7","volume-title":"Learning with the minimum description length principle","author":"K Yamanishi","year":"2023","unstructured":"Yamanishi K (2023) Learning with the minimum description length principle. Springer, Berlin"},{"key":"2207_CR24","doi-asserted-by":"crossref","unstructured":"Li Y, Xu L, Yamanishi K (2023) GMMDA: Gaussian mixture modeling of graph in latent space for graph data augmentation. In: ICDM. IEEE","DOI":"10.21203\/rs.3.rs-3942311\/v1"},{"key":"2207_CR25","unstructured":"Zhang C, He Y, Cen Y, Hou Z, Tang J (2021) Improving the training of graph neural networks with consistency regularization. arXiv preprint arXiv:2112.04319"},{"key":"2207_CR26","first-page":"29350","volume":"35","author":"H Yue","year":"2022","unstructured":"Yue H, Zhang C, Zhang C, Liu H (2022) Label-invariant augmentation for semi-supervised graph classification. 
Adv Neural Inf Process Syst 35:29350\u201329361","journal-title":"Adv Neural Inf Process Syst"},{"key":"2207_CR27","volume-title":"Mixture models: inference and applications to clustering","author":"GJ McLachlan","year":"1988","unstructured":"McLachlan GJ, Basford KE (1988) Mixture models: inference and applications to clustering, vol 38. M. Dekker, New York"},{"key":"2207_CR28","doi-asserted-by":"crossref","unstructured":"Tang J, Qu M, Wang M, Zhang M, Yan J, Mei Q (2015) Line: large-scale information network embedding. In: TheWebConf. IW3C2, pp 1067\u20131077","DOI":"10.1145\/2736277.2741093"},{"key":"2207_CR29","doi-asserted-by":"crossref","unstructured":"Cavallari S, Zheng VW, Cai H, Chang KC-C, Cambria E (2017) Learning community embedding with community detection and node embedding on graphs. In: CIKM. ACM, pp 377\u2013386","DOI":"10.1145\/3132847.3132925"},{"key":"2207_CR30","doi-asserted-by":"crossref","unstructured":"Yang L, Cheung N-M, Li J, Fang J (2019) Deep clustering by Gaussian mixture variational autoencoders with graph embedding. In: ICCV. IEEE, pp 6440\u20136449","DOI":"10.1109\/ICCV.2019.00654"},{"key":"2207_CR31","doi-asserted-by":"crossref","unstructured":"Hui B, Zhu P, Hu Q (2020) Collaborative graph convolutional networks: unsupervised learning meets semi-supervised learning. In: AAAI, vol 34, pp 4215\u20134222","DOI":"10.1609\/aaai.v34i04.5843"},{"issue":"4","key":"2207_CR32","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1007\/s10618-019-00624-4","volume":"33","author":"K Yamanishi","year":"2019","unstructured":"Yamanishi K, Wu T, Sugawara S, Okada M (2019) The decomposed normalized maximum likelihood code-length criterion for selecting hierarchical latent variable models. 
Data Min Knowl Discov 33(4):1017\u20131058","journal-title":"Data Min Knowl Discov"},{"issue":"8","key":"2207_CR33","doi-asserted-by":"publisher","first-page":"997","DOI":"10.3390\/e23080997","volume":"23","author":"PT Hung","year":"2021","unstructured":"Hung PT, Yamanishi K (2021) Word2vec skip-gram dimensionality selection via sequential normalized maximum likelihood. Entropy 23(8):997","journal-title":"Entropy"},{"key":"2207_CR34","doi-asserted-by":"crossref","unstructured":"Fukushima S, Kanai R, Yamanishi K (2022) Graph summarization with latent variable probabilistic models. In: Complex networks & their applications X. Springer, pp 428\u2013440","DOI":"10.1007\/978-3-030-93413-2_36"},{"issue":"1","key":"2207_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-019-0197-0","volume":"6","author":"C Shorten","year":"2019","unstructured":"Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data 6(1):1\u201348","journal-title":"J Big Data"},{"key":"2207_CR36","unstructured":"Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907"},{"key":"2207_CR37","unstructured":"Veli\u010dkovi\u0107 P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2017) Graph attention networks. arXiv preprint arXiv:1710.10903"},{"key":"2207_CR38","unstructured":"Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In: ICML. PMLR, pp 1263\u20131272"},{"key":"2207_CR39","unstructured":"Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. NeurIPS, vol 30"},{"key":"2207_CR40","volume-title":"Pattern recognition and machine learning","author":"CM Bishop","year":"2006","unstructured":"Bishop CM, Nasrabadi NM (2006) Pattern recognition and machine learning, vol 4. 
Springer, Berlin"},{"key":"2207_CR41","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. NeurIPS, vol 26"},{"key":"2207_CR42","unstructured":"Hirai S, Yamanishi K (2017) Upper bound on normalized maximum likelihood codes for gaussian mixture models. arXiv preprint arXiv:1709.00925"},{"issue":"1","key":"2207_CR43","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1109\/TNNLS.2020.2978386","volume":"32","author":"Z Wu","year":"2020","unstructured":"Wu Z, Pan S, Chen F, Long G, Zhang C, Philip SY (2020) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32(1):4\u201324","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"2207_CR44","doi-asserted-by":"crossref","unstructured":"Monti F, Boscaini D, Masci J, Rodola E, Svoboda J, Bronstein MM (2017) Geometric deep learning on graphs and manifolds using mixture model CNNs. In: CVPR. IEEE, pp 5115\u20135124","DOI":"10.1109\/CVPR.2017.576"},{"key":"2207_CR45","unstructured":"Shchur O, Mumme M, Bojchevski A, G\u00fcnnemann S (2018) Pitfalls of graph neural network evaluation. Preprint arXiv:1811.05868"},{"issue":"11","key":"2207_CR46","first-page":"2579","volume":"9","author":"L Maaten","year":"2008","unstructured":"Maaten L, Hinton G (2008) Visualizing data using t-SNE. 
J Mach Learn Res 9(11):2579\u20132605","journal-title":"J Mach Learn Res"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-024-02207-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-024-02207-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-024-02207-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T08:08:33Z","timestamp":1729757313000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-024-02207-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,29]]},"references-count":46,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["2207"],"URL":"https:\/\/doi.org\/10.1007\/s10115-024-02207-2","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3942311\/v1","asserted-by":"object"}]},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"},{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,8,29]]},"assertion":[{"value":"9 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 June 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 August 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 August 2024","order":4,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}