{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,1]],"date-time":"2026-02-01T03:37:19Z","timestamp":1769917039966,"version":"3.49.0"},"reference-count":69,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,10,20]],"date-time":"2021-10-20T00:00:00Z","timestamp":1634688000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,10,20]],"date-time":"2021-10-20T00:00:00Z","timestamp":1634688000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100004270","name":"Royal Institute of Technology","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004270","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Soc. Netw. Anal. Min."],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Graph representation learning on dynamic graphs has become an important task in several real-world applications, such as recommender systems and email spam detection. To efficiently capture the evolution of a graph, representation learning approaches employ deep neural networks with a large number of parameters to train. Due to the large model size, such approaches have high online inference latency. As a consequence, such models are challenging to deploy in an industrial setting with a vast number of users\/nodes. In this study, we propose DynGKD, a distillation strategy to transfer the knowledge from a large teacher model to a small student model with low inference latency, while achieving high prediction accuracy. We first study different distillation loss functions to separately train the student model with various types of information from the teacher model. 
In addition, we propose a hybrid distillation strategy for evolving graph representation learning to combine the teacher\u2019s different types of information. Our experiments with five publicly available datasets demonstrate the superiority of our proposed model against several baselines, with an average relative drop of <jats:inline-formula><jats:alternatives><jats:tex-math>$$40.60\\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mn>40.60<\/mml:mn><mml:mo>%<\/mml:mo><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula> in terms of RMSE in the link prediction task. Moreover, our DynGKD model achieves a compression ratio of 21:100, accelerating inference with a speed-up factor of <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\times 30$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\"><mml:mrow><mml:mo>\u00d7<\/mml:mo><mml:mn>30<\/mml:mn><\/mml:mrow><\/mml:math><\/jats:alternatives><\/jats:inline-formula> when compared with the teacher model. 
For reproducibility, we make our datasets and implementation publicly available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/stefanosantaris\/DynGKD\">https:\/\/github.com\/stefanosantaris\/DynGKD<\/jats:ext-link>.<\/jats:p>","DOI":"10.1007\/s13278-021-00816-1","type":"journal-article","created":{"date-parts":[[2021,10,20]],"date-time":"2021-10-20T18:05:59Z","timestamp":1634753159000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Knowledge distillation on neural networks for evolving graphs"],"prefix":"10.1007","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1135-8863","authenticated-orcid":false,"given":"Stefanos","family":"Antaris","sequence":"first","affiliation":[]},{"given":"Dimitrios","family":"Rafailidis","sequence":"additional","affiliation":[]},{"given":"Sarunas","family":"Girdzijauskas","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,10,20]]},"reference":[{"issue":"3","key":"816_CR1","doi-asserted-by":"publisher","first-page":"626","DOI":"10.1007\/s10618-014-0365-y","volume":"29","author":"L Akoglu","year":"2015","unstructured":"Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626\u2013688","journal-title":"Data Min Knowl Discov"},{"key":"816_CR2","doi-asserted-by":"crossref","unstructured":"Antaris S, Rafailidis D (2020) Distill2vec: dynamic graph representation learning with knowledge distillation. In: ASONAM","DOI":"10.1109\/ASONAM49781.2020.9381315"},{"key":"816_CR3","doi-asserted-by":"crossref","unstructured":"Antaris S, Rafailidis D (2020) Vstreamdrls: dynamic graph representation learning with self-attention for enterprise distributed video streaming solutions. 
In: ASONAM","DOI":"10.1109\/ASONAM49781.2020.9381430"},{"key":"816_CR4","doi-asserted-by":"crossref","unstructured":"Antaris S, Rafailidis D, Girdzijauskas S (2020) Egad: evolving graph representation learning with self-attention and knowledge distillation for live video streaming events","DOI":"10.1109\/BigData50022.2020.9378219"},{"key":"816_CR5","unstructured":"Asif U, Tang J, Harrer S (2020) Ensemble knowledge distillation for learning improved and efficient networks"},{"key":"816_CR6","unstructured":"Ba J, Caruana R (2014) Do deep nets really need to be deep? In: Advances in Neural Information Processing Systems, vol\u00a027. Curran Associates, Inc."},{"key":"816_CR7","unstructured":"Bresson X, Laurent T (2019) A two-step graph convolutional decoder for molecule generation. In: NeurIPS"},{"key":"816_CR8","doi-asserted-by":"crossref","unstructured":"Bucilua C, Caruana R, Niculescu-Mizil A (2006) Model compression. In: KDD, pp 535\u2013541","DOI":"10.1145\/1150402.1150464"},{"key":"816_CR9","doi-asserted-by":"crossref","unstructured":"Cao Y, Wang X, He X, Hu Z, Chua TS (2019) Unifying knowledge graph learning and recommendation: towards a better understanding of user preferences. In: WWW, pp 151\u2013161","DOI":"10.1145\/3308558.3313705"},{"key":"816_CR10","doi-asserted-by":"crossref","unstructured":"Chang X, Liu X, Wen J, Li S, Fang Y, Song L, Qi Y (2020) Continuous-time dynamic graph learning via neural interaction processes, pp 145\u2013154","DOI":"10.1145\/3340531.3411946"},{"key":"816_CR11","doi-asserted-by":"crossref","unstructured":"Chen D, Mei JP, Zhang Y, Wang C, Wang Z, Feng Y, Chen C (2021) Cross-layer distillation with semantic calibration. In: AAAI, vol\u00a035, pp 7028\u20137036","DOI":"10.1609\/aaai.v35i8.16865"},{"key":"816_CR12","unstructured":"Chen G, Choi W, Yu X, Han T, Chandraker M (2017) Learning efficient object detection models with knowledge distillation. 
In: NeurIPS, pp 742\u2013751"},{"key":"816_CR13","doi-asserted-by":"crossref","unstructured":"Chen H, Perozzi B, Hu Y, Skiena S (2018) Harp: hierarchical representation learning for networks. In: Proceedings of the AAAI conference on artificial intelligence, vol\u00a032","DOI":"10.1609\/aaai.v32i1.11849"},{"issue":"1","key":"816_CR14","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1109\/TNNLS.2020.2970494","volume":"32","author":"H Chen","year":"2020","unstructured":"Chen H, Wang Y, Xu C, Xu C, Tao D (2020) Learning student networks via feature embedding. IEEE Trans Neural Netw Learn Syst 32(1):25\u201335","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"816_CR15","unstructured":"Chen J, Ma T, Xiao C (2018) Fastgcn: fast learning with graph convolutional networks via importance sampling. Preprint arXiv:1801.10247"},{"key":"816_CR16","doi-asserted-by":"crossref","unstructured":"Chen Y, Bian Y, Xiao X, Rong Y, Xu T, Huang J (2020) On self-distilling graph neural network. Preprint arXiv:2011.02255","DOI":"10.24963\/ijcai.2021\/314"},{"key":"816_CR17","unstructured":"Dai H, Wang Y, Trivedi R, Song L (2016) Deep coevolutionary network: embedding user and item features for recommendation. Preprint arXiv:1609.03675"},{"key":"816_CR18","doi-asserted-by":"crossref","unstructured":"Du L, Wang Y, Song G, Lu Z, Wang J (2018) Dynamic network embedding: an extended approach for skip-gram based network embedding. In: IJCAI, pp 2086\u20132092","DOI":"10.24963\/ijcai.2018\/288"},{"issue":"6","key":"816_CR19","doi-asserted-by":"publisher","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","volume":"129","author":"J Gou","year":"2021","unstructured":"Gou J, Yu B, Maybank SJ, Tao D (2021) Knowledge distillation: a survey. 
Int J Comput Vis 129(6):1789\u20131819","journal-title":"Int J Comput Vis"},{"key":"816_CR20","doi-asserted-by":"publisher","first-page":"104816","DOI":"10.1016\/j.knosys.2019.06.024","volume":"187","author":"P Goyal","year":"2020","unstructured":"Goyal P, Chhetri SR, Canedo A (2020) dyngraph2vec: capturing network dynamics using dynamic graph representation learning. Knowl-Based Syst 187:104816","journal-title":"Knowl-Based Syst"},{"key":"816_CR21","unstructured":"Goyal P, Kamra N, He X, Liu Y (2018) Dyngem: deep embedding method for dynamic graphs. Preprint arXiv:1805.11273"},{"key":"816_CR22","doi-asserted-by":"crossref","unstructured":"Grover A, Leskovec J (2016) Node2vec: scalable feature learning for networks. In: KDD, pp 855\u2013864","DOI":"10.1145\/2939672.2939754"},{"key":"816_CR23","doi-asserted-by":"crossref","unstructured":"Guo Q, Wang X, Wu Y, Yu Z, Liang D, Hu X, Luo P (2020) Online knowledge distillation via collaborative learning. In: CVPR, pp 11020\u201311029","DOI":"10.1109\/CVPR42600.2020.01103"},{"key":"816_CR24","unstructured":"Hamilton WL, Ying R, Leskovec J (2017a) Inductive representation learning on large graphs. In: NeurIPS, pp 1025\u20131035"},{"key":"816_CR25","unstructured":"Hamilton WL, Ying R, Leskovec J (2017b) Representation learning on graphs: methods and applications. Preprint arXiv:1709.05584"},{"key":"816_CR26","unstructured":"Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. In: NIPS"},{"key":"816_CR27","unstructured":"Huang Z, Wang N (2017) Like what you like: knowledge distill via neuron selectivity transfer. Preprint arXiv:1707.01219"},{"key":"816_CR28","doi-asserted-by":"crossref","unstructured":"Kim J, Hyun M, Chung I, Kwak N (2021) Feature fusion for online mutual knowledge distillation. In: ICPR, pp 4619\u20134625","DOI":"10.1109\/ICPR48806.2021.9412615"},{"key":"816_CR29","doi-asserted-by":"crossref","unstructured":"Kim Y, Rush AM (2016) Sequence-level knowledge distillation. 
In: EMNLP, pp 1317\u20131327","DOI":"10.18653\/v1\/D16-1139"},{"key":"816_CR30","unstructured":"Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: ICLR"},{"key":"816_CR31","unstructured":"Kipf TN, Welling M (2016) Variational graph auto-encoders. arXiv:abs\/1611.07308"},{"key":"816_CR32","unstructured":"Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: ICLR"},{"key":"816_CR33","doi-asserted-by":"crossref","unstructured":"Kumar S, Zhang X, Leskovec J (2019) Predicting dynamic embedding trajectory in temporal interaction networks. In: SIGKDD, pp 1269\u20131278","DOI":"10.1145\/3292500.3330895"},{"key":"816_CR34","unstructured":"Lee S, Song BC (2019) Graph-based knowledge distillation by multi-head attention network. Preprint arXiv:1907.02226"},{"key":"816_CR35","doi-asserted-by":"crossref","unstructured":"Li J, Dani H, Hu X, Tang J, Chang Y, Liu H (2017) Attributed network embedding for learning in a dynamic environment. In: CIKM, pp 387\u2013396","DOI":"10.1145\/3132847.3132919"},{"key":"816_CR36","doi-asserted-by":"crossref","unstructured":"Liu M, Gao H, Ji S (2020) Towards deeper graph neural networks. In: KDD, pp 338\u2013348","DOI":"10.1145\/3394486.3403076"},{"key":"816_CR37","doi-asserted-by":"crossref","unstructured":"Liu Y, Cao J, Li B, Yuan C, Hu W, Li Y, Duan Y (2019) Knowledge distillation via instance relationship graph. In: CVPR, pp 7096\u20137104","DOI":"10.1109\/CVPR.2019.00726"},{"key":"816_CR38","doi-asserted-by":"crossref","unstructured":"Liu Z, Huang C, Yu Y, Song P, Fan B, Dong J (2020) Dynamic representation learning for large-scale attributed networks. In: CIKM, pp 1005\u20131014","DOI":"10.1145\/3340531.3411945"},{"key":"816_CR39","unstructured":"Ma J, Mei Q (2019) Graph representation learning via multi-task knowledge distillation. 
Preprint arXiv:1911.05700"},{"key":"816_CR40","doi-asserted-by":"crossref","unstructured":"Mahdavi S, Khoshraftar S, An A (2020) Dynamic joint variational graph autoencoders. In: Machine learning and knowledge discovery in databases, pp 385\u2013401","DOI":"10.1007\/978-3-030-43823-4_32"},{"key":"816_CR41","doi-asserted-by":"crossref","unstructured":"Meng Z, Li J, Zhao Y, Gong Y (2019) Conditional teacher-student learning. In: ICASSP, pp 6445\u20136449","DOI":"10.1109\/ICASSP.2019.8683438"},{"key":"816_CR42","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NIPS, pp 3111\u20133119"},{"key":"816_CR43","doi-asserted-by":"crossref","unstructured":"Mirzadeh SI, Farajtabar M, Li A, Levine N, Matsukawa A, Ghasemzadeh H (2020) Improved knowledge distillation via teacher assistant. In: AAAI, pp 5191\u20135198","DOI":"10.1609\/aaai.v34i04.5963"},{"key":"816_CR44","doi-asserted-by":"crossref","unstructured":"Nguyen GH, Lee JB, Rossi RA, Ahmed NK, Koh E, Kim S (2018) Continuous-time dynamic network embeddings. In: WWW, pp 969\u2013976","DOI":"10.1145\/3184558.3191526"},{"key":"816_CR45","doi-asserted-by":"crossref","unstructured":"Pareja A, Domeniconi G, Chen J, Ma T, Suzumura T, Kanezashi H, Kaler T, Schardl TB, Leiserson CE (2020) EvolveGCN: evolving graph convolutional networks for dynamic graphs. In: AAAI","DOI":"10.1609\/aaai.v34i04.5984"},{"key":"816_CR46","doi-asserted-by":"crossref","unstructured":"Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: KDD, pp 701\u2013710","DOI":"10.1145\/2623330.2623732"},{"key":"816_CR47","unstructured":"Phuong M, Lampert C (2019) Towards understanding knowledge distillation. In: ICML, pp 5142\u20135151"},{"key":"816_CR48","unstructured":"Qian Q, Li H, Hu J (2020) Efficient kernel transfer in knowledge distillation. 
arXiv:abs\/2009.14416"},{"key":"816_CR49","unstructured":"Qu M, Bengio Y, Tang J (2019) Gmnn: graph markov neural networks. In: ICML, pp 5241\u20135250"},{"key":"816_CR50","unstructured":"Romero A, Ballas N, Kahou SE, Chassang A, Gatta C, Bengio Y (2015) Fitnets: hints for thin deep nets. In: ICLR"},{"key":"816_CR51","doi-asserted-by":"crossref","unstructured":"Roverso R, Reale R, El-Ansary S, Haridi S (2015) Smoothcache 2.0: Cdn-quality adaptive http live streaming on peer-to-peer overlays. In: MMSys, pp 61\u201372","DOI":"10.1145\/2713168.2713182"},{"key":"816_CR52","doi-asserted-by":"crossref","unstructured":"Sankar A, Wu Y, Gou L, Zhang W, Yang H (2020) Dysat: deep neural representation learning on dynamic graphs via self-attention networks. In: WSDM, pp 519\u2013527","DOI":"10.1145\/3336191.3371845"},{"key":"816_CR53","unstructured":"Sun L, Gou J, Yu B, Du L, Tao D (2021) Collaborative teacher-student learning via multiple knowledge transfer. Preprint arXiv:2101.08471"},{"key":"816_CR54","doi-asserted-by":"crossref","unstructured":"Tang J, Wang K (2018) Ranking distillation: learning compact ranking models with high performance for recommender system. In: KDD, pp 2289\u20132298","DOI":"10.1145\/3219819.3220021"},{"key":"816_CR55","unstructured":"Trivedi R, Farajtabar M, Biswal P, Zha H (2019) Dyrep: learning representations over dynamic graphs. In: ICLR"},{"key":"816_CR56","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser LU, Polosukhin I (2017) Attention is all you need. In: NeurIPS, vol\u00a030"},{"key":"816_CR57","unstructured":"Veli\u010dkovi\u0107 P, Cucurull G, Casanova A, Romero A, Li\u00f2 P, Bengio Y (2018) Graph attention networks. In: ICLR"},{"key":"816_CR58","doi-asserted-by":"crossref","unstructured":"Wang X, Bo D, Shi C, Fan S, Ye Y, Yu PS (2020) A survey on heterogeneous graph embedding: methods, techniques, applications and sources. 
Preprint arXiv:2011.14867","DOI":"10.1145\/3308558.3313562"},{"key":"816_CR59","unstructured":"Williams C, Seeger M (2001) Using the Nystr\u00f6m method to speed up kernel machines. In: NeurIPS, vol\u00a013"},{"key":"816_CR60","doi-asserted-by":"crossref","unstructured":"Yang Y, Qiu J, Song M, Tao D, Wang X (2020) Distilling knowledge from graph convolutional networks. In: CVPR","DOI":"10.1109\/CVPR42600.2020.00710"},{"key":"816_CR61","doi-asserted-by":"crossref","unstructured":"Ying R, He R, Chen K, Eksombatchai P, Hamilton WL, Leskovec J (2018) Graph convolutional neural networks for web-scale recommender systems. In: KDD, pp 974\u2013983","DOI":"10.1145\/3219819.3219890"},{"key":"816_CR62","unstructured":"You J, Liu B, Ying R, Pande V, Leskovec J (2018) Graph convolutional policy network for goal-directed molecular graph generation. In: NeurIPS"},{"key":"816_CR63","unstructured":"Zagoruyko S, Komodakis N (2017) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. In: ICLR"},{"key":"816_CR64","unstructured":"Zhang M, Chen Y (2018) Link prediction based on graph neural networks. In: NeurIPS"},{"key":"816_CR66","doi-asserted-by":"crossref","unstructured":"Zhang Y, Zhang F, Yao P, Tang J (2018) Name disambiguation in AMiner: clustering, maintenance, and human in the loop. In: KDD, pp 1002\u20131011","DOI":"10.1145\/3219819.3219859"},{"key":"816_CR65","doi-asserted-by":"crossref","unstructured":"Zhang Y, Pal S, Coates M, Ustebay D (2019) Bayesian graph convolutional neural networks for semi-supervised classification. In: AAAI, pp 5829\u20135836","DOI":"10.1609\/aaai.v33i01.33015829"},{"key":"816_CR67","doi-asserted-by":"crossref","unstructured":"Zhang Z, Bu J, Ester M, Zhang J, Yao C, Li Z, Wang C (2020) Learning temporal interaction graph embedding via coupled memory networks. 
In: WWW, pp 3049\u20133055","DOI":"10.1145\/3366423.3380076"},{"key":"816_CR68","doi-asserted-by":"crossref","unstructured":"Zhou G, Fan Y, Cui R, Bian W, Zhu X, Gai K (2018a) Rocket launching: a universal and efficient framework for training well-performing light net. In: AAAI","DOI":"10.1609\/aaai.v32i1.11601"},{"key":"816_CR69","doi-asserted-by":"crossref","unstructured":"Zhou L, Yang Y, Ren X, Wu F, Zhuang Y (2018b) Dynamic network embedding by modelling triadic closure process. In: AAAI","DOI":"10.1609\/aaai.v32i1.11257"}],"container-title":["Social Network Analysis and Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13278-021-00816-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13278-021-00816-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13278-021-00816-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,13]],"date-time":"2023-01-13T00:42:34Z","timestamp":1673570554000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13278-021-00816-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,20]]},"references-count":69,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["816"],"URL":"https:\/\/doi.org\/10.1007\/s13278-021-00816-1","relation":{},"ISSN":["1869-5450","1869-5469"],"issn-type":[{"value":"1869-5450","type":"print"},{"value":"1869-5469","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,20]]},"assertion":[{"value":"28 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 July 
2021","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 July 2021","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 October 2021","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"100"}}