{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T05:09:38Z","timestamp":1754111378823,"version":"3.37.3"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,10,28]],"date-time":"2020-10-28T00:00:00Z","timestamp":1603843200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,28]],"date-time":"2020-10-28T00:00:00Z","timestamp":1603843200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Max-Planck-Gesellschaft","award":["1"],"award-info":[{"award-number":["1"]}]},{"name":"Projekt DEAL"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Netw Sci"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We perform an extensive analysis of how sampling impacts the estimate of several relevant network measures. In particular, we focus on how a sampling strategy optimized to recover a particular spectral centrality measure impacts other topological quantities. Our goal is on one hand to extend the analysis of the behavior of TCEC (Ruggeri and De Bacco, in: Cherifi, Gaito, Mendes, Moro, Rocha (eds) Complex networks and their applications VIII, Springer, Cham, pp 90\u2013101, 2020), a theoretically-grounded sampling method for eigenvector centrality estimation. On the other hand, to demonstrate more broadly how sampling can impact the estimation of relevant network properties like centrality measures different than the one aimed at optimizing, community structure and node attribute distribution. In addition, we analyze sampling behaviors in various instances of network generative models. Finally, we adapt the theoretical framework behind TCEC for the case of PageRank centrality and propose a sampling algorithm aimed at optimizing its estimation. We show that, while the theoretical derivation can be suitably adapted to cover this case, the resulting algorithm suffers of a high computational complexity that requires further approximations compared to the eigenvector centrality case. Main contributions (a) Extensive empirical analysis of the impact of the TCEC\u00a0sampling method (optimized for eigenvector centrality recovery) on different centrality measures, community structure, node attributes and statistics related to specific network generative models; (b) extending TCEC\u00a0to optimize PageRank estimation.<\/jats:p>","DOI":"10.1007\/s41109-020-00324-9","type":"journal-article","created":{"date-parts":[[2020,10,28]],"date-time":"2020-10-28T13:03:05Z","timestamp":1603890185000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Sampling on networks: estimating spectral centrality measures and their impact in evaluating other relevant network measures"],"prefix":"10.1007","volume":"5","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6847-9001","authenticated-orcid":false,"given":"Nicol\u00f2","family":"Ruggeri","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Caterina","family":"De Bacco","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,10,28]]},"reference":[{"key":"324_CR1","doi-asserted-by":"crossref","unstructured":"Adler M, Mitzenmacher M (2001) Towards compressing web graphs. In: Proceedings DCC 2001. Data compression conference. IEEE, pp 203\u2013212","DOI":"10.1109\/DCC.2001.917151"},{"key":"324_CR2","unstructured":"Ahmed NK, Neville J, Kompella R (2012) Network sampling designs for relational classification. In: Sixth international AAAI conference on weblogs and social media"},{"key":"324_CR3","unstructured":"Antunes N, Bhamidi S, Guo T, Pipiras V, Wang B (2018) Sampling-based estimation of in-degree distribution with applications to directed complex networks. arXiv preprint arXiv:1810.01300"},{"issue":"7","key":"324_CR4","doi-asserted-by":"crossref","first-page":"8260","DOI":"10.1126\/sciadv.aar8260","volume":"4","author":"C De Bacco","year":"2018","unstructured":"De Bacco C, Larremore DB, Moore C (2018) A physical model for efficient ranking in networks. Sci Adv 4(7):8260","journal-title":"Sci Adv"},{"key":"324_CR5","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.physa.2017.02.048","volume":"477","author":"N Blagus","year":"2017","unstructured":"Blagus N, \u0160ubelj L, Bajec M (2017) Empirical comparison of network sampling: how to choose the most appropriate method? Physica A 477:136\u2013148","journal-title":"Physica A"},{"issue":"1","key":"324_CR6","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1080\/0022250X.1972.9989806","volume":"2","author":"P Bonacich","year":"1972","unstructured":"Bonacich P (1972) Factoring and weighting approaches to status scores and clique identification. J Math Sociol 2(1):113\u2013120","journal-title":"J Math Sociol"},{"issue":"1\u20137","key":"324_CR7","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1016\/S0169-7552(98)00110-X","volume":"30","author":"S Brin","year":"1998","unstructured":"Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1\u20137):107\u2013117","journal-title":"Comput Netw ISDN Syst"},{"key":"324_CR8","unstructured":"Chen Y-Y, Gan Q, Suel T (2004) Local methods for estimating pagerank values. In: Proceedings of the thirteenth ACM international conference on information and knowledge management. ACM, pp 381\u2013389"},{"key":"324_CR9","doi-asserted-by":"crossref","unstructured":"Contisciani M, Power E, De\u00a0Bacco C (2020) Community detection with node attributes in multilayer networks. arXiv preprint arXiv:2004.09160","DOI":"10.1038\/s41598-020-72626-y"},{"issue":"4","key":"324_CR10","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1016\/S0378-8733(03)00012-1","volume":"25","author":"E Costenbader","year":"2003","unstructured":"Costenbader E, Valente TW (2003) The stability of centrality measures when networks are sampled. Soc Netw 25(4):283\u2013307","journal-title":"Soc Netw"},{"key":"324_CR11","unstructured":"Davis JV, Dhillon IS (2006) Estimating the global pagerank of web communities. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 116\u2013125"},{"key":"324_CR12","doi-asserted-by":"crossref","unstructured":"De\u00a0Choudhury M, Lin Y-R, Sundaram H, Candan KS, Xie L, Kelliher A (2010) How does the data sampling strategy impact the discovery of information diffusion in social media? In: Fourth international AAAI conference on weblogs and social media","DOI":"10.1609\/icwsm.v4i1.14024"},{"key":"324_CR13","first-page":"1277","volume":"2018","author":"L Esp\u00edn-Noboa","year":"2018","unstructured":"Esp\u00edn-Noboa L, Wagner C, Karimi F, Lerman K (2018) Towards quantifying sampling bias in network inference. Companion Proc Web Conf 2018:1277\u20131285","journal-title":"Companion Proc Web Conf"},{"key":"324_CR14","doi-asserted-by":"crossref","unstructured":"Frank O (2005) Network sampling and model fitting. Models and methods in social network analysis, pp 31\u201356","DOI":"10.1017\/CBO9780511811395.003"},{"key":"324_CR15","unstructured":"Ganguly, A., Kolaczyk, E.D (2018) Estimation of vertex degrees in a sampled network. In: 2017 51st asilomar conference on signals, systems, and computers. IEEE, pp 967\u2013974"},{"key":"324_CR16","doi-asserted-by":"crossref","unstructured":"Gjoka M, Kurant M, Butts CT, Markopoulou A (2010) Walking in Facebook: a case study of unbiased sampling of OSNS. In: 2010 Proceedings IEEE Infocom. IEEE, pp 1\u20139","DOI":"10.1109\/INFCOM.2010.5462078"},{"key":"324_CR17","doi-asserted-by":"crossref","unstructured":"Grover A, Leskovec J (2016) node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 855\u2013864","DOI":"10.1145\/2939672.2939754"},{"issue":"7","key":"324_CR18","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1038\/nbt1116","volume":"23","author":"J-DJ Han","year":"2005","unstructured":"Han J-DJ, Dupuy D, Bertin N, Cusick ME, Vidal M (2005) Effect of sampling on topology predictions of protein-protein interaction networks. Nat Biotechnol 23(7):839","journal-title":"Nat Biotechnol"},{"issue":"1","key":"324_CR19","doi-asserted-by":"crossref","first-page":"25","DOI":"10.9708\/jksci.2016.21.1.025","volume":"21","author":"C-G Han","year":"2016","unstructured":"Han C-G, Lee S-H (2016) Analysis of effect of an additional edge on eigenvector centrality of graph. J Korea Soc Comput Inf 21(1):25\u201331","journal-title":"J Korea Soc Comput Inf"},{"key":"324_CR20","doi-asserted-by":"crossref","unstructured":"He Y, Wai H-T (2020) Estimating centrality blindly from low-pass filtered graph signals. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 5330\u20135334","DOI":"10.1109\/ICASSP40776.2020.9053437"},{"issue":"2","key":"324_CR21","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/0378-8733(83)90021-7","volume":"5","author":"PW Holland","year":"1983","unstructured":"Holland PW, Laskey KB, Leinhardt S (1983) Stochastic blockmodels: first steps. Soc Netw 5(2):109\u2013137","journal-title":"Soc Netw"},{"issue":"373","key":"324_CR22","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1080\/01621459.1981.10477598","volume":"76","author":"PW Holland","year":"1981","unstructured":"Holland PW, Leinhardt S (1981) An exponential family of probability distributions for directed graphs. J Am Stat Assoc 76(373):33\u201350","journal-title":"J Am Stat Assoc"},{"key":"324_CR23","doi-asserted-by":"crossref","unstructured":"H\u00fcbler C, Kriegel H-P, Borgwardt K, Ghahramani Z (2008) Metropolis algorithms for representative subgraph sampling. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 283\u2013292","DOI":"10.1109\/ICDM.2008.124"},{"key":"324_CR24","doi-asserted-by":"crossref","unstructured":"Kamvar SD, Haveliwala TH, Manning CD, Golub GH (2003) Extrapolation methods for accelerating pagerank computations. In: Proceedings of the 12th international conference on world wide web, pp 261\u2013270","DOI":"10.1145\/775152.775190"},{"issue":"1","key":"324_CR25","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1007\/BF02289026","volume":"18","author":"L Katz","year":"1953","unstructured":"Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39\u201343","journal-title":"Psychometrika"},{"key":"324_CR26","unstructured":"Kendall MG (1990) Rank correlation methods, 5th edn.\u00a0A Charles Griffin Title.\u00a0 https:\/\/www.bibsonomy.org\/bibtex\/2b5c89320f7c7f43cf6d7865d19a1a02c\/asalber"},{"issue":"3","key":"324_CR27","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.socnet.2005.07.002","volume":"28","author":"G Kossinets","year":"2006","unstructured":"Kossinets G (2006) Effects of missing data in social networks. Soc Netw 28(3):247\u2013268","journal-title":"Soc Netw"},{"issue":"1","key":"324_CR28","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1214\/aoms\/1177729694","volume":"22","author":"S Kullback","year":"1951","unstructured":"Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79\u201386","journal-title":"Ann Math Stat"},{"key":"324_CR29","doi-asserted-by":"crossref","unstructured":"Kunegis J (2013) Konect: the Koblenz network collection. In: Proceedings of the 22nd international conference on world wide web, pp 1343\u20131350","DOI":"10.1145\/2487788.2488173"},{"issue":"10","key":"324_CR30","doi-asserted-by":"crossref","first-page":"1078","DOI":"10.1038\/s41562-019-0677-4","volume":"3","author":"E Lee","year":"2019","unstructured":"Lee E, Karimi F, Wagner C, Jo H-H, Strohmaier M, Galesic M (2019) Homophily and minority-group size explain perception biases in social networks. Nat Hum Behav 3(10):1078\u20131087","journal-title":"Nat Hum Behav"},{"issue":"1","key":"324_CR31","doi-asserted-by":"crossref","first-page":"016102","DOI":"10.1103\/PhysRevE.73.016102","volume":"73","author":"SH Lee","year":"2006","unstructured":"Lee SH, Kim P-J, Jeong H (2006) Statistical properties of sampled networks. Phys Rev E 73(1):016102","journal-title":"Phys Rev E"},{"issue":"1","key":"324_CR32","first-page":"29","volume":"6","author":"J Leskovec","year":"2009","unstructured":"Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Int Math 6(1):29\u2013123","journal-title":"Int Math"},{"key":"324_CR33","doi-asserted-by":"crossref","unstructured":"Leskovec J, Faloutsos C (2006) Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 631\u2013636","DOI":"10.1145\/1150402.1150479"},{"key":"324_CR34","unstructured":"Leskovec J, Krevl A (2014) SNAP datasets: Stanford large network dataset collection. http:\/\/snap.stanford.edu\/data"},{"key":"324_CR35","unstructured":"Lin M, Li W, Nguyen C-t, Wang X, Lu S (2019) Sampling based Katz centrality estimation for large-scale social networks. In: International conference on algorithms and architectures for parallel processing. Springer, pp 584\u2013598"},{"key":"324_CR36","doi-asserted-by":"crossref","unstructured":"Maiya AS, Berger-Wolf TY (2010) Sampling community structure. In: Proceedings of the 19th international conference on world wide web. ACM, pp 701\u2013710","DOI":"10.1145\/1772690.1772762"},{"issue":"4","key":"324_CR37","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/S0378-8733(01)00042-9","volume":"23","author":"J Moody","year":"2001","unstructured":"Moody J (2001) Peer influence groups: identifying dense clusters in large networks. Soc Netw 23(4):261\u2013283","journal-title":"Soc Netw"},{"key":"324_CR38","unstructured":"Morstatter F, Pfeffer J, Liu H, Carley KM (2013) Is the sample good enough? Comparing data from twitter\u2019s streaming API with twitter\u2019s firehose. In: Seventh international AAAI conference on weblogs and social media"},{"key":"324_CR39","doi-asserted-by":"crossref","unstructured":"Murai S, Yoshida Y (2019) Sensitivity analysis of centralities on unweighted networks. In: The world wide web conference. ACM, pp 1332\u20131342","DOI":"10.1145\/3308558.3313422"},{"issue":"6","key":"324_CR40","doi-asserted-by":"crossref","first-page":"066117","DOI":"10.1103\/PhysRevE.70.066117","volume":"70","author":"J Park","year":"2004","unstructured":"Park J, Newman ME (2004) Statistical mechanics of networks. Phys Rev E 70(6):066117","journal-title":"Phys Rev E"},{"key":"324_CR41","unstructured":"Roddenberry TM, Segarra S (2019) Blind inference of centrality rankings from graph signals. arXiv preprint arXiv:1910.10846"},{"key":"324_CR42","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1007\/978-3-030-36687-2_8","volume-title":"Complex networks and their applications VIII","author":"N Ruggeri","year":"2020","unstructured":"Ruggeri N, De Bacco C (2020) Sampling on networks: estimating eigenvector centrality on incomplete networks. In: Cherifi H, Gaito S, Mendes JF, Moro E, Rocha LM (eds) Complex networks and their applications VIII. Springer, Cham, pp 90\u2013101"},{"key":"324_CR43","doi-asserted-by":"crossref","unstructured":"Sadikov E, Medina M, Leskovec J, Garcia-Molina H (2011) Correcting for missing data in information cascades. In: Proceedings of the fourth ACM international conference on web search and data mining. ACM, pp 55\u201364","DOI":"10.1145\/1935826.1935844"},{"key":"324_CR44","doi-asserted-by":"crossref","unstructured":"Sakakura Y, Yamaguchi Y, Amagasa T, Kitagawa H (2014) An improved method for efficient pagerank estimation. In: International conference on database and expert systems applications. Springer, pp 208\u2013222","DOI":"10.1007\/978-3-319-10085-2_19"},{"issue":"3","key":"324_CR45","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1109\/TSP.2015.2486740","volume":"64","author":"S Segarra","year":"2015","unstructured":"Segarra S, Ribeiro A (2015) Stability and continuity of centrality measures in weighted graphs. IEEE Trans Signal Process 64(3):543\u2013555","journal-title":"IEEE Trans Signal Process"},{"issue":"1","key":"324_CR46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-016-0028-x","volume":"7","author":"H Shao","year":"2017","unstructured":"Shao H, Mesbahi M, Li D, Xi Y (2017) Inferring centrality from network snapshots. Sci Rep 7(1):1\u201313","journal-title":"Sci Rep"},{"issue":"3","key":"324_CR47","doi-asserted-by":"crossref","first-page":"036118","DOI":"10.1103\/PhysRevE.72.036118","volume":"72","author":"MP Stumpf","year":"2005","unstructured":"Stumpf MP, Wiuf C (2005) Sampling properties of random graphs: the degree distribution. Phys Rev E 72(3):036118","journal-title":"Phys Rev E"},{"issue":"2","key":"324_CR48","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1109\/TNET.2008.2001730","volume":"17","author":"D Stutzbach","year":"2009","unstructured":"Stutzbach D, Rejaie R, Duffield N, Sen S, Willinger W (2009) On unbiased sampling for unstructured peer-to-peer networks. IEEE\/ACM Trans Netw TON 17(2):377\u2013390","journal-title":"IEEE\/ACM Trans Netw TON"},{"key":"324_CR49","unstructured":"Takac L, Zabovsky M (2012) Data analysis in public social networks. In: International scientific conference and international workshop present day trends of innovations, vol 1"},{"key":"324_CR50","doi-asserted-by":"crossref","unstructured":"Wagner C, Singer P, Karimi F, Pfeffer J, Strohmaier M (2017) Sampling from social networks with attributes. In: Proceedings of the 26th international conference on world wide web, pp 1181\u20131190","DOI":"10.1145\/3038912.3052665"},{"issue":"6684","key":"324_CR51","doi-asserted-by":"crossref","first-page":"440","DOI":"10.1038\/30918","volume":"393","author":"DJ Watts","year":"1998","unstructured":"Watts DJ, Strogatz SH (1998) Collective dynamics of \u2018small-world\u2019 networks. Nature 393(6684):440","journal-title":"Nature"},{"issue":"1","key":"324_CR52","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1145\/1052812.1052825","volume":"35","author":"B Zhang","year":"2005","unstructured":"Zhang B, Liu R, Massey D, Zhang L (2005) Collecting the internet as-level topology. ACM SIGCOMM Comput Commun Rev 35(1):53\u201361","journal-title":"ACM SIGCOMM Comput Commun Rev"}],"container-title":["Applied Network Science"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s41109-020-00324-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s41109-020-00324-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s41109-020-00324-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,25]],"date-time":"2022-11-25T05:00:36Z","timestamp":1669352436000},"score":1,"resource":{"primary":{"URL":"https:\/\/appliednetsci.springeropen.com\/articles\/10.1007\/s41109-020-00324-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,28]]},"references-count":52,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["324"],"URL":"https:\/\/doi.org\/10.1007\/s41109-020-00324-9","relation":{},"ISSN":["2364-8228"],"issn-type":[{"type":"electronic","value":"2364-8228"}],"subject":[],"published":{"date-parts":[[2020,10,28]]},"assertion":[{"value":"12 March 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 October 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 October 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"81"}}