{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T06:13:07Z","timestamp":1774678387835,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2017,12,13]],"date-time":"2017-12-13T00:00:00Z","timestamp":1513123200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"The Polish National Science Centre","award":["DEC-2016\/21\/D\/ST6\/02408"],"award-info":[{"award-number":["DEC-2016\/21\/D\/ST6\/02408"]}]},{"name":"European Union\u2019s Horizon 2020","award":["691152"],"award-info":[{"award-number":["691152"]}]},{"name":"Polish Ministry of Science and Higher Education","award":["3628\/H2020\/2016\/2"],"award-info":[{"award-number":["3628\/H2020\/2016\/2"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>With the growing use of popular social media services like Facebook and Twitter, it is challenging to collect all content from the networks without access to the core infrastructure or paying for it. Thus, if all content cannot be collected, one must consider which data are of most importance. In this work we present a novel User-guided Social Media Crawling method (USMC) that is able to collect data from social media, utilizing the wisdom of the crowd to decide the order in which user generated content should be collected to cover as many user interactions as possible. USMC is validated by crawling 160 public Facebook pages, containing content from 368 million users including 1.3 billion interactions, and it is compared with two other crawling methods. The results show that it is possible to cover approximately 75% of the interactions on a Facebook page by sampling just 20% of its posts, and at the same time reduce the crawling time by 53%. 
In addition, the social network constructed from the 20% sample contains more than 75% of the users and edges of the social network created from all posts, and it has a similar degree distribution.<\/jats:p>","DOI":"10.3390\/e19120686","type":"journal-article","created":{"date-parts":[[2017,12,14]],"date-time":"2017-12-14T04:30:55Z","timestamp":1513225855000},"page":"686","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Do We Really Need to Catch Them All? A New User-Guided Social Media Crawling Method"],"prefix":"10.3390","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3219-9598","authenticated-orcid":false,"given":"Fredrik","family":"Erlandsson","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6474-0089","authenticated-orcid":false,"given":"Piotr","family":"Br\u00f3dka","sequence":"additional","affiliation":[{"name":"Department of Computational Intelligence, Wroc\u0142aw University of Science and Technology, 50-370 Wroc\u0142aw, Poland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9316-4842","authenticated-orcid":false,"given":"Martin","family":"Boldt","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Henric","family":"Johnson","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,12,13]]},"reference":[{"key":"ref_1","unstructured":"(2017, 
December 12). Twitter, Company | About. Available online: https:\/\/about.twitter.com\/company\/."},{"key":"ref_2","unstructured":"(2017, October 03). Facebook, Company Info | Facebook Newsroom. Available online: http:\/\/newsroom.fb.com\/company-info\/."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Erlandsson, F., Nia, R., Boldt, M., Johnson, H., and Wu, S.F. (2015, January 21\u201322). Crawling Online Social Networks. Proceedings of the Network Intelligence Conference, Karlskrona, Sweden.","DOI":"10.1109\/ENIC.2015.10"},{"key":"ref_4","unstructured":"Erlandsson, F., and Wu, F.S. (2017, December 12). SocialCrawler 2.9. Available online: https:\/\/doi.org\/10.5281\/zenodo.153825."},{"key":"ref_5","unstructured":"Walpole, R., Myers, R., Sharon, M., and Ye, K. (2012). Probability & Statistics\u2014For Engineers and Scientists, Pearson, Cambridge University Press."},{"key":"ref_6","unstructured":"Sheskin, D.J. (2011). Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall\/CRC. [5th ed.]."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zafarani, R., Abbasi, M.A., and Liu, H. (2014). Social Media Mining, An Introduction, Cambridge University Press.","DOI":"10.1017\/CBO9781139088510"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/j.bushor.2011.01.005","article-title":"Social media? Get serious! Understanding the functional building blocks of social media","volume":"54","author":"Kietzmann","year":"2011","journal-title":"Bus. Horiz."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Nia, R., Erlandsson, F., Johnson, H., and Wu, S.F. (2013, January 8\u201311). Leveraging social interactions to suggest friends. 
Proceedings of the 2013 IEEE 33rd International Conference on Distributed Computing Systems Workshops (ICDCSW), Philadelphia, PA, USA.","DOI":"10.1109\/ICDCSW.2013.93"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Erlandsson, F., Borg, A., Johnson, H., and Br\u00f3dka, P. (2016). Predicting User Participation in Social Media. Advances in Network Science, Springer International Publishing.","DOI":"10.1007\/978-3-319-28361-6_10"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Erlandsson, F., Br\u00f3dka, P., Borg, A., and Johnson, H. (2016). Finding Influential Users in Social Media Using Association Rule Learning. Entropy, 18.","DOI":"10.3390\/e18050164"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Agichtein, E., Castillo, C., Donato, D., Gionis, A., and Mishne, G. (2008, January 11\u201312). Finding high-quality content in social media. Proceedings of the 2008 international conference on web search and data mining, Palo Alto, CA, USA.","DOI":"10.1145\/1341531.1341557"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mislove, A., Marcon, M., Gummadi, K.P., Druschel, P., and Bhattacharjee, B. (2007, January 23\u201326). Measurement and analysis of online social networks. Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, San Diego, CA, USA.","DOI":"10.1145\/1298306.1298311"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Gjoka, M., Kurant, M., Butts, C.T., and Markopoulou, A. (2010, January 14\u201319). Walking in Facebook: A Case Study of Unbiased Sampling of OSNs. Proceedings of the 2010 IEEE Conference on Computer Communications (INFOCOM), San Diego, CA, USA.","DOI":"10.1109\/INFCOM.2010.5462078"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1893","DOI":"10.1109\/JSAC.2011.111012","article-title":"Multigraph Sampling of Online Social Networks","volume":"29","author":"Gjoka","year":"2011","journal-title":"IEEE J. Sel. 
Areas Commun."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Leskovec, J., and Faloutsos, C. (2006, January 20\u201323). Sampling from large graphs. Proceedings of the 12th ACM SIGKDD International Conference, New York, NY, USA.","DOI":"10.1145\/1150402.1150479"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, X., Ma, R.T.B., Xu, Y., and Li, Z. (May, January 26). Sampling online social networks via heterogeneous statistics. Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China.","DOI":"10.1109\/INFOCOM.2015.7218649"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1007\/s13278-016-0371-8","article-title":"Sampling algorithms for weighted networks","volume":"6","author":"Rezvanian","year":"2016","journal-title":"Soc. Netw. Anal. Min."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chiericetti, F., Dasgupta, A., Kumar, R., Lattanzi, S., and Sarl\u00f3s, T. (2016, January 11\u201315). On Sampling Nodes in a Network. Proceedings of the 25th International Conference on World Wide Web (WWW\u2019 16), Montr\u00e9al, QC, Canada.","DOI":"10.1145\/2872427.2883045"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Catanese, S.A., de Meo, P., Ferrara, E., Fiumara, G., and Provetti, A. (2011, January 25\u201327). Crawling Facebook for social network analysis purposes. Proceedings of the International Conference on Web Intelligence, Mining and Semantics, Sogndal, Norway.","DOI":"10.1145\/1988688.1988749"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1145\/2382616.2382620","article-title":"Beyond social graphs: User interactions in online social networks and their implications","volume":"6","author":"Wilson","year":"2012","journal-title":"ACM Trans. 
Web"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1016\/j.socnet.2013.12.002","article-title":"Visualization techniques for categorical analysis of social networks with multiple edge sets","volume":"37","author":"Crnovrsanin","year":"2014","journal-title":"Soc. Netw."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/j.ins.2013.08.046","article-title":"Moving from social networks to social internetworking scenarios: The crawling perspective","volume":"256","author":"Buccafurri","year":"2014","journal-title":"Inf. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Nia, R., Erlandsson, F., Bhattacharyya, P., Rahman, M.R., Johnson, H., and Wu, S.F. (2012, January 14\u201316). Sin: A platform to make interactions in social networks accessible. Proceedings of the 2012 International Conference on Social Informatics (SocialInformatics), Lausanne, Switzerland.","DOI":"10.1109\/SocialInformatics.2012.29"},{"key":"ref_25","unstructured":"Davidson, R., and MacKinnon, J.G. (2004). Econometric Theory and Methods, Oxford University Press."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Erlandsson, F. (2017, December 12). Replication Data for: Do We Really Need to Catch Them All? A New User-Guided Social Media Crawling Method. Available online: http:\/\/dx.doi.org\/10.7910\/DVN\/DCBDEP.","DOI":"10.3390\/e19120686"},{"key":"ref_27","unstructured":"Facebook (2017, December 12). Facebook Data Policy. Available online: https:\/\/www.facebook.com\/full_data_use_policy."},{"key":"ref_28","unstructured":"Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences, Academic Press. [revised ed.]."},{"key":"ref_29","unstructured":"Safko, L. (2012). 
The Social Media Bible: Tactics, Tools, and Strategies for Business Success, John Wiley & Sons."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"30","DOI":"10.3390\/e17053053","article-title":"Predicting Community Evolution in Social Networks","volume":"17","author":"Saganowski","year":"2015","journal-title":"Entropy"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/19\/12\/686\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:53:51Z","timestamp":1760208831000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/19\/12\/686"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,13]]},"references-count":30,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2017,12]]}},"alternative-id":["e19120686"],"URL":"https:\/\/doi.org\/10.3390\/e19120686","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12,13]]}}}