{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T09:29:53Z","timestamp":1777627793566,"version":"3.51.4"},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T00:00:00Z","timestamp":1717027200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Big Data"],"abstract":"<jats:p>Social media has profoundly changed our modes of self-expression, communication, and participation in public discourse, generating volumes of conversations and content that cover every aspect of our social lives. Social media platforms have thus become increasingly important as data sources to identify social trends and phenomena. In recent years, academics have steadily lost ground on access to social media data as technology companies have set more restrictions on Application Programming Interfaces (APIs) or entirely closed public APIs. This circumstance halts the work of many social scientists who have used such data to study issues of public good. We considered the viability of eight approaches for image-based social media data collection: data philanthropy organizations, data repositories, data donation, third-party data companies, homegrown tools, and various web scraping tools and scripts. This paper discusses the advantages and challenges of these approaches from literature and from the authors' experience. 
We conclude the paper by discussing mechanisms for improving social media data collection that will enable this future frontier of social science research.<\/jats:p>","DOI":"10.3389\/fdata.2024.1379921","type":"journal-article","created":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T05:13:53Z","timestamp":1717046033000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["From theory to practice: insights and hurdles in collecting social media data for social science research"],"prefix":"10.3389","volume":"7","author":[{"given":"Yan","family":"Chen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kate","family":"Sherren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kyung Young","family":"Lee","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lori","family":"McCay-Peet","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shan","family":"Xue","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Smit","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2024,5,30]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1007\/s10502-019-09325-9","article-title":"Social media data archives in an API-driven world","volume":"20","author":"Acker","year":"2020","journal-title":"Arch. 
Sci."},{"key":"B2","doi-asserted-by":"publisher","first-page":"509","DOI":"10.1126\/science.aaa1465","article-title":"Privacy and human behavior in the age of information","volume":"347","author":"Acquisti","year":"2015","journal-title":"Science"},{"key":"B3","unstructured":"Social media demographics to inform your brand's strategy in 2023\n            BarnhartB.\n          Sproutsocial.2023"},{"key":"B4","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/99608f92.9a36bdb6","article-title":"The lives and after lives of data","volume":"1","author":"Borgman","year":"2019","journal-title":"Harv. Data Sci. Rev."},{"key":"B5","doi-asserted-by":"publisher","first-page":"2058","DOI":"10.1177\/1461444820924622","article-title":"The practical and ethical challenges in acquiring and sharing digital trace data: negotiating public-private partnerships","volume":"22","author":"Breuer","year":"2020","journal-title":"New Media Soc."},{"key":"B6","doi-asserted-by":"publisher","first-page":"1544","DOI":"10.1080\/1369118X.2019.1637447","article-title":"After the \u2018APIcalypse\u2019: social media platforms and their fight against critical scholarly research","volume":"22","author":"Bruns","year":"2019","journal-title":"Inf. Commun. Soc."},{"key":"B7","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1016\/j.landurbplan.2017.07.004","article-title":"Using geo-tagged Instagram posts to reveal landscape values around current and proposed hydroelectric dams and their reservoirs","volume":"170","author":"Chen","year":"2018","journal-title":"Landsc. 
Urban Plan."},{"key":"B8","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1177\/14614448211038761","article-title":"Using social media images as data in social science research","volume":"24","author":"Chen","year":"2023","journal-title":"New Media Soc."},{"key":"B9","unstructured":"ConfessoreN.\n          Cambridge Analytica and Facebook: The scandal and the fallout so far.2018"},{"key":"B10","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1080\/13645579.2013.774185","article-title":"Digital social research, social media and the sociological imagination: Surrogacy, augmentation and re-orientation","volume":"16","author":"Edwards","year":"2013","journal-title":"Int. J. Soc. Res. Methodol."},{"key":"B11","unstructured":"Commission opens formal proceedings against X under the Digital Services Act.2023"},{"key":"B12","doi-asserted-by":"publisher","first-page":"665","DOI":"10.1080\/10584609.2018.1477506","article-title":"Computational research in the post-API age","volume":"35","author":"Freelon","year":"2018","journal-title":"Polit. Commun."},{"key":"B13","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1016\/j.gloenvcha.2019.02.003","article-title":"Passive crowdsourcing of social media in environmental research: a systematic map","volume":"55","author":"Ghermandi","year":"2019","journal-title":"Global Environ. 
Change"},{"key":"B14","unstructured":"arc298\/instagram-scraper2022"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.1109\/BigData59044.2023.10386212","article-title":"\u201cNatural language processing to understand human activities impacted by hydroelectric energy projects,\u201d","author":"Gone","year":"2023","journal-title":"2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy"},{"key":"B16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/01972243.2018.1542647","article-title":"An agnotological analysis of APIs: or, disconnectivity and the ideological limits of our knowledge of social media","volume":"35","author":"John","year":"2019","journal-title":"Inf. Soc."},{"key":"B17","unstructured":"FAQs: DSA data access for researchers. European Centre for Algorithmic Transparency.2023"},{"key":"B18","doi-asserted-by":"crossref","DOI":"10.1145\/2615569.2615685","article-title":"\u201cI always feel it must be great to be a hacker!\u201d","volume-title":"The Role of Interdisciplinary Work in Social Media Research","author":"Kinder-Kurlanda","year":"2014"},{"key":"B19","doi-asserted-by":"publisher","first-page":"509954","DOI":"10.3389\/fdata.2020.509954","article-title":"Perspective: acknowledging data work in the social media research lifecycle","volume":"3","author":"Kinder-Kurlanda","year":"2020","journal-title":"Front. Big Data"},{"key":"B20","doi-asserted-by":"publisher","first-page":"703","DOI":"10.1017\/S1049096519001021","article-title":"A new model for industry - academic partnerships","volume":"53","author":"King","year":"2020","journal-title":"Polit. Sci. 
Polit."},{"key":"B21","doi-asserted-by":"publisher","first-page":"721","DOI":"10.1126\/science.1167742","article-title":"Life in the network: the coming age of computational social science","volume":"323","author":"Lazer","year":"2009","journal-title":"Science"},{"key":"B22","unstructured":"Meta Content Library and API.2023"},{"key":"B23","unstructured":"Instagram Platform.2023"},{"key":"B24","first-page":"260","article-title":"\u201cGood data is critical data: an appeal for critical digital studies,\u201d","volume-title":"Good Data","author":"Poletti","year":"2019"},{"key":"B25","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1080\/13645579.2013.774172","article-title":"Reading the riots on Twitter: Methodological innovation for the analysis of big data","volume":"16","author":"Procter","year":"2013","journal-title":"Int. J. Soc. Res. Methodol."},{"key":"B26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.14763\/2020.4.1535","article-title":"Towards platform observability","volume":"9","author":"Rieder","year":"2020","journal-title":"Internet Policy Rev."},{"key":"B27","unstructured":"Sandvigv."},{"key":"B28","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1177\/0038038507080443","article-title":"The coming crisis of empirical sociology","volume":"41","author":"Savage","year":"2007","journal-title":"Sociology"},{"key":"B29","doi-asserted-by":"publisher","first-page":"00113921231203179","DOI":"10.1177\/00113921231203179","article-title":"Social media and social impact assessment: evolving methods in a shifting context","volume":"2023","author":"Sherren","year":"2023","journal-title":"Curr. 
Sociol."},{"key":"B30","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1109\/CLOUD.2013.131","article-title":"\u201cToward an ecosystem for precision sharing of segmented Big Data,\u201d","volume-title":"2013 IEEE Sixth International Conference on Cloud Computing","author":"Shtern","year":"2013"},{"key":"B31","first-page":"122","article-title":"\u201cSocial research and Big Data \u2013 the tension between opportunities and realities,\u201d","volume-title":"Internet Research Ethics","author":"Steen-Johnsen","year":"2015"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1177\/1747016117738559","article-title":"Mining social media data: how are research sponsors and researchers addressing the ethical challenges?","volume":"14","author":"Taylor","year":"2018","journal-title":"Res. Ethics"},{"key":"B33","unstructured":"Research API2023"},{"key":"B34","doi-asserted-by":"publisher","first-page":"266","DOI":"10.1080\/19312458.2022.2109608","article-title":"Promises and pitfalls of social media data donations","volume":"16","author":"Van Driel","year":"2022","journal-title":"Commun. Methods Measur."},{"key":"B35","unstructured":"VogusC.\n          Improving researcher access to digital data: A workshop report. 
Center for Democracy and Technology.2022"},{"key":"B36","unstructured":"WalkerS.\n          The complexity of collecting digital and social media data in ephemeral contexts.2017"},{"key":"B37","article-title":"\u201cUncovering the challenges in collection, sharing and documentation: the hidden data of social media research?\u201d","author":"Weller","year":"2015","journal-title":"2015 ICWSM Workshop"},{"key":"B38","unstructured":"About the Twitter API.2023"},{"key":"B39","unstructured":"Research under EU Digital Services Act.2024"},{"key":"B40","unstructured":"ZuckermanE.\n          When the internet becomes unknowable.2023"}],"container-title":["Frontiers in Big Data"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2024.1379921\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,30]],"date-time":"2024-05-30T05:14:08Z","timestamp":1717046048000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fdata.2024.1379921\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,30]]},"references-count":40,"alternative-id":["10.3389\/fdata.2024.1379921"],"URL":"https:\/\/doi.org\/10.3389\/fdata.2024.1379921","relation":{},"ISSN":["2624-909X"],"issn-type":[{"value":"2624-909X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,30]]},"article-number":"1379921"}}