{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,30]],"date-time":"2026-06-30T04:05:57Z","timestamp":1782792357801,"version":"3.54.5"},"reference-count":83,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T00:00:00Z","timestamp":1763596800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"},{"start":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T00:00:00Z","timestamp":1763596800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Big Data & Society"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>Scientists across disciplines often use data from the internet to conduct research, generating valuable insights about human behavior. However, as generative artificial intelligence relying on massive text corpora becomes increasingly valuable, platforms have greatly restricted access to data through official channels. As a result, researchers will likely engage in more web scraping to collect data, introducing new challenges and concerns for researchers. This paper proposes a comprehensive framework for web scraping in social science research for U.S.-based researchers, examining the legal, ethical, institutional, and scientific factors that we recommend researchers consider when scraping the web. We present an overview of the current regulatory environment impacting when and how researchers can access, collect, store, and share data via scraping. We then provide researchers with recommendations to conduct scraping in a scientifically legitimate and ethical manner. 
We aim to equip researchers with the relevant information to mitigate risks and maximize the impact of their research amid this evolving data access landscape.<\/jats:p>","DOI":"10.1177\/20539517251381686","type":"journal-article","created":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T15:05:21Z","timestamp":1763651121000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":23,"title":["Web scraping for research: Legal, ethical, institutional, and scientific considerations"],"prefix":"10.1177","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1338-8054","authenticated-orcid":false,"given":"Megan A","family":"Brown","sequence":"first","affiliation":[{"name":"University of Michigan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-6516-9730","authenticated-orcid":false,"given":"Andrew","family":"Gruen","sequence":"additional","affiliation":[{"name":"Working Paper, LLC, New York, NY, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-0867-4771","authenticated-orcid":false,"given":"Gabe","family":"Maldoff","sequence":"additional","affiliation":[{"name":"University of Maine"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Solomon","family":"Messing","sequence":"additional","affiliation":[{"name":"New York University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zeve","family":"Sanderson","sequence":"additional","affiliation":[{"name":"New York University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4229-4847","authenticated-orcid":false,"given":"Michael","family":"Zimmer","sequence":"additional","affiliation":[{"name":"Marquette 
University"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2025,11,20]]},"reference":[{"key":"e_1_3_4_2_1","unstructured":"18 U.S.C. \u00a71030(a)(2)(C). (n.d.) (Access to Computer Systems Without Authorization)."},{"key":"e_1_3_4_3_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06883-y"},{"key":"e_1_3_4_4_1","doi-asserted-by":"publisher","DOI":"10.1609\/icwsm.v14i1.7347"},{"key":"e_1_3_4_5_1","unstructured":"Berman v. Freedom Financial Network LLC. (2022) 30 F.4th. (9th Cir.)."},{"key":"e_1_3_4_6_1","doi-asserted-by":"publisher","DOI":"10.54501\/jots.v1i3.60"},{"key":"e_1_3_4_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512102"},{"key":"e_1_3_4_8_1","first-page":"146144482413039","article-title":"Making academia suck less: Supporting early career researchers studying harmful content online through a feminist ethics of care","author":"Brown MA","year":"2024","unstructured":"Brown MA, Lukito J, Pruden ML, et al. (2024) Making academia suck less: Supporting early career researchers studying harmful content online through a feminist ethics of care. New Media & Society 14614448241303999.","journal-title":"New Media & Society"},{"key":"e_1_3_4_9_1","unstructured":"Buchanan E Zimmer M (2021) Internet research ethics. The Stanford encyclopedia of philosophy. Retrieved 2021-01-12 from http:\/\/plato.stanford.edu\/entries\/ethics-internet-research\/."},{"key":"e_1_3_4_10_1","unstructured":"Clearview AI\u2014Facial Recognition. (2023) Retrieved 2023-06-21 from https:\/\/www.clearview.ai."},{"key":"e_1_3_4_11_1","unstructured":"Coalition for Independent Technology Research. (2023) Letter: Twitter\u2019s New API Plans Will Devastate Public Interest Research. 
Retrieved 2024-05-27 from https:\/\/independenttechresearch.org\/letter-twitters-new-api-plans-will-devastate-public-interest-research\/."},{"key":"e_1_3_4_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00146-021-01301-1"},{"key":"e_1_3_4_13_1","unstructured":"CrowdTangle. (2024) Important Update to CrowdTangle | March 2024 | CrowdTangle Help Center. Retrieved 2024-03-20 from http:\/\/help.crowdtangle.com\/en\/articles\/9014544-important-update-to-crowdtangle-march-2024."},{"key":"e_1_3_4_14_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41562-023-01750-2"},{"key":"e_1_3_4_15_1","doi-asserted-by":"crossref","unstructured":"Demir N Gro\u00dfe-Kampmann M Urban T et al. (2022) Reproducibility and replicability of web measurement studies. In: Proceedings of the ACM Web Conference 2022 (pp. 533\u2013544).","DOI":"10.1145\/3485447.3512214"},{"key":"e_1_3_4_16_1","unstructured":"Directive (EU) 2019\/790 of the European Parliament and of the Council on Copyright and Related Rights in the Digital Single Market. (2019) https:\/\/eur-lex.europa.eu\/eli\/dir\/2019\/790\/oj. (OJ L 130 17.5.2019 p. 92\u2013125)."},{"key":"e_1_3_4_17_1","doi-asserted-by":"publisher","DOI":"10.17645\/pag.v10i1.4713"},{"key":"e_1_3_4_18_1","doi-asserted-by":"publisher","DOI":"10.14763\/2023.3.1722"},{"key":"e_1_3_4_19_1","volume-title":"Cyberspace and the Law of the Horse.","author":"Easterbrook FH","year":"1996","unstructured":"Easterbrook FH (1996) Cyberspace and the Law of the Horse. Vol 1996, pp. 207\u2013208. Chicago, IL: University of Chicago Legal Forum. https:\/\/chicagounbound.uchicago.edu\/uclf\/vol1996\/iss1\/7."},{"key":"e_1_3_4_20_1","unstructured":"European Data Protection Board. (2024 May 23) Report of the Work Undertaken by the ChatGPT Taskforce. Retrieved from https:\/\/www.edpb.europa.eu\/system\/files\/2024-05\/edpb_20240523_report_chatgpt_taskforce_en.pdf."},{"key":"e_1_3_4_21_1","unstructured":"European Data Protection Supervisor. 
(2020 Jan 6) A preliminary opinion on data protection and scientific research. Retrieved from https:\/\/www.edps.europa.eu\/sites\/default\/files\/publication\/20-01-06_opinion_research_en.pdf."},{"key":"e_1_3_4_22_1","unstructured":"European Digital Media Observatory. (2022 May 31) Report of the European digital media observatory\u2019s working group on platform-to-researcher data access Annex 4\u2014Compendium of EU Member State Laws. Retrieved from https:\/\/edmo.eu\/wp-content\/uploads\/2022\/02\/Report-of-the-European-Digital-Media-Observatorys-Working-Group-on-Platform-to-Researcher-Data-Access-2022.pdf."},{"key":"e_1_3_4_23_1","unstructured":"Facebook Inc. v. Power Ventures Inc. (2016) 844 F.3d 1058. (9th Cir.)."},{"key":"e_1_3_4_24_1","unstructured":"Faddoul M Chaslot G Farid H (2020) A longitudinal analysis of YouTube\u2019s promotion of conspiracy videos. arXiv. https:\/\/doi.org\/10.48550\/arXiv.2003.03318."},{"issue":"2","key":"e_1_3_4_25_1","first-page":"275","article-title":"Mapping of underdeveloped areas based on research frequency utilizing distributed web scraping and web GIS","volume":"2","author":"Fathoni AN","year":"2022","unstructured":"Fathoni AN, Priyawati D (2022) Mapping of underdeveloped areas based on research frequency utilizing distributed web scraping and web GIS. International Journal for Disaster and Development Interface 2(2): 275\u2013291.","journal-title":"International Journal for Disaster and Development Interface"},{"key":"e_1_3_4_26_1","unstructured":"Federal Trade Commission. (2024 March 4) FTC cracks down on mass data collectors: A closer look at avast X-Mode and InMarket. Retrieved from https:\/\/www.ftc.gov\/policy\/advocacy-research\/tech-at-ftc\/2024\/03\/ftc-cracks-down-mass-data-collectors-closer-look-avast-x-mode-inmarket."},{"key":"e_1_3_4_27_1","doi-asserted-by":"publisher","DOI":"10.1177\/2056305118763366"},{"key":"e_1_3_4_28_1","doi-asserted-by":"crossref","unstructured":"Fiesler C Wisniewski P Pater J et al. 
(2016) Exploring ethics and obligations for studying digital communities. In: GROUP \u201816: Proceedings of the 2016 ACM international conference on supporting group work pp. 457\u2013460. https:\/\/doi.org\/10.1145\/2957276.2996293.","DOI":"10.1145\/2957276.2996293"},{"key":"e_1_3_4_29_1","doi-asserted-by":"crossref","unstructured":"Fiesler C Zimmer M Proferes N et al. (2024) Remember the human: A Systematic review of ethical considerations in reddit research. In: Proceedings of the ACM on human-computer interaction 8(GROUP). https:\/\/doi.org\/10.1145\/3633070.","DOI":"10.1145\/3633070"},{"key":"e_1_3_4_30_1","volume-title":"Internet Research: Ethical Guidelines 3.0","author":"Franzke AS","year":"2020","unstructured":"Franzke AS, Bechmann A, Zimmer M, et al. (2020) Internet Research: Ethical Guidelines 3.0. Association of Internet Researchers. https:\/\/aoir.org\/reports\/ethics3.pdf."},{"key":"e_1_3_4_31_1","unstructured":"Fung B (2023 Mar) DOJ will hire more data experts to scrutinize digital monopolies Antitrust chief says\u2014CNN business. Cable News Network. Retrieved from https:\/\/www.cnn.com\/2023\/03\/06\/tech\/doj-data-experts\/index.html."},{"key":"e_1_3_4_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458723"},{"key":"e_1_3_4_33_1","doi-asserted-by":"publisher","DOI":"10.1001\/jama.1982.03330020041027"},{"key":"e_1_3_4_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.socnet.2014.01.004"},{"key":"e_1_3_4_35_1","unstructured":"Gray M (1995) Measuring the Growth of the Web. Retrieved 2023-06-21 from https:\/\/www.mit.edu\/people\/mkgray\/growth\/."},{"key":"e_1_3_4_36_1","unstructured":"Grynbaum MM Mac R (2023 December) The times sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work. The New York Times. 
Retrieved 2024-03-20 from https:\/\/www.nytimes.com\/2023\/12\/27\/business\/media\/new-york-times-open-ai-microsoft-lawsuit.html."},{"key":"e_1_3_4_37_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.24368"},{"key":"e_1_3_4_38_1","unstructured":"Herrman J Isaac M (2016) The online video view: We Can Count It but Can We Count on It? The New York Times. Retrieved from https:\/\/www.nytimes.com\/2016\/10\/03\/business\/media\/the-online-video-view-we-can-count-it-but-can-we-count-on-it.html (accessed 18 April 2024)."},{"key":"e_1_3_4_39_1","unstructured":"hiQ Labs Inc. v. LinkedIn Corporation. (2022) 31 F. 4th 1180. (9th Cir.)."},{"key":"e_1_3_4_40_1","unstructured":"ICPSR About the Organization. (n.d.) Inter-university consortium for political and social research. Retrieved 2023-05-18 from https:\/\/www.icpsr.umich.edu\/web\/pages\/about\/."},{"key":"e_1_3_4_41_1","unstructured":"iThenticate\u2014Plagiarism Checking for Academic Research\u2014Turnitin. (2023) Retrieved 2023-06-21 from https:\/\/www.turnitin.com\/products\/ithenticate."},{"key":"e_1_3_4_42_1","unstructured":"Kids Online Safety Act LYN22092 2SF. (2022) 117th Congress. Retrieved from https:\/\/www.blumenthal.senate.gov\/imo\/media\/doc\/kids_online_safety_act_-_bill_text.pdf."},{"key":"e_1_3_4_43_1","doi-asserted-by":"publisher","DOI":"10.1177\/20563051221144317"},{"key":"e_1_3_4_44_1","doi-asserted-by":"crossref","unstructured":"Koster M Illyes G Zeller H et al. (2022 September) Robots Exclusion Protocol (No. 9309). RFC 9309. RFC Editor. Retrieved from https:\/\/www.rfc-editor.org\/info\/rfc9309 DOI: 10.17487\/RFC9309.","DOI":"10.17487\/RFC9309"},{"key":"e_1_3_4_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3085682"},{"key":"e_1_3_4_46_1","doi-asserted-by":"publisher","DOI":"10.1002\/hec.4602"},{"key":"e_1_3_4_47_1","unstructured":"Levine S (2021) Letter from acting director of the bureau of consumer protection Samuel Levine to Facebook. 
Retrieved from https:\/\/www.ftc.gov\/blog-posts\/2021\/08\/letter-acting-director-bureau-consumer-protection-samuel-levine-facebook."},{"key":"e_1_3_4_48_1","unstructured":"Markham A Buchanan E (2012) Ethical decision-making and internet research: recommendations from the aoir ethics working committee (Version 2.0) (Tech. Rep.). Association of Internet Researchers. Retrieved from http:\/\/aoir.org\/reports\/ethics2.pdf."},{"key":"e_1_3_4_49_1","doi-asserted-by":"publisher","DOI":"10.1080\/17530350.2013.772070"},{"key":"e_1_3_4_50_1","doi-asserted-by":"publisher","DOI":"10.51685\/jqd.2023.022"},{"key":"e_1_3_4_51_1","unstructured":"MDDC About. (n.d.) Media and democracy data cooperative. Retrieved 2023-05-18 from https:\/\/mddatacoop.org\/about\/."},{"key":"e_1_3_4_52_1","unstructured":"Meta Platforms Inc. v. BrandTotal Ltd. (2022) 605 F.Supp.3d. (N.D. Cal.)."},{"key":"e_1_3_4_53_1","unstructured":"Meta Platforms Inc. v. Bright Data Ltd. (2024) 2024 WL 251406. (N.D. Cal. Jan. 23)."},{"key":"e_1_3_4_54_1","doi-asserted-by":"publisher","DOI":"10.1177\/2053951716650211"},{"key":"e_1_3_4_55_1","doi-asserted-by":"publisher","DOI":"10.1177\/20531680231187271"},{"key":"e_1_3_4_56_1","doi-asserted-by":"publisher","DOI":"10.1080\/19312458.2023.2181319"},{"key":"e_1_3_4_57_1","unstructured":"OpenAI\u2014ChatGPT. (2024) Retrieved 2024-09-10 from https:\/\/openai.com\/chatgpt\/."},{"key":"e_1_3_4_58_1","unstructured":"Ortutay B (2021) Facebook shuts out NYU academics\u2019 research on political ads. AP News. Retrieved from https:\/\/apnews.com\/article\/technology-business-5d3021ed9f193bf249c3af158b128d18."},{"key":"e_1_3_4_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492857"},{"key":"e_1_3_4_60_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2021.100336"},{"key":"e_1_3_4_61_1","unstructured":"PERVADE. (2023) PERVADE Data Ethics Tool. 
https:\/\/pervade.umd.edu\/pervade-data-%20ethics-tool\/ (accessed 23 May 2024)."},{"key":"e_1_3_4_62_1","unstructured":"Police Data Accessibility Project. (2023) Retrieved 2023-06-21 from https:\/\/www.pdap.io."},{"key":"e_1_3_4_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274417"},{"key":"e_1_3_4_64_1","unstructured":"Sandvig v. Barr. (2020) Civ. Action No. 16-1368. (D.D.C. March 28)."},{"key":"e_1_3_4_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3476058"},{"key":"e_1_3_4_66_1","first-page":"372","article-title":"Twenty years of web scraping and the computer fraud and abuse act","volume":"24","author":"Sellars A","year":"2018","unstructured":"Sellars A (2018) Twenty years of web scraping and the computer fraud and abuse act. Boston University Journal of Science & Technology Law 24: 372\u2013376.","journal-title":"Boston University Journal of Science & Technology Law"},{"key":"e_1_3_4_67_1","doi-asserted-by":"publisher","DOI":"10.1177\/205395172110407"},{"issue":"1","key":"e_1_3_4_68_1","first-page":"71","article-title":"Trumping hate on twitter? Online hate speech in the 2016 U.S. Election Campaign and its Aftermath","volume":"16","author":"Siegel AA","year":"2021","unstructured":"Siegel AA, Nikitin E, Barbera P, et al. (2021) Trumping hate on twitter? Online hate speech in the 2016 U.S. Election Campaign and its Aftermath. Quarterly Journal of Political Science 16(1): 71\u2013104.","journal-title":"Quarterly Journal of Political Science"},{"key":"e_1_3_4_69_1","unstructured":"Singel R (2011) Google Catches Bing Copying; Microsoft Says \u2018So What?\u2019. Retrieved 2023-03-24 from https:\/\/www.wired.com\/2011\/02\/bing-copies-google\/."},{"key":"e_1_3_4_70_1","first-page":"147","article-title":"A new common law of web scraping","volume":"25","author":"Sobel BLW","year":"2021","unstructured":"Sobel BLW (2021) A new common law of web scraping. Lewis & Clark Law Review 25: 147. 
https:\/\/law.lclark.edu\/live\/files\/31605-7-sobel-article-251pdf.","journal-title":"Lewis & Clark Law Review"},{"key":"e_1_3_4_71_1","unstructured":"Social Media Disclosure and Transparency of Advertisements Act of 2021. (2021) 117th Congress. Retrieved from https:\/\/trahan.house.gov\/uploadedfiles\/social_media_data_act_bill_text.pdf."},{"key":"e_1_3_4_72_1","unstructured":"The citizen browser project\u2014auditing the algorithms of disinformation\u2014The Markup. (2020) Retrieved 2024-03-20 from https:\/\/themarkup.org\/citizen-browser."},{"key":"e_1_3_4_73_1","unstructured":"The Markup Staff. (2020) Why Web Scraping Is Vital to Democracy. Retrieved 2023-06-21 from https:\/\/themarkup.org\/news\/2020\/12\/03\/why-web-scraping-is-vital-to-democracy."},{"key":"e_1_3_4_74_1","unstructured":"The Platform Accountability and Transparency Act LYN23256 1RR. (2023) 118th Congress. Retrieved from https:\/\/www.coons.senate.gov\/imo\/media\/doc\/text_pata_117.pdf."},{"key":"e_1_3_4_75_1","unstructured":"TikTok. (2023) Video Play Reporting Metrics. Retrieved 2024-04-18 from https:\/\/ads.tiktok.com\/help\/article\/video-play."},{"key":"e_1_3_4_76_1","doi-asserted-by":"crossref","unstructured":"Tromble R Storz A Stockmann D (2017) We don\u2019t know what we don\u2019t know: when and how the use of Twitter\u2019s public APIs biases scientific inference. Available at SSRN 3079927. http:\/\/dx.doi.org\/10.2139\/ssrn.3079927.","DOI":"10.2139\/ssrn.3079927"},{"key":"e_1_3_4_77_1","doi-asserted-by":"publisher","DOI":"10.1177\/1556264617725200"},{"key":"e_1_3_4_78_1","doi-asserted-by":"publisher","DOI":"10.1177\/1461444820933547"},{"key":"e_1_3_4_79_1","unstructured":"X Corp. v. Bright Data Ltd (2023) No. 3:23-cv-03698 (N.D. Cal. July 26)."},{"key":"e_1_3_4_80_1","unstructured":"X Corp. v. Center for Countering Digital Hate Inc. (2024) 2024 WL 1246318. (N.D. Cal. March 25)."},{"key":"e_1_3_4_81_1","unstructured":"XDevelopers. 
(2023) Announcing new access tiers for the twitter api. Retrieved 2024-05-27 from https:\/\/devcommunity.x.com\/t\/announcing-new-access-tiers-for-the-twitter-api\/188728."},{"key":"e_1_3_4_82_1","doi-asserted-by":"publisher","DOI":"10.1177\/2056305118768300"},{"key":"e_1_3_4_83_1","doi-asserted-by":"publisher","DOI":"10.5210\/spir.v2020i0.11369"},{"key":"e_1_3_4_84_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1005399"}],"container-title":["Big Data & Society"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/20539517251381686","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/20539517251381686","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/20539517251381686","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T13:01:29Z","timestamp":1777381289000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/20539517251381686"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,20]]},"references-count":83,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1177\/20539517251381686"],"URL":"https:\/\/doi.org\/10.1177\/20539517251381686","relation":{},"ISSN":["2053-9517","2053-9517"],"issn-type":[{"value":"2053-9517","type":"print"},{"value":"2053-9517","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,20]]},"article-number":"20539517251381686"}}