{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T11:28:48Z","timestamp":1776943728403,"version":"3.51.4"},"reference-count":22,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,5,18]],"date-time":"2023-05-18T00:00:00Z","timestamp":1684368000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Res. Metr. Anal."],"abstract":"<jats:p>Reliable and up-to-date indicators of the presence of languages on the Internet are required to drive language policies efficiently, to forecast the e-commerce market, or to support further research in the field of digital language support. This article presents a complete description of the methodological elements involved in producing an unprecedented set of indicators of the presence on the Internet of the 329 languages with more than 1 million L1 speakers. Special emphasis is given to the treatment of the comprehensive set of biases involved in the process, whether they stem from the method itself or from the various sources used in the modeling process. The biases of other sources providing similar data are also discussed; in particular, it is shown how failing to account for the high level of multilingualism of the Web leads to a huge overestimation of the presence of English. The detailed list of sources is presented in the annexes.\nFor the first time in the history of the Internet, the production of indicators of the virtual presence of a large set of languages could enable progress in the fields of the economics of languages, the cyber-geography of languages, and language policies for multilingualism.<\/jats:p>","DOI":"10.3389\/frma.2023.1149347","type":"journal-article","created":{"date-parts":[[2023,5,18]],"date-time":"2023-05-18T08:18:01Z","timestamp":1684397881000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["The method behind the unprecedented production of indicators of the presence of languages in the Internet"],"prefix":"10.3389","volume":"8","author":[{"given":"Daniel","family":"Pimienta","sequence":"first","affiliation":[]},{"given":"\u00c1lvaro","family":"Blanco","sequence":"additional","affiliation":[]},{"given":"Gilvan M\u00fcller","family":"de Oliveira","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,5,18]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"212","DOI":"10.1080\/13698230.2015.1023635","article-title":"The political value of languages","volume":"18","author":"Baub\u00f6ck","year":"2015","journal-title":"Crit. Rev. Int. Soc. Pol. Phil."},{"key":"B2","doi-asserted-by":"crossref","DOI":"10.4324\/9781003138549","author":"Flint","year":"2021","journal-title":"Introduction to Geopolitics"},{"key":"B3","doi-asserted-by":"crossref","unstructured":"GazzolaM. Il Valore Economico Delle Lingue (The Economic Value of Languages). 2015","DOI":"10.2139\/ssrn.2691086"},{"key":"B4","doi-asserted-by":"publisher","first-page":"76","DOI":"10.3390\/fi12040076","article-title":"Exploring the dominance of the english language on the websites of EU countries","volume":"12","author":"Giannakoulopoulos","year":"2020","journal-title":"Fut. Int."},{"key":"B5","unstructured":"GrefenstetteG. NocheJ. Rhone-Alpes: Xerox Research Centre Europe. Estimation of English and Non-English Language use on the WWW. 2000"},{"key":"B6","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1017\/S0267190500003275","article-title":"The economics of multilingualism: overview and analytical framework","volume":"17","author":"Grin","year":"1997","journal-title":"Annu. Rev. Appl. Linguist."},{"key":"B7","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1146\/annurev.anthro.012809.104951","article-title":"The Commodification of Language","volume":"39","author":"Heller","year":"2010","journal-title":"Ann. Rev. Anthropol."},{"key":"B8","author":"Lavoie","year":"1999","journal-title":"How \u201cWorld Wide\u201d is the Web? Annual Review of OCLC Research."},{"key":"B9","doi-asserted-by":"crossref","unstructured":"MikamiY. ZavarskyP. RozanM. Z. A. SuzukiI. TakahashiM. MakT. The language observatory project (LOP). In: 2005","DOI":"10.1145\/1062745.1062833"},{"key":"B10","unstructured":"Monr\u00e1sF. MedinaM. Cabr\u00e9S. CantoP. MelendezV. RipollE. Estad\u00ed2006"},{"key":"B11","author":"O'Hara","year":"2018","journal-title":"Four Internets: The Geopolitics of Digital Governance."},{"key":"B12","first-page":"21","author":"Oliveira","year":"2010","journal-title":"O lugar das l"},{"key":"B13","author":"O'Neill","year":"2003","journal-title":"Trends in the Evolution of the Public Web: 1998 - 2002"},{"key":"B14","author":"Pimienta","year":"2014","journal-title":"Le fran\u00e7ais dans l'Internet, Rapport 2014 \u201cLa langue fran\u00e7aise dans le monde\u201d."},{"key":"B15","unstructured":"PimientaD. Barcelona: Language Technologies and Language Diversity. Internet and Linguistic Diversity: The Cyber-Geography of Languages With the Largest Number of Speakers, LinguaPax Review 2021. 2021"},{"key":"B16","unstructured":"PimientaD. Resource: Indicators on the Presence of Languages in Internet. In: Proceedings of the 1st Annual Meeting of the ELRA\/ISCA Special Interest Group on Under-Resourced Languages, Marseille. European Language Resources Association. 83\u201391. 2022"},{"key":"B17","doi-asserted-by":"publisher","DOI":"10.13140\/RG.2.2.20767.43683","author":"Pimienta","year":"2023","journal-title":"Is it true that more than half the Web contents are in English? If Web multilingualism is paid due attention then no! ResearchGate Preprint."},{"key":"B18","doi-asserted-by":"crossref","unstructured":"PimientaD. OliveiraG. M. Cyber-Geography of Languages. Part 2: The Demographic Factor and the Growth of Asian Languages and Arabic. Alberta: International Review of Information Ethics. 32","DOI":"10.29173\/irie491"},{"key":"B19","unstructured":"PimientaD. OliveiraG. M. Alberta: International Review of Information Ethics. Cyber-Geography of Languages. Part 1: Method, Results and Focus on English"},{"key":"B20","doi-asserted-by":"publisher","first-page":"e141","DOI":"10.3989\/redc.2016.3.1328","article-title":"Medici\u00f3n de la presencia de la lengua espa\u00f1ola en la Internet: m\u00e9todos y resultados","volume":"39","author":"Pimienta","year":"2016","journal-title":"Revista Espa\u00f1ola de Documentaci\u00f3n Cient\u00edfica"},{"key":"B21","unstructured":"PimientaD. PradoD. Blanco\u00c1. Twelve Years of Measuring Linguistic Diversity on the Internet: Balance and Perspectives. Paris: UNESCO publications for the World Summit on the Information Society. 2009"},{"key":"B22","unstructured":"SimonsG. F. ThomasA. L. WhiteC. K. Gyeongju: International Committee on Computational Linguistics. Assessing Digital Language Support on a Global Scale. In: Proceedings of the 29th International Conference on Computational Linguistics. 2023"}],"container-title":["Frontiers in Research Metrics and Analytics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frma.2023.1149347\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T07:06:43Z","timestamp":1702451203000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frma.2023.1149347\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,18]]},"references-count":22,"alternative-id":["10.3389\/frma.2023.1149347"],"URL":"https:\/\/doi.org\/10.3389\/frma.2023.1149347","relation":{},"ISSN":["2504-0537"],"issn-type":[{"value":"2504-0537","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,18]]},"article-number":"1149347"}}