{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T00:25:25Z","timestamp":1776471925513,"version":"3.51.2"},"reference-count":26,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2016,9,29]],"date-time":"2016-09-29T00:00:00Z","timestamp":1475107200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"funder":[{"DOI":"10.13039\/501100003329","name":"Ministerio de Econom\u00eda y Competitividad","doi-asserted-by":"publisher","award":["TIN2014-54288-C4-3-R"],"award-info":[{"award-number":["TIN2014-54288-C4-3-R"]}],"id":[{"id":"10.13039\/501100003329","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2017,6]]},"DOI":"10.1007\/s10115-016-0997-x","type":"journal-article","created":{"date-parts":[[2016,9,29]],"date-time":"2016-09-29T09:17:16Z","timestamp":1475140636000},"page":"965-989","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Language identification of multilingual posts from Twitter: a case study"],"prefix":"10.1007","volume":"51","author":[{"given":"Ferran","family":"Pla","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Llu\u00eds-F.","family":"Hurtado","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2016,9,29]]},"reference":[{"key":"997_CR1","unstructured":"Baldwin T, Lui M (2010) Language identification: the long and the short of the matter. In: Human language technologies: the 2010 annual conference of the North American chapter of the association for computational linguistics, HLT \u201810. Association for Computational Linguistics, Stroudsburg, PA, pp 229\u2013237"},{"key":"997_CR2","unstructured":"Bergsma S, McNamee P, Bagdouri M, Fink C, Wilson T (2012) Language identification for creating language-specific twitter collections. In: Proceedings of the second workshop on language in social media, LSM \u201812. Association for Computational Linguistics, Stroudsburg, PA, pp 65\u201374"},{"issue":"1","key":"997_CR3","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1007\/s10579-012-9195-y","volume":"47","author":"S Carter","year":"2013","unstructured":"Carter S, Weerkamp W, Tsagkias M (2013) Microblog language identification: overcoming the limitations of short, unedited and idiomatic text. Lang Resour Eval 47(1):195\u2013215","journal-title":"Lang Resour Eval"},{"key":"997_CR4","unstructured":"Cavnar WB, Trenkle JM (1994) N-gram-based text categorization. In: Proceedings of SDAIR-94, 3rd annual symposium on document analysis and information retrieval, pp. 161\u2013175"},{"issue":"3","key":"997_CR5","first-page":"273","volume":"20","author":"C Cortes","year":"1995","unstructured":"Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273\u2013297","journal-title":"Mach Learn"},{"key":"997_CR6","unstructured":"Gamallo P, Garc\u00eda M, Sotelo S, Campos JRP (2014) Comparing ranking-based and naive bayes approaches to language detection on tweets. \u2018TweetLID@SEPLN\u2019, pp 12\u201316"},{"key":"997_CR7","doi-asserted-by":"publisher","unstructured":"Goldszmidt M, Najork M, Paparizos S (2013) Boot-strapping language identifiers for short colloquial postings. In: Proceeding of the European conference on machine learning and principles and practice of knowledge discovery in databases (ECMLPKDD 2013). Springer","DOI":"10.1007\/978-3-642-40991-2_7"},{"key":"997_CR8","unstructured":"Grefenstette G (1995) Comparing two language identification schemes. In: 3rd international conference on statistical analysis of textural data"},{"key":"997_CR9","unstructured":"Hurtado LF, Pla F, Gim\u00e9nez M, Arnal ES (2014) Elirf-upv en tweetlid: Identificaci\u00f3n del idioma en twitter, In: Proceedings of the Tweet language identification workshop co-located with 30th conference of the Spanish society for natural language processing, TweetLID@SEPLN 2014, Girona, 16 Sept 2014, pp 35\u201338"},{"key":"997_CR10","doi-asserted-by":"publisher","unstructured":"Jauhiainen T, Lind\u00e9n K, Jauhiainen H (2015) Language set identification in noisy synthetic multilingual documents. In: Gelbukh A (ed) Computational linguistics and intelligent text processing, vol 9041 of lecture notes in computer science. Springer International Publishing, pp 633\u2013643","DOI":"10.1007\/978-3-319-18111-0_48"},{"key":"997_CR11","doi-asserted-by":"publisher","unstructured":"Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. In: N\u00e9dellec C, Rouveirol C (eds) Proceedings of ECML-98, 10th European conference on machine learning, no. 1398. Springer, Heidelberg, pp 137\u2013142","DOI":"10.1007\/BFb0026683"},{"key":"997_CR12","volume-title":"Sentiment analysis and opinion mining. A comprehensive introduction and survey","author":"B Liu","year":"2012","unstructured":"Liu B (2012) Sentiment analysis and opinion mining. A comprehensive introduction and survey. Morgan & Claypool Publishers, San Rafael"},{"key":"997_CR13","unstructured":"Ljube\u0161i\u0107 N, Mikeli\u0107 N, Boras D (2007) Language identification: How to distinguish similar languages, In: Lu\u017ear-Stifter V, Hljuz\u00a0Dobri\u0107 V (eds), Proceedings of the 29th international conference on information technology interfaces. SRCE University Computing Centre, Zagreb, pp 541\u2013546"},{"key":"997_CR14","doi-asserted-by":"publisher","unstructured":"Lui M, Baldwin T (2014) Accurate language identification of twitter messages. In: Proceedings of the EACL 2014 workshop on language analysis in social media (LASM 2014), pp 17\u201325","DOI":"10.3115\/v1\/W14-1303"},{"key":"997_CR15","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1162\/tacl_a_00163","volume":"2","author":"M Lui","year":"2014","unstructured":"Lui M, Lau JH, Baldwin T (2014) Automatic detection and language identification of multilingual documents. Trans Assoc Comput Linguist 2:27\u201340","journal-title":"Trans Assoc Comput Linguist"},{"key":"997_CR16","doi-asserted-by":"crossref","unstructured":"Nguyen D, Dogruoz AS (2014) Word level language identification in online multilingual communication. In: Proceedings of the 2013 conference on empirical methods in natural language processing","DOI":"10.18653\/v1\/D13-1084"},{"key":"997_CR17","doi-asserted-by":"crossref","unstructured":"O\u2019Connor B, Krieger M, Ahn D (2010) Tweetmotif: exploratory search and topic summarization for twitter. In: Cohen WW, Gosling S (eds) Proceedings of the fourth international conference on weblogs and social media, ICWSM 2010, Washington, DC. The AAAI Press, 23\u201326 May 2010","DOI":"10.1609\/icwsm.v4i1.14008"},{"key":"997_CR18","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825\u20132830","journal-title":"J Mach Learn Res"},{"key":"997_CR19","unstructured":"Pla F, Hurtado L-F (2014) Political tendency identification in twitter using sentiment analysis techniques. In: Proceedings of COLING 2014, the 25th international conference on computational linguistics: technical papers. Dublin City University and Association for Computational Linguistics, Dublin, pp 183\u2013192"},{"issue":"3","key":"997_CR20","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1080\/07421222.1999.11518257","volume":"16","author":"JM Prager","year":"1999","unstructured":"Prager JM (1999) Linguini: language identification for multilingual documents. J Manage Inf Syst 16(3):71\u2013101","journal-title":"J Manage Inf Syst"},{"issue":"2","key":"997_CR21","doi-asserted-by":"crossref","first-page":"876","DOI":"10.1016\/j.patcog.2011.08.007","volume":"45","author":"J Ram\u00f3n Quevedo","year":"2012","unstructured":"Ram\u00f3n Quevedo J, Luaces O, Bahamonde A (2012) Multilabel classifiers with a probabilistic thresholding strategy. Pattern Recogn 45(2):876\u2013883","journal-title":"Pattern Recogn"},{"key":"997_CR22","doi-asserted-by":"publisher","unstructured":"Rao D, Yarowsky D, Shreevats A, Gupta M (2010) Classifying latent user attributes in twitter. In: Proceedings of the 2nd international workshop on search and mining user-generated contents, SMUC \u201810. ACM, New York, NY, pp 37\u201344","DOI":"10.1145\/1871985.1871993"},{"issue":"1","key":"997_CR23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/505282.505283","volume":"34","author":"F Sebastiani","year":"2002","unstructured":"Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1\u201347","journal-title":"ACM Comput Surv"},{"key":"997_CR24","doi-asserted-by":"publisher","first-page":"1","DOI":"10.4018\/jdwm.2007070101","volume":"2007","author":"G Tsoumakas","year":"2007","unstructured":"Tsoumakas G, Katakis I (2007) Multi-label classification: an overview. Int J Data Warehous Min 2007:1\u201313","journal-title":"Int J Data Warehous Min"},{"key":"997_CR25","unstructured":"Zubiaga A, Vicente IS, Gamallo P, Campos JRP, Loinaz IA, Aranberri N, Ezeiza A Fresno-Fern\u00e1ndez V (2014) Overview of tweetlid: Tweet language identification at SEPLN 2014. In: Proceedings of the Tweet language identification workshop co-located with 30th conference of the Spanish society for natural language processing. TweetLID@SEPLN 2014, Girona, Spain, 16 Sept 2014, pp 1\u201311"},{"key":"997_CR26","doi-asserted-by":"publisher","unstructured":"Zubiaga A, San\u00a0Vicente I, Gamallo P, Pichel JR, Alegria I, Aranberri N, Ezeiza A, Fresno V (2015) TweetLID: a benchmark for tweet language identification. J Lang Res Eval. Springer, pp\u00a01\u201338. doi: 10.1007\/s10579-015-9317-4","DOI":"10.1007\/s10579-015-9317-4"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10115-016-0997-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-016-0997-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-016-0997-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,10]],"date-time":"2025-06-10T21:52:41Z","timestamp":1749592361000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10115-016-0997-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,9,29]]},"references-count":26,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,6]]}},"alternative-id":["997"],"URL":"https:\/\/doi.org\/10.1007\/s10115-016-0997-x","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"},{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,9,29]]}}}