{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,12]],"date-time":"2025-07-12T22:57:47Z","timestamp":1752361067333,"version":"3.37.3"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Digital Scholarship Humanities"],"published-print":{"date-parts":[[2016,6]]},"DOI":"10.1093\/llc\/fqu066","type":"journal-article","created":{"date-parts":[[2014,12,15]],"date-time":"2014-12-15T01:12:55Z","timestamp":1418605975000},"page":"227-243","source":"Crossref","is-referenced-by-count":7,"title":["Twitter corpus creation: The case of a Malay Chat-style-text Corpus (MCC)"],"prefix":"10.1093","volume":"31","author":[{"given":"Mohammad","family":"Arshi Saloot","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Norisma","family":"Idris","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"AiTi","family":"Aw","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dirk","family":"Thorleuchter","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2014,12,14]]},"reference":[{"key":"2016051900130516000_31.2.227.1","unstructured":"Ahmad M Mathkour H . Proceedings of International MultiConference of Engineers & Computer Scientists. Hong Kong: International Association of Engineers; 2009. A Pattern Matching Approach for Redundancy Detection in Bi-lingual and Mono-lingual Corpora; p. 526-31."},{"key":"2016051900130516000_31.2.227.2","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/7.1.1"},{"key":"2016051900130516000_31.2.227.3","doi-asserted-by":"crossref","unstructured":"Berber-Sardinha T . Proceedings of the Workshop on Comparing Corpora - Volume 9. 2000. Comparing corpora with wordsmith tools: how large must the reference corpus be? WCC\u201900. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 7\u201313. http:\/\/dx.doi.org\/10.3115\/1117729.1117731 .","DOI":"10.3115\/1117729.1117731"},{"key":"2016051900130516000_31.2.227.4","unstructured":"Bouma G . Normalized (Pointwise) Mutual Information in Collocation Extraction. Potsdam U , editor. 2009. http:\/\/www.google.de\/url?sa=t&rct=j&q=normalized (pointwise) mutual informationin collocation extraction&source=web&cd=2&cad=rja&ved=0CE4QFjAB&url=https:\/\/svn.spraakdata.gu.se\/repos\/gerlof\/pub\/www\/Docs\/npmi-pfd.pdf&ei=prr5UNWSBs_TsgaPzoD4Bg&usg=AFQjCNFAHJHKG5tLXCNmGJw4yRqX2WuPuA&bvm=bv.41248874,d.Yms."},{"issue":"2","key":"2016051900130516000_31.2.227.5","first-page":"299","article-title":"Creating a live, public short message service corpus: the NUS SMS corpus","volume":"47","author":"Chen","year":"2013","journal-title":"Language Resources and Evaluation"},{"key":"2016051900130516000_31.2.227.6","doi-asserted-by":"crossref","unstructured":"De Choudhury M Lin Y-R Sundaram H Candan K S Xie L Kelliher A . Proceedings of the 4th International AAAI Conference on Weblogs and Social Media. 2010. How does the data sampling strategy impact the discovery of information diffusion in social media? http:\/\/www.public.asu.edu\/\u223cmdechoud\/pubs\/icwsm_10.pdf .","DOI":"10.1609\/icwsm.v4i1.14024"},{"key":"2016051900130516000_31.2.227.7","doi-asserted-by":"publisher","DOI":"10.1075\/ijcl.14.2.02dav"},{"key":"2016051900130516000_31.2.227.8","doi-asserted-by":"publisher","DOI":"10.1007\/s10588-005-5377-0"},{"key":"2016051900130516000_31.2.227.9","unstructured":"Han B Baldwin T . Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. 2011. Lexical Normalisation of Short Text Messages: Makn Sens a Twitter. HLT\u201911. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 368\u201378. http:\/\/dl.acm.org\/citation.cfm?id=2002472.2002520 ."},{"key":"2016051900130516000_31.2.227.10","unstructured":"Hasund K . Explorations in Corpus Linguistics. Renouf A , editor. Amsterdam & Atlanta: Rodopi; 1998."},{"key":"2016051900130516000_31.2.227.11","doi-asserted-by":"crossref","unstructured":"Java A Song X Finin T Tseng B . Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 Workshop on Web Mining and Social Network Analysis. 2007. Why we twitter: understanding microblogging usage and communities. WebKDD\/SNA-KDD\u201907. New York, NY, USA: ACM, pp. 56\u201365. http:\/\/doi.acm.org\/10.1145\/1348549.1348556 .","DOI":"10.1145\/1348549.1348556"},{"key":"2016051900130516000_31.2.227.12","unstructured":"Jones R Ghani R . Proceedings of the Student Research Workshop at the 38th Annual Meeting of the ACL. Hong Kong: Association for Computational Linguistics. 2000. Automatically building a corpus for a minority language from the web; p. 29-36."},{"issue":"1","key":"2016051900130516000_31.2.227.13","first-page":"47","article-title":"Malay language as a foreign language and the singapore\u2019 s education system","volume":"8","author":"Kassim","year":"2008","journal-title":"Online Journal of Language Studies"},{"key":"2016051900130516000_31.2.227.14","unstructured":"Kaufmann M Kalita J . Proceedings of the International Conference on Natural Language Processing. Kharagpur, India: Indian Institute of Technology; 2010. Syntactic normalization of Twitter messages; p. 1-7. http:\/\/cs.uccs.edu\/\u223cjkalita\/work\/reu\/REUFinalPapers2010\/Kaufmann.pdf ."},{"key":"2016051900130516000_31.2.227.15","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711569"},{"key":"2016051900130516000_31.2.227.16","doi-asserted-by":"crossref","unstructured":"Kwee A Tsai F Tang W . Sentence-level novelty detection in english and malay. In: Theeramunkong T , editors. Advances in Knowledge Discovery and Data Mining SE - 7. 2009. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 40\u201351. http:\/\/dx.doi.org\/10.1007\/978-3-642-01307-2_7 .","DOI":"10.1007\/978-3-642-01307-2_7"},{"key":"2016051900130516000_31.2.227.17","doi-asserted-by":"crossref","unstructured":"Liu W Wang T . Index-based online text classification for sms spam filtering. Journal of Computers 2010;5(6). http:\/\/ojs.academypublisher.com\/index.php\/jcp\/article\/view\/0506844851 .","DOI":"10.4304\/jcp.5.6.844-851"},{"key":"2016051900130516000_31.2.227.18","unstructured":"Lui M Baldwin T . Proceedings of the ACL 2012 System Demonstrations. 2012. Langid.Py: an off-the-shelf language identification tool. ACL\u201912. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 25\u201330. http:\/\/dl.acm.org\/citation.cfm?id=2390470.2390475 ."},{"issue":"1","key":"2016051900130516000_31.2.227.19","first-page":"21","article-title":"A text mining technique using association rules extraction","volume":"4","author":"Mahgoub","year":"2009","journal-title":"International Journal of Information and Mathematical Sciences"},{"key":"2016051900130516000_31.2.227.20","unstructured":"Manning C D Schuetze H . Foundations of Statistical Natural Language Processing. 1st edn. Cambridge, MA: The MIT Press; 1999. http:\/\/amazon.com\/o\/ASIN\/0262133601\/ ."},{"key":"2016051900130516000_31.2.227.21","doi-asserted-by":"crossref","unstructured":"McCreadie R Soboroff I Lin J Macdonald C Ounis I McCullough D . Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012. On building a reusable twitter corpus. SIGIR\u201912. New York, NY, USA: ACM, pp. 1113\u201314. http:\/\/doi.acm.org\/10.1145\/2348283.2348495 .","DOI":"10.1145\/2348283.2348495"},{"key":"2016051900130516000_31.2.227.22","doi-asserted-by":"crossref","unstructured":"McEnery T Hardie A . Corpus Linguistics: Method, Theory and Practice (Cambridge Textbooks in Linguistics). Cambridge, UK: Cambridge University Press; 2011. http:\/\/amazon.com\/o\/ASIN\/0521547369\/ .","DOI":"10.1017\/CBO9780511981395"},{"key":"2016051900130516000_31.2.227.23","unstructured":"McEnery T Wilson A . Corpus Linguistics (Edinburgh Textbooks in Empirical Linguistics). 2nd edn. Edinburgh, Scotland: Edinburgh University Press; 2001. http:\/\/amazon.com\/o\/ASIN\/0748611657\/ ."},{"issue":"1","key":"2016051900130516000_31.2.227.24","first-page":"241","article-title":"The world wide web as linguistic corpus","volume":"46","author":"Meyer","year":"2003","journal-title":"Language and Computers"},{"key":"2016051900130516000_31.2.227.25","doi-asserted-by":"crossref","unstructured":"Mihalcea R Moldovan D I . Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics. 1999. A method for word sense disambiguation of unrestricted text. ACL\u201999. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 152\u20138. http:\/\/dx.doi.org\/10.3115\/1034678.1034709 .","DOI":"10.3115\/1034678.1034709"},{"key":"2016051900130516000_31.2.227.26","doi-asserted-by":"crossref","unstructured":"Mislove A J\u00f8rgensen SL Ahn Y-Y Onnela J-P Rosenquist J N . Proceedings of the Fifth International Conference of Weblogs and Social Media. Barcelona, Catalonia, Spain: AAAI Press; 2011. Understanding the demographics of twitter users; p. 554-7.","DOI":"10.1609\/icwsm.v5i1.14168"},{"key":"2016051900130516000_31.2.227.27","unstructured":"Mood A M Graybill F A Boes D C . Introduction to the Theory of Statistics. 3rd edn. McGraw Hill; 1974. http:\/\/amazon.com\/o\/ASIN\/0070854653\/ ."},{"key":"2016051900130516000_31.2.227.28","doi-asserted-by":"publisher","DOI":"10.1007\/s00779-009-0259-y"},{"key":"2016051900130516000_31.2.227.29","unstructured":"Pak A Paroubek P . Proceedings of the International Conference on Language Resources and Evaluation. Valletta, Malta: LREC; 2010. Twitter as a corpus for sentiment analysis and opinion mining; p. 1320-6."},{"key":"2016051900130516000_31.2.227.30","doi-asserted-by":"crossref","unstructured":"Pennell D Liu Y . IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011. 2011. Toward text message normalization: Modeling abbreviation generation; p. 5364-7.","DOI":"10.1109\/ICASSP.2011.5947570"},{"key":"2016051900130516000_31.2.227.31","unstructured":"Petrovi\u0107 S Osborne M Lavrenko V . Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media. 2010. The edinburgh twitter corpus. WSA\u201910. Stroudsburg, PA, USA: Association for Computational Linguistics, pp. 25\u20136. http:\/\/dl.acm.org\/citation.cfm?id=1860667.1860680 ."},{"key":"2016051900130516000_31.2.227.32","unstructured":"Rayson P Wilson T M . A Rainbow of Corpora. Munich: Lincom Publishers; 2003."},{"issue":"1","key":"2016051900130516000_31.2.227.33","first-page":"27","article-title":"The time dimension in modern english corpus linguistics","volume-title":"Language and Computers","volume":"42","author":"Renouf","year":"2002"},{"key":"2016051900130516000_31.2.227.34","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322711578"},{"key":"2016051900130516000_31.2.227.35","doi-asserted-by":"crossref","unstructured":"Rivest R . The {MD5} Message-Digest Algorithm. 1992. ftp:\/\/ftp.internic.net\/rfc\/rfc1321.txt .","DOI":"10.17487\/rfc1321"},{"key":"2016051900130516000_31.2.227.36","unstructured":"Rizzo C R . Getting on With Corpus Compilation: From Theory to Practice. 2010. ESP World, 1(27): 1\u201323."},{"key":"2016051900130516000_31.2.227.37","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2014.04.009"},{"key":"2016051900130516000_31.2.227.38","unstructured":"Sinclair J . Corpus Concordance and Collocation (Describing English Language). Oxford Univ Pr (Sd); 1991. http:\/\/amazon.com\/o\/ASIN\/0194371441\/ ."},{"issue":"1","key":"2016051900130516000_31.2.227.39","first-page":"39","article-title":"Intuition and annotation; the discussion continues","volume":"49","author":"Sinclair","year":"2004","journal-title":"Language and Computers"},{"key":"2016051900130516000_31.2.227.40","doi-asserted-by":"publisher","DOI":"10.2307\/2345174"},{"key":"2016051900130516000_31.2.227.41","unstructured":"Soon S K . The English Teacher. 1987. Functions of code-switching in malaysia and singapore. XVI. http:\/\/www.melta.org.my\/ET\/1987\/main3.html ."},{"key":"2016051900130516000_31.2.227.42","unstructured":"Spousta M . {WDS}\u201906 Proceedings of Contributed Papers. Prague, Czech Republic: Matfyzpress; 2006. Web as a Corpus; p. 179-84."},{"key":"2016051900130516000_31.2.227.43","unstructured":"Tongco M D C . Purposive Sampling as a Tool for Informant Selection. 2008. http:\/\/lib-ojs3.lib.sfu.ca:8114\/index.php\/era\/article\/view\/126 ."},{"key":"2016051900130516000_31.2.227.44","doi-asserted-by":"publisher","DOI":"10.2307\/2286513"},{"key":"2016051900130516000_31.2.227.45","doi-asserted-by":"crossref","unstructured":"Xiang G Fan B Wang L Hong J L Rose C P . Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012. Detecting offensive tweets via topical feature discovery over a large scale twitter corpus. CIKM\u201912. New York, NY, USA: ACM, pp. 1980\u20134. http:\/\/doi.acm.org\/10.1145\/2396761.2398556 .","DOI":"10.1145\/2396761.2398556"},{"key":"2016051900130516000_31.2.227.46","unstructured":"Xue Z Yin D Davison B D . Analyzing Microtext: Papers from the 2011 AAAI Workshop. San Francisco, CA: USA: AAAI; 2011. Normalizing microtext; p. 74-9. http:\/\/www.aaai.org\/ocs\/index.php\/WS\/AAAIW11\/paper\/view\/3987 ."},{"key":"2016051900130516000_31.2.227.47","doi-asserted-by":"crossref","unstructured":"Zhao D Rosson M B . Proceedings of the ACM 2009 International Conference on Supporting Group Work. 2009. How and why people twitter: the role that micro-blogging plays in informal communication at work. GROUP\u201909. New York, NY, USA: ACM, pp. 243\u201352. http:\/\/doi.acm.org\/10.1145\/1531674.1531710 .","DOI":"10.1145\/1531674.1531710"}],"container-title":["Digital Scholarship in the Humanities"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/dsh\/article-pdf\/31\/2\/227\/7452482\/fqu066.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,30]],"date-time":"2023-07-30T17:36:12Z","timestamp":1690738572000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/dsh\/article-lookup\/doi\/10.1093\/llc\/fqu066"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,12,14]]},"references-count":47,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2016,5,18]]},"published-print":{"date-parts":[[2016,6]]}},"alternative-id":["10.1093\/llc\/fqu066"],"URL":"https:\/\/doi.org\/10.1093\/llc\/fqu066","relation":{},"ISSN":["2055-7671","2055-768X"],"issn-type":[{"type":"print","value":"2055-7671"},{"type":"electronic","value":"2055-768X"}],"subject":[],"published":{"date-parts":[[2014,12,14]]}}}