{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T01:54:09Z","timestamp":1770170049692,"version":"3.49.0"},"reference-count":35,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2020,6,16]],"date-time":"2020-06-16T00:00:00Z","timestamp":1592265600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent & Fuzzy Systems"],"published-print":{"date-parts":[[2020,8,31]]},"abstract":"<jats:p>The paper presents a new corpus for fake news detection in the Urdu language along with the baseline classification and its evaluation. With the escalating use of the Internet worldwide and substantially increasing impact produced by the availability of ambiguous information, the challenge to quickly identify fake news in digital media in various languages becomes more acute. We provide a manually assembled and verified dataset containing 900 news articles, 500 annotated as real and 400, as fake, allowing the investigation of automated fake news detection approaches in Urdu. The news articles in the truthful subset come from legitimate news sources, and their validity has been manually verified. In the fake subset, the known difficulty of finding fake news was solved by hiring professional journalists native in Urdu who were instructed to intentionally write deceptive news articles. The dataset contains 5 different topics: (i) Business, (ii) Health, (iii) Showbiz, (iv) Sports, and (v) Technology. To establish our Urdu dataset as a benchmark, we performed baseline classification. 
We crafted a variety of text representation feature sets including word <jats:italic>n<\/jats:italic>-grams, character <jats:italic>n<\/jats:italic>-grams, functional word <jats:italic>n<\/jats:italic>-grams, and their combinations. After applying a variety of feature weighting schemes, we ran a series of classifiers on the train-test split. The results show sizable performance gains by AdaBoost classifier with 0.87 F1<jats:sub>Fake<\/jats:sub> and 0.90 F1<jats:sub>Real<\/jats:sub>. We provide the results evaluated against different metrics for a convenient comparison of future research. The dataset is publicly available for research purposes.<\/jats:p>","DOI":"10.3233\/jifs-179905","type":"journal-article","created":{"date-parts":[[2020,6,19]],"date-time":"2020-06-19T12:07:00Z","timestamp":1592568420000},"page":"2457-2469","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":43,"title":["\u201cBend the truth\u201d: Benchmark dataset for fake news detection in Urdu language and its evaluation"],"prefix":"10.1177","volume":"39","author":[{"given":"Maaz","family":"Amjad","sequence":"first","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n (CIC), Instituto Polit\u00e9cnico Nacional, Mexico"}]},{"given":"Grigori","family":"Sidorov","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n (CIC), Instituto Polit\u00e9cnico Nacional, Mexico"}]},{"given":"Alisa","family":"Zhila","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n (CIC), Instituto Polit\u00e9cnico Nacional, 
Mexico"}]},{"given":"Helena","family":"G\u00f3mez-Adorno","sequence":"additional","affiliation":[{"name":"Instituto de Investigaciones en Matem\u00e1ticas Aplicadas y en Sistemas (IIMAS), Universidad Nacional Aut\u00f3noma de M\u00e9xico, Mexico"}]},{"given":"Ilia","family":"Voronkov","sequence":"additional","affiliation":[{"name":"Moscow Institute of Physics and Technology, Russia"}]},{"given":"Alexander","family":"Gelbukh","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n (CIC), Instituto Polit\u00e9cnico Nacional, Mexico"}]}],"member":"179","published-online":{"date-parts":[[2020,6,16]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"CieriC. MaxwellM. StrasselS.M. TraceyJ. ChoukriK. DeclerckT. GoggiS. and GrobelnikM. Selection Criteria for Low Resource Language Programs in: Proceedings of the 10th. International Conference on Language Resources and Evaluation LREC\u20192016) European Language Resources Association (ELRA) 2016."},{"key":"e_1_3_2_3_2","doi-asserted-by":"crossref","unstructured":"PotthastM. KieselJ. ReinartzK. BevendorffJ. and SteinB. A Stylometric Inquiry into Hyperpartisan and Fake News in: Proceedings of the 56th. Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) Association forComputationalLinguistics (2018) pp. 231\u2013240.","DOI":"10.18653\/v1\/P18-1022"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3137597.3137600"},{"key":"e_1_3_2_5_2","unstructured":"RubinV.L. ChenY. and ConroyN.J. Deception Detection for News: Three Types of Fakes in: Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community American Society for Information Science (2015) pp. 83."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1002\/pra2.2015.145052010082"},{"key":"e_1_3_2_7_2","unstructured":"NguyenD.M. DoT.H. CalderbankR. and DeligiannisN. 
Fake News Detection using Deep Markov Random Fields in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Volume 1 (Long and Short Papers) (2019) pp. 1391\u20131400."},{"key":"e_1_3_2_8_2","unstructured":"P\u00e9rez-RosasV. KleinbergB. LefevreA. and MihalceaR. Automatic Detection of Fake News in: Proceedings of the 27th International Conference on Computational Linguistics Association for Computational Linguistics Santa Fe New Mexico USA (2018) pp. 3391\u20133401. https:\/\/www.aclweb.org\/anthology\/C18-1287."},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.10.171"},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"ChenY. ConroyN.J. and RubinV.L. Misleading online content: Recognizing clickbait as false news in: Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection ACM (2015) pp. 15\u201319.","DOI":"10.1145\/2823465.2823467"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3137597.3137602"},{"key":"e_1_3_2_12_2","doi-asserted-by":"crossref","unstructured":"IvanovV. and TutubalinaE. Clause-Based Approach to Extracting Problem Phrases from User Reviews of Products (2014) pp. 229\u2013236.","DOI":"10.1007\/978-3-319-12580-0_24"},{"key":"e_1_3_2_13_2","doi-asserted-by":"crossref","unstructured":"AfrozS. BrennanM. and GreenstadtR. Detecting hoaxes frauds and deception in writing style online in: 2012 IEEE Symposium on Security and Privacy IEEE 2012 pp. 461\u2013475.","DOI":"10.1109\/SP.2012.34"},{"key":"e_1_3_2_14_2","unstructured":"Posadas-Dur\u00e1nJ.P. G\u00f3mez-AdornoH. SidorovG. and Jaime Moreno EscobarJ. Detection of Fake News in a New Corpus for the Spanish Language Journal of Intelligent & Fuzzy Systems (2018)."},{"key":"e_1_3_2_15_2","doi-asserted-by":"crossref","unstructured":"BalyR. KaradzhovG. SalehA. GlassJ. and NakovP. 
Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media arXiv preprint arXiv:1904.00542 (2019).","DOI":"10.18653\/v1\/N19-1216"},{"key":"e_1_3_2_16_2","doi-asserted-by":"crossref","unstructured":"FerreiraW. and VlachosA. Emergent: a Novel Data-Set for Stance Classification in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies NAACL\u20192016 (2016) pp. 1163\u20131168.","DOI":"10.18653\/v1\/N16-1138"},{"key":"e_1_3_2_17_2","unstructured":"KrejzlP. Hourov\u00e1B. and SteinbergerJ. Stance Detection in Online Discussions arXiv preprint arXiv:1701.00504 (2017)."},{"key":"e_1_3_2_18_2","unstructured":"SeanB. DougS. and PanY. Talos targets disinformation with fake news challenge victory https:\/\/blog.talosintelligence.com\/2017\/06\/ (2017)."},{"key":"e_1_3_2_19_2","unstructured":"MikolovT. ChenK. CorradoG. and DeanJ. Efficient estimation of word representations in vector space arXiv preprint arXiv:1301.3781 (2013)."},{"key":"e_1_3_2_20_2","doi-asserted-by":"crossref","unstructured":"WangW.Y. \u201d liar liar pants on fire\u201d: A new benchmark dataset for fake news detection arXiv preprint arXiv:1705.00648 (2017).","DOI":"10.18653\/v1\/P17-2067"},{"key":"e_1_3_2_21_2","unstructured":"MitraT. and GilbertE. Credbank: A large-scale social media corpus with associated credibility annotations in: Ninth International AAAI Conference on Web and Social Media 2015."},{"key":"e_1_3_2_22_2","unstructured":"ShuK. MahudeswaranD. WangS. LeeD. and LiuH. Fake-NewsNet: A Data Repository with News Content Social Context and Dynamic Information for Studying Fake News on Social Media arXiv preprint arXiv:1809.01286 (2018)."},{"key":"e_1_3_2_23_2","doi-asserted-by":"crossref","unstructured":"PronozaE.V. YagunovaE. and PronozaA. 
A New Corpus of the Russian Social Network News Feed Paraphrases: Corpus Construction and Linguistic Feature Analysis in: Advances in Computational Intelligence - 16th Mexican International Conference on Artificial Intelligence MICAI\u20192017 (2017) pp. 133\u2013145.","DOI":"10.1007\/978-3-030-02840-4_11"},{"key":"e_1_3_2_24_2","unstructured":"BakerP. HardieA. McEneryT. CunninghamH. and GaizauskasR. EMILLE A 67 Million Word Corpus of Indic Languages: Data Collection Mark-up and Harmonisation in: Proceedings of the 3rd. Language Resources and Evaluation Conference (2002) pp. 819\u2013825."},{"key":"e_1_3_2_25_2","doi-asserted-by":"crossref","unstructured":"SaeedA. NawabR.M.A. and StevensonM. A Word Sense Disambiguation Corpus for Urdu (2018).","DOI":"10.1007\/s10579-018-9438-7"},{"key":"e_1_3_2_26_2","doi-asserted-by":"crossref","unstructured":"MuazA. AliA. and HussainS. Analysis and Development of Urdu POS tagged corpus in: Proceedings of the 7th. Workshop on Asian Language Resources IJCNLP (2009) pp. 24\u201329.","DOI":"10.3115\/1690299.1690303"},{"key":"e_1_3_2_27_2","doi-asserted-by":"crossref","unstructured":"RazaA.A. HussainS. SarfrazH. UllahI. and SarfrazZ. Design and Development of Phonetically Pich Urdu Speech Corpus in: 2009 Oriental COCOSDA International Conference on Speech Database and Assessments IEEE (2009) pp. 38\u201343.","DOI":"10.1109\/ICSDA.2009.5278380"},{"key":"e_1_3_2_28_2","doi-asserted-by":"crossref","unstructured":"WangW.Y. \u201dLiar Liar Pants on Fire\u201d: A New Benchmark Dataset for Fake News Detection arXiv preprint arXiv:1705.00648 (2017).","DOI":"10.18653\/v1\/P17-2067"},{"issue":"1","key":"e_1_3_2_29_2","first-page":"47","article-title":"Stylometry-based approach for detecting writing style changes in literary texts","volume":"22","author":"G\u00f3mez-Adorno H.","year":"2018","unstructured":"G\u00f3mez-AdornoH., Posadas-DuranJ.-P., R\u00edos-ToledoG., SidorovG. 
and SierraG., Stylometry-based approach for detecting writing style changes in literary texts, Computac\u00edon y Sistemas22(1) (2018), 47\u201353.","journal-title":"Computac\u00edon y Sistemas"},{"key":"e_1_3_2_30_2","doi-asserted-by":"crossref","unstructured":"PangB. LeeL. and VaithyanathanS. Thumbs up?: sentiment classification using machine learning techniques in: Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 Association for Computational Linguistics (2002) pp. 79\u201386.","DOI":"10.3115\/1118693.1118704"},{"key":"e_1_3_2_31_2","unstructured":"HornR.A. and JohnsonC.R. Norms for vectors and matrices Ch. 5 in Matrix analysis (1990) 313\u2013386."},{"key":"e_1_3_2_32_2","unstructured":"LandauerT.K. LSA as a theory of meaning in: Handbook of latent semantic analysis Psychology Press (2007) pp. 15\u201346."},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","unstructured":"SidorovG. Miranda-Jim\u00e9nezS. Viveros-Jim\u00e9nezF. GelbukhA. Castro-S\u00e1nchezN. Vel\u00e1squezF. D\u00edaz-RangelI. Su\u00e1rez-GuerraS. TrevinoA. and GordonJ. Empirical Study of Machine Learning based Approach for Opinion Mining in Tweets in: Mexican international conference on Artificial intelligence MICAI\u20192012 Springer (2012) pp. 1\u201314.","DOI":"10.1007\/978-3-642-37807-2_1"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21001"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000011"},{"key":"e_1_3_2_36_2","unstructured":"BuitinckL. LouppeG. BlondelM. PedregosaF. MuellerA. GriselO. NiculaeV. PrettenhoferP. GramfortA. GroblerJ. LaytonR. VanderPlasJ. JolyA. HoltB. and VaroquauxG. API Design for Machine Learning Software: Experiences from the Scikit-learn Project in: ECML PKDDWorkshop: Languages for Data Mining and Machine Learning (2013) pp. 
108\u2013122."}],"container-title":["Journal of Intelligent & Fuzzy Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-179905","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JIFS-179905","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-179905","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T12:33:59Z","timestamp":1770122039000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JIFS-179905"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,16]]},"references-count":35,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,8,31]]}},"alternative-id":["10.3233\/JIFS-179905"],"URL":"https:\/\/doi.org\/10.3233\/jifs-179905","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,16]]}}}