{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T00:45:01Z","timestamp":1776041101630,"version":"3.50.1"},"reference-count":49,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2019,7,12]],"date-time":"2019-07-12T00:00:00Z","timestamp":1562889600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2020,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>This paper proposes four novel term evaluation metrics to represent documents in the text categorization where class distribution is imbalanced. These metrics are achieved from the revision of the four common term evaluation metrics: <jats:italic>chi-square<\/jats:italic>, <jats:italic>information gain<\/jats:italic>, <jats:italic>odds ratio<\/jats:italic>, and <jats:italic>relevance frequency<\/jats:italic>. While the common metrics require a balanced class distribution, our proposed metrics evaluate the document terms under an imbalanced distribution. They calculate the degree of relatedness of terms with respect to minor and major classes by considering their imbalanced distribution. Using these metrics in the document representation makes a better distinction between the documents of the minor and major classes and improves the performance of machine learning algorithms. The proposed metrics are assessed over three popular benchmarks (two subsets of Reuters-21578 and WebKB) by using four classification algorithms: support vector machines, naive Bayes, decision trees, and centroid-based classifiers. 
Our empirical results indicate that the proposed metrics outperform the common metrics in the imbalanced text categorization.<\/jats:p>","DOI":"10.1017\/s1351324919000317","type":"journal-article","created":{"date-parts":[[2019,7,12]],"date-time":"2019-07-12T10:30:57Z","timestamp":1562927457000},"page":"31-47","source":"Crossref","is-referenced-by-count":9,"title":["Term evaluation metrics in imbalanced text categorization"],"prefix":"10.1017","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4429-5341","authenticated-orcid":false,"given":"Behzad","family":"Naderalvojoud","sequence":"first","affiliation":[]},{"given":"Ebru","family":"Akcapinar Sezer","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2019,7,12]]},"reference":[{"key":"S1351324919000317_ref48","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2012.04.039"},{"key":"S1351324919000317_ref40","author":"Soucy","year":"2005"},{"key":"S1351324919000317_ref37","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560582"},{"key":"S1351324919000317_ref33","first-page":"15","volume-title":"European Conference on Data Mining (ECDM)","author":"Naderalvojoud","year":"2014"},{"key":"S1351324919000317_ref28","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2007.10.042"},{"key":"S1351324919000317_ref27","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84628-754-1_10"},{"key":"S1351324919000317_ref19","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-015-0745-z"},{"key":"S1351324919000317_ref16","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Transactions on Knowledge and Data 
Engineering"},{"key":"S1351324919000317_ref15","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-016-0924-1"},{"key":"S1351324919000317_ref13","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2012.06.013"},{"key":"S1351324919000317_ref11","unstructured":"Domeniconi, G. , Moro, G. , Pasolini, R. and Sartori, C. 2015. A Study on term weighting for text categorization: A novel supervised variant of tf.idf. In Proceedings of 4th International Conference on Data Management Technologies and Applications, pp. 26\u201337."},{"key":"S1351324919000317_ref8","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2010.07.003"},{"key":"S1351324919000317_ref4","unstructured":"Cachopo, A.M.d.J.C. 2007. Improving methods for single-label text categorization. Ph.D. dissertation, Universidade T\u00e9cnica de Lisboa."},{"key":"S1351324919000317_ref2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-017-5670-4"},{"key":"S1351324919000317_ref44","doi-asserted-by":"publisher","DOI":"10.1016\/j.proeng.2014.03.129"},{"key":"S1351324919000317_ref42","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2009.07.011"},{"key":"S1351324919000317_ref41","doi-asserted-by":"publisher","DOI":"10.1007\/11731139_30"},{"key":"S1351324919000317_ref23","author":"Lan","year":"2005"},{"key":"S1351324919000317_ref45","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2015.08.050"},{"key":"S1351324919000317_ref46","author":"Yang","year":"1997"},{"key":"S1351324919000317_ref47","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2011.12.005"},{"key":"S1351324919000317_ref32","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1145\/2911451.2914722","volume-title":"Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval","author":"Moreo","year":"2016"},{"key":"S1351324919000317_ref7","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"S1351324919000317_ref31","first-page":"143","article-title":"The chi-square test of 
independence","volume":"23","author":"McHugh","year":"2012","journal-title":"Biochemia Medica"},{"key":"S1351324919000317_ref24","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.110"},{"key":"S1351324919000317_ref25","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2004.08.006"},{"key":"S1351324919000317_ref18","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-2002-6504"},{"key":"S1351324919000317_ref34","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-24033-6_37"},{"key":"S1351324919000317_ref17","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.07.036"},{"key":"S1351324919000317_ref30","volume-title":"Proceedings of the ICML 2003 Workshop on Learning from Imbalanced Data Sets","author":"Maloof","year":"2003"},{"key":"S1351324919000317_ref12","unstructured":"Dougherty, J. , Kohavi, R. and Sahami, M. 1995. Supervised and unsupervised discretization of continuous features. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 194\u2013202."},{"key":"S1351324919000317_ref5","first-page":"27","article-title":"LIBSVM: A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Transactions on Intelligent Systems and Technology (TIST)"},{"key":"S1351324919000317_ref39","doi-asserted-by":"publisher","DOI":"10.1145\/505282.505283"},{"key":"S1351324919000317_ref35","unstructured":"Nguyen, C. H. and Ho, T.B. 2010. Learning imbalanced data with manifold-based sampling. 
Japan Advanced Institute of Science and Technology https:\/\/www.jaist.ac.jp\/~bao\/WebPapers\/"},{"key":"S1351324919000317_ref29","first-page":"445","volume-title":"AMIA Annual Symposium Proceedings","author":"Lustgarten","year":"2008"},{"key":"S1351324919000317_ref21","doi-asserted-by":"publisher","DOI":"10.1002\/asi.23338"},{"key":"S1351324919000317_ref38","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"S1351324919000317_ref6","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007733"},{"key":"S1351324919000317_ref49","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007741"},{"key":"S1351324919000317_ref43","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.02.019"},{"key":"S1351324919000317_ref20","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(02)00056-0"},{"key":"S1351324919000317_ref14","doi-asserted-by":"publisher","DOI":"10.1145\/1007730.1007736"},{"key":"S1351324919000317_ref36","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2013.02.029"},{"key":"S1351324919000317_ref22","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324917000298"},{"key":"S1351324919000317_ref9","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45219-5_7"},{"key":"S1351324919000317_ref3","first-page":"148","volume-title":"12th International Conference on Semantic Computing (ICSC)","author":"Bloodgood","year":"2018"},{"key":"S1351324919000317_ref1","first-page":"1","volume-title":"Second International Conference on Electrical, Computer and Communication Technologies (ICECCT)","author":"Awasare","year":"2017"},{"key":"S1351324919000317_ref26","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2006.158"},{"key":"S1351324919000317_ref10","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.10.056"}],"container-title":["Natural Language 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324919000317","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,12]],"date-time":"2021-01-12T06:02:45Z","timestamp":1610431365000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324919000317\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,12]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1]]}},"alternative-id":["S1351324919000317"],"URL":"https:\/\/doi.org\/10.1017\/s1351324919000317","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,12]]}}}