{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:19:17Z","timestamp":1750306757898,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,11,1]],"date-time":"2013-11-01T00:00:00Z","timestamp":1383264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2013,11]]},"abstract":"<jats:p>\n            In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain. Semisupervised learning and active learning are two strategies whose aim is maximizing the effectiveness of the resulting classifiers for a given amount of training effort. Both strategies have been actively investigated for TC in recent years. Much less research has been devoted to a third such strategy,\n            <jats:italic>training label cleaning<\/jats:italic>\n            (TLC), which consists in devising ranking functions that sort the original training examples in terms of how likely it is that the human annotator has mislabelled them. This provides a convenient means for the human annotator to revise the training set so as to improve its quality. Working in the context of boosting-based learning methods for multilabel classification we present three different techniques for performing TLC and, on three widely used TC benchmarks, evaluate them by their capability of spotting training documents that, for experimental reasons only, we have purposefully mislabelled. 
We also evaluate the degradation in classification effectiveness that these mislabelled texts bring about, and to what extent training label cleaning can prevent this degradation.\n          <\/jats:p>","DOI":"10.1145\/2516889","type":"journal-article","created":{"date-parts":[[2013,12,4]],"date-time":"2013-12-04T14:04:47Z","timestamp":1386165887000},"page":"1-28","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Improving Text Classification Accuracy by Training Label Cleaning"],"prefix":"10.1145","volume":"31","author":[{"given":"Andrea","family":"Esuli","sequence":"first","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Italy"}]},{"given":"Fabrizio","family":"Sebastiani","sequence":"additional","affiliation":[{"name":"Consiglio Nazionale delle Ricerche, Italy"}]}],"member":"320","published-online":{"date-parts":[[2013,11]]},"reference":[{"volume-title":"Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP\/VLC\u201999)","author":"Abney S.","key":"e_1_2_1_1_1","unstructured":"Abney , S. , Schapire , R. E. , and Singer , Y . 1999. Boosting applied to tagging and PP attachment . In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP\/VLC\u201999) . 38--45. Abney, S., Schapire, R. E., and Singer, Y. 1999. Boosting applied to tagging and PP attachment. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP\/VLC\u201999). 
38--45."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2007.21"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/3013545.3013554"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018054314350"},{"volume-title":"Proceedings of the 13th Conference of the American Association for Artificial Intelligence (AAAI\u201996)","author":"Brodley C. E.","key":"e_1_2_1_5_1","unstructured":"Brodley , C. E. and Friedl , M. A . 1996. Identifying and eliminating mislabeled training instances . In Proceedings of the 13th Conference of the American Association for Artificial Intelligence (AAAI\u201996) . 799--805. Brodley, C. E. and Friedl, M. A. 1996. Identifying and eliminating mislabeled training instances. In Proceedings of the 13th Conference of the American Association for Artificial Intelligence (AAAI\u201996). 799--805."},{"key":"e_1_2_1_6_1","volume-title":"Eds","author":"Chapelle O.","year":"2006","unstructured":"Chapelle , O. , Sch\u00f6lkopf , B. , and Zien , A. , Eds . 2006 . Semi-Supervised Learning. MIT Press , Cambridge, MA. Chapelle, O., Sch\u00f6lkopf, B., and Zien, A., Eds. 2006. Semi-Supervised Learning. MIT Press, Cambridge, MA."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022673506211"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067823"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007607513941"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/974305.974325"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04417-5_4"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.2501\/S147078531020165X"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/11880561_1"},{"key":"e_1_2_1_14_1","volume-title":"Advances in Neural Information Processing Systems","volume":"5","author":"Freund Y.","unstructured":"Freund , Y. , Seung , H. S. , Shamir , E. 
, and Tishby , N . 1992. Information, prediction, and query by committee . In Advances in Neural Information Processing Systems , Vol. 5 , MIT Press, Cambridge, MA, 483--490. Freund, Y., Seung, H. S., Shamir, E., and Tishby, N. 1992. Information, prediction, and query by committee. In Advances in Neural Information Processing Systems, Vol. 5, MIT Press, Cambridge, MA, 483--490."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1016218223"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220480"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/646633.699638"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1992.4.1.1"},{"volume-title":"Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk. 172--179","author":"Grady C.","key":"e_1_2_1_19_1","unstructured":"Grady , C. and Lease , M . 2010. Crowdsourcing document relevance assessment with Mechanical Turk . In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk. 172--179 . Grady, C. and Lease, M. 2010. Crowdsourcing document relevance assessment with Mechanical Turk. In Proceedings of the NAACL HLT Workshop on Creating Speech and Language Data with Amazon\u2019s Mechanical Turk. 172--179."},{"volume-title":"Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR\u201994)","author":"Hersh W.","key":"e_1_2_1_20_1","unstructured":"Hersh , W. , Buckley , C. , Leone , T. , and Hickman , D . 1994. OHSUMED: An interactive retrieval evaluation and new large text collection for research . In Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR\u201994) . 192--201. Hersh, W., Buckley, C., Leone, T., and Hickman, D. 1994. 
OHSUMED: An interactive retrieval evaluation and new large text collection for research. In Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR\u201994). 192--201."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/345508.345545"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (KDD\u201995)","author":"John G. H.","year":"1995","unstructured":"John , G. H. 1995 . Robust decision trees: Removing outliers from databases . In Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (KDD\u201995) . 174--179. John, G. H. 1995. Robust decision trees: Removing outliers from databases. In Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (KDD\u201995). 174--179."},{"key":"e_1_2_1_23_1","unstructured":"Lewis D. D. 2004. Reuters-21578 text categorization test collection Distribution 1.0 README file (v 1.3). http:\/\/www.daviddlewis.com\/resources\/testcollections\/reuters21578\/readme.txt.  Lewis D. D. 2004. Reuters-21578 text categorization test collection Distribution 1.0 README file (v 1.3). http:\/\/www.daviddlewis.com\/resources\/testcollections\/reuters21578\/readme.txt."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/243199.243277"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1005345"},{"volume-title":"Proceedings of the 14th Conference of the American Association for Artificial Intelligence (AAAI\u201997)","author":"Maclin R.","key":"e_1_2_1_26_1","unstructured":"Maclin , R. and Opitz , D. W . 1997. An empirical evaluation of bagging and boosting . In Proceedings of the 14th Conference of the American Association for Artificial Intelligence (AAAI\u201997) . 546--551. Maclin, R. and Opitz, D. W. 1997. An empirical evaluation of bagging and boosting. 
In Proceedings of the 14th Conference of the American Association for Artificial Intelligence (AAAI\u201997). 546--551."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2011.36"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066078.1066080"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072329"},{"key":"e_1_2_1_30_1","unstructured":"Resta G. 2012. On the expected average precision of the random ranker. Tech. rep. IIT TR-04\/2012 Istituto di Informatica e Telematica Consiglio Nazionale delle Ricerche Pisa IT. http:\/\/www.iit.cnr.it\/sites\/default\/files\/TR-04-2012.pdf.  Resta G. 2012. On the expected average precision of the random ranker. Tech. rep. IIT TR-04\/2012 Istituto di Informatica e Telematica Consiglio Nazionale delle Ricerche Pisa IT. http:\/\/www.iit.cnr.it\/sites\/default\/files\/TR-04-2012.pdf."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007614523901"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007649029923"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/2207821"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the IJCAI Workshop on Text Learning Beyond Supervision.","author":"Shinnou H.","year":"2001","unstructured":"Shinnou , H. 2001 . Detection of errors in training data by using a decision list and Adaboost . In Proceedings of the IJCAI Workshop on Text Learning Beyond Supervision. Shinnou, H. 2001. Detection of errors in training data by using a decision list and Adaboost. In Proceedings of the IJCAI Workshop on Text Learning Beyond Supervision."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148253"},{"volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908)","author":"Snow R.","key":"e_1_2_1_36_1","unstructured":"Snow , R. , O\u2019Connor , B. , Jurafsky , D. , and Ng , A. Y . 2008. 
Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks . In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908) . 254--263. Snow, R., O\u2019Connor, B., Jurafsky, D., and Ng, A. Y. 2008. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908). 254--263."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.248"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188496"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009982220290"},{"volume-title":"Proceedings of the 4th Conference on Email and Anti-Spam (CEAS\u201907)","author":"Yih W.-T.","key":"e_1_2_1_40_1","unstructured":"Yih , W.-T. , McCann , R. , and Kolcz , A . 2007. Improving spam filtering by detecting gray mail . In Proceedings of the 4th Conference on Email and Anti-Spam (CEAS\u201907) . Yih, W.-T., McCann, R., and Kolcz, A. 2007. Improving spam filtering by detecting gray mail. In Proceedings of the 4th Conference on Email and Anti-Spam (CEAS\u201907)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/11563983_32"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390442"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/1294000.1294004"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Zhu X. and Goldberg A. B. 2009. Introduction to Semi-Supervised Learning. Morgan and Claypool San Rafael CA.   Zhu X. and Goldberg A. B. 2009. Introduction to Semi-Supervised Learning . 
Morgan and Claypool San Rafael CA.","DOI":"10.1007\/978-3-031-01548-9"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2516889","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2516889","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:28:39Z","timestamp":1750231719000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2516889"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,11]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,11]]}},"alternative-id":["10.1145\/2516889"],"URL":"https:\/\/doi.org\/10.1145\/2516889","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"type":"print","value":"1046-8188"},{"type":"electronic","value":"1558-2868"}],"subject":[],"published":{"date-parts":[[2013,11]]},"assertion":[{"value":"2012-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}