{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T02:27:06Z","timestamp":1771554426483,"version":"3.50.1"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2010,4,1]],"date-time":"2010-04-01T00:00:00Z","timestamp":1270080000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["60873091"],"award-info":[{"award-number":["60873091"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Speech Lang. Process."],"published-print":{"date-parts":[[2010,4]]},"abstract":"<jats:p>The labor-intensive task of labeling data is a serious bottleneck for many supervised learning approaches for natural language processing applications. Active learning aims to reduce the human labeling cost for supervised learning methods. Determining when to stop the active learning process is a very important practical issue in real-world applications. This article addresses the stopping criterion issue of active learning, and presents four simple stopping criteria based on confidence estimation over the unlabeled data pool, including <jats:italic>maximum uncertainty<\/jats:italic>, <jats:italic>overall uncertainty<\/jats:italic>, <jats:italic>selected accuracy<\/jats:italic>, and <jats:italic>minimum expected error<\/jats:italic> methods. Further, to obtain a proper threshold for a stopping criterion in a specific task, this article presents a strategy by considering the label change factor to dynamically update the predefined threshold of a stopping criterion during the active learning process. 
To empirically analyze the effectiveness of each stopping criterion for active learning, we design several comparison experiments on seven real-world datasets for three representative natural language processing applications such as word sense disambiguation, text classification and opinion analysis.<\/jats:p>","DOI":"10.1145\/1753783.1753784","type":"journal-article","created":{"date-parts":[[2010,4,27]],"date-time":"2010-04-27T12:45:25Z","timestamp":1272372325000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":61,"title":["Confidence-based stopping criteria for active learning for data annotation"],"prefix":"10.1145","volume":"6","author":[{"given":"Jingbo","family":"Zhu","sequence":"first","affiliation":[{"name":"Northeastern University, China"}]},{"given":"Huizhen","family":"Wang","sequence":"additional","affiliation":[{"name":"Northeastern University, China"}]},{"given":"Eduard","family":"Hovy","sequence":"additional","affiliation":[{"name":"University of Southern California, Marina del Rey, CA"}]},{"given":"Matthew","family":"Ma","sequence":"additional","affiliation":[{"name":"Scientific Works, Princeton Junction, NJ"}]}],"member":"320","published-online":{"date-parts":[[2010,4,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022821128753"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1005342"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 19th International Joint Conference on Artificial Intelligence. 991--996","author":"Becker M.","unstructured":"Becker , M. and Osborne , M . 2005. A two-stage method for active learning of statistical grammars . In Proceedings of the 19th International Joint Conference on Artificial Intelligence. 991--996 . Becker, M. and Osborne, M. 2005. A two-stage method for active learning of statistical grammars. 
In Proceedings of the 19th International Joint Conference on Artificial Intelligence. 991--996."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/234285.234289"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/981732.981752"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the International Conference on Machine Learning. 111--118","author":"Campbell C.","unstructured":"Campbell , C. , Cristianini , N. , and Smola , A . 2000. Query learning with large margin classifiers . In Proceedings of the International Conference on Machine Learning. 111--118 . Campbell, C., Cristianini, N., and Smola, A. 2000. Query learning with large margin classifiers. In Proceedings of the International Conference on Machine Learning. 111--118."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 49--56","author":"Chan Y. S.","unstructured":"Chan , Y. S. and Ng , H. T . 2007. Domain adaptation with active learning for word sense disambiguation . In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 49--56 . Chan, Y. S. and Ng, H. T. 2007. Domain adaptation with active learning for word sense disambiguation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics. 49--56."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220835.1220851"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1022673506211"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1622737.1622744"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/1619410.1619452"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 5th International Symposium on Foundations of Information and Knowledge Systems (FoIKS). 96--111","author":"Dimistakakis C.","unstructured":"Dimistakakis , C. and Savu-Krohn , C . 2008. 
Cost-Minimizing strategies for data labeling: Optimal stopping and active learning . In Proceedings of the 5th International Symposium on Foundations of Information and Knowledge Systems (FoIKS). 96--111 . Dimistakakis, C. and Savu-Krohn, C. 2008. Cost-Minimizing strategies for data labeling: Optimal stopping and active learning. In Proceedings of the 5th International Symposium on Foundations of Information and Knowledge Systems (FoIKS). 96--111."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-74958-5_14"},{"key":"e_1_2_1_14_1","unstructured":"Duda R. O. and Hart P. E. 1973. Pattern Classification and Scene Analysis. Wiley New York. Duda R. O. and Hart P. E. 1973. Pattern Classification and Scene Analysis. Wiley New York."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the International Conference on Machine Learning. 150--157","author":"Dagan I.","unstructured":"Dagan , I. and Engelson , S. P . 1995. Committee-Based sampling for training probabilistic classifiers . In Proceedings of the International Conference on Machine Learning. 150--157 . Dagan, I. and Engelson, S. P. 1995. Committee-Based sampling for training probabilistic classifiers. In Proceedings of the International Conference on Machine Learning. 150--157."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007330508534"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL. 57--60","author":"Hovy E. H.","unstructured":"Hovy , E. H. , Marcus , M. , Palmer , M. , Ramshaw , L. , and Weischedel , R . 2006. Ontonotes: The 90% solution . In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL. 57--60 . Hovy, E. H., Marcus, M., Palmer, M., Ramshaw, L., and Weischedel, R. 2006. Ontonotes: The 90% solution. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL. 
57--60."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.3115\/1117794.1117800"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/11871842_68"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 22nd International Conference on Computational Linguistics. 465--472","author":"Laws F.","unstructured":"Laws , F. and Sch\u00fctze , H . 2008. Stopping criteria for active learning of named entity recognition . In Proceedings of the 22nd International Conference on Computational Linguistics. 465--472 . Laws, F. and Sch\u00fctze, H. 2008. Stopping criteria for active learning of named entity recognition. In Proceedings of the 22nd International Conference on Computational Linguistics. 465--472."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118693.1118699"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of 17th ACM International Conference on Research and Development in Information Retrieval. 3--12","author":"Lewis D.","unstructured":"Lewis , D. and Gale , W . 1994. A sequential algorithm for training text classifiers . In Proceedings of 17th ACM International Conference on Research and Development in Information Retrieval. 3--12 . Lewis, D. and Gale, W. 1994. A sequential algorithm for training text classifiers. In Proceedings of 17th ACM International Conference on Research and Development in Information Retrieval. 3--12."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.156"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/972470.972475"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of 15th International Conference on Machine Learning. 350--358","author":"McCallum A.","unstructured":"McCallum , A. and Nigram , K . 1998a. Employing EM in pool-based active learning for text classification . In Proceedings of 15th International Conference on Machine Learning. 350--358 . McCallum, A. and Nigram, K. 1998a. 
Employing EM in pool-based active learning for text classification. In Proceedings of 15th International Conference on Machine Learning. 350--358."},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of AAAI-98 Workshop on Learning for Text Categorization.","author":"McCallum A.","unstructured":"McCallum , A. and Nigram , K . 1998b. A comparison of event models for na\u00efve Bayes text classification . In Proceedings of AAAI-98 Workshop on Learning for Text Categorization. McCallum, A. and Nigram, K. 1998b. A comparison of event models for na\u00efve Bayes text classification. In Proceedings of AAAI-98 Workshop on Learning for Text Categorization."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/981863.981869"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075218.1075234"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of OntoLex Conference on Ontologies and Lexical Resources. 59--66","author":"Philpot A.","unstructured":"Philpot , A. , Hovy , E. H. , and Pantel , P . 2005. The Omega ontology . In Proceedings of OntoLex Conference on Ontologies and Lexical Resources. 59--66 . Philpot, A., Hovy, E. H., and Pantel, P. 2005. The Omega ontology. In Proceedings of OntoLex Conference on Ontologies and Lexical Resources. 59--66."},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Platt J. 1999. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Classifiers. 61--74. Platt J. 1999. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Classifiers. 61--74.","DOI":"10.7551\/mitpress\/1113.003.0008"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the National Conference on Artificial Intelligence. 683--688","author":"Roth D.","unstructured":"Roth , D. and Small , K . 2008. Active learning for pipeline models . 
In Proceedings of the National Conference on Artificial Intelligence. 683--688 . Roth, D. and Small, K. 2008. Active learning for pipeline models. In Proceedings of the National Conference on Artificial Intelligence. 683--688."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 18th International Conference on Machine Learning. 441--448","author":"Roy N.","year":"2001","unstructured":"Roy , N. and McCallum , A. 2001 . Toward optimal active learning through sampling estimation of error reduction . In Proceedings of the 18th International Conference on Machine Learning. 441--448 . Roy, N. and McCallum, A. 2001. Toward optimal active learning through sampling estimation of error reduction. In Proceedings of the 18th International Conference on Machine Learning. 441--448."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-007-5019-5"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 17th International Conference on Machine Learning. 839--846","author":"Schohn G.","unstructured":"Schohn , G. and Cohn , D . 2000. Less is more: Active learning with support vector machines . In Proceedings of the 17th International Conference on Machine Learning. 839--846 . Schohn, G. and Cohn, D. 2000. Less is more: Active learning with support vector machines. In Proceedings of the 17th International Conference on Machine Learning. 839--846."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/130385.130417"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219030"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073105"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244302760185243"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the Joint Meeting of the Conference on Empirical Methods on Natural Language Processing and the Conference on Natural Language Learning. 486--495","author":"Tomanek K.","unstructured":"Tomanek , K. , Wermter , J. 
, and Hahn , U . 2007. An approach to text corpus construction which cuts annotation costs and maintains reusability of annotated data . In Proceedings of the Joint Meeting of the Conference on Empirical Methods on Natural Language Processing and the Conference on Natural Language Learning. 486--495 . Tomanek, K., Wermter, J., and Hahn, U. 2007. An approach to text corpus construction which cuts annotation costs and maintains reusability of annotated data. In Proceedings of the Joint Meeting of the Conference on Empirical Methods on Natural Language Processing and the Conference on Natural Language Learning. 486--495."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 16th International Conference on Machine Learning. 406--414","author":"Thompson C. A.","unstructured":"Thompson , C. A. , Califf , M. E. , and Mooney , R. J . 1999. Active learning for natural language parsing and information extraction . In Proceedings of the 16th International Conference on Machine Learning. 406--414 . Thompson, C. A., Califf, M. E., and Mooney, R. J. 1999. Active learning for natural language parsing and information extraction. In Proceedings of the 16th International Conference on Machine Learning. 406--414."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2007.12.001"},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the AAAI Spring Symposium on New Directions in Question Answering.","author":"Wiebe J.","year":"2003","unstructured":"Wiebe , J. , Breck , E. , Buckley , C. , Cardie , C. , Davis , P. , 2003 . Recognizing and organizing opinions expressed in the world press . In Proceedings of the AAAI Spring Symposium on New Directions in Question Answering. Wiebe, J., Breck, E., Buckley, C., Cardie, C., Davis, P., et al. 2003. Recognizing and organizing opinions expressed in the world press. 
In Proceedings of the AAAI Spring Symposium on New Directions in Question Answering."},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the Joint Meeting of the Conference on Empirical Methods on Natural Language Processing and the Conference on Natural Language Learning. 783--790","author":"Zhu J.","unstructured":"Zhu , J. and Hovy , E. H . 2007. Active learning for word sense disambiguation with methods for addressing the class imbalance problem . In Proceedings of the Joint Meeting of the Conference on Empirical Methods on Natural Language Processing and the Conference on Natural Language Learning. 783--790 . Zhu, J. and Hovy, E. H. 2007. Active learning for word sense disambiguation with methods for addressing the class imbalance problem. In Proceedings of the Joint Meeting of the Conference on Empirical Methods on Natural Language Processing and the Conference on Natural Language Learning. 783--790."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 3rd International Joint Conference on Natural Language Processing. 366--372","author":"Zhu J.","unstructured":"Zhu , J. , Wang , H. , and Hovy , E. H . 2008a. Learning a stopping criterion for active learning for word sense disambiguation and text classification . In Proceedings of the 3rd International Joint Conference on Natural Language Processing. 366--372 . Zhu, J., Wang, H., and Hovy, E. H. 2008a. Learning a stopping criterion for active learning for word sense disambiguation and text classification. In Proceedings of the 3rd International Joint Conference on Natural Language Processing. 366--372."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 22nd International Conference on Computational Linguistics. 1129--1136","author":"Zhu J.","unstructured":"Zhu , J. , Wang , H. , and Hovy , E. H . 2008b. Multi-Criteria-Based strategy to stop active learning for data annotation . In Proceedings of the 22nd International Conference on Computational Linguistics. 1129--1136 . 
Zhu, J., Wang, H., and Hovy, E. H. 2008b. Multi-Criteria-Based strategy to stop active learning for data annotation. In Proceedings of the 22nd International Conference on Computational Linguistics. 1129--1136."}],"container-title":["ACM Transactions on Speech and Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1753783.1753784","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1753783.1753784","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T12:41:32Z","timestamp":1750250492000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1753783.1753784"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,4]]},"references-count":45,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2010,4]]}},"alternative-id":["10.1145\/1753783.1753784"],"URL":"https:\/\/doi.org\/10.1145\/1753783.1753784","relation":{},"ISSN":["1550-4875","1550-4883"],"issn-type":[{"value":"1550-4875","type":"print"},{"value":"1550-4883","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,4]]},"assertion":[{"value":"2008-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-04-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}