{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T13:19:34Z","timestamp":1768569574627,"version":"3.49.0"},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2012,3,30]],"date-time":"2012-03-30T00:00:00Z","timestamp":1333065600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2012,3,30]],"date-time":"2012-03-30T00:00:00Z","timestamp":1333065600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Braz Comput Soc"],"published-print":{"date-parts":[[2012,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled instances to train a classifier. In such circumstances it is common to have massive corpora where a few instances are labeled (typically a minority) while others are not. Semi-supervised learning techniques try to leverage the intrinsic information in unlabeled instances to improve classification models. However, these techniques assume that the labeled instances cover all the classes to learn which might not be the case. Moreover, when in the presence of an imbalanced class distribution, getting labeled instances from minority classes might be very costly, requiring extensive labeling, if queries are randomly selected. Active learning allows asking an oracle to label new instances, which are selected by criteria, aiming to reduce the labeling effort. D-Confidence is an active learning approach that is effective when in presence of imbalanced training sets. 
In this paper we evaluate the performance of d-Confidence in comparison to its baseline criteria over tabular and text datasets. We provide empirical evidence that d-Confidence reduces label disclosure complexity\u2014which we have defined as the number of queries required to identify instances from all classes to learn\u2014when in the presence of imbalanced data.<\/jats:p>","DOI":"10.1007\/s13173-012-0069-3","type":"journal-article","created":{"date-parts":[[2012,3,29]],"date-time":"2012-03-29T11:20:09Z","timestamp":1333020009000},"page":"311-330","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["D-Confidence: an active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions"],"prefix":"10.1007","volume":"18","author":[{"given":"Nuno Filipe","family":"Escudeiro","sequence":"first","affiliation":[]},{"given":"Al\u00edpio M\u00e1rio","family":"Jorge","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,3,30]]},"reference":[{"key":"69_CR1","unstructured":"Uc irvine machine learning repository (2009). http:\/\/archive.ics.uci.edu\/ml\/"},{"key":"69_CR2","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1016\/j.datak.2004.11.003","volume":"54","author":"G Adami","year":"2005","unstructured":"Adami G, Avesani P, Sona D (2005) Clustering documents into a web directory for bootstrapping a supervised classification. Data Knowl Eng 54:301\u2013325","journal-title":"Data Knowl Eng"},{"key":"69_CR3","first-page":"420","volume-title":"Proceedings of the 8th international conference on database theory, ICDT\u201901","author":"CC Aggarwal","year":"2001","unstructured":"Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: Proceedings of the 8th international conference on database theory, ICDT\u201901. 
Springer, London, pp\u00a0420\u2013434. http:\/\/dl.acm.org\/citation.cfm?id=645504.656414"},{"key":"69_CR4","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1007\/BF00116828","volume":"2","author":"D Angluin","year":"1988","unstructured":"Angluin D (1988) Queries and concept learning. Mach Learn 2:319\u2013342. doi:10.1007\/BF00116828","journal-title":"Mach Learn"},{"key":"69_CR5","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1145\/1143844.1143853","volume-title":"ICML","author":"MF Balcan","year":"2006","unstructured":"Balcan MF, Beygelzimer A, Langford J (2006) Agnostic active learning. In: ICML, pp\u00a065\u201372."},{"key":"69_CR6","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1109\/72.80287","volume":"2","author":"E Baum","year":"1991","unstructured":"Baum E (1991) Neural net algorithms that learn in polynomial time from examples and queries. IEEE Trans Neural Netw 2:5\u201319","journal-title":"IEEE Trans Neural Netw"},{"key":"69_CR7","volume-title":"Dynamic programming","author":"RE Bellman","year":"1957","unstructured":"Bellman RE (1957) Dynamic programming. Princeton University Press, Princeton"},{"key":"69_CR8","volume-title":"Active learning: creating excitement in the classroom","author":"CC Bonwell","year":"1991","unstructured":"Bonwell CC, Eison JA (1991) Active learning: creating excitement in the classroom. Jossey-Bass, San Francisco"},{"key":"69_CR9","volume-title":"Proceedings of the twentieth international conference on machine learning","author":"K Brinker","year":"2003","unstructured":"Brinker K (2003) Incorporating diversity in active learning with support vector machines. In: Proceedings of the twentieth international conference on machine learning"},{"key":"69_CR10","volume-title":"Mining the Web: discovering knowledge from hypertext data","author":"S Chakrabarti","year":"2002","unstructured":"Chakrabarti S (2002) Mining the Web: discovering knowledge from hypertext data. Morgan Kaufmann, San Mateo. 
http:\/\/www.cse.iitb.ac.in\/~soumen\/mining-the-web\/"},{"key":"69_CR11","volume-title":"Semi-supervised learning","year":"2006","unstructured":"Chapelle O, Schoelkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge"},{"key":"69_CR12","volume-title":"Advances in neural information processing systems","author":"D Cohn","year":"1990","unstructured":"Cohn D, Atlas L, Ladner R (1990) Training connectionist networks with queries and selective sampling. In: Advances in neural information processing systems"},{"key":"69_CR13","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1023\/A:1022673506211","volume":"15","author":"D Cohn","year":"1994","unstructured":"Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Mach Learn 15:201\u2013221. doi:10.1023\/A:1022673506211. http:\/\/portal.acm.org\/citation.cfm?id=189256.189489","journal-title":"Mach Learn"},{"key":"69_CR14","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1613\/jair.295","volume":"4","author":"D Cohn","year":"1996","unstructured":"Cohn D, Ghahramani Z, Jordan M (1996) Active learning with statistical models. J Artif Intell Res 4:129\u2013145","journal-title":"J Artif Intell Res"},{"key":"69_CR15","first-page":"18","volume-title":"Advances in neural information processing systems","author":"S Dasgupta","year":"2005","unstructured":"Dasgupta S (2005) Coarse sample complexity bounds for active learning. In: Advances in neural information processing systems, p\u00a018"},{"key":"69_CR16","volume-title":"Proceedings of the 25th international conference on machine learning","author":"S Dasgupta","year":"2008","unstructured":"Dasgupta S, Hsu D (2008) Hierarchical sampling for active learning. 
In: Proceedings of the 25th international conference on machine learning"},{"key":"69_CR17","series-title":"LNCS","first-page":"82","volume-title":"Semi-automatic creation and maintenance of web resources with web Topic","author":"N Escudeiro","year":"2006","unstructured":"Escudeiro N, Jorge A (2006) Semantics, web and mining. In: Semi-automatic creation and maintenance of web resources with web Topic. LNCS, vol\u00a04289. Springer, Heidelberg, pp\u00a082\u2013102"},{"key":"69_CR18","volume-title":"Brazilian symposium on artificial intelligence, web and text intelligence workshop","author":"N Escudeiro","year":"2008","unstructured":"Escudeiro N, Jorge A (2008) Learning partially specified concepts with d-confidence. In: Brazilian symposium on artificial intelligence, web and text intelligence workshop"},{"key":"69_CR19","first-page":"411","volume-title":"Progress in artificial intelligence, proceedings of the 14th Portuguese conference on artificial intelligence (EPIA 2009)","author":"N Escudeiro","year":"2009","unstructured":"Escudeiro N, Jorge A (2009) Efficient coverage of case space with active learning. In: Lopes LS, Lau N (eds) Progress in artificial intelligence, proceedings of the 14th Portuguese conference on artificial intelligence (EPIA 2009), vol\u00a05816. Springer, Berlin, pp\u00a0411\u2013422"},{"key":"69_CR20","first-page":"18","volume-title":"Proceedings of the NAACL HLT 2010 workshop on active learning for natural language processing, association for computational linguistics","author":"N Escudeiro","year":"2010","unstructured":"Escudeiro N, Jorge AM (2010) D-Confidence: an active learning strategy which efficiently identifies small classes. In: Proceedings of the NAACL HLT 2010 workshop on active learning for natural language processing, association for computational linguistics, Los Angeles, CA, pp\u00a018\u201326. 
http:\/\/10.255.0.115\/pub\/2010\/EJ10"},{"key":"69_CR21","volume-title":"Proceedings of the III international workshop on web and text intelligence (WTI\u20142010)","author":"N Escudeiro","year":"2010","unstructured":"Escudeiro N, Jorge AM (2010) Reducing label complexity in the presence of imbalanced class distributions. In: Proceedings of the III international workshop on web and text intelligence (WTI\u20142010), S\u00e3o Bernardo do Campo, S\u00e3o Paulo, Brazil. http:\/\/10.255.0.115\/pub\/2010\/EJ10a"},{"key":"69_CR22","volume-title":"Proceedings of the 24th international conference on machine learning","author":"S Hanneke","year":"2007","unstructured":"Hanneke S (2007) A\u00a0bound on the label complexity of agnostic active learning. In: Proceedings of the 24th international conference on machine learning"},{"issue":"2","key":"69_CR23","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1287\/moor.10.2.180","volume":"10","author":"D Hochbaum","year":"1985","unstructured":"Hochbaum D, Shmoys D (1985) A\u00a0best possible heuristic for the k-center problem. Math Oper Res 10(2):180\u2013184","journal-title":"Math Oper Res"},{"key":"69_CR24","volume-title":"Proceedings of the world wide web conference","author":"S Hoi","year":"2006","unstructured":"Hoi S, Jin R, Lyu M (2006) Large-scale text categorization by batch mode active learning. In: Proceedings of the world wide web conference"},{"issue":"3","key":"69_CR25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1508850.1508854","volume":"27","author":"SCH Hoi","year":"2009","unstructured":"Hoi SCH, Jin R, Zhu J, Lyu MR (2009) Semisupervised svm batch mode active learning with applications to image retrieval. ACM Trans Inf Syst 27(3):1\u201329. 
doi:10.1145\/1508850.1508854","journal-title":"ACM Trans Inf Syst"},{"issue":"5","key":"69_CR26","doi-asserted-by":"publisher","first-page":"1147","DOI":"10.1109\/TSMCB.2009.2013197","volume":"39","author":"W Hu","year":"2009","unstructured":"Hu W, Hu W, Xie N, Maybank S (2009) Unsupervised active learning based on hierarchical graph-theoretic clustering. Trans Syst Man Cybern, Part B 39(5):1147\u20131161. doi:10.1109\/TSMCB.2009.2013197","journal-title":"Trans Syst Man Cybern, Part B"},{"key":"69_CR27","doi-asserted-by":"publisher","first-page":"839","DOI":"10.1109\/ICDM.2008.80","volume-title":"ICDM\u201908: proceedings of the 2008 eighth IEEE international conference on data mining","author":"A Huang","year":"2008","unstructured":"Huang A, Milne D, Frank E, Witten IH (2008) Clustering documents with active learning using Wikipedia. In: ICDM\u201908: proceedings of the 2008 eighth IEEE international conference on data mining. IEEE Comput. Soc., Washington, pp\u00a0839\u2013844. doi:10.1109\/ICDM.2008.80"},{"key":"69_CR28","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1007\/11894841_9","volume-title":"Algorithmic learning theory","author":"M K\u00e4\u00e4ri\u00e4inen","year":"2006","unstructured":"K\u00e4\u00e4ri\u00e4inen M (2006) Active learning in the non-realizable case. In: Algorithmic learning theory. Springer, Berlin\/Heidelberg, pp\u00a063\u201377"},{"key":"69_CR29","first-page":"3","volume-title":"SIGIR\u201994: proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval","author":"DD Lewis","year":"1994","unstructured":"Lewis DD, Gale WA (1994) A\u00a0sequential algorithm for training text classifiers. In: SIGIR\u201994: proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval. 
Springer, New York, pp\u00a03\u201312"},{"key":"69_CR30","doi-asserted-by":"publisher","first-page":"1251","DOI":"10.1109\/TPAMI.2006.156","volume":"28","author":"M Li","year":"2006","unstructured":"Li M, Sethi I (2006) Confidence-based active learning. IEEE Trans Pattern Anal Mach Intell 28:1251\u20131261","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"69_CR31","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-3359-4","volume-title":"Instance selection and construction for data mining","author":"H Liu","year":"2001","unstructured":"Liu H, Motoda H (2001) Instance selection and construction for data mining. Kluwer Academic, Dordrecht"},{"key":"69_CR32","volume-title":"Machine learning","author":"TM Mitchell","year":"1997","unstructured":"Mitchell TM (1997) Machine learning. McGraw-Hill, New York"},{"key":"69_CR33","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1613\/jair.2005","volume":"27","author":"I Muslea","year":"2006","unstructured":"Muslea I, Minton S, Knoblock CA (2006) Active learning with multiple views. J Artif Intell Res 27:203\u2013233","journal-title":"J Artif Intell Res"},{"key":"69_CR34","first-page":"623","volume-title":"Proceedings of the 21st international conference on machine learning","author":"HT Nguyen","year":"2004","unstructured":"Nguyen HT, Smeulders A (2004) Active learning using pre-clustering. In: Proceedings of the 21st international conference on machine learning. ACM, New York, pp\u00a0623\u2013630"},{"key":"69_CR35","volume-title":"Proceedings of the European conference on the use of modern information and communication technologies","author":"P Ribeiro","year":"2008","unstructured":"Ribeiro P, Escudeiro N (2008) On-line news \u201c\u00e0 la carte\u201d. 
In: Proceedings of the European conference on the use of modern information and communication technologies"},{"key":"69_CR36","first-page":"441","volume-title":"Proceedings of the eighteenth international conference on machine learning, ICML\u201901","author":"N Roy","year":"2001","unstructured":"Roy N, McCallum A (2001) Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the eighteenth international conference on machine learning, ICML\u201901. Morgan Kaufmann, San Francisco, pp\u00a0441\u2013448. http:\/\/portal.acm.org\/citation.cfm?id=645530.655646"},{"key":"69_CR37","volume-title":"Proceedings of the international conference on machine learning","author":"G Schohn","year":"2000","unstructured":"Schohn G, Cohn D (2000) Less is more: active learning with support vector machines. In: Proceedings of the international conference on machine learning"},{"key":"69_CR38","volume-title":"Proceedings of the 5th annual workshop on computational learning theory","author":"H Seung","year":"1992","unstructured":"Seung H, Opper M, Sompolinsky H (1992) Query by committee. 
In: Proceedings of the 5th annual workshop on computational learning theory"}],"container-title":["Journal of the Brazilian Computer Society"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13173-012-0069-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13173-012-0069-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s13173-012-0069-3","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13173-012-0069-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T18:43:13Z","timestamp":1630521793000},"score":1,"resource":{"primary":{"URL":"https:\/\/journal-bcs.springeropen.com\/articles\/10.1007\/s13173-012-0069-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3,30]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,11]]}},"alternative-id":["69"],"URL":"https:\/\/doi.org\/10.1007\/s13173-012-0069-3","relation":{},"ISSN":["0104-6500","1678-4804"],"issn-type":[{"value":"0104-6500","type":"print"},{"value":"1678-4804","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,3,30]]},"assertion":[{"value":"2 August 2011","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 March 2012","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 March 2012","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}