{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T17:27:31Z","timestamp":1770917251166,"version":"3.50.1"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2013,9,1]],"date-time":"2013-09-01T00:00:00Z","timestamp":1377993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"],"award-info":[{"award-number":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"],"award-info":[{"award-number":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000143","name":"Division of Computing and Communication Foundations","doi-asserted-by":"publisher","award":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"],"award-info":[{"award-number":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"]}],"id":[{"id":"10.13039\/100000143","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000006","name":"Office of Naval Research","doi-asserted-by":"publisher","award":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"],"award-info":[{"award-number":["IIS-0953662, CCF-1025177, NIH LM010730, ONR N00014-11-1-0108"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2013,9]]},"abstract":"<jats:p>Active Learning is a machine learning and data mining technique that selects the most informative samples for labeling and uses them as training data; it is especially useful when there are large amount of unlabeled data and labeling them is expensive. Recently, batch-mode active learning, where a set of samples are selected concurrently for labeling, based on their collective merit, has attracted a lot of attention. The objective of batch-mode active learning is to select a set of informative samples so that a classifier learned on these samples has good generalization performance on the unlabeled data. Most of the existing batch-mode active learning methodologies try to achieve this by selecting samples based on certain criteria. In this article we propose a novel criterion which achieves good generalization performance of a classifier by specifically selecting a set of query samples that minimize the difference in distribution between the labeled and the unlabeled data, after annotation. We explicitly measure this difference based on all candidate subsets of the unlabeled data and select the best subset. The proposed objective is an NP-hard integer programming optimization problem. We provide two optimization techniques to solve this problem. In the first one, the problem is transformed into a convex quadratic programming problem and in the second method the problem is transformed into a linear programming problem. Our empirical studies using publicly available UCI datasets and two biomedical image databases demonstrate the effectiveness of the proposed approach in comparison with the state-of-the-art batch-mode active learning methods. We also present two extensions of the proposed approach, which incorporate uncertainty of the predicted labels of the unlabeled data and transfer learning in the proposed formulation. In addition, we present a joint optimization framework for performing both transfer and active learning simultaneously unlike the existing approaches of learning in two separate stages, that is, typically, transfer learning followed by active learning. We specifically minimize a common objective of reducing distribution difference between the domain adapted source, the queried and labeled samples and the rest of the unlabeled target domain data. Our empirical studies on two biomedical image databases and on a publicly available 20 Newsgroups dataset show that incorporation of uncertainty information and transfer learning further improves the performance of the proposed active learning based classifier. Our empirical studies also show that the proposed transfer-active method based on the joint optimization framework performs significantly better than a framework which implements transfer and active learning in two separate stages.<\/jats:p>","DOI":"10.1145\/2513092.2513094","type":"journal-article","created":{"date-parts":[[2013,9,17]],"date-time":"2013-09-17T19:57:05Z","timestamp":1379447825000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":40,"title":["Batch Mode Active Sampling Based on Marginal Probability Distribution Matching"],"prefix":"10.1145","volume":"7","author":[{"given":"Rita","family":"Chattopadhyay","sequence":"first","affiliation":[{"name":"Arizona State University"}]},{"given":"Zheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Arizona State University"}]},{"given":"Wei","family":"Fan","sequence":"additional","affiliation":[{"name":"Huawei Noah\u2019s Ark Lab"}]},{"given":"Ian","family":"Davidson","sequence":"additional","affiliation":[{"name":"University of California, Davis"}]},{"given":"Sethuraman","family":"Panchanathan","sequence":"additional","affiliation":[{"name":"Arizona State University"}]},{"given":"Jieping","family":"Ye","sequence":"additional","affiliation":[{"name":"Arizona State University"}]}],"member":"320","published-online":{"date-parts":[[2013,9]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/1577069.1755858"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl242"},{"key":"e_1_2_2_3_1","doi-asserted-by":"crossref","unstructured":"Boyd S. and Vandenberghe L. 2004. Convex Optimization. Cambridge.   Boyd S. and Vandenberghe L. 2004. Convex Optimization . Cambridge.","DOI":"10.1017\/CBO9780511804441"},{"key":"e_1_2_2_4_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Brinker K.","year":"2003","unstructured":"Brinker , K. 2003 . Incorporating diversity in active learning with support vector machines . In Proceedings of the International Conference on Machine Learning. Brinker, K. 2003. Incorporating diversity in active learning with support vector machines. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2011.104"},{"key":"e_1_2_2_6_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Campbell C.","unstructured":"Campbell , C. , Cristianini , N. , and Smola , A . 2000. Query learning with large margin classifiers . In Proceedings of the International Conference on Machine Learning. Campbell, C., Cristianini, N., and Smola, A. 2000. Query learning with large margin classifiers. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_2_2_7_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Dagan I.","unstructured":"Dagan , I. and Engelson , S . 1995. Committee-based sampling for training probabilistic classifiers . In Proceedings of the International Conference on Machine Learning. Dagan, I. and Engelson, S. 1995. Committee-based sampling for training probabilistic classifiers. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2009.97"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007330508534"},{"key":"e_1_2_2_10_1","volume-title":"CVX: Matlab software for disciplined convex programming, version 1.21.","author":"Grant M.","year":"2007","unstructured":"Grant , M. and Boyd , S . 2007 . CVX: Matlab software for disciplined convex programming, version 1.21. Grant, M. and Boyd, S. 2007. CVX: Matlab software for disciplined convex programming, version 1.21."},{"key":"e_1_2_2_11_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems.","author":"Gretton A.","unstructured":"Gretton , A. , Borgwardt , K. M. , Rasch , M. , Scholkopf , B. , and Smola , A. J . 2007. A kernel method for the two-sample problem . In Proceedings of the Conference on Advances in Neural Information Processing Systems. Gretton, A., Borgwardt, K. M., Rasch, M., Scholkopf, B., and Smola, A. J. 2007. A kernel method for the two-sample problem. In Proceedings of the Conference on Advances in Neural Information Processing Systems."},{"key":"e_1_2_2_12_1","first-page":"1","article-title":"A kernel method for the two-sample-problem","volume":"1","author":"Gretton A.","year":"2008","unstructured":"Gretton , A. , Borgwardt , K. M. , Rasch , M. , Sch\u00f6lkopf , B. , and Smola , A. J. 2008 . A kernel method for the two-sample-problem . Journal of Machine Learning Research 1 , 1 -- 10 . Gretton, A., Borgwardt, K. M., Rasch, M., Sch\u00f6lkopf, B., and Smola, A. J. 2008. A kernel method for the two-sample-problem. Journal of Machine Learning Research 1, 1--10.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems.","author":"Guo Y.","year":"2010","unstructured":"Guo , Y. 2010 . Active instance sampling via matrix partition . In Proceedings of the Conference on Advances in Neural Information Processing Systems. Guo, Y. 2010. Active instance sampling via matrix partition. In Proceedings of the Conference on Advances in Neural Information Processing Systems."},{"key":"e_1_2_2_14_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems.","author":"Guo Y.","unstructured":"Guo , Y. and Schuurmans , D . 2007. Discriminative batch mode active learning . In Proceedings of the Conference on Advances in Neural Information Processing Systems. Guo, Y. and Schuurmans, D. 2007. Discriminative batch mode active learning. In Proceedings of the Conference on Advances in Neural Information Processing Systems."},{"key":"e_1_2_2_15_1","unstructured":"He X. and Cai D. 2009. Active subspace learning. In ICCV.  He X. and Cai D. 2009. Active subspace learning. In ICCV ."},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143897"},{"key":"e_1_2_2_17_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Hoi S.","unstructured":"Hoi , S. , Jin , R. , Zhu , J. , and Lyu , M . 2008. Semi-supervised svm batch mode active learning for image retrieval . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hoi, S., Jin, R., Zhu, J., and Lyu, M. 2008. Semi-supervised svm batch mode active learning for image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_2_18_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems.","author":"Huang J.","unstructured":"Huang , J. , Smola , A. , Gretton , A. , Borgwardt , K. , and Scholkopf , B . 2007. Correcting sample selection bias by unlabeled data . In Proceedings of the Conference on Advances in Neural Information Processing Systems. Huang, J., Smola, A., Gretton, A., Borgwardt, K., and Scholkopf, B. 2007. Correcting sample selection bias by unlabeled data. In Proceedings of the Conference on Advances in Neural Information Processing Systems."},{"key":"e_1_2_2_19_1","unstructured":"Huang S. Jin R. and Zhou Z. 2010. Active Learning by Querying Informative and Representative Examples. In NIPS.  Huang S. Jin R. and Zhou Z. 2010. Active Learning by Querying Informative and Representative Examples. In NIPS ."},{"key":"e_1_2_2_20_1","volume-title":"Proceedings of the International Conference on Multimedia and Expo.","author":"Jing F.","unstructured":"Jing , F. , Li , M. , Zhang , H. , and Zhang , B . 2004. Entropy based active learning with support vector machines for content based image retrieval . In Proceedings of the International Conference on Multimedia and Expo. Jing, F., Li, M., Zhang, H., and Zhang, B. 2004. Entropy based active learning with support vector machines for content based image retrieval. In Proceedings of the International Conference on Multimedia and Expo."},{"key":"e_1_2_2_21_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.","author":"Joshi A.","unstructured":"Joshi , A. , Porikli , F. , and Papanikolopoulos , N . 2009. Multi-class active learning for image classification . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Joshi, A., Porikli, F., and Papanikolopoulos, N. 2009. Multi-class active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cell.2007.08.003"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2002.999679"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_2_2_25_1","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence.","author":"Pan S.","unstructured":"Pan , S. , Tsang , I. , Kwok , J. , and Yang , Q . 2009. Domain adaptation via transfer component analysis . In Proceedings of the International Joint Conference on Artificial Intelligence. Pan, S., Tsang, I., Kwok, J., and Yang, Q. 2009. Domain adaptation via transfer component analysis. In Proceedings of the International Joint Conference on Artificial Intelligence."},{"key":"e_1_2_2_26_1","unstructured":"Pan S. J. Kwok J. T. and Yang Q. 2008. Transfer learning via dimensionality reduction. In AAAI.   Pan S. J. Kwok J. T. and Yang Q. 2008. Transfer learning via dimensionality reduction. In AAAI ."},{"key":"e_1_2_2_27_1","volume-title":"Proceedings of the NAACL-HLT Active Learning for NLP Workshop.","author":"Rai P.","unstructured":"Rai , P. , Saha , A. , Daum\u00e9 , H., III , and Venkatasubramanian , S . 2010. Domain adaptation meets active learning . In Proceedings of the NAACL-HLT Active Learning for NLP Workshop. Rai, P., Saha, A., Daum\u00e9, H., III, and Venkatasubramanian, S. 2010. Domain adaptation meets active learning. In Proceedings of the NAACL-HLT Active Learning for NLP Workshop."},{"key":"e_1_2_2_28_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Schohn G.","unstructured":"Schohn , G. and Cohn , D . 2000. Less is more: Active learning with support vector machines . In Proceedings of the International Conference on Machine Learning. Schohn, G. and Cohn, D. 2000. Less is more: Active learning with support vector machines. In Proceedings of the International Conference on Machine Learning."},{"key":"e_1_2_2_29_1","volume-title":"Active learning literature survey. Computer Sciences Tech. rep. 1648","author":"Settles B.","unstructured":"Settles , B. 2009. Active learning literature survey. Computer Sciences Tech. rep. 1648 , University of Wisconsin-Madison. Settles, B. 2009. Active learning literature survey. Computer Sciences Tech. rep. 1648, University of Wisconsin-Madison."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/130385.130417"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/3121525.3121549"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-3758(00)00115-4"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1859901"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244302760185252"},{"key":"e_1_2_2_35_1","doi-asserted-by":"crossref","unstructured":"Sugiyama M. Nakajima S. Kashima H. Buenau P. V. and Kawanabe M. 2008. Direct importance estimation with model selection and its application to covariate shift adaptation. In NIPS.  Sugiyama M. Nakajima S. Kashima H. Buenau P. V. and Kawanabe M. 2008. Direct importance estimation with model selection and its application to covariate shift adaptation. In NIPS .","DOI":"10.1007\/s10463-008-0197-x"},{"key":"e_1_2_2_36_1","doi-asserted-by":"crossref","unstructured":"Tomancak P. Beaton A. Weiszmann R. Kwan E. and Shu S. 2002. Systematic determination of patterns of gene expression during drosophila embryogenesis. Genome Biol. 3.  Tomancak P. Beaton A. Weiszmann R. Kwan E. and Shu S. 2002. Systematic determination of patterns of gene expression during drosophila embryogenesis. Genome Biol. 3 .","DOI":"10.1186\/gb-2002-3-12-research0088"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244302760185243"},{"key":"e_1_2_2_38_1","volume-title":"The Nature of Statistical Learning Theory","author":"Vapnik V. N.","unstructured":"Vapnik , V. N. 2000. The Nature of Statistical Learning Theory . Springer . Vapnik, V. N. 2000. The Nature of Statistical Learning Theory. Springer."},{"key":"e_1_2_2_39_1","volume-title":"All of Statistics: A Concise Course in Statistical Inference","author":"Wasserman L.","unstructured":"Wasserman , L. 2005. All of Statistics: A Concise Course in Statistical Inference . Springer . Wasserman, L. 2005. All of Statistics: A Concise Course in Statistical Inference. Springer."},{"key":"e_1_2_2_40_1","volume-title":"Data Mining: Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann.","author":"Witten I.","year":"2000","unstructured":"Witten , I. and Frank , E . 2000 . Data Mining: Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann. Witten, I. and Frank, E. 2000. Data Mining: Practical Machine Learning Tools with Java Implementations. Morgan Kaufmann."},{"key":"e_1_2_2_41_1","volume-title":"Proceedings of the International Conference on Multimedia and Expo.","author":"Wu Y.","unstructured":"Wu , Y. , Kozintsev , I. , Bouguet , J. , and Dulong , C . 2006. Sampling strategies for active learning in personal photo retrieval . In Proceedings of the International Conference on Multimedia and Expo. Wu, Y., Kozintsev, I., Bouguet, J., and Dulong, C. 2006. Sampling strategies for active learning in personal photo retrieval. In Proceedings of the International Conference on Multimedia and Expo."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143980"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.20"},{"key":"e_1_2_2_44_1","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Zhang T.","unstructured":"Zhang , T. and Oles , F . 2000. A probability analysis on the value of unlabeled data for classification problems . In Proceedings of the International Conference on Machine Learning. Zhang, T. and Oles, F. 2000. A probability analysis on the value of unlabeled data for classification problems. In Proceedings of the International Conference on Machine Learning."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2513092.2513094","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2513092.2513094","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:34:25Z","timestamp":1750232065000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2513092.2513094"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,9]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2013,9]]}},"alternative-id":["10.1145\/2513092.2513094"],"URL":"https:\/\/doi.org\/10.1145\/2513092.2513094","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,9]]},"assertion":[{"value":"2012-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}