{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:12:17Z","timestamp":1760242337168,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2017,5,5]],"date-time":"2017-05-05T00:00:00Z","timestamp":1493942400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Multi-label classification is a well-known supervised machine learning setting where each instance is associated with multiple classes. Examples include annotation of images with multiple labels, assigning multiple tags for a web page, etc. Since several labels can be assigned to a single instance, one of the key challenges in this problem is to learn the correlations between the classes. Our first contribution assumes labels from a perfect source. Towards this, we propose a novel topic model (ML-PA-LDA). The distinguishing feature in our model is that classes that are present as well as the classes that are absent generate the latent topics and hence the words. Extensive experimentation on real world datasets reveals the superior performance of the proposed model. A natural source for procuring the training dataset is through mining user-generated content or directly through users in a crowdsourcing platform. In this more practical scenario of crowdsourcing, an additional challenge arises as the labels of the training instances are provided by noisy, heterogeneous crowd-workers with unknown qualities. With this motivation, we further augment our topic model to the scenario where the labels are provided by multiple noisy sources and refer to this model as ML-PA-LDA-MNS. With experiments on simulated noisy annotators, the proposed model learns the qualities of the annotators well, even with minimal training data.<\/jats:p>","DOI":"10.3390\/info8020052","type":"journal-article","created":{"date-parts":[[2017,5,5]],"date-time":"2017-05-05T10:31:08Z","timestamp":1493980268000},"page":"52","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Multi-Label Classification from Multiple Noisy Sources Using Topic Models"],"prefix":"10.3390","volume":"8","author":[{"given":"Divya","family":"Padmanabhan","sequence":"first","affiliation":[{"name":"Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Satyanath","family":"Bhat","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shirish","family":"Shevade","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Y.","family":"Narahari","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2017,5,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1016\/j.im.2016.04.005","article-title":"Social Emotion Classification of Short Text via Topic-Level Maximum Entropy Model","volume":"53","author":"Rao","year":"2016","journal-title":"Inf. Manag."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.ipm.2015.03.001","article-title":"Incorporating Sentiment into Tag-based User Profiles and Resource Profiles for Personalized Search in Folksonomy","volume":"52","author":"Xie","year":"2016","journal-title":"Inf. Process. Manag."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1016\/j.knosys.2014.04.022","article-title":"News impact on stock price return via sentiment analysis","volume":"69","author":"Li","year":"2014","journal-title":"Knowl. Based Syst."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1109\/MIS.2015.1","article-title":"Does Summarization Help Stock Prediction? A News Impact Analysis","volume":"30","author":"Li","year":"2015","journal-title":"IEEE Intell. Syst."},{"key":"ref_5","first-page":"993","article-title":"Latent Dirichlet Allocation","volume":"3","author":"Blei","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Heinrich, G. (2009, January 18\u201321). A Generic Approach to Topic Models. Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I (ECML PKDD \u201909), Shanghai, China.","DOI":"10.1007\/978-3-642-04180-8_51"},{"key":"ref_7","unstructured":"Krestel, R., and Fankhauser, P. (2009, January 7). Tag recommendation using probabilistic topic models. Proceedings of the International Workshop at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Bled, Slovenia."},{"key":"ref_8","unstructured":"Li, F.F., and Perona, P. (2005, January 20\u201326). A bayesian hierarchical model for learning natural Scene categories. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1093\/genetics\/155.2.945","article-title":"Inference of population structure using multilocus genotype data","volume":"155","author":"Pritchard","year":"2000","journal-title":"Genetics"},{"key":"ref_10","unstructured":"Marlin, B. (2004). Collaborative Filtering: A Machine Learning Perspective. [Ph.D. Thesis, University of Toronto]."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Erosheva, E.A. (2002). Grade of Membership and Latent Structure Models with Application To Disability Survey Data. [Ph.D. Thesis, Office of Population Research, Princeton University].","DOI":"10.1201\/9780203497159.ch6"},{"key":"ref_12","unstructured":"Girolami, M., and Kab\u00e1n, A. (2003, January 21\u201324). Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles. Proceedings of the 16th International Conference on Neural Information Processing Systems (NIPS\u201903), Washington, DC, USA."},{"key":"ref_13","unstructured":"Mcauliffe, J.D., and Blei, D.M. (2007, January 3\u20136). Supervised Topic Models. Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems (NIPS\u201907), Vancouver, BC, Canada."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, H., Huang, M., and Zhu, X. (2008, January 15\u201319). A Generative Probabilistic Model for Multi-label Classification. Proceedings of the 2008 Eighth IEEE International Conference on Data Mining (ICDM\u201908), Washington, DC, USA.","DOI":"10.1109\/ICDM.2008.86"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/s10994-011-5272-5","article-title":"Statistical Topic Models for Multi-label Document Classification","volume":"88","author":"Rubin","year":"2012","journal-title":"Mach. Learn."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"4","DOI":"10.19153\/cleiej.14.1.4","article-title":"Multi-label Problem Transformation Methods: A Case Study","volume":"14","author":"Cherman","year":"2011","journal-title":"CLEI Electron. J."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1109\/TKDE.2010.164","article-title":"Random k-Labelsets for Multilabel Classification","volume":"23","author":"Tsoumakas","year":"2011","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Elisseeff, A., and Weston, J. (2001, January 3\u20138). A Kernel Method for Multi-labelled Classification. Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS\u201901), Vancouver, BC, Canada.","DOI":"10.7551\/mitpress\/1120.003.0092"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1819","DOI":"10.1109\/TKDE.2013.39","article-title":"A Review on Multi-Label Learning Algorithms","volume":"26","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_20","unstructured":"McCallum, A.K. (1999, January 18\u201322). Multi-label text classification with a mixture model trained by EM. Proceedings of the AAAI 99 Workshop on Text Learning, Orlando, FL, USA."},{"key":"ref_21","unstructured":"Ueda, N., and Saito, K. (2002, January 9\u201314). Parametric mixture models for multi-labeled text. Proceedings of the Neural Information Processing Systems 15 (NIPS\u201902), Vancouver, BC, Canada."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Soleimani, H., and Miller, D.J. (2016, January 24\u201328). Semi-supervised Multi-Label Topic Models for Document Classification and Sentence Labeling. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM\u201916), Indianapolis, IN, USA.","DOI":"10.1145\/2983323.2983752"},{"key":"ref_23","first-page":"1297","article-title":"Learning From Crowds","volume":"11","author":"Raykar","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Bragg, J., and Weld, D.S. (2013, January 7\u20139). Crowdsourcing Multi-Label Classification for Taxonomy Creation. Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing (HCOMP), Palm Springs, CA, USA.","DOI":"10.1609\/hcomp.v1i1.13091"},{"key":"ref_25","unstructured":"Deng, J., Russakovsky, O., Krause, J., Bernstein, M.S., Berg, A., and Fei-Fei, L. (May, January 26). Scalable Multi-label Annotation. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI\u201914), Toronto, ON, Canada."},{"key":"ref_26","unstructured":"Duan, L., Satoshi, O., Sato, H., and Kurihara, M. (2017, May 04). Leveraging Crowdsourcing to Make Models in Multi-label Domains Interoperable. Available online: http:\/\/hokkaido.ipsj.or.jp\/info2014\/papers\/20\/Duan_INFO.pdf."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Rodrigues, F., Ribeiro, B., Louren\u00e7o, M., and Pereira, F. (2015, January 8\u201311). Learning Supervised Topic Models from Crowds. Proceedings of the Third AAAI Conference on Human Computation and Crowdsourcing (HCOMP), San Diego, CA, USA.","DOI":"10.1609\/hcomp.v3i1.13221"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ramage, D., Manning, C.D., and Dumais, S. (2011, January 21\u201324). Partially Labeled Topic Models for Interpretable Text Mining. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201911), San Diego, CA, USA.","DOI":"10.1145\/2020408.2020481"},{"key":"ref_29","unstructured":"Bishop, C. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_30","unstructured":"(2017, May 04). ML-PA-LDA-MNS Overview. Available online: https:\/\/bitbucket.org\/divs1202\/ml-pa-lda-mns."},{"key":"ref_31","unstructured":"Lichman, M. (2013). UCI Machine Learning Repository, University of California, Irvine."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/eb046814","article-title":"An algorithm for suffix stripping","volume":"14","author":"Porter","year":"1980","journal-title":"Program"},{"key":"ref_33","unstructured":"Katakis, I., Tsoumakas, G., and Vlahavas, I. (2008). Multilabel text classification for automated tag suggestion. ECML PKDD Discov. Chall., 75\u201383."},{"key":"ref_34","first-page":"2411","article-title":"Mulan: A Java Library for Multi-Label Learning","volume":"12","author":"Tsoumakas","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_35","unstructured":"Tsoumakas, G., Katakis, I., and Vlahavas, I. (2008, January 24\u201326). Effective and Efficient Multilabel Classification in Domains with Large Number of Labels. Proceedings of the ECML\/PKDD 2008 Workshop on Mining Multidimensional Data (MMD\u201908), Atlanta, GA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1757","DOI":"10.1016\/j.patcog.2004.03.009","article-title":"Learning multi-label Scene classification","volume":"37","author":"Boutell","year":"2004","journal-title":"Pattern Recognit."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Gibaja, E., and Ventura, S. (2015). A Tutorial on Multilabel Learning. ACM Comput. Surv., 47.","DOI":"10.1145\/2716262"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Read, J., Martino, L., and Luengo, D. (2013, January 26\u201331). Efficient Monte Carlo optimization for multi-label classifier chains. Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada.","DOI":"10.1109\/ICASSP.2013.6638300"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/s10994-011-5256-5","article-title":"Classifier Chains for Multi-label Classification","volume":"85","author":"Read","year":"2011","journal-title":"Mach. Learn."},{"key":"ref_40","unstructured":"Zaragoza, J.H., Sucar, L.E., Morales, E.F., Bielza, C., and Larra\u00f1aga, P. (2011, January 16\u201322). Bayesian Chain Classifiers for Multidimensional Classification. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI\u201911), Barcelona, Spain."},{"key":"ref_41","unstructured":"(2017, May 04). MEKA: A Multi-label Extension to WEKA. Available online: http:\/\/meka.sourceforge.net."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1016\/0020-0190(87)90114-1","article-title":"Occam\u2019s Razor","volume":"24","author":"Blumer","year":"1987","journal-title":"Inf. Process. Lett."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"69","DOI":"10.2307\/2980283","article-title":"The Efficiencies of the Binomial Series Tests of Significance of a Mean and of a Correlation Coefficient","volume":"100","author":"Cochran","year":"1937","journal-title":"J. R. Stat. Soc."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Rohatgi, V.K., and Saleh, A.M.E. (2015). An Introduction to Probability and Statistics, John Wiley & Sons.","DOI":"10.1002\/9781118799635"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/2\/52\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:34:42Z","timestamp":1760207682000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/2\/52"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,5,5]]},"references-count":44,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2017,6]]}},"alternative-id":["info8020052"],"URL":"https:\/\/doi.org\/10.3390\/info8020052","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2017,5,5]]}}}