{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T20:18:59Z","timestamp":1767212339330,"version":"build-2065373602"},"reference-count":31,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2022,4,20]],"date-time":"2022-04-20T00:00:00Z","timestamp":1650412800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100008982","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DMS 2011140"],"award-info":[{"award-number":["DMS 2011140"]}],"id":[{"id":"10.13039\/501100008982","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a novel method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method on legal documents provided by the California Innocence Project and the 20 Newsgroups dataset. 
Our results show that the proposed method improves both classification accuracy and topic coherence in comparison to past methods such as Semi-Supervised Non-negative Matrix Factorization (SSNMF), Guided Non-negative Matrix Factorization (Guided NMF), and Topic Supervised NMF.<\/jats:p>","DOI":"10.3390\/a15050136","type":"journal-article","created":{"date-parts":[[2022,4,21]],"date-time":"2022-04-21T01:55:51Z","timestamp":1650506151000},"page":"136","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Guided Semi-Supervised Non-Negative Matrix Factorization"],"prefix":"10.3390","volume":"15","author":[{"given":"Pengyu","family":"Li","sequence":"first","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8934-551X","authenticated-orcid":false,"given":"Christine","family":"Tseng","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Yaxuan","family":"Zheng","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2144-1924","authenticated-orcid":false,"given":"Joyce A.","family":"Chew","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Longxiu","family":"Huang","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, USA"}]},{"given":"Benjamin","family":"Jarman","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8058-8638","authenticated-orcid":false,"given":"Deanna","family":"Needell","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of California, Los Angeles, CA 90095, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,4,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"788","DOI":"10.1038\/44565","article-title":"Learning the parts of objects by non-negative matrix factorization","volume":"401","author":"Lee","year":"1999","journal-title":"Nature"},{"key":"ref_2","first-page":"556","article-title":"Algorithms for non-negative matrix factorization","volume":"13","author":"Seung","year":"2001","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Arora, S., Ge, R., and Moitra, A. (2012, January 20\u201323). Learning topic models\u2013going beyond SVD. Proceedings of the 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, New Brunswick, NJ, USA.","DOI":"10.1109\/FOCS.2012.49"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kuang, D., Choo, J., and Park, H. (2015). Nonnegative Matrix Factorization for Interactive Topic Modeling and Document Clustering. Partitional Clustering Algorithms, Springer.","DOI":"10.1007\/978-3-319-09259-1_7"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3836","DOI":"10.1109\/TIP.2019.2907054","article-title":"Simultaneous dimensionality reduction and classification via dual embedding regularized nonnegative matrix factorization","volume":"28","author":"Wu","year":"2019","journal-title":"IEEE Trans. Image Process."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2698","DOI":"10.1109\/TCSVT.2020.3027570","article-title":"Positive and negative label-driven nonnegative matrix factorization","volume":"31","author":"Wu","year":"2020","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref_7","unstructured":"Xu, W., Liu, X., and Gong, Y. (August, January 28). Document Clustering Based on Non-Negative Matrix Factorization. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, ON, Canada."},{"key":"ref_8","unstructured":"Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J., and Blei, D. (2009, January 7\u201310). Reading Tea Leaves: How Humans Interpret Topic Models. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_9","unstructured":"Jagarlamudi, J., Daum\u00e9 III, H., and Udupa, R. (2012, January 23\u201327). Incorporating Lexical Priors into Topic Models. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1007\/s10115-008-0134-6","article-title":"Non-negative matrix factorization for semi-supervised data clustering","volume":"17","author":"Chen","year":"2008","journal-title":"Knowl. Inf. Syst."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1109\/LSP.2009.2027163","article-title":"Semi-Supervised Nonnegative Matrix Factorization","volume":"17","author":"Lee","year":"2010","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_12","first-page":"2510","article-title":"Semi-supervised non-negative matrix factorization with dissimilarity and similarity regularization","volume":"31","author":"Jia","year":"2019","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2550","DOI":"10.1109\/TCYB.2020.2969684","article-title":"Semisupervised adaptive symmetric non-negative matrix factorization","volume":"51","author":"Jia","year":"2020","journal-title":"IEEE Trans. 
Cybern."},{"key":"ref_14","unstructured":"Haddock, J., Kassab, L., Li, S., Kryshchenko, A., Grotheer, R., Sizikova, E., Wang, C., Merkh, T., Madushani, R.W.M.A., and Ahn, M. (2022, January 29). Semi-Supervised NMF Models for Topic Modeling in Learning Tasks. Available online: https:\/\/arxiv.org\/pdf\/2010.07956."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Vendrow, J., Haddock, J., Rebrova, E., and Needell, D. (2021, January 6\u201311). On a Guided Nonnegative Matrix Factorization. Proceedings of the ICASSP 2021\u20142021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada.","DOI":"10.1109\/ICASSP39728.2021.9413656"},{"key":"ref_16","unstructured":"MacMillan, K., and Wilson, J.D. (2017). Topic supervised non-negative matrix factorization. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1589","DOI":"10.1109\/TNN.2007.895831","article-title":"On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization","volume":"18","author":"Lin","year":"2007","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_18","unstructured":"Budahazy, R., Cheng, L., Huang, Y., Johnson, A., Li, P., Vendrow, J., Wu, Z., Molitor, D., Rebrova, E., and Needell, D. (2022, January 29). Analysis of Legal Documents via Non-negative Matrix Factorization Methods. Available online: https:\/\/arxiv.org\/pdf\/2104.14028."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Lang, K. (1995, January 9\u201312). Newsweeder: Learning to filter netnews. Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, CA, USA.","DOI":"10.1016\/B978-1-55860-377-6.50048-7"},{"key":"ref_20","unstructured":"Opitz, J., and Burst, S. (2022, January 29). Macro F1 and Macro F1. Available online: https:\/\/arxiv.org\/pdf\/1911.03347."},{"key":"ref_21","unstructured":"Mimno, D., Wallach, H., Talley, E., Leenders, M., and McCallum, A. (2011, January 27\u201331). 
Optimizing Semantic Coherence in Topic Models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_22","unstructured":"Bird, S., Klein, E., and Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, O\u2019Reilly Media."},{"key":"ref_23","unstructured":"Ramos, J. (2022, January 29). Using tf-idf to Determine Word Relevance in Document Queries. Available online: https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.121.1424&rep=rep1&type=pdf."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1007\/s11859-007-0038-4","article-title":"Keyword extraction based on tf\/idf for Chinese news document","volume":"12","author":"Li","year":"2007","journal-title":"Wuhan Univ. J. Nat. Sci."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","article-title":"Term-weighting approaches in automatic text retrieval","volume":"24","author":"Salton","year":"1988","journal-title":"Inf. Process. Manag."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3434237","article-title":"A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models","volume":"20","author":"Naseem","year":"2021","journal-title":"Trans. Asian Low-Resour. Lang. Inf. Process."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1108\/00220410410560582","article-title":"Understanding inverse document frequency: On theoretical arguments for IDF","volume":"60","author":"Robertson","year":"2004","journal-title":"J. Doc."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kwok, I., and Wang, Y. (2013, January 14\u201318). Locate the hate: Detecting tweets against blacks. 
Proceedings of the Twenty-seventh AAAI Conference on Artificial Intelligence, Bellevue, WA, USA.","DOI":"10.1609\/aaai.v27i1.8539"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1002\/poi3.85","article-title":"Cyber hate speech on Twitter: An application of machine classification and statistical modeling for policy and decision-making","volume":"7","author":"Burnap","year":"2015","journal-title":"Policy Internet"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Grotheer, R., Huang, L., Huang, Y., Kryshchenko, A., Kryshchenko, O., Li, P., Li, X., Rebrova, E., Ha, K., and Needell, D. (2020, January 13\u201317). COVID-19 Literature Topic-Based Search via Hierarchical NMF. Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Virtual Event.","DOI":"10.18653\/v1\/2020.nlpcovid19-2.4"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1111\/j.2517-6161.1974.tb00994.x","article-title":"Cross-validatory choice and assessment of statistical predictions","volume":"36","author":"Stone","year":"1974","journal-title":"J. R. Stat. Soc. Ser. B Methodol."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/5\/136\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:57:41Z","timestamp":1760137061000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/5\/136"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,20]]},"references-count":31,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["a15050136"],"URL":"https:\/\/doi.org\/10.3390\/a15050136","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,4,20]]}}}