{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T14:40:16Z","timestamp":1776523216114,"version":"3.51.2"},"reference-count":23,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,4,23]],"date-time":"2023-04-23T00:00:00Z","timestamp":1682208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["UIDB\/04111\/2020"],"award-info":[{"award-number":["UIDB\/04111\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>This article presents a dataset of 10,917 news articles with hierarchical news categories collected between 1 January 2019 and 31 December 2019. We manually labeled the articles based on a hierarchical taxonomy with 17 first-level and 109 second-level categories. This dataset can be used to train machine learning models for automatically classifying news articles by topic. This dataset can be helpful for researchers working on news structuring, classification, and predicting future events based on released news.<\/jats:p>","DOI":"10.3390\/data8050074","type":"journal-article","created":{"date-parts":[[2023,4,24]],"date-time":"2023-04-24T03:38:07Z","timestamp":1682307487000},"page":"74","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0652-6068","authenticated-orcid":false,"given":"Alina","family":"Petukhova","sequence":"first","affiliation":[{"name":"COPELABS, Lus\u00f3fona University, Campo Grande 376, 1749-024 Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8487-5837","authenticated-orcid":false,"given":"Nuno","family":"Fachada","sequence":"additional","affiliation":[{"name":"COPELABS, Lus\u00f3fona University, Campo Grande 376, 1749-024 Lisbon, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"100336","DOI":"10.1016\/j.patter.2021.100336","article-title":"Data and its (dis)contents: A survey of dataset development and use in machine learning research","volume":"2","author":"Paullada","year":"2021","journal-title":"Patterns"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Jayakody, N., Mohammad, A., and Halgamuge, M. (2022, January 17\u201320). Fake News Detection using a Decentralized Deep Learning Model and Federated Learning. Proceedings of the IECON 2022\u201448th Annual Conference of the IEEE Industrial Electronics Society, Brussels, Belgium.","DOI":"10.1109\/IECON49645.2022.9968358"},{"key":"ref_3","unstructured":"Stefansson, J.K. (2014). Quantitative Measure of Evaluative Labeling in News Reports: Psychology of Communication Bias Studied by Content Analysis and Semantic Differential. [Master\u2019s Thesis, UiT, Norway\u2019s Arctic University]."},{"key":"ref_4","unstructured":"Gezici, G. (2022). Quantifying Political Bias in News Articles. arXiv."},{"key":"ref_5","unstructured":"Mitchell, T. (2023, April 10). 20 Newsgroups Data Set. Available online: http:\/\/qwone.com\/~jason\/20Newsgroups\/."},{"key":"ref_6","unstructured":"(2023, April 10). AG\u2019s Corpus of News Articles. Available online: http:\/\/groups.di.unipi.it\/~gulli\/AG_corpus_of_news_articles.html."},{"key":"ref_7","unstructured":"Rus, V., and Markov, Z. (2017, January 22\u201324). RIPML: A Restricted Isometry Property-Based Approach to Multilabel Learning. Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017, Marco Island, FL, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chen, S., Soni, A., Pappu, A., and Mehdad, Y. (2017, January 3). DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging. Proceedings of the Rep4NLP@ACL, Vancouver, BC, Canada.","DOI":"10.18653\/v1\/W17-2614"},{"key":"ref_9","unstructured":"Misra, R. (2022). News Category Dataset. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Roberts, H., Bhargava, R., Valiukas, L., Jen, D., Malik, M., Bishop, C., Ndulue, E., Dave, A., Clark, J., and Etling, B. (2021). Media Cloud: Massive Open Source Collection of Global News on the Open Web. arXiv.","DOI":"10.1609\/icwsm.v15i1.18127"},{"key":"ref_11","unstructured":"Gruppi, M., Horne, B.D., and Adal\u0131, S. (2020). NELA-GT-2019: A Large Multi-Labelled News Dataset for The Study of Misinformation in News Articles. arXiv."},{"key":"ref_12","unstructured":"(2023, April 10). IPTC NewsCodes Scheme (Controlled Vocabulary). Available online: https:\/\/cv.iptc.org\/newscodes\/mediatopic\/."},{"key":"ref_13","unstructured":"(2023, April 10). IPTC Media Topics\u2014Vocabulary Published on 25 February 2020. Available online: https:\/\/www.iptc.org\/std\/NewsCodes\/previous-versions\/IPTC-MediaTopic-NewsCodes_2020-02-25.xlsx."},{"key":"ref_14","unstructured":"(2022, November 21). NewsCodes\u2014Controlled Vocabularies for the Media. Available online: https:\/\/iptc.org\/standards\/newscodes\/#:~:text=Who%20uses%20IPTC%20NewsCodes%3F,becoming%20more%20and%20more%20popular."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Sammut, C., and Webb, G.I. (2010). Encyclopedia of Machine Learning, Springer.","DOI":"10.1007\/978-0-387-30164-8"},{"key":"ref_16","first-page":"2825","article-title":"Scikit-Learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C. (2014, January 25\u201329). Glove: Global Vectors for Word Representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Manning, C.D., Raghavan, P., and Sch\u00fctze, H. (2008). Introduction to Information Retrieval, Cambridge University Press.","DOI":"10.1017\/CBO9780511809071"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1111\/j.2517-6161.1958.tb00292.x","article-title":"The regression analysis of binary sequences","volume":"20","author":"Cox","year":"1958","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_22","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1007\/s10618-010-0175-9","article-title":"A survey of hierarchical classification across different application domains","volume":"22","author":"Silla","year":"2011","journal-title":"Data Min. Knowl. Discov."}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/5\/74\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:21:57Z","timestamp":1760124117000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/8\/5\/74"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,23]]},"references-count":23,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["data8050074"],"URL":"https:\/\/doi.org\/10.3390\/data8050074","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,23]]}}}