{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T21:49:40Z","timestamp":1779313780950,"version":"3.51.4"},"reference-count":32,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,7,5]],"date-time":"2022-07-05T00:00:00Z","timestamp":1656979200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We have found that deep learning models are more accurate in this task than topological data analysis. However, assembling a deep learning model with topological data analysis significantly improves the model\u2019s accuracy if the available training set is very small.<\/jats:p>","DOI":"10.3390\/bdcc6030074","type":"journal-article","created":{"date-parts":[[2022,7,5]],"date-time":"2022-07-05T10:21:33Z","timestamp":1657016493000},"page":"74","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets"],"prefix":"10.3390","volume":"6","author":[{"given":"Ran","family":"Deng","sequence":"first","affiliation":[{"name":"School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9864-1249","authenticated-orcid":false,"given":"Fedor","family":"Duzhin","sequence":"additional","affiliation":[{"name":"School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"501","DOI":"10.1146\/annurev-statistics-031017-100045","article-title":"Topological Data Analysis","volume":"5","author":"Wasserman","year":"2018","journal-title":"Annu. Rev. Stat. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"255","DOI":"10.1090\/S0273-0979-09-01249-X","article-title":"Topology and Data","volume":"46","author":"Carlsson","year":"2009","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1090\/S0273-0979-07-01191-3","article-title":"Barcodes: The persistent topology of Data","volume":"45","author":"Ghrist","year":"2007","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kraetzer, C., Shi, Y.Q., Dittmann, J., and Kim, H.J. (2017). Topological Data Analysis for Image Tampering Detection. Digital Forensics and Watermarking, Springer International Publishing.","DOI":"10.1007\/978-3-319-64185-0"},{"key":"ref_5","first-page":"673","article-title":"Topological data analysis in computer vision","volume":"Volume 11433","author":"Osten","year":"2020","journal-title":"Proceedings of the Twelfth International Conference on Machine Vision (ICMV 2019)"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Seversky, L.M., Davis, S., and Berger, M. (2016, January 27\u201330). On Time-Series Topological Data Analysis: New Data and Opportunities. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Las Vegas, NV, USA.","DOI":"10.1109\/CVPRW.2016.131"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0126383","article-title":"Topological Data Analysis of Biological Aggregation Models","volume":"10","author":"Topaz","year":"2015","journal-title":"PLoS ONE"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"814","DOI":"10.1002\/cnm.2655","article-title":"Persistent homology analysis of protein structure, flexibility, and folding","volume":"30","author":"Xia","year":"2014","journal-title":"Int. J. Numer. Methods Biomed. Eng."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"656","DOI":"10.1162\/netn_a_00073","article-title":"The importance of the whole: Topological Data Analysis for the network neuroscientist","volume":"3","author":"Sizemore","year":"2019","journal-title":"Netw. Neurosci."},{"key":"ref_10","unstructured":"Rucco, M., Falsetti, L., Herman, D., Petrossian, T., Merelli, E., Nitti, C., and Salvi, A. (2014). Using Topological Data Analysis for diagnosis pulmonary embolism. arXiv."},{"key":"ref_11","unstructured":"Zhu, X. (2013, January 3\u20139). Persistent homology: An introduction and a new text representation for natural language processing. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Dutoit, T., Mart\u00edn-Vide, C., and Pironkov, G. (2018). Movie Genre Detection Using Topological Data Analysis. Statistical Language and Speech Processing, Springer International Publishing.","DOI":"10.1007\/978-3-030-00810-9"},{"key":"ref_13","unstructured":"Hoang, Q. (2018). Predicting Movie Genres Based on Plot Summaries. arXiv."},{"key":"ref_14","unstructured":"Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv."},{"key":"ref_15","unstructured":"Gholizadeh, S., Seyeditabari, A., and Zadrozny, W. (2020). A Novel Method of Extracting Topological Features from Word Embeddings. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Chen, T., and Guestrin, C. (2016, January 13\u201317). Xgboost: A scalable tree boosting system. Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2939672.2939785"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Lai, S., Xu, L., Liu, K., and Zhao, J. (2015, January 25\u201330). Recurrent convolutional neural networks for text classification. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9513"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (2016). Bag of tricks for efficient text classification. arXiv.","DOI":"10.18653\/v1\/E17-2068"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Pennington, J., Socher, R., and Manning, C.D. (2014, January 25\u201329). Glove: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1162"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Speer, R., Chin, J., and Havasi, C. (2017, January 4\u20139). Conceptnet 5.5: An open multilingual graph of general knowledge. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11164"},{"key":"ref_21","unstructured":"Gholizadeh, S., Savle, K., Seyeditabari, A., and Zadrozny, W. (2020). Topological Data Analysis in Text Classification: Extracting Features with Additive Information. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Gholizadeh, S., Seyeditabari, A., and Zadrozny, W. (2018). Topological signature of 19th century novelists: Persistent homology in text mining. Big Data Cogn. Comput., 2.","DOI":"10.20944\/preprints201809.0466.v1"},{"key":"ref_24","unstructured":"Elyasi, N., and Moghadam, M.H. (2019). An Introduction to a New Text Classification and Visualization for Natural Language Processing Using Topological Data Analysis. arXiv."},{"key":"ref_25","unstructured":"Singh, G., M\u00e9moli, F., and Carlsson, G.E. (2007, January 2\u20133). Topological methods for the analysis of high dimensional data sets and 3d object recognition. Proceedings of the Eurographics Symposium on Point-Based Graphics, Prague, Czech Republic."},{"key":"ref_26","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_27","first-page":"1","article-title":"Persistence images: A stable vector representation of persistent homology","volume":"18","author":"Adams","year":"2017","journal-title":"J. Mach. Learn. Res."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"e9","DOI":"10.1002\/spy2.9","article-title":"Detecting opinion spams and fake news using text classification","volume":"1","author":"Ahmed","year":"2017","journal-title":"Secur. Priv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Ahmed, H., Traor\u00e9, I., and Saad, S. (2017, January 25\u201327). Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques. Proceedings of the International Conference on Intelligent, Secure, and Dependable Systems in Distributed and Cloud Environments, Vancouver, BC, Canada.","DOI":"10.1007\/978-3-319-69155-8_9"},{"key":"ref_30","unstructured":"Hornik, K. (2022, April 15). openNLP: Apache OpenNLP Tools Interface, R Package Version 0.2-6. Available online: https:\/\/CRAN.R-project.org\/package=openNLP."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"860","DOI":"10.21105\/joss.00860","article-title":"TDAstats: R pipeline for computing persistent homology in topological data analysis","volume":"3","author":"Wadhwa","year":"2018","journal-title":"J. Open Source Softw."},{"key":"ref_32","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/3\/74\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:43:01Z","timestamp":1760139781000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/3\/74"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,5]]},"references-count":32,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,9]]}},"alternative-id":["bdcc6030074"],"URL":"https:\/\/doi.org\/10.3390\/bdcc6030074","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,5]]}}}