{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T03:19:48Z","timestamp":1777778388580,"version":"3.51.4"},"reference-count":69,"publisher":"SAGE Publications","issue":"2","license":[{"start":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T00:00:00Z","timestamp":1737936000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T00:00:00Z","timestamp":1737936000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Information Visualization"],"published-print":{"date-parts":[[2025,4]]},"abstract":"<jats:p>The sensemaking process of large sets of text documents is highly challenging for tasks such as obtaining a comprehensive overview or keeping up with the most important trends and topics. Even though several established methods for condensation and summarization of large text corpora exist, many of them lack the ability to account for difference in prevalence between identified topics, which in turn impedes quantitative analysis. In this paper, we therefore propose a novel prevalence-aware method for topic extraction, and show how it can be used to obtain important insights from two text corpora with very different content. We also implemented a prototype visual analytics tool which guides the user in the search for relevant insights and promotes trust in the yielded results. We have verified our application by a user study, as well as by a validation run on a data set with previously known topic structure. 
The results clearly show that our approach is suitable for text mining, that it can be used by non-experts, and that it offers features which make it an interesting candidate for use in several different analysis scenarios.<\/jats:p>","DOI":"10.1177\/14738716241312400","type":"journal-article","created":{"date-parts":[[2025,1,29]],"date-time":"2025-01-29T13:33:18Z","timestamp":1738157598000},"page":"179-198","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Visually guided extraction of prevalent topics"],"prefix":"10.1177","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6150-0787","authenticated-orcid":false,"given":"Daniel","family":"Witschard","sequence":"first","affiliation":[{"name":"Department of Computer Science and Media Technology, Linnaeus University, V\u00e4xj\u00f6, Sweden"}]},{"given":"Ilir","family":"Jusufi","sequence":"additional","affiliation":[{"name":"Blekinge Institute of Technology, Karlskrona, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1907-7820","authenticated-orcid":false,"given":"Kostiantyn","family":"Kucher","sequence":"additional","affiliation":[{"name":"Link\u00f6ping University, Norrk\u00f6ping, Sweden"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0519-2537","authenticated-orcid":false,"given":"Andreas","family":"Kerren","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Media Technology, Linnaeus University, V\u00e4xj\u00f6, Sweden"},{"name":"Link\u00f6ping University, Norrk\u00f6ping, Sweden"}]}],"member":"179","published-online":{"date-parts":[[2025,1,27]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-022-13428-4"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2020.04.020"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2022.102131"},{"key":"e_1_3_2_5_2","volume-title":"Proceedings of the Eurographics 
conference on visualization (EuroVis)\u2014STARs","author":"J\u00e4nicke S","unstructured":"J\u00e4nicke S, Franzini G, Cheema MF, et al. On close and distant reading in digital humanities: a survey and future challenges. In: Proceedings of the Eurographics conference on visualization (EuroVis)\u2014STARs. Eindhoven, Netherlands: The Eurographics Association."},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2856530"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.05.045"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0255-7"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.3366\/cor.2017.0118"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.54691\/bcpssh.v14i.226"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1177\/14738716221114372"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14034"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1111\/cgf.14859"},{"key":"e_1_3_2_14_2","first-page":"348","article-title":"Statistical bibliography or bibliometrics?","volume":"25","author":"Pritchard A.","year":"1969","unstructured":"Pritchard A. Statistical bibliography or bibliometrics? J Doc 1969; 25: 348\u2013349.","journal-title":"J Doc"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254"},{"key":"e_1_3_2_16_2","first-page":"117","volume-title":"Proceedings of the IEEE Pacific visualization symposium","author":"Kucher K","unstructured":"Kucher K, Kerren A. Text visualization techniques: taxonomy, visual survey, and community insights. In: Proceedings of the IEEE Pacific visualization symposium. PacificVis \u201915, pp.117\u2013121. 
Los Alamitos, USA: IEEE Computer Society."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2834341"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12650-017-0462-2"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2016.2610422"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2018.2815030"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20317"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2024.3456199"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2020.3033401"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2019.2934654"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.dash-1.3"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2009.140"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0242283"},{"key":"e_1_3_2_28_2","first-page":"720","volume-title":"Proceedings of the 2013 IEEE\/ACM international conference on advances in social networks analysis and mining (ASONAM 2013)","author":"Malik S","unstructured":"Malik S, Smith A, Hawes T, et al. TopicFlow: visualizing topic alignment of Twitter data over time. In: Proceedings of the 2013 IEEE\/ACM international conference on advances in social networks analysis and mining (ASONAM 2013), New York City, USA. pp.720\u2013726."},{"key":"e_1_3_2_29_2","first-page":"85","volume-title":"Proceedings of the 2016 IEEE second international conference on big data computing service and applications (BigDataService)","author":"Su J","unstructured":"Su J, Boydell O. TopicListener: observing key topics from multi-channel speech audio streams. In: Proceedings of the 2016 IEEE second international conference on big data computing service and applications (BigDataService). 
Los Alamitos, USA: IEEE Computer Society, pp.85\u201394."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00371-019-01721-7"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-73531-3"},{"key":"e_1_3_2_32_2","first-page":"1137","article-title":"A neural probabilistic language model","volume":"3","author":"Bengio Y","year":"2003","unstructured":"Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. J Mach Learn Res 2003; 3: 1137\u20131155.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.50"},{"key":"e_1_3_2_34_2","first-page":"2493","article-title":"Natural language processing (almost) from scratch","volume":"12","author":"Collobert R","year":"2011","unstructured":"Collobert R, Weston J, Bottou L, et al. Natural language processing (almost) from scratch. J Mach Learn Res 2011; 12: 2493\u20132537.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_2_35_2","first-page":"384","volume-title":"Proceedings of the 48th annual meeting of the association for computational linguistics","author":"Turian J","unstructured":"Turian J, Ratinov L, Bengio Y. Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th annual meeting of the association for computational linguistics. ACL \u201910, pp.384\u2013394. Stroudsburg, USA: Association for Computational Linguistics."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","unstructured":"Toshevska M Stojanovska F Kalajdjieski J. Comparative analysis of word embeddings for capturing word similarities. In: Proceedings of the international conference on natural language processing. NATP \u201920 arXiv. DOI:10.5121\/csit.2020.100402.","DOI":"10.5121\/csit.2020.100402"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","unstructured":"Almeida F Xex\u00e9o G. Word embeddings: a survey. arXiv Preprints 2019. 
DOI: 10.48550\/arXiv.1901.09069.1901.09069.","DOI":"10.48550\/arXiv.1901.09069.1901.09069"},{"key":"e_1_3_2_38_2","volume-title":"Advances in neural information processing systems","author":"Mikolov T","unstructured":"Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, vol. 26. Red Hook, USA: Curran Associates, Inc."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","unstructured":"Devlin J Chang M Lee K et al. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv Preprints 2018. DOI: 10.48550\/arXiv.1810.04805.","DOI":"10.48550\/arXiv.1810.04805"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","unstructured":"Cohan A Feldman S Beltagy I et al. SPECTER: document-level representation learning using citation-informed transformers. In: Proceedings of the annual meeting of the association for computational linguistics. Stroudsburg USA: ACL pp.2270\u20132282. DOI: 10.18653\/v1\/2020.acl-main.207.","DOI":"10.18653\/v1\/2020.acl-main.207"},{"key":"e_1_3_2_41_2","first-page":"1188","volume-title":"Proceedings of the 31st international conference on machine learning","author":"Le Q","unstructured":"Le Q, Mikolov T. Distributed representations of sentences and documents. In: Proceedings of the 31st international conference on machine learning. ICML \u201914, Beijing, China: PMLR, pp.1188\u20131196."},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1551-6709.2010.01106.x"},{"key":"e_1_3_2_43_2","first-page":"1631","volume-title":"Proceedings of the 2013 conference on empirical methods in natural language processing","author":"Socher R","unstructured":"Socher R, Perelygin A, Wu J, et al. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing. 
EMNLP \u201913, Stroudsburg, USA: Association for Computational Linguistics, pp.1631\u20131642."},{"key":"e_1_3_2_44_2","first-page":"655","volume-title":"Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers)","author":"Kalchbrenner N","unstructured":"Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers). ACL \u201914, Stroudsburg, USA: Association for Computational Linguistics, pp.655\u2013665."},{"key":"e_1_3_2_45_2","volume-title":"Advances in neural information processing systems","author":"Kiros R","unstructured":"Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. In: Advances in neural information processing systems, vol. 28. Red Hook, USA: Curran Associates, Inc."},{"key":"e_1_3_2_46_2","first-page":"169","volume-title":"Proceedings of the 2018 conference on empirical methods in natural language processing: system demonstrations","author":"Cer D","unstructured":"Cer D, Yang Y, Kong S, et al. Universal sentence encoder for English. In Proceedings of the 2018 conference on empirical methods in natural language processing: system demonstrations. EMNLP \u201918, Stroudsburg, USA: Association for Computational Linguistics, pp.169\u2013174."},{"key":"e_1_3_2_47_2","first-page":"3982","volume-title":"Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing","author":"Reimers N","unstructured":"Reimers N, Gurevych I. Sentence-BERT: sentence embeddings using Siamese BERT-networks. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing. 
EMNLP-IJCNLP \u201919, Stroudsburg, USA: Association for Computational Linguistics, pp.3982\u20133992."},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560591"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007617005950"},{"key":"e_1_3_2_51_2","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei DM","year":"2003","unstructured":"Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3: 993\u20131022.","journal-title":"J Mach Learn Res"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1038\/44565"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.48550\/2203.05794.2203.05794"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2008.09470.2008.09470"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2016.2615308"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2018.2865146"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.1902"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21227"},{"key":"e_1_3_2_59_2","doi-asserted-by":"crossref","unstructured":"Kucher K Martins RM Kerren A. Analysis of VINCI 2009\u20132017 Proceedings. In: VINCI \u201918 Association for Computing Machinery. New York USA: ACM pp.97\u2013101.","DOI":"10.1145\/3231622.3231641"},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2404.05961"},{"key":"e_1_3_2_61_2","unstructured":"Meta. Meta-Llama-3-8B https:\/\/huggingface.co\/meta-llama\/Meta-Llama-3-8B (2024 accessed 18 December 2024)."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2013.126"},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","unstructured":"Minaee S Mikolov T Nikzad N et al. 
Large language models: a survey. arXiv Preprints 2024. DOI: 10.48550\/arXiv.2402.06196.","DOI":"10.48550\/arXiv.2402.06196"},{"key":"e_1_3_2_64_2","doi-asserted-by":"crossref","unstructured":"Wang L Yang N Huang X et al. Improving text embeddings with large language models. In: Proceedings of the annual meeting of the association for computational linguistics (volume 1: long papers). Stroudsburg USA: ACL pp.11897\u201311916.","DOI":"10.18653\/v1\/2024.acl-long.642"},{"key":"e_1_3_2_65_2","first-page":"2014","volume-title":"Proceedings of the conference of the European Chapter of the association for computational linguistics","author":"Muennighoff N","unstructured":"Muennighoff N, Tazi N, Magne L, et al. MTEB: Massive text embedding benchmark. In: Vlachos A, Augenstein I (eds.) Proceedings of the conference of the European Chapter of the association for computational linguistics. EACL \u201923, Stroudsburg, USA: ACL, pp.2014\u20132037."},{"issue":"2","key":"e_1_3_2_66_2","first-page":"1149","article-title":"A survey of community detection approaches: from statistical modeling to deep learning","volume":"35","author":"Jin D","year":"2023","unstructured":"Jin D, Yu Z, Jiao P, et al. A survey of community detection approaches: from statistical modeling to deep learning. 
IEEE Trans Knowl Data Eng 2023; 35(2): 1149\u20131170.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TBDATA.2018.2850013"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2022.3231230"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.3390\/info11090421"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1145\/3440755"}],"container-title":["Information Visualization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14738716241312400","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/14738716241312400","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/14738716241312400","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T19:19:23Z","timestamp":1777490363000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/14738716241312400"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,27]]},"references-count":69,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,4]]}},"alternative-id":["10.1177\/14738716241312400"],"URL":"https:\/\/doi.org\/10.1177\/14738716241312400","relation":{},"ISSN":["1473-8716","1473-8724"],"issn-type":[{"value":"1473-8716","type":"print"},{"value":"1473-8724","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,27]]}}}