{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,23]],"date-time":"2025-03-23T04:23:56Z","timestamp":1742703836887,"version":"3.40.2"},"reference-count":85,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T00:00:00Z","timestamp":1742515200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"The Knowledge Foundation, Sweden","award":["20210077"],"award-info":[{"award-number":["20210077"]}]},{"DOI":"10.13039\/501100005967","name":"Linnaeus University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005967","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Netw Sci"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Similarity-based analysis is a powerful and intuitive tool for exploring large data sets, for instance, for revealing patterns by grouping items by similarity or for recommending items based on selected samples. However, similarity is an abstract and subjective property which makes it hard to evaluate by a purely computational approach. Furthermore, there are usually several possible computational models that could be applied to the data, each with its own strengths and weaknesses. With this in mind, we aim to extend the research frontier regarding what impact the choice of a computational model may have on the results. 
In this paper, we target the scope of embedding-based similarity calculations on text documents and seek to answer the research question: \u201cHow can a better understanding of the continuous similarity distribution captured by different models lead to better similarity calculations on document sets?\u201d. We propose a new and generic methodology based on similarity network comparison, and based on this approach, we have developed a computational pipeline together with a prototype visual analytics tool that allows the user to easily assess the level of model agreement\/disagreement. To demonstrate the potential of our method, as well as showing its application to real world scenarios, we apply it in an experimental setup using three state-of-the-art text embedding models and three different text corpora. In view of the surprisingly low level of model agreement regarding the data, we also discuss strategies for handling model disagreement.<\/jats:p>","DOI":"10.1007\/s41109-025-00699-7","type":"journal-article","created":{"date-parts":[[2025,3,23]],"date-time":"2025-03-23T00:04:58Z","timestamp":1742688298000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Using similarity network analysis to improve text similarity calculations"],"prefix":"10.1007","volume":"10","author":[{"given":"Daniel","family":"Witschard","sequence":"first","affiliation":[]},{"given":"Kostiantyn","family":"Kucher","sequence":"additional","affiliation":[]},{"given":"Ilir","family":"Jusufi","sequence":"additional","affiliation":[]},{"given":"Andreas","family":"Kerren","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,21]]},"reference":[{"key":"699_CR1","doi-asserted-by":"publisher","first-page":"52138","DOI":"10.1109\/ACCESS.2018.2870052","volume":"6","author":"A Adadi","year":"2018","unstructured":"Adadi A, Berrada M (2018) Peeking inside the black-box: a survey on 
explainable artificial intelligence (XAI). IEEE Access 6:52138\u201352160. https:\/\/doi.org\/10.1109\/ACCESS.2018.2870052","journal-title":"IEEE Access"},{"key":"699_CR2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-73531-3","volume-title":"Machine learning for text","author":"CC Aggarwal","year":"2018","unstructured":"Aggarwal CC (2018) Machine learning for text. Springer International Publishing, Cham. https:\/\/doi.org\/10.1007\/978-3-319-73531-3"},{"key":"699_CR3","unstructured":"Agirre E, Cer D, Diab M, et\u00a0al (2012) SemEval-2012 Task 6: A pilot on semantic textual similarity. In: *SEM 2012: the first joint conference on lexical and computational semantics \u2013 vol 1: proceedings of the main conference and the shared task, and vol 2: proceedings of the 6th international workshop on semantic evaluation (SemEval 2012). ACL, Montr\u00e9al, pp 385\u2013393, https:\/\/aclanthology.org\/S12-1051"},{"key":"699_CR4","doi-asserted-by":"publisher","unstructured":"Almeida F, Xex\u00e9o G (2019) Word embeddings: a survey. CoRR abs\/1901.09069. https:\/\/doi.org\/10.48550\/arXiv.1901.09069","DOI":"10.48550\/arXiv.1901.09069"},{"issue":"4","key":"699_CR5","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1609\/aimag.v35i4.2513","volume":"35","author":"S Amershi","year":"2014","unstructured":"Amershi S, Cakmak M, Knox WB et al (2014) Power to the people: the role of humans in interactive machine learning. AI Magazine 35(4):105\u2013120. https:\/\/doi.org\/10.1609\/aimag.v35i4.2513","journal-title":"AI Magazine"},{"key":"699_CR6","doi-asserted-by":"publisher","DOI":"10.5311\/JOSIS.2013.7.128","author":"A Ballatore","year":"2014","unstructured":"Ballatore A, Bertolotto M, Wilson D (2014) The semantic similarity ensemble. J Inf Spatial Sci. https:\/\/doi.org\/10.5311\/JOSIS.2013.7.128","journal-title":"J Inf Spatial Sci"},{"key":"699_CR7","unstructured":"Bax ET, Le Y (2015) Some theory for practical classifier validation. ArXiv abs\/1510.02676. 
https:\/\/api.semanticscholar.org\/CorpusID:14729266"},{"key":"699_CR8","doi-asserted-by":"publisher","unstructured":"Beck F, Krause C (2022) Visually explaining publication ranks in citation-based literature search with PURE Suggest. In: EuroVis 2022 - Posters. The Eurographics Association, https:\/\/doi.org\/10.2312\/evp.20221110","DOI":"10.2312\/evp.20221110"},{"key":"699_CR9","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1162\/tacl_a_00254","volume":"7","author":"Y Belinkov","year":"2019","unstructured":"Belinkov Y, Glass J (2019) Analysis methods in neural language processing: a survey. Trans Assoc Comput Linguist 7:49\u201372. https:\/\/doi.org\/10.1162\/tacl_a_00254","journal-title":"Trans Assoc Comput Linguist"},{"key":"699_CR10","first-page":"1137","volume":"3","author":"Y Bengio","year":"2003","unstructured":"Bengio Y, Ducharme R, Vincent P et al (2003) A neural probabilistic language model. J Mach Learn Res 3:1137\u20131155","journal-title":"J Mach Learn Res"},{"issue":"8","key":"699_CR11","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798\u20131828. https:\/\/doi.org\/10.1109\/TPAMI.2013.50","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"699_CR12","doi-asserted-by":"publisher","first-page":"98144","DOI":"10.1109\/ACCESS.2019.2929754","volume":"7","author":"A Benito-Santos","year":"2019","unstructured":"Benito-Santos A, Ther\u00f3n S\u00e1nchez R (2019) Cross-domain visual exploration of academic corpora via the latent meaning of user-authored keywords. IEEE Access 7:98144\u201398160. 
https:\/\/doi.org\/10.1109\/ACCESS.2019.2929754","journal-title":"IEEE Access"},{"issue":"1","key":"699_CR13","doi-asserted-by":"publisher","first-page":"691","DOI":"10.1109\/TVCG.2016.2598667","volume":"23","author":"M Berger","year":"2017","unstructured":"Berger M, McDonough K, Seversky LM (2017) cite2vec: Citation-driven document exploration via word embeddings. IEEE Trans Visual Comput Gr 23(1):691\u2013700. https:\/\/doi.org\/10.1109\/TVCG.2016.2598667","journal-title":"IEEE Trans Visual Comput Gr"},{"key":"699_CR14","doi-asserted-by":"publisher","DOI":"10.3390\/jintelligence11090172","author":"I Bianchi","year":"2023","unstructured":"Bianchi I, Burro R (2023) The perception of similarity, difference and opposition. J Intell. https:\/\/doi.org\/10.3390\/jintelligence11090172","journal-title":"J Intell"},{"key":"699_CR15","unstructured":"Boggust A, Carter B, Satyanarayan A (2019) Embedding comparator: visualizing differences in global structure and local neighborhoods via small multiples. CoRR abs\/1912.04853. arXiv:1912.04853"},{"issue":"12","key":"699_CR16","doi-asserted-by":"publisher","first-page":"2389","DOI":"10.1002\/asi.21419","volume":"61","author":"KW Boyack","year":"2010","unstructured":"Boyack KW, Klavans R (2010) Co-citation analysis, bibliographic coupling, and direct citation: which citation approach represents the research front most accurately? J Am Soc Inf Sci Technol 61(12):2389\u20132404. https:\/\/doi.org\/10.1002\/asi.21419","journal-title":"J Am Soc Inf Sci Technol"},{"issue":"2","key":"699_CR17","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1109\/MCG.2020.3033401","volume":"41","author":"P Caillou","year":"2021","unstructured":"Caillou P, Renault J, Fekete JD et al (2021) Cartolabe: A web-based scalable visualization of large document collections. IEEE Comput Gr Appl 41(2):76\u201388. 
https:\/\/doi.org\/10.1109\/MCG.2020.3033401","journal-title":"IEEE Comput Gr Appl"},{"key":"699_CR18","doi-asserted-by":"publisher","unstructured":"Cer D, Yang Y, Kong S, et\u00a0al (2018) Universal sentence encoder for English. In: Proceedings of the conference on empirical methods in natural language processing: system demonstrations. Association for Computational Linguistics, EMNLP\u00a0\u201918, pp 169\u2013174. https:\/\/doi.org\/10.18653\/v1\/D18-2029","DOI":"10.18653\/v1\/D18-2029"},{"key":"699_CR19","doi-asserted-by":"publisher","DOI":"10.1145\/3440755","author":"D Chandrasekaran","year":"2021","unstructured":"Chandrasekaran D, Mago V (2021) Evolution of semantic similarity-a survey. ACM Comput Surv. https:\/\/doi.org\/10.1145\/3440755","journal-title":"ACM Comput Surv"},{"issue":"3","key":"699_CR20","doi-asserted-by":"publisher","first-page":"713","DOI":"10.1111\/cgf.14034","volume":"39","author":"A Chatzimparmpas","year":"2020","unstructured":"Chatzimparmpas A, Martins RM, Jusufi I et al (2020) The state of the art in enhancing trust in machine learning models with the use of visualizations. Comput Gr Forum 39(3):713\u2013756. https:\/\/doi.org\/10.1111\/cgf.14034","journal-title":"Comput Gr Forum"},{"issue":"3","key":"699_CR21","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1002\/asi.20317","volume":"57","author":"C Chen","year":"2006","unstructured":"Chen C (2006) CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. J Am Soc Inf Sci Technol 57(3):359\u2013377. https:\/\/doi.org\/10.1002\/asi.20317","journal-title":"J Am Soc Inf Sci Technol"},{"issue":"6","key":"699_CR22","doi-asserted-by":"publisher","first-page":"1161","DOI":"10.1109\/TVCG.2009.140","volume":"15","author":"Y Chen","year":"2009","unstructured":"Chen Y, Wang L, Dong M et al (2009) Exemplar-based visualization of large document corpus. IEEE Trans Visual Comput Gr 15(6):1161\u20131168. 
https:\/\/doi.org\/10.1109\/TVCG.2009.140","journal-title":"IEEE Trans Visual Comput Gr"},{"key":"699_CR23","first-page":"43","volume":"8","author":"S Choi","year":"2009","unstructured":"Choi S, Cha SH, Tappert C (2009) A survey of binary similarity and distance measures. J Syst Cybern Inf 8:43\u201348","journal-title":"J Syst Cybern Inf"},{"key":"699_CR24","unstructured":"CNN (2024) https:\/\/edition.cnn.com\/"},{"key":"699_CR25","doi-asserted-by":"publisher","unstructured":"Cohan A, Feldman S, Beltagy I, et\u00a0al (2020) SPECTER: Document-level representation learning using citation-informed transformers. In: Proceedings of the annual meeting of the association for computational linguistics. ACL, pp 2270\u20132282. https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.207","DOI":"10.18653\/v1\/2020.acl-main.207"},{"key":"699_CR26","first-page":"2493","volume":"12","author":"R Collobert","year":"2011","unstructured":"Collobert R, Weston J, Bottou L et al (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493\u20132537","journal-title":"J Mach Learn Res"},{"key":"699_CR27","unstructured":"Devlin J, Chang M, Lee K, et\u00a0al (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR abs\/1810.04805. arXiv:1810.04805"},{"key":"699_CR28","doi-asserted-by":"publisher","DOI":"10.1007\/s13164-023-00692-y","author":"N Di Stefano","year":"2023","unstructured":"Di Stefano N, Spence C (2023) Perceptual similarity: Insights from crossmodal correspondences. Rev Philos Psychol. https:\/\/doi.org\/10.1007\/s13164-023-00692-y","journal-title":"Rev Philos Psychol"},{"key":"699_CR29","doi-asserted-by":"publisher","unstructured":"Dias AG, Milios EE, de\u00a0Oliveira MCF (2019) TRIVIR: A visualization system to support document retrieval with high recall. In: Proceedings of the ACM symposium on document engineering. ACM, DocEng\u00a0\u201919, pp 1\u201310. 
https:\/\/doi.org\/10.1145\/3342558.3345401","DOI":"10.1145\/3342558.3345401"},{"issue":"8","key":"699_CR30","doi-asserted-by":"publisher","first-page":"458","DOI":"10.1111\/cgf.13092","volume":"36","author":"A Endert","year":"2017","unstructured":"Endert A, Ribarsky W, Turkay C et al (2017) The state of the art in integrating machine learning into visual analytics. Comput Gr Forum 36(8):458\u2013486. https:\/\/doi.org\/10.1111\/cgf.13092","journal-title":"Comput Gr Forum"},{"key":"699_CR31","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2022.104743","volume":"110","author":"AE Ezugwu","year":"2022","unstructured":"Ezugwu AE, Ikotun AM, Oyelade OO et al (2022) A comprehensive survey of clustering algorithms: state-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng Appl Artif Intell 110:104743. https:\/\/doi.org\/10.1016\/j.engappai.2022.104743","journal-title":"Eng Appl Artif Intell"},{"issue":"9","key":"699_CR32","doi-asserted-by":"publisher","first-page":"2179","DOI":"10.1109\/TVCG.2016.2610422","volume":"23","author":"P Federico","year":"2017","unstructured":"Federico P, Heimerl F, Koch S et al (2017) A survey on visual approaches for analyzing scientific literature and patents. IEEE Trans Visual Comput Gr 23(9):2179\u20132198. https:\/\/doi.org\/10.1109\/TVCG.2016.2610422","journal-title":"IEEE Trans Visual Comput Gr"},{"key":"699_CR33","doi-asserted-by":"publisher","unstructured":"Fekete JD (2009) Visualizing networks using adjacency matrices: Progresses and challenges. In: Proceedings of the IEEE international conference on computer-aided design and computer graphics. IEEE, CAD\/Graphics\u00a0\u201909, pp 636\u2013638. https:\/\/doi.org\/10.1109\/CADCG.2009.5246813","DOI":"10.1109\/CADCG.2009.5246813"},{"key":"699_CR34","doi-asserted-by":"publisher","unstructured":"Ghoniem M, Fekete JD, Castagliola P (2004) A comparison of the readability of graphs using node-link and matrix-based representations. 
In: Proceedings of the IEEE symposium on information visualization. IEEE, InfoVis\u00a0\u201904, pp 17\u201324. https:\/\/doi.org\/10.1109\/INFVIS.2004.1","DOI":"10.1109\/INFVIS.2004.1"},{"key":"699_CR35","doi-asserted-by":"publisher","unstructured":"Gilpin LH, Bau D, Yuan BZ, et\u00a0al (2018) Explaining explanations: an overview of interpretability of machine learning. In: Proceedings of the IEEE international conference on data science and advanced analytics. IEEE, pp 80\u201389. https:\/\/doi.org\/10.1109\/DSAA.2018.00018","DOI":"10.1109\/DSAA.2018.00018"},{"key":"699_CR36","doi-asserted-by":"publisher","DOI":"10.5120\/11638-7118","author":"W Gomaa","year":"2013","unstructured":"Gomaa W, Fahmy A (2013) A survey of text similarity approaches. Int J Comput Appl. https:\/\/doi.org\/10.5120\/11638-7118","journal-title":"Int J Comput Appl"},{"key":"699_CR37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3236009","volume":"51","author":"R Guidotti","year":"2018","unstructured":"Guidotti R, Monreale A, Turini F et al (2018) A survey of methods for explaining black box models. ACM Comput Surv 51:1\u201342. https:\/\/doi.org\/10.1145\/3236009","journal-title":"ACM Comput Surv"},{"issue":"8","key":"699_CR38","doi-asserted-by":"publisher","first-page":"843","DOI":"10.1002\/asi.24171","volume":"70","author":"J He","year":"2019","unstructured":"He J, Ping Q, Lou W et al (2019) PaperPoles: Facilitating adaptive visual exploration of scientific publications by citation links. J Assoc Inf Sci Technol 70(8):843\u2013857. https:\/\/doi.org\/10.1002\/asi.24171","journal-title":"J Assoc Inf Sci Technol"},{"issue":"3","key":"699_CR39","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1111\/cgf.13417","volume":"37","author":"F Heimerl","year":"2018","unstructured":"Heimerl F, Gleicher M (2018) Interactive analysis of word vector embeddings. Comput Gr Forum 37(3):253\u2013265. 
https:\/\/doi.org\/10.1111\/cgf.13417","journal-title":"Comput Gr Forum"},{"key":"699_CR40","doi-asserted-by":"publisher","DOI":"10.1109\/TVCG.2020.3045918","author":"F Heimerl","year":"2020","unstructured":"Heimerl F, Kralj C, Moller T et al (2020) embComp: Visual interactive comparison of vector embeddings. IEEE Trans Visual Comput Gr. https:\/\/doi.org\/10.1109\/TVCG.2020.3045918","journal-title":"IEEE Trans Visual Comput Gr"},{"issue":"3","key":"699_CR41","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1111\/cgf.14859","volume":"42","author":"Z Huang","year":"2023","unstructured":"Huang Z, Witschard D, Kucher K et al (2023) VA + Embeddings STAR: a state-of-the-art report on the use of embeddings in visual analytics. Comput Gr Forum 42(3):539\u2013571. https:\/\/doi.org\/10.1111\/cgf.14859","journal-title":"Comput Gr Forum"},{"issue":"9","key":"699_CR42","doi-asserted-by":"publisher","first-page":"2199","DOI":"10.1109\/TVCG.2016.2615308","volume":"23","author":"P Isenberg","year":"2017","unstructured":"Isenberg P, Heimerl F, Koch S et al (2017) Vispubdata.org: A metadata collection about IEEE visualization (VIS) publications. IEEE Trans Visual Comput Gr 23(9):2199\u20132206. https:\/\/doi.org\/10.1109\/TVCG.2016.2615308","journal-title":"IEEE Trans Visual Comput Gr"},{"issue":"1145\/1376815","key":"699_CR43","first-page":"1376819","volume":"10","author":"A Islam","year":"2008","unstructured":"Islam A, Inkpen D (2008) Semantic text similarity using corpus-based word similarity and string similarity. ACM Trans Knowl Discov Data 10(1145\/1376815):1376819","journal-title":"ACM Trans Knowl Discov Data"},{"issue":"6","key":"699_CR44","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1111\/cgf.12873","volume":"36","author":"S J\u00e4nicke","year":"2017","unstructured":"J\u00e4nicke S, Franzini G, Cheema MF et al (2017) Visual text analysis in digital humanities. Comput Gr Forum 36(6):226\u2013250. 
https:\/\/doi.org\/10.1111\/cgf.12873","journal-title":"Comput Gr Forum"},{"issue":"6","key":"699_CR45","doi-asserted-by":"publisher","first-page":"2181","DOI":"10.1109\/TVCG.2019.2903946","volume":"25","author":"X Ji","year":"2019","unstructured":"Ji X, Shen HW, Ritter A et al (2019) Visual exploration of neural document embedding in information retrieval: semantics and feature selection. IEEE Trans Visual Comput Gr 25(6):2181\u20132192. https:\/\/doi.org\/10.1109\/TVCG.2019.2903946","journal-title":"IEEE Trans Visual Comput Gr"},{"issue":"3","key":"699_CR46","doi-asserted-by":"publisher","first-page":"2663","DOI":"10.1007\/s40747-021-00637-x","volume":"8","author":"W Jia","year":"2022","unstructured":"Jia W, Sun M, Lian J et al (2022) Feature dimensionality reduction: a review. Complex Intell Syst 8(3):2663\u20132693. https:\/\/doi.org\/10.1007\/s40747-021-00637-x","journal-title":"Complex Intell Syst"},{"key":"699_CR47","unstructured":"Kaggle (2024) https:\/\/www.kaggle.com\/datasets\/quora\/question-pairs-dataset"},{"key":"699_CR48","doi-asserted-by":"crossref","unstructured":"Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for modelling sentences. CoRR abs\/1404.2188. arXiv:1404.2188","DOI":"10.3115\/v1\/P14-1062"},{"key":"699_CR49","unstructured":"Keim DA, Kohlhammer J, Ellis G, et\u00a0al (eds) (2010) Mastering the information age: solving problems with visual analytics. Eurographics Association"},{"key":"699_CR50","doi-asserted-by":"publisher","unstructured":"Kerren A, Schreiber F (2012) Toward the role of interaction in visual analytics. In: Proceedings of the winter simulation conference. IEEE, WSC\u00a0\u201912. https:\/\/doi.org\/10.1109\/WSC.2012.6465208","DOI":"10.1109\/WSC.2012.6465208"},{"key":"699_CR51","unstructured":"Kiros R, Zhu Y, Salakhutdinov R, et\u00a0al (2015) Skip-thought vectors. CoRR abs\/1506.06726. 
arXiv:1506.06726"},{"key":"699_CR52","doi-asserted-by":"publisher","unstructured":"Kucher K, Kerren A (2015) Text visualization techniques: Taxonomy, visual survey, and community insights. In: Proceedings of the IEEE Pacific visualization symposium. IEEE, PacificVis\u00a0\u201915, pp 117\u2013121 https:\/\/doi.org\/10.1109\/PACIFICVIS.2015.7156366","DOI":"10.1109\/PACIFICVIS.2015.7156366"},{"key":"699_CR53","doi-asserted-by":"publisher","unstructured":"Kucher K, Kerren A (2023) Supporting university research and administration via interactive visual exploration of bibliographic data. In: Proceedings of the 18th international joint conference on computer vision, Imaging and computer graphics theory and applications (VISIGRAPP\u00a0\u201923), vol. 3. IVAPP SciTePress, IVAPP\u00a0\u201923, pp 248\u2013255. https:\/\/doi.org\/10.5220\/0011806900003417","DOI":"10.5220\/0011806900003417"},{"key":"699_CR54","unstructured":"Le Q, Mikolov T (2014) Distributed representations of sentences and documents. In: Proceedings of the 31st international conference on machine learning. PMLR, ICML\u00a0\u201914, pp 1188\u20131196. http:\/\/proceedings.mlr.press\/v32\/le14.pdf"},{"issue":"1","key":"699_CR55","doi-asserted-by":"publisher","first-page":"1182","DOI":"10.1109\/TVCG.2019.2934667","volume":"26","author":"Z Li","year":"2020","unstructured":"Li Z, Zhang C, Jia S et al (2020) Galex: Exploring the evolution and intersection of disciplines. IEEE Trans Visual Comput Gr 26(1):1182\u20131192. https:\/\/doi.org\/10.1109\/TVCG.2019.2934667","journal-title":"IEEE Trans Visual Comput Gr"},{"key":"699_CR56","doi-asserted-by":"publisher","first-page":"19205","DOI":"10.1109\/ACCESS.2018.2815030","volume":"6","author":"J Liu","year":"2018","unstructured":"Liu J, Tang T, Wang W et al (2018) A survey of scholarly data visualization. IEEE Access 6:19205\u201319221. 
https:\/\/doi.org\/10.1109\/ACCESS.2018.2815030","journal-title":"IEEE Access"},{"issue":"7","key":"699_CR57","doi-asserted-by":"publisher","first-page":"2482","DOI":"10.1109\/TVCG.2018.2834341","volume":"25","author":"S Liu","year":"2019","unstructured":"Liu S, Wang X, Collins C et al (2019) Bridging text visualization and mining: a task-driven survey. IEEE Trans Visual Comput Gr 25(7):2482\u20132504. https:\/\/doi.org\/10.1109\/TVCG.2018.2834341","journal-title":"IEEE Trans Visual Comput Gr"},{"key":"699_CR58","doi-asserted-by":"publisher","unstructured":"Li K, Yang H, Montoya E, et\u00a0al (2022) Visual exploration of literature with Argo Scholar. In: Proceedings of the 31st ACM international conference on information and knowledge management. Association for computing machinery, New York, CIKM\u00a0\u201922, pp 4912-4916. https:\/\/doi.org\/10.1145\/3511808.3557177","DOI":"10.1145\/3511808.3557177"},{"key":"699_CR59","volume-title":"Foundations of statistical natural language processing","author":"CD Manning","year":"1999","unstructured":"Manning CD, Sch\u00fctze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge"},{"key":"699_CR60","doi-asserted-by":"publisher","DOI":"10.1016\/j.mlwa.2022.100423","volume":"10","author":"J Martinez-Gil","year":"2022","unstructured":"Martinez-Gil J (2022) A comprehensive review of stacking methods for semantic similarity measurement. Mach Learn Appl 10:100423. https:\/\/doi.org\/10.1016\/j.mlwa.2022.100423","journal-title":"Mach Learn Appl"},{"key":"699_CR61","unstructured":"Mikolov T, Sutskever I, Chen K, et\u00a0al (2013) Distributed representations of words and phrases and their compositionality. CoRR abs\/1310.4546. 
arXiv:1310.4546"},{"issue":"8","key":"699_CR62","doi-asserted-by":"publisher","first-page":"1388","DOI":"10.1111\/j.1551-6709.2010.01106.x","volume":"34","author":"J Mitchell","year":"2010","unstructured":"Mitchell J, Lapata M (2010) Composition in distributional models of semantics. Cogn Sci 34(8):1388\u20131429. https:\/\/doi.org\/10.1111\/j.1551-6709.2010.01106.x","journal-title":"Cogn Sci"},{"issue":"1","key":"699_CR63","doi-asserted-by":"publisher","first-page":"486","DOI":"10.1109\/TVCG.2021.3114820","volume":"28","author":"A Narechania","year":"2022","unstructured":"Narechania A, Karduni A, Wesslen R et al (2022) VITALITY: Promoting serendipitous discovery of academic literature with transformers and visual analytics. IEEE Trans Visual Comput Gr 28(1):486\u2013496. https:\/\/doi.org\/10.1109\/TVCG.2021.3114820","journal-title":"IEEE Trans Visual Comput Gr"},{"issue":"1","key":"699_CR64","doi-asserted-by":"publisher","first-page":"361","DOI":"10.1109\/TVCG.2017.2744478","volume":"24","author":"D Park","year":"2018","unstructured":"Park D, Kim S, Lee J et al (2018) ConceptVector: Text visual analytics via interactive lexicon building using word embedding. IEEE Trans Visual Comput Gr 24(1):361\u2013370. https:\/\/doi.org\/10.1109\/TVCG.2017.2744478","journal-title":"IEEE Trans Visual Comput Gr"},{"key":"699_CR65","first-page":"348","volume":"25","author":"A Pritchard","year":"1969","unstructured":"Pritchard A (1969) Statistical bibliography or bibliometrics? J Doc 25:348\u2013349","journal-title":"J Doc"},{"key":"699_CR66","doi-asserted-by":"publisher","unstructured":"Reimers N, Gurevych I (2019) Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In: Proceedings of the conference on empirical methods in natural language processing and the international joint conference on natural language processing. Association for Computational Linguistics, EMNLP-IJCNLP\u00a0\u201919, pp 3982\u20133992. 
https:\/\/doi.org\/10.18653\/v1\/D19-1410","DOI":"10.18653\/v1\/D19-1410"},{"key":"699_CR67","doi-asserted-by":"publisher","unstructured":"Rezaeipourfarsangi S, Pei N, Sherkat E, et\u00a0al (2022) Interactive clustering and high-recall information retrieval using language models. In: Proceedings of the international conference on advanced visual interfaces. ACM, AVI\u00a0\u201922, pp 1\u20135. https:\/\/doi.org\/10.1145\/3531073.3531174","DOI":"10.1145\/3531073.3531174"},{"key":"699_CR68","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1016\/j.neucom.2017.01.105","volume":"268","author":"D Sacha","year":"2017","unstructured":"Sacha D, Sedlmair M, Zhang L et al (2017) What you see is what you can change: Human-centered machine learning by interactive visualization. Neurocomputing 268:164\u2013175. https:\/\/doi.org\/10.1016\/j.neucom.2017.01.105","journal-title":"Neurocomputing"},{"issue":"1","key":"699_CR69","doi-asserted-by":"publisher","first-page":"385","DOI":"10.1109\/TVCG.2018.2864838","volume":"25","author":"D Sacha","year":"2019","unstructured":"Sacha D, Kraus M, Keim DA et al (2019) VIS4ML: An ontology for visual analytics assisted machine learning. IEEE Trans Visual Comput Gr 25(1):385\u2013395. https:\/\/doi.org\/10.1109\/TVCG.2018.2864838","journal-title":"IEEE Trans Visual Comput Gr"},{"issue":"5","key":"699_CR70","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1016\/0306-4573(88)90021-0","volume":"24","author":"G Salton","year":"1988","unstructured":"Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5):513\u2013523. https:\/\/doi.org\/10.1016\/0306-4573(88)90021-0","journal-title":"Inf Process Manag"},{"key":"699_CR71","doi-asserted-by":"publisher","first-page":"25","DOI":"10.5120\/13897-1851","volume":"80","author":"T Slimani","year":"2013","unstructured":"Slimani T (2013) Description and evaluation of semantic similarity measures approaches. Int J Comput Appl 80:25\u201333. 
https:\/\/doi.org\/10.5120\/13897-1851","journal-title":"Int J Comput Appl"},{"key":"699_CR72","unstructured":"Smilkov D, Thorat N, Nicholson C, et\u00a0al (2016) Embedding Projector: Interactive visualization and interpretation of embeddings. In: Proceedings of the NIPS 2016 workshop on interpretable machine learning for complex systems. arXiv:1611.05469"},{"key":"699_CR73","doi-asserted-by":"crossref","unstructured":"Socher R, Perelygin A, Wu J, et\u00a0al (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing. ACL, EMNLP\u00a0\u201913, pp 1631\u20131642","DOI":"10.18653\/v1\/D13-1170"},{"key":"699_CR74","doi-asserted-by":"crossref","unstructured":"Stapor K (2018) Evaluating and comparing classifiers: Review, some recommendations and limitations. In: Kurzynski M, Wozniak M, Burduk R (eds) Proceedings of the 10th international conference on computer recognition systems CORES 2017. Springer International Publishing, Cham, pp 12\u201321","DOI":"10.1007\/978-3-319-59162-9_2"},{"key":"699_CR75","doi-asserted-by":"publisher","unstructured":"Steck H, Ekanadham C, Kallus N (2024) Is cosine-similarity of embeddings really about similarity? In: Companion proceedings of the ACM on Web conference 2024. Association for Computing Machinery, New York, WWW\u00a0\u201924, p 887-890. https:\/\/doi.org\/10.1145\/3589335.3651526","DOI":"10.1145\/3589335.3651526"},{"key":"699_CR76","doi-asserted-by":"crossref","unstructured":"Thompson VU, Panchev C, Oakes M (2015) Performance evaluation of similarity measures on similar and dissimilar text retrieval. 
In: 2015 7th international joint conference on knowledge discovery, knowledge engineering and knowledge management (IC3K), pp 577\u2013584","DOI":"10.5220\/0005619105770584"},{"issue":"6","key":"699_CR77","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1007\/s12650-023-00941-3","volume":"26","author":"M Tian","year":"2023","unstructured":"Tian M, Li G, Yuan X (2023) LitVis: A visual analytics approach for managing and exploring literature. J Visual 26(6):1445\u20131458. https:\/\/doi.org\/10.1007\/s12650-023-00941-3","journal-title":"J Visual"},{"key":"699_CR78","doi-asserted-by":"publisher","unstructured":"Toshevska M, Stojanovska F, Kalajdjieski J (2020) Comparative analysis of word embeddings for capturing word similarities. In: Proceedings of the 6th international conference on natural language processing. AIRCC Publishing Corporation, NATP\u00a0\u201920, pp 9\u201324. https:\/\/doi.org\/10.5121\/csit.2020.100402","DOI":"10.5121\/csit.2020.100402"},{"key":"699_CR79","unstructured":"Turian J, Ratinov L, Bengio Y (2010) Word representations: A simple and general method for semi-supervised learning. In: Proceedings of the 48th annual meeting of the association for computational linguistics. ACL, USA, ACL\u00a0\u201910, pp 384\u2013394. https:\/\/www.aclweb.org\/anthology\/P10-1040.pdf"},{"issue":"4","key":"699_CR80","doi-asserted-by":"publisher","first-page":"327","DOI":"10.1037\/0033-295X.84.4.327","volume":"84","author":"A Tversky","year":"1977","unstructured":"Tversky A (1977) Features of similarity. Psychol Rev 84(4):327\u2013352. https:\/\/doi.org\/10.1037\/0033-295X.84.4.327","journal-title":"Psychol Rev"},{"key":"699_CR81","doi-asserted-by":"publisher","DOI":"10.3390\/info11090421","author":"J Wang","year":"2020","unstructured":"Wang J, Dong Y (2020) Measurement of text similarity: a survey. Information. 
https:\/\/doi.org\/10.3390\/info11090421","journal-title":"Information"},{"key":"699_CR82","doi-asserted-by":"crossref","unstructured":"Wang X, Huang Z, van Harmelen F (2020) Evaluating similarity measures for dataset search. In: Huang Z, Beek W, Wang H et al (eds) Web information systems engineering-WISE 2020. Springer International Publishing, Cham, pp 38\u201351","DOI":"10.1007\/978-3-030-62008-0_3"},{"issue":"4","key":"699_CR83","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1177\/14738716221114372","volume":"21","author":"D Witschard","year":"2022","unstructured":"Witschard D, Jusufi I, Martins RM et al (2022) Interactive optimization of embedding-based text similarity calculations. Inf Visual 21(4):335\u2013353. https:\/\/doi.org\/10.1177\/14738716221114372","journal-title":"Inf Visual"},{"issue":"2","key":"699_CR84","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1007\/s12650-017-0462-2","volume":"21","author":"C Zhang","year":"2018","unstructured":"Zhang C, Li Z, Zhang J (2018) A survey on visualization for scientific literature topics. J Visual 21(2):321\u2013335. https:\/\/doi.org\/10.1007\/s12650-017-0462-2","journal-title":"J Visual"},{"key":"699_CR85","doi-asserted-by":"publisher","unstructured":"Zhou K, Ethayarajh K, Card D, et\u00a0al (2022) Problems with cosine as a measure of embedding similarity for high frequency words. In: Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Dublin, pp 401\u2013423. 
https:\/\/doi.org\/10.18653\/v1\/2022.acl-short.45","DOI":"10.18653\/v1\/2022.acl-short.45"}],"container-title":["Applied Network Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41109-025-00699-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41109-025-00699-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41109-025-00699-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,23]],"date-time":"2025-03-23T00:05:30Z","timestamp":1742688330000},"score":1,"resource":{"primary":{"URL":"https:\/\/appliednetsci.springeropen.com\/articles\/10.1007\/s41109-025-00699-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,21]]},"references-count":85,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["699"],"URL":"https:\/\/doi.org\/10.1007\/s41109-025-00699-7","relation":{},"ISSN":["2364-8228"],"issn-type":[{"value":"2364-8228","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,21]]},"assertion":[{"value":"13 January 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 March 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 March 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no Conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"8"}}