{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T10:20:14Z","timestamp":1777890014815,"version":"3.51.4"},"reference-count":41,"publisher":"SAGE Publications","issue":"5","license":[{"start":{"date-parts":[[2020,8,25]],"date-time":"2020-08-25T00:00:00Z","timestamp":1598313600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["SW"],"published-print":{"date-parts":[[2020,8,25]]},"abstract":"<jats:p>Searching for similar documents and exploring major themes covered across groups of documents are common activities when browsing collections of scientific papers. This manual knowledge-intensive task can become less tedious and even lead to unexpected relevant findings if unsupervised algorithms are applied to help researchers. Most text mining algorithms represent documents in a common feature space that abstracts them away from the specific sequence of words used in them. Probabilistic Topic Models reduce that feature space by annotating documents with thematic information. Over this low-dimensional latent space some locality-sensitive hashing algorithms have been proposed to perform document similarity search. However, thematic information gets hidden behind hash codes, preventing thematic exploration and limiting the explanatory capability of topics to justify content-based similarities. This paper presents a novel hashing algorithm based on approximate nearest-neighbor techniques that uses hierarchical sets of topics as hash codes. It not only performs efficient similarity searches, but also allows extending those queries with thematic restrictions explaining the similarity score from the most relevant topics. 
Extensive evaluations on both scientific and industrial text datasets validate the proposed algorithm in terms of accuracy and efficiency.<\/jats:p>","DOI":"10.3233\/sw-200373","type":"journal-article","created":{"date-parts":[[2020,5,1]],"date-time":"2020-05-01T13:32:15Z","timestamp":1588339935000},"page":"735-750","source":"Crossref","is-referenced-by-count":3,"title":["Large-scale semantic exploration of scientific literature using topic-based hashing algorithms"],"prefix":"10.1177","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2753-9917","authenticated-orcid":false,"given":"Carlos","family":"Badenes-Olmedo","sequence":"first","affiliation":[{"name":"Ontology Engineering Group, Universidad Polit\u00e9cnica de Madrid, Boadilla del Monte, Spain. E-mail:\u00a0cbadenes@fi.upm.es"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7413-447X","authenticated-orcid":false,"given":"Jos\u00e9 Luis","family":"Redondo-Garc\u00eda","sequence":"additional","affiliation":[{"name":"Amazon Research, Cambridge, UK. E-mail:\u00a0jluisred@amazon.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9260-0753","authenticated-orcid":false,"given":"Oscar","family":"Corcho","sequence":"additional","affiliation":[{"name":"Ontology Engineering Group, Universidad Polit\u00e9cnica de Madrid, Boadilla del Monte, Spain. 
E-mail:\u00a0ocorcho@fi.upm.es"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"issue":"1","key":"10.3233\/SW-200373_ref1","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1002\/asi.23574","article-title":"Evaluating topic representations for exploring document collections","volume":"68","author":"Aletras","year":"2017","journal-title":"Journal of the Association for Information Science and Technology"},{"key":"10.3233\/SW-200373_ref2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N18-3011"},{"key":"10.3233\/SW-200373_ref3","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611973402.76"},{"key":"10.3233\/SW-200373_ref4","doi-asserted-by":"publisher","DOI":"10.1145\/3103010.3121040"},{"key":"10.3233\/SW-200373_ref5","doi-asserted-by":"publisher","DOI":"10.1145\/3148011.3148019"},{"key":"10.3233\/SW-200373_ref6","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.3465855"},{"issue":"4","key":"10.3233\/SW-200373_ref7","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1109\/MSP.2010.938079","article-title":"Probabilistic topic models","volume":"55","author":"Blei","year":"2010","journal-title":"IEEE Signal Processing Magazine"},{"issue":"4\u20135","key":"10.3233\/SW-200373_ref8","doi-asserted-by":"publisher","first-page":"993","DOI":"10.1162\/jmlr.2003.3.4-5.993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"10.3233\/SW-200373_ref9","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509965"},{"issue":"12","key":"10.3233\/SW-200373_ref10","doi-asserted-by":"publisher","first-page":"2928","DOI":"10.1109\/TKDE.2014.2313872","article-title":"BTM: Topic modeling over short texts","volume":"26","author":"Cheng","year":"2014","journal-title":"IEEE Transactions on Knowledge & Data 
Engineering"},{"key":"10.3233\/SW-200373_ref11","doi-asserted-by":"publisher","DOI":"10.1145\/997817.997857"},{"issue":"6","key":"10.3233\/SW-200373_ref12","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9","article-title":"Indexing by latent semantic analysis","volume":"41","author":"Deerwester","year":"1990","journal-title":"Journal of the American Society for Information Science"},{"issue":"7","key":"10.3233\/SW-200373_ref13","doi-asserted-by":"publisher","first-page":"1858","DOI":"10.1109\/TIT.2003.813506","article-title":"A new metric for probability distributions","volume":"49","author":"Endres","year":"2003","journal-title":"IEEE Transactions on Information Theory"},{"issue":"1","key":"10.3233\/SW-200373_ref15","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1017\/pan.2016.7","article-title":"Exploring the political agenda of the European Parliament using a dynamic topic modeling approach","volume":"25","author":"Greene","year":"2017","journal-title":"Political Analysis"},{"issue":"Supplement 1","key":"10.3233\/SW-200373_ref16","doi-asserted-by":"publisher","first-page":"5228","DOI":"10.1073\/pnas.0307752101","article-title":"Finding scientific topics","volume":"101","author":"Griffiths","year":"2004","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"2","key":"10.3233\/SW-200373_ref17","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1037\/0033-295X.114.2.211","article-title":"Topics in semantic representation","volume":"114","author":"Griffiths","year":"2007","journal-title":"Psychological 
Review"},{"key":"10.3233\/SW-200373_ref18","doi-asserted-by":"publisher","DOI":"10.3115\/1613715.1613763"},{"key":"10.3233\/SW-200373_ref19","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2017.24"},{"key":"10.3233\/SW-200373_ref20","doi-asserted-by":"publisher","DOI":"10.1145\/3130348.3130370"},{"issue":"6","key":"10.3233\/SW-200373_ref21","doi-asserted-by":"publisher","first-page":"2309","DOI":"10.1109\/TNNLS.2017.2689242","article-title":"Online hashing","volume":"29","author":"Huang","year":"2018","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"10.3233\/SW-200373_ref22","doi-asserted-by":"publisher","DOI":"10.1145\/276698.276876"},{"key":"10.3233\/SW-200373_ref23","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2013.119"},{"key":"10.3233\/SW-200373_ref24","unstructured":"K.\u00a0Krstovski and D.A.\u00a0Smith, A minimally supervised approach for detecting and ranking document translation pairs, in: Proceedings of the Sixth Workshop on Statistical Machine Translation, ACM Press, Edinburgh, Scotland, UK, 2011, pp.\u00a0207\u2013216, isbn:9781937284121."},{"key":"10.3233\/SW-200373_ref25","doi-asserted-by":"publisher","DOI":"10.1145\/2499178.2499189"},{"issue":"6","key":"10.3233\/SW-200373_ref26","doi-asserted-by":"publisher","first-page":"1092","DOI":"10.1109\/TPAMI.2011.219","article-title":"Kernelized locality-sensitive hashing","volume":"34","author":"Kulis","year":"2012","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"10.3233\/SW-200373_ref27","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772759"},{"key":"10.3233\/SW-200373_ref28","unstructured":"P.\u00a0Li, A.B.\u00a0Owen and C.H.\u00a0Zhang, One permutation hashing, in: Advances in Neural Information Processing Systems, Vol.\u00a04, Curran Associates, Inc., 2012, pp.\u00a03113\u20133121, 
isbn:9781627480031."},{"issue":"9","key":"10.3233\/SW-200373_ref29","doi-asserted-by":"publisher","first-page":"745","DOI":"10.14778\/2732939.2732947","article-title":"K-LSH: An efficient index structure for approximate nearest neighbor search","volume":"7","author":"Liu","year":"2014","journal-title":"Proceedings of the VLDB Endowment"},{"key":"10.3233\/SW-200373_ref30","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1016\/j.jbi.2016.02.003","article-title":"Modeling healthcare data using multiple-channel latent Dirichlet allocation","volume":"60","author":"Lu","year":"2016","journal-title":"Journal of Biomedical Informatics"},{"key":"10.3233\/SW-200373_ref31","doi-asserted-by":"crossref","unstructured":"X.\u00a0Mao, B.-S.\u00a0Feng, Y.-J.\u00a0Hao, L.\u00a0Nie, H.\u00a0Huang and G.\u00a0Wen, S2JSD-LSH: A locality-sensitive hashing schema for probability distributions, in: Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI Press, San Francisco, California, USA, 2017, pp.\u00a03244\u20133251, issn:2374-3468.","DOI":"10.1609\/aaai.v31i1.10989"},{"key":"10.3233\/SW-200373_ref33","unstructured":"J.\u00a0O\u2019Neill, C.\u00a0Robin, L.\u00a0O\u2019Brien and P.\u00a0Buitelaar, An analysis of topic modelling for legislative texts, in: Proceedings of the 2nd Workshop on Automated Semantic Analysis of Information in Legal Texts, Co-Located with the 16th International Conference on Artificial Intelligence and Law, CEUR-WS.org, London, UK, 2016."},{"key":"10.3233\/SW-200373_ref34","unstructured":"S.\u00a0Petrovic, M.\u00a0Osborne and V.\u00a0Lavrenko, Streaming first story detection with application to Twitter, in: Proceedings of the 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, ACL Press, Los Angeles, California, USA, 2010, pp.\u00a0181\u2013189, isbn:978-1-932432-65-7."},{"key":"10.3233\/SW-200373_ref35","doi-asserted-by":"crossref","unstructured":"D.\u00a0Ramage, S.\u00a0Dumais and 
D.\u00a0Liebling, Characterizing microblogs with topic models, in: International AAAI Conference on Weblogs and Social Media, American Association for Artificial Intelligence, Washington, DC, USA, 2010, pp.\u00a01\u20138, isbn:9781577354451.","DOI":"10.1609\/icwsm.v4i1.14026"},{"key":"10.3233\/SW-200373_ref36","doi-asserted-by":"publisher","DOI":"10.2196\/medinform.7779"},{"key":"10.3233\/SW-200373_ref37","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-73951-7_4"},{"issue":"1","key":"10.3233\/SW-200373_ref38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2890510","article-title":"Measuring similarity similarly","volume":"8","author":"Towne","year":"2016","journal-title":"ACM Transactions on Intelligent Systems and Technology"},{"issue":"1","key":"10.3233\/SW-200373_ref39","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1007\/s00167-009-0884-z","article-title":"Histologic analysis of ruptured quadriceps tendons","volume":"18","author":"Trobisch","year":"2010","journal-title":"Knee Surgery, Sports Traumatology, Arthroscopy"},{"issue":"2","key":"10.3233\/SW-200373_ref40","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1109\/TPAMI.2013.121","article-title":"Hashing hyperplane queries to near points with applications to large-scale active learning","volume":"36","author":"Vijayanarasimhan","year":"2014","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"1","key":"10.3233\/SW-200373_ref41","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1109\/JPROC.2015.2487976","article-title":"Learning to hash for indexing big data-a survey","volume":"104","author":"Wang","year":"2016","journal-title":"Proceedings of the IEEE"},{"key":"10.3233\/SW-200373_ref42","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502152"},{"issue":"1","key":"10.3233\/SW-200373_ref43","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1109\/TCYB.2015.2392052","article-title":"Spectral multimodal 
hashing and its application to multimedia retrieval","volume":"46","author":"Zhen","year":"2016","journal-title":"IEEE Transactions on Cybernetics"}],"container-title":["Semantic Web"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/SW-200373","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T05:24:49Z","timestamp":1777613089000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/SW-200373"}},"subtitle":[],"editor":[{"given":"Tomi","family":"Kauppinen","sequence":"additional","affiliation":[{"name":"Aalto University, Finland"}],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Daniel","family":"Garijo","sequence":"additional","affiliation":[{"name":"University of Southern California, USA"}],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Natalia","family":"Villanueva","sequence":"additional","affiliation":[{"name":"The University of Texas at El Paso, USA"}],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Daniel","family":"Garijo","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Natalia","family":"Villanueva-Rosales","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]},{"given":"Tomi","family":"Kauppinen","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2020,8,25]]},"references-count":41,"journal-issue":{"issue":"5"},"URL":"https:\/\/doi.org\/10.3233\/sw-200373","relation":{},"ISSN":["2210-4968","1570-0844"],"issn-type":[{"value":"2210-4968","type":"electronic"},{"value":"1570-0844","type":"print"}],"subject":[],"published":{"date-parts":[[2020,8,25]]}}}