{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T08:23:08Z","timestamp":1770279788834,"version":"3.49.0"},"reference-count":38,"publisher":"SAGE Publications","issue":"5","license":[{"start":{"date-parts":[[2019,4,15]],"date-time":"2019-04-15T00:00:00Z","timestamp":1555286400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Journal of Intelligent &amp; Fuzzy Systems"],"published-print":{"date-parts":[[2019,5,14]]},"abstract":"<jats:p>\u00a0Detection of topics in Natural Language text collections is an important step towards flexible automated text handling, for tasks like text translation, summarization, etc. In the current dominant paradigm to topic modeling, topics are represented as probability distributions of terms. Although such models are theoretically sound, their high computational complexity makes them difficult to use in very large scale collections. In this work we propose an alternative topic modeling paradigm based on a simpler representation of topics as overlapping clusters of semantically similar documents, that is able to take advantage of highly-scalable clustering algorithms. Our Query-based Topic Modeling framework (QTM) is an information-theoretic method that assumes the existence of a \u201cgolden\u201d set of queries that can capture most of the semantic information of the collection and produce models with maximum \u201csemantic coherence\u201d. QTM was designed with scalability in mind and was executed in parallel using a Map-Reduce implementation; further, we show complexity measures that support our scalability claims. 
Our experiments show that the QTM can produce models of comparable or even superior quality to those produced by state-of-the-art probabilistic methods.<\/jats:p>","DOI":"10.3233\/jifs-179015","type":"journal-article","created":{"date-parts":[[2019,4,16]],"date-time":"2019-04-16T17:08:04Z","timestamp":1555434484000},"page":"4645-4657","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Scalable text semantic clustering around topics"],"prefix":"10.1177","volume":"36","author":[{"given":"Ramon","family":"Brena","sequence":"first","affiliation":[{"name":"Tecnologico de Monterrey, Av. E. Garza Sada 2501, Monterrey, Mexico"}]},{"given":"Eduardo","family":"Ramirez","sequence":"additional","affiliation":[{"name":"Tecnologico de Monterrey, Av. E. Garza Sada 2501, Monterrey, Mexico"}]}],"member":"179","published-online":{"date-parts":[[2019,4,15]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564429"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/32206.32212"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/1148170.1148204"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1214\/07-AOAS114"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2003.1196112"},{"key":"e_1_3_2_7_2","unstructured":"RamirezE., Large Scale Topic Modeling Using Search Queries: An Information-theoretic Approach, PhD thesis, Tecnologico de Monterrey, 2010."},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1093\/ijl\/3.4.235"},{"key":"e_1_3_2_9_2","volume-title":"Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04)","author":"Pedersen T.","year":"2004","unstructured":"PedersenT., PatwardhanS.
and MichelizziJ., WordNet::Similarity - Measuring the Relatedness of Concepts, in: Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI-04), 2004."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/1099554.1099696"},{"key":"e_1_3_2_11_2","first-page":"31","volume-title":"Use of WordNet in Natural Language Processing Systems: Proceedings of the Conference","author":"Mandala R.","year":"1998","unstructured":"MandalaR., TokunagaT. and TanakaH., The Use of WordNet in Information Retrieval, in: Use of WordNet in Natural Language Processing Systems: Proceedings of the Conference, HarabagiuS., ed., Association for Computational Linguistics, Somerset, New Jersey, 1998, 31\u201337."},{"key":"e_1_3_2_12_2","first-page":"275","article-title":"Re-ranking Passages with LSA in a Question Answering System","author":"Tom\u00e1s D.","year":"2010","unstructured":"Tom\u00e1sD. and VicedoJ., Re-ranking Passages with LSA in a Question Answering System, Evaluation of Multilingual and Multi-modal Information Retrieval (2010), 275\u2013279.","journal-title":"Evaluation of Multilingual and Multi-modal Information Retrieval"},{"issue":"1","key":"e_1_3_2_13_2","first-page":"1","article-title":"Semantic associations for contextual advertising","volume":"9","author":"Ciaramita M.","year":"2008","unstructured":"CiaramitaM., MurdockV. and PlachourasV., Semantic associations for contextual advertising, Journal of Electronic Commerce Research 9(1) (2008), 1\u201315.","journal-title":"Journal of Electronic Commerce Research"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/312624.312649"},{"key":"e_1_3_2_16_2","first-page":"993","article-title":"Latent Dirichlet allocation","volume":"3","author":"Blei D.","year":"2003","unstructured":"BleiD., NgA.
and JordanM., Latent Dirichlet allocation, Journal of Machine Learning Research 3 (2003), 993\u20131022.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/2133806.2133826"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.114.2.211"},{"key":"e_1_3_2_19_2","volume-title":"Handbook of Latent Semantic Analysis","author":"Steyvers M.","year":"2007","unstructured":"SteyversM. and GriffithsT., Probabilistic Topic Models, in: Handbook of Latent Semantic Analysis, LandauerT., McNamaraD., DennisS. and KintschW., eds, Lawrence Erlbaum Associates, 2007."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956764"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1146847.1146881"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.142"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/1183614.1183777"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.3115\/981574.981598"},{"key":"e_1_3_2_25_2","volume-title":"Elements of Information Theory","author":"Cover T.M.","year":"1991","unstructured":"CoverT.M. and ThomasJ., Elements of Information Theory, Wiley, 1991."},{"key":"e_1_3_2_26_2","article-title":"The information bottleneck method","author":"Tishby N.","year":"2000","unstructured":"TishbyN., PereiraF.C. and BialekW., The information bottleneck method, arXiv preprint physics\/0004057 (2000).","journal-title":"arXiv preprint physics\/0004057"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564401"},{"key":"e_1_3_2_28_2","first-page":"208","volume-title":"ACM SIGIR 2000","author":"Slonim N.","year":"2000","unstructured":"SlonimN. and TishbyN., Document Clustering using Word Clusters via the Information Bottleneck Method, in: ACM SIGIR 2000, ACM Press, 2000, pp. 208\u2013215."},{"key":"e_1_3_2_29_2","doi-asserted-by":"crossref","unstructured":"ShanH.
and BanerjeeA., Bayesian Co-clustering, in: ICDM\u201908: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, IEEE Computer Society, Washington, DC, USA, 2008, pp. 530\u2013539.","DOI":"10.1109\/ICDM.2008.91"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04174-7_34"},{"key":"e_1_3_2_31_2","first-page":"1008","volume-title":"Advances in Neural Information Processing Systems 24","author":"Sontag D.","year":"2011","unstructured":"SontagD. and RoyD., Complexity of Inference in Latent Dirichlet Allocation, in: Advances in Neural Information Processing Systems 24, Shawe-TaylorJ., ZemelR.S., BartlettL., PereiraF. and WeinbergerK.Q., eds, Curran Associates, Inc., 2011, pp. 1008\u20131016."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401960"},{"key":"e_1_3_2_33_2","article-title":"Distributed Inference for Latent Dirichlet Allocation","volume":"20","author":"Newman D.","year":"2007","unstructured":"NewmanD., AsuncionA., SmythP. and WellingM., Distributed Inference for Latent Dirichlet Allocation, in: Advances in Neural Information Processing Systems, Vol. 20, 2007.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623691"},{"key":"e_1_3_2_35_2","first-page":"795","volume-title":"Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","volume":"1","author":"Das R.","year":"2015","unstructured":"DasR., ZaheerM.
and DyerC., Gaussian LDA for Topic Models with Word Embeddings, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 1 (2015), 795\u2013804."},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390334.1390431"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1983.10478008"},{"issue":"383","key":"e_1_3_2_38_2","first-page":"569","article-title":"A Method for Comparing Two Hierarchical Clusterings: Comment","volume":"78","author":"Wallace D.L.","year":"1983","unstructured":"WallaceD.L., A Method for Comparing Two Hierarchical Clusterings: Comment, Journal of the American Statistical Association 78(383) (1983), 569\u2013576.","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2011.04.032"}],"container-title":["Journal of Intelligent &amp; Fuzzy
Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-179015","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.3233\/JIFS-179015","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.3233\/JIFS-179015","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T18:21:04Z","timestamp":1770229264000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.3233\/JIFS-179015"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,15]]},"references-count":38,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2019,5,14]]}},"alternative-id":["10.3233\/JIFS-179015"],"URL":"https:\/\/doi.org\/10.3233\/jifs-179015","relation":{},"ISSN":["1064-1246","1875-8967"],"issn-type":[{"value":"1064-1246","type":"print"},{"value":"1875-8967","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,4,15]]}}}