{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T16:33:28Z","timestamp":1773246808227,"version":"3.50.1"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"2","license":[{"start":{"date-parts":[[2022,2,23]],"date-time":"2022-02-23T00:00:00Z","timestamp":1645574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"The Russian Science Foundation","award":["20-11-20270"],"award-info":[{"award-number":["20-11-20270"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,3,30]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Topic modelling is a popular unsupervised method for text processing that provides interpretable document representation. One of the most high-level approaches is additively regularized topic models (ARTM). This method features better quality than other methods due to its flexibility and advanced regularization abilities. However, it is challenging to find an optimal learning strategy to create high-quality topics because a user needs to select the regularizers with their values and determine the order of application. Moreover, it may require many real runs or model training which makes this task time consuming. At the current moment, there is a lack of research on parameter optimization for ARTM-based models. Our work proposes an approach that formalizes the learning strategy into a vector of parameters which can be solved with evolutionary approach. We also propose a surrogate-based modification which utilizes machine learning methods that makes the approach for parameters search time efficient. 
We investigate different optimization algorithms (evolutionary and Bayesian) and their modifications with surrogates in application to topic modelling optimization using the proposed learning strategy approach. An experimental study conducted on English and Russian datasets indicates that the proposed approaches are able to find high-quality parameter solutions for ARTM and substantially reduce the execution time of the search.<\/jats:p>","DOI":"10.1093\/jigpal\/jzac019","type":"journal-article","created":{"date-parts":[[2022,1,25]],"date-time":"2022-01-25T12:19:36Z","timestamp":1643113176000},"page":"287-299","source":"Crossref","is-referenced-by-count":3,"title":["Surrogate-based optimization of learning strategies for additively regularized topic models"],"prefix":"10.1093","volume":"31","author":[{"given":"Maria","family":"Khodorchenko","sequence":"first","affiliation":[{"name":"ITMO University , 49 Kronverksky pr., St Petersburg, 197101, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikolay","family":"Butakov","sequence":"additional","affiliation":[{"name":"ITMO University , 49 Kronverksky pr., St Petersburg, 197101, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Timur","family":"Sokhin","sequence":"additional","affiliation":[{"name":"ITMO University , 49 Kronverksky pr., St Petersburg, 197101, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sergey","family":"Teryoshkin","sequence":"additional","affiliation":[{"name":"ITMO University , 49 Kronverksky pr., St Petersburg, 197101, Russia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,2,23]]},"reference":[{"key":"2023033115514210200_","first-page":"733","article-title":"Model-based genetic algorithms for algorithm configuration","volume-title":"Proceedings of the 24th International Conference on Artificial 
Intelligence","author":"Ans\u00f3tegui","year":"2015"},{"key":"2023033115514210200_","first-page":"169","article-title":"Additive regularization for topic modeling in sociological studies of user-generated texts","volume-title":"Mexican International Conference on Artificial Intelligence","author":"Apishev","year":"2017"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/ICAICT.2014.7035936","article-title":"Co-evolutional genetic algorithm for workflow scheduling in heterogeneous distributed environment","volume-title":"2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT)","author":"Butakov","year":"2014"},{"key":"2023033115514210200_","first-page":"1","article-title":"Unified domain-specific language for collecting and processing data of social media","author":"Butakov","year":"2018","journal-title":"Information Systems"},{"key":"2023033115514210200_","volume-title":"Artificial Intelligence Applications and Innovations: 15th IFIP WG 12.5 International Conference, AIAI 2019, Hersonissos, Crete, Greece, May 24\u201326, 2019, Proceedings","author":"Dasari","year":"2019"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1016\/j.cor.2011.06.007","article-title":"A modified artificial bee colony algorithm","volume":"39","author":"Gao","year":"2012","journal-title":"Computers & Operations Research"},{"key":"2023033115514210200_","first-page":"5937","article-title":"Principled selection of hyperparameters in the latent Dirichlet allocation model","volume":"18","author":"George","year":"2017","journal-title":"Journal of Machine Learning Research"},{"key":"2023033115514210200_","first-page":"289","article-title":"Probabilistic latent semantic analysis","volume-title":"Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, 
UAI\u201999","author":"Hofmann","year":"1999"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.swevo.2011.05.001","article-title":"Surrogate-assisted evolutionary computation: recent advances and future challenges","volume":"1","author":"Jin","year":"2011","journal-title":"Swarm and Evolutionary Computation"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"687","DOI":"10.1016\/j.asoc.2007.05.007","article-title":"On the performance of artificial bee colony (abc) algorithm","volume":"8","author":"Karaboga","year":"2008","journal-title":"Applied Soft Computing"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"284","DOI":"10.1007\/978-3-030-61705-9_24","article-title":"Optimization of learning strategies for artm-based topic models","volume-title":"Hybrid Artificial Intelligent Systems: 15th International Conference, HAIS 2020, Gij\u00f3n, Spain, November 11-13, 2020, Proceedings","author":"Khodorchenko","year":"2020"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1016\/B978-1-55860-377-6.50048-7","article-title":"Newsweeder: Learning to filter netnews","volume-title":"Machine Learning Proceedings 1995","author":"Lang","year":"1995"},{"key":"2023033115514210200_","first-page":"530","article-title":"Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality","volume-title":"Proceedings of the 14th Conference of the European Chapter of the ACL","author":"Lau","year":"2014"},{"key":"2023033115514210200_","first-page":"43","article-title":"The irace package: Iterated racing for automatic algorithm configuration",
"volume":"3","author":"L\u00f3pez-Ib\u00e1\u00f1ez","year":"2016","journal-title":"Operations Research Perspectives"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"897","DOI":"10.1145\/2488388.2488466","article-title":"From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews","volume-title":"Proceedings of the 22nd International Conference on World Wide Web","author":"McAuley","year":"2013"},{"key":"2023033115514210200_","first-page":"100","article-title":"Automatic evaluation of topic coherence","volume-title":"Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Newman","year":"2010"},{"key":"2023033115514210200_","first-page":"100","article-title":"Automatic evaluation of topic coherence","volume-title":"Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL","author":"Newman","year":"2010"},{"key":"2023033115514210200_","first-page":"1029","article-title":"Topic quality metrics based on distributed word representations","volume-title":"Proceedings of the 39th International ACM SIGIR Conference","author":"Nikolenko","year":"2016"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"5645","DOI":"10.1016\/j.eswa.2015.02.055","article-title":"An analysis of the coherence of descriptors in topic modeling","volume":"42","author":"O\u2019Callaghan","year":"2015","journal-title":"Expert Systems with Applications"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2018\/2497471","article-title":"Biomedical text categorization based on ensemble pruning and optimized topic modelling","volume":"2018","author":"Onan","year":"2018","journal-title":"Computational and Mathematical Methods in 
Medicine"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1007\/978-3-030-27455-9_2","article-title":"A systematic comparison of search algorithms for topic modelling\u2014a study on duplicate bug report identification","volume-title":"Search-Based Software Engineering: 11th International Symposium, SSBSE 2019, Tallinn, Estonia, August 31\u2014September 1, 2019, Proceedings","author":"Panichella","year":"2019"},{"key":"2023033115514210200_","volume-title":"Differential Evolution: A Practical Approach to Global Optimization","author":"Price","year":"2005"},{"key":"2023033115514210200_","article-title":"Handling the impact of low frequency events on co-occurrence based measures of word similarity\u2014a case study of pointwise mutual information","author":"Role","year":"2011"},{"key":"2023033115514210200_","first-page":"370","volume-title":"Bigartm: open source library for regularized multimodal topic modeling of large collections","author":"Vorontsov","year":"2015"},{"key":"2023033115514210200_","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-17091-6_14","volume-title":"Additive regularization of topic models for topic selection and sparse factorization","author":"Vorontsov","year":"2015"},{"key":"2023033115514210200_","volume-title":"Corpus of russian news articles collected from lenta.ru","author":"Yutkin"}],"container-title":["Logic Journal of the 
IGPL"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jigpal\/article-pdf\/31\/2\/287\/49705918\/jzac019.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jigpal\/article-pdf\/31\/2\/287\/49705918\/jzac019.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,16]],"date-time":"2023-11-16T07:23:45Z","timestamp":1700119425000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jigpal\/article\/31\/2\/287\/6534494"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,23]]},"references-count":26,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2,23]]},"published-print":{"date-parts":[[2023,3,30]]}},"URL":"https:\/\/doi.org\/10.1093\/jigpal\/jzac019","relation":{},"ISSN":["1367-0751","1368-9894"],"issn-type":[{"value":"1367-0751","type":"print"},{"value":"1368-9894","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,4]]},"published":{"date-parts":[[2022,2,23]]}}}