{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:19:59Z","timestamp":1740122399852,"version":"3.37.3"},"reference-count":54,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000780","name":"European Commission","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.<\/jats:p>","DOI":"10.1007\/s10618-021-00811-2","type":"journal-article","created":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T07:03:04Z","timestamp":1641279784000},"page":"709-738","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Provable randomized rounding for minimum-similarity diversification"],"prefix":"10.1007","volume":"36","author":[{"given":"Bruno","family":"Ordozgoiti","sequence":"first","affiliation":[]},{"given":"Ananth","family":"Mahadevan","sequence":"additional","affiliation":[]},{"given":"Antonis","family":"Matakos","sequence":"additional","affiliation":[]},{"given":"Aristides","family":"Gionis","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,4]]},"reference":[{"key":"811_CR1","doi-asserted-by":"crossref","unstructured":"Abbassi Z, Mirrokni VS, Thakur M (2013) Diversity maximization under matroid constraints, pp 32\u201340","DOI":"10.1145\/2487575.2487636"},{"key":"811_CR2","unstructured":"Ashkan A, Kveton B, Berkovsky S, Wen Z (2015) Optimal greedy diversity for recommendation"},{"key":"811_CR3","doi-asserted-by":"crossref","unstructured":"Bansal N, Jain K, Kazeykina A, Naor JS (2010) Approximation algorithms for diversified search ranking. In: International colloquium on automata, languages, and programming. Springer, pp 273\u2013284","DOI":"10.1007\/978-3-642-14162-1_23"},{"issue":"2","key":"811_CR4","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1109\/MCSE.2010.118","volume":"13","author":"S Behnel","year":"2011","unstructured":"Behnel S, Bradshaw R, Citro C, Dalcin L, Seljebotn DS, Smith K (2011) Cython: the best of both worlds. Comput Sci Eng 13(2):31\u201339. https:\/\/doi.org\/10.1109\/MCSE.2010.118","journal-title":"Comput Sci Eng"},{"key":"811_CR5","unstructured":"Bhaskara A, Ghadiri M, Mirrokni V, Svensson O (2016) Linear relaxations for finding diverse elements in metric spaces. In: Advances in neural information processing systems, pp 4098\u20134106"},{"issue":"2","key":"811_CR6","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1051\/ro:2008011","volume":"42","author":"A Billionnet","year":"2008","unstructured":"Billionnet A, Elloumi S, Plateau MC (2008) Quadratic 0\u20131 programming: tightening linear or quadratic convex reformulation by use of relaxations. RAIRO-Oper Res 42(2):103\u2013121","journal-title":"RAIRO-Oper Res"},{"issue":"3","key":"811_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3086464","volume":"13","author":"A Borodin","year":"2017","unstructured":"Borodin A, Jain A, Lee HC, Ye Y (2017) Max-sum diversification, monotone submodular functions, and dynamic updates. ACM Trans Algorithms 13(3):1\u201325","journal-title":"ACM Trans Algorithms"},{"key":"811_CR8","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511804441","volume-title":"Convex optimization","author":"S Boyd","year":"2004","unstructured":"Boyd S, Boyd SP, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge"},{"key":"811_CR9","doi-asserted-by":"crossref","unstructured":"Calinescu G, Chekuri C, P\u00e1l M, Vondr\u00e1k J (2007) Maximizing a submodular set function subject to a matroid constraint, pp 182\u2013196","DOI":"10.1007\/978-3-540-72792-7_15"},{"key":"811_CR10","doi-asserted-by":"crossref","unstructured":"Capannini G, Nardini FM, Perego R, Silvestri F (2011) Efficient diversification of web search results","DOI":"10.1145\/1963192.1963202"},{"key":"811_CR11","doi-asserted-by":"crossref","unstructured":"Carbonell J, Goldstein J (1998) The use of mmr, diversity-based reranking for reordering documents and producing summaries, pp 335\u2013336","DOI":"10.1145\/290941.291025"},{"issue":"1","key":"811_CR12","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1016\/0166-218X(84)90111-2","volume":"7","author":"MW Carter","year":"1984","unstructured":"Carter MW (1984) The indefinite zero-one quadratic problem. Discrete Appl Math 7(1):23\u201344","journal-title":"Discrete Appl Math"},{"issue":"2","key":"811_CR13","doi-asserted-by":"publisher","first-page":"438","DOI":"10.1006\/jagm.2000.1145","volume":"38","author":"B Chandra","year":"2001","unstructured":"Chandra B, Halld\u00f3rsson MM (2001) Approximation algorithms for dispersion problems. J Algorithms 38(2):438\u2013465","journal-title":"J Algorithms"},{"key":"811_CR14","doi-asserted-by":"crossref","unstructured":"Charikar M, Li S (2012) A dependent lp-rounding approach for the k-median problem. In: International colloquium on automata, languages, and programming. Springer, pp 194\u2013205","DOI":"10.1007\/978-3-642-31594-7_17"},{"key":"811_CR15","doi-asserted-by":"publisher","unstructured":"Chekuri C, Vondrak J, Zenklusen R (2010) Dependent randomized rounding via exchange properties of combinatorial structures, vol 10. IEEE Computer Society, USA, pp 575\u2013584. https:\/\/doi.org\/10.1109\/FOCS.2010.60","DOI":"10.1109\/FOCS.2010.60"},{"key":"811_CR16","doi-asserted-by":"crossref","unstructured":"Clarke CL, Kolla M, Cormack GV, Vechtomova O, Ashkan A, B\u00fcttcher S, MacKinnon I (2008) Novelty and diversity in information retrieval evaluation, pp 659\u2013666","DOI":"10.1145\/1390334.1390446"},{"issue":"3","key":"811_CR17","doi-asserted-by":"publisher","first-page":"1317","DOI":"10.1214\/aoms\/1177703287","volume":"35","author":"JN Darroch","year":"1964","unstructured":"Darroch JN et al (1964) On the distribution of the number of successes in independent trials. Ann Math Stat 35(3):1317\u20131321","journal-title":"Ann Math Stat"},{"key":"811_CR18","doi-asserted-by":"crossref","unstructured":"Doerr B (2005) Roundings respecting hard constraints, pp 617\u2013628","DOI":"10.1007\/978-3-540-31856-9_51"},{"key":"811_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2594409","volume":"19","author":"B Doerr","year":"2015","unstructured":"Doerr B, Wahlstr\u00f6m M (2015) Randomized rounding in the presence of a cardinality constraint. J Exp Algorithmics 19:1\u20131","journal-title":"J Exp Algorithmics"},{"key":"811_CR20","doi-asserted-by":"crossref","unstructured":"Doerr B, Wahlstr\u00f6m M (2016) How to generate randomized roundings with dependencies and how to derandomize them. In: Algorithm engineering. Springer, pp 159\u2013184","DOI":"10.1007\/978-3-319-49487-6_5"},{"key":"811_CR21","doi-asserted-by":"crossref","unstructured":"Dou Z, Hu S, Chen K, Song R, Wen JR (2011) Multi-dimensional search result diversification, pp 475\u2013484","DOI":"10.1145\/1935826.1935897"},{"key":"811_CR22","unstructured":"Dua D, Graff C (2017) Uci machine learning repository"},{"issue":"2","key":"811_CR23","doi-asserted-by":"publisher","first-page":"803","DOI":"10.1109\/TAES.2010.5461658","volume":"46","author":"M Fern\u00e1ndez","year":"2010","unstructured":"Fern\u00e1ndez M, Williams S (2010) Closed-form expression for the poisson-binomial probability density function. IEEE Trans Aerosp Electron Syst 46(2):803\u2013817","journal-title":"IEEE Trans Aerosp Electron Syst"},{"issue":"2","key":"811_CR24","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1007\/s12532-018-0147-4","volume":"11","author":"F Furini","year":"2019","unstructured":"Furini F, Traversi E, Belotti P, Frangioni A, Gleixner A, Gould N, Liberti L, Lodi A, Misener R, Mittelmann H et al (2019) Qplib: a library of quadratic programming instances. Math Program Comput 11(2):237\u2013265","journal-title":"Math Program Comput"},{"key":"811_CR25","doi-asserted-by":"crossref","unstructured":"Gandhi R, Khuller S, Parthasarathy S, Srinivasan A (2002) Dependent rounding in bipartite graphs. In: The 43rd annual IEEE symposium on foundations of computer science, 2002 Proceedings. IEEE, pp 323\u2013332","DOI":"10.1109\/SFCS.2002.1181955"},{"key":"811_CR26","doi-asserted-by":"crossref","unstructured":"Gollapudi S, Sharma A (2009) An axiomatic approach for result diversification, pp 381\u2013390","DOI":"10.1145\/1526709.1526761"},{"key":"811_CR27","unstructured":"He J, Tong H, Mei Q, Szymanski B (2012) Gender: a generic diversified ranking algorithm. In: Advances in neural information processing systems, pp 1142\u20131150"},{"key":"811_CR28","doi-asserted-by":"crossref","unstructured":"Hildebrand R, Weismantel R, Zemmer K (2016) An fptas for minimizing indefinite quadratic forms over integers in polyhedra, pp 1715\u20131723","DOI":"10.1137\/1.9781611974331.ch118"},{"issue":"4B","key":"811_CR29","doi-asserted-by":"publisher","first-page":"3638","DOI":"10.3150\/16-BEJ860","volume":"23","author":"E Hillion","year":"2017","unstructured":"Hillion E, Johnson O et al (2017) A proof of the Shepp\u2013Olkin entropy concavity conjecture. Bernoulli 23(4B):3638\u20133649","journal-title":"Bernoulli"},{"issue":"3","key":"811_CR30","doi-asserted-by":"publisher","first-page":"713","DOI":"10.1214\/aoms\/1177728178","volume":"27","author":"W Hoeffding","year":"1956","unstructured":"Hoeffding W (1956) On the distribution of the number of successes in independent trials. Ann Math Stat 27(3):713\u2013721","journal-title":"Ann Math Stat"},{"key":"811_CR31","doi-asserted-by":"crossref","unstructured":"Jain K, Vazirani VV (1999) Primal-dual approximation algorithms for metric facility location and k-median problems. In: 40th Annual symposium on foundations of computer science (Cat. No. 99CB37039). IEEE, pp 2\u201313","DOI":"10.1109\/SFFCS.1999.814571"},{"issue":"4","key":"811_CR32","doi-asserted-by":"publisher","first-page":"527","DOI":"10.1002\/1520-6750(199008)37:4<527::AID-NAV3220370407>3.0.CO;2-P","volume":"37","author":"B Kalantari","year":"1990","unstructured":"Kalantari B, Bagchi A (1990) An algorithm for quadratic zero-one programs. Naval Res Logist 37(4):527\u2013538","journal-title":"Naval Res Logist"},{"key":"811_CR33","doi-asserted-by":"crossref","unstructured":"K\u00fc\u00e7\u00fcktun\u00e7 O, Saule E, Kaya K, \u00c7ataly\u00fcrek \u00dcV (2013) Diversified recommendation on graphs: pitfalls, measures, and algorithms, pp 715\u2013726","DOI":"10.1145\/2488388.2488451"},{"issue":"1","key":"811_CR34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10589-016-9856-7","volume":"66","author":"RM Lima","year":"2017","unstructured":"Lima RM, Grossmann IE (2017) On the solution of nonconvex cardinality boolean quadratic programming problems: a computational study. Comput Optim Appl 66(1):1\u201337","journal-title":"Comput Optim Appl"},{"key":"811_CR35","doi-asserted-by":"publisher","unstructured":"Lucchese C, Nardini FM, Perego R, Orlando S, Trani S (2018) Selective gradient boosting for effective learning to rank. In: The 41st international ACM SIGIR conference on research & development in information retrieval. Association for Computing Machinery, New York, NY, USA, SIGIR 2018, pp 155\u2013164. https:\/\/doi.org\/10.1145\/3209978.3210048","DOI":"10.1145\/3209978.3210048"},{"issue":"1","key":"811_CR36","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1007\/BF01588971","volume":"14","author":"GL Nemhauser","year":"1978","unstructured":"Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions-I. Math Program 14(1):265\u2013294","journal-title":"Math Program"},{"issue":"1","key":"811_CR37","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1016\/S0377-0427(00)00433-7","volume":"124","author":"FA Potra","year":"2000","unstructured":"Potra FA, Wright SJ (2000) Interior-point methods. J Comput Appl Math 124(1):281\u2013302. https:\/\/doi.org\/10.1016\/S0377-0427(00)00433-7","journal-title":"J Comput Appl Math"},{"key":"811_CR38","unstructured":"Qin T, Liu TY (2013) Introducing letor 4.0 datasets. arXiv:1306.2597"},{"issue":"4","key":"811_CR39","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1007\/s10791-009-9123-y","volume":"13","author":"T Qin","year":"2010","unstructured":"Qin T, Liu TY, Xu J, Li H (2010) Letor: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13(4):346\u2013374. https:\/\/doi.org\/10.1007\/s10791-009-9123-y","journal-title":"Inf Retr"},{"key":"811_CR40","doi-asserted-by":"crossref","unstructured":"Radlinski F, Dumais S (2006) Improving personalized web search using result diversification, pp 691\u2013692","DOI":"10.1145\/1148170.1148320"},{"key":"811_CR41","doi-asserted-by":"crossref","unstructured":"Rafiei D, Bharat K, Shukla A (2010) Diversifying web search results, pp 781\u2013790","DOI":"10.1145\/1772690.1772770"},{"issue":"4","key":"811_CR42","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/BF02579324","volume":"7","author":"P Raghavan","year":"1987","unstructured":"Raghavan P, Tompson CD (1987) Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica 7(4):365\u2013374","journal-title":"Combinatorica"},{"key":"811_CR43","doi-asserted-by":"crossref","unstructured":"Santos RL, Macdonald C, Ounis I (2010) Selectively diversifying web search results, pp 1179\u20131188","DOI":"10.1145\/1871437.1871586"},{"key":"811_CR44","doi-asserted-by":"crossref","unstructured":"Srinivasan A (2001) Distributions on level-sets with applications to approximation algorithms, pp 588\u2013597","DOI":"10.1109\/SFCS.2001.959935"},{"key":"811_CR45","doi-asserted-by":"crossref","unstructured":"Tong H, He J, Wen Z, Konuru R, Lin CY (2011) Diversified ranking on large graphs: an optimization viewpoint, pp 1028\u20131036","DOI":"10.1145\/2020408.2020573"},{"key":"811_CR46","doi-asserted-by":"crossref","unstructured":"Tsaparas P, Ntoulas A, Terzi E (2011) Selecting a comprehensive set of reviews, pp 168\u2013176","DOI":"10.1145\/2020408.2020440"},{"issue":"1\u20133","key":"811_CR47","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1007\/BF01581085","volume":"57","author":"SA Vavasis","year":"1992","unstructured":"Vavasis SA (1992) Approximation algorithms for indefinite quadratic programming. Math Program 57(1\u20133):279\u2013311","journal-title":"Math Program"},{"key":"811_CR48","unstructured":"Wang YH (1993) On the number of successes in independent trials. Statistica Sinica 295\u2013312"},{"key":"811_CR49","unstructured":"Watrigant R, Bougeret M, Giroudeau R (2012) The k-sparsest subgraph problem. HAL Arch Ouvertes"},{"key":"811_CR50","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511921735","volume-title":"The design of approximation algorithms","author":"DP Williamson","year":"2011","unstructured":"Williamson DP, Shmoys DB (2011) The design of approximation algorithms. Cambridge University Press, Cambridge"},{"key":"811_CR51","unstructured":"Zadeh SA, Ghadiri M (2015) Max-sum diversification, monotone submodular functions and semi-metric spaces. arXiv preprint arXiv:151102402"},{"key":"811_CR52","doi-asserted-by":"crossref","unstructured":"Zhang M, Hurley N (2008) Avoiding monotony: improving the diversity of recommendation lists, pp 123\u2013130","DOI":"10.1145\/1454008.1454030"},{"issue":"3","key":"811_CR53","doi-asserted-by":"publisher","first-page":"453","DOI":"10.1007\/s101070050006","volume":"87","author":"S Zhang","year":"2000","unstructured":"Zhang S (2000) Quadratic maximization and semidefinite relaxation. Math Program 87(3):453\u2013465","journal-title":"Math Program"},{"key":"811_CR54","doi-asserted-by":"crossref","unstructured":"Ziegler CN, McNee SM, Konstan JA, Lausen G (2005) Improving recommendation lists through topic diversification, pp 22\u201332","DOI":"10.1145\/1060745.1060754"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00811-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-021-00811-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-021-00811-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T09:55:53Z","timestamp":1648547753000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-021-00811-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,4]]},"references-count":54,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,3]]}},"alternative-id":["811"],"URL":"https:\/\/doi.org\/10.1007\/s10618-021-00811-2","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"type":"print","value":"1384-5810"},{"type":"electronic","value":"1573-756X"}],"subject":[],"published":{"date-parts":[[2022,1,4]]},"assertion":[{"value":"31 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}