{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,7,17]],"date-time":"2023-07-17T05:47:59Z","timestamp":1689572879040},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,1]]},"abstract":"\n Given a dataset of points in a metric space and an integer\n k<\/jats:italic>\n , a diversity maximization problem requires determining a subset of\n k<\/jats:italic>\n points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass\/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the input which contain good approximations to the optimal solution for the whole input. For a variety of diversity objective functions, our algorithms attain an (\n \u03b1<\/jats:italic>\n +\n \u03b5<\/jats:italic>\n )-approximation ratio, for any constant\n \u03b5<\/jats:italic>\n > 0, where\n \u03b1<\/jats:italic>\n is the best approximation ratio achieved by a polynomial-time, linear-space sequential algorithm for the same diversity objective. This improves substantially over the approximation ratios attainable in Streaming and MapReduce by state-of-the-art algorithms for general metric spaces. We provide extensive experimental evidence of the effectiveness of our algorithms on both real world and synthetic datasets, scaling up to over a billion points.\n <\/jats:p>","DOI":"10.14778\/3055540.3055541","type":"journal-article","created":{"date-parts":[[2017,3,15]],"date-time":"2017-03-15T14:27:29Z","timestamp":1489588049000},"page":"469-480","source":"Crossref","is-referenced-by-count":14,"title":["MapReduce and streaming algorithms for diversity maximization in metric spaces of bounded doubling dimension"],"prefix":"10.14778","volume":"10","author":[{"given":"Matteo","family":"Ceccarello","sequence":"first","affiliation":[{"name":"University of Padova, Padova, Italy"}]},{"given":"Andrea","family":"Pietracaprina","sequence":"additional","affiliation":[{"name":"University of Padova, Padova, Italy"}]},{"given":"Geppino","family":"Pucci","sequence":"additional","affiliation":[{"name":"University of Padova, Padova, Italy"}]},{"given":"Eli","family":"Upfal","sequence":"additional","affiliation":[{"name":"Brown University"}]}],"member":"320","published-online":{"date-parts":[[2017,1]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487636"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1824777.1824779"},{"key":"e_1_2_1_3_1","volume-title":"Geometric approximation via coresets. Combinatorial and computational geometry, 52:1--30","author":"Agarwal P.","year":"2005","unstructured":"P. Agarwal , S. Har-Peled , and K. Varadarajan . Geometric approximation via coresets. Combinatorial and computational geometry, 52:1--30 , 2005 . P. Agarwal, S. Har-Peled, and K. Varadarajan. Geometric approximation via coresets. Combinatorial and computational geometry, 52:1--30, 2005."},{"key":"e_1_2_1_4_1","first-page":"38","volume-title":"Proc. CCCG","author":"Aghamolaei S.","year":"2015","unstructured":"S. Aghamolaei , M. Farhadi , and H. Zarrabi-Zadeh . Diversity maximization via composable coresets . In Proc. CCCG , pages 38 -- 48 , 2015 . S. Aghamolaei, M. Farhadi, and H. Zarrabi-Zadeh. Diversity maximization via composable coresets. In Proc. CCCG, pages 38--48, 2015."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989405"},{"key":"e_1_2_1_6_1","volume-title":"Proc. ISMIR","author":"Bertin-Mahieux T.","year":"2011","unstructured":"T. Bertin-Mahieux , D. P. Ellis , B. Whitman , and P. Lamere . The million song dataset . In Proc. ISMIR , 2011 . T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere. The million song dataset. In Proc. ISMIR, 2011."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963452"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-007-9142-2"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2755573.2755591"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055540.3055541"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2016.61"},{"key":"e_1_2_1_12_1","first-page":"26","volume-title":"Proc. SoCG","volume":"51","author":"Cevallos A.","year":"2016","unstructured":"A. Cevallos , F. Eisenbrand , and R. Zenklusen . Max-sum diversity via convex programming . In Proc. SoCG , volume 51 , page 26 , 2016 . A. Cevallos, F. Eisenbrand, and R. Zenklusen. Max-sum diversity via convex programming. In Proc. SoCG, volume 51, page 26, 2016."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.2000.1145"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539702418498"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1247480.1247551"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132516.1132599"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00453-003-1074-x"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1526709.1526761"},{"issue":"293","key":"e_1_2_1_19_1","first-page":"306","article-title":"Clustering to minimize the maximum intercluster distance","volume":"38","author":"Gonzalez T. F.","year":"1985","unstructured":"T. F. Gonzalez . Clustering to minimize the maximum intercluster distance . Theoretical Computer Science , 38 : 293 -- 306 , 1985 . T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293 -- 306, 1985.","journal-title":"Theoretical Computer Science"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2014.2339840"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/946243.946308"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895480196309791"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6377(97)00034-5"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2594538.2594560"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.5555\/1873601.1873677"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87779-0_26"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.5555\/2787930"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1070.0413"},{"key":"e_1_2_1_29_1","volume-title":"Proc. ICWSM","author":"Munson S.","year":"2009","unstructured":"S. Munson , D. Zhou , and P. Resnick . Sidelines: An algorithm for increasing diversity in news and opinion aggregators . In Proc. ICWSM , 2009 . S. Munson, D. Zhou, and P. Resnick. Sidelines: An algorithm for increasing diversity in news and opinion aggregators. In Proc. ICWSM, 2009."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304607"},{"key":"e_1_2_1_31_1","first-page":"107","volume-title":"Proc. DIMACS Workshop External Memory and Visualization","volume":"50","author":"Raghavan P.","year":"1999","unstructured":"P. Raghavan and M. Henzinger . Computing on data streams . In Proc. DIMACS Workshop External Memory and Visualization , volume 50 , page 107 , 1999 . P. Raghavan and M. Henzinger. Computing on data streams. In Proc. DIMACS Workshop External Memory and Visualization, volume 50, page 107, 1999."},{"key":"e_1_2_1_32_1","volume-title":"Handbook of Approximation Algorithms and Metaheuristics.","author":"Rosenkrantz D.","year":"2007","unstructured":"D. Rosenkrantz , S. Ravi , and G. Tayi . Approximation algorithms for facility dispersion . In Handbook of Approximation Algorithms and Metaheuristics. 2007 . D. Rosenkrantz, S. Ravi, and G. Tayi. Approximation algorithms for facility dispersion. In Handbook of Approximation Algorithms and Metaheuristics. 2007."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1137\/0404048"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.4028\/www.scientific.net\/AMM.347-350.2548"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0781-x"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2009.225"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3055540.3055541","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:12:21Z","timestamp":1672225941000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3055540.3055541"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,1]]},"references-count":36,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2017,1]]}},"alternative-id":["10.14778\/3055540.3055541"],"URL":"http:\/\/dx.doi.org\/10.14778\/3055540.3055541","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":["General Earth and Planetary Sciences","Water Science and Technology","Geography, Planning and Development"],"published":{"date-parts":[[2017,1]]}}}