{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:26:24Z","timestamp":1750220784206,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2020,8,5]],"date-time":"2020-08-05T00:00:00Z","timestamp":1596585600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100017336","name":"Dipartimenti di Eccellenza","doi-asserted-by":"crossref","award":["L. 232"],"award-info":[{"award-number":["L. 232"]}],"id":[{"id":"10.13039\/100017336","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003407","name":"MIUR","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003407","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Italian Ministry of Education, University and Research, under PRIN Project","award":["n. 20174LF3T8"],"award-info":[{"award-number":["n. 20174LF3T8"]}]},{"name":"AHeAD"},{"name":"University of Padova under project \u201cSID 2020: Resource-Allocation Tradeoffs for Dynamic and Extreme Data\u201d"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,10,31]]},"abstract":"<jats:p>\n            Diversity maximization is a fundamental problem in web search and data mining. For a given dataset\n            <jats:italic>S<\/jats:italic>\n            of\n            <jats:italic>n<\/jats:italic>\n            elements, the problem requires to determine a subset of\n            <jats:italic>S<\/jats:italic>\n            containing\n            <jats:italic>k<\/jats:italic>\n            \u226a\n            <jats:italic>n<\/jats:italic>\n            \u201crepresentatives\u201d which maximize some diversity function expressed in terms of pairwise distances, where distance models dissimilarity. An important variant of the problem prescribes that the solution satisfy an additional orthogonal requirement, which can be specified as a matroid constraint (i.e., a feasible solution must be an independent set of size\n            <jats:italic>k<\/jats:italic>\n            of a given matroid). While unconstrained diversity maximization admits efficient coreset-based strategies for several diversity functions, known approaches dealing with the additional matroid constraint apply only to one diversity function (sum of distances), and are based on an expensive, inherently sequential, local search over the entire input dataset. We devise the first coreset-based algorithms for diversity maximization under matroid constraints for various diversity functions, together with efficient sequential, MapReduce, and Streaming implementations. Technically, our algorithms rely on the construction of a small coreset, that is, a subset of\n            <jats:italic>S<\/jats:italic>\n            containing a feasible solution which is no more than a factor 1\u2212\u025b away from the optimal solution for\n            <jats:italic>S<\/jats:italic>\n            . While our algorithms are fully general, for the partition and transversal matroids, if \u025b is a constant in\n            <jats:italic>(0,1)<\/jats:italic>\n            and\n            <jats:italic>S<\/jats:italic>\n            has bounded doubling dimension, the coreset size is independent of\n            <jats:italic>n<\/jats:italic>\n            and it is small enough to afford the execution of a slow sequential algorithm to extract a final, accurate, solution in reasonable time. Extensive experiments show that our algorithms are accurate, fast, and scalable, and therefore they are capable of dealing with the large input instances typical of the big data scenario.\n          <\/jats:p>","DOI":"10.1145\/3402448","type":"journal-article","created":{"date-parts":[[2020,8,5]],"date-time":"2020-08-05T18:53:00Z","timestamp":1596653580000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["A General Coreset-Based Approach to Diversity Maximization under Matroid Constraints"],"prefix":"10.1145","volume":"14","author":[{"given":"Matteo","family":"Ceccarello","sequence":"first","affiliation":[{"name":"Free University of Bozen, Bolzano, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9189-9618","authenticated-orcid":false,"given":"Andrea","family":"Pietracaprina","sequence":"additional","affiliation":[{"name":"University of Padova, Padova, Italy"}]},{"given":"Geppino","family":"Pucci","sequence":"additional","affiliation":[{"name":"University of Padova, Padova, Italy"}]}],"member":"320","published-online":{"date-parts":[[2020,8,5]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487636"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1824777.1824779"},{"key":"e_1_2_1_3_1","first-page":"1","article-title":"Geometric approximation via coresets","volume":"52","author":"Agarwal Pankaj K.","year":"2005","unstructured":"Pankaj K. Agarwal , Sariel Har-Peled , and Kasturi R. Varadarajan . 2005 . Geometric approximation via coresets . Journal of Combinatorial and Computational Geometry 52 (2005), 1 -- 30 . http:\/\/library.msri.org\/books\/Book52\/contents.html. Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan. 2005. Geometric approximation via coresets. Journal of Combinatorial and Computational Geometry 52 (2005), 1--30. http:\/\/library.msri.org\/books\/Book52\/contents.html.","journal-title":"Journal of Combinatorial and Computational Geometry"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the 27th Canadian Conference on Computational Geometry (CCCG\u201915)","author":"Aghamolaei Sepideh","year":"2015","unstructured":"Sepideh Aghamolaei , Majid Farhadi , and Hamid Zarrabi-Zadeh . 2015 . Diversity maximization via composable coresets . In Proceedings of the 27th Canadian Conference on Computational Geometry (CCCG\u201915) . Sepideh Aghamolaei, Majid Farhadi, and Hamid Zarrabi-Zadeh. 2015. Diversity maximization via composable coresets. In Proceedings of the 27th Canadian Conference on Computational Geometry (CCCG\u201915)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157556"},{"key":"e_1_2_1_6_1","first-page":"993","article-title":"Latent dirichlet allocation","author":"Blei David M.","year":"2003","unstructured":"David M. Blei , Andrew Y. Ng , and Michael I. Jordan . 2003 . Latent dirichlet allocation . Journal of Machine Learning Research 3 , Jan. (2003), 993 -- 1022 . Retrieved from http:\/\/jmlr.org\/papers\/v3\/blei03a.html. David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3, Jan. (2003), 993--1022. Retrieved from http:\/\/jmlr.org\/papers\/v3\/blei03a.html.","journal-title":"Journal of Machine Learning Research 3"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3294052.3319701"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213556.2213580"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159719"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055540.3055541"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 29th International Symposium on Algorithms and Computation (ISAAC\u201918)","volume":"123","author":"Cevallos Alfonso","year":"2018","unstructured":"Alfonso Cevallos , Friedrich Eisenbrand , and Sarah Morell . 2018 . Diversity maximization in doubling metrics . In Proceedings of the 29th International Symposium on Algorithms and Computation (ISAAC\u201918) , Wen-Lian Hsu, Der-Tsai Lee, and Chung-Shou Liao (Eds.) , Vol. 123 . Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 33:1--33:12. DOI:https:\/\/doi.org\/10.4230\/LIPIcs.ISAAC.2018.33 10.4230\/LIPIcs.ISAAC.2018.33 Alfonso Cevallos, Friedrich Eisenbrand, and Sarah Morell. 2018. Diversity maximization in doubling metrics. In Proceedings of the 29th International Symposium on Algorithms and Computation (ISAAC\u201918), Wen-Lian Hsu, Der-Tsai Lee, and Chung-Shou Liao (Eds.), Vol. 123. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 33:1--33:12. DOI:https:\/\/doi.org\/10.4230\/LIPIcs.ISAAC.2018.33"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611974782.9"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1006\/jagm.2000.1145"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539702418498"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1132516.1132599"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI\u201904)","author":"Dean Jeffrey","year":"2004","unstructured":"Jeffrey Dean and Sanjay Ghemawat . 2004 . MapReduce: Simplified data processing on large clusters . In Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI\u201904) . 137--150. Retrieved from http:\/\/www.usenix.org\/events\/osdi04\/tech\/dean.html. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI\u201904). 137--150. Retrieved from http:\/\/www.usenix.org\/events\/osdi04\/tech\/dean.html."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323165.3323172"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(85)90224-5"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2014.2339840"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the DIMACS Workshop on External Memory Algorithms. 107--118","author":"Henzinger Monika Rauch","year":"1998","unstructured":"Monika Rauch Henzinger , Prabhakar Raghavan , and Sridhar Rajagopalan . 1998 . Computing on data streams . In Proceedings of the DIMACS Workshop on External Memory Algorithms. 107--118 . Monika Rauch Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan. 1998. Computing on data streams. In Proceedings of the DIMACS Workshop on External Memory Algorithms. 107--118."},{"volume-title":"Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS\u201914)","author":"Indyk Piotr","key":"e_1_2_1_21_1","unstructured":"Piotr Indyk , Sepideh Mahabadi , Mohammad Mahdian , and Vahab S. Mirrokni . 2014. Composable core-sets for diversity and coverage maximization . In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS\u201914) . 100--108. DOI:https:\/\/doi.org\/10.1145\/2594538.2594560 10.1145\/2594538.2594560 Piotr Indyk, Sepideh Mahabadi, Mohammad Mahdian, and Vahab S. Mirrokni. 2014. Composable core-sets for diversity and coverage maximization. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS\u201914). 100--108. DOI:https:\/\/doi.org\/10.1145\/2594538.2594560"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.5555\/1873601.1873677"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1400751.1400808"},{"key":"e_1_2_1_24_1","volume-title":"Ullman","author":"Leskovec Jure","year":"2014","unstructured":"Jure Leskovec , Anand Rajaraman , and Jeffrey D . Ullman . 2014 . Mining of Massive Datasets (2nd ed.). Cambridge University Press . Jure Leskovec, Anand Rajaraman, and Jeffrey D. Ullman. 2014. Mining of Massive Datasets (2nd ed.). Cambridge University Press."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.1070.0413"},{"volume-title":"Matroid Theory","author":"Oxley James G.","key":"e_1_2_1_26_1","unstructured":"James G. Oxley . 2006. Matroid Theory . Oxford University Press . James G. Oxley. 2006. Matroid Theory. Oxford University Press."},{"volume-title":"Proceedings of the 18th Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914)","author":"Pennington Jeffrey","key":"e_1_2_1_27_1","unstructured":"Jeffrey Pennington , Richard Socher , and Christopher D. Manning . 2014. Glove: Global vectors for word representation . In Proceedings of the 18th Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914) , Alessandro Moschitti, Bo Pang, and Walter Daelemans (Eds.). ACL, 1532--1543. Retrieved from https:\/\/www.aclweb.org\/anthology\/D14-1162\/. Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 18th Conference on Empirical Methods in Natural Language Processing (EMNLP\u201914), Alessandro Moschitti, Bo Pang, and Walter Daelemans (Eds.). ACL, 1532--1543. Retrieved from https:\/\/www.aclweb.org\/anthology\/D14-1162\/."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2304576.2304607"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.4028\/www.scientific.net\/AMM.347-350.2548"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-014-0781-x"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201910)","author":"Zaharia Matei","year":"2010","unstructured":"Matei Zaharia , Mosharaf Chowdhury , Michael J. Franklin , Scott Shenker , and Ion Stoica . 2010 . Spark: Cluster computing with working sets . In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201910) , Erich M. Nahum and Dongyan Xu (Eds.). USENIX Association. Retrieved from https:\/\/www.usenix.org\/conference\/hotcloud-10\/spark-cluster-computing-working-sets. Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Cloud Computing (HotCloud\u201910), Erich M. Nahum and Dongyan Xu (Eds.). USENIX Association. Retrieved from https:\/\/www.usenix.org\/conference\/hotcloud-10\/spark-cluster-computing-working-sets."}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3402448","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3402448","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:34Z","timestamp":1750200094000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3402448"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,5]]},"references-count":31,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2020,10,31]]}},"alternative-id":["10.1145\/3402448"],"URL":"https:\/\/doi.org\/10.1145\/3402448","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2020,8,5]]},"assertion":[{"value":"2019-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}