{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T20:43:19Z","timestamp":1765485799374},"reference-count":26,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2012,12]]},"abstract":"<jats:p>In this paper, we address the problem of selectivity estimation in a crowdsourced database. Specifically, we develop several techniques for using workers on a crowdsourcing platform like Amazon's Mechanical Turk to estimate the fraction of items in a dataset (e.g., a collection of photos) that satisfy some property or predicate (e.g., photos of trees). We do this without explicitly iterating through every item in the dataset. This is important in crowd-sourced query optimization to support predicate ordering and in query evaluation, when performing a GROUP BY operation with a COUNT or AVG aggregate. We compare sampling item labels, a traditional approach, to showing workers a collection of items and asking them to estimate how many satisfy some predicate. Additionally, we develop techniques to eliminate spammers and colluding attackers trying to skew selectivity estimates when using this count estimation approach. We find that for images, counting can be much more effective than sampled labeling, reducing the amount of work necessary to arrive at an estimate that is within 1% of the true fraction by up to an order of magnitude, with lower worker latency. We also find that sampled labeling outperforms count estimation on a text processing task, presumably because people are better at quickly processing large batches of images than they are at reading strings of text. Our spammer detection technique, which is applicable to both the label- and count-based approaches, can improve accuracy by up to two orders of magnitude.<\/jats:p>","DOI":"10.14778\/2535568.2448944","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"109-120","source":"Crossref","is-referenced-by-count":55,"title":["Counting with the crowd"],"prefix":"10.14778","volume":"6","author":[{"given":"Adam","family":"Marcus","sequence":"first","affiliation":[{"name":"MIT, CSAIL"}]},{"given":"David","family":"Karger","sequence":"additional","affiliation":[{"name":"MIT, CSAIL"}]},{"given":"Samuel","family":"Madden","sequence":"additional","affiliation":[{"name":"MIT, CSAIL"}]},{"given":"Robert","family":"Miller","sequence":"additional","affiliation":[{"name":"MIT, CSAIL"}]},{"given":"Sewoong","family":"Oh","sequence":"additional","affiliation":[{"name":"MIT, CSAIL"}]}],"member":"320","published-online":{"date-parts":[[2012,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"UIST, 2011","author":"Ahmad S.","year":"2047","unstructured":"S. Ahmad The jabberwocky programming environment for structured social computing . In UIST, 2011 . 10.1145\/ 2047 196.2047203 S. Ahmad et al. The jabberwocky programming environment for structured social computing. In UIST, 2011. 10.1145\/2047196.2047203"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1145\/2145204.2145277","volume-title":"CSCW","author":"Andr\u00e9 P.","year":"2012","unstructured":"P. Andr\u00e9 , M. S. Bernstein , and K. Luther . Who gives a tweet?: evaluating microblog content value . In CSCW , pages 471 - 474 , 2012 . 10.1145\/2145204.2145277 P. Andr\u00e9, M. S. Bernstein, and K. Luther. Who gives a tweet?: evaluating microblog content value. In CSCW, pages 471-474, 2012. 10.1145\/2145204.2145277"},{"key":"e_1_2_1_3_1","volume-title":"HCOMP","author":"Attenberg J.","year":"2011","unstructured":"J. Attenberg Beat the machine: Challenging workers to find the unknown unknowns . In HCOMP , 2011 . J. Attenberg et al. Beat the machine: Challenging workers to find the unknown unknowns. In HCOMP, 2011."},{"key":"e_1_2_1_4_1","volume-title":"UIST, 2010","author":"Bernstein M. S.","year":"1866","unstructured":"M. S. Bernstein : a word processor with a crowd inside . In UIST, 2010 . 10.1145\/ 1866 029.1866078 M. S. Bernstein et al. Soylent: a word processor with a crowd inside. In UIST, 2010. 10.1145\/1866029.1866078"},{"key":"e_1_2_1_5_1","volume-title":"UIST, 2011","author":"Bernstein M. S.","year":"2047","unstructured":"M. S. Bernstein Crowds in two seconds: enabling realtime crowd-powered interfaces . In UIST, 2011 . 10.1145\/ 2047 196.2047201 M. S. Bernstein et al. Crowds in two seconds: enabling realtime crowd-powered interfaces. In UIST, 2011. 10.1145\/2047196.2047201"},{"key":"e_1_2_1_6_1","volume-title":"Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Royal Stat. Soc.","author":"Dawid A.","year":"1979","unstructured":"A. Dawid and A. Skene . Max . Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Royal Stat. Soc. , 1979 . A. Dawid and A. Skene. Max. Likelihood Estimation of Observer Error-Rates Using the EM Algorithm. Royal Stat. Soc., 1979."},{"key":"e_1_2_1_7_1","first-page":"251","volume-title":"IPTPS","author":"Douceur J. R.","year":"2002","unstructured":"J. R. Douceur . The sybil attack . In IPTPS , pages 251 - 260 , 2002 . J. R. Douceur. The sybil attack. In IPTPS, pages 251-260, 2002."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-4541-9","volume-title":"An Introduction to the Bootstrap","author":"Efron B.","year":"1993","unstructured":"B. Efron and R. Tibshirani . An Introduction to the Bootstrap . Chapman & Hall\/CRC , 1993 . B. Efron and R. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall\/CRC, 1993."},{"key":"e_1_2_1_9_1","volume-title":"An Introduction to Probability Theory","author":"Feller W.","year":"1968","unstructured":"W. Feller . An Introduction to Probability Theory , Vol. 1 . Wiley , New York, NY , third edition, 1968 . W. Feller. An Introduction to Probability Theory, Vol. 1. Wiley, New York, NY, third edition, 1968."},{"key":"e_1_2_1_10_1","first-page":"61","volume-title":"SIGMOD","author":"Franklin M.","year":"2011","unstructured":"M. Franklin , D. Kossmann , T. Kraska , S. Ramesh , and R. Xin . CrowdDB: Answering queries with crowdsourcing . In SIGMOD , pages 61 - 72 , 2011 . 10.1145\/1989323.1989331 M. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: Answering queries with crowdsourcing. In SIGMOD, pages 61-72, 2011. 10.1145\/1989323.1989331"},{"key":"e_1_2_1_11_1","volume-title":"HLT-NAACL","author":"Gormley M. R.","year":"2010","unstructured":"M. R. Gormley , A. Gerber , M. Harper , and M. Dredze . Nonexpert correction of automatically generated relation annotations . In HLT-NAACL , 2010 . M. R. Gormley, A. Gerber, M. Harper, and M. Dredze. Nonexpert correction of automatically generated relation annotations. In HLT-NAACL, 2010."},{"key":"e_1_2_1_12_1","volume-title":"SIGMOD","author":"Guo S.","year":"2012","unstructured":"S. Guo , A. Parameswaran , and H. Garcia-Molina . So who won? dynamic max discovery with the crowd . In SIGMOD , 2012 . 10.1145\/2213836.2213880 S. Guo, A. Parameswaran, and H. Garcia-Molina. So who won? dynamic max discovery with the crowd. In SIGMOD, 2012. 10.1145\/2213836.2213880"},{"key":"e_1_2_1_13_1","unstructured":"P. Ipeirotis. Identify Verification (and how to bypass it) March 2012. http:\/\/www.behind-the-enemy-lines.com\/2012\/01\/identify-verification-and-how-to-bypass.html.  P. Ipeirotis. Identify Verification (and how to bypass it) March 2012. http:\/\/www.behind-the-enemy-lines.com\/2012\/01\/identify-verification-and-how-to-bypass.html."},{"key":"e_1_2_1_14_1","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1145\/1837885.1837906","volume-title":"HCOMP","author":"Ipeirotis P. G.","year":"2010","unstructured":"P. G. Ipeirotis , F. Provost , and J. Wang . Quality management on amazon mechanical turk . In HCOMP , pages 64 - 67 , 2010 . 10.1145\/1837885.1837906 P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on amazon mechanical turk. In HCOMP, pages 64-67, 2010. 10.1145\/1837885.1837906"},{"key":"e_1_2_1_15_1","first-page":"1953","volume-title":"NIPS","author":"Karger D. R.","year":"2011","unstructured":"D. R. Karger , S. Oh , and D. Shah . Iterative learning for reliable crowdsourcing systems . In NIPS , pages 1953 - 1961 , 2011 . D. R. Karger, S. Oh, and D. Shah. Iterative learning for reliable crowdsourcing systems. In NIPS, pages 1953-1961, 2011."},{"key":"e_1_2_1_16_1","volume-title":"CIDR","author":"Marcus A.","year":"2011","unstructured":"A. Marcus , E. Wu , : Query processing with people . In CIDR , 2011 . A. Marcus, E. Wu, et al. Crowdsourced databases: Query processing with people. In CIDR, 2011."},{"key":"e_1_2_1_17_1","volume-title":"PVLDB","author":"Marcus A.","year":"2011","unstructured":"A. Marcus , E. Wu , D. R. Karger , S. Madden , and R. C. Miller . Human-powered sorts and joins . PVLDB , 2011 . A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller. Human-powered sorts and joins. PVLDB, 2011."},{"key":"e_1_2_1_18_1","volume-title":"Crowdscreen: Algorithms for filtering data with humans","author":"Parameswaran A.","year":"2011","unstructured":"A. Parameswaran Crowdscreen: Algorithms for filtering data with humans . 2011 . A. Parameswaran et al. Crowdscreen: Algorithms for filtering data with humans. 2011."},{"key":"e_1_2_1_19_1","volume-title":"CIDR","author":"Parameswaran A.","year":"2011","unstructured":"A. Parameswaran and N. Polyzotis . Answering queries using databases, humans and algorithms . In CIDR , 2011 . A. Parameswaran and N. Polyzotis. Answering queries using databases, humans and algorithms. In CIDR, 2011."},{"key":"e_1_2_1_20_1","volume-title":"The Wisdom of Crowds","author":"Surowiecki J.","year":"2005","unstructured":"J. Surowiecki . The Wisdom of Crowds . Anchor Books , 2005 . J. Surowiecki. The Wisdom of Crowds. Anchor Books, 2005."},{"key":"e_1_2_1_21_1","volume-title":"March","author":"Tarr\u00e9s F.","year":"2012","unstructured":"F. Tarr\u00e9s and A. Rama . GTAV Face Database , March 2012 . F. Tarr\u00e9s and A. Rama. GTAV Face Database, March 2012."},{"key":"e_1_2_1_22_1","volume-title":"Getting it all from the crowd. CoRR, abs\/1202.2335","author":"Trushkowsky B.","year":"2012","unstructured":"B. Trushkowsky , T. Kraska , M. J. Franklin , and P. Sarkar . Getting it all from the crowd. CoRR, abs\/1202.2335 , 2012 . B. Trushkowsky, T. Kraska, M. J. Franklin, and P. Sarkar. Getting it all from the crowd. CoRR, abs\/1202.2335, 2012."},{"key":"e_1_2_1_23_1","volume-title":"Sept.","author":"Wilcox R. R.","year":"2003","unstructured":"R. R. Wilcox and H. J. Keselman . Modern robust data analysis methods: measures of central tendency. Psych. methods , Sept. 2003 . R. R. Wilcox and H. J. Keselman. Modern robust data analysis methods: measures of central tendency. Psych. methods, Sept. 2003."},{"key":"e_1_2_1_24_1","volume-title":"Probable inference, the law of succession, and statistical inference. American Statistical Assoc., 22(158)","author":"Wilson E. B.","year":"1927","unstructured":"E. B. Wilson . Probable inference, the law of succession, and statistical inference. American Statistical Assoc., 22(158) , 1927 . E. B. Wilson. Probable inference, the law of succession, and statistical inference. American Statistical Assoc., 22(158), 1927."},{"key":"e_1_2_1_25_1","volume-title":"Journal of Experimental Psychology","author":"Wolfe J. M.","year":"2007","unstructured":"J. M. Wolfe Low target prevalence is a stubborn source of errors in visual search tasks . In Journal of Experimental Psychology , 2007 . J. M. Wolfe et al. Low target prevalence is a stubborn source of errors in visual search tasks. In Journal of Experimental Psychology, 2007."},{"key":"e_1_2_1_26_1","first-page":"13","volume-title":"Attention","author":"Wolfe J. M.","year":"1998","unstructured":"J. M. Wolfe and H. Pashler (Editor). Attention , chapter Visual Search, pages 13 - 73 . University College London Press , 1998 . J. M. Wolfe and H. Pashler (Editor). Attention, chapter Visual Search, pages 13-73. University College London Press, 1998."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2535568.2448944","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:06:23Z","timestamp":1672225583000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2535568.2448944"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,12]]},"references-count":26,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,12]]}},"alternative-id":["10.14778\/2535568.2448944"],"URL":"https:\/\/doi.org\/10.14778\/2535568.2448944","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2012,12]]}}}