{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,11]],"date-time":"2026-06-11T16:05:06Z","timestamp":1781193906565,"version":"3.54.1"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2017,1]]},"abstract":"<jats:p>Crowdsourcing has emerged as a novel problem-solving paradigm, which facilitates addressing problems that are hard for computers, e.g., entity resolution and sentiment analysis. However, due to the openness of crowdsourcing, workers may yield low-quality answers, and a redundancy-based method is widely employed, which first assigns each task to multiple workers and then infers the correct answer (called<jats:italic>truth<\/jats:italic>) for the task based on the answers of the assigned workers. A fundamental problem in this method is<jats:italic>Truth Inference<\/jats:italic>, which decides how to effectively infer the truth. Recently, the database community and data mining community independently study this problem and propose various algorithms. However, these algorithms are not compared extensively under the same framework and it is hard for practitioners to select appropriate algorithms. To alleviate this problem, we provide a detailed survey on 17 existing algorithms and perform a comprehensive evaluation using 5 real datasets. We make all codes and datasets public for future research. Through experiments we find that existing algorithms are not stable across different datasets and there is no algorithm that outperforms others consistently. We believe that the truth inference problem is not fully solved, and identify the limitations of existing algorithms and point out promising research directions.<\/jats:p>","DOI":"10.14778\/3055540.3055547","type":"journal-article","created":{"date-parts":[[2017,3,15]],"date-time":"2017-03-15T14:27:29Z","timestamp":1489588049000},"page":"541-552","source":"Crossref","is-referenced-by-count":344,"title":["Truth inference in crowdsourcing"],"prefix":"10.14778","volume":"10","author":[{"given":"Yudian","family":"Zheng","sequence":"first","affiliation":[{"name":"The University of Hong Kong"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guoliang","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuanbing","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Caihua","family":"Shan","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Reynold","family":"Cheng","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2017,1]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"https:\/\/docs.aws.amazon.com\/AWSMechTurk\/latest\/RequesterUI\/amt-ui.pdf. https:\/\/docs.aws.amazon.com\/AWSMechTurk\/latest\/RequesterUI\/amt-ui.pdf."},{"key":"e_1_2_1_2_1","unstructured":"Amazon mechanical turk. https:\/\/www.mturk.com\/. Amazon mechanical turk. https:\/\/www.mturk.com\/."},{"key":"e_1_2_1_3_1","unstructured":"Chi-squared. https:\/\/en.wikipedia.org\/wiki\/Chi-squared_distribution. Chi-squared. https:\/\/en.wikipedia.org\/wiki\/Chi-squared_distribution."},{"key":"e_1_2_1_4_1","unstructured":"Adult Datset. https:\/\/github.com\/ipeirotis\/Get-Another-Label\/tree\/master\/data. Adult Datset. https:\/\/github.com\/ipeirotis\/Get-Another-Label\/tree\/master\/data."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/2892753.2892959"},{"key":"e_1_2_1_6_1","volume-title":"Latent dirichlet allocation. JMLR, 3(Jan):993--1022","author":"Blei D. M.","year":"2003","unstructured":"D. M. Blei , A. Y. Ng , and M. I. Jordan . Latent dirichlet allocation. JMLR, 3(Jan):993--1022 , 2003 . D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. JMLR, 3(Jan):993--1022, 2003."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.122"},{"issue":"3","key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"5","DOI":"10.58729\/1941-6687.1324","article-title":"survey of entity resolution and record linkage methodologies","volume":"6","author":"Brizan D. G.","year":"2015","unstructured":"D. G. Brizan and A. U. Tansel . A. survey of entity resolution and record linkage methodologies . Communications of the IIMA , 6 ( 3 ): 5 , 2015 . D. G. Brizan and A. U. Tansel. A. survey of entity resolution and record linkage methodologies. Communications of the IIMA, 6(3):5, 2015.","journal-title":"Communications of the IIMA"},{"key":"e_1_2_1_9_1","volume-title":"The Nineteenth TREC Notebook","author":"Buckley C.","year":"2010","unstructured":"C. Buckley , M. Lease , and M. D. Smucker . Overview of the trec 2010 relevance feedback track (notebook) . In The Nineteenth TREC Notebook , 2010 . C. Buckley, M. Lease, and M. D. Smucker. Overview of the trec 2010 relevance feedback track (notebook). In The Nineteenth TREC Notebook, 2010."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/1699510.1699548"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915252"},{"key":"e_1_2_1_12_1","unstructured":"CrowdFlower. http:\/\/crowdflower.com\/. CrowdFlower. http:\/\/crowdflower.com\/."},{"key":"e_1_2_1_13_1","unstructured":"Crowdsourcing Datasets. http:\/\/dbgroup.cs.tsinghua.edu.cn\/ligl\/crowddata\/. Crowdsourcing Datasets. http:\/\/dbgroup.cs.tsinghua.edu.cn\/ligl\/crowddata\/."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2448496.2448524"},{"key":"e_1_2_1_15_1","first-page":"20","volume-title":"Maximum likelihood estimation of observer error-rates using the em algorithm. Applied statistics","author":"Dawid A. P.","year":"1979","unstructured":"A. P. Dawid and A. M. Skene . Maximum likelihood estimation of observer error-rates using the em algorithm. Applied statistics , pages 20 -- 28 , 1979 . A. P. Dawid and A. M. Skene. Maximum likelihood estimation of observer error-rates using the em algorithm. Applied statistics, pages 20--28, 1979."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187900"},{"key":"e_1_2_1_17_1","series-title":"Series B (methodological)","first-page":"1","volume-title":"Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society","author":"Dempster A. P.","year":"1977","unstructured":"A. P. Dempster , N. M. Laird , and D. B. Rubin . Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society . Series B (methodological) , pages 1 -- 38 , 1977 . A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society. Series B (methodological), pages 1--38, 1977."},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","DOI":"10.1201\/9780429246593","volume-title":"An introduction to the bootstrap","author":"Efron B.","year":"1994","unstructured":"B. Efron and R. J. Tibshirani . An introduction to the bootstrap . 1994 . B. Efron and R. J. Tibshirani. An introduction to the bootstrap. 1994."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750550"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989331"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/2733085.2733101"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213880"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856331"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498229"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2487575.2487595"},{"key":"e_1_2_1_26_1","first-page":"1953","volume-title":"NIPS","author":"Karger D. R.","year":"2011","unstructured":"D. R. Karger , S. Oh , and D. Shah . Iterative learning for reliable crowdsourcing systems . In NIPS , pages 1953 -- 1961 , 2011 . D. R. Karger, S. Oh, and D. Shah. Iterative learning for reliable crowdsourcing systems. In NIPS, pages 1953--1961, 2011."},{"key":"e_1_2_1_27_1","first-page":"619","volume-title":"AISTATS","author":"Kim H.-C.","year":"2012","unstructured":"H.-C. Kim and Z. Ghahramani . Bayesian classifier combination . In AISTATS , pages 619 -- 627 , 2012 . H.-C. Kim and Z. Ghahramani. Bayesian classifier combination. In AISTATS, pages 619--627, 2012."},{"key":"e_1_2_1_28_1","volume-title":"Probabilistic graphical models: principles and techniques","author":"Koller D.","year":"2009","unstructured":"D. Koller and N. Friedman . Probabilistic graphical models: principles and techniques . MIT Press , 2009 . D. Koller and N. Friedman. Probabilistic graphical models: principles and techniques. MIT Press, 2009."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2535242"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.14778\/2735496.2735505"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2610509"},{"key":"e_1_2_1_32_1","volume-title":"Synthesis lectures on human language technologies, 5(1):1--167","author":"Liu B.","year":"2012","unstructured":"B. Liu . Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1--167 , 2012 . B. Liu. Sentiment analysis and opinion mining. Synthesis lectures on human language technologies, 5(1):1--167, 2012."},{"key":"e_1_2_1_33_1","first-page":"692","volume-title":"NIPS","author":"Liu Q.","year":"2012","unstructured":"Q. Liu , J. Peng , and A. T. Ihler . Variational inference for crowdsourcing . In NIPS , pages 692 -- 700 , 2012 . Q. Liu, J. Peng, and A. T. Ihler. Variational inference for crowdsourcing. In NIPS, pages 692--700, 2012."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336676"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2783258.2783314"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/2047485.2047487"},{"key":"e_1_2_1_37_1","first-page":"211","volume-title":"CIDR","author":"Marcus A.","year":"2011","unstructured":"A. Marcus , E. Wu , S. Madden , and R. C. Miller . Crowdsourced databases: Query processing with people . In CIDR , pages 211 -- 214 , 2011 . A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211--214, 2011."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213878"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/2396761.2398421"},{"key":"e_1_2_1_40_1","unstructured":"Project Page. http:\/\/dbgroup.cs.tsinghua.edu.cn\/ligl\/crowd_truth_inference\/. Project Page. http:\/\/dbgroup.cs.tsinghua.edu.cn\/ligl\/crowd_truth_inference\/."},{"key":"e_1_2_1_41_1","volume-title":"Learning from crowds. JMLR, 11(Apr):1297--1322","author":"Raykar V. C.","year":"2010","unstructured":"V. C. Raykar , S. Yu , L. H. Zhao , G. H. Valadez , C. Florin , L. Bogoni , and L. Moy . Learning from crowds. JMLR, 11(Apr):1297--1322 , 2010 . V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. JMLR, 11(Apr):1297--1322, 2010."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/584091.584093"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2736277.2741689"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/1613715.1613751"},{"key":"e_1_2_1_45_1","unstructured":"Twitter Sentiment. http:\/\/www.sananalytics.com\/lab\/twitter-sentiment\/. Twitter Sentiment. http:\/\/www.sananalytics.com\/lab\/twitter-sentiment\/."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2566486.2567989"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2187836.2187969"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1160379"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000001"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350263"},{"key":"e_1_2_1_51_1","first-page":"2424","volume-title":"NIPS","author":"Welinder P.","year":"2010","unstructured":"P. Welinder , S. Branson , P. Perona , and S. J. Belongie . The multidimensional wisdom of crowds . In NIPS , pages 2424 -- 2432 , 2010 . P. Welinder, S. Branson, P. Perona, and S. J. Belongie. The multidimensional wisdom of crowds. In NIPS, pages 2424--2432, 2010."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536337"},{"key":"e_1_2_1_53_1","first-page":"2035","volume-title":"NIPS","author":"Whitehill J.","year":"2009","unstructured":"J. Whitehill , T.-f. Wu , J. Bergsma , J. R. Movellan , and P. L. Ruvolo . Whose vote should count more: Optimal integration of labels from labelers of unknown expertise . In NIPS , pages 2035 -- 2043 , 2009 . J. Whitehill, T.-f. Wu, J. Bergsma, J. R. Movellan, and P. L. Ruvolo. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In NIPS, pages 2035--2043, 2009."},{"key":"e_1_2_1_54_1","first-page":"932","volume-title":"AISTATS","author":"Yan Y.","year":"2010","unstructured":"Y. Yan , R. Rosales , G. Fung , M. W. Schmidt , G. H. Valadez , L. Bogoni , L. Moy , and J. G. Dy . Modeling annotator expertise: Learning when everybody knows a bit of something . In AISTATS , pages 932 -- 939 , 2010 . Y. Yan, R. Rosales, G. Fung, M. W. Schmidt, G. H. Valadez, L. Bogoni, L. Moy, and J. G. Dy. Modeling annotator expertise: Learning when everybody knows a bit of something. In AISTATS, pages 932--939, 2010."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/2921558.2921559"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/1996889.1996934"},{"key":"e_1_2_1_57_1","first-page":"397","volume-title":"EDBT","author":"Zhao Z.","year":"2015","unstructured":"Z. Zhao , F. Wei , M. Zhou , W. Chen , and W. Ng . Crowd-selection query processing in crowdsourcing databases: A task-driven approach . EDBT , pages 397 -- 408 , 2015 . Z. Zhao, F. Wei, M. Zhou, W. Chen, and W. Ng. Crowd-selection query processing in crowdsourcing databases: A task-driven approach. EDBT, pages 397--408, 2015."},{"key":"e_1_2_1_58_1","first-page":"193","volume-title":"EDBT","author":"Zheng Y.","year":"2015","unstructured":"Y. Zheng , R. Cheng , S. Maniu , and L. Mo . On optimality of jury selection in crowdsourcing . In EDBT , pages 193 -- 204 , 2015 . Y. Zheng, R. Cheng, S. Maniu, and L. Mo. On optimality of jury selection in crowdsourcing. In EDBT, pages 193--204, 2015."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.14778\/3025111.3025118"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749430"},{"key":"e_1_2_1_61_1","first-page":"2195","volume-title":"NIPS","author":"Zhou D.","year":"2012","unstructured":"D. Zhou , S. Basu , Y. Mao , and J. C. Platt . Learning from the wisdom of crowds by minimax entropy . In NIPS , pages 2195 -- 2203 , 2012 . D. Zhou, S. Basu, Y. Mao, and J. C. Platt. Learning from the wisdom of crowds by minimax entropy. In NIPS, pages 2195--2203, 2012."},{"key":"e_1_2_1_62_1","first-page":"262","volume-title":"ICML","author":"Zhou D.","year":"2014","unstructured":"D. Zhou , Q. Liu , J. Platt , and C. Meek . Aggregating ordinal labels from crowds by minimax conditional entropy . In ICML , pages 262 -- 270 , 2014 . D. Zhou, Q. Liu, J. Platt, and C. Meek. Aggregating ordinal labels from crowds by minimax conditional entropy. In ICML, pages 262--270, 2014."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1627"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3055540.3055547","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T17:01:40Z","timestamp":1692723700000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3055540.3055547"}},"subtitle":["is the problem solved?"],"short-title":[],"issued":{"date-parts":[[2017,1]]},"references-count":63,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2017,1]]}},"alternative-id":["10.14778\/3055540.3055547"],"URL":"https:\/\/doi.org\/10.14778\/3055540.3055547","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2017,1]]}}}