{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T17:44:47Z","timestamp":1775324687215,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2018,9]]},"abstract":"<jats:p>\n                    Large-scale data annotation is indispensable for many applications, such as machine learning and data integration. However, existing annotation solutions either incur expensive cost for large datasets or produce noisy results. This paper introduces a cost-effective annotation approach, and focuses on the labeling rule generation problem that aims to generate high-quality rules to largely reduce the labeling cost while preserving quality. To address the problem, we first generate candidate rules, and then devise a game-based crowdsourcing approach C\n                    <jats:sc>ROWD<\/jats:sc>\n                    G\n                    <jats:sc>AME<\/jats:sc>\n                    to select high-quality rules by considering\n                    <jats:italic>coverage and precision.<\/jats:italic>\n                    C\n                    <jats:sc>ROWD<\/jats:sc>\n                    G\n                    <jats:sc>AME<\/jats:sc>\n                    employs two groups of crowd workers: one group answers rule validation tasks (whether a rule is valid) to play a role of\n                    <jats:italic>rule generator,<\/jats:italic>\n                    while the other group answers tuple checking tasks (whether the annotated label of a data tuple is correct) to play a role of\n                    <jats:italic>rule refuter.<\/jats:italic>\n                    We let the two groups play a two-player game: rule generator identifies high-quality rules with large coverage and precision, while rule 
refuter tries to refute its opponent rule generator by checking some tuples that provide enough evidence to reject rules covering the tuples. This paper studies the challenges in C\n                    <jats:sc>ROWD<\/jats:sc>\n                    G\n                    <jats:sc>AME<\/jats:sc>\n                    . The first is to balance the trade-off between coverage and precision. We define the loss of a rule by considering the two factors. The second is rule precision estimation. We utilize\n                    <jats:italic>Bayesian estimation<\/jats:italic>\n                    to combine both rule validation and tuple checking tasks. The third is to select crowdsourcing tasks to fulfill the game-based framework for minimizing the loss. We introduce a minimax strategy and develop efficient task selection algorithms. We conduct experiments on entity matching and relation extraction, and the results show that our method outperforms state-of-the-art solutions.\n                  <\/jats:p>","DOI":"10.14778\/3275536.3275541","type":"journal-article","created":{"date-parts":[[2018,12,19]],"date-time":"2018-12-19T08:08:07Z","timestamp":1545206887000},"page":"57-70","source":"Crossref","is-referenced-by-count":32,"title":["Cost-effective data annotation using game-based crowdsourcing"],"prefix":"10.14778","volume":"12","author":[{"given":"Jingru","family":"Yang","sequence":"first","affiliation":[{"name":"Renmin University of China, China"}]},{"given":"Ju","family":"Fan","sequence":"additional","affiliation":[{"name":"Renmin University of China, China"}]},{"given":"Zhewei","family":"Wei","sequence":"additional","affiliation":[{"name":"Renmin University of China, China and Beihang University, China"}]},{"given":"Guoliang","family":"Li","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}]},{"given":"Tongyu","family":"Liu","sequence":"additional","affiliation":[{"name":"Renmin University of China, 
China"}]},{"given":"Xiaoyong","family":"Du","sequence":"additional","affiliation":[{"name":"Renmin University of China, China"}]}],"member":"320","published-online":{"date-parts":[[2018,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2082"},{"key":"e_1_2_1_2_1","volume-title":"Springer","author":"Bishop C. M.","year":"2007"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1080\/00949659208811439"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915252"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/89086.89095"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035960"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.250581"},{"issue":"2","key":"e_1_2_1_8_1","first-page":"104","article-title":"Human-in-the-loop rule learning for data integration","volume":"41","author":"Fan J.","year":"2018","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2750550"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2014.6816716"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2407353"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989331"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2588576"},{"key":"e_1_2_1_14_1","first-page":"2672","volume-title":"NIPS","author":"Goodfellow I.","year":"2014"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/2856318.2856331"},{"key":"e_1_2_1_16_1","first-page":"541","volume-title":"ACL","author":"Hoffmann R.","year":"2011"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"M. Joglekar H. Garcia-Molina and A. Parameswaran. 
Comprehensive and reliable crowd assessment algorithms. pages 195--206 2014.","DOI":"10.1109\/ICDE.2015.7113284"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983323.2983831"},{"key":"e_1_2_1_19_1","first-page":"957","volume-title":"ICML 2015","author":"Kusner M. J.","year":"2015"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3137765.3137833"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064036"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2535242"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1104"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/2336664.2336676"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989486"},{"key":"e_1_2_1_27_1","unstructured":"T. Mikolov K. Chen G. Corrado and J. Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 2013."},{"key":"e_1_2_1_28_1","first-page":"3111","volume-title":"NIPS","author":"Mikolov T.","year":"2013"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1690219.1690287"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1219097111"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/2367502.2367555"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3157794.3157797"},{"key":"e_1_2_1_33_1","first-page":"3567","volume-title":"NIPS 2016","author":"Ratner A. 
J.","year":"2016"},{"key":"e_1_2_1_34_1","first-page":"24","volume-title":"EMNLP","author":"Roth B.","year":"2013"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/938978.939133"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401965"},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"C. Sun A. Shrivastava S. Singh and A. Gupta. Revisiting unreasonable effectiveness of data in deep learning era. CoRR abs\/1707.02968 2017.","DOI":"10.1109\/ICCV.2017.97"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.5555\/2390524.2390626"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2797962"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2016.7498228"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035931"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2732982"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2350229.2350263"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465280"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080786"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723739"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536336.2536337"},{"key":"e_1_2_1_49_1","first-page":"1260","volume-title":"International Conference on Neural Information Processing Systems","author":"Zhang Y.","year":"2014"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3055540.3055547"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2749430"}],"container-title":["Proceedings of the VLDB 
Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3275536.3275541","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T16:38:28Z","timestamp":1775320708000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3275536.3275541"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2018,9]]}},"alternative-id":["10.14778\/3275536.3275541"],"URL":"https:\/\/doi.org\/10.14778\/3275536.3275541","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2018,9]]}}}