{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,27]],"date-time":"2025-07-27T07:21:56Z","timestamp":1753600916702,"version":"3.41.0"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,4,25]],"date-time":"2016-04-25T00:00:00Z","timestamp":1461542400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2016,7,14]]},"abstract":"<jats:p>Parsing the semantic structure of a web page is a key component of web information extraction. Successful extraction algorithms usually require large-scale training and evaluation datasets, which are difficult to acquire. Recently, crowdsourcing has proven to be an effective method of collecting large-scale training data in domains that do not require much domain knowledge. For more complex domains, researchers have proposed sophisticated quality control mechanisms to replicate tasks in parallel or sequential ways and then aggregate responses from multiple workers. Conventional annotation integration methods often put more trust in the workers with high historical performance; thus, they are called performance-based methods. Recently, Rzeszotarski and Kittur have demonstrated that behavioral features are also highly correlated with annotation quality in several crowdsourcing applications. In this article, we present a new crowdsourcing system, called Wernicke, to provide annotations for web information extraction. Wernicke collects a wide set of behavioral features and, based on these features, predicts annotation quality for a challenging task domain: annotating web page structure. We evaluate the effectiveness of quality control using behavioral features through a case study where 32 workers annotate 200 Q&amp;A web pages from five popular websites. In doing so, we discover several things: (1) Many behavioral features are significant predictors for crowdsourcing quality. (2) The behavioral-feature-based method outperforms performance-based methods in recall prediction, while performing equally with precision prediction. In addition, using behavioral features is less vulnerable to the cold-start problem, and the corresponding prediction model is more generalizable for predicting recall than precision for cross-website quality analysis. (3) One can effectively combine workers\u2019 behavioral information and historical performance information to further reduce prediction errors.<\/jats:p>","DOI":"10.1145\/2870649","type":"journal-article","created":{"date-parts":[[2016,4,25]],"date-time":"2016-04-25T19:51:13Z","timestamp":1461613873000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Crowdsourcing Human Annotation on Web Page Structure"],"prefix":"10.1145","volume":"7","author":[{"given":"Shuguang","family":"Han","sequence":"first","affiliation":[{"name":"University of Pittsburgh, Pittsburgh, PA"}]},{"given":"Peng","family":"Dai","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"Praveen","family":"Paritosh","sequence":"additional","affiliation":[{"name":"Google"}]},{"given":"David","family":"Huynh","sequence":"additional","affiliation":[{"name":"Google"}]}],"member":"320","published-online":{"date-parts":[[2016,4,25]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"International Joint Conference on Artificial Intelligence","volume":"7","author":"Banko Michele","year":"2007"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1866029.1866078"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.152"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2013.06.002"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2675133.2675260"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.2307\/2346806"},{"volume-title":"Proceedings of the 22nd Annual Conference on Learning Theory.","year":"2009","author":"Dekel Ofer","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/1753326.1753688"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2484028.2484100"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2738036"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1060745.1060762"},{"key":"e_1_2_1_12_1","first-page":"1","article-title":"The rise of crowdsourcing","volume":"14","author":"Howe Jeff","year":"2006","journal-title":"Wired Magazine"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.22"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1978942.1979125"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/11574620_31"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1166253.1166274"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijforecast.2006.03.001"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837885.1837906"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-9884.00122"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1357054.1357127"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718542"},{"key":"e_1_2_1_22_1","unstructured":"Hongwei Li and Bin Yu. 2014. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086 (2014).  Hongwei Li and Bin Yu. 2014. Error rate bounds and iterative weighted majority voting for crowdsourcing. arXiv preprint arXiv:1411.4086 (2014)."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1600150.1600159"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-007-0090-8"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380116.2380125"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/2047196.2047199"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871447"},{"volume-title":"Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 102--107","year":"2012","author":"Stenetorp Pontus","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1401890.1401978"},{"volume-title":"Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 118--127","author":"Wu Fei","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2645710.2645724"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2870649","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2870649","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:39:15Z","timestamp":1750221555000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2870649"}},"subtitle":["Infrastructure Design and Behavior-Based Quality Control"],"short-title":[],"issued":{"date-parts":[[2016,4,25]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,7,14]]}},"alternative-id":["10.1145\/2870649"],"URL":"https:\/\/doi.org\/10.1145\/2870649","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2016,4,25]]},"assertion":[{"value":"2015-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-04-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}