{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,8]],"date-time":"2025-11-08T23:02:13Z","timestamp":1762642933370,"version":"3.28.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,7]]},"abstract":"<jats:p>Information Extraction (IE) from semi-structured web-pages is a long-studied problem. Training a model for this extraction task requires a large number of human-labeled samples. Prior work has proposed transferable models to improve the label-efficiency of this training process. Extraction performance of transferable models, however, depends on the size of their fine-tuning corpus. This holds true for large language models (LLMs) such as GPT-3 as well. Generalist models like LLMs need to be fine-tuned on in-domain, human-labeled samples for competitive performance on this extraction task. Constructing a large-scale fine-tuning corpus with human-labeled samples, however, requires significant effort. In this paper, we develop a <jats:italic>Label-Efficient Self-Training Algorithm<\/jats:italic> (LEAST) to improve the label-efficiency of this fine-tuning process. Our contributions are two-fold. <jats:italic>First<\/jats:italic>, we develop a generative model that facilitates the construction of a large-scale fine-tuning corpus with minimal human effort. <jats:italic>Second<\/jats:italic>, to ensure that the extraction performance does not suffer due to noisy training samples in our fine-tuning corpus, we develop an uncertainty-aware training strategy. Experiments on two publicly available datasets show that LEAST generalizes to multiple verticals and backbone models. 
Using LEAST, we can train models with fewer than ten human-labeled pages from each website, outperforming strong baselines while reducing the number of human-labeled training samples needed for comparable performance by up to 11<jats:italic>x.<\/jats:italic><\/jats:p>","DOI":"10.14778\/3611479.3611511","type":"journal-article","created":{"date-parts":[[2023,8,25]],"date-time":"2023-08-25T02:08:08Z","timestamp":1692929288000},"page":"3098-3110","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages"],"prefix":"10.14778","volume":"16","author":[{"given":"Ritesh","family":"Sarkhel","sequence":"first","affiliation":[{"name":"Amazon, Seattle, Washington"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Binxuan","family":"Huang","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, Washington"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Colin","family":"Lockard","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, Washington"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Prashant","family":"Shiralkar","sequence":"additional","affiliation":[{"name":"Amazon, Seattle, Washington"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.emnlp-main.130"},{"key":"e_1_2_1_2_1","unstructured":"Eleuther AI. 2021. The GPT-Neo 1.3B model. Accessed: 2023-04-05."},{"key":"e_1_2_1_3_1","volume-title":"Self-training: A survey. arXiv preprint arXiv:2202.12040","author":"Amini Massih-Reza","year":"2022","unstructured":"Massih-Reza Amini, Vasilii Feofanov, Loic Pauletto, Emilie Devijver, and Yury Maximov. 2022. Self-training: A survey. 
arXiv preprint arXiv:2202.12040 (2022)."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICEEI.2017.8312458"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2857054"},{"key":"e_1_2_1_6_1","first-page":"805","article-title":"Extraction and integration of partially overlapping web sources","volume":"6","author":"Bronzi Mirko","year":"2013","unstructured":"Mirko Bronzi, Valter Crescenzi, Paolo Merialdo, and Paolo Papotti. 2013. Extraction and integration of partially overlapping web sources. VLDB 6, 10 (2013), 805--816.","journal-title":"VLDB"},{"key":"e_1_2_1_7_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.152"},{"key":"e_1_2_1_9_1","volume-title":"XDoc: Unified Pre-training for Cross-Format Document Understanding. arXiv preprint arXiv:2210.02849","author":"Chen Jingye","year":"2022","unstructured":"
Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, and Furu Wei. 2022. XDoc: Unified Pre-training for Cross-Format Document Understanding. arXiv preprint arXiv:2210.02849 (2022)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.343"},{"key":"e_1_2_1_11_1","volume-title":"Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555","author":"Clark Kevin","year":"2020","unstructured":"Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020)."},{"key":"e_1_2_1_12_1","volume-title":"KBQA: learning question answering over QA corpora and knowledge bases. arXiv:1903.02419","author":"Cui Wanyun","year":"2019","unstructured":"Wanyun Cui, Yanghua Xiao, Haixun Wang, Yangqiu Song, Seung-won Hwang, and Wei Wang. 2019. KBQA: learning question answering over QA corpora and knowledge bases. arXiv:1903.02419 (2019)."},{"key":"e_1_2_1_13_1","volume-title":"DOM-LM: Learning Generalizable Representations for HTML Documents. arXiv preprint arXiv:2201.10608","author":"Deng Xiang","year":"2022","unstructured":"Xiang Deng, Prashant Shiralkar, Colin Lockard, Binxuan Huang, and Huan Sun. 2022. DOM-LM: Learning Generalizable Representations for HTML Documents. 
arXiv preprint arXiv:2201.10608 (2022)."},{"key":"e_1_2_1_14_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL. 4171--4186.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL. 4171--4186."},{"key":"e_1_2_1_15_1","volume-title":"Horn","author":"Dong Xin","year":"2014","unstructured":"Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, and Wilko et al. Horn. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In SIGKDD. 601--610."},{"key":"e_1_2_1_16_1","volume-title":"Structured information extraction from complex scientific text with fine-tuned large language models. arXiv preprint arXiv:2212.05238","author":"Dunn Alexander","year":"2022","unstructured":"Alexander Dunn, John Dagdelen, Nicholas Walker, Sanghoon Lee, Andrew S Rosen, Gerbrand Ceder, Kristin Persson, and Anubhav Jain. 2022. Structured information extraction from complex scientific text with fine-tuned large language models. 
arXiv preprint arXiv:2212.05238 (2022)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2699442"},{"key":"e_1_2_1_18_1","volume-title":"Classification in the presence of label noise: a survey","author":"Fr\u00e9nay Beno\u00eet","year":"2013","unstructured":"Beno\u00eet Fr\u00e9nay and Michel Verleysen. 2013. Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25, 5 (2013), 845--869."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v36i1.2567"},{"volume-title":"Web-scale information extraction with vertex","author":"Gulhane Pankaj","key":"e_1_2_1_20_1","unstructured":"Pankaj Gulhane, Amit Madaan, Rupesh Mehta, Jeyashankher Ramamirtham, Rajeev Rastogi, Sandeep Satpal, Srinivasan H Sengamedu, Ashwin Tengli, and Charu Tiwari. 2011. Web-scale information extraction with vertex. In ICDE. IEEE, 1209--1220."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3532086"},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Qiang Hao Rui Cai Yanwei Pang and Lei Zhang. 2011. From one tree to a forest: a unified solution for structured web data extraction. In SIGIR. 775--784.","DOI":"10.1145\/2009916.2010020"},{"key":"e_1_2_1_23_1","volume-title":"Using trusted data to train deep networks on labels corrupted by severe noise. 
arXiv:1802.05300","author":"Hendrycks Dan","year":"2018","unstructured":"Dan Hendrycks, Mantas Mazeika, Duncan Wilson, and Kevin Gimpel. 2018. Using trusted data to train deep networks on labels corrupted by severe noise. arXiv:1802.05300 (2018)."},{"key":"e_1_2_1_24_1","volume-title":"Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In ICML. PMLR, 2304--2313.","author":"Jiang Lu","year":"2018","unstructured":"Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, and Li Fei-Fei. 2018. Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels. In ICML. PMLR, 2304--2313."},{"key":"e_1_2_1_25_1","unstructured":"D. Kreines and B. Laskey. 1999. Oracle Database Administration: The Essential Refe. O'Reilly Media Incorporated. https:\/\/books.google.com\/books?id=WVC-R0gdl0kC"},{"key":"e_1_2_1_26_1","volume-title":"Wrapper induction: Efficiency and expressiveness. Artificial intelligence 118, 1--2","author":"Kushmerick Nicholas","year":"2000","unstructured":"Nicholas Kushmerick. 2000. Wrapper induction: Efficiency and expressiveness. Artificial intelligence 118, 1--2 (2000), 15--68."},{"key":"e_1_2_1_27_1","volume-title":"Markuplm: Pre-training of text and markup language for visually-rich document understanding. 
arXiv preprint arXiv:2110.08518","author":"Li Junlong","year":"2021","unstructured":"Junlong Li, Yiheng Xu, Lei Cui, and Furu Wei. 2021. Markuplm: Pre-training of text and markup language for visually-rich document understanding. arXiv preprint arXiv:2110.08518 (2021)."},{"key":"e_1_2_1_28_1","first-page":"10276","article-title":"Learning to self-train for semi-supervised few-shot classification","volume":"32","author":"Li Xinzhe","year":"2019","unstructured":"Xinzhe Li, Qianru Sun, Yaoyao Liu, Qin Zhou, Shibao Zheng, Tat-Seng Chua, and Bernt Schiele. 2019. Learning to self-train for semi-supervised few-shot classification. NeurIPS 32 (2019), 10276--10286.","journal-title":"NeurIPS"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Bill Yuchen Lin Ying Sheng Nguyen Vo and Sandeep Tata. 2020. FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents. In SIGKDD. 1092--1102.","DOI":"10.1145\/3394486.3403153"},{"key":"e_1_2_1_30_1","volume-title":"Arash Einolghozati, and Prashant Shiralkar.","author":"Lockard Colin","year":"2018","unstructured":"Colin Lockard, Xin Luna Dong, Arash Einolghozati, and Prashant Shiralkar. 2018. 
Ceres: Distantly supervised relation extraction from the semi-structured web. arXiv:1804.04635 (2018)."},{"key":"e_1_2_1_31_1","volume-title":"Xin Luna Dong, and Hannaneh Hajishirzi","author":"Lockard Colin","year":"2020","unstructured":"Colin Lockard, Prashant Shiralkar, Xin Luna Dong, and Hannaneh Hajishirzi. 2020. ZeroShotCeres: Zero-shot relation extraction from semi-structured web-pages. arXiv:2005.07105 (2020)."},{"key":"e_1_2_1_32_1","volume-title":"Decoupled Weight Decay Regularization. In International Conference on Learning Representations.","author":"Loshchilov Ilya","year":"2018","unstructured":"Ilya Loshchilov and Frank Hutter. 2018. Decoupled Weight Decay Regularization. In International Conference on Learning Representations."},{"key":"e_1_2_1_33_1","volume-title":"Steven Bethard, and David McClosky.","author":"Manning Christopher D","year":"2014","unstructured":"Christopher D Manning, Mihai Surdeanu, John Bauer, Jenny Rose Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In 52nd ACL. 55--60."},{"key":"e_1_2_1_34_1","volume-title":"Uncertainty-aware self-training for few-shot text classification. NeurIPS 33","author":"Mukherjee Subhabrata","year":"2020","unstructured":"Subhabrata Mukherjee and Ahmed Awadallah. 2020. 
Uncertainty-aware self-training for few-shot text classification. NeurIPS 33 (2020)."},{"key":"e_1_2_1_35_1","unstructured":"S.B. Navathe W. Wu S. Shekhar X. Du X.S. Wang and H. Xiong. 2016. Database Systems for Advanced Applications: 21st International Conference DASFAA 2016 Dallas TX USA April 16--19 2016 Proceedings Part I. Springer International Publishing. https:\/\/books.google.com\/books?id=Ka7WCwAAQBAJ"},{"key":"e_1_2_1_36_1","volume-title":"Webred: Effective pretraining and finetuning for relation extraction on the web. arXiv preprint arXiv:2102.09681","author":"Ormandi Robert","year":"2021","unstructured":"Robert Ormandi, Mohammad Saleh, Erin Winter, and Vinay Rao. 2021. Webred: Effective pretraining and finetuning for relation extraction on the web. arXiv preprint arXiv:2102.09681 (2021)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.191"},{"key":"e_1_2_1_38_1","unstructured":"Mengye Ren Wenyuan Zeng Bin Yang and Raquel Urtasun. 2018. Learning to reweight examples for robust deep learning. In ICML. PMLR 4334--4343."},{"volume-title":"56th ACL. 1044--1054.","author":"Ruder Sebastian","key":"e_1_2_1_39_1","unstructured":"Sebastian Ruder and Barbara Plank. 2018. 
Strong Baselines for Neural Semi-Supervised Learning under Domain Shift. In 56th ACL. 1044--1054."},{"key":"e_1_2_1_40_1","volume-title":"Interpretable multi-headed attention for abstractive summarization at controllable lengths. arXiv preprint arXiv:2002.07845","author":"Sarkhel Ritesh","year":"2020","unstructured":"Ritesh Sarkhel, Moniba Keymanesh, Arnab Nandi, and Srinivasan Parthasarathy. 2020. Interpretable multi-headed attention for abstractive summarization at controllable lengths. arXiv preprint arXiv:2002.07845 (2020)."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/3367471.3367508"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3319867"},{"key":"e_1_2_1_43_1","volume-title":"Improving information extraction from visually rich documents using visual span representations. VLDB 14, 5","author":"Sarkhel Ritesh","year":"2021","unstructured":"Ritesh Sarkhel and Arnab Nandi. 2021. Improving information extraction from visually rich documents using visual span representations. VLDB 14, 5 (2021)."},{"key":"e_1_2_1_44_1","volume-title":"Cross-modal entity matching for visually rich documents. arXiv preprint arXiv:2303.00720","author":"Sarkhel Ritesh","year":"2023","unstructured":"Ritesh Sarkhel and Arnab Nandi. 2023. Cross-modal entity matching for visually rich documents. 
arXiv preprint arXiv:2303.00720 (2023)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1965.1053799"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2012.135"},{"key":"e_1_2_1_47_1","volume-title":"Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv:1703.01780","author":"Tarvainen Antti","year":"2017","unstructured":"Antti Tarvainen and Harri Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. arXiv:1703.01780 (2017)."},{"key":"e_1_2_1_48_1","volume-title":"Attention is all you need. NeurIPS 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. NeurIPS 30 (2017)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"crossref","unstructured":"Hongwei Wang Fuzheng Zhang Miao Zhao Wenjie Li Xing Xie and Minyi Guo. 2019. Multi-task feature learning for knowledge graph enhanced recommendation. In WWW. 
2000--2010.","DOI":"10.1145\/3308558.3313411"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3394486.3403047"},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Yaqing Wang Subhabrata Mukherjee Haoda Chu Yuancheng Tu Ming Wu Jing Gao and Ahmed Hassan Awadallah. 2021. Meta Self-training for Few-shot Neural Sequence Labeling. In SIGKDD. 1737--1747.","DOI":"10.1145\/3447548.3467235"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3183729"},{"key":"e_1_2_1_53_1","doi-asserted-by":"crossref","unstructured":"Yanhong Zhai and Bing Liu. 2005. Web data extraction based on partial tree alignment. In WWW. 76--85.","DOI":"10.1145\/1060745.1060761"},{"key":"e_1_2_1_54_1","volume-title":"Simplified DOM Trees for Transferable Attribute Extraction from the Web. arXiv:2101.02415","author":"Zhou Yichao","year":"2021","unstructured":"Yichao Zhou, Ying Sheng, Nguyen Vo, Nick Edmonds, and Sandeep Tata. 2021. Simplified DOM Trees for Transferable Attribute Extraction from the Web. 
arXiv:2101.02415 (2021)."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3611479.3611511","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,26]],"date-time":"2024-10-26T21:03:49Z","timestamp":1729976629000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3611479.3611511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,7]]},"references-count":54,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,7]]}},"alternative-id":["10.14778\/3611479.3611511"],"URL":"https:\/\/doi.org\/10.14778\/3611479.3611511","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2023,7]]},"assertion":[{"value":"2023-08-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}