{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T07:33:14Z","timestamp":1767771194372,"version":"3.41.2"},"reference-count":45,"publisher":"Emerald","issue":"2","license":[{"start":{"date-parts":[[2020,2,23]],"date-time":"2020-02-23T00:00:00Z","timestamp":1582416000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.emerald.com\/insight\/site-policies"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AJIM"],"published-print":{"date-parts":[[2020,2,23]]},"abstract":"<jats:sec><jats:title content-type=\"abstract-subheading\">Purpose<\/jats:title><jats:p>The purpose of this paper is to propose a knowledge extraction framework to extract knowledge, including entities and relationships between them, from unstructured texts in digital humanities (DH).<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Design\/methodology\/approach<\/jats:title><jats:p>The proposed cooperative crowdsourcing framework (CCF) uses both human\u2013computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to facilitate domain experts labeling and verifying extracted knowledge.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Findings<\/jats:title><jats:p>The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. 
Specifically, CCF achieves higher accuracy of knowledge extraction than the state-of-the-art methods, the contribution of feedbacks to the training model can be maximized by the active learning mechanism and the proposed category-based crowdsourcing mechanism can scale up the effective human\u2013computer collaboration by considering the specialization of workers in different categories of tasks.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Research limitations\/implications<\/jats:title><jats:p>This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Practical implications<\/jats:title><jats:p>The extracted knowledge is machine-understandable and can support the research of Tang poetry and knowledge-driven intelligent applications in DH.<\/jats:p><\/jats:sec><jats:sec><jats:title content-type=\"abstract-subheading\">Originality\/value<\/jats:title><jats:p>CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human\u2013computer cooperation method uses the feedback of domain experts through the active learning mechanism; the category-based crowdsourcing mechanism considers the matching of categories of DH data and especially of domain experts.<\/jats:p><\/jats:sec>","DOI":"10.1108\/ajim-07-2019-0192","type":"journal-article","created":{"date-parts":[[2020,3,3]],"date-time":"2020-03-03T05:09:03Z","timestamp":1583212143000},"page":"243-261","source":"Crossref","is-referenced-by-count":11,"title":["A cooperative crowdsourcing framework for knowledge extraction in digital humanities \u2013 cases on Tang 
poetry"],"prefix":"10.1108","volume":"72","author":[{"given":"Liang","family":"Hong","sequence":"first","affiliation":[]},{"given":"Wenjun","family":"Hou","sequence":"additional","affiliation":[]},{"given":"Zonghui","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Huijie","family":"Han","sequence":"additional","affiliation":[]}],"member":"140","reference":[{"issue":"1","key":"key2020042013533534400_ref001","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/MIS.2003.1179189","article-title":"Automatic ontology-based knowledge extraction from web documents","volume":"18","year":"2003","journal-title":"IEEE Intelligent Systems"},{"key":"key2020042013533534400_ref002","unstructured":"Beijing Normal University (2018), \u201cGarden of Tang poetry\u201d, available at: http:\/\/poem.studentsystem.org\/(accessed 1 May 2019)."},{"first-page":"438","article-title":"Visual recognition with humans in the loop","year":"2010","key":"key2020042013533534400_ref003"},{"key":"key2020042013533534400_ref004","unstructured":"British Library (2010), \u201cCapturing the sounds of the UK\u201d, available at: https:\/\/www.bl.uk\/press-releases\/2010\/august\/capturing-the-sounds-of-the-uk\/ (accessed 9 April 2012)."},{"key":"key2020042013533534400_ref005","first-page":"858","article-title":"Predicting concrete and abstract entities in modern poetry","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","year":"2019"},{"journal-title":"Museums and the Web 2013 Conference","article-title":"Digital humanities and crowdsourcing: an exploration","year":"2013","key":"key2020042013533534400_ref006"},{"article-title":"The actortopic model for extracting social networks in literary narrative","volume-title":"NIPS Workshop: Machine Learning for Social Computing","year":"2010","key":"key2020042013533534400_ref044"},{"key":"key2020042013533534400_ref007","first-page":"87","article-title":"Tagging and Searching\u2013Serendipity and museum collection 
databases","year":"2007","journal-title":"Museums and the Web 2007 Conference"},{"key":"key2020042013533534400_ref009","first-page":"746","article-title":"Reducing labeling effort for structured prediction tasks","volume":"5","year":"2005","journal-title":"AAAI"},{"key":"key2020042013533534400_ref011","first-page":"27","article-title":"Adapting nlp and corpus analysis techniques to structured imagery analysis in classical Chinese poetry","volume-title":"Proceedings of the Workshop on Adaptation of Language Resources and Technology to New Domains","year":"2009"},{"issue":"2","key":"key2020042013533534400_ref010","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s10115-012-0507-8","article-title":"A survey on instance selection for active learning","volume":"35","year":"2013","journal-title":"Knowledge and Information Systems"},{"key":"key2020042013533534400_ref012","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1145\/2212776.2212794","article-title":"User-driven collaborative intelligence: social networks as crowdsourcing ecosystems","volume-title":"CHI'12 Extended Abstracts on Human Factors in Computing Systems","year":"2012"},{"key":"key2020042013533534400_ref015","first-page":"172","article-title":"Named entity recognition with long short-term memory","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003-Volume 4","year":"2003"},{"key":"key2020042013533534400_ref013","unstructured":"Harvard University (2001), \u201cChina historical geographic information system project history\u201d, available at: http:\/\/www.fas.harvard.edu\/\u223cchgis\/ (accessed 25 July 2019)."},{"key":"key2020042013533534400_ref014","unstructured":"Harvard University (2008), \u201cChina biographical database project history\u201d, available at: https:\/\/projects.iq.harvard.edu\/cbdb (accessed 28 July 2019)."},{"key":"key2020042013533534400_ref016","first-page":"81","article-title":"Quite right, dear and interesting: 
seeking the sentimental in nineteenth century American fiction","year":"2006","journal-title":"Digital Humanities Conference 2006"},{"key":"key2020042013533534400_ref017","first-page":"152","article-title":"Chinese word segmentation based on contextual entropy","volume-title":"Proceedings of the 17th Pacific Asia Conference on Language","year":"2003"},{"year":"2011","key":"key2020042013533534400_ref018","article-title":"In search of quality in crowdsourcing for search engine evaluation"},{"key":"key2020042013533534400_ref019","first-page":"580","article-title":"A fuzzy k-nearest neighbor algorithm","volume":"4","year":"1985","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"article-title":"Crowdsourcing user studies with Mechanical Turk","volume-title":"Proceedings of the SIGCHI Conference on Human Factors in Computing Systems","year":"2008","key":"key2020042013533534400_ref020"},{"key":"key2020042013533534400_ref021","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1126\/science.231.4736.335","article-title":"Shakespeare's new poem: an ode to statistics","volume":"231","year":"1986","journal-title":"Science"},{"issue":"1","key":"key2020042013533534400_ref022","first-page":"82","article-title":"Syntactic patterns in classical Chinese poems: a quantitative study","volume":"33","year":"2017","journal-title":"Digital Scholarship in the Humanities"},{"issue":"2","key":"key2020042013533534400_ref0025","doi-asserted-by":"crossref","first-page":"167","DOI":"10.3233\/SW-140134","article-title":"DBpedia\u2013a large-scale, multilingual knowledge base extracted from Wikipedia","volume":"6","year":"2015","journal-title":"Semantic Web"},{"issue":"5","key":"key2020042013533534400_ref023","first-page":"39","article-title":"Review and reflection on the study of literature in the Tang dynasty in the past 30 years","volume":"47","year":"2010","journal-title":"Journal of Northwest Normal University (Social Science 
Edition)"},{"issue":"1","key":"key2020042013533534400_ref024","first-page":"11","article-title":"CWAAP: an authorship attribution forensic platform for Chinese web information","volume":"9","year":"2014","journal-title":"JSW"},{"issue":"4","key":"key2020042013533534400_ref025","doi-asserted-by":"crossref","first-page":"15","DOI":"10.5121\/ijnlc.2012.1402","article-title":"Named entity recognition using hidden Markov model (HMM)","volume":"1","year":"2012","journal-title":"International Journal on Natural Language Computing"},{"key":"key2020042013533534400_ref026","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1145\/2103354.2103373","article-title":"Crowdsourcing in the cultural heritage domain: opportunities and challenges","volume-title":"Proceedings of the 5th International Conference on Communities and Technologies","year":"2011"},{"issue":"1","key":"key2020042013533534400_ref027","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1111\/cura.12012","article-title":"Digital cultural heritage and the crowd","volume":"56","year":"2013","journal-title":"Curator: The Museum Journal"},{"key":"key2020042013533534400_ref028","unstructured":"Peking University (2018), \u201cAcademic inheritance knowledge graph in Song dynasty\u201d, available at: http:\/\/dh.kvlab.org\/cbdb_kg\/ (accessed 1 May 2019)."},{"key":"key2020042013533534400_ref029","first-page":"140","article-title":"Exploring erotic's in Emily Dickinson's correspondence with text mining and visual interfaces","volume-title":"Proceedings of the 6th ACM\/IEEE-CS Joint Conference on Digital Libraries","year":"2006"},{"issue":"4","key":"key2020042013533534400_ref030","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1111\/cura.12046","article-title":"From tagging to theorizing: deepening engagement with cultural heritage through crowdsourcing","volume":"56","year":"2013","journal-title":"Curator: The Museum Journal"},{"first-page":"525","article-title":"Crowdmap: crowdsourcing ontology 
alignment with microtasks","year":"2012","key":"key2020042013533534400_ref031"},{"issue":"3","key":"key2020042013533534400_ref032","first-page":"2","article-title":"Big? Smart? Clean? Messy? Data in the humanities","volume":"2","year":"2013","journal-title":"Journal of Digital Humanities"},{"first-page":"1223","article-title":"Open Mind common sense: knowledge acquisition from the general public","year":"2002","key":"key2020042013533534400_ref033"},{"first-page":"611","article-title":"Active feature selection based on a very limited number of entities","year":"2003","key":"key2020042013533534400_ref034"},{"key":"key2020042013533534400_ref008","first-page":"670","article-title":"Unsupervised identification of text reuse in early Chinese literature","volume-title":"Digital Scholarship in the Humanities","year":"2018"},{"key":"key2020042013533534400_ref035","unstructured":"Wang, Z. (2018), \u201cChronicle map of literatures in Tang and Song dynasties\u201d, available at: https:\/\/sou-yun.cn\/PoetLifeMap.aspx (accessed 30 July 2019)."},{"key":"key2020042013533534400_ref036","unstructured":"Yoshimura, K. and Shein, C. 
(2011), \u201cSocial metadata for libraries, archives and museums Part 1: site reviews\u201d, available at: http:\/\/www.oclc.org\/research\/publications\/library\/2011\/2011-02.pdf\/ (accessed 10 November 2019)."},{"key":"key2020042013533534400_ref037","doi-asserted-by":"crossref","first-page":"49","DOI":"10.2307\/2718714","article-title":"Syntax, diction, and imagery in T'ang poetry","volume":"31","year":"1971","journal-title":"Harvard Journal of Asiatic Studies"},{"issue":"1","key":"key2020042013533534400_ref038","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1515\/jdis-2017-0001","article-title":"Smart data for digital humanities","volume":"2","year":"2017","journal-title":"Journal of Data and Information Science"},{"key":"key2020042013533534400_ref039","first-page":"951","article-title":"Motivations of volunteers in the Transcribe Sheng project: a grounded theory approach","volume-title":"Proceedings of the Association for Information Science and Technology","year":"2018"},{"key":"key2020042013533534400_ref040","first-page":"162","article-title":"An improved Chinese word segmentation system with conditional random field","volume-title":"Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing","year":"2006"},{"key":"key2020042013533534400_ref041","first-page":"361","article-title":"Docs: a domain-aware crowdsourcing system using knowledge bases","volume-title":"Proceedings of the VLDB Endowment","year":"2016"},{"key":"key2020042013533534400_ref042","unstructured":"Zhengzhou University (2008), \u201cThe complete Tang poetry database\u201d, available at: http:\/\/www3.zzu.edu.cn\/qts\/ (accessed 1 July 2019)."},{"issue":"1","key":"key2020042013533534400_ref043","first-page":"61","article-title":"Tang poetry and historical geography","year":"2007","journal-title":"Yindu Academic Journal"}],"container-title":["Aslib Journal of Information 
Management"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/AJIM-07-2019-0192\/full\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.emerald.com\/insight\/content\/doi\/10.1108\/AJIM-07-2019-0192\/full\/html","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,24]],"date-time":"2025-07-24T23:00:58Z","timestamp":1753398058000},"score":1,"resource":{"primary":{"URL":"http:\/\/www.emerald.com\/ajim\/article\/72\/2\/243-261\/20992"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,23]]},"references-count":45,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,2,23]]}},"alternative-id":["10.1108\/AJIM-07-2019-0192"],"URL":"https:\/\/doi.org\/10.1108\/ajim-07-2019-0192","relation":{},"ISSN":["2050-3806"],"issn-type":[{"type":"print","value":"2050-3806"}],"subject":[],"published":{"date-parts":[[2020,2,23]]}}}