{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:12:34Z","timestamp":1760242354767,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2017,5,22]],"date-time":"2017-05-22T00:00:00Z","timestamp":1495411200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National High-tech R&D Program of China","award":["2015AA7115028","2015AA7115061"],"award-info":[{"award-number":["2015AA7115028","2015AA7115061"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>The intensive construction of domain-specific knowledge bases (DSKB) has posed an urgent demand for researches about domain-specific entity detection and linking (DSEDL). Joint models are usually adopted in DSEDL tasks, but data imbalance and high computational complexity exist in these models. Besides, traditional feature representation methods are insufficient for domain-specific tasks, due to problems such as lack of labeled data, link sparseness in DSKBs, and so on. In this paper, a two-stage joint (TSJ) model is proposed to solve the data imbalance problem by discriminatively processing entity mentions with different degrees of ambiguity. In addition, three novel methods are put forward to generate effective features by incorporating an unlabeled corpus. One crucial feature involving entity detection is the mention type, extracted by a long short-term memory (LSTM) model trained on automatically annotated data. The other two types of features mainly involve entity linking, including the inner-document topical coherence, which is measured based on entity co-occurring relationships in the corpus, and the cross-document entity coherence evaluated using similar documents. 
An overall 74.26% F1 value is obtained on a dataset of real-world movie comments, demonstrating the effectiveness of the proposed approach and indicating its potentiality to be used in real-world domain-specific applications.<\/jats:p>","DOI":"10.3390\/info8020059","type":"journal-article","created":{"date-parts":[[2017,5,23]],"date-time":"2017-05-23T01:47:33Z","timestamp":1495504053000},"page":"59","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["A Two-Stage Joint Model for Domain-Specific Entity Detection and Linking Leveraging an Unlabeled Corpus"],"prefix":"10.3390","volume":"8","author":[{"given":"Hongzhi","family":"Zhang","sequence":"first","affiliation":[{"name":"Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Weili","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"University of Chinese Academy of Sciences, Beijing 100049, China"}]},{"given":"Tinglei","family":"Huang","sequence":"additional","affiliation":[{"name":"Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Xiao","family":"Liang","sequence":"additional","affiliation":[{"name":"Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}]},{"given":"Kun","family":"Fu","sequence":"additional","affiliation":[{"name":"Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2017,5,22]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1109\/TKDE.2014.2327028","article-title":"Entity linking with a knowledge base: Issues, techniques, and solutions","volume":"27","author":"Shen","year":"2015","journal-title":"IEEE Trans. Knowl. 
Data Eng."},{"key":"ref_2","unstructured":"Gottipati, S., and Jiang, J. (2011, July 27\u201331). Linking entities to a knowledge base with query expansion. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_3","unstructured":"Han, X., and Sun, L. (2011, June 19\u201324). A generative entity-mention model for linking entities with knowledge base. Proceedings of the Meeting of the Association for Computational Linguistics, Portland, OR, USA."},{"key":"ref_4","unstructured":"Han, X., and Sun, L. (2012, July 12\u201314). An entity-topic model for entity linking. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Jeju Island, Korea."},{"key":"ref_5","unstructured":"Zhang, W., Sim, Y.C., Su, J., and Tan, C.L. (2011, July 19\u201322). Entity linking with effective acronym expansion, instance selection and topic modeling. Proceedings of the International Joint Conference on Artificial Intelligence, Barcelona, Spain."},{"key":"ref_6","unstructured":"Zhang, W., Su, J., Tan, C.L., and Wang, W.T. (2010, August 23\u201327). Entity linking leveraging automatically generated annotation. Proceedings of the International Conference on Computational Linguistics, Beijing, China."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Zhang, J., Li, J., Li, X., Shi, Y., Li, J., and Wang, Z. (2016, April 16\u201319). Domain-specific Entity Linking via Fake Named Entity Detection. Proceedings of the Database Systems for Advanced Applications, Dallas, TX, USA.","DOI":"10.1007\/978-3-319-32025-0_7"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Li, Y., Tan, S., Sun, H., Han, J., Roth, D., and Yan, X. (2016, April 11\u201315). Entity disambiguation with linkless knowledge bases. Proceedings of the International World Wide Web Conferences, Montreal, Canada.","DOI":"10.1145\/2872427.2883068"},{"key":"ref_9","unstructured":"Guo, S., Chang, M., and Kiciman, E. 
(2013, June 9\u201314). To link or not to link? A study on end-to-end tweet entity linking. Proceedings of the North American Chapter of the Association for Computational Linguistics, Atlanta, GA, USA."},{"key":"ref_10","unstructured":"Sil, A., and Yates, A. (2013, October 27\u2013November 1). Re-ranking for joint named-entity recognition and linking. Proceedings of the Conference on Information and Knowledge Management, San Francisco, CA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Han, X., Sun, L., and Zhao, J. (2011, July 24\u201328). Collective entity linking in web text: A graph-based method. Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China.","DOI":"10.1145\/2009916.2010019"},{"key":"ref_12","unstructured":"Ratinov, L., Roth, D., Downey, D., and Anderson, M.R. (2011, June 19\u201324). Local and global algorithms for disambiguation to Wikipedia. Proceedings of the Meeting of the Association for Computational Linguistics, Portland, OR, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Shen, W., Wang, J., Luo, P., and Wang, M. (2012, April 16\u201320). LINDEN: Linking named entities with knowledge base via semantic knowledge. Proceedings of the International World Wide Web Conferences, Lyon, France.","DOI":"10.1145\/2187836.2187898"},{"key":"ref_14","unstructured":"Liu, X., Li, Y., Wu, H., Zhou, M., Wei, F., and Lu, Y. (2013, August 4\u20139). Entity linking for tweets. Proceedings of the Meeting of the Association for Computational Linguistics, Sofia, Bulgaria."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhou, G., and Su, J. (2002, July 7\u201312). Named entity recognition using an HMM-based chunk tagger. 
Proceedings of the Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA.","DOI":"10.3115\/1073083.1073163"},{"key":"ref_17","unstructured":"McCallum, A., and Li, W. (2003, May 27\u2013June 1). Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. Proceedings of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada."},{"key":"ref_18","unstructured":"Bender, O., Och, F.J., and Ney, H. (2003, May 27\u2013June 1). Maximum entropy models for named entity recognition. Proceedings of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada."},{"key":"ref_19","unstructured":"Hammerton, J. (2003, May 27\u2013June 1). Named entity recognition with long short-term memory. Proceedings of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1075\/li.30.1.03nad","article-title":"A survey of named entity recognition and classification","volume":"30","author":"Nadeau","year":"2007","journal-title":"Lingvisticae Investig."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Gattani, A., Lamba, D.S., Garera, N., Tiwari, M., Chai, X., Das, S., Subramaniam, S., Rajaraman, A., Harinarayan, V., and Doan, A. (2013, August 26\u201330). Entity extraction, linking, classification, and tagging for social media: A Wikipedia-based approach. Proceedings of the Very Large Data Base, Trento, Italy.","DOI":"10.14778\/2536222.2536237"},{"key":"ref_22","unstructured":"Varma, V., Bharat, V., Kovelamudi, S., Bysani, P., Gsk, S., Kiran, K.N., Reddy, K., Kumar, K., and Maganti, N. (2009, November 16\u201317). IIIT Hyderabad at TAC 2009. Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Pilz, A., and Paa\u00df, G. (2011, October 24\u201328). 
From names to entities using thematic context distance. Proceedings of the Conference on Information and Knowledge Management, Glasgow, UK.","DOI":"10.1145\/2063576.2063700"},{"key":"ref_24","unstructured":"Bunescu, R., and Pasca, M. (2006, April 3\u20137). Using encyclopedic knowledge for named entity disambiguation. Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy."},{"key":"ref_25","unstructured":"Zhang, W., Yan, C., and Su, S.J. (2010, November 15\u201316). NUS-I2R: Learning a combined system for entity linking. Proceedings of the Text Analysis Conference, Gaithersburg, MD, USA. Available online: https:\/\/tac.nist.gov\/publications\/2010\/participant.papers\/NUSchime.proceedings.pdf."},{"key":"ref_26","unstructured":"Chen, Z., and Ji, H. (2011, July 27\u201331). Collaborative ranking: A case study on entity linking. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_27","unstructured":"Hoffart, J., Yosef, M.A., Bordino, I., F\u00fcrstenau, H., Pinkal, M., Spaniol, M., Taneva, B., Thater, S., and Weikum, G. (2011, July 27\u201331). Robust disambiguation of named entities in text. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Shen, W., Wang, J., Luo, P., and Wang, M. (2013, August 11\u201314). Linking named entities in tweets with knowledge base via user interest modeling. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA.","DOI":"10.1145\/2487575.2487686"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zwicklbauer, S., Seifert, C., and Granitzer, M. (2016, July 17\u201321). Robust and collective entity disambiguation through semantic embeddings. 
Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy.","DOI":"10.1145\/2911451.2911535"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Demartini, G., Difallah, D.E., and Cudr\u00e9-Mauroux, P. (2012, April 16\u201320). ZenCrowd: Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. Proceedings of the 21st International Conference on World Wide Web, Lyon, France.","DOI":"10.1145\/2187836.2187900"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chen, H., Ji, H., Sun, L., Wang, H., Qian, T., and Ruan, T. (2016). Graph-based jointly modeling entity detection and linking in domain-specific area. Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data, Proceedings of the CCKS 2016: Chinese Conference on Knowledge Graph and Semantic Computing, Beijing, China, 19\u201322 September 2016, Springer Singapore.","DOI":"10.1007\/978-981-10-3168-7"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Li, Y., Wang, C., Han, F., Han, J., Roth, D., and Yan, X. (2013, August 11\u201314). Mining evidences for named entity disambiguation. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA.","DOI":"10.1145\/2487575.2487681"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, H., Ji, H., Sun, L., Wang, H., Qian, T., and Ruan, T. (2016). Domain-specific entity discovery and linking task. Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data, Proceedings of the CCKS 2016: Chinese Conference on Knowledge Graph and Semantic Computing, Beijing, China, 19\u201322 September 2016, Springer Singapore.","DOI":"10.1007\/978-981-10-3168-7"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Guo, Y., Qin, B., Liu, T., and Li, S. (2013, October 18\u201321). Microblog entity linking by leveraging extra posts. 
Proceedings of the Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA.","DOI":"10.18653\/v1\/D13-1085"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"270","DOI":"10.1162\/neco.1989.1.2.270","article-title":"A learning algorithm for continually running fully recurrent neural networks","volume":"1","author":"Williams","year":"1989","journal-title":"Neural Comput."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Milne, D., and Witten, I.H. (2008, October 26\u201330). Learning to link with Wikipedia. Proceedings of the Conference on Information and Knowledge Management, Napa Valley, CA, USA.","DOI":"10.1145\/1458082.1458150"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1016\/j.artint.2012.06.007","article-title":"An open-source toolkit for mining Wikipedia","volume":"194","author":"Milne","year":"2013","journal-title":"Artif. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1145\/361219.361220","article-title":"A vector space model for automatic indexing","volume":"18","author":"Salton","year":"1975","journal-title":"Commun. ACM"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chen, H., Ji, H., Sun, L., Wang, H., Qian, T., and Ruan, T. (2016). ICRC-DSEDL: A film named entity discovery and linking system based on knowledge bases. 
Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data, Proceedings of the CCKS 2016: Chinese Conference on Knowledge Graph and Semantic Computing, Beijing, China, 19\u201322 September 2016, Springer Singapore.","DOI":"10.1007\/978-981-10-3168-7"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/2\/59\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:36:37Z","timestamp":1760207797000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/2\/59"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,5,22]]},"references-count":39,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2017,6]]}},"alternative-id":["info8020059"],"URL":"https:\/\/doi.org\/10.3390\/info8020059","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2017,5,22]]}}}