{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:59:31Z","timestamp":1760241571142,"version":"build-2065373602"},"reference-count":60,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2018,6,5]],"date-time":"2018-06-05T00:00:00Z","timestamp":1528156800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Automatic question answering (QA), which can greatly facilitate the access to information, is an important task in artificial intelligence. Recent years have witnessed the development of QA methods based on deep learning. However, a great amount of data is needed to train deep neural networks, and it is laborious to annotate training data for factoid QA of new domains or languages. In this paper, a distantly supervised method is proposed to automatically generate QA pairs. Additional efforts are paid to let the generated questions reflect the query interests and expression styles of users by exploring the community QA. Specifically, the generated questions are selected according to the estimated probabilities they are asked. Diverse paraphrases of questions are mined from community QA data, considering that the model trained on monotonous synthetic questions is very sensitive to variants of question expressions. Experimental results show that the model solely trained on generated data via the distant supervision and mined paraphrases could answer real-world questions with the accuracy of 49.34%. When limited annotated training data is available, significant improvements could be achieved by incorporating the generated data. An improvement of 1.35 absolute points is still observed on WebQA, a dataset with large-scale annotated training samples.<\/jats:p>","DOI":"10.3390\/e20060439","type":"journal-article","created":{"date-parts":[[2018,6,5]],"date-time":"2018-06-05T11:04:46Z","timestamp":1528196686000},"page":"439","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Factoid Question Answering with Distant Supervision"],"prefix":"10.3390","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9688-7691","authenticated-orcid":false,"given":"Hongzhi","family":"Zhang","sequence":"first","affiliation":[{"name":"Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiao","family":"Liang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guangluan","family":"Xu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Fu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"},{"name":"School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"Institute of Electronics, Chinese Academy of Sciences, Suzhou, Suzhou 215123, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Feng","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tinglei","family":"Huang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,6,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Berant, J., Chou, A., Frostig, R., and Liang, P. (2013, January 18\u201321). Semantic Parsing on Freebase from Question-Answer Pairs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA.","DOI":"10.18653\/v1\/D13-1160"},{"key":"ref_2","unstructured":"Bordes, A., Usunier, N., Chopra, S., and Weston, J. (arXiv, 2015). Large-scale Simple Question Answering with Memory Networks, arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Sun, H., Ma, H., He, X., Yih, W.t., Su, Y., and Yan, X. (2016, January 11\u201315). Table Cell Search for Question Answering. Proceedings of the 25th International Conference on World Wide Web, Republic and Canton of Geneva, Switzerland.","DOI":"10.1145\/2872427.2883080"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (arXiv, 2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text, arXiv.","DOI":"10.18653\/v1\/D16-1264"},{"key":"ref_5","unstructured":"Li, P., Li, W., He, Z., Wang, X., Cao, Y., Zhou, J., and Xu, W. (arXiv, 2016). Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering, arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Trischler, A., Wang, T., Yuan, X., Harris, J., Sordoni, A., Bachman, P., and Suleman, K. (arXiv, 2017). NewsQA: A Machine Comprehension Dataset, arXiv.","DOI":"10.18653\/v1\/W17-2623"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1:1","DOI":"10.1147\/JRD.2012.2184356","article-title":"Introduction to \u2018This is Watson\u2019","volume":"56","author":"Ferrucci","year":"2012","journal-title":"IBM J. Res. Dev."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Bao, J., Duan, N., Zhou, M., and Zhao, T. (2014, January 22\u201327). Knowledge-Based Question Answering as Machine Translation. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-1091"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"He, S., Liu, K., Zhang, Y., Xu, L., and Zhao, J. (2014, January 25\u201329). Question Answering over Linked Data Using First-order Logic. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.","DOI":"10.3115\/v1\/D14-1116"},{"key":"ref_10","unstructured":"Hermann, K.M., Ko\u010disk\u00fd, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., and Blunsom, P. (2015, January 7\u201312). Teaching Machines to Read and Comprehend. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_11","unstructured":"McCann, B., Bradbury, J., Xiong, C., and Socher, R. (2017). Learned in Translation: Contextualized Word Vectors. Advances in Neural Information Processing Systems 30, Curran Associates, Inc."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, W., Yang, N., Wei, F., Chang, B., and Zhou, M. (2017). Gated Self-Matching Networks for Reading Comprehension and Question Answering. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/P17-1018"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Jia, R., and Liang, P. (arXiv, 2017). Adversarial Examples for Evaluating Reading Comprehension Systems, arXiv.","DOI":"10.18653\/v1\/D17-1215"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Du, X., Shao, J., and Cardie, C. (arXiv, 2017). Learning to Ask: Neural Question Generation for Reading Comprehension, arXiv.","DOI":"10.18653\/v1\/P17-1123"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Duan, N., Tang, D., Chen, P., and Zhou, M. (2017, January 9\u201311). Question Generation for Question Answering. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1090"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1145\/2629489","article-title":"Wikidata: A Free Collaborative Knowledgebase","volume":"57","year":"2014","journal-title":"Commun. ACM"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., and Ives, Z. (2007, January 11\u201315). DBpedia: A Nucleus for a Web of Open Data. Proceedings of the 6th International The Semantic Web and 2nd Asian Conference on Asian Semantic Web Conference, Busan, Korea.","DOI":"10.1007\/978-3-540-76298-0_52"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Zhang, H. (2018, June 05). zhhongzhi\/factoid_QA_with_distant_spervision: Codes for Our Paper Factoid Question Answering With Distant Supervision. Available online: https:\/\/github.com\/zhhongzhi\/factoid_QA_with_distant_spervision.","DOI":"10.3390\/e20060439"},{"key":"ref_19","unstructured":"Zhang, H. (2018, June 05). Data_for_factoid_QA_with_distant_spervision. Available online: https:\/\/drive.google.com\/drive\/folders\/1EI47PfmeZRfpAUdNq2EI7um_sxlV8prv?usp=sharing."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"He, W., Liu, K., Lyu, Y., Zhao, S., Xiao, X., Liu, Y., Wang, Y., Wu, H., She, Q., and Liu, X. (arXiv, 2017). DuReader: A Chinese Machine Reading Comprehension Dataset from Real-world Applications, arXiv.","DOI":"10.18653\/v1\/W18-2605"},{"key":"ref_21","unstructured":"Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., and Deng, L. (arXiv, 2016). MS MARCO: A Human Generated MAchine Reading COmprehension Dataset, arXiv."},{"key":"ref_22","unstructured":"Seo, M.J., Kembhavi, A., Farhadi, A., and Hajishirzi, H. (arXiv, 2016). Bidirectional Attention Flow for Machine Comprehension, arXiv."},{"key":"ref_23","unstructured":"Wang, S., and Jiang, J. (2017). Machine Comprehension Using Match-LSTM and Answer Pointer. Proceedings of the 5th International Conference on Learning Representations (ICLR 2017), Toulon, France."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cui, Y., Chen, Z., Wei, S., Wang, S., Liu, T., and Hu, G. (arXiv, 2016). Attention-over-Attention Neural Networks for Reading Comprehension, arXiv.","DOI":"10.18653\/v1\/P17-1055"},{"key":"ref_25","unstructured":"Vinyals, O., Fortunato, M., and Jaitly, N. (2015). Pointer Networks. Advances in Neural Information Processing Systems 28, Curran Associates, Inc."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Hewlett, D., Jones, L., Lacoste, A., and Gur, I. (2017, January 9\u201311). Accurate Supervised and Semi-Supervised Machine Reading for Long Documents. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1214"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Labutov, I., Basu, S., and Vanderwende, L. (2015, January 27\u201331). Deep Questions without Deep Understanding. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China.","DOI":"10.3115\/v1\/P15-1086"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1162\/COLI_a_00206","article-title":"Towards Topic-to-question Generation","volume":"41","author":"Chali","year":"2015","journal-title":"Comput. Linguist."},{"key":"ref_29","unstructured":"Song, L., and Zhao, L. (arXiv, 2016). Domain-specific Question Generation from a Knowledge Base, arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009, January 2\u20137). Distant supervision for relation extraction without labeled data. Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore.","DOI":"10.3115\/1690219.1690287"},{"key":"ref_31","unstructured":"Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., and Weld, D.S. (2011, January 19\u201324). Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Zeng, D., Liu, K., Chen, Y., and Zhao, J. (2015, January 19\u201321). Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1203"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Lin, Y., Shen, S., Liu, Z., Luan, H., and Sun, M. (2016, January 7\u201312). Neural Relation Extraction with Selective Attention over Instances. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany.","DOI":"10.18653\/v1\/P16-1200"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Levy, O., Seo, M., Choi, E., and Zettlemoyer, L. (2017, January 3\u20134). Zero-Shot Relation Extraction via Reading Comprehension. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada.","DOI":"10.18653\/v1\/K17-1034"},{"key":"ref_35","unstructured":"Purver, M., and Battersby, S. Experimenting with Distant Supervision for Emotion Classification, Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics."},{"key":"ref_36","unstructured":"Plank, B., Hovy, D., McDonald, R., and S\u00f8gaard, A. (2014, January 23\u201329). Adapting taggers to Twitter with not-so-distant supervision. Proceedings of the COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Tabassum, J., Ritter, A., and Xu, W. (2016, January 2\u20134). TweeTime: A Minimally Supervised Method for Recognizing and Normalizing Time Expressions in Twitter. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA.","DOI":"10.18653\/v1\/D16-1030"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Zeng, Y., Feng, Y., Ma, R., Wang, Z., Yan, R., Shi, C., and Zhao, D. (arXiv, 2017). Scale Up Event Extraction Learning via Automatic Training Data Generation, arXiv.","DOI":"10.1609\/aaai.v32i1.12030"},{"key":"ref_39","unstructured":"Joshi, M., Choi, E., Weld, D., and Zettlemoyer, L. (2017, July 30\u2013August 4). TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)."},{"key":"ref_40","unstructured":"Dhingra, B., Mazaitis, K., and Cohen, W.W. (arXiv, 2017). Quasar: Datasets for Question Answering by Search and Reading, arXiv."},{"key":"ref_41","unstructured":"Chen, D., Fisch, A., Weston, J., and Bordes, A. Reading Wikipedia to Answer Open-Domain Questions, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Clark, C., and Gardner, M. (arXiv, 2017). Simple and Effective Multi-Paragraph Reading Comprehension, arXiv.","DOI":"10.18653\/v1\/P18-1078"},{"key":"ref_43","unstructured":"Wang, S., Yu, M., Guo, X., Wang, Z., Klinger, T., Zhang, W., Chang, S., Tesauro, G., Zhou, B., and Jiang, J. (arXiv, 2017). R\u00b3: Reinforced Reader-Ranker for Open-Domain Question Answering, arXiv."},{"key":"ref_44","unstructured":"Kingma, D.P., Mohamed, S., Rezende, D.J., and Welling, M. (2014). Semi-supervised Learning with Deep Generative Models. Advances in Neural Information Processing Systems 27, Curran Associates, Inc."},{"key":"ref_45","unstructured":"Odena, A. (arXiv, 2016). Semi-Supervised Learning with Generative Adversarial Networks, arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vision"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C.D., Ng, A., and Potts, C. (2013, January 18\u201321). Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA.","DOI":"10.18653\/v1\/D13-1170"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Nallapati, R., Zhou, B., dos Santos, C., Gulcehre, C., and Xiang, B. (2016, January 11\u201312). Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond. Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany.","DOI":"10.18653\/v1\/K16-1028"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (arXiv, 2018). Deep contextualized word representations, arXiv.","DOI":"10.18653\/v1\/N18-1202"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Wiese, G., Weissenborn, D., and Neves, M. (2017, January 3\u20134). Neural Domain Adaptation for Biomedical Question Answering. Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada.","DOI":"10.18653\/v1\/K17-1029"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Chung, Y., Lee, H., and Glass, J.R. (arXiv, 2017). Supervised and Unsupervised Transfer Learning for Question Answering, arXiv.","DOI":"10.18653\/v1\/N18-1143"},{"key":"ref_52","unstructured":"Min, S., Seo, M., and Hajishirzi, H. (2017, July 30\u2013August 4). Question Answering through Transfer Learning from Large Fine-grained Supervision Data. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Vancouver, Canada."},{"key":"ref_53","unstructured":"Yang, Z., Hu, J., Salakhutdinov, R., and Cohen, W. (2017, July 30\u2013August 4). Semi-Supervised QA with Generative Domain-Adaptive Nets. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada."},{"key":"ref_54","unstructured":"Jurafsky, D., and Martin, J.H. (2009). Speech and Language Processing, Prentice-Hall, Inc. [2nd ed.]."},{"key":"ref_55","unstructured":"Wang, S., Yu, M., Jiang, J., Zhang, W., Guo, X., Chang, S., Wang, Z., Klinger, T., Tesauro, G., and Campbell, M. (arXiv, 2017). Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering, arXiv."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Galbraith, B., Pratap, B., and Shank, D. (2017, January 3\u20134). Talla at SemEval-2017 Task 3: Identifying Similar Questions Through Paraphrase Detection. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, Canada.","DOI":"10.18653\/v1\/S17-2062"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Filice, S., Da San Martino, G., and Moschitti, A. (2017, January 3\u20134). KeLP at SemEval-2017 Task 3: Learning Pairwise Patterns in Community Question Answering. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, Canada.","DOI":"10.18653\/v1\/S17-2053"},{"key":"ref_58","unstructured":"Lei, T., and Zhang, Y. (arXiv, 2017). Training RNNs as Fast as CNNs, arXiv."},{"key":"ref_59","unstructured":"Kingma, D.P., and Ba, J. (arXiv, 2014). Adam: A Method for Stochastic Optimization, arXiv."},{"key":"ref_60","first-page":"3371","article-title":"Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion","volume":"11","author":"Vincent","year":"2010","journal-title":"J. Mach. Learn. Res."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/20\/6\/439\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:07:26Z","timestamp":1760195246000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/20\/6\/439"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,5]]},"references-count":60,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2018,6]]}},"alternative-id":["e20060439"],"URL":"https:\/\/doi.org\/10.3390\/e20060439","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2018,6,5]]}}}