{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,22]],"date-time":"2025-11-22T11:10:24Z","timestamp":1763809824310,"version":"3.37.3"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program","doi-asserted-by":"crossref","award":["2016YFD0101900"],"award-info":[{"award-number":["2016YFD0101900"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["31701144"],"award-info":[{"award-number":["31701144"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The fundamental challenge of modern genetic analysis is to establish gene-phenotype correlations that are often found in the large-scale publications. Because lexical features of gene are relatively regular in text, the main challenge of these relation extraction is phenotype recognition. 
Because phenotypic descriptions are often study- or author-specific, few lexicons can effectively identify the full range of phenotypic expressions in text, especially for plants.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We propose a pipeline for extracting phenotypes, genes and their relations from the biomedical literature. By combining abbreviation revision and sentence template extraction, we improved the unsupervised word-embedding-to-sentence-embedding cascaded approach, used as representation learning, to recognize the broad variety of phenotypic information in the literature. In addition, a dictionary- and rule-based method was applied for gene recognition. Finally, we integrated OLLIE, a well-known information extraction system, to identify gene-phenotype relations. To demonstrate the applicability of the pipeline, we set up two types of comparison experiments using the model organism Arabidopsis thaliana. In the comparison with state-of-the-art baselines, our approach obtained the best performance (F1-measure of 66.83%). We also applied the pipeline to 481 full-text articles from the TAIR manual gene-phenotype relationship dataset to demonstrate its validity. 
The results showed that our proposed pipeline can cover 70.94% of the original dataset and add 373 new relations to expand it.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code is available at http:\/\/www.wutbiolab.cn: 82\/Gene-Phenotype-Relation-Extraction-Pipeline.zip.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty263","type":"journal-article","created":{"date-parts":[[2018,4,25]],"date-time":"2018-04-25T06:44:44Z","timestamp":1524638684000},"page":"i386-i394","source":"Crossref","is-referenced-by-count":35,"title":["A gene\u2013phenotype relationship extraction pipeline from the biomedical literature using a representation learning approach"],"prefix":"10.1093","volume":"34","author":[{"given":"Wenhui","family":"Xing","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junsheng","family":"Qi","sequence":"additional","affiliation":[{"name":"Department of Plant Science, College of Biological Science, China Agricultural University, Beijing, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaohui","family":"Yuan","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lin","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoyu","family":"Zhang","sequence":"additional","affiliation":[{"name":"Britton Chance Center for Biomedical Photonics, Wuhan National Laboratory for Optoelectronics-Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuhua","family":"Fu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shengwu","family":"Xiong","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lun","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jing","family":"Peng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"year":"2013","author":"Berant","key":"2023051604253533500_bty263-B1"},{"key":"2023051604253533500_bty263-B2","doi-asserted-by":"crossref","first-page":"1253","DOI":"10.1093\/bioinformatics\/bts125","article-title":"Harmonization of gene\/protein annotations: towards a gold standard medline","volume":"28","author":"Campos","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051604253533500_bty263-B3","doi-asserted-by":"crossref","first-page":"W399","DOI":"10.1093\/nar\/gkn296","article-title":"Polysearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and 
metabolites","volume":"36","author":"Cheng","year":"2008","journal-title":"Nucleic Acids Res"},{"first-page":"4","year":"2006","author":"Chun","key":"2023051604253533500_bty263-B4"},{"key":"2023051604253533500_bty263-B5","doi-asserted-by":"crossref","first-page":"867","DOI":"10.1007\/s00122-013-2066-0","article-title":"Next-generation phenotyping: requirements and strategies for enhancing our understanding of genotype\u2013phenotype relationships and its relevance to crop improvement","volume":"126","author":"Cobb","year":"2013","journal-title":"Theor. Appl. Genet"},{"key":"2023051604253533500_bty263-B6","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1093\/bib\/6.1.57","article-title":"A survey of current work in biomedical text mining","volume":"6","author":"Cohen","year":"2005","journal-title":"Brief. Bioinformatics"},{"key":"2023051604253533500_bty263-B7","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bav104","article-title":"Phenominer: from text to a database of phenotypes associated with OMIM diseases","volume":"2015","author":"Collier","year":"2015","journal-title":"Database"},{"key":"2023051604253533500_bty263-B8","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1016\/j.jbi.2010.08.005","article-title":"Using text to build semantic networks for pharmacogenomics","volume":"43","author":"Coulet","year":"2010","journal-title":"J. Biomed. 
Informatics"},{"first-page":"1535","year":"2011","author":"Fader","key":"2023051604253533500_bty263-B9"},{"key":"2023051604253533500_bty263-B10","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/brain\/awt202","article-title":"Genotype\u2013phenotype correlations in neurogenetics: lesch-nyhan disease as a model disorder","volume":"137","author":"Fu","year":"2014","journal-title":"Brain"},{"key":"2023051604253533500_bty263-B11","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1093\/bioinformatics\/19.1.135","article-title":"Protein structures and information extraction from biological texts: the pasta system","volume":"19","author":"Gaizauskas","year":"2003","journal-title":"Bioinformatics"},{"key":"2023051604253533500_bty263-B12","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1093\/bioinformatics\/btg449","article-title":"Automated extraction of mutation data from the literature: application of mutext to g protein-coupled receptors and nuclear hormone receptors","volume":"20","author":"Horn","year":"2004","journal-title":"Bioinformatics"},{"key":"2023051604253533500_bty263-B13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1136\/jamia.1998.0050001","article-title":"The unified medical language system: an informatics research collaboration","volume":"5","author":"Humphreys","year":"1998","journal-title":"J. Am. Med. Informatics Assoc"},{"key":"2023051604253533500_bty263-B14","doi-asserted-by":"crossref","first-page":"D1123","DOI":"10.1093\/nar\/gkq1066","article-title":"Ahd2. 0: an update version of arabidopsis hormone database for plant systematic studies","volume":"39","author":"Jiang","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051604253533500_bty263-B15","doi-asserted-by":"crossref","first-page":"40154.","DOI":"10.1038\/srep40154","article-title":"An analysis of disease-gene relationship from medline abstracts by digsee","volume":"7","author":"Kim","year":"2017","journal-title":"Sci. 
Rep"},{"key":"2023051604253533500_bty263-B16","doi-asserted-by":"crossref","first-page":"D1202","DOI":"10.1093\/nar\/gkr1090","article-title":"The arabidopsis information resource (tair): improved gene annotation and new tools","volume":"40","author":"Lamesch","year":"2012","journal-title":"Nucleic Acids Res"},{"first-page":"1188","year":"2014","author":"Le","key":"2023051604253533500_bty263-B17"},{"key":"2023051604253533500_bty263-B18","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1093\/bib\/bbw001","article-title":"Bridging semantics and syntax with graph algorithmsstate-of-the-art of extracting biomedical relations","volume":"18","author":"Luo","year":"2017","journal-title":"Brief. Bioinformatics"},{"key":"2023051604253533500_bty263-B19","first-page":"24","article-title":"Language combinatorics: a sentence pattern extraction architecture based on combinatorial explosion","volume":"2","author":"Michal","year":"2011","journal-title":"Int. J. Comput. Linguistics"},{"first-page":"3111","year":"2013","author":"Mikolov","key":"2023051604253533500_bty263-B20"},{"key":"2023051604253533500_bty263-B21","doi-asserted-by":"crossref","first-page":"e309.","DOI":"10.1371\/journal.pbio.0020309","article-title":"Textpresso: an ontology-based information retrieval and extraction system for biological literature","volume":"2","author":"M\u00fcller","year":"2004","journal-title":"PLoS Biol"},{"key":"2023051604253533500_bty263-B22","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1109\/JPROC.2015.2483592","article-title":"A review of relational machine learning for knowledge graphs","volume":"104","author":"Nickel","year":"2016","journal-title":"Proc. 
IEEE"},{"key":"2023051604253533500_bty263-B23","doi-asserted-by":"crossref","first-page":"i277","DOI":"10.1093\/bioinformatics\/btn182","article-title":"Identifying gene-disease associations using centrality on a literature mined gene-interaction network","volume":"24","author":"\u00d6zg\u00fcr","year":"2008","journal-title":"Bioinformatics"},{"key":"2023051604253533500_bty263-B24","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1016\/j.ymeth.2014.10.026","article-title":"Protein\u2013protein interaction predictions using text mining methods","volume":"74","author":"Papanikolaou","year":"2015","journal-title":"Methods"},{"first-page":"517","year":"1999","author":"Rindflesch","key":"2023051604253533500_bty263-B25"},{"first-page":"523","year":"2012","author":"Schmitz","key":"2023051604253533500_bty263-B26"},{"key":"2023051604253533500_bty263-B27","doi-asserted-by":"crossref","first-page":"816","DOI":"10.1016\/j.drudis.2008.06.001","article-title":"Drug name recognition and classification in biomedical texts: a case study outlining approaches underpinning automated systems","volume":"13","author":"Segura-Bedmar","year":"2008","journal-title":"Drug Discov. 
Today"},{"key":"2023051604253533500_bty263-B28","first-page":"1","article-title":"The 1st DDIExtraction-2011 challenge task: extraction of drug-drug interactions from biomedical texts","volume":"761","author":"Segura-Bedmar","year":"2011","journal-title":"CEUR workshop proc"},{"year":"2013","author":"Segura Bedmar","key":"2023051604253533500_bty263-B29"},{"key":"2023051604253533500_bty263-B30","doi-asserted-by":"crossref","first-page":"D1054","DOI":"10.1093\/nar\/gkw986","article-title":"Arapheno: a public database for Arabidopsis thaliana phenotypes","volume":"45","author":"Seren","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051604253533500_bty263-B31","doi-asserted-by":"crossref","first-page":"e1005017.","DOI":"10.1371\/journal.pcbi.1005017","article-title":"Text mining genotype-phenotype relationships from biomedical literature for database curation and precision medicine","volume":"12","author":"Singhal","year":"2016","journal-title":"PLoS Comput. Biol"},{"key":"2023051604253533500_bty263-B32","doi-asserted-by":"crossref","first-page":"2000","DOI":"10.1109\/TPAMI.2016.2632117","article-title":"Nelasso: group-sparse modeling for characterizing relations among named entities in news articles","volume":"39","author":"Tariq","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023051604253533500_bty263-B33","first-page":"1.","article-title":"Gnormplus: an integrative approach for tagging genes, gene families, and protein domains","volume":"2015","author":"Wei","year":"2015","journal-title":"BioMed Res. 
Int"},{"key":"2023051604253533500_bty263-B34","doi-asserted-by":"crossref","first-page":"S5.","DOI":"10.1186\/1471-2105-12-S8-S5","article-title":"Cross-species gene normalization by species inference","volume":"12","author":"Wei","year":"2011","journal-title":"BMC Bioinformatics"},{"first-page":"477","year":"2017","author":"Xing","key":"2023051604253533500_bty263-B35"},{"key":"2023051604253533500_bty263-B36","doi-asserted-by":"crossref","first-page":"14.","DOI":"10.1186\/1471-2105-10-14","article-title":"MBA: a literature mining system for extracting biomedical abbreviations","volume":"10","author":"Xu","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023051604253533500_bty263-B37","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1038\/nmeth.3484","article-title":"Phenolyzer: phenotype-based prioritization of candidate genes for human diseases","volume":"12","author":"Yang","year":"2015","journal-title":"Nat. Methods"},{"key":"2023051604253533500_bty263-B38","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1016\/j.artmed.2010.12.002","article-title":"Multiple kernel learning in protein\u2013protein interaction extraction from biomedical literature","volume":"51","author":"Yang","year":"2011","journal-title":"Artif. Intell. 
Med"},{"first-page":"1306","year":"2015","author":"Zhu","key":"2023051604253533500_bty263-B39"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i386\/50316356\/bioinformatics_34_13_i386.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i386\/50316356\/bioinformatics_34_13_i386.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T04:29:08Z","timestamp":1684211348000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i386\/5045803"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":39,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty263","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}