{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,2,13]],"date-time":"2024-02-13T08:40:55Z","timestamp":1707813655551},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Objective The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude\/longitudes using knowledge derived from external geographical databases.<\/jats:p>\n               <jats:p>Materials and Methods We developed a rule-based information extraction framework for linking GenBank records to the latitude\/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising of 5728 GenBank records for the influenza A virus.<\/jats:p>\n               <jats:p>Results We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude\/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively.<\/jats:p>\n               <jats:p>Discussion Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction.<\/jats:p>\n               <jats:p>Conclusion Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles.<\/jats:p>","DOI":"10.1093\/jamia\/ocv172","type":"journal-article","created":{"date-parts":[[2016,1,18]],"date-time":"2016-01-18T01:07:26Z","timestamp":1453079246000},"page":"934-941","source":"Crossref","is-referenced-by-count":16,"title":["A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records"],"prefix":"10.1093","volume":"23","author":[{"given":"Tasnia","family":"Tahsin","sequence":"first","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Davy","family":"Weissenbacher","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Robert","family":"Rivera","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Rachel","family":"Beard","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Mari","family":"Firago","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Garrick","family":"Wallstrom","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Matthew","family":"Scotch","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]},{"given":"Graciela","family":"Gonzalez","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics, Arizona State University, 13212 E Shea Blvd, Scottsdale, AZ 85259, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,1,17]]},"reference":[{"key":"2020110612354141700_ocv172-B1","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1038\/clpt.2012.96","article-title":"Pharmacogenomics knowledge for personalized medicine","volume":"92","author":"Whirl-Carrillo","year":"2012","journal-title":"Clin Pharmacol Ther."},{"key":"2020110612354141700_ocv172-B2","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1046\/j.1365-294X.2003.02051.x","article-title":"The phylogeography of human viruses","volume":"13","author":"Holmes","year":"2004","journal-title":"Mol Ecol."},{"key":"2020110612354141700_ocv172-B3","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1007\/s00705-014-2262-5","article-title":"Combining phylogeography and spatial epidemiology to uncover predictors of H5N1 influenza A virus diffusion","volume":"160","author":"Magee","year":"2015","journal-title":"Arch Virol."},{"key":"2020110612354141700_ocv172-B4","doi-asserted-by":"crossref","first-page":"1939","DOI":"10.1017\/S0031182012001102","article-title":"Integrative molecular phylogeography in the context of infectious diseases on the human-animal interface","volume":"139","author":"Gray","year":"2012","journal-title":"Parasitology."},{"key":"2020110612354141700_ocv172-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.1096-0031.2009.00297.x","article-title":"Tracking the geographical spread of avian influenza (H5N1) with multiple phylogenetic trees","volume":"26","author":"Hovm\u00f6ller","year":"2010","journal-title":"Cladistics."},{"key":"2020110612354141700_ocv172-B6","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1111\/cla.12107","article-title":"Phylogenetic visualization of the spread of H7 influenza A viruses","volume":"31","author":"Janies","year":"2015","journal-title":"Cladistics."},{"key":"2020110612354141700_ocv172-B7","doi-asserted-by":"crossref","first-page":"e1001005","DOI":"10.1371\/journal.pcbi.1001005","article-title":"Network analysis of global influenza spread","volume":"6","author":"Chan","year":"2010","journal-title":"PLoS Comput Biol."},{"key":"2020110612354141700_ocv172-B8","doi-asserted-by":"crossref","first-page":"998","DOI":"10.1289\/ehp.6735","article-title":"Spatial epidemiology: current approaches and future challenges","volume":"112","author":"Elliott","year":"2004","journal-title":"Environ Health Perspect."},{"key":"2020110612354141700_ocv172-B9","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gks1195","article-title":"GenBank","volume":"41","author":"Benson","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2020110612354141700_ocv172-B10","doi-asserted-by":"crossref","first-page":"e1002064","DOI":"10.1371\/journal.ppat.1002064","article-title":"Endemic dengue associated with the co-circulation of multiple viral lineages and localized density-dependent transmission","volume":"7","author":"Raghwani","year":"2011","journal-title":"PLoS Pathog."},{"key":"2020110612354141700_ocv172-B11","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1016\/j.coviro.2011.10.003","article-title":"Toward a quantitative understanding of viral phylogeography","volume":"1","author":"Faria","year":"2011","journal-title":"Curr Opin Virol."},{"key":"2020110612354141700_ocv172-B12","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1080\/10635150701266848","article-title":"Genomic analysis and geographic visualization of the spread of avian influenza (H5N1)","volume":"56","author":"Janies","year":"2007","journal-title":"Syst Biol."},{"key":"2020110612354141700_ocv172-B13"},{"key":"2020110612354141700_ocv172-B14","first-page":"161","article-title":"BioNLP shared Task 2013\u2013An Overview of the Bacteria Biotope Task","volume-title":"Proceedings of the BioNLP Shared Task Workshop, ACL","author":"Bossy","year":"2013"},{"key":"2020110612354141700_ocv172-B15","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1186\/1471-2105-11-294","article-title":"EnvMine: a text-mining system for the automatic extraction of contextual information","volume":"11","author":"Tamames","year":"2010","journal-title":"BMC Bioinformatics."},{"key":"2020110612354141700_ocv172-B16","first-page":"717","article-title":"Leveraging biomedical ontologies and annotation services to organize microbiome data from Mammalian hosts","volume":"2010","author":"Sarkar","year":"2010","journal-title":"AMIA Annu Symp Proc."},{"key":"2020110612354141700_ocv172-B17","first-page":"6","article-title":"Towards structuring unstructured genbank metadata for enhancing comparative biological studies","volume":"2011","author":"Chen","year":"2011","journal-title":"AMIA Jt Summits Transl Sci Proc AMIA Summit Transl Sci."},{"key":"2020110612354141700_ocv172-B18","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/j.jbi.2009.10.003","article-title":"MeSHing molecular sequences and clinical trials: a feasibility study","volume":"43","author":"Chen","year":"2010","journal-title":"J Biomed Inform."},{"key":"2020110612354141700_ocv172-B19","doi-asserted-by":"crossref","first-page":"101","DOI":"10.1186\/1756-0500-2-101","article-title":"GenBank and PubMed: how connected are they?","volume":"2","author":"Miller","year":"2009","journal-title":"BMC Res Notes."},{"key":"2020110612354141700_ocv172-B20","doi-asserted-by":"crossref","first-page":"240175","DOI":"10.1155\/2013\/240175","article-title":"The world bacterial biogeography and biodiversity through databases: a case study of NCBI Nucleotide Database and GBIF Database","volume":"2013","author":"Selama","year":"2013","journal-title":"Biomed Res Int."},{"key":"2020110612354141700_ocv172-B21","first-page":"102","article-title":"Natural language processing methods for enhancing geographic metadata for phylogeography of zoonotic viruses","volume":"2014","author":"Tahsin","year":"2014","journal-title":"AMIA Jt Summits Transl Sci Proc AMIA Summit Transl Sci."},{"key":"2020110612354141700_ocv172-B22"},{"key":"2020110612354141700_ocv172-B23","author":"Sayers"},{"key":"2020110612354141700_ocv172-B24","author":"Lieberman"},{"key":"2020110612354141700_ocv172-B25","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/978-3-540-89903-7_8","article-title":"A Toponym Resolution Service Following the OGC WPS Standard","volume-title":"Proceedings of the 8th International Symposium on Web and Wireless Geographical Information Systems","author":"Ladra","year":"2008"},{"key":"2020110612354141700_ocv172-B26"},{"key":"2020110612354141700_ocv172-B27","doi-asserted-by":"crossref","first-page":"S44","DOI":"10.1016\/j.jbi.2011.06.005","article-title":"Enhancing phylogeography by improving geographical information from GenBank","volume":"44","author":"Scotch","year":"2011","journal-title":"J Biomed Inform."},{"key":"2020110612354141700_ocv172-B28","article-title":"SUTime: A library for recognizing and normalizing time expressions","volume-title":"Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)","author":"Chang","year":"2012"},{"key":"2020110612354141700_ocv172-B29","doi-asserted-by":"crossref","first-page":"i348","DOI":"10.1093\/bioinformatics\/btv259","article-title":"Knowledge-driven geospatial location resolution for phylogeographic models of virus migration","volume":"31","author":"Weissenbacher","year":"2015","journal-title":"Bioinformatics."},{"key":"2020110612354141700_ocv172-B30","first-page":"168","article-title":"GATE: a framework and graphical development environment for robust NLP tools and applications","author":"Cunningham","year":"2002","journal-title":"Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics"},{"key":"2020110612354141700_ocv172-B31","doi-asserted-by":"crossref","first-page":"467","DOI":"10.3115\/1220575.1220634","article-title":"Bidirectional inference with the easiest-first strategy for tagging sequence data","volume-title":"Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT \u201905","author":"Tsuruoka","year":"2005"},{"key":"2020110612354141700_ocv172-B32","first-page":"124","article-title":"Toponym Resolution in Text: Annotation, Evaluation and Applications of Spatial Grounding","volume-title":"SIGIR Forum","author":"Leidner","year":"2007"},{"key":"2020110612354141700_ocv172-B33","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1197\/jamia.M1733","article-title":"Agreement, the f-measure, and reliability in information retrieval","volume":"12","author":"Hripcsak","year":"2005","journal-title":"J Am Med Inform Assoc."}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/jamia\/article\/23\/5\/934\/2379807","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/jamia\/article\/23\/5\/934\/2379807","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,6]],"date-time":"2020-11-06T17:38:59Z","timestamp":1604684339000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/23\/5\/934\/2379807"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,1,17]]},"references-count":33,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2016,1,17]]},"published-print":{"date-parts":[[2016,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocv172","relation":{},"ISSN":["1527-974X","1067-5027"],"issn-type":[{"value":"1527-974X","type":"electronic"},{"value":"1067-5027","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,9]]},"published":{"date-parts":[[2016,1,17]]}}}