{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T22:59:32Z","timestamp":1772319572381,"version":"3.50.1"},"reference-count":60,"publisher":"Wiley","license":[{"start":{"date-parts":[[2012,5,22]],"date-time":"2012-05-22T00:00:00Z","timestamp":1337644800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"funder":[{"name":"MacArthur Foundation Grant to the Encyclopedia of Life","award":["0830976"],"award-info":[{"award-number":["0830976"]}]},{"name":"MacArthur Foundation Grant to the Encyclopedia of Life","award":["0849982"],"award-info":[{"award-number":["0849982"]}]},{"name":"National Science Foundation Data Net Program","award":["0830976"],"award-info":[{"award-number":["0830976"]}]},{"name":"National Science Foundation Data Net Program","award":["0849982"],"award-info":[{"award-number":["0849982"]}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["0830976"],"award-info":[{"award-number":["0830976"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["0849982"],"award-info":[{"award-number":["0849982"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Advances in Bioinformatics"],"published-print":{"date-parts":[[2012,5,22]]},"abstract":"<jats:p>Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science.\nA computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science.<\/jats:p>","DOI":"10.1155\/2012\/391574","type":"journal-article","created":{"date-parts":[[2012,5,29]],"date-time":"2012-05-29T07:11:13Z","timestamp":1338275473000},"page":"1-17","source":"Crossref","is-referenced-by-count":75,"title":["Applications of Natural Language Processing in Biodiversity Science"],"prefix":"10.1155","volume":"2012","author":[{"given":"Anne E.","family":"Thessen","sequence":"first","affiliation":[{"name":"Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hong","family":"Cui","sequence":"additional","affiliation":[{"name":"School of Information Resources and Library Science, University of Arizona, Tucson, AZ 85719, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dmitry","family":"Mozzherin","sequence":"additional","affiliation":[{"name":"Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","reference":[{"key":"1","doi-asserted-by":"publisher","DOI":"10.1126\/science.287.5454.793"},{"key":"2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.241391498"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1890\/1540-9295-7.9.455"},{"key":"4","doi-asserted-by":"crossref","first-page":"15","DOI":"10.3897\/zookeys.150.1766","volume":"150","year":"2011","journal-title":"ZooKeys"},{"key":"5","year":"2009"},{"key":"6","doi-asserted-by":"publisher","DOI":"10.1038\/nrg2414"},{"issue":"2","key":"7","doi-asserted-by":"crossref","first-page":"280","DOI":"10.1353\/lib.0.0036","volume":"57","year":"2008","journal-title":"Library Trends"},{"key":"8"},{"issue":"2","key":"9","volume":"7","year":"2010","journal-title":"Biodiversity Informatics"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.1126\/science.1191506"},{"key":"11","doi-asserted-by":"publisher","DOI":"10.3233\/ISU-2010-0613"},{"key":"13","year":"2007"},{"key":"14","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21246"},{"key":"28","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21325"},{"key":"70","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btn631"},{"key":"72","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/19.1.135"},{"key":"73","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti296"},{"key":"74","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bth386"},{"key":"75","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-5-147"},{"key":"76","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl302"},{"key":"77","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiolchem.2004.09.010"},{"key":"78","doi-asserted-by":"publisher","DOI":"10.1186\/1742-5581-3-11"},{"key":"17","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-7-S3-S2"},{"key":"18","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2008.12.004"},{"key":"80","first-page":"20","volume":"5","year":"2008","journal-title":"Biodiversity Informatics"},{"key":"44","first-page":"79","volume":"2","year":"2005","journal-title":"Biodiversity Informatics"},{"key":"47","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-11-85"},{"key":"49","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btr452"},{"key":"68","doi-asserted-by":"publisher","DOI":"10.1002\/asi.22618"},{"key":"19","doi-asserted-by":"publisher","DOI":"10.1016\/S0378-1119(00)00431-5"},{"key":"20","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl425"},{"key":"24","series-title":"Morgan Kaufmann Series in Data Management Systems","year":"2005"},{"issue":"2","key":"25","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1093\/bib\/3.2.154","volume":"3","year":"2002","journal-title":"Briefings in Bioinformatics"},{"key":"26","first-page":"182"},{"key":"29","year":"2009","journal-title":"Nature Precedings"},{"key":"32","volume-title":"Digitization and enhancement of biodiversity literature through OCR, scientific names mapping and crowdsourcing.","year":"2011"},{"key":"35","year":"2004"},{"key":"38","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti475"},{"key":"39","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp081"},{"key":"40","doi-asserted-by":"publisher","DOI":"10.1038\/nbt0609-508"},{"key":"41","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkm795"},{"key":"42","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0010500"},{"key":"43","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0010708"},{"key":"45","first-page":"46","volume":"3","year":"2007","journal-title":"Biodiversity Informatics"},{"key":"46","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btm109"},{"key":"48","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl534"},{"key":"50","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324904003468"},{"key":"51","year":"2006"},{"key":"52","first-page":"39","volume-title":"The status of telegraphic sublanguages","year":"1986"},{"key":"53","first-page":"357","volume-title":"Populating a database from parallel texts using ontology-based information extraction","volume":"3136","year":"2004"},{"key":"54","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bth496"},{"key":"55","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2006.06.001"},{"key":"56","first-page":"99","volume-title":"Abbreviations in biomedical text","year":"2006"},{"issue":"5","key":"57","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1055\/s-0038-1634373","volume":"41","year":"2002","journal-title":"Methods of Information in Medicine"},{"key":"58","doi-asserted-by":"publisher","DOI":"10.1017\/S1477200003001129"},{"key":"60","year":"1986"},{"key":"64","doi-asserted-by":"publisher","DOI":"10.1002\/asi.v58:1"},{"issue":"1","key":"66","first-page":"233","volume":"34","year":"1999","journal-title":"Machine Learning"},{"key":"67","doi-asserted-by":"publisher","DOI":"10.1002\/meet.2011.14504801031"},{"key":"69","doi-asserted-by":"publisher","DOI":"10.1016\/j.tree.2007.03.013"}],"container-title":["Advances in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/archive\/2012\/391574.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/archive\/2012\/391574.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/archive\/2012\/391574.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T03:59:13Z","timestamp":1607572753000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/abi\/2012\/391574\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5,22]]},"references-count":60,"alternative-id":["391574","391574"],"URL":"https:\/\/doi.org\/10.1155\/2012\/391574","relation":{},"ISSN":["1687-8027","1687-8035"],"issn-type":[{"value":"1687-8027","type":"print"},{"value":"1687-8035","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,5,22]]}}}