{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T13:51:49Z","timestamp":1769089909273,"version":"3.49.0"},"reference-count":20,"publisher":"Oxford University Press (OUP)","issue":"23","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1816,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Research in the biomedical domain can have a major impact through open sharing of the data produced. For this reason, it is important to be able to identify instances of data production and deposition for potential re-use. Herein, we report on the automatic identification of data deposition statements in research articles.<\/jats:p>\n               <jats:p>Results: We apply machine learning algorithms to sentences extracted from full-text articles in PubMed Central in order to automatically determine whether a given article contains a data deposition statement, and retrieve the specific statements. With a Support Vector Machine classifier using conditional random field determined deposition features, articles containing deposition statements are correctly identified with 81% F-measure. An error analysis shows that almost half of the articles classified as containing a deposition statement by our method but not by the gold standard do indeed contain a deposition statement. 
In addition, our system was used to process articles in PubMed Central, predicting that a total of 52 932 articles report data deposition, many of which are not currently included in the Secondary Source Identifier [si] field for MEDLINE citations.<\/jats:p>\n               <jats:p>Availability: All annotated datasets described in this study are freely available from the NLM\/NCBI website at http:\/\/www.ncbi.nlm.nih.gov\/CBBresearch\/Fellows\/Neveol\/DepositionDataSets.zip<\/jats:p>\n               <jats:p>Contact: \u00a0aurelie.neveol@nih.gov; john.wilbur@nih.gov; zhiyong.lu@nih.gov<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr573","type":"journal-article","created":{"date-parts":[[2011,10,15]],"date-time":"2011-10-15T00:13:39Z","timestamp":1318637619000},"page":"3306-3312","source":"Crossref","is-referenced-by-count":25,"title":["Extraction of data deposition statements from the literature: a method for automatically tracking research results"],"prefix":"10.1093","volume":"27","author":[{"given":"Aur\u00e9lie","family":"N\u00e9v\u00e9ol","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, Maryland, 20894 USA"}]},{"given":"W. 
John","family":"Wilbur","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, Maryland, 20894 USA"}]},{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information (NCBI), National Library of Medicine, Bethesda, Maryland, 20894 USA"}]}],"member":"286","published-online":{"date-parts":[[2011,10,13]]},"reference":[{"key":"2023012511031632500_B1","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1038\/nmeth0308-209","article-title":"Thou shalt share your data","volume":"5","author":"Anonymous","year":"2008","journal-title":"Nat. Methods"},{"key":"2023012511031632500_B2","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1197\/jamia.M1911","article-title":"Automatically identifying health outcome information in MEDLINE records","volume":"13","author":"Demner-Fushman","year":"2006","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"2023012511031632500_B3","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1093\/bioinformatics\/btr043","article-title":"Annotating genes and genomes with DNA sequences extracted from biomedical articles","volume":"27","author":"Haeussler","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012511031632500_B4","article-title":"Na\u00efve Bayes and SVM classifiers for classifying databank accession number sentences from online biomedical articles","volume-title":"IS&T\/SPIE's 22nd Annual Symposium on Electronic Imaging.","author":"Kim","year":"2010"},{"issue":"Suppl. 2","key":"2023012511031632500_B5","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-12-S2-S5","article-title":"Automatic classification of sentences to support Evidence Based Medicine","volume":"12","author":"Kim","year":"2011","journal-title":"BMC Bioinformatics"},{"issue":"Suppl. 
2","key":"2023012511031632500_B6","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s2-s4","article-title":"Overview of the protein-protein interaction annotation extraction task of BioCreative II","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"key":"2023012511031632500_B7","first-page":"440","article-title":"Categorization of sentence types in medical abstracts","volume":"2008","author":"McKnight","year":"2003","journal-title":"AMIA Annu. Symp. Proc."},{"key":"2023012511031632500_B8","first-page":"485","article-title":"Emerging trend prediction in biomedical literature","author":"Moerchen","year":"2008","journal-title":"AMIA Annu. Symp. Proc."},{"key":"2023012511031632500_B9","doi-asserted-by":"crossref","first-page":"991","DOI":"10.1038\/nmeth1208-991","article-title":"Much room for improvement in deposition rates of expression microarray datasets","volume":"5","author":"Ochsner","year":"2008","journal-title":"Nat. Methods"},{"key":"2023012511031632500_B10","doi-asserted-by":"crossref","first-page":"e308","DOI":"10.1371\/journal.pone.0000308","article-title":"Sharing detailed research data is associated with increased citation rate","volume":"2","author":"Piwowar","year":"2007","journal-title":"PLoS One"},{"key":"2023012511031632500_B11","article-title":"Linking database submissions to primary citations with PubMed Central","volume-title":"Proceedings of the BioLINK workshop at ISBM.","author":"Piwowar","year":"2008"},{"key":"2023012511031632500_B12","first-page":"596","article-title":"Identifying data sharing in biomedical literature","volume":"2008","author":"Piwowar","year":"2008","journal-title":"AMIA Annu. Symp. Proc."},{"key":"2023012511031632500_B13","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.joi.2009.11.010","article-title":"Public sharing of research datasets: a pilot study of associations","volume":"4","author":"Piwowar","year":"2010","journal-title":"J. 
Informetr."},{"key":"2023012511031632500_B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/2041-1480-2-1","article-title":"Protein interaction sentence detection using multiple semantic kernels","volume":"2","author":"Polajnar","year":"2011","journal-title":"J. Biomed. Semantics"},{"key":"2023012511031632500_B15","doi-asserted-by":"crossref","first-page":"160","DOI":"10.3163\/1536-5050.99.2.009","article-title":"A retrospective cohort study of structured abstracts in MEDLINE, 1992\u20132006","volume":"99","author":"Ripple","year":"2011","journal-title":"J. Med. Libr. Assoc."},{"key":"2023012511031632500_B16","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505282.505283","article-title":"Machine learning in automated text categorization","volume":"1","author":"Sebastiani","year":"2002","journal-title":"ACM Comput. Surv."},{"key":"2023012511031632500_B17","doi-asserted-by":"crossref","first-page":"2320","DOI":"10.1093\/bioinformatics\/bth227","article-title":"MedPost: a part-of-speech tagger for bioMedical text","volume":"20","author":"Smith","year":"2004","journal-title":"Bioinformatics"},{"issue":"Suppl. 6","key":"2023012511031632500_B18","doi-asserted-by":"crossref","first-page":"S18","DOI":"10.1186\/1471-2105-9-S6-S18","article-title":"ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses","volume":"9","author":"Stokes","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012511031632500_B19","first-page":"155","article-title":"Text mining techniques for leveraging positively labeled data","author":"Yeganova","year":"2011","journal-title":"Proceedings of the ACL Workshop BioNLP"},{"key":"2023012511031632500_B20","first-page":"e5","article-title":"GEO accession numbers in MEDLINE\u00ae","volume":"349","author":"Yorks","year":"2006","journal-title":"NLM Tech. 
Bull."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/23\/3306\/48861282\/bioinformatics_27_23_3306.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/23\/3306\/48861282\/bioinformatics_27_23_3306.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T11:08:52Z","timestamp":1674644932000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/23\/3306\/236597"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,10,13]]},"references-count":20,"journal-issue":{"issue":"23","published-print":{"date-parts":[[2011,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr573","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,12,1]]},"published":{"date-parts":[[2011,10,13]]}}}