{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:39:02Z","timestamp":1773790742162,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2017,5,23]],"date-time":"2017-05-23T00:00:00Z","timestamp":1495497600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["LM007359"],"award-info":[{"award-number":["LM007359"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["AI117924"],"award-info":[{"award-number":["AI117924"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["HG007019"],"award-info":[{"award-number":["HG007019"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The NCBI\u2019s Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http:\/\/github.com\/deweylab\/metasra-pipeline<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx334","type":"journal-article","created":{"date-parts":[[2017,5,21]],"date-time":"2017-05-21T23:06:18Z","timestamp":1495407978000},"page":"2914-2923","source":"Crossref","is-referenced-by-count":95,"title":["MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive"],"prefix":"10.1093","volume":"33","author":[{"given":"Matthew N","family":"Bernstein","sequence":"first","affiliation":[{"name":"Department of Computer Sciences, University of Wisconsin, Madison, WI, USA"}]},{"given":"AnHai","family":"Doan","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, University of Wisconsin, Madison, WI, USA"}]},{"given":"Colin N","family":"Dewey","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, University of Wisconsin, Madison, WI, USA"},{"name":"Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,5,23]]},"reference":[{"key":"2023020206402056200_btx334-B1","doi-asserted-by":"crossref","first-page":"R21","DOI":"10.1186\/gb-2005-6-2-r21","article-title":"An ontology for cell types","volume":"6","author":"Bard","year":"2005","journal-title":"Genome Biol"},{"key":"2023020206402056200_btx334-B2","doi-asserted-by":"crossref","first-page":"D57","DOI":"10.1093\/nar\/gkr1163","article-title":"BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata","volume":"40","author":"Barrett","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B3","doi-asserted-by":"crossref","first-page":"D991","DOI":"10.1093\/nar\/gks1193","article-title":"NCBI GEO: archive for functional genomics data sets\u2014update","volume":"41","author":"Barrett","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B4","first-page":"271","volume-title":"Proceedings of the 9th International Symposium on String Processing and Information Retrieval, SPIRE 2002","author":"Bartolini","year":"2002"},{"key":"2023020206402056200_btx334-B5","doi-asserted-by":"crossref","first-page":"651","DOI":"10.1175\/WAF993.1","article-title":"Increasing the reliability of reliability diagrams","volume":"22","author":"Br\u00f6cker","year":"2007","journal-title":"Weather Forecasting"},{"key":"2023020206402056200_btx334-B6","author":"Browne","year":"2000"},{"key":"2023020206402056200_btx334-B7","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1186\/2041-1480-4-32","article-title":"The drosophila anatomy ontology","volume":"4","author":"Costa","year":"2013","journal-title":"J. Biomed. Sem"},{"key":"2023020206402056200_btx334-B8","first-page":"403","article-title":"Ontology-based annotations and semantic relations in large-scale (epi)genomics data","volume":"18","author":"Galeota","year":"2017","journal-title":"Brief. Bioinf"},{"key":"2023020206402056200_btx334-B9","doi-asserted-by":"crossref","first-page":"bas033","DOI":"10.1093\/database\/bas033","article-title":"The Units Ontology: a tool for integrating units of measurement in science","volume":"2012","author":"Gkoutos","year":"2012","journal-title":"Database"},{"key":"2023020206402056200_btx334-B10","doi-asserted-by":"crossref","first-page":"4038","DOI":"10.1093\/bioinformatics\/btv503","article-title":"RNASeqMetaDB: a database and web server for navigating metadata of publicly available mouse RNA-Seq datasets","volume":"31","author":"Guo","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020206402056200_btx334-B11","doi-asserted-by":"crossref","first-page":"D456","DOI":"10.1093\/nar\/gks1146","article-title":"The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013","volume":"41","author":"Hastings","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B12","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1038\/227168a0","article-title":"Characteristics of a human diploid cell designated MRC-5","volume":"227","author":"Jacobs","year":"1970","journal-title":"Nature"},{"key":"2023020206402056200_btx334-B13","doi-asserted-by":"crossref","first-page":"D1071","DOI":"10.1093\/nar\/gku1011","article-title":"Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data","volume":"43","author":"Kibbe","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B14","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The sequence read archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B15","doi-asserted-by":"crossref","first-page":"bav010","DOI":"10.1093\/database\/bav010","article-title":"Ontology application and use at the ENCODE DCC","volume":"2015","author":"Malladi","year":"2015","journal-title":"Database"},{"key":"2023020206402056200_btx334-B16","doi-asserted-by":"crossref","first-page":"1112","DOI":"10.1093\/bioinformatics\/btq099","article-title":"Modeling sample variables with an Experimental Factor Ontology","volume":"26","author":"Malone","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020206402056200_btx334-B17","doi-asserted-by":"crossref","first-page":"D1077","DOI":"10.1093\/nar\/gkr913","article-title":"Gene Expression Atlas update\u2014a value-added database of microarray and sequencing-based functional genomics experiments","volume":"40","author":"Misha","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B18","doi-asserted-by":"crossref","first-page":"R5","DOI":"10.1186\/gb-2012-13-1-r5","article-title":"Uberon, an integrative multi-species anatomy ontology","volume":"13","author":"Mungall","year":"2012","journal-title":"Genome Biol"},{"key":"2023020206402056200_btx334-B19","doi-asserted-by":"crossref","first-page":"W170","DOI":"10.1093\/nar\/gkp440","article-title":"BioPortal: ontologies and integrated data resources at the click of a mouse","volume":"37","author":"Noy","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023020206402056200_btx334-B20","doi-asserted-by":"crossref","first-page":"bav089","DOI":"10.1093\/database\/bav089","article-title":"SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data","volume":"2015","author":"Pang","year":"2015","journal-title":"Database"},{"key":"2023020206402056200_btx334-B21","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-10-S2-S1","article-title":"Ontology-driven indexing of public datasets for translational bioinformatics","volume":"10","author":"Shah","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020206402056200_btx334-B22","author":"Tanenblatt","year":"2010"},{"key":"2023020206402056200_btx334-B23","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1186\/1471-2105-14-19","article-title":"SRAdb: query and use public next-generation sequencing data from within R","volume":"14","author":"Yuelin","year":"2013","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/18\/2914\/49040778\/bioinformatics_33_18_2914.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/18\/2914\/49040778\/bioinformatics_33_18_2914.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,23]],"date-time":"2023-08-23T13:31:14Z","timestamp":1692797474000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/18\/2914\/3848915"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,5,23]]},"references-count":23,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2017,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx334","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/090506","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,9,15]]},"published":{"date-parts":[[2017,5,23]]}}}