{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,4]],"date-time":"2025-05-04T22:03:19Z","timestamp":1746396199647},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Serial Analysis of Gene Expressions (SAGE) produces gene expression measurements on a discrete scale, due to the finite number of molecules in the sample. This means that part of the variance in SAGE data should be understood as the sampling error in a binomial or Poisson distribution, whereas other variance sources, in particular biological variance, should be modeled using a continuous distribution function, i.e. a prior on the intensity of the Poisson distribution. One challenge is that such a model predicts a large number of genes with zero counts, which cannot be observed.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We present a hierarchical Poisson model with a gamma prior and three different algorithms for estimating the parameters in the model. It turns out that the rate parameter in the gamma distribution can be estimated on the basis of a single SAGE library, whereas the estimate of the shape parameter becomes unstable. This means that the number of zero counts cannot be estimated reliably. When a bivariate model is applied to two SAGE libraries, however, the number of predicted zero counts becomes more stable and in approximate agreement with the number of transcripts observed across a large number of experiments. 
In all the libraries we analyzed there was a small population of very highly expressed tags, typically 1% of the tags, that could not be accounted for by the model. To handle those tags we chose to augment our model with a non-parametric component. We also show some results based on a log-normal distribution instead of the gamma distribution.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>By modeling SAGE data with a hierarchical Poisson model it is possible to separate the sampling variance from the variance in gene expression. If expression levels are reported at the gene level rather than at the tag level, genes mapped to multiple tags must be kept separate, since their expression levels show a different statistical behavior. A log-normal prior provided a better fit to our data than the gamma prior, but except for a small subpopulation of tags with very high counts, the two priors are similar.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-157","type":"journal-article","created":{"date-parts":[[2006,4,6]],"date-time":"2006-04-06T13:29:24Z","timestamp":1144330164000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Modeling Sage data with a truncated gamma-Poisson model"],"prefix":"10.1186","volume":"7","author":[{"given":"Helene H","family":"Thygesen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aeilko H","family":"Zwinderman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,3,20]]},"reference":[{"key":"896_CR1","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1126\/science.270.5235.484","volume":"270","author":"VE Velculescu","year":"1995","unstructured":"Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of 
gene expression. Science 1995, 270: 484\u2013487.","journal-title":"Science"},{"key":"896_CR2","doi-asserted-by":"publisher","first-page":"285","DOI":"10.1155\/S1110865701000294","volume":"4","author":"VA Kuznetsov","year":"2001","unstructured":"Kuznetsov VA: Distribution associated with stochastic processes of gene expression in a single eukaryotic cell. EURASIP Journal of Applied Signal Processing 2001, 4: 285\u2013296. 10.1155\/S1110865701000294","journal-title":"EURASIP Journal of Applied Signal Processing"},{"key":"896_CR3","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1093\/bioinformatics\/btg018","volume":"19","author":"MD Stern","year":"2003","unstructured":"Stern MD, Anisimov SV, Boheler KR: Can transcriptome size be estimated from sage catalogs? Bioinformatics 2003, 19: 443\u2013448. 10.1093\/bioinformatics\/btg018","journal-title":"Bioinformatics"},{"key":"896_CR4","unstructured":"Blades NJ, Jones JB, Kern SE, Parmigiani G: Denoising of data from serial analysis of gene expressions. Bioinformatics, in press."},{"key":"896_CR5","doi-asserted-by":"crossref","unstructured":"Beissbarth T, Hyde L, Smyth GK, Job C, Boon W-M, Tan S-S, Scott HS, Speed TP: Statistical modelling of sequencing errors in sage libraries. Bioinformatics 2004, (Suppl 1):31\u201339. 10.1093\/bioinformatics\/bth924","DOI":"10.1093\/bioinformatics\/bth924"},{"key":"896_CR6","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1186\/1471-2105-5-152","volume":"5","author":"SV Anisimov","year":"2004","unstructured":"Anisimov SV, Sharov AA: Incidence of \"quasi-ditags\" in catalogs generated by serial analysis of gene expression (sage). BMC Bioinformatics 2004, 5: 152. 
10.1186\/1471-2105-5-152","journal-title":"BMC Bioinformatics"},{"key":"896_CR7","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1093\/genetics\/161.3.1321","volume":"161","author":"VA Kuznetsov","year":"2002","unstructured":"Kuznetsov VA, Knott GD, Bonner RF: General statistics of stochastic process of gene expression in eukaryotic cells. Genetics 2002, 161: 1321\u20131332.","journal-title":"Genetics"},{"key":"896_CR8","volume-title":"SAGE: Current technologies and applications","year":"2005","unstructured":"Wang SM, (Ed): SAGE: Current technologies and applications. Horizon Biosci; 2005."},{"key":"896_CR9","volume-title":"Bayes and Empirical Bayes methods for data analysis","author":"BP Carlin","year":"1996","unstructured":"Carlin BP, Louis TA: Bayes and Empirical Bayes methods for data analysis. Chapman and Hall, London; 1996."},{"key":"896_CR10","first-page":"225","volume-title":"Univariate discrete distributions","author":"NL Johnson","year":"1992","unstructured":"Johnson NL, Kotz S, Kemp AW: Truncated negative binomial distributions. In Univariate discrete distributions. 2nd edition. Chapman and Hall, New York; 1992:225\u2013227.","edition":"2"},{"key":"896_CR11","doi-asserted-by":"publisher","first-page":"637","DOI":"10.2307\/2530255","volume":"35","author":"D Schenzle","year":"1979","unstructured":"Schenzle D: Fitting the truncated negative binomial distribution without the second sample moment. Biometrics 1979, 35: 637\u2013639.","journal-title":"Biometrics"},{"key":"896_CR12","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1186\/gb-2004-5-7-r51","volume":"5","author":"L Cai","year":"2004","unstructured":"Cai L, Huang H, Blackshaw S, Liu JS, Cepko C, Wong WH: Clustering analysis of sage data using a poisson approach. Genome Biology 2004, 5: 51. 
10.1186\/gb-2004-5-7-r51","journal-title":"Genome Biology"},{"key":"896_CR13","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1186\/1471-2105-5-119","volume":"5","author":"RZN Vencio","year":"2004","unstructured":"Vencio RZN, Brentani A, Patrao AFC, Pereia CAB: Bayesian model accounting for within-class biological variability in serial analysis of gene expressions (sage). BMC Bioinformatics 2004, 5: 119. 10.1186\/1471-2105-5-119","journal-title":"BMC Bioinformatics"},{"key":"896_CR14","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1152\/physiolgenomics.00042.2002","volume":"11","author":"JM Ruijter","year":"2002","unstructured":"Ruijter JM, van Kampen AHC, Baas F: Statistical evaluation of sage libraries: consequences for experimental design. Physiol Genomics 2002, 11: 37\u201344.","journal-title":"Physiol Genomics"},{"key":"896_CR15","doi-asserted-by":"publisher","first-page":"1477","DOI":"10.1093\/bioinformatics\/btg173","volume":"19","author":"KA Baggerly","year":"2003","unstructured":"Baggerly KA, Deng L, Morris JS, Marcelo Aldaz C: Differential expression in sage: accounting for normal between-library variation. Bioinformatics 2003, 19: 1477\u20131483. 10.1093\/bioinformatics\/btg173","journal-title":"Bioinformatics"},{"key":"896_CR16","doi-asserted-by":"publisher","first-page":"476","DOI":"10.1111\/1541-0420.00057","volume":"59","author":"JS Morris","year":"2003","unstructured":"Morris JS, Baggerly KA, Coombes KR: Bayesian shrinkage estimation of the relative abundance of m-rna transcripts using sage. Biometrics 2003, 59: 476\u2013486. 
10.1111\/1541-0420.00057","journal-title":"Biometrics"},{"key":"896_CR17","doi-asserted-by":"publisher","first-page":"1289","DOI":"10.1126\/science.1056794","volume":"291","author":"H Caron","year":"2001","unstructured":"Caron H, van Schaik B, van der Mee M, Baas F, Riggins G, van Sluis P, Hermus M-C, van Asperen R, Boon K, Voute KA, Heisterkamp A, van Kampen A, Versteeg R: The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 2001, 291: 1289\u20131292. 10.1126\/science.1056794","journal-title":"Science"},{"key":"896_CR18","unstructured":"The human transcriptome map[http:\/\/bioinfo.amc.uva.nl\/HTMseq\/controller]"},{"key":"896_CR19","doi-asserted-by":"publisher","first-page":"11547","DOI":"10.1073\/pnas.192436299","volume":"99","author":"P Liang","year":"2002","unstructured":"Liang P: Sage genie: a suite with panoramic view of gene expression. PNAS 2002, 99: 11547\u201311548. 10.1073\/pnas.192436299","journal-title":"PNAS"},{"key":"896_CR20","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1007\/978-1-4757-3121-7","volume-title":"Modern applied statistics with S-Plus","author":"WN Venables","year":"1999","unstructured":"Venables WN, Ripley BD: General facilities for minimization. In Modern applied statistics with S-Plus. 3rd edition. Springer, New York; 1999:267\u2013269.","edition":"3"},{"key":"896_CR21","unstructured":"The bugs project[http:\/\/www.mrc-bsu.cam.ac.uk\/bugs\/welcome.shtml]"},{"key":"896_CR22","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/BF00048672","volume":"44","author":"AM Mathai","year":"1992","unstructured":"Mathai AM, Moschopoulos PG: A form of multivariate gamma distribution. Annals of the Institute of Statistical Mathematics 1992, 44: 97\u2013106. 
10.1007\/BF00048672","journal-title":"Annals of the Institute of Statistical Mathematics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-157.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:02:54Z","timestamp":1630494174000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-157"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3,20]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["896"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-157","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,3,20]]},"assertion":[{"value":"1 September 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"157"}}