{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T00:03:03Z","timestamp":1778630583732,"version":"3.51.4"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2005,12,13]],"date-time":"2005-12-13T00:00:00Z","timestamp":1134432000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"},{"start":{"date-parts":[[2005,12,13]],"date-time":"2005-12-13T00:00:00Z","timestamp":1134432000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insights to sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. 
Our analysis of four <jats:italic>Arabidopsis thaliana<\/jats:italic> EST sets suggests that the number of expressed genes present in four different cDNA libraries of <jats:italic>Arabidopsis thaliana<\/jats:italic> varies from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusion<\/jats:title>\n                        <jats:p>The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-6-300","type":"journal-article","created":{"date-parts":[[2005,12,14]],"date-time":"2005-12-14T07:33:00Z","timestamp":1134545580000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries"],"prefix":"10.1186","volume":"6","author":[{"given":"Ji-Ping Z","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bruce G","family":"Lindsay","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Liying","family":"Cui","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P Kerr","family":"Wall","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Josh","family":"Marion","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiaxuan","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Claude 
W","family":"dePamphilis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2005,12,13]]},"reference":[{"key":"624_CR1","doi-asserted-by":"publisher","first-page":"1651","DOI":"10.1126\/science.2047873","volume":"252","author":"MD Adams","year":"1991","unstructured":"Adams MD, Kelley JM, Gocayne JD, Dubnick M, Polymeropoulos MH, Xiao H, Merril CR, Wu A, Olde B, Moreno RF, Kerlavage AR, McCombie WR, Venter JC: Complementary DNA sequencing: expressed sequence tags and human genome project. Science 1991, 252: 1651\u20131656.","journal-title":"Science"},{"key":"624_CR2","first-page":"829","volume":"6","author":"X Huang","year":"1999","unstructured":"Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Research 1999, 6: 829\u2013845.","journal-title":"Genome Research"},{"issue":"4","key":"624_CR3","doi-asserted-by":"publisher","first-page":"332","DOI":"10.1038\/ng0893-332","volume":"4","author":"MS Boguski","year":"1993","unstructured":"Boguski MS, Lowe TM, Tolstoshev CM: dbEST-database for expressed sequence \"tags\". Nature Genetics 1993, 4(4):332\u2013333. 10.1038\/ng0893-332","journal-title":"Nature Genetics"},{"issue":"4","key":"624_CR4","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1038\/ng0895-369","volume":"10","author":"MS Boguski","year":"1995","unstructured":"Boguski MS, Schuler GD: ESTablishing a human transcript map. Nature Genetics 1995, 10(4):369\u201371. 10.1038\/ng0895-369","journal-title":"Nature Genetics"},{"key":"624_CR5","doi-asserted-by":"publisher","first-page":"1135","DOI":"10.1101\/gr.9.11.1135","volume":"9","author":"J Burke","year":"1999","unstructured":"Burke J, Davison D, Hide W: d2_cluster: A validated method for clustering EST and full-length cDNA sequences. Genome Research 1999, 9: 1135\u20131142. 
10.1101\/gr.9.11.1135","journal-title":"Genome Research"},{"key":"624_CR6","doi-asserted-by":"publisher","first-page":"3657","DOI":"10.1093\/nar\/28.18.3657","volume":"28","author":"F Liang","year":"2000","unstructured":"Liang F, Holt I, Pertea G, Karamycheva S, Salzberg SL, Quackenbush J: An optimized protocol for analysis of EST sequences. Nucleic Acids Research 2000, 28: 3657\u20133665. 10.1093\/nar\/28.18.3657","journal-title":"Nucleic Acids Research"},{"key":"624_CR7","doi-asserted-by":"publisher","first-page":"1143","DOI":"10.1101\/gr.9.11.1143","volume":"9","author":"RT Miller","year":"1999","unstructured":"Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA: A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base. Genome Research 1999, 9: 1143\u20131155. 10.1101\/gr.9.11.1143","journal-title":"Genome Research"},{"key":"624_CR8","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1093\/nar\/29.1.234","volume":"29","author":"A Christoffels","year":"2001","unstructured":"Christoffels A, van Gelder A, Greyling G, Miller R, Hide T, Hide W: STACK: Sequence Tag Alignment and Consensus Knowledgebase. Nucleic Acids Research 2001, 29: 234\u20138. 10.1093\/nar\/29.1.234","journal-title":"Nucleic Acids Research"},{"key":"624_CR9","doi-asserted-by":"publisher","first-page":"632","DOI":"10.1038\/355632a0","volume":"355","author":"MD Adams","year":"1992","unstructured":"Adams MD, Dubnick M, Kerlavage AR, Moreno R, Kelley JM, Utterback TR, Nagle JW, Fields C, Venter JC: Sequence identification of 2,375 human brain genes. Nature 1992, 355: 632\u2013634. 
10.1038\/355632a0","journal-title":"Nature"},{"key":"624_CR10","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1038\/ng0793-256","volume":"4","author":"MD Adams","year":"1993","unstructured":"Adams MD, Kerlavage AR, Fields C, Venter JC: 3,400 new expressed sequenced tags identify diversity of transcripts in human brain. Nature Genetics 1993, 4: 256\u2013267. 10.1038\/ng0793-256","journal-title":"Nature Genetics"},{"key":"624_CR11","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1038\/ng1192-180","volume":"2","author":"AS Khan","year":"1992","unstructured":"Khan AS, Wilcox AS, Polymeropoulos MH, Hopkins JA, Stevens TJ, Robinson M, Orpana AK, Sikela JM: Single pass sequencing and physical and genetic mapping of human brain cDNAs. Nature Genetics 1992, 2: 180\u2013185. 10.1038\/ng1192-180","journal-title":"Nature Genetics"},{"key":"624_CR12","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1038\/sj.tpj.6500109","volume":"2","author":"G Hu","year":"2002","unstructured":"Hu G, Modrek B, Riise SH, Saarela J, Pajukanta P, Kustanovich V, Nelson Peltonen, Lee C: Efficient discovery of single-nucleotide polymorphisms in coding regions of human genes. Pharmacogenomics Journal 2002, 2: 236\u2013242. 10.1038\/sj.tpj.6500109","journal-title":"Pharmacogenomics Journal"},{"key":"624_CR13","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1101\/gr.9.2.167","volume":"9","author":"L Picoult-Newberg","year":"1999","unstructured":"Picoult-Newberg L, Ideker T, Pohl M, Taylor S, Donaldson M, Nickerson D, Boyce-Jacino M: Mining SNPs from EST databases. Genome Research 1999, 9: 167\u2013174.","journal-title":"Genome Research"},{"key":"624_CR14","doi-asserted-by":"publisher","first-page":"999","DOI":"10.1093\/bioinformatics\/btg109","volume":"19","author":"C Lee","year":"2003","unstructured":"Lee C: Generating consensus sequences from partial order multiple sequence alignment graphs. Bioinformatics 2003, 19: 999\u20131008. 
10.1093\/bioinformatics\/btg109","journal-title":"Bioinformatics"},{"key":"624_CR15","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1093\/bioinformatics\/18.suppl_1.S181","volume":"18","author":"S Heber","year":"2002","unstructured":"Heber S, Alekseyev M, Sze SH, Tang H, Pevzner PA: Splicing graphs and EST assembly problem. Bioinformatics 2002, 18: 181\u2013188.","journal-title":"Bioinformatics"},{"key":"624_CR16","doi-asserted-by":"publisher","first-page":"3754","DOI":"10.1093\/nar\/gkf492","volume":"30","author":"Q Xu","year":"2002","unstructured":"Xu Q, Modrek B, Lee C: Genome-wide detection of tissue-specific alternative splicing in the human transcriptome. Nucleic Acids Research 2002, 30: 3754\u20133766. 10.1093\/nar\/gkf492","journal-title":"Nucleic Acids Research"},{"key":"624_CR17","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1038\/ng0102-13","volume":"30","author":"B Modrek","year":"2002","unstructured":"Modrek B, Lee C: A genomic view of alternative splicing. Nature Genetics 2002, 30: 13\u201319. 10.1038\/ng0102-13","journal-title":"Nature Genetics"},{"key":"624_CR18","doi-asserted-by":"publisher","first-page":"2850","DOI":"10.1093\/nar\/29.13.2850","volume":"29","author":"B Modrek","year":"2001","unstructured":"Modrek B, Resch A, Grasso C, Lee C: Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Research 2001, 29: 2850\u20132859. 10.1093\/nar\/29.13.2850","journal-title":"Nucleic Acids Research"},{"key":"624_CR19","first-page":"1821","volume":"8","author":"S Audic","year":"1997","unstructured":"Audic S, Claverie JM: Computational methods for the identification of differential and coordinated gene expression. 
Human Molecular Genetics 1997, 8: 1821\u20131832.","journal-title":"Human Molecular Genetics"},{"key":"624_CR20","doi-asserted-by":"publisher","first-page":"2055","DOI":"10.1101\/gr.GR-1325RR","volume":"10","author":"DJ Stekel","year":"2000","unstructured":"Stekel DJ, Git Y, Falciani F: The comparison of gene expression from multiple cDNA libraries. Genome Research 2000, 10: 2055\u20132061. 10.1101\/gr.GR-1325RR","journal-title":"Genome Research"},{"key":"624_CR21","doi-asserted-by":"publisher","first-page":"2279","DOI":"10.1093\/bioinformatics\/bth239","volume":"20","author":"E Susko","year":"2004","unstructured":"Susko E, Roger A: Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys. Bioinformatics 2004, 20: 2279\u20132287. 10.1093\/bioinformatics\/bth239","journal-title":"Bioinformatics"},{"key":"624_CR22","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1038\/ng0794-345","volume":"7","author":"C Fields","year":"1994","unstructured":"Fields C, Adams MD, White O, Venter JC: How many genes in the human genome? Nature Genetics 1994, 7: 345\u2013346. 10.1038\/ng0794-345","journal-title":"Nature Genetics"},{"key":"624_CR23","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1038\/76115","volume":"25","author":"B Ewing","year":"2000","unstructured":"Ewing B, Green P: Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genetics 2000, 25: 232\u2013233. 10.1038\/76115","journal-title":"Nature Genetics"},{"key":"624_CR24","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1038\/76126","volume":"25","author":"F Liang","year":"2000","unstructured":"Liang F, Holt I, Pertea G, Karamycheva S, Salzberg S, Quackenbush J: Gene Index analysis of the human genome estimates approximately 120,000 genes. Nature Genetics 2000, 25: 239\u2013240. 
10.1038\/76126","journal-title":"Nature Genetics"},{"key":"624_CR25","doi-asserted-by":"publisher","first-page":"1441","DOI":"10.1105\/tpc.010478","volume":"14","author":"R Van der Hoeven","year":"2002","unstructured":"Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S: Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. The Plant Cell 2002, 14: 1441\u20131456. 10.1105\/tpc.010478","journal-title":"The Plant Cell"},{"key":"624_CR26","doi-asserted-by":"publisher","first-page":"796","DOI":"10.1038\/35048692","volume":"408","author":"The Arabidopsis Genome Initiative","year":"2000","unstructured":"The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana . Nature 2000, 408: 796\u2013815. 10.1038\/35048692","journal-title":"Nature"},{"key":"624_CR27","doi-asserted-by":"publisher","first-page":"2973","DOI":"10.1093\/bioinformatics\/bth342","volume":"20","author":"JPZ Wang","year":"2004","unstructured":"Wang JPZ, Lindsay BG, LeebensMack J, Cui L, Wall PK, Webb CM, dePamphilis CW: EST clustering error evaluation and correction. Bioinformatics 2004, 20: 2973\u20132984. 10.1093\/bioinformatics\/bth342","journal-title":"Bioinformatics"},{"key":"624_CR28","doi-asserted-by":"publisher","first-page":"42","DOI":"10.2307\/1411","volume":"12","author":"RA Fisher","year":"1943","unstructured":"Fisher RA, Corbet AS, Williams CB: The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 1943, 12: 42\u201358.","journal-title":"Journal of Animal Ecology"},{"key":"624_CR29","first-page":"435","volume":"63","author":"B Efron","year":"1976","unstructured":"Efron B, Thisted R: Estimating the number of unseen species: How many words did Shakespeare know? 
Biometrika 1976, 63: 435\u2013447.","journal-title":"Biometrika"},{"key":"624_CR30","doi-asserted-by":"publisher","first-page":"942","DOI":"10.1198\/016214504000002005","volume":"100","author":"JPZ Wang","year":"2005","unstructured":"Wang JPZ, Lindsay BG: A penalized nonparametric maximum likelihood approach to species richness estimation. Journal of American Statistical Association 2005, 100: 942\u2013959. 10.1198\/016214504000002005","journal-title":"Journal of American Statistical Association"},{"key":"624_CR31","volume-title":"An Introduction to Probability Theory and Its Applications","author":"W Feller","year":"1968","unstructured":"Feller W: An Introduction to Probability Theory and Its Applications. Volume I. Wiley & Sons, inc; 1968."},{"key":"624_CR32","volume-title":"An Introduction to Probability Theory and Its Applications","author":"W Feller","year":"1971","unstructured":"Feller W: An Introduction to Probability Theory and Its Applications. Volume II. Wiley & Sons, inc; 1971."},{"key":"624_CR33","doi-asserted-by":"publisher","first-page":"758","DOI":"10.1080\/01621459.1987.10478496","volume":"82","author":"BG Lindsay","year":"1987","unstructured":"Lindsay BG, Roeder K: A unified treatment of integer parameter models(in Theory and Methods). Journal of the American Statistical Association 1987, 82: 758\u2013764.","journal-title":"Journal of the American Statistical Association"},{"key":"624_CR34","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1093\/biomet\/43.1-2.45","volume":"43","author":"IJ Good","year":"1956","unstructured":"Good IJ, Toulmin GH: The Number of New Species and the Increase in Population Coverage, When a Sample is Increased. 
Biometrika 1956, 43: 45\u201363.","journal-title":"Biometrika"},{"key":"624_CR35","unstructured":"Egene[http:\/\/www.mathstat.dal.ca\/tsusko]"},{"key":"624_CR36","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1093\/dnares\/7.3.175","volume":"7","author":"E Asamizu","year":"2000","unstructured":"Asamizu E, Nakamura Y, Sato S, Tabata S: A large scale analysis of cDNA in Arabidopsis thaliana: generation of 12,028 non-redundant expressed sequence tags from normalized and size-selected cDNA libraries. DNA Research 2000, 7: 175\u2013180. 10.1093\/dnares\/7.3.175","journal-title":"DNA Research"},{"key":"624_CR37","doi-asserted-by":"publisher","first-page":"887","DOI":"10.1214\/aoms\/1177728066","volume":"27","author":"J Kiefer","year":"1956","unstructured":"Kiefer J, Wolfowitz J: Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters. The Annals of Mathematical Statistics 1956, 27: 887\u2013906.","journal-title":"The Annals of Mathematical Statistics"},{"key":"624_CR38","doi-asserted-by":"publisher","first-page":"139","DOI":"10.2307\/3314608","volume":"9","author":"B Efron","year":"1981","unstructured":"Efron B: Nonparametric standard errors and confidence intervals. Canadian Journal of Statistics 1981, 9: 139\u2013172.","journal-title":"Canadian Journal of Statistics"},{"key":"624_CR39","first-page":"227","volume":"10","author":"A Chao","year":"2000","unstructured":"Chao A, Huang WH, Chen YC, Kuo CY: Estimating the number of shared species in two communities. 
Statistica Sinica 2000, 10: 227\u2013246.","journal-title":"Statistica Sinica"},{"key":"624_CR40","unstructured":"ESTstat[http:\/\/www.floralgenome.org\/ESTstat]"},{"key":"624_CR41","unstructured":"Supplementary materials[http:\/\/bioinfo.stats.northwestern.edu\/jzwang]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-300.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-6-300\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-300.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:14:52Z","timestamp":1728303292000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-300"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,12,13]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2005,12]]}},"alternative-id":["624"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-300","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,12,13]]},"assertion":[{"value":"3 December 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2005","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 December 2005","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"300"}}