{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,13]],"date-time":"2023-09-13T21:41:01Z","timestamp":1694641261173},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Ensemble attribute profile clustering is a novel, text-based strategy for analyzing a user-defined list of genes and\/or proteins. The strategy exploits annotation data present in gene-centered corpora and utilizes ideas from statistical information retrieval to discover and characterize properties shared by subsets of the list. The practical utility of this method is demonstrated by employing it in a retrospective study of two non-overlapping sets of genes defined by a published investigation as markers for normal human breast luminal epithelial cells and myoepithelial cells.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Each genetic locus was characterized using a finite set of biological properties and represented as a vector of features indicating attributes associated with the locus (a gene attribute profile). In this study, the vector space models for a pre-defined list of genes were constructed from the Gene Ontology (GO) terms and the Conserved Domain Database (CDD) protein domain terms assigned to the loci by the gene-centered corpus LocusLink. This data set of GO- and CDD-based gene attribute profiles, vectors of binary random variables, was used to estimate multiple finite mixture models and each ensuing model utilized to partition the profiles into clusters. 
The resultant partitionings were combined using a unanimous voting scheme to produce consensus clusters, sets of profiles that co-occurred consistently in the same cluster. Attributes that were important in defining the genes assigned to a consensus cluster were identified. The clusters and their attributes were inspected to ascertain the GO and CDD terms most associated with subsets of genes and, in conjunction with external knowledge such as chromosomal location, used to gain functional insights into human breast biology. The 52 luminal epithelial cell markers and 89 myoepithelial cell markers are disjoint sets of genes. Ensemble attribute profile clustering-based analysis indicated that both lists contained groups of genes with the functional properties of membrane receptor biology\/signal transduction and nucleic acid binding\/transcription. A subset of the luminal markers was associated with metabolic and oxidoreductase activities, whereas a subset of myoepithelial markers was associated with protein hydrolase activity.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Given a set of genes and\/or proteins associated with a phenomenon, process or system of interest, ensemble attribute profile clustering provides a simple method for collating and synthesizing the annotation data pertaining to them that are present in text-based, gene-centered corpora. 
The results provide information about properties common and unique to subsets of the list and hence insights into the biology of the problem under investigation.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-147","type":"journal-article","created":{"date-parts":[[2006,4,6]],"date-time":"2006-04-06T13:28:13Z","timestamp":1144330093000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Ensemble attribute profile clustering: discovering and characterizing groups of genes with similar patterns of biological features"],"prefix":"10.1186","volume":"7","author":[{"given":"JR","family":"Semeiks","sequence":"first","affiliation":[]},{"given":"A","family":"Rizki","sequence":"additional","affiliation":[]},{"given":"MJ","family":"Bissell","sequence":"additional","affiliation":[]},{"given":"IS","family":"Mian","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,3,16]]},"reference":[{"key":"886_CR1","unstructured":"PubMed[http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?db=PubMed]"},{"key":"886_CR2","unstructured":"LocusLink[http:\/\/www.ncbi.nlm.nih.gov\/LocusLink]"},{"key":"886_CR3","unstructured":"Entrez Gene[http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?db=gene]"},{"key":"886_CR4","unstructured":"SGD[http:\/\/www.yeastgenome.org]"},{"key":"886_CR5","unstructured":"Wormbase[http:\/\/www.wormbase.org]"},{"key":"886_CR6","unstructured":"Flybase[http:\/\/www.flybase.org]"},{"key":"886_CR7","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1093\/bioinformatics\/16.2.125","volume":"16","author":"R MacCallum","year":"2000","unstructured":"MacCallum R, Kelley R, Steinberg M: SAWTED: Structure Assignment With Text Description \u2013 Enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics 2000, 16: 125\u2013129. 
10.1093\/bioinformatics\/16.2.125","journal-title":"Bioinformatics"},{"key":"886_CR8","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1038\/88213","volume":"28","author":"T Jenssen","year":"2001","unstructured":"Jenssen T, Laegreid A, Komorowski J, Hovig E: A literature network of human genes for high-throughput analysis of gene expression. Nature Genetics 2001, 28: 21\u201328. 10.1038\/88213","journal-title":"Nature Genetics"},{"key":"886_CR9","doi-asserted-by":"publisher","first-page":"4553","DOI":"10.1093\/nar\/gkg636","volume":"31","author":"S Raychaudhuri","year":"2003","unstructured":"Raychaudhuri S, Chang J, Imam F, Altman R: The computational analysis of scientific literature to define and recognize gene expression clusters. Nucleic Acids Research 2003, 31: 4553\u20134560. 10.1093\/nar\/gkg636","journal-title":"Nucleic Acids Research"},{"key":"886_CR10","doi-asserted-by":"publisher","first-page":"e134","DOI":"10.1371\/journal.pbio.0030134","volume":"3","author":"J Korbel","year":"2005","unstructured":"Korbel J, Doerks T, Jensen L, Perez-Iratxeta C, Kaczanowski S, Hooper S, Andrade M, Bork P: Systematic association of genes to phenotypes by genome and literature mining. PLoS Biology 2005, 3: e134. 10.1371\/journal.pbio.0030134","journal-title":"PLoS Biology"},{"key":"886_CR11","volume-title":"BMC Bioinformatics","author":"D Blei","year":"2006","unstructured":"Blei D, Franks K, Jordan M, Mian I: Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to aging. BMC Bioinformatics 2006, in press."},{"key":"886_CR12","doi-asserted-by":"publisher","first-page":"1582","DOI":"10.1101\/gr.116402","volume":"12","author":"S Raychaudhuri","year":"2002","unstructured":"Raychaudhuri S, Sch\u00fctze H, Altman R: Using text analysis to identify functionally coherent gene groups. Genome Research 2002, 12: 1582\u20131590. 
10.1101\/gr.116402","journal-title":"Genome Research"},{"key":"886_CR13","doi-asserted-by":"publisher","first-page":"D258","DOI":"10.1093\/nar\/gkh066","volume":"32","author":"M Harris","year":"2004","unstructured":"Harris M, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin G, Blake J, Bult C, Dolan M, Drabkin H, Eppig J, Hill D, Ni L, Ringwald M, Balakrishnan R, Cherry J, Christie K, Costanzo M, Dwight S, Engel S, Fisk D, Hirschman J, Hong E, Nash R, Sethuraman A, Theesfeld C, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee S, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz E, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research 2004, 32: D258-D261. 10.1093\/nar\/gkh066","journal-title":"Nucleic Acids Research"},{"key":"886_CR14","volume-title":"Genome Biology","author":"B Zeeberg","year":"2003","unstructured":"Zeeberg B, Feng W, Wang G, Wang M, Fojo A, Sunshine M, Narasimhan S, Kane D, Reinhold W, Lababidi S, Bussey K, Riss J, Barrett J, Weinstein J: GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biology 2003., 4:"},{"key":"886_CR15","unstructured":"GOTermFinder[http:\/\/www.yeastgenome.org]"},{"key":"886_CR16","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1186\/1471-2105-3-35","volume":"3","author":"M Robinson","year":"2002","unstructured":"Robinson M, Grigull J, Mohammad N, Hughes T: FunSpec: a web-based cluster interpreter for yeast. BMC Bioinformatics 2002, 3: 35\u201340. 
10.1186\/1471-2105-3-35","journal-title":"BMC Bioinformatics"},{"key":"886_CR17","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.1093\/bioinformatics\/bth088","volume":"20","author":"T Bei\u00dfbarth","year":"2004","unstructured":"Bei\u00dfbarth T, Speed T: GOstat: Find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics 2004, 20: 1464\u20131465. 10.1093\/bioinformatics\/bth088","journal-title":"Bioinformatics"},{"key":"886_CR18","doi-asserted-by":"publisher","first-page":"2502","DOI":"10.1093\/bioinformatics\/btg363","volume":"19","author":"G Berriz","year":"2003","unstructured":"Berriz G, King O, Bryant B, Sander C, Roth F: Characterizing gene sets with FuncAssociate. Bioinformatics 2003, 19: 2502\u20132504. 10.1093\/bioinformatics\/btg363","journal-title":"Bioinformatics"},{"key":"886_CR19","volume-title":"Foundations of Statistical Natural Language Processing","author":"C Manning","year":"1999","unstructured":"Manning C, Sch\u00fctze H: Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press; 1999."},{"key":"886_CR20","volume-title":"Automatic Text Processing: The Transformation Analysis and Retrieval of Information by Computer","author":"G Salton","year":"1988","unstructured":"Salton G: Automatic Text Processing: The Transformation Analysis and Retrieval of Information by Computer. Addison-Wesley; 1988."},{"key":"886_CR21","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1152\/physiolgenomics.2000.4.2.109","volume":"4","author":"E Moler","year":"2000","unstructured":"Moler E, Chow M, Mian I: Analysis of molecular profile data using generative and discriminative methods. 
Physiological Genomics 2000, 4: 109\u2013126.","journal-title":"Physiological Genomics"},{"key":"886_CR22","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1152\/physiolgenomics.2000.4.2.127","volume":"4","author":"E Moler","year":"2000","unstructured":"Moler E, Radisky D, Mian I: Integrating na\u00efve Bayes models and external knowledge to examine copper and iron homeostasis in Saccharomyces cerevisiae. Physiological Genomics 2000, 4: 127\u2013135.","journal-title":"Physiological Genomics"},{"key":"886_CR23","doi-asserted-by":"publisher","first-page":"13790","DOI":"10.1073\/pnas.191502998","volume":"98","author":"A Bhattacharjee","year":"2001","unstructured":"Bhattacharjee A, Richards W, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Weber G, Mark E, Lander E, Wong W, Johnson B, Golub T, Sugarbaker D, Meyerson M: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci 2001, 98: 13790\u201313795. 10.1073\/pnas.191502998","journal-title":"Proc Natl Acad Sci"},{"key":"886_CR24","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/A:1023949509487","volume":"52","author":"S Monti","year":"2003","unstructured":"Monti S, Tamayo P, Mesirov J, Golub T: Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning 2003, 52: 91\u2013118. 10.1023\/A:1023949509487","journal-title":"Machine Learning"},{"key":"886_CR25","first-page":"56","volume-title":"Proceedings of the IDAMAP2001 Workshop 2001","author":"P Kellam","year":"2001","unstructured":"Kellam P, Liu X, Martin N, Orengo C, Swift S, Tucker A: Comparing, contrasting and combining clusters in viral gene expression data. 
Proceedings of the IDAMAP2001 Workshop 2001 2001, 56\u201362."},{"key":"886_CR26","doi-asserted-by":"publisher","first-page":"3037","DOI":"10.1158\/0008-5472.CAN-03-2028","volume":"64","author":"C Jones","year":"2004","unstructured":"Jones C, Mackay A, Grigoriadis A, Cossu A, Reis-Filho J, Fulford L, Dexter T, Davies S, Bulmer K, Ford E, Parry S, Budroni M, Palmieri G, Neville A, O'Hare M, Lakhani S: Expression profiling of purified normal human luminal and myoepithelial breast cells: identification of novel prognostic markers for breast cancer. Cancer Research 2004, 64: 3037\u20133045. 10.1158\/0008-5472.CAN-03-2028","journal-title":"Cancer Research"},{"key":"886_CR27","volume-title":"Advances in Neural Information Processing Systems","author":"T Hofmann","year":"1999","unstructured":"Hofmann T, Puzicha J, Jordan M: Learning from dyadic data. In Advances in Neural Information Processing Systems. Volume 11. MIT Press, Cambridge MA; 1999."},{"key":"886_CR28","volume-title":"Bayesian Analysis","author":"P Hoff","year":"2005","unstructured":"Hoff P: Model-based subspace clustering. Bayesian Analysis 2005, in press."},{"key":"886_CR29","volume-title":"Biometrics","author":"P Hoff","year":"2005","unstructured":"Hoff P: Subset clustering of binary sequences, with an application to genomic abnormality data. Biometrics 2005, in press."},{"key":"886_CR30","doi-asserted-by":"publisher","first-page":"R43","DOI":"10.1186\/gb-2004-5-6-r43","volume":"5","author":"P Glenisson","year":"2004","unstructured":"Glenisson P, Coessens B, van Vooren S, Mathys J, Moreau Y, de Moor B: TXTGate: profiling gene groups with text-based information. Genome Biology 2004, 5: R43. 10.1186\/gb-2004-5-6-r43","journal-title":"Genome Biology"},{"key":"886_CR31","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1038\/nrg1319","volume":"5","author":"L Hurst","year":"2004","unstructured":"Hurst L, P\u00e1l C, Lercher M: The evolutionary dynamics of eukaryotic gene order. 
Nature Reviews Genetics 2004, 5: 299\u2013310. 10.1038\/nrg1319","journal-title":"Nature Reviews Genetics"},{"key":"886_CR32","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.ccr.2004.06.010","volume":"6","author":"M Allinen","year":"2004","unstructured":"Allinen M, Beroukhim R, Cai L, Brennan C, Lahti-Domenici J, Huang H, Porter D, Hu M, Chin L, Richardson A, Schnitt S, Sellers W, Polyak K: Molecular characterization of the tumor microenvironment in breast cancer. Cancer Cell 2004, 6: 17\u201332. 10.1016\/j.ccr.2004.06.010","journal-title":"Cancer Cell"},{"key":"886_CR33","doi-asserted-by":"publisher","first-page":"5988","DOI":"10.1158\/1078-0432.CCR-03-0731","volume":"10","author":"C Jones","year":"2004","unstructured":"Jones C, Ford E, Gillett C, Ryder K, Merrett S, Reis-Filho J, Fulford L, Hanby A, Lakhani S: Molecular cytogenetic identification of subgroups of grade III invasive ductal breast carcinomas with different clinical outcomes. Clinical Cancer Research 2004, 10: 5988\u20135997. 10.1158\/1078-0432.CCR-03-0731","journal-title":"Clinical Cancer Research"},{"key":"886_CR34","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1093\/nar\/29.1.137","volume":"29","author":"K Pruitt","year":"2001","unstructured":"Pruitt K, Maglott D: RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Research 2001, 29: 137\u2013140. 
10.1093\/nar\/29.1.137","journal-title":"Nucleic Acids Research"},{"key":"886_CR35","unstructured":"RefSeq[http:\/\/www.ncbi.nlm.nih.gov\/RefSeq]"},{"key":"886_CR36","unstructured":"Bioperl[http:\/\/www.bioperl.org]"},{"key":"886_CR37","unstructured":"GO[http:\/\/www.geneontology.org\/]"},{"key":"886_CR38","unstructured":"CDD[http:\/\/www.ncbi.nlm.nih.gov\/Structure\/cdd\/cdd.shtml]"},{"key":"886_CR39","first-page":"153","volume-title":"Advances in Knowledge Discovery and Data Mining","author":"P Cheeseman","year":"1996","unstructured":"Cheeseman P, Stutz J: Bayesian Classification (AutoClass): Theory and Results. In Advances in Knowledge Discovery and Data Mining. Edited by: Fayyad U, Piatetsky-Shapiro G, Smyth P, Uthurusamy R. AAAI Press\/MIT Press; 1996:153\u2013180."},{"key":"886_CR40","doi-asserted-by":"publisher","DOI":"10.1002\/0471660264","volume-title":"Combining Pattern Classifiers: Methods and Algorithms","author":"L Kuncheva","year":"2004","unstructured":"Kuncheva L: Combining Pattern Classifiers: Methods and Algorithms. London: John Wiley & Sons; 2004."},{"key":"886_CR41","first-page":"309","volume-title":"Multiple Classifier Systems","author":"A Fred","year":"2002","unstructured":"Fred A: Finding consistent clusters in data partitions. In Multiple Classifier Systems. Volume LNCS 2364. Springer; 2002:309\u2013318."},{"key":"886_CR42","doi-asserted-by":"publisher","first-page":"583","DOI":"10.1162\/153244303321897735","volume":"3","author":"A Strehl","year":"2002","unstructured":"Strehl A, Ghosh J: Cluster Ensembles \u2013 A Knowledge Reuse Framework for Combining Multiple Partitions. Journal of Machine Learning Research 2002, 3: 583\u2013617. 10.1162\/153244303321897735","journal-title":"Journal of Machine Learning Research"},{"key":"886_CR43","volume-title":"Proceedings SIAM Conf on Data Mining","author":"A Topchy","year":"2004","unstructured":"Topchy A, Jain A, Punch W: A mixture model of clustering ensembles. 
Proceedings SIAM Conf on Data Mining 2004."},{"key":"886_CR44","unstructured":"C++ Boost Graph library[http:\/\/www.boost.org\/libs\/graph\/doc\/index.html]"},{"key":"886_CR45","unstructured":"KEGG[http:\/\/www.genome.jp\/kegg]"},{"key":"886_CR46","unstructured":"UCSC Genome Browser[http:\/\/genome.ucsc.edu]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-147.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:24:45Z","timestamp":1630466685000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-147"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,3,16]]},"references-count":46,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["886"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-147","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,3,16]]},"assertion":[{"value":"28 June 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 March 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 March 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"147"}}