{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T13:10:04Z","timestamp":1740229804091,"version":"3.37.3"},"reference-count":33,"publisher":"IGI Global","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,4,1]]},"abstract":"<p>Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. These data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques to deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source and other potentially incomplete sources provided in the form of constraints. 
This paper presents a new clustering algorithm, MSC, to perform exploratory analysis using two or more diverse but complete data sources, studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.<\/p>","DOI":"10.4018\/jkdb.2010040102","type":"journal-article","created":{"date-parts":[[2010,7,1]],"date-time":"2010-07-01T01:19:04Z","timestamp":1277947144000},"page":"12-28","source":"Crossref","is-referenced-by-count":4,"title":["Clustering Genes Using Heterogeneous Data Sources"],"prefix":"10.4018","volume":"1","author":[{"given":"Erliang","family":"Zeng","sequence":"first","affiliation":[{"name":"University of Notre Dame, USA"}]},{"given":"Chengyong","family":"Yang","sequence":"additional","affiliation":[{"name":"Life Technologies Inc., USA"}]},{"given":"Tao","family":"Li","sequence":"additional","affiliation":[{"name":"Florida International University, USA"}]},{"given":"Giri","family":"Narasimhan","sequence":"additional","affiliation":[{"name":"Florida International University, USA"}]}],"member":"2432","reference":[{"key":"jkdb.2010040102-0","unstructured":"Basu, S., Banerjee, A., et al. (2002). Semi-supervised clustering by seeding. Paper presented at the International Conference on Machine Learning."},{"key":"jkdb.2010040102-1","doi-asserted-by":"publisher","DOI":"10.1088\/0954-898X\/7\/1\/003"},{"key":"jkdb.2010040102-2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btg363"},{"key":"jkdb.2010040102-3","doi-asserted-by":"crossref","unstructured":"Bickel, S., & Scheffer, T. (2004). Multi-View Clustering. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM'04).","DOI":"10.1109\/ICDM.2004.10095"},{"key":"jkdb.2010040102-4","doi-asserted-by":"crossref","unstructured":"Bilenko, M., Basu, S., et al. (2004). 
Integrating constraints and metric learning in semi-supervised clustering. In Proceedings of the International Conference on Machine Learning (ICML '04).","DOI":"10.1145\/1015330.1015360"},{"key":"jkdb.2010040102-5","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btg160"},{"key":"jkdb.2010040102-6","doi-asserted-by":"publisher","DOI":"10.1007\/11871637_15"},{"key":"jkdb.2010040102-7","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-497"},{"key":"jkdb.2010040102-8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm.","volume":"39","author":"A. P.Dempster","year":"1977","journal-title":"Journal of the Royal Statistical Society. Series B (Methodological)"},{"key":"jkdb.2010040102-9","doi-asserted-by":"publisher","DOI":"10.1101\/gr.397002"},{"key":"jkdb.2010040102-10","first-page":"391","article-title":"Evaluation of the vector space representation in text-based gene clustering.","author":"P.Glenisson","year":"2003","journal-title":"Pacific Symposium on Biocomputing"},{"key":"jkdb.2010040102-11","doi-asserted-by":"publisher","DOI":"10.1145\/980972.980985"},{"key":"jkdb.2010040102-12","doi-asserted-by":"publisher","DOI":"10.1126\/science.292.5518.929"},{"journal-title":"Algorithms for clustering data","year":"1988","author":"A. K.Jain","key":"jkdb.2010040102-13"},{"key":"jkdb.2010040102-14","doi-asserted-by":"crossref","unstructured":"Kasturi, J., & Acharya, R. (2004). Clustering of diverse genomic data using information fusion. In Proceedings of the 2004 ACM symposium on Applied computing, Nicosia, Cyprus (pp. 116-120). New York: ACM Press.","DOI":"10.1145\/967900.967926"},{"key":"jkdb.2010040102-15","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2007.11.001"},{"key":"jkdb.2010040102-16","unstructured":"Klein, D., Kamvar, S. D., et al. (2002). 
From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Paper presented at the International Conference on Machine Learning (ICML'02)."},{"key":"jkdb.2010040102-17","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkg108"},{"issue":"5","key":"jkdb.2010040102-18","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1002\/(SICI)1097-4571(198609)37:5<279::AID-ASI1>3.0.CO;2-Q","article-title":"A critical analysis of vector space model for information retrieval.","volume":"37","author":"V. V.Raghavan","year":"1986","journal-title":"Journal of the American Society for Information Science"},{"key":"jkdb.2010040102-19","doi-asserted-by":"publisher","DOI":"10.2307\/2284239"},{"key":"jkdb.2010040102-20","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.98.2.381"},{"key":"jkdb.2010040102-21","doi-asserted-by":"publisher","DOI":"10.1038\/nbt1098-939"},{"key":"jkdb.2010040102-22","doi-asserted-by":"crossref","first-page":"371a","DOI":"10.1091\/mbc.9.12.3273","article-title":"Identification of cell cycle regulated genes in yeast by DNA microarray hybridization.","volume":"9","author":"P. T.Spellman","year":"1998","journal-title":"Molecular Biology of the Cell"},{"key":"jkdb.2010040102-23","doi-asserted-by":"crossref","unstructured":"Stephens, M., Palakal, M., et al. (2001). Detecting gene relations from MEDLINE abstracts. In Proceedings of the sixth Ann Pac Symp Biocomp (PSB 2001).","DOI":"10.1142\/9789814447362_0047"},{"issue":"5","key":"jkdb.2010040102-24","first-page":"2050","article-title":"The two positively acting regulatory proteins PHO2 and PHO4 physically interact with PHO5 upstream activation regions.","volume":"9","author":"K.Vogel","year":"1989","journal-title":"Molecular and Cellular Biology"},{"key":"jkdb.2010040102-25","unstructured":"Wagstaff, K., Basu, S., et al. (2006). When is constrained clustering beneficial, and why? 
In Proceedings of the AAAI."},{"key":"jkdb.2010040102-26","unstructured":"Wagstaff, K., Cardie, C., et al. (2001). Constrained k-means clustering with background knowledge. Paper presented at the 18th International Conference on Machine Learning (ICML-01)."},{"key":"jkdb.2010040102-27","doi-asserted-by":"publisher","DOI":"10.1109\/6046.807953"},{"key":"jkdb.2010040102-28","first-page":"505","article-title":"Distance metric learning, with application to clustering with side-information.","volume":"15","author":"E. P.Xing","year":"2002","journal-title":"Proceedings of Advances in Neural Information Processing Systems"},{"issue":"8","key":"jkdb.2010040102-29","doi-asserted-by":"crossref","first-page":"601","DOI":"10.1038\/nrg861","article-title":"Genomics and natural language processing.","volume":"3","author":"M. D.Yandell","year":"2002","journal-title":"Nature Reviews. Genetics"},{"key":"jkdb.2010040102-30","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2003-4-5-r34"},{"key":"jkdb.2010040102-31","first-page":"113","article-title":"Gene clustering and gene function prediction using multiple sources of data.","volume":"2006","author":"H.Zare","year":"2006","journal-title":"Proceedings of the IEEE Genomic Signal Processing and Statistics"},{"key":"jkdb.2010040102-32","unstructured":"Zhong, S., & Ghosh, J. (2003). A comparative study of generative models for document clustering. 
In Proceedings of the workshop on Clustering High Dimensional Data and Its Applications in SIAM Data Mining Conference."}],"container-title":["International Journal of Knowledge Discovery in Bioinformatics"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=45163","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T12:33:50Z","timestamp":1740227630000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jkdb.2010040102"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2010,4,1]]},"references-count":33,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2010,4]]}},"URL":"https:\/\/doi.org\/10.4018\/jkdb.2010040102","relation":{},"ISSN":["1947-9115","1947-9123"],"issn-type":[{"type":"print","value":"1947-9115"},{"type":"electronic","value":"1947-9123"}],"subject":[],"published":{"date-parts":[[2010,4,1]]}}}