{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,9]],"date-time":"2026-03-09T19:57:14Z","timestamp":1773086234383,"version":"3.50.1"},"reference-count":13,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":2769,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: As the number of publicly available microarray experiments increases, the ability to analyze extremely large datasets across multiple experiments becomes critical. There is a requirement to develop algorithms which are fast and can cluster extremely large datasets without affecting the cluster quality. Clustering is an unsupervised exploratory technique applied to microarray data to find similar data structures or expression patterns. Because of the high input\/output costs involved and large distance matrices calculated, most of the agglomerative clustering algorithms fail on large datasets (30 000 + genes\/200 + arrays). In this article, we propose a new two-stage algorithm which partitions the high-dimensional space associated with microarray data using hyperplanes. The first stage is based on the Balanced Iterative Reducing and Clustering using Hierarchies algorithm with the second stage being a conventional k-means clustering technique. This algorithm has been implemented in a software tool (HPCluster) designed to cluster gene expression data. We compared the clustering results using the two-stage hyperplane algorithm with the conventional k-means algorithm from other available programs. 
Because the first stage traverses the data in a single scan, the performance and speed increase substantially. The data reduction accomplished in the first stage of the algorithm reduces the memory requirements, allowing us to cluster 44 460 genes without failure, and significantly decreases the time to complete when compared with popular k-means programs. The software was written in C# (.NET 1.1).<\/jats:p>\n               <jats:p>Availability: The program is freely available and can be downloaded from http:\/\/www.amdcc.org\/bioinformatics\/bioinformatics.aspx.<\/jats:p>\n               <jats:p>Contact: \u00a0rmcindoe@mail.mcg.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp123","type":"journal-article","created":{"date-parts":[[2009,3,5]],"date-time":"2009-03-05T01:56:49Z","timestamp":1236218209000},"page":"1152-1157","source":"Crossref","is-referenced-by-count":17,"title":["A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets"],"prefix":"10.1093","volume":"25","author":[{"given":"Ashok","family":"Sharma","sequence":"first","affiliation":[{"name":"1 Center for Biotechnology and Genomic Medicine, 2Department of Medicine and 3Department of Pathology, Medical College of Georgia, Augusta, GA, USA"}]},{"given":"Robert","family":"Podolsky","sequence":"additional","affiliation":[{"name":"1 Center for Biotechnology and Genomic Medicine, 2Department of Medicine and 3Department of Pathology, Medical College of Georgia, Augusta, GA, USA"},{"name":"1 Center for Biotechnology and Genomic Medicine, 2Department of Medicine and 3Department of Pathology, Medical College of Georgia, Augusta, GA, USA"}]},{"given":"Jieping","family":"Zhao","sequence":"additional","affiliation":[{"name":"1 Center for Biotechnology and Genomic Medicine, 2Department of Medicine and 3Department of Pathology, 
Medical College of Georgia, Augusta, GA, USA"}]},{"given":"Richard A.","family":"McIndoe","sequence":"additional","affiliation":[{"name":"1 Center for Biotechnology and Genomic Medicine, 2Department of Medicine and 3Department of Pathology, Medical College of Georgia, Augusta, GA, USA"},{"name":"1 Center for Biotechnology and Genomic Medicine, 2Department of Medicine and 3Department of Pathology, Medical College of Georgia, Augusta, GA, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,3,4]]},"reference":[{"key":"2023013110282550700_B1","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1038\/nbt1150","article-title":"Creation and implications of a phenome-genome network","volume":"24","author":"Butte","year":"2006","journal-title":"Nat. Biotechnol."},{"key":"2023013110282550700_B2","first-page":"241","article-title":"Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data","volume":"12","author":"Chen","year":"2002","journal-title":"Stat. Sin."},{"key":"2023013110282550700_B3","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/S0169-023X(02)00138-6","article-title":"Fast hierarchical clustering and its validation","volume":"44","author":"Dash","year":"2003","journal-title":"Data Knowl. Eng."},{"key":"2023013110282550700_B4","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1093\/bioinformatics\/btg025","article-title":"Comparisons and validation of statistical clustering techniques for microarray gene expression data","volume":"19","author":"Datta","year":"2003","journal-title":"Bioinformatics"},{"key":"2023013110282550700_B5","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023013110282550700_B6","doi-asserted-by":"crossref","first-page":"3201","DOI":"10.1093\/bioinformatics\/bti517","article-title":"Computational cluster validation in post-genomic data analysis","volume":"21","author":"Handl","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110282550700_B7","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1007\/BF01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. Classif."},{"key":"2023013110282550700_B8","doi-asserted-by":"crossref","first-page":"3705","DOI":"10.1210\/jc.2007-0979","article-title":"Gene expression in peripheral blood mononuclear cells from children with diabetes","volume":"92","author":"Kaizer","year":"2007","journal-title":"J. Clin. Endocrinol. Metab."},{"key":"2023013110282550700_B9","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1186\/1471-2105-9-200","article-title":"ParaKMeans: implementation of a parallelized K-means algorithm suitable for general laboratory use","volume":"9","author":"Kraj","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023013110282550700_B10","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. 
Assoc."},{"key":"2023013110282550700_B11","doi-asserted-by":"crossref","first-page":"2405","DOI":"10.1093\/bioinformatics\/btl406","article-title":"Evaluation and comparison of gene clustering methods in microarray analysis","volume":"22","author":"Thalamuthu","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013110282550700_B12","doi-asserted-by":"crossref","first-page":"R34","DOI":"10.1186\/gb-2003-4-5-r34","article-title":"Clustering gene-expression data with repeated measurements","volume":"4","author":"Yeung","year":"2003","journal-title":"Genome Biol."},{"key":"2023013110282550700_B13","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1145\/235968.233324","article-title":"BIRCH: an efficient data clustering method for very large databases","volume":"25","author":"Zhang","year":"1996","journal-title":"ACM SIGMOD Record"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1152\/48984679\/bioinformatics_25_9_1152.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1152\/48984679\/bioinformatics_25_9_1152.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T20:35:54Z","timestamp":1675197354000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/9\/1152\/204132"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,4]]},"references-count":13,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2009,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp123","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[
2009,5,1]]},"published":{"date-parts":[[2009,3,4]]}}}