{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T19:21:56Z","timestamp":1779304916106,"version":"3.51.4"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,3,31]],"date-time":"2023-03-31T00:00:00Z","timestamp":1680220800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,3,31]],"date-time":"2023-03-31T00:00:00Z","timestamp":1680220800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Cancer Institute","award":["P30-CA076292"],"award-info":[{"award-number":["P30-CA076292"]}]},{"DOI":"10.13039\/100006108","name":"National Center for Advancing Translational Sciences","doi-asserted-by":"publisher","award":["UL1TR001450"],"award-info":[{"award-number":["UL1TR001450"]}],"id":[{"id":"10.13039\/100006108","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Cluster analysis is utilized frequently in scientific theory and applications to separate data into groups. A key assumption in many clustering algorithms is that the data was generated from a population consisting of multiple distinct clusters. Clusterability testing allows users to question the inherent assumption of latent cluster structure, a theoretical requirement for meaningful results in cluster analysis.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>This paper proposes methods for clusterability testing designed for high-dimensional data by utilizing sparse principal component analysis. 
Type I error and power of the clusterability tests are evaluated using simulated data with different types of cluster structure in high dimensions. Empirical performance of the new methods is evaluated and compared with prior methods on gene expression, microarray, and shotgun proteomics data. Our methods had reasonably low Type I error and maintained power for many datasets with a variety of structures and dimensions. Cluster structure was not detectable in other datasets with spatially close clusters.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>This is the first analysis of clusterability testing on both simulated and real-world high-dimensional data.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-023-05210-6","type":"journal-article","created":{"date-parts":[[2023,3,31]],"date-time":"2023-03-31T10:03:30Z","timestamp":1680257010000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Sparse clusterability: testing for cluster structure in high dimensions"],"prefix":"10.1186","volume":"24","author":[{"given":"Jose","family":"Laborde","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Paul A.","family":"Stewart","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhihua","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yian A.","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9991-427X","authenticated-orcid":false,"given":"Naomi 
C.","family":"Brownstein","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,3,31]]},"reference":[{"key":"5210_CR1","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.patcog.2018.10.026","volume":"88","author":"A Adolfsson","year":"2019","unstructured":"Adolfsson A, Ackerman M, Brownstein NC. To cluster, or not to cluster: an analysis of clusterability methods. Pattern Recognit. 2019;88:13\u201326.","journal-title":"Pattern Recognit"},{"key":"5210_CR2","doi-asserted-by":"publisher","first-page":"104004","DOI":"10.1016\/j.dib.2019.104004","volume":"25","author":"NC Brownstein","year":"2019","unstructured":"Brownstein NC, Adolfsson A, Ackerman M. Descriptive statistics and visualization of data from the R datasets package with implications for clusterability. Data Brief. 2019;25:104004.","journal-title":"Data Brief"},{"issue":"1","key":"5210_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/biostatistics\/kxab030","volume":"24","author":"TA Alexander","year":"2023","unstructured":"Alexander TA, Irizarry RA, Bravo HC. Capturing discrete latent structures: choose LDs over PCs. Biostatistics. 2023;24(1):1\u201316.","journal-title":"Biostatistics"},{"issue":"2","key":"5210_CR4","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1198\/106186006X113430","volume":"15","author":"H Zou","year":"2006","unstructured":"Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. J Comput Graph Stat. 2006;15(2):265\u201386.","journal-title":"J Comput Graph Stat"},{"issue":"4","key":"5210_CR5","doi-asserted-by":"publisher","first-page":"1927","DOI":"10.1109\/TIP.2017.2789327","volume":"27","author":"T Yellamraju","year":"2018","unstructured":"Yellamraju T, Boutin M. Clusterability and clustering of images and other \u201creal\u201d high-dimensional data. IEEE Trans Image Process. 
2018;27(4):1927\u201338.","journal-title":"IEEE Trans Image Process"},{"key":"5210_CR6","doi-asserted-by":"publisher","first-page":"012002","DOI":"10.1088\/1742-6596\/1334\/1\/012002","volume":"1334","author":"D Simovici","year":"2019","unstructured":"Simovici D, Hua K. Data ultrametricity and clusterability. J Phys Conf Ser. 2019;1334:012002.","journal-title":"J Phys Conf Ser"},{"key":"5210_CR7","unstructured":"John CR. Clusterlab: flexible Gaussian Cluster Simulator 2019. R package version 0.0.2.8. https:\/\/CRAN.R-project.org\/package=clusterlab"},{"key":"5210_CR8","unstructured":"Erichson NB, Zheng P, Aravkin S. Sparsepca: Sparse Principal Component Analysis (SPCA) 2018. R package version 0.1.2. https:\/\/cran.r-project.org\/web\/packages\/sparsepca\/"},{"issue":"2","key":"5210_CR9","doi-asserted-by":"publisher","first-page":"977","DOI":"10.1137\/18m1211350","volume":"80","author":"NB Erichson","year":"2020","unstructured":"Erichson NB, Zheng P, Manohar K, Brunton SL, Kutz JN, Aravkin AY. Sparse principal component analysis via variable projection. SIAM J Appl Math. 2020;80(2):977\u20131002. https:\/\/doi.org\/10.1137\/18m1211350.","journal-title":"SIAM J Appl Math"},{"key":"5210_CR10","doi-asserted-by":"crossref","unstructured":"Neville Z, Brownstein N, Ackerman M, Adolfsson A. Clusterability: performs tests for cluster tendency of a data set 2020. R package version 0.1.1.0. https:\/\/CRAN.R-project.org\/package=clusterability","DOI":"10.32614\/CRAN.package.clusterability"},{"issue":"3","key":"5210_CR11","doi-asserted-by":"publisher","first-page":"579","DOI":"10.1111\/1467-9868.00141","volume":"60","author":"M-Y Cheng","year":"1998","unstructured":"Cheng M-Y, Hall P. Calibrating the excess mass and dip tests of modality. J R Stat Soc Ser B (Stat Methodol). 
1998;60(3):579\u201389.","journal-title":"J R Stat Soc Ser B (Stat Methodol)"},{"issue":"17","key":"5210_CR12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2333\/bhmk.12.17_1","volume":"12","author":"B Efron","year":"1985","unstructured":"Efron B, Tibshirani R. The bootstrap method for assessing statistical accuracy. Behaviormetrika. 1985;12(17):1\u201335.","journal-title":"Behaviormetrika"},{"key":"5210_CR13","doi-asserted-by":"publisher","first-page":"14049","DOI":"10.1038\/ncomms14049","volume":"8","author":"GXY Zheng","year":"2017","unstructured":"Zheng GXY, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al. Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017;8:14049. https:\/\/doi.org\/10.1038\/ncomms14049.","journal-title":"Nat Commun"},{"key":"5210_CR14","unstructured":"10x Genomics: 10k PBMCs from a healthy donor\u2014gene expression and cell surface protein. https:\/\/support.10xgenomics.com\/single-cell-gene-expression\/datasets\/3.0.0\/pbmc_10k_protein_v3"},{"key":"5210_CR15","doi-asserted-by":"publisher","DOI":"10.1158\/1078-0432.CCR-21-1694","author":"I Smalley","year":"2021","unstructured":"Smalley I, Chen Z, Phadke MS, Li J, Yu X, Wyatt C, Evernden B, Messina JL, Sarnaik A, Sondak VK, et al. Single cell characterization of the immune microenvironment of melanoma brain and leptomeningeal metastases. Clin Cancer Res. 2021. https:\/\/doi.org\/10.1158\/1078-0432.CCR-21-1694.","journal-title":"Clin Cancer Res"},{"key":"5210_CR16","unstructured":"Dua D, Graff C. UCI machine learning repository 2017. http:\/\/archive.ics.uci.edu\/ml"},{"issue":"17","key":"5210_CR17","doi-asserted-by":"publisher","first-page":"3269","DOI":"10.1080\/00949655.2018.1509979","volume":"88","author":"Z Neville","year":"2018","unstructured":"Neville Z, Brownstein NC. Macros to conduct tests of multimodality in SAS. J Stat Comput Simul. 
2018;88(17):3269\u201390.","journal-title":"J Stat Comput Simul"},{"issue":"186","key":"5210_CR18","doi-asserted-by":"publisher","first-page":"186","DOI":"10.1126\/scitranslmed.3005723","volume":"5","author":"S Rousseaux","year":"2013","unstructured":"Rousseaux S, Debernardi A, Jacquiau B, Vitte A-L, Vesin A, Nagy-Mignotte H, Moro-Sibilot D, Brichon P-Y, Lantuejoul S, Hainaut P, et al. Ectopic activation of germline and placental genes identifies aggressive metastasis-prone lung cancers. Sci Transl Med. 2013;5(186):186\u20136618666.","journal-title":"Sci Transl Med"},{"issue":"1","key":"5210_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-14-153","volume":"14","author":"EA Welsh","year":"2013","unstructured":"Welsh EA, Eschrich SA, Berglund AE, Fenstermacher DA. Iterative rank-order normalization of gene expression microarray data. BMC Bioinform. 2013;14(1):1\u201311.","journal-title":"BMC Bioinform"},{"issue":"1","key":"5210_CR20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41467-019-11452-x","volume":"10","author":"PA Stewart","year":"2019","unstructured":"Stewart PA, Welsh EA, Slebos RJ, Fang B, Izumi V, Chambers M, Zhang G, Cen L, Pettersson F, Zhang Y, et al. Proteogenomic landscape of squamous cell lung cancer. Nat Commun. 2019;10(1):1\u201317.","journal-title":"Nat Commun"},{"issue":"2","key":"5210_CR21","doi-asserted-by":"publisher","first-page":"462","DOI":"10.1016\/j.cell.2013.09.034","volume":"155","author":"CW Brennan","year":"2013","unstructured":"Brennan CW, Verhaak RG, McKenna A, Campos B, Noushmehr H, Salama SR, Zheng S, Chakravarty D, Sanborn JZ, Berman SH, et al. The somatic genomic landscape of glioblastoma. Cell. 
2013;155(2):462\u201377.","journal-title":"Cell"},{"issue":"1","key":"5210_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-020-58766-1","volume":"10","author":"CR John","year":"2020","unstructured":"John CR, Watson D, Russ D, Goldmann K, Ehrenstein M, Pitzalis C, Lewis M, Barnes M. M3C: Monte Carlo reference-based consensus clustering. Sci Rep. 2020;10(1):1\u201314.","journal-title":"Sci Rep"},{"issue":"1","key":"5210_CR23","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1214\/aos\/1176346577","volume":"13","author":"JA Hartigan","year":"1985","unstructured":"Hartigan JA, Hartigan PM. The dip test of unimodality. Ann Stat. 1985;13(1):70\u201384.","journal-title":"Ann Stat"},{"key":"5210_CR24","unstructured":"Maechler M. Diptest: Hartigan\u2019s Dip test statistic for unimodality\u2014corrected. 2016. R package version 0.75-7. https:\/\/CRAN.R-project.org\/package=diptest"},{"key":"5210_CR25","doi-asserted-by":"publisher","DOI":"10.1111\/j.2517-6161.1981.tb01155.x","author":"BW Silverman","year":"1981","unstructured":"Silverman BW. Using kernel density estimates to investigate multimodality. J R Stat Soc Ser B (Methodol). 1981. https:\/\/doi.org\/10.1111\/j.2517-6161.1981.tb01155.x.","journal-title":"J R Stat Soc Ser B (Methodol)"},{"key":"5210_CR26","unstructured":"Schwaiger F, Holzmann H. Package which implements the Silvermantest. (2013). https:\/\/www.mathematik.uni-marburg.de\/~stochastik\/R_packages\/"},{"issue":"2","key":"5210_CR27","first-page":"515","volume":"11","author":"P Hall","year":"2001","unstructured":"Hall P, York M. On the calibration of Silverman\u2019s test for multimodality. Stat Sin. 2001;11(2):515\u201336.","journal-title":"Stat Sin"},{"key":"5210_CR28","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1037\/h0070888","volume":"24","author":"H Hotelling","year":"1933","unstructured":"Hotelling H. Analysis of a complex of statistical variables with principal components. J Educ Psy. 
1933;24:498\u2013520.","journal-title":"J Educ Psy"},{"key":"5210_CR29","volume-title":"Principal component analysis","author":"IT Jolliffe","year":"2002","unstructured":"Jolliffe IT. Principal component analysis. 2nd ed. New York: Springer; 2002.","edition":"2"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05210-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05210-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05210-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,17]],"date-time":"2024-10-17T12:33:46Z","timestamp":1729168426000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05210-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,31]]},"references-count":29,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5210"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05210-6","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,31]]},"assertion":[{"value":"30 June 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 February 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 March 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"N\/A: analysis of secondary data without identifiers.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"NCB served as an ad hoc reviewer in 2020 for the American Cancer Society, for which she received sponsored travel during the review meeting and a stipend of US $300. NCB received a series of small awards for conference and travel support, including US $500 from the Statistical Consulting Section of the American Statistical Association (ASA) for Best Paper Award at the 2019 Joint Statistical Meetings. Currently, NCB serves as the Vice President for the Florida Chapter of the ASA and Section Representative for the ASA Statistical Consulting Section, and on the Regional Committee for the Eastern North American Region of the International Biometrics Society. Previously, NCB served as the Florida ASA Chapter Representative, as the mentoring subcommittee chair for the Regional Advisory Board of the Eastern North American Region of the International Biometrics Society, and on the Scientific Review Board at Moffitt Cancer Center. JL is the Information Officer for the ASA Florida Chapter. YAC currently serves on the Scientific Review Board at Moffitt Cancer Center.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"125"}}