{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T04:17:06Z","timestamp":1776917826009,"version":"3.51.2"},"reference-count":22,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T00:00:00Z","timestamp":1638316800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,12,8]],"date-time":"2021-12-08T00:00:00Z","timestamp":1638921600000},"content-version":"vor","delay-in-days":7,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100009708","name":"Novo Nordisk Fonden","doi-asserted-by":"publisher","award":["NNF10CC1016517"],"award-info":[{"award-number":["NNF10CC1016517"]}],"id":[{"id":"10.13039\/501100009708","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious 
Diseases","doi-asserted-by":"publisher","award":["U01AI124316"],"award-info":[{"award-number":["U01AI124316"]}],"id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Independent component analysis is an unsupervised machine learning algorithm that separates a set of mixed signals into a set of statistically independent source signals. Applied to high-quality gene expression datasets, independent component analysis effectively reveals both the source signals of the transcriptome as co-regulated gene sets, and the activity levels of the underlying regulators across diverse experimental conditions. Two major variables that affect the final gene sets are the diversity of the expression profiles contained in the underlying data, and the user-defined number of independent components, or dimensionality, to compute. 
Availability of high-quality transcriptomic datasets has grown exponentially as high-throughput technologies have advanced; however, optimal dimensionality selection remains an open question.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>We computed independent components across a range of dimensionalities for four gene expression datasets with varying dimensions (both in terms of number of genes and number of samples). We computed the correlation between independent components across different dimensionalities to understand how the overall structure evolves as the number of user-defined components increases. We then measured how well the resulting gene clusters reflected known regulatory mechanisms, and developed a set of metrics to assess the accuracy of the decomposition at a given dimension.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We found that over-decomposition results in many independent components dominated by a single gene, whereas under-decomposition results in independent components that poorly capture the known regulatory structure. From these results, we developed a new method, called OptICA, for finding the optimal dimensionality that controls for both over- and under-decomposition. Specifically, OptICA selects the highest dimension that produces a low number of components that are dominated by a single gene. 
We show that OptICA outperforms two previously proposed methods for selecting the number of independent components across four transcriptomic datasets of varying sizes.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>OptICA avoids both over-decomposition and under-decomposition of transcriptomic datasets, resulting in the best representation of the organism\u2019s underlying transcriptional regulatory network.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-021-04497-7","type":"journal-article","created":{"date-parts":[[2021,12,8]],"date-time":"2021-12-08T03:18:35Z","timestamp":1638933515000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":61,"title":["Optimal dimensionality selection for independent component analysis of transcriptomic data"],"prefix":"10.1186","volume":"22","author":[{"given":"John Luke","family":"McConn","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cameron R.","family":"Lamoureux","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Saugat","family":"Poudel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bernhard O.","family":"Palsson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anand V.","family":"Sastry","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,12,8]]},"reference":[{"key":"4497_CR1","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1016\/S0893-6080(00)00026-5","volume":"13","author":"A Hyv\u00e4rinen","year":"2000","unstructured":"Hyv\u00e4rinen A, Oja E. Independent component analysis: algorithms and applications. 
Neural Netw. 2000;13:411\u201330.","journal-title":"Neural Netw"},{"key":"4497_CR2","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1016\/j.jneumeth.2003.10.009","volume":"134","author":"A Delorme","year":"2004","unstructured":"Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134:9\u201321.","journal-title":"J Neurosci Methods"},{"key":"4497_CR3","doi-asserted-by":"publisher","first-page":"2447","DOI":"10.1093\/bioinformatics\/bth270","volume":"20","author":"M Scholz","year":"2004","unstructured":"Scholz M, Gatzek S, Sterling A, Fiehn O, Selbig J. Metabolite fingerprinting: detecting biological features by independent component analysis. Bioinformatics. 2004;20:2447\u201354.","journal-title":"Bioinformatics"},{"key":"4497_CR4","doi-asserted-by":"publisher","first-page":"5536","DOI":"10.1038\/s41467-019-13483-w","volume":"10","author":"AV Sastry","year":"2019","unstructured":"Sastry AV, Gao Y, Szubin R, Hefner Y, Xu S, Kim D, et al. The Escherichia coli transcriptome mostly consists of independently regulated modules. Nat Commun. 2019;10:5536.","journal-title":"Nat Commun"},{"key":"4497_CR5","doi-asserted-by":"publisher","first-page":"501","DOI":"10.2144\/000112950","volume":"45","author":"W Kong","year":"2008","unstructured":"Kong W, Vanderburg CR, Gunshin H, Rogers JT, Huang X. A review of independent component analysis application to microarray gene expression data. Biotechniques. 2008;45:501\u201320.","journal-title":"Biotechniques"},{"key":"4497_CR6","doi-asserted-by":"publisher","first-page":"932","DOI":"10.1016\/j.jbi.2010.07.001","volume":"43","author":"JM Engreitz","year":"2010","unstructured":"Engreitz JM, Daigle BJ Jr, Marshall JJ, Altman RB. Independent component analysis: mining microarray data for fundamental human gene expression modules. J Biomed Inform. 
2010;43:932\u201344.","journal-title":"J Biomed Inform"},{"key":"4497_CR7","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1016\/j.celrep.2014.10.035","volume":"9","author":"A Biton","year":"2014","unstructured":"Biton A, Bernard-Pierrot I, Lou Y, Krucker C, Chapeaublanc E, Rubio-P\u00e9rez C, et al. Independent component analysis uncovers the landscape of the bladder tumor transcriptome and reveals insights into luminal and basal subtypes. Cell Rep. 2014;9:1235\u201345.","journal-title":"Cell Rep"},{"key":"4497_CR8","doi-asserted-by":"publisher","first-page":"e161","DOI":"10.1371\/journal.pcbi.0030161","volume":"3","author":"AE Teschendorff","year":"2007","unstructured":"Teschendorff AE, Journ\u00e9e M, Absil PA, Sepulchre R, Caldas C. Elucidating the altered transcriptional programs in breast cancer using independent component analysis. PLoS Comput Biol. 2007;3:e161.","journal-title":"PLoS Comput Biol"},{"key":"4497_CR9","doi-asserted-by":"publisher","first-page":"6338","DOI":"10.1038\/s41467-020-20153-9","volume":"11","author":"K Rychel","year":"2020","unstructured":"Rychel K, Sastry AV, Palsson BO. Machine learning uncovers independently regulated modules in the Bacillus subtilis transcriptome. Nat Commun. 2020;11:6338.","journal-title":"Nat Commun"},{"key":"4497_CR10","doi-asserted-by":"publisher","first-page":"17228","DOI":"10.1073\/pnas.2008413117","volume":"117","author":"S Poudel","year":"2020","unstructured":"Poudel S, Tsunemoto H, Seif Y, Sastry AV, Szubin R, Xu S, et al. Revealing 29 sets of independently modulated genes in Staphylococcus aureus, their regulators, and role in key physiological response. Proc Natl Acad Sci USA. 2020;117:17228\u201339.","journal-title":"Proc Natl Acad Sci USA"},{"key":"4497_CR11","doi-asserted-by":"publisher","first-page":"e1004122","DOI":"10.1371\/journal.pgen.1004122","volume":"10","author":"KJ Karczewski","year":"2014","unstructured":"Karczewski KJ, Snyder M, Altman RB, Tatonetti NP. 
Coherent functional modules improve transcription factor target identification, cooperativity prediction, and disease association. PLoS Genet. 2014;10:e1004122.","journal-title":"PLoS Genet"},{"key":"4497_CR12","doi-asserted-by":"publisher","first-page":"1090","DOI":"10.1038\/s41467-018-03424-4","volume":"9","author":"W Saelens","year":"2018","unstructured":"Saelens W, Cannoodt R, Saeys Y. A comprehensive evaluation of module detection methods for gene expression data. Nat Commun. 2018;9:1090.","journal-title":"Nat Commun"},{"key":"4497_CR13","doi-asserted-by":"publisher","first-page":"D991","DOI":"10.1093\/nar\/gks1193","volume":"41","author":"T Barrett","year":"2013","unstructured":"Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets\u2014update. Nucleic Acids Res. 2013;41:D991\u20135.","journal-title":"Nucleic Acids Res"},{"key":"4497_CR14","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1186\/s13059-020-02021-3","volume":"21","author":"GP Way","year":"2020","unstructured":"Way GP, Zietz M, Rubinetti V, Himmelstein DS, Greene CS. Compressing gene expression data using multiple latent space dimensionalities learns complementary biological representations. Genome Biol. 2020;21:109.","journal-title":"Genome Biol"},{"key":"4497_CR15","doi-asserted-by":"publisher","first-page":"712","DOI":"10.1186\/s12864-017-4112-9","volume":"18","author":"U Kairov","year":"2017","unstructured":"Kairov U, Cantini L, Greco A, Molkenov A, Czerwinska U, Barillot E, et al. Determining the optimal number of independent components for reproducible transcriptomic data analysis. BMC Genomics. 2017;18:712.","journal-title":"BMC Genomics"},{"key":"4497_CR16","doi-asserted-by":"crossref","unstructured":"Hyvarinen A. Fast ICA for noisy data using Gaussian moments. In: 1999 IEEE international symposium on circuits and systems (ISCAS). vol 5. 1999. p. 
57\u201361.","DOI":"10.1109\/ISCAS.1999.777510"},{"key":"4497_CR17","doi-asserted-by":"publisher","DOI":"10.1101\/2021.04.08.439047","author":"CR Lamoureux","year":"2021","unstructured":"Lamoureux CR, Decker KT, Sastry AV, McConn JL. PRECISE 2.0-an expanded high-quality RNA-seq compendium for Escherichia coli K-12 reveals high-resolution transcriptional regulatory structure. bioRxiv. 2021. https:\/\/doi.org\/10.1101\/2021.04.08.439047.","journal-title":"bioRxiv"},{"key":"4497_CR18","doi-asserted-by":"publisher","first-page":"1103","DOI":"10.1126\/science.1206848","volume":"335","author":"P Nicolas","year":"2012","unstructured":"Nicolas P, M\u00e4der U, Dervyn E, Rochat T, Leduc A, Pigeonneau N, et al. Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis. Science. 2012;335:1103\u20136.","journal-title":"Science"},{"key":"4497_CR19","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825\u201330.","journal-title":"J Mach Learn Res"},{"key":"4497_CR20","doi-asserted-by":"publisher","first-page":"626","DOI":"10.1109\/72.761722","volume":"10","author":"A Hyv\u00e4rinen","year":"1999","unstructured":"Hyv\u00e4rinen A. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans Neural Netw. 1999;10:626\u201334.","journal-title":"IEEE Trans Neural Netw"},{"key":"4497_CR21","unstructured":"Ester M, Kriegel H-P, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD. 1996. p. 226\u201331."},{"key":"4497_CR22","doi-asserted-by":"crossref","unstructured":"Satopaa V, Albrecht J, Irwin D, Raghavan B. Finding a \u201ckneedle\u201d in a haystack: Detecting knee points in system behavior. 
In: 2011 31st international conference on distributed computing systems workshops. IEEE; 2011. p. 166\u201371.","DOI":"10.1109\/ICDCSW.2011.20"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04497-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04497-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04497-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,13]],"date-time":"2024-09-13T20:09:38Z","timestamp":1726258178000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04497-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":22,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["4497"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04497-7","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.05.26.445885","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,12]]},"assertion":[{"value":"26 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 December 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"584"}}