{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:32Z","timestamp":1772138012162,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2021,5,5]],"date-time":"2021-05-05T00:00:00Z","timestamp":1620172800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,11,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Gene clustering and sample clustering are commonly used to find patterns in gene expression datasets. However, genes may cluster differently in heterogeneous samples (e.g. different tissues or disease states), whilst traditional methods assume that clusters are consistent across samples. Biclustering algorithms aim to solve this issue by performing sample clustering and gene clustering simultaneously. Existing reviews of biclustering algorithms have yet to include a number of more recent algorithms and have based comparisons on simplistic simulated datasets without specific evaluation of biclusters in real datasets, using less robust metrics.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We compared four classes of sparse biclustering algorithms on a range of simulated and real datasets. All algorithms generally struggled on simulated datasets with a large number of genes or implanted biclusters. We found that Bayesian algorithms with strict sparsity constraints had high accuracy on the simulated datasets and did not require any post-processing, but were considerably slower than other algorithm classes. We found that non-negative matrix factorisation algorithms performed poorly, but could be re-purposed for biclustering through a sparsity-inducing post-processing procedure we introduce; one such algorithm was one of the most highly ranked on real datasets. In a multi-tissue knockout mouse RNA-seq dataset, the algorithms rarely returned clusters containing samples from multiple different tissues, whilst such clusters were identified in a human dataset of more closely related cell types (sorted blood cell subsets). This highlights the need for further thought in the design and analysis of multi-tissue studies to avoid differences between tissues dominating the analysis.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability<\/jats:title>\n                    <jats:p>Code to run the analysis is available at https:\/\/github.com\/nichollskc\/biclust_comp, including wrappers for each algorithm, implementations of evaluation metrics, and code to simulate datasets and perform pre- and post-processing. The full tables of results are available at https:\/\/doi.org\/10.5281\/zenodo.4581206.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bib\/bbab140","type":"journal-article","created":{"date-parts":[[2021,3,25]],"date-time":"2021-03-25T22:09:21Z","timestamp":1616710161000},"source":"Crossref","is-referenced-by-count":13,"title":["Comparison of sparse biclustering algorithms for gene expression datasets"],"prefix":"10.1093","volume":"22","author":[{"given":"Kath","family":"Nicholls","sequence":"first","affiliation":[{"name":"Cambridge Institute for Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, CB2 0AW, UK"}]},{"given":"Chris","family":"Wallace","sequence":"additional","affiliation":[{"name":"Cambridge Institute for Therapeutic Immunology and Infectious Disease, University of Cambridge, Cambridge, CB2 0AW, UK"},{"name":"MRC Biostatistics Unit, Cambridge Biomedical Campus, Forvie Site, Robinson Way, Cambridge, CB2 0SR, UK"}]}],"member":"286","published-online":{"date-parts":[[2021,5,6]]},"reference":[{"issue":"20","key":"2021110814291734300_ref1","doi-asserted-by":"crossref","first-page":"3840","DOI":"10.1093\/bioinformatics\/bti641","article-title":"Shifting and scaling patterns from gene expression data","volume":"21","author":"Aguilar-Ruiz","year":"2005","journal-title":"Bioinformatics"},{"key":"2021110814291734300_ref2"},{"issue":"4","key":"2021110814291734300_ref3","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1214\/aos\/1013699998","article-title":"The control of the false discovery rate in multiple testing under dependency","volume":"29","author":"Benjamini","year":"2001","journal-title":"Annal Statist"},{"key":"2021110814291734300_ref4","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1145\/1854776.1854814","volume-title":"Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology","author":"Bozda\u011f","year":"2010"},{"key":"2021110814291734300_ref5","first-page":"22","volume-title":"Proceedings of the 4th Conference on Message Understanding","author":"Chinchor","year":"1992"},{"issue":"3","key":"2021110814291734300_ref6","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1093\/bib\/bbs032","article-title":"A comparative analysis of biclustering algorithms for gene expression data","volume":"14","author":"Eren","year":"2013","journal-title":"Brief Bioinform"},{"issue":"1","key":"2021110814291734300_ref7","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1186\/s12859-017-1559-2","article-title":"Reactome pathway analysis: a high-performance in-memory approach","volume":"18","author":"Fabregat","year":"2017","journal-title":"BMC Bioinformat"},{"issue":"7","key":"2021110814291734300_ref8","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1004791","article-title":"Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering","volume":"12","author":"Gao","year":"2016","journal-title":"PLoS Comput Biol"},{"issue":"5439","key":"2021110814291734300_ref9","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2021110814291734300_ref10","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.ymeth.2018.02.004","article-title":"Bi-clustering of metabolic data using matrix factorization tools","volume":"151","author":"Gu","year":"2018","journal-title":"Methods"},{"issue":"5","key":"2021110814291734300_ref11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3195833","article-title":"Triclustering algorithms for three-dimensional data analysis: a comprehensive survey","volume":"51","author":"Henriques","year":"2019","journal-title":"ACM Comput Surv"},{"issue":"12","key":"2021110814291734300_ref12","doi-asserted-by":"crossref","first-page":"1520","DOI":"10.1093\/bioinformatics\/btq227","article-title":"FABIA: factor analysis for bicluster acquisition","volume":"26","author":"Hochreiter","year":"2010","journal-title":"Bioinformatics"},{"issue":"9","key":"2021110814291734300_ref13","doi-asserted-by":"crossref","first-page":"1094","DOI":"10.1038\/ng.3624","article-title":"Tensor decomposition for multi-tissue gene expression experiments","volume":"48","author":"Hore","year":"2016","journal-title":"Nat Genet"},{"issue":"5","key":"2021110814291734300_ref14","doi-asserted-by":"crossref","first-page":"942","DOI":"10.1109\/TCBB.2014.2325016","article-title":"Similarity measures for comparing biclusterings","volume":"11","author":"Horta","year":"2014","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"2","key":"2021110814291734300_ref15","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1111\/j.1469-8137.1912.tb05611.x","article-title":"The distribution of the flora in the alpine zone","volume":"11","author":"Jaccard","year":"1912","journal-title":"New Phytol"},{"issue":"D1","key":"2021110814291734300_ref16","first-page":"D498","article-title":"The reactome pathway knowledgebase","volume":"48","author":"Jassal","year":"2020","journal-title":"Nucleic Acids Res"},{"issue":"11","key":"2021110814291734300_ref17","doi-asserted-by":"crossref","first-page":"1370","DOI":"10.1109\/TKDE.2004.68","article-title":"Cluster analysis for gene expression data: a survey","volume":"16","author":"Jiang","year":"2004","journal-title":"IEEE Trans Knowled Data Eng"},{"issue":"12","key":"2021110814291734300_ref18","doi-asserted-by":"crossref","first-page":"1495","DOI":"10.1093\/bioinformatics\/btm134","article-title":"Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis","volume":"23","author":"Kim","year":"2007","journal-title":"Bioinformatics"},{"issue":"Database issue","key":"2021110814291734300_ref19","doi-asserted-by":"crossref","first-page":"D802","DOI":"10.1093\/nar\/gkt977","article-title":"The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data","volume":"42","author":"Koscielny","year":"2014","journal-title":"Nucleic Acids Res"},{"issue":"19","key":"2021110814291734300_ref20","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2014a scalable bioinformatics workflow engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"key":"2021110814291734300_ref21","article-title":"Plaid models for gene expression data","volume":"12","author":"Lazzeroni","year":"2000","journal-title":"Statistica Sinica"},{"issue":"10","key":"2021110814291734300_ref22","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0109760","article-title":"Copy number loss of the interferon gene cluster in melanomas is linked to reduced t cell infiltrate and poor patient prognosis","volume":"9","author":"Linsley","year":"2014","journal-title":"PLOS ONE"},{"issue":"1","key":"2021110814291734300_ref23","first-page":"148","article-title":"Spike-and-slab lasso biclustering","volume":"15","author":"Moran","year":"2021","journal-title":"Annal Appl Stat"},{"issue":"7\u20138","key":"2021110814291734300_ref24","doi-asserted-by":"crossref","first-page":"325","DOI":"10.1007\/s00335-015-9573-z","article-title":"MouseMine: a new data warehouse for MGI","volume":"26","author":"Motenko","year":"2015","journal-title":"Mamm Genome"},{"issue":"1","key":"2021110814291734300_ref25","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1002\/nav.3800040112","article-title":"On the assignment and transportation problems (abstract)","volume":"4","author":"Munkres","year":"1957","journal-title":"Naval Res Logist Quart"},{"issue":"1","key":"2021110814291734300_ref26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-017-1487-1","article-title":"A systematic comparative evaluation of biclustering techniques","volume":"18","author":"Padilha","year":"2017","journal-title":"BMC Bioinformat"},{"issue":"3","key":"2021110814291734300_ref27","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1109\/TPAMI.2006.60","article-title":"Nonsmooth nonnegative matrix factorization (nsNMF)","volume":"28","author":"Pascual-Montano","year":"2006","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"7","key":"2021110814291734300_ref28","doi-asserted-by":"crossref","first-page":"902","DOI":"10.1109\/TKDE.2006.106","article-title":"Comparing subspace clusterings","volume":"18","author":"Patrikainen","year":"2006","journal-title":"IEEE Trans Knowled Data Eng"},{"issue":"9","key":"2021110814291734300_ref29","doi-asserted-by":"crossref","first-page":"1122","DOI":"10.1093\/bioinformatics\/btl060","article-title":"A systematic comparison and evaluation of biclustering methods for gene expression data","volume":"22","author":"Preli\u0107","year":"2006","journal-title":"Bioinformatics"},{"issue":"1","key":"2021110814291734300_ref30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-03424-4","article-title":"A comprehensive evaluation of module detection methods for gene expression data","volume":"9","author":"Saelens","year":"2018","journal-title":"Nat Commun"},{"issue":"2","key":"2021110814291734300_ref31","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1016\/j.csda.2004.02.003","article-title":"Improved biclustering of microarray data demonstrated through systematic performance tests","volume":"48","author":"Turner","year":"2005","journal-title":"Comput Statist Data Anal"},{"issue":"2","key":"2021110814291734300_ref32","first-page":"1103","article-title":"Three-way clustering of multi-tissue multi-individual gene expression data using semi-nonnegative tensor decomposition","volume":"13","author":"Wang","year":"2019","journal-title":"Annal App Statist"},{"issue":"2","key":"2021110814291734300_ref33","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pgen.1005691","article-title":"Transcriptome analysis of targeted mouse mutations reveals the topography of local changes in gene expression","volume":"12","author":"West","year":"2016","journal-title":"PLoS Genet"},{"issue":"1","key":"2021110814291734300_ref34","doi-asserted-by":"crossref","first-page":"43","DOI":"10.2174\/157489312799304413","article-title":"Biclustering analysis for pattern discovery: current techniques, comparative studies and applications","volume":"7","author":"Zhao","year":"2012","journal-title":"Curr Bioinformat"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab140\/41087404\/bbab140.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/22\/6\/bbab140\/41087404\/bbab140.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,11,8]],"date-time":"2021-11-08T09:30:00Z","timestamp":1636363800000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbab140\/6265183"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,6]]},"references-count":34,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbab140","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.12.15.422852","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,11]]},"published":{"date-parts":[[2021,5,6]]},"article-number":"bbab140"}}