{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:28:16Z","timestamp":1773275296388,"version":"3.50.1"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: A major challenge in studying gene regulation is to systematically reconstruct transcription regulatory modules, which are defined as sets of genes that are regulated by a common set of transcription factors. A commonly used approach for transcription module reconstruction is to derive coexpression clusters from a microarray dataset. However, such results often contain false positives because genes from many transcription modules may be simultaneously perturbed upon a given type of conditions. In this study, we propose and validate that genes, which form a coexpression cluster in multiple microarray datasets across diverse conditions, are more likely to form a transcription module. However, identifying genes coexpressed in a subset of many microarray datasets is not a trivial computational problem.<\/jats:p>\n               <jats:p>Results: We propose a graph-based data-mining approach to efficiently and systematically identify frequent coexpression clusters. Given m microarray datasets, we model each microarray dataset as a coexpression graph, and search for vertex sets which are frequently densely connected across \u2308 \u03b8 m \u2309 datasets (0 \u2264 \u03b8 \u2264 1). 
For this novel graph-mining problem, we designed two techniques to narrow down the search space: (1) partition the input graphs into (overlapping) groups sharing common properties; (2) summarize the vertex neighbor information from the partitioned datasets onto the \u2018Neighbor Association Summary Graph\u2019s for effective mining. We applied our method to 105 human microarray datasets, and identified a large number of potential transcription modules, activated under different subsets of conditions. Validation by ChIP-chip data demonstrated that the likelihood of a coexpression cluster being a transcription module increases significantly with its recurrence. Our method opens a new way to exploit the vast amount of existing microarray data accumulation for gene regulation study. Furthermore, the algorithm is applicable to other biological networks for approximate network module mining.<\/jats:p>\n               <jats:p>Availability: http:\/\/zhoulab.usc.edu\/NeMo\/<\/jats:p>\n               <jats:p>Contact: xjzhou@usc.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm227","type":"journal-article","created":{"date-parts":[[2007,7,23]],"date-time":"2007-07-23T16:13:46Z","timestamp":1185207226000},"page":"i577-i586","source":"Crossref","is-referenced-by-count":64,"title":["A graph-based approach to systematically reconstruct human transcriptional regulatory modules"],"prefix":"10.1093","volume":"23","author":[{"given":"Xifeng","family":"Yan","sequence":"first","affiliation":[{"name":"1 IBM T. J. Watson Research Center, Hawthorne NY and 2Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael R.","family":"Mehan","sequence":"additional","affiliation":[{"name":"1 IBM T. J. 
Watson Research Center, Hawthorne NY and 2Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Huang","sequence":"additional","affiliation":[{"name":"1 IBM T. J. Watson Research Center, Hawthorne NY and 2Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael S.","family":"Waterman","sequence":"additional","affiliation":[{"name":"1 IBM T. J. Watson Research Center, Hawthorne NY and 2Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Philip S.","family":"Yu","sequence":"additional","affiliation":[{"name":"1 IBM T. J. Watson Research Center, Hawthorne NY and 2Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xianghong Jasmine","family":"Zhou","sequence":"additional","affiliation":[{"name":"1 IBM T. J. Watson Research Center, Hawthorne NY and 2Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2007,7,1]]},"reference":[{"key":"2023062708513570900_B1","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1016\/S1369-5274(02)00322-3","article-title":"Functional genomics as applied to mapping transcription regulatory networks","volume":"5","author":"Banerjee","year":"2002","journal-title":"Curr. Opin. 
Microbiol"},{"key":"2023062708513570900_B2","doi-asserted-by":"crossref","first-page":"D267","DOI":"10.1093\/nar\/gkh061","article-title":"The Unified Medical Language System (UMLS): integrating biomedical terminology","volume":"32","author":"Bodenreider","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023062708513570900_B3","first-page":"106","article-title":"Finding disease-related genomic experiments within an international repository: first steps in translational bioinformatics","author":"Butte","year":"2006","journal-title":"AMIA Annu. Symp. Proc"},{"key":"2023062708513570900_B4","doi-asserted-by":"crossref","first-page":"3339","DOI":"10.1073\/pnas.0630591100","article-title":"Integrating regulatory motif discovery and genome-wide expression analysis","volume":"100","author":"Conlon","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023062708513570900_B5","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"Eisen","year":"1998","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023062708513570900_B6","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/S0020-0190(00)00142-3","article-title":"A clustering algorithm based on graph connectivity","volume":"76","author":"Hartuv","year":"2000","journal-title":"Information Processing Lett"},{"key":"2023062708513570900_B7","doi-asserted-by":"crossref","first-page":"i213","DOI":"10.1093\/bioinformatics\/bti1049","article-title":"Mining coherent dense subgraphs across massive biological networks for functional discovery","volume":"21","author":"Hu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023062708513570900_B8","doi-asserted-by":"crossref","first-page":"W83","DOI":"10.1093\/nar\/gkh411","article-title":"PathBLAST: a tool for alignment of protein interaction networks","volume":"32","author":"Kelley","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023062708513570900_B9","first-page":"45","article-title":"Assessing significance of connectivity and conservation in protein interaction networks","author":"Koyut\u00fcrk","year":"2006","journal-title":"RECOMB"},{"key":"2023062708513570900_B10","doi-asserted-by":"crossref","first-page":"D668","DOI":"10.1093\/nar\/gkl928","article-title":"The UCSC genome browser database: update 2007","volume":"35","author":"Kuhn","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023062708513570900_B11","doi-asserted-by":"crossref","first-page":"1085","DOI":"10.1101\/gr.1910904","article-title":"Coexpression analysis of human genes across many microarray data sets","volume":"14","author":"Lee","year":"2004","journal-title":"Genome Res"},{"key":"2023062708513570900_B12","first-page":"127","article-title":"BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes","author":"Liu","year":"2001","journal-title":"Pac. Symp. 
Biocomput"},{"key":"2023062708513570900_B13","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1038\/nature02782","article-title":"Genomic analysis of regulatory network dynamics reveals large topological changes","volume":"431","author":"Luscombe","year":"2004","journal-title":"Nature"},{"key":"2023062708513570900_B14","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1038\/ng724","article-title":"Identifying regulatory networks by combinatorial analysis of promoter elements","volume":"29","author":"Pilpel","year":"2001","journal-title":"Nat. Genet"},{"key":"2023062708513570900_B15","doi-asserted-by":"crossref","first-page":"3508","DOI":"10.1093\/bioinformatics\/bth436","article-title":"Modeling interactome: scale-free or geometric?","volume":"20","author":"Pr\u017eulj","year":"2004","journal-title":"Bioinformatics"},{"key":"2023062708513570900_B16","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1038\/nbt1098-939","article-title":"Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation","volume":"16","author":"Roth","year":"1998","journal-title":"Nat. Biotechnol"},{"key":"2023062708513570900_B17","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1038\/ng1165","article-title":"Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data","volume":"34","author":"Segal","year":"2003","journal-title":"Nat. Genet"},{"key":"2023062708513570900_B18","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1109\/34.868688","article-title":"Normalized Cuts and Image Segmentation","volume":"22","author":"Shi","year":"2000","journal-title":"IEEE Trans. on Pat. Analy. and Mach. 
Int"},{"key":"2023062708513570900_B19","doi-asserted-by":"crossref","first-page":"2907","DOI":"10.1073\/pnas.96.6.2907","article-title":"Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation","volume":"96","author":"Tamayo","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023062708513570900_B20","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1038\/nbt0698-566","article-title":"Quantitative whole-genome analysis of DNA-protein interactions by in vivo methylase protection in E. coli","volume":"16","author":"Tavazoie","year":"1998","journal-title":"Nat. Biotechnol"},{"key":"2023062708513570900_B21","doi-asserted-by":"crossref","first-page":"1998","DOI":"10.1073\/pnas.0405537102","article-title":"Inference of combinatorial regulation in yeast transcriptional networks: a case study of sporulation","volume":"102","author":"Wang","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023062708513570900_B22","first-page":"324","article-title":"Mining closed relational graphs with connectivity constraints","author":"Yan","year":"2005"},{"issue":"20","key":"2023062708513570900_B23","doi-asserted-by":"crossref","first-page":"12783","DOI":"10.1073\/pnas.192159399","article-title":"Transitive functional annotation by shortest-path analysis of gene expression data","volume":"99","author":"Zhou","year":"2002","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023062708513570900_B24","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1152\/physiolgenomics.00157.2002","article-title":"Novel mechanisms of T-cell and dendritic cell activation revealed by profiling of psoriasis on the 63,100-element oligonucleotide array","volume":"13","author":"Zhou","year":"2003","journal-title":"Physiol. 
Genomics"},{"key":"2023062708513570900_B25","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1038\/nbt1058","article-title":"Functional annotation and network reconstruction through cross-platform integration of microarray data","volume":"23","author":"Zhou","year":"2005","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/13\/i577\/50718194\/bioinformatics_23_13_i577.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/13\/i577\/50718194\/bioinformatics_23_13_i577.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T08:54:34Z","timestamp":1687856074000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/13\/i577\/237816"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,7,1]]},"references-count":25,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2007,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm227","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,7]]},"published":{"date-parts":[[2007,7,1]]}}}