{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T12:06:34Z","timestamp":1774785994265,"version":"3.50.1"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: A major challenge in the interpretation of high-throughput genomic data is understanding the functional associations between genes. Previously, several approaches have been described to extract gene relationships from various biological databases using term-matching methods. However, more flexible automated methods are needed to identify functional relationships (both explicit and implicit) between genes from the biomedical literature. In this study, we explored the utility of Latent Semantic Indexing (LSI), a vector space model for information retrieval, to automatically identify conceptual gene relationships from titles and abstracts in MEDLINE citations.<\/jats:p><jats:p>Results: We found that LSI identified gene-to-gene and keyword-to-gene relationships with high average precision. In addition, LSI identified implicit gene relationships based on word usage patterns in the gene abstract documents. Finally, we demonstrate here that pairwise distances derived from the vector angles of gene abstract documents can be effectively used to functionally group genes by hierarchical clustering. Our results provide proof-of-principle that LSI is a robust automated method to elucidate both known (explicit) and unknown (implicit) gene relationships from the biomedical literature. 
These features make LSI particularly useful for the analysis of novel associations discovered in genomic experiments.<\/jats:p><jats:p>Availability: The 50-gene document collection used in this study can be interactively queried at http:\/\/shad.cs.utk.edu\/sgo\/sgo.html<\/jats:p><jats:p>Contact: \u00a0rhomayouni@utmem.edu<\/jats:p><jats:p>Supplementary information: \u00a0http:\/\/shad.cs.utk.edu\/sgo\/pubs.html<\/jats:p>","DOI":"10.1093\/bioinformatics\/bth464","type":"journal-article","created":{"date-parts":[[2004,8,13]],"date-time":"2004-08-13T00:15:36Z","timestamp":1092356136000},"page":"104-115","source":"Crossref","is-referenced-by-count":109,"title":["Gene clustering by Latent Semantic Indexing of MEDLINE abstracts"],"prefix":"10.1093","volume":"21","author":[{"given":"Ramin","family":"Homayouni","sequence":"first","affiliation":[]},{"given":"Kevin","family":"Heinrich","sequence":"additional","affiliation":[]},{"given":"Lai","family":"Wei","sequence":"additional","affiliation":[]},{"given":"Michael W.","family":"Berry","sequence":"additional","affiliation":[]}],"member":"286","published-online":{"date-parts":[[2004,8,12]]},"reference":[{"key":"2023013107190605800_B1","doi-asserted-by":"crossref","unstructured":"Arnaud, L., Ballif, B.A., Forster, E., Cooper, J.A. 2003Fyn tyrosine kinase is a critical regulator of disabled-1 during brain development. Curr. Biol.139\u201317","DOI":"10.1016\/S0960-9822(02)01397-0"},{"key":"2023013107190605800_B2","doi-asserted-by":"crossref","unstructured":"Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. 2000Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet.2525\u201329","DOI":"10.1038\/75556"},{"key":"2023013107190605800_B3","unstructured":"Baeza-Yates, R. and Ribeiro-Neto, B. 
Modern Information Retrieval1999, New York ACM Press"},{"key":"2023013107190605800_B4","doi-asserted-by":"crossref","unstructured":"Becker, K.G., Hosack, D.A., Dennis, G., Jr, Lempicki, R.A., Bright, T.J., Cheadle, C., Engel, J. 2003PubMatrix: a tool for multiplex literature mining. BMC Bioinformatics4, pp. 61","DOI":"10.1186\/1471-2105-4-61"},{"key":"2023013107190605800_B5","unstructured":"Berry, M.W. 1992Large scale singular value computations. Int. J. Supercomputer App.613\u201349"},{"key":"2023013107190605800_B6","unstructured":"Berry, M.W. and Browne, M. Understanding Search Engines: Mathematical Modeling and Text Retrieval1999, Philadelphia SIAM"},{"key":"2023013107190605800_B7","doi-asserted-by":"crossref","unstructured":"Berry, M.W., Drmac, Z., Jessup, E. 1999Matrices, vector spaces, and information retrieval. SIAM Rev.41, pp. 335\u2013362","DOI":"10.1137\/S0036144598347035"},{"key":"2023013107190605800_B8","unstructured":"Berry, M.W., Dumais, S., O'Brien, G. 1995Using linear algebra for intelligent information retrieval. SIAM Rev.37573\u2013595"},{"key":"2023013107190605800_B9","doi-asserted-by":"crossref","unstructured":"Bock, H.H. and Herz, J. 2003Reelin activates SRC family tyrosine kinases in neurons. Curr. Biol.1318\u201326","DOI":"10.1016\/S0960-9822(02)01403-3"},{"key":"2023013107190605800_B10","doi-asserted-by":"crossref","unstructured":"Brich, J., Shie, F.S., Howell, B.W., Li, R., Tus, K., Wakeland, E.K., Jin, L.W., Mumby, M., Churchill, G., Herz, J., Cooper, J.A. 2003Genetic modulation of tau phosphorylation in the mouse. J. Neurosci.23187\u2013192","DOI":"10.1523\/JNEUROSCI.23-01-00187.2003"},{"key":"2023013107190605800_B11","unstructured":"Chen, C., Stoffel, N., Post, M., Basu, C., Bassu, D., Behrens, C. Aberer, K. and Liu, L. 2001Telcordia LSI engine: implementation and scalability issues. Proceedings of the 11th International Workshop on Research Issues in Data Engineering , Germany Heidelberg, pp. 
51\u201358"},{"key":"2023013107190605800_B12","unstructured":"D'Arcangelo, G., Homayouni, R., Keshvara, L., Rice, D.S., Sheldon, M., Curran, T. 1999Reelin is a ligand for lipoprotein receptors. Neuron24471\u2013479"},{"key":"2023013107190605800_B13","unstructured":"D'Arcangelo, G., Miao, G.G., Chen, S.C., Soares, H.D., Morgan, J.I., Curran, T. 1995A protein related to extracellular matrix proteins deleted in the mouse mutant reeler. Nature374719\u2013723"},{"key":"2023013107190605800_B14","unstructured":"Deerwester, S.C., Dumais, S.T., Furnas, G.W., Harshman, R.A., Landauer, T.K., Lochbaum, K.E., Streeter, L.A. Computer Information Retrieval Using Latent Semantic Structure1988, USA Bell Communications Research, Inc"},{"key":"2023013107190605800_B15","doi-asserted-by":"crossref","unstructured":"Deerwester, S.C., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A. 1990Indexing by latent semantic analysis. J. Am. Soc. Inform. Sci.41, pp. 391\u2013407","DOI":"10.1002\/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9"},{"key":"2023013107190605800_B16","doi-asserted-by":"crossref","unstructured":"Doniger, S.W., Salomonis, N., Dahlquist, K.D., Vranizan, K., Lawlor, S.C., Conklin, B.R. 2003MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol.4R7","DOI":"10.1186\/gb-2003-4-1-r7"},{"key":"2023013107190605800_B17","unstructured":"Dumais, S. 1991Improving the retrieval of information from external sources. Behavior Res. Meth. Instr. Comp.23229\u2013236"},{"key":"2023013107190605800_B18","unstructured":"Fitch, W.M. and Margoliash, E. 1967Construction of phylogenetic trees. Science155279\u2013284"},{"key":"2023013107190605800_B19","unstructured":"Foltz, P.W., Laham, D., Landauer, T.K. 1999Automated essay scoring: applications to educational technology. Proceedings of the World Conference on Educational Multimedia, Hypermedia and Telecommunications , pp. 
939\u2013944"},{"key":"2023013107190605800_B20","unstructured":"Funk, M.E. and Reid, C.A. 1983Indexing consistency in MEDLINE. Bull. Med. Libr. Assoc.71176\u2013183"},{"key":"2023013107190605800_B21","doi-asserted-by":"crossref","unstructured":"Giles, J.T., Wo, L., Berry, M.W. 2003GTP (General Text Parser) software for Text mining. In Bozdogan, H. (Ed.). Statistical Data Mining and Knowledge Discovery , Boca Raton, FL CRC Press","DOI":"10.1201\/9780203497159.ch27"},{"key":"2023013107190605800_B22","doi-asserted-by":"crossref","unstructured":"Glenisson, P., Antal, P., Mathys, J., Moreau, Y., De Moor, B. 2003Evaluation of the vector space representation in text-based gene clustering. Pac. Symp. Biocomput. , pp. 391\u2013402","DOI":"10.1142\/9789812776303_0037"},{"key":"2023013107190605800_B23","unstructured":"Golub, G.H. and Van Loan, C.F. Matrix Computations1996, Baltimore Johns-Hopkins"},{"key":"2023013107190605800_B24","doi-asserted-by":"crossref","unstructured":"Hiesberger, T., Trommsdorff, M., Howell, B.W., Goffinet, A., Mumby, M.C., Cooper, J.A., Herz, J. 1999Direct binding of Reelin to VLDL receptor and ApoE receptor 2 induces tyrosine phosphorylation of disabled-1 and modulates tau phosphorylation. Neuron24, pp. 481\u2013489","DOI":"10.1016\/S0896-6273(00)80861-2"},{"key":"2023013107190605800_B25","doi-asserted-by":"crossref","unstructured":"Homayouni, R., Rice, D.S., Sheldon, M., Curran, T. 1999Disabled-1 binds to the cytoplasmic domain of amyloid precursor-like protein 1. J. Neurosci.197507\u20137515","DOI":"10.1523\/JNEUROSCI.19-17-07507.1999"},{"key":"2023013107190605800_B26","doi-asserted-by":"crossref","unstructured":"Hosack, D.A., Dennis, G., Jr, Sherman, B.T., Lane, H.C., Lempicki, R.A. 2003Identifying biological themes within lists of genes with EASE. Genome Biol.4R70","DOI":"10.1186\/gb-2003-4-6-p4"},{"key":"2023013107190605800_B27","doi-asserted-by":"crossref","unstructured":"Howell, B.W., Gertler, F.B., Cooper, J.A. 
1997Mouse disabled (mDab1): a Src binding protein implicated in neuronal development. EMBO J.16121\u2013132","DOI":"10.1093\/emboj\/16.1.121"},{"key":"2023013107190605800_B28","doi-asserted-by":"crossref","unstructured":"Howell, B.W., Lanier, L.M., Frank, R., Gertler, F.B., Cooper, J.A. 1999The disabled 1 phosphotyrosine-binding domain binds to the internalization signals of transmembrane glycoproteins and to phospholipids. Mol. Cell. Biol.195179\u20135188","DOI":"10.1128\/MCB.19.7.5179"},{"key":"2023013107190605800_B29","doi-asserted-by":"crossref","unstructured":"Jenssen, T.K., Laegreid, A., Komorowski, J., Hovig, E. 2001A literature network of human genes for high-throughput analysis of gene expression. Nat. Genet.2821\u201328","DOI":"10.1038\/ng0501-21"},{"key":"2023013107190605800_B30","unstructured":"Kanehisa, M. and Goto, S. 2000KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res.2827\u201330"},{"key":"2023013107190605800_B31","doi-asserted-by":"crossref","unstructured":"Keshvara, L., Magdaleno, S., Benhayon, D., Curran, T. 2002Cyclin-dependent kinase 5 phosphorylates disabled 1 independently of Reelin signaling. J. Neurosci.224869\u20134877","DOI":"10.1523\/JNEUROSCI.22-12-04869.2002"},{"key":"2023013107190605800_B32","doi-asserted-by":"crossref","unstructured":"Kwon, Y.T. and Tsai, L.H. 1998A novel disruption of cortical development in p35(\u2212\/\u2212) mice distinct from reeler. J. Comp. Neurol.395510\u2013522","DOI":"10.1002\/(SICI)1096-9861(19980615)395:4<510::AID-CNE7>3.0.CO;2-4"},{"key":"2023013107190605800_B33","doi-asserted-by":"crossref","unstructured":"Kwon, Y.T. and Tsai, L.H. 2000The role of the p35\/cdk5 kinase in cortical development. Results Probl. Cell Differ.30241\u2013253","DOI":"10.1007\/978-3-540-48002-0_10"},{"key":"2023013107190605800_B34","doi-asserted-by":"crossref","unstructured":"Landauer, T.K., Laham, D., Derr, M. 2004From paragraph to graph: latent semantic analysis for information visualization. Proc. Natl Acad. 
Sci., USA1015214\u20135219","DOI":"10.1073\/pnas.0400341101"},{"key":"2023013107190605800_B35","unstructured":"Landauer, T.K., Laham, D., Foltz, P.W. 1998Learning human-like knowledge by singular value decomposition: a progress report. In Jordan, M.I., Kearns, M.J., Solla, S.A. (Eds.). Advances in Neural Information Processing Systems , Cambridge MIT Press vol. 10, pp. 45\u201351"},{"key":"2023013107190605800_B36","doi-asserted-by":"crossref","unstructured":"Lee, M.S. and Tsai, L.H. 2003Cdk5: one of the links between senile plaques and neurofibrillary tangles?. J. Alzheimers Dis.5127\u2013137","DOI":"10.3233\/JAD-2003-5207"},{"key":"2023013107190605800_B37","doi-asserted-by":"crossref","unstructured":"Masys, D.R., Welsh, J.B., Lynn Fink, J., Gribskov, M., Klacansky, I., Corbeil, J. 2001Use of keyword hierarchies to interpret gene expression patterns. Bioinformatics17319\u2013326","DOI":"10.1093\/bioinformatics\/17.4.319"},{"key":"2023013107190605800_B38","doi-asserted-by":"crossref","unstructured":"Pruitt, K.D. and Maglott, D.R. 2001RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res.29137\u2013140","DOI":"10.1093\/nar\/29.1.137"},{"key":"2023013107190605800_B39","doi-asserted-by":"crossref","unstructured":"Rice, D.S. and Curran, T. 2001Role of the reelin signaling pathway in central nervous system development. Annu. Rev. Neurosci.241005\u20131039","DOI":"10.1146\/annurev.neuro.24.1.1005"},{"key":"2023013107190605800_B40","unstructured":"Selkoe, D.J. 2001Alzheimer's disease: genes, proteins, and therapy. Physiol. Rev.81741\u2013766"},{"key":"2023013107190605800_B41","unstructured":"Shatkay, H. and Feldman, R. 2003Mining the biomedical literature in the genomic era: an overview. J. Comput. Biol.10821\u2013855"},{"key":"2023013107190605800_B42","doi-asserted-by":"crossref","unstructured":"Sheldon, M., Rice, D.S., D'Arcangelo, G., Yoneshima, H., Nakajima, K., Mikoshiba, K., Howell, B.W., Cooper, J.A., Goldowitz, D., Curran, T. 
1997Scrambler and yotari disrupt the disabled gene and produce a reeler-like phenotype in mice. Nature389730\u2013733","DOI":"10.1038\/39601"},{"key":"2023013107190605800_B43","doi-asserted-by":"crossref","unstructured":"Smalheiser, N.R. and Swanson, D.R. 1998Using ARROWSMITH: a computer-assisted approach to formulating and assessing scientific hypotheses. Comput. Meth. Programs Biomed.57149\u2013153","DOI":"10.1016\/S0169-2607(98)00033-9"},{"key":"2023013107190605800_B44","doi-asserted-by":"crossref","unstructured":"Stuart, G.W. and Berry, M.W. 2003A comprehensive whole genome bacterial phylogeny using correlated peptide motifs defined in a high dimensional vector space. J. Bioinformatics Comput. Biol.1475\u2013493","DOI":"10.1142\/S0219720003000265"},{"key":"2023013107190605800_B45","unstructured":"Tissir, F. and Goffinet, A.M. 2003Reelin and brain development. Nat. Rev. Neurosci.4496\u2013505"},{"key":"2023013107190605800_B46","doi-asserted-by":"crossref","unstructured":"Trommsdorff, M., Borg, J.P., Margolis, B., Herz, J. 1998Interaction of cytosolic adaptor proteins with neuronal apolipoprotein E receptors and the amyloid precursor protein. J. Biol. Chem.27333556\u201333560","DOI":"10.1074\/jbc.273.50.33556"},{"key":"2023013107190605800_B47","unstructured":"Wilkinson, D.M. and Huberman, B.A. 2004A method for finding communities of related genes. Proc. Natl Acad. Sci., USA1015241\u20135248"},{"key":"2023013107190605800_B48","unstructured":"Yandell, M.D. and Majoros, W.H. 2002Genomics and natural language processing. Nat. Rev. Genet.3601\u2013610"},{"key":"2023013107190605800_B49","doi-asserted-by":"crossref","unstructured":"Zambrano, N., Gianni, D., Bruni, P., Passaro, F., Telese, F., Russo, T. 2004Fe65 is not involved in the platelet-derived growth factor-induced processing of Alzheimer's amyloid precursor protein, which activates its caspase-directed cleavage. J. Biol. 
Chem.27916161\u201316169","DOI":"10.1074\/jbc.M311027200"},{"key":"2023013107190605800_B50","unstructured":"Zmasek, C.M. and Eddy, S.R. 2001ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics17383\u2013384"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/1\/104\/48961973\/bioinformatics_21_1_104.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/1\/104\/48961973\/bioinformatics_21_1_104.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,18]],"date-time":"2024-12-18T06:53:58Z","timestamp":1734504838000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/1\/104\/212452"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,8,12]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2005,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bth464","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,1,1]]},"published":{"date-parts":[[2004,8,12]]}}}