{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T15:15:05Z","timestamp":1764688505650},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis as many statistical and machine learning techniques either require a complete dataset or their results are significantly dependent on the quality of such estimates. A limitation of the existing estimation methods for microarray data is that they use no external information but the estimation is based solely on the expression data. We hypothesized that utilizing a priori information on functional similarities available from public databases facilitates the missing value estimation.<\/jats:p>\n               <jats:p>Results: We investigated whether semantic similarity originating from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was automatically estimated from the data using an adaptive weight selection procedure. Our experimental results in yeast cDNA microarray datasets indicated that by considering GO information in the k-nearest neighbor algorithm we can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The increase of performance was less evident with a more sophisticated estimation method. 
We conclude that even a small proportion of annotated genes can provide improvements in data quality significant for the eventual interpretation of the microarray experiments.<\/jats:p>\n               <jats:p>Availability: Java and Matlab codes are available on request from the authors.<\/jats:p>\n               <jats:p>Supplementary material: Available online at<\/jats:p>\n               <jats:p>Contact: jotatu@utu.fi<\/jats:p>","DOI":"10.1093\/bioinformatics\/btk019","type":"journal-article","created":{"date-parts":[[2005,12,24]],"date-time":"2005-12-24T01:13:44Z","timestamp":1135386824000},"page":"566-572","source":"Crossref","is-referenced-by-count":83,"title":["Improving missing value estimation in microarray data with gene ontology"],"prefix":"10.1093","volume":"22","author":[{"given":"Johannes","family":"Tuikkala","sequence":"first","affiliation":[{"name":"Department of Information Technology, University of Turku, Lemmink\u00e4isenkatu 14A, FIN-20520, Finland"},{"name":"Turku Centre for Computer Science (TUCS), Lemmink\u00e4isenkatu 14A, FIN-20520, Finland"}]},{"given":"Laura","family":"Elo","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Turku, FIN-20014, Finland"},{"name":"Turku Centre for Computer Science (TUCS), Lemmink\u00e4isenkatu 14A, FIN-20520, Finland"},{"name":"Turku Centre for Biotechnology, Tykist\u00f6katu 6, FIN-20521, Finland"}]},{"given":"Olli S.","family":"Nevalainen","sequence":"additional","affiliation":[{"name":"Department of Information Technology, University of Turku, Lemmink\u00e4isenkatu 14A, FIN-20520, Finland"},{"name":"Turku Centre for Computer Science (TUCS), Lemmink\u00e4isenkatu 14A, FIN-20520, Finland"},{"name":"Turku Centre for Biotechnology, Tykist\u00f6katu 6, FIN-20521, 
Finland"}]},{"given":"Tero","family":"Aittokallio","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Turku, FIN-20014, Finland"},{"name":"Turku Centre for Computer Science (TUCS), Lemmink\u00e4isenkatu 14A, FIN-20520, Finland"},{"name":"Turku Centre for Biotechnology, Tykist\u00f6katu 6, FIN-20521, Finland"}]}],"member":"286","published-online":{"date-parts":[[2005,12,23]]},"reference":[{"key":"2023012408523669900_b1","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/1471-2105-5-18","article-title":"Quantifying the relationship between co-expression, co-regulation and gene function","volume":"5","author":"Allocco","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012408523669900_b2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012408523669900_b3","doi-asserted-by":"crossref","first-page":"1464","DOI":"10.1093\/bioinformatics\/bth088","article-title":"GOstat: find statistically overrepresented Gene Ontologies within a group of genes","volume":"20","author":"Bei\u00dfbarth","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b4","doi-asserted-by":"crossref","first-page":"e34","DOI":"10.1093\/nar\/gnh026","article-title":"LSimpute: accurate estimation of missing values in microarray data with least squares method","volume":"32","author":"B\u00f8","year":"2004","journal-title":"Nucleic Acids Res."},{"key":"2023012408523669900_b5","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1016\/j.jmva.2004.02.001","article-title":"Ontology concepts and tools for statistical genomics","volume":"90","author":"Carey","year":"2003","journal-title":"J. 
Multivariate Anal."},{"key":"2023012408523669900_b6","doi-asserted-by":"crossref","first-page":"114","DOI":"10.1186\/1471-2105-5-114","article-title":"Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering","volume":"5","author":"De Brevern","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012408523669900_b7","doi-asserted-by":"crossref","first-page":"680","DOI":"10.1126\/science.278.5338.680","article-title":"Exploring the metabolic and genetic control of gene expression on a genomic scale","volume":"278","author":"DeRisi","year":"1997","journal-title":"Science"},{"key":"2023012408523669900_b8","doi-asserted-by":"crossref","first-page":"10","DOI":"10.2202\/1544-6115.1120","article-title":"Prediction of missing values in microarray and use of mixed models to evaluate the predictors","volume":"4","author":"Feten","year":"2005","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023012408523669900_b9","doi-asserted-by":"crossref","first-page":"1574","DOI":"10.1101\/gr.397002","article-title":"Judging the quality of gene expression-based clustering methods using gene annotation","volume":"12","author":"Gibbons","year":"2002","journal-title":"Genome Res."},{"key":"2023012408523669900_b10","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1093\/bioinformatics\/bth499","article-title":"Missing value estimation for DNA microarray gene expression data: local least squares imputation","volume":"21","author":"Kim","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b11","doi-asserted-by":"crossref","first-page":"160","DOI":"10.1186\/1471-2105-5-160","article-title":"Reuse of imputed data in microarray analysis increases imputation efficiency","volume":"5","author":"Kim","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012408523669900_b12","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1093\/bioinformatics\/btg420","article-title":"A 
graph-theoretic modelling on GO space for biological interpretation of gene clusters","volume":"20","author":"Lee","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b13","doi-asserted-by":"crossref","first-page":"13167","DOI":"10.1073\/pnas.1733249100","article-title":"Robust singular value decomposition analysis of microarray data","volume":"100","author":"Liu","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012408523669900_b14","volume-title":"Statistical Analysis with Missing Data","author":"Little","year":"1987"},{"key":"2023012408523669900_b15","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1038\/35015701","article-title":"Genomics, gene expression and DNA arrays","volume":"405","author":"Lockhart","year":"2000","journal-title":"Nature"},{"key":"2023012408523669900_b16","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b17","doi-asserted-by":"crossref","first-page":"2088","DOI":"10.1093\/bioinformatics\/btg287","article-title":"A Bayesian missing value estimation method for gene expression profile data","volume":"19","author":"Oba","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b18","doi-asserted-by":"crossref","first-page":"4309","DOI":"10.1091\/mbc.11.12.4309","article-title":"New components of a system for phosphate accumulation and polyphosphate metabolism in Saccharomyces cerevisiae revealed by genomic expression analysis","volume":"11","author":"Ogawa","year":"2000","journal-title":"Mol. Biol. 
Cell"},{"key":"2023012408523669900_b19","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1093\/bioinformatics\/bth007","article-title":"Gaussian mixture clustering and imputation of microarray data","volume":"20","author":"Ouyang","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b20","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1186\/1471-2105-6-59","article-title":"Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein","volume":"6","author":"Raghava","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012408523669900_b21","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1126\/science.270.5235.467","article-title":"Quantitative monitoring of gene expression patterns with a complementary DNA microarray","volume":"270","author":"Schena","year":"1995","journal-title":"Science"},{"key":"2023012408523669900_b22","doi-asserted-by":"crossref","first-page":"2417","DOI":"10.1093\/bioinformatics\/bti345","article-title":"Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data","volume":"21","author":"Sehgal","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b23","doi-asserted-by":"crossref","first-page":"3273","DOI":"10.1091\/mbc.9.12.3273","article-title":"Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization","volume":"9","author":"Spellman","year":"1998","journal-title":"Mol. Biol. 
Cell"},{"key":"2023012408523669900_b24","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarray","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012408523669900_b25","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1038\/46567","article-title":"Chromosomal landscape of nucleosome-dependent gene expression and silencing in yeast","volume":"402","author":"Wyrick","year":"1999","journal-title":"Nature"},{"key":"2023012408523669900_b26","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/bioinformatics\/btg323","article-title":"Missing-value estimation using linear and non-linear regression with Bayesian gene selection","volume":"19","author":"Zhou","year":"2003","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/5\/566\/48839027\/bioinformatics_22_5_566.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/5\/566\/48839027\/bioinformatics_22_5_566.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T09:30:03Z","timestamp":1674552603000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/5\/566\/205669"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,12,23]]},"references-count":26,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2006,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btk019","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other"
:{"date-parts":[[2006,3,1]]},"published":{"date-parts":[[2005,12,23]]}}}