{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,10]],"date-time":"2024-06-10T00:03:06Z","timestamp":1717977786481},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":480,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Predicting disease phenotypes from genotypes is a key challenge in medical applications in the postgenomic era. Large training datasets of patients that have been both genotyped and phenotyped are the key requisite when aiming for high prediction accuracy. With current genotyping projects producing genetic data for hundreds of thousands of patients, large-scale phenotyping has become the bottleneck in disease phenotype prediction.<\/jats:p><jats:p>Results: Here we present an approach for imputing missing disease phenotypes given the genotype of a patient. Our approach is based on co-training, which predicts the phenotype of unlabeled patients based on a second class of information, e.g. clinical health record information. Augmenting training datasets by this type of in silico phenotyping can lead to significant improvements in prediction accuracy. 
We demonstrate this on a dataset of patients with two diagnostic types of migraine, termed migraine with aura and migraine without aura, from the International Headache Genetics Consortium.<\/jats:p><jats:p>Conclusions: Imputing missing disease phenotypes for patients via co-training leads to larger training datasets and improved prediction accuracy in phenotype prediction.<\/jats:p><jats:p>Availability and implementation: The code can be obtained at: http:\/\/www.bsse.ethz.ch\/mlcb\/research\/bioinformatics-and-computational-biology\/co-training.html<\/jats:p><jats:p>Contact: \u00a0karsten.borgwardt@bsse.ethz.ch or menno.witteveen@bsse.ethz.ch<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv254","type":"journal-article","created":{"date-parts":[[2015,6,13]],"date-time":"2015-06-13T17:12:36Z","timestamp":1434215556000},"page":"i303-i310","source":"Crossref","is-referenced-by-count":7,"title":["<i>In silico<\/i> phenotyping via co-training for improved phenotype prediction from genotype"],"prefix":"10.1093","volume":"31","author":[{"given":"Damian","family":"Roqueiro","sequence":"first","affiliation":[{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Menno J.","family":"Witteveen","sequence":"additional","affiliation":[{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and 
Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Verneri","family":"Anttila","sequence":"additional","affiliation":[{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"},{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"},{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 
4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gisela M.","family":"Terwindt","sequence":"additional","affiliation":[{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arn M.J.M.","family":"van den Maagdenberg","sequence":"additional","affiliation":[{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"},{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 
Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karsten","family":"Borgwardt","sequence":"additional","affiliation":[{"name":"1 Machine Learning and Computational Biology Lab, Department of Biosystems Science and Engineering, ETH Zurich, Switzerland, 2Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, 3Program in Medical and Population Genetics and 4Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA, 5Department of Neurology and 6Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,6,10]]},"reference":[{"key":"2023020115421913700_btv254-B1","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"1000 Genomes Project Consortium et al.","year":"2012","journal-title":"Nature"},{"key":"2023020115421913700_btv254-B2","doi-asserted-by":"crossref","first-page":"224ed4","DOI":"10.1126\/scitranslmed.3008601","article-title":"UK biobank data: come and get it","volume":"6","author":"Allen","year":"2014","journal-title":"Science Trans. Med."},{"key":"2023020115421913700_btv254-B3","doi-asserted-by":"crossref","first-page":"869","DOI":"10.1038\/ng.652","article-title":"Genome-wide association study of migraine implicates a common susceptibility variant on 8q22.1","volume":"42","author":"Anttila","year":"2010","journal-title":"Nat. 
Genet."},{"key":"2023020115421913700_btv254-B4","doi-asserted-by":"crossref","DOI":"10.1145\/279943.279962","article-title":"Combining labeled and unlabeled data with co-training","volume-title":"Proceedings of the Eleventh Annual Conference on Computational Learning Theory","author":"Blum","year":"1998"},{"key":"2023020115421913700_btv254-B5","doi-asserted-by":"crossref","DOI":"10.2202\/1544-6115.1676","article-title":"Multiple imputation of missing phenotype data for QTL mapping","volume":"10","author":"Bobb","year":"2011","journal-title":"Stat. Appl. Genet. Mol. Biol."},{"key":"2023020115421913700_btv254-B6","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."},{"key":"2023020115421913700_btv254-B7","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"2023020115421913700_btv254-B8","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1104\/pp.114.243519","article-title":"Image-based high-throughput field phenotyping of crop roots","volume":"166","author":"Bucksch","year":"2014","journal-title":"Plant Physiol."},{"key":"2023020115421913700_btv254-B9","doi-asserted-by":"crossref","first-page":"375","DOI":"10.7551\/mitpress\/1120.003.0053","article-title":"PAC generalization bounds for co-training","volume-title":"Advances in Neural Information Processing Systems 14","author":"Dasgupta","year":"2002"},{"key":"2023020115421913700_btv254-B10","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1038\/nrg3012","article-title":"Genome-wide genetic marker discovery and genotyping using next-generation sequencing","volume":"12","author":"Davey","year":"2011","journal-title":"Nat. Rev. 
Genet."},{"key":"2023020115421913700_btv254-B11","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1111\/j.0006-341X.1999.00997.x","article-title":"Genomic control for association studies","volume":"55","author":"Devlin","year":"1999","journal-title":"Biometrics"},{"key":"2023020115421913700_btv254-B12","doi-asserted-by":"crossref","first-page":"777","DOI":"10.1038\/ng.2307","article-title":"Genome-wide association analysis identifies susceptibility loci for migraine without aura","volume":"44","author":"Freilinger","year":"2012","journal-title":"Nat. Genet."},{"key":"2023020115421913700_btv254-B13","article-title":"A systematic review of factors associated to m-health adoption by health care professionals","volume-title":"Medicine 2.0 Conference","author":"Gagnon","year":"2014"},{"key":"2023020115421913700_btv254-B14","first-page":"9","article-title":"The International Classification of Headache Disorders: 2nd edition","volume":"24","author":"Headache Classification Subcommittee, International Headache Society","year":"2004","journal-title":"Cephalalgia"},{"key":"2023020115421913700_btv254-B15","doi-asserted-by":"crossref","first-page":"3405","DOI":"10.1002\/sim.5804","article-title":"A note on the evaluation of novel biomarkers: do not rely on integrated discrimination improvement and net reclassification index","volume":"33","author":"Hilden","year":"2014","journal-title":"Stat. 
Med."},{"key":"2023020115421913700_btv254-B16","doi-asserted-by":"crossref","first-page":"1001","DOI":"10.1093\/bioinformatics\/bts081","article-title":"ShapePheno: unsupervised extraction of shape phenotypes from biological image collections","volume":"28","author":"Karaletsos","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115421913700_btv254-B17","doi-asserted-by":"crossref","first-page":"e1003200","DOI":"10.1371\/journal.pcbi.1003200","article-title":"Predicting disease risk using bootstrap ranking and classification algorithms","volume":"9","author":"Manor","year":"2013","journal-title":"PLoS Comput. Biol."},{"key":"2023020115421913700_btv254-B18","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1038\/nature09796","article-title":"A decade\u2019s perspective on DNA sequencing technology","volume":"470","author":"Mardis","year":"2011","journal-title":"Nature"},{"key":"2023020115421913700_btv254-B19","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"2023020115421913700_btv254-B20","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1086\/519795","article-title":"PLINK: a toolset for whole-genome association and population-based linkage analysis","volume":"81","author":"Purcell","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"2023020115421913700_btv254-B21","doi-asserted-by":"crossref","first-page":"e1002141","DOI":"10.1371\/journal.pcbi.1002141","article-title":"Using electronic patient records to discover disease correlations and stratify patient cohorts","volume":"7","author":"Roque","year":"2011","journal-title":"PLoS Comput. 
Biol."},{"key":"2023020115421913700_btv254-B22","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1007\/s100440200011","article-title":"Bagging, boosting and the random subspace method for linear classifiers","volume":"5","author":"Skurichina","year":"2002","journal-title":"Pattern Anal. Appl."},{"key":"2023020115421913700_btv254-B23","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1038\/nature05911","article-title":"Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls","volume":"447","author":"Wellcome Trust Case Control Consortium","year":"2007","journal-title":"Nature"},{"key":"2023020115421913700_btv254-B24","article-title":"DNA sequencing costs: data from the NHGRI Genome Sequencing Program (GSP)","author":"Wetterstrand","year":"2013"},{"key":"2023020115421913700_btv254-B25","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1038\/nmeth.2848","article-title":"Efficient multivariate linear mixed model algorithms for genome-wide association studies","volume":"11","author":"Zhou","year":"2014","journal-title":"Nat. 
Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/12\/i303\/49013623\/bioinformatics_31_12_i303.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/12\/i303\/49013623\/bioinformatics_31_12_i303.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,9]],"date-time":"2024-06-09T14:00:37Z","timestamp":1717941637000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/12\/i303\/216227"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,6,10]]},"references-count":25,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv254","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,6,15]]},"published":{"date-parts":[[2015,6,10]]}}}