{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T09:43:33Z","timestamp":1776332613168,"version":"3.50.1"},"reference-count":35,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2022,11,3]],"date-time":"2022-11-03T00:00:00Z","timestamp":1667433600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Genotype imputation has a wide range of applications in genome-wide association study (GWAS), including increasing the statistical power of association tests, discovering trait-associated loci in meta-analyses, and prioritizing causal variants with fine-mapping. In recent years, deep learning (DL) based methods, such as sparse convolutional denoising autoencoder (SCDA), have been developed for genotype imputation. However, it remains a challenging task to optimize the learning process in DL-based methods to achieve high imputation accuracy. To address this challenge, we have developed a convolutional autoencoder (AE) model for genotype imputation and implemented a customized training loop by modifying the training process with a single batch loss rather than the average loss over batches. This modified AE imputation model was evaluated using a yeast dataset, the human leukocyte antigen (HLA) data from the 1,000 Genomes Project (1KGP), and our in-house genotype data from the Louisiana Osteoporosis Study (LOS). Our modified AE imputation model has achieved comparable or better performance than the existing SCDA model in terms of evaluation metrics such as the concordance rate (CR), the Hellinger score, the scaled Euclidean norm (SEN) score, and the imputation quality score (IQS) in all three datasets. Taking the imputation results from the HLA data as an example, the AE model achieved an average CR of 0.9468 and 0.9459, Hellinger score of 0.9765 and 0.9518, SEN score of 0.9977 and 0.9953, and IQS of 0.9515 and 0.9044 at missing ratios of 10% and 20%, respectively. As for the results of LOS data, it achieved an average CR of 0.9005, Hellinger score of 0.9384, SEN score of 0.9940, and IQS of 0.8681 at the missing ratio of 20%. In summary, our proposed method for genotype imputation has a great potential to increase the statistical power of GWAS and improve downstream post-GWAS analyses.<\/jats:p>","DOI":"10.3389\/frai.2022.1028978","type":"journal-article","created":{"date-parts":[[2022,11,3]],"date-time":"2022-11-03T18:29:24Z","timestamp":1667500164000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["An autoencoder-based deep learning method for genotype imputation"],"prefix":"10.3389","volume":"5","author":[{"given":"Meng","family":"Song","sequence":"first","affiliation":[]},{"given":"Jonathan","family":"Greenbaum","sequence":"additional","affiliation":[]},{"suffix":"IV","given":"Joseph","family":"Luttrell","sequence":"additional","affiliation":[]},{"given":"Weihua","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Chong","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Zhe","family":"Luo","sequence":"additional","affiliation":[]},{"given":"Chuan","family":"Qiu","sequence":"additional","affiliation":[]},{"given":"Lan Juan","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Kuan-Jui","family":"Su","sequence":"additional","affiliation":[]},{"given":"Qing","family":"Tian","sequence":"additional","affiliation":[]},{"given":"Hui","family":"Shen","sequence":"additional","affiliation":[]},{"given":"Huixiao","family":"Hong","sequence":"additional","affiliation":[]},{"given":"Ping","family":"Gong","sequence":"additional","affiliation":[]},{"given":"Xinghua","family":"Shi","sequence":"additional","affiliation":[]},{"given":"Hong-Wen","family":"Deng","sequence":"additional","affiliation":[]},{"given":"Chaoyang","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2022,11,3]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"eabl3533","DOI":"10.1126\/science.abl3533","article-title":"A complete reference genome improves analysis of human genetic variation","volume":"376","author":"Aganezov","year":"2022","journal-title":"Science"},{"key":"B2","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"key":"B3","doi-asserted-by":"publisher","first-page":"8712","DOI":"10.1038\/ncomms9712","article-title":"Genetic interactions contribute less than additive effects to quantitative trait variation in yeast","volume":"6","author":"Bloom","year":"2015","journal-title":"Nat. Commun."},{"key":"B4","doi-asserted-by":"publisher","first-page":"338","DOI":"10.1016\/j.ajhg.2018.07.015","article-title":"A one-penny imputed genome from next-generation reference panels","volume":"103","author":"Browning","year":"2018","journal-title":"Am. J. Hum. Genet."},{"key":"B5","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/s00335-021-09914-z","article-title":"Best practices for analyzing imputed genotypes from low-pass sequencing in dogs","volume":"33","author":"Buckley","year":"2022","journal-title":"Mamm. Genome"},{"key":"B6","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1038\/nature05911","article-title":"Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls","volume":"447","author":"Burton","year":"2007","journal-title":"Nature"},{"key":"B7","doi-asserted-by":"publisher","first-page":"652","DOI":"10.3390\/genes10090652","article-title":"Sparse convolutional denoising autoencoders for genotype imputation","volume":"10","author":"Chen","year":"2019","journal-title":"Genes"},{"key":"B8","doi-asserted-by":"publisher","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"B9","doi-asserted-by":"publisher","first-page":"giab008","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"Danecek","year":"2021","journal-title":"GigaScience"},{"key":"B10","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1146\/annurev-genom-083117-021602","article-title":"Genotype imputation from large reference panels","volume":"19","author":"Das","year":"2018","journal-title":"Annu. Rev. Genom. Hum. Genet."},{"key":"B11","doi-asserted-by":"publisher","first-page":"1284","DOI":"10.1038\/ng.3656","article-title":"Next-generation genotype imputation service and methods","volume":"48","author":"Das","year":"2016","journal-title":"Nat. Genet."},{"key":"B12","doi-asserted-by":"publisher","first-page":"1104","DOI":"10.1038\/s41588-021-00877-0","article-title":"Rapid genotype imputation from sequence with reference panels","volume":"53","author":"Davies","year":"2021","journal-title":"Nat. Genet."},{"key":"B13","doi-asserted-by":"publisher","first-page":"901","DOI":"10.1186\/1756-0500-7-901","article-title":"Genotype harmonizer: automatic strand alignment and format conversion for genotype data integration","volume":"7","author":"Deelen","year":"2014","journal-title":"BMC Res. Notes"},{"key":"B14","doi-asserted-by":"publisher","first-page":"782","DOI":"10.1093\/bioinformatics\/btu704","article-title":"minimac2: faster genotype imputation","volume":"31","author":"Fuchsberger","year":"2015","journal-title":"Bioinformatics"},{"key":"B15","doi-asserted-by":"publisher","first-page":"e03395","DOI":"10.1016\/j.heliyon.2020.e03395","article-title":"DCNN for condition monitoring and fault detection in rotating machines and its contribution to the understanding of machine nature","volume":"6","author":"Gonz\u00e1lez-Mu\u00f1iz","year":"2020","journal-title":"Heliyon"},{"key":"B16","article-title":"Autoencoders,","volume-title":"Deep Learning","author":"Goodfellow","year":"2016"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1067","DOI":"10.1093\/hmg\/ddab305","article-title":"A multiethnic whole genome sequencing study to identify novel loci for bone mineral density","volume":"31","author":"Greenbaum","year":"2022","journal-title":"Hum. Mol. Genet."},{"key":"B18","doi-asserted-by":"publisher","first-page":"486","DOI":"10.1016\/S2095-3119(21)63695-X","article-title":"A comprehensive evaluation of factors affecting the accuracy of pig genotype imputation using a single or multi-breed reference population","volume":"21","author":"Kai-li","year":"2022","journal-title":"J. Integr. Agric."},{"key":"B19","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"B20","doi-asserted-by":"publisher","first-page":"816","DOI":"10.1002\/gepi.20533","article-title":"MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes","volume":"34","author":"Li","year":"2010","journal-title":"Genet. Epidemiol."},{"key":"B21","doi-asserted-by":"publisher","first-page":"e9697","DOI":"10.1371\/journal.pone.0009697","article-title":"A new statistic to evaluate imputation reliability","volume":"5","author":"Lin","year":"2010","journal-title":"PLOS ONE"},{"key":"B22","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res."},{"key":"B23","doi-asserted-by":"publisher","first-page":"1639","DOI":"10.1038\/s41467-021-21975-x","article-title":"A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes","volume":"12","author":"Naito","year":"2021","journal-title":"Nat. Commun."},{"key":"B24","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1086\/519795","article-title":"PLINK: a tool set for whole-genome association and population-based linkage analyses","volume":"81","author":"Purcell","year":"2007","journal-title":"Am. J. Hum. Genet."},{"key":"B25","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1186\/s12863-014-0088-5","article-title":"Impact of pre-imputation SNP-filtering on genotype imputation results","volume":"15","author":"Roshyara","year":"2014","journal-title":"BMC Genet."},{"key":"B26","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1007\/978-94-6351-086-8_4","article-title":"Paired samples T-test,","volume-title":"Basic and Advanced Statistical Tests: Writing Results Sections and Creating Tables and Figures","author":"Ross","year":"2017"},{"key":"B27","doi-asserted-by":"publisher","first-page":"e1009049","DOI":"10.1371\/journal.pgen.1009049","article-title":"Genotype imputation using the Positional Burrows Wheeler Transform","volume":"16","author":"Rubinacci","year":"2020","journal-title":"PLOS Genet."},{"key":"B28","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1038\/s41588-020-00756-0","article-title":"Efficient phasing and imputation of low-coverage sequencing data using large reference panels","volume":"53","author":"Rubinacci","year":"2021","journal-title":"Nat. Genet."},{"key":"B29","doi-asserted-by":"publisher","first-page":"629","DOI":"10.1086\/502802","article-title":"A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase","volume":"78","author":"Scheet","year":"2006","journal-title":"Am. J. Hum. Genet."},{"key":"B30","doi-asserted-by":"publisher","first-page":"1341","DOI":"10.1126\/science.1142382","article-title":"A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants","volume":"316","author":"Scott","year":"2007","journal-title":"Science"},{"key":"B31","doi-asserted-by":"publisher","first-page":"570255","DOI":"10.3389\/fgene.2020.570255","article-title":"A review of integrative imputation for multi-omics datasets","volume":"11","author":"Song","year":"2020","journal-title":"Front. Genet."},{"key":"B32","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1534\/genetics.117.200063","article-title":"GeneImp: fast imputation to large reference panels using genotype likelihoods from ultralow coverage sequencing","volume":"206","author":"Spiliopoulou","year":"2017","journal-title":"Genetics"},{"key":"B33","doi-asserted-by":"publisher","first-page":"724037","DOI":"10.3389\/fgene.2021.724037","article-title":"Assessment of imputation quality: comparison of phasing and imputation algorithms in real data","volume":"12","author":"Stahl","year":"2021","journal-title":"Front. Genet."},{"key":"B34","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1007\/978-1-0716-1103-6_13","article-title":"Accurate imputation of untyped variants from deep sequencing data,","volume-title":"Deep Sequencing Data Analysis Methods in Molecular Biology","author":"Torkamaneh","year":"2021"},{"key":"B35","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1093\/bfgp\/elw027","article-title":"Applications of the 1000 genomes project resources","volume":"16","author":"Zheng-Bradley","year":"2017","journal-title":"Briefings in Functional Genomics"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.1028978\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,11,3]],"date-time":"2022-11-03T18:29:28Z","timestamp":1667500168000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2022.1028978\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,3]]},"references-count":35,"alternative-id":["10.3389\/frai.2022.1028978"],"URL":"https:\/\/doi.org\/10.3389\/frai.2022.1028978","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,3]]},"article-number":"1028978"}}