{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T07:46:20Z","timestamp":1775979980146,"version":"3.50.1"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2022,7,2]],"date-time":"2022-07-02T00:00:00Z","timestamp":1656720000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Vingroup Innovation Foundation","award":["VINIF.DA.2020.02"],"award-info":[{"award-number":["VINIF.DA.2020.02"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,7,18]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Despite the rapid development of sequencing technology, single-nucleotide polymorphism (SNP) arrays are still the most cost-effective genotyping solutions for large-scale genomic research and applications. Recent years have witnessed the rapid development of numerous genotyping platforms of different sizes and designs, but population-specific platforms are still lacking, especially for those in developing countries. SNP arrays designed for these countries should be cost-effective (small size), yet incorporate key information needed to associate genotypes with traits. A key design principle for most current platforms is to improve genome-wide imputation so that more SNPs not included in the array (imputed SNPs) can be predicted. However, current tag SNP selection methods mostly focus on imputation accuracy and coverage, but not the functional content of the array. It is those functional SNPs that are most likely associated with traits. Here, we propose LmTag, a novel method for tag SNP selection that not only improves imputation performance but also prioritizes highly functional SNP markers. We apply LmTag on a wide range of populations using both public and in-house whole-genome sequencing databases. Our results show that LmTag improved both functional marker prioritization and genome-wide imputation accuracy compared to existing methods. This novel approach could contribute to the next generation genotyping arrays that provide excellent imputation capability as well as facilitate array-based functional genetic studies. Such arrays are particularly suitable for under-represented populations in developing countries or non-model species, where little genomics data are available while investment in genome sequencing or high-density SNP arrays is limited. LmTag is available at: https:\/\/github.com\/datngu\/LmTag.<\/jats:p>","DOI":"10.1093\/bib\/bbac252","type":"journal-article","created":{"date-parts":[[2022,7,3]],"date-time":"2022-07-03T06:26:23Z","timestamp":1656829583000},"source":"Crossref","is-referenced-by-count":7,"title":["LmTag: functional-enrichment and imputation-aware tag SNP selection for population-specific genotyping arrays"],"prefix":"10.1093","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3852-9578","authenticated-orcid":false,"given":"Dat","family":"Thanh Nguyen","sequence":"first","affiliation":[{"name":"Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Quan","family":"Hoang Nguyen","sequence":"additional","affiliation":[{"name":"Institute for Molecular Bioscience, University of Queensland, St Lucia, QLD 4067, Brisbane, Australia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nguyen","family":"Thuy Duong","sequence":"additional","affiliation":[{"name":"Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam"},{"name":"Institute of Genome Research, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, 10000, Hanoi, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nam S","family":"Vo","sequence":"additional","affiliation":[{"name":"Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam"},{"name":"College of Engineering and Computer Science, VinUniversity, Vinhomes Ocean Park, 10000, Hanoi, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,7,2]]},"reference":[{"issue":"8","key":"2022072804092667400_ref1","doi-asserted-by":"crossref","first-page":"467","DOI":"10.1038\/s41576-019-0127-1","article-title":"Benefits and limitations of genome-wide association studies","volume":"20","author":"Tam","year":"2019","journal-title":"Nat Rev Genet"},{"issue":"1","key":"2022072804092667400_ref2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1016\/j.ajhg.2017.06.005","article-title":"10 years of GWAS discovery: biology, function, and translation","volume":"101","author":"Visscher","year":"2017","journal-title":"Am J Hum Genet"},{"issue":"10","key":"2022072804092667400_ref3","doi-asserted-by":"crossref","first-page":"1284","DOI":"10.1038\/ng.3656","article-title":"Next-generation genotype imputation service and methods","volume":"48","author":"Das","year":"2016","journal-title":"Nat Genet"},{"issue":"1","key":"2022072804092667400_ref4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/ncomms9111","article-title":"Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel","volume":"6","author":"Huang","year":"2015","journal-title":"Nat Commun"},{"issue":"10","key":"2022072804092667400_ref5","doi-asserted-by":"crossref","first-page":"1279","DOI":"10.1038\/ng.3643","article-title":"A reference panel of 64,976 haplotypes for genotype imputation","volume":"48","author":"McCarthy","year":"2016","journal-title":"Nat Genet"},{"issue":"10","key":"2022072804092667400_ref6","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1534\/g3.113.007161","article-title":"Imputation-based genomic coverage assessments of current human genotyping arrays","volume":"3","author":"Nelson","year":"2013","journal-title":"G3: Genes, Genomes, Genetics"},{"issue":"4","key":"2022072804092667400_ref7","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1038\/s41588-019-0379-x","article-title":"Clinical use of current polygenic risk scores may exacerbate health disparities","volume":"51","author":"Martin","year":"2019","journal-title":"Nat Genet"},{"issue":"3","key":"2022072804092667400_ref8","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1016\/j.cell.2019.08.051","volume":"179","author":"Peterson","year":"2019","journal-title":"Cell"},{"issue":"6090","key":"2022072804092667400_ref9","doi-asserted-by":"crossref","first-page":"100","DOI":"10.1126\/science.1217876","article-title":"An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people","volume":"337","author":"Nelson","year":"2012","journal-title":"Science"},{"issue":"7571","key":"2022072804092667400_ref10","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"issue":"7762","key":"2022072804092667400_ref11","doi-asserted-by":"crossref","first-page":"514","DOI":"10.1038\/s41586-019-1310-4","article-title":"Genetic analyses of diverse populations improves discovery for complex traits","volume":"570","author":"Wojcik","year":"2019","journal-title":"Nature"},{"issue":"1","key":"2022072804092667400_ref12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-11112-0","article-title":"Analysis of polygenic risk score usage and performance in diverse human populations","volume":"10","author":"Duncan","year":"2019","journal-title":"Nat Commun"},{"key":"2022072804092667400_ref13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13073-020-00742-5","article-title":"Polygenic risk scores: from research tools to clinical instruments","volume":"12","author":"Lewis","year":"2020","journal-title":"Genome Med"},{"issue":"7726","key":"2022072804092667400_ref14","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41586-018-0579-z","article-title":"The UK biobank resource with deep phenotyping and genomic data","volume":"562","author":"Bycroft","year":"2018","journal-title":"Nature"},{"issue":"2","key":"2022072804092667400_ref15","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1038\/ejhg.2016.152","article-title":"A method to customize population-specific arrays for genome-wide association testing","volume":"25","author":"Ehli","year":"2017","journal-title":"Eur J Hum Genet"},{"issue":"24","key":"2022072804092667400_ref16","first-page":"5321","article-title":"Population structure of Han Chinese in the modern Taiwanese population based on 10,000 participants in the Taiwan biobank project","volume":"25","author":"Chen","year":"2016","journal-title":"Hum Mol Genet"},{"issue":"10","key":"2022072804092667400_ref17","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1016\/S2213-2600(19)30144-4","article-title":"Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations","volume":"7","author":"Dai","year":"2019","journal-title":"Lancet Respir Med"},{"issue":"10","key":"2022072804092667400_ref18","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/jhg.2015.68","article-title":"Japonica array: improved genotype imputation by designing a population-specific SNP array with 1070 Japanese individuals","volume":"60","author":"Kawai","year":"2015","journal-title":"J Hum Genet"},{"key":"2022072804092667400_ref19","doi-asserted-by":"crossref","DOI":"10.1093\/jb\/mvab060","article-title":"Japonica array neo with increased genome-wide coverage and abundant disease risk SNPs","volume-title":"J Biochem","author":"Sakurai-Yageta"},{"issue":"1","key":"2022072804092667400_ref20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-018-37832-9","article-title":"The Korea biobank array: design and identification of coding variants associated with blood biochemical traits","volume":"9","author":"Moon","year":"2019","journal-title":"Sci Rep"},{"issue":"2","key":"2022072804092667400_ref21","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1038\/ng1001-233","article-title":"Haplotype tagging for the identification of common disease genes","volume":"29","author":"Johnson","year":"2001","journal-title":"Nat Genet"},{"issue":"5547","key":"2022072804092667400_ref22","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1126\/science.1065573","article-title":"Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21","volume":"294","author":"Patil","year":"2001","journal-title":"Science"},{"issue":"17","key":"2022072804092667400_ref23","doi-asserted-by":"crossref","first-page":"9900","DOI":"10.1073\/pnas.1633613100","article-title":"Minimal haplotype tagging","volume":"100","author":"Sebastiani","year":"2003","journal-title":"Proc Natl Acad Sci"},{"issue":"1","key":"2022072804092667400_ref24","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1086\/381000","article-title":"Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium","volume":"74","author":"Carlson","year":"2004","journal-title":"Am J Hum Genet"},{"issue":"1","key":"2022072804092667400_ref25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-11-66","article-title":"FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium","volume":"11","author":"Liu","year":"2010","journal-title":"BMC Bioinformatics"},{"issue":"6","key":"2022072804092667400_ref26","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1016\/j.ygeno.2011.08.007","article-title":"Design and coverage of high throughput genotyping arrays optimized for individuals of East Asian, African American, and Latino race\/ethnicity using imputation and a novel hybrid SNP selection algorithm","volume":"98","author":"Hoffmann","year":"2011","journal-title":"Genomics"},{"issue":"10","key":"2022072804092667400_ref27","doi-asserted-by":"crossref","first-page":"3255","DOI":"10.1534\/g3.118.200502","article-title":"Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies","volume":"8","author":"Wojcik","year":"2018","journal-title":"G3: Genes, Genomes, Genetics"},{"key":"2022072804092667400_ref28","volume-title":"Nature"},{"issue":"1","key":"2022072804092667400_ref29","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1093\/tas\/txz182","article-title":"Development of a low-density panel for genomic selection of pigs in Russia","volume":"4","author":"Shashkova","year":"2020","journal-title":"Transl Anim Sci"},{"issue":"1","key":"2022072804092667400_ref30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12863-018-0695-7","article-title":"Design of low density SNP chips for genotype imputation in layer chicken","volume":"19","author":"Herry","year":"2018","journal-title":"BMC Genet"},{"issue":"3","key":"2022072804092667400_ref31","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1086\/378098","article-title":"Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping","volume":"73","author":"Weale","year":"2003","journal-title":"Am J Hum Genet"},{"key":"2022072804092667400_ref32","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1142\/9781848163324_0003","volume-title":"Genome Informatics 2008: Genome Informatics Series","author":"Wang","year":"2008"},{"issue":"23","key":"2022072804092667400_ref33","doi-asserted-by":"crossref","first-page":"3178","DOI":"10.1093\/bioinformatics\/btm496","article-title":"Genome-wide selection of tag SNPs using multiple-marker correlation","volume":"23","author":"Hao","year":"2007","journal-title":"Bioinformatics"},{"issue":"8","key":"2022072804092667400_ref34","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/s41576-018-0016-z","article-title":"From genome-wide associations to candidate causal variants by statistical fine-mapping","volume":"19","author":"Schaid","year":"2018","journal-title":"Nat Rev Genet"},{"key":"2022072804092667400_ref35","article-title":"The Harpy speech recognition system","author":"Lowerre","year":"1976"},{"issue":"D1","key":"2022072804092667400_ref36","doi-asserted-by":"crossref","first-page":"D896","DOI":"10.1093\/nar\/gkw1133","article-title":"The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog)","volume":"45","author":"MacArthur","year":"2017","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"2022072804092667400_ref37","doi-asserted-by":"crossref","first-page":"D1062","DOI":"10.1093\/nar\/gkx1153","article-title":"ClinVar: improving access to variant interpretations and supporting evidence","volume":"46","author":"Landrum","year":"2018","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"2022072804092667400_ref38","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat Genet"},{"issue":"1","key":"2022072804092667400_ref39","doi-asserted-by":"crossref","first-page":"s13742","DOI":"10.1186\/s13742-015-0047-8","article-title":"Second-generation PLINK: rising to the challenge of larger and richer datasets","volume":"4","author":"Chang","year":"2015","journal-title":"Gigascience"},{"issue":"D1","key":"2022072804092667400_ref40","doi-asserted-by":"crossref","first-page":"D886","DOI":"10.1093\/nar\/gky1016","article-title":"CADD: predicting the deleteriousness of variants throughout the human genome","volume":"47","author":"Rentzsch","year":"2019","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"2022072804092667400_ref41","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1016\/j.ygeno.2011.04.005","article-title":"Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array","volume":"98","author":"Hoffmann","year":"2011","journal-title":"Genomics"},{"issue":"2","key":"2022072804092667400_ref42","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1016\/j.cie.2005.01.020","article-title":"Filtered and recovering beam search algorithms for the early\/tardy scheduling problem with no idle time","volume":"48","author":"Valente","year":"2005","journal-title":"Comput Indus Eng"},{"key":"2022072804092667400_ref43","doi-asserted-by":"crossref","DOI":"10.2139\/ssrn.3967671","article-title":"High coverage whole genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios","author":"Byrska-Bishop","year":"2021"},{"issue":"1","key":"2022072804092667400_ref44","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13073-015-0221-8","article-title":"A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases","volume":"7","author":"Miller","year":"2015","journal-title":"Genome Med"},{"issue":"1","key":"2022072804092667400_ref45","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1002\/0471250953.bi1110s43","article-title":"From FASTQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline","volume":"43","author":"Auwera","year":"2013","journal-title":"Curr Protoc Bioinformatics"},{"issue":"1","key":"2022072804092667400_ref46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-019-13225-y","article-title":"Accurate, scalable and integrative haplotype estimation","volume":"10","author":"Delaneau","year":"2019","journal-title":"Nat Commun"},{"issue":"1","key":"2022072804092667400_ref47","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1111\/j.1365-2052.2011.02208.x","article-title":"Accuracy of genotype imputation in sheep breeds","volume":"43","author":"Hayes","year":"2012","journal-title":"Anim Genet"},{"key":"2022072804092667400_ref48","doi-asserted-by":"crossref","first-page":"472","DOI":"10.3389\/fgene.2018.00472","article-title":"Development and validation of 58K SNP-array and high-density linkage map in Nile tilapia (O. niloticus)","volume":"9","author":"Joshi","year":"2018","journal-title":"Front Genet"},{"issue":"7","key":"2022072804092667400_ref49","doi-asserted-by":"crossref","first-page":"4136","DOI":"10.3168\/jds.2011-5133","article-title":"Imputation performances of 3 low-density marker panels in beef and dairy cattle","volume":"95","author":"Romain Dassonneville","year":"2012","journal-title":"J Dairy Sci"},{"issue":"1","key":"2022072804092667400_ref50","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-017-09285-z","article-title":"Genome-wide target enrichment-aided chip design: a 66K SNP chip for cashmere goat","volume":"7","author":"Qiao","year":"2017","journal-title":"Sci Rep"},{"issue":"2","key":"2022072804092667400_ref51","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1093\/bioinformatics\/btl574","article-title":"Ldcompare: rapid computation of single- and multiple-marker r² and genetic coverage","volume":"23","author":"Hao","year":"2007","journal-title":"Bioinformatics"},{"key":"2022072804092667400_ref52","first-page":"1","volume-title":"2021 13th International Conference on Knowledge and Systems Engineering (KSE)","author":"Nguyen","year":"2021"},{"issue":"7","key":"2022072804092667400_ref53","doi-asserted-by":"crossref","first-page":"1006","DOI":"10.1093\/bioinformatics\/btt730","article-title":"CrossMap: a versatile tool for coordinate conversion between genome assemblies","volume":"30","author":"Zhao","year":"2014","journal-title":"Bioinformatics"},{"issue":"5","key":"2022072804092667400_ref54","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1038\/nrg2760","article-title":"Genome-wide association studies in diverse populations","volume":"11","author":"Rosenberg","year":"2010","journal-title":"Nat Rev Genet"},{"issue":"5903","key":"2022072804092667400_ref55","doi-asserted-by":"crossref","first-page":"881","DOI":"10.1126\/science.1156409","article-title":"Genetic mapping in human disease","volume":"322","author":"Altshuler","year":"2008","journal-title":"Science"},{"key":"2022072804092667400_ref56","doi-asserted-by":"crossref","DOI":"10.1038\/s41431-021-00917-7","article-title":"A comparison of genotyping arrays","volume":"29","author":"Verlouw","year":"2021","journal-title":"Eur J Hum Genet"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac252\/45041957\/bbac252.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/23\/4\/bbac252\/45041957\/bbac252.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,28]],"date-time":"2022-07-28T00:11:17Z","timestamp":1658967077000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbac252\/6627269"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,2]]},"references-count":56,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,7,18]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbac252","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.01.28.478108","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,21]]},"published":{"date-parts":[[2022,7,2]]},"article-number":"bbac252"}}