{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,4]],"date-time":"2024-09-04T06:45:25Z","timestamp":1725432325784},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Inferring lengths of inherited microsatellite alleles with single base pair resolution from short sequence reads is challenging due to several sources of noise caused by the repetitive nature of microsatellites and the technologies used to generate raw sequence data.<\/jats:p>\n               <jats:p>Results: We have developed a program, GenoTan, using a discretized Gaussian mixture model combined with a rules-based approach to identify inherited variation of microsatellite loci from short sequence reads without paired-end information. It effectively distinguishes length variants from noise including insertion\/deletion errors in homopolymer runs by addressing the bidirectional aspect of insertion and deletion errors in sequence reads. Here we first introduce a homopolymer decomposition method which estimates error bias toward insertion or deletion in homopolymer sequence runs. Combining these approaches, GenoTan was able to genotype 94.9% of microsatellite loci accurately from simulated data with 40x sequence coverage quickly while the other programs showed &lt;90% correct calls for the same data and required 5\u223c30\u00d7 more computational time than GenoTan. 
It also showed the highest true-positive rate for real data using mixed sequence data of two Drosophila inbred lines, which was a novel validation approach for genotyping.<\/jats:p>\n               <jats:p>Availability: GenoTan is open-source software available at http:\/\/genotan.sourceforge.net.<\/jats:p>\n               <jats:p>Contact: \u00a0garner@vbi.vt.edu<\/jats:p>\n               <jats:p>Supplementary Information: \u00a0Supplementary data are available at Bioinformatics online<\/jats:p>","DOI":"10.1093\/bioinformatics\/btt595","type":"journal-article","created":{"date-parts":[[2013,10,18]],"date-time":"2013-10-18T02:15:04Z","timestamp":1382062504000},"page":"652-659","source":"Crossref","is-referenced-by-count":16,"title":["Discretized Gaussian mixture for genotyping of microsatellite loci containing homopolymer runs"],"prefix":"10.1093","volume":"30","author":[{"given":"Hongseok","family":"Tae","sequence":"first","affiliation":[{"name":"1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 and 2Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA"}]},{"given":"Dong-Yun","family":"Kim","sequence":"additional","affiliation":[{"name":"1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 and 2Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA"}]},{"given":"John","family":"McCormick","sequence":"additional","affiliation":[{"name":"1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 and 2Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA"}]},{"given":"Robert E.","family":"Settlage","sequence":"additional","affiliation":[{"name":"1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 and 2Office of Biostatistics Research, National Heart, Lung and Blood 
Institute, National Institutes of Health, Bethesda, MD 20892, USA"}]},{"given":"Harold R.","family":"Garner","sequence":"additional","affiliation":[{"name":"1 Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061 and 2Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA"}]}],"member":"286","published-online":{"date-parts":[[2013,10,17]]},"reference":[{"key":"2023012710430178100_btt595-B1","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1101\/gr.112326.110","article-title":"Dindel: accurate indel calls from short-read data","volume":"21","author":"Albers","year":"2011","journal-title":"Genome Res."},{"key":"2023012710430178100_btt595-B2","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res."},{"key":"2023012710430178100_btt595-B3","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1038\/nmeth.1230","article-title":"Alta-Cyclic: a self-optimizing base caller for next-generation sequencing","volume":"5","author":"Erlich","year":"2008","journal-title":"Nat. Methods"},{"key":"2023012710430178100_btt595-B4","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1101\/gr.8.3.175","article-title":"Base-calling of automated sequencer traces using phred. I. 
Accuracy assessment","volume":"8","author":"Ewing","year":"1998","journal-title":"Genome Res."},{"key":"2023012710430178100_btt595-B5","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1101\/gr.135780.111","article-title":"lobSTR: a short tandem repeat profiler for personal genomes","volume":"22","author":"Gymrek","year":"2012","journal-title":"Genome Res."},{"key":"2023012710430178100_btt595-B6","doi-asserted-by":"crossref","first-page":"e32","DOI":"10.1093\/nar\/gks981","article-title":"Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles","volume":"41","author":"Highnam","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023012710430178100_btt595-B7","doi-asserted-by":"crossref","first-page":"1533","DOI":"10.1093\/bioinformatics\/bts187","article-title":"pIRS: Profile-based Illumina pair-end reads simulator","volume":"28","author":"Hu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012710430178100_btt595-B8","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012710430178100_btt595-B9","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012710430178100_btt595-B10","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1038\/nature10811","article-title":"The Drosophila melanogaster Genetic Reference Panel","volume":"482","author":"Mackay","year":"2012","journal-title":"Nature"},{"key":"2023012710430178100_btt595-B11","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.ygeno.2011.01.001","article-title":"Evaluation of microsatellite variation in the 1000 Genomes 
Project pilot studies is indicative of the quality and utility of the raw data and alignments","volume":"97","author":"McIver","year":"2011","journal-title":"Genomics"},{"key":"2023012710430178100_btt595-B12","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res."},{"key":"2023012710430178100_btt595-B13","doi-asserted-by":"crossref","first-page":"R112","DOI":"10.1186\/gb-2011-12-11-r112","article-title":"Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems","volume":"12","author":"Minoche","year":"2011","journal-title":"Genome Biol."},{"key":"2023012710430178100_btt595-B14","doi-asserted-by":"crossref","first-page":"3636","DOI":"10.1073\/pnas.92.9.3636","article-title":"Simple tandem DNA repeats and human genetic disease","volume":"92","author":"Sutherland","year":"1995","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023012710430178100_btt595-B15","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1038\/74238","article-title":"The direction of microsatellite mutations is dependent upon allele length","volume":"24","author":"Xu","year":"2000","journal-title":"Nat. 
Genet."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/5\/652\/48917976\/bioinformatics_30_5_652.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/5\/652\/48917976\/bioinformatics_30_5_652.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T11:02:32Z","timestamp":1674817352000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/5\/652\/247536"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,10,17]]},"references-count":15,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2014,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btt595","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,3,1]]},"published":{"date-parts":[[2013,10,17]]}}}