{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,11,18]],"date-time":"2023-11-18T22:33:38Z","timestamp":1700346818112},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2016,12,19]],"date-time":"2016-12-19T00:00:00Z","timestamp":1482105600000},"content-version":"vor","delay-in-days":14,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Niger Delta Development Commission Postgraduate"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The advent of Next Generation Sequencing (NGS) has led to the generation of enormous volumes of short read sequence data, cheaply and in reasonable time scales. Nevertheless, the quality of genome assemblies generated using NGS technologies has been greatly affected, compared to those generated using Sanger DNA sequencing. This is largely due to the inability of short read sequence data to scaffold repetitive structures, creating gaps, inversions and rearrangements and resulting in assemblies that are, at best, draft forms. Third generation single-molecule sequencing (SMS) technologies (e.g. 
Pacific Biosciences Single Molecule Real Time (SMRT) system) address this challenge by generating sequences with increased read lengths, offering the prospect to better recover these complex repetitive structures, concomitantly improving assembly quality.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we evaluate the ability of SMS data (specifically human genome Pacific Biosciences SMRT data) to recover poorly represented repetitive sequences (specifically, GC-rich human minisatellites). To do this we designed a pipeline for the collection, processing and local assembly of single-molecule sequence data to form accurate contiguous local reconstructions. Our results show the recovery of an allele of the non-coding minisatellite MS1 (located on chromosome 1 at 1p33-35) at greater than 97% identity to reference (GRCh38) from the unprocessed sequence data of a haploid complete hydatidiform mole (CHM1) cell line. Furthermore, our assembly revealed an allele of over 500 repeat units; much larger than the reference (GRCh38), but consistent in structure with naturally occurring alleles that are segregating in human populations. This local assembly\u2019s reconstruction was validated with the release of the whole genome assemblies GCA_001297185.1 and GCA_000772585.3, where this allele occurs. Additionally, application of this pipeline to coding minisatellites in the PRDM9 and ZNF93 genes enabled recovery of high identity allele structures for these sequence regions whose length was confirmed by PCR from cell line genomic DNA. 
The internal repeat structure of the PRDM9 allele recovered was consistent with common human-specific alleles.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>Code available at https:\/\/github.com\/ndliberial\/smrt_pipeline<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw687","type":"journal-article","created":{"date-parts":[[2016,12,21]],"date-time":"2016-12-21T01:20:12Z","timestamp":1482283212000},"page":"650-653","source":"Crossref","is-referenced-by-count":2,"title":["A pipeline for local assembly of minisatellite alleles from single-molecule sequencing data"],"prefix":"10.1093","volume":"33","author":[{"given":"Denye","family":"Ogeh","sequence":"first","affiliation":[{"name":"Department of Genetics, University of Leicester, Leicester, UK"}]},{"given":"Richard","family":"Badge","sequence":"additional","affiliation":[{"name":"Department of Genetics, University of Leicester, Leicester, UK"}]}],"member":"286","published-online":{"date-parts":[[2016,12,5]]},"reference":[{"key":"2023020204420832600_btw687-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023020204420832600_btw687-B2","doi-asserted-by":"crossref","first-page":"573","DOI":"10.1093\/nar\/27.2.573","article-title":"Tandem repeats finder: a program to analyze DNA sequences","volume":"27","author":"Benson","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023020204420832600_btw687-B3","doi-asserted-by":"crossref","first-page":"859","DOI":"10.1038\/ng.658","article-title":"PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans","volume":"42","author":"Berg","year":"2010","journal-title":"Nat. 
Genet"},{"key":"2023020204420832600_btw687-B4","doi-asserted-by":"crossref","first-page":"e1000475","DOI":"10.1371\/journal.pbio.1000475","article-title":"Multi-Platform Next-Generation Sequencing of the Domestic Turkey (Meleagris gallopavo): Genome Assembly and Analysis","volume":"8","author":"Dalloul","year":"2010","journal-title":"PLoS Biol"},{"key":"2023020204420832600_btw687-B5","doi-asserted-by":"crossref","first-page":"e47768","DOI":"10.1371\/journal.pone.0047768","article-title":"Mind the Gap: Upgrading Genomes with Pacific Biosciences RS Long-Read Sequencing Technology","volume":"7","author":"English","year":"2012","journal-title":"PLoS ONE"},{"key":"2023020204420832600_btw687-B6","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1006\/geno.2001.6676","article-title":"Paternal origins of complete hydatidiform moles proven by whole genome single-nucleotide polymorphism haplotyping","volume":"79","author":"Fan","year":"2002","journal-title":"Genomics"},{"key":"2023020204420832600_btw687-B7","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1098\/rspb.1991.0038","article-title":"Evolutionary transience of hypervariable minisatellites in man and the primates","volume":"243","author":"Gray","year":"1991","journal-title":"Proc. Biol. 
Sci"},{"key":"2023020204420832600_btw687-B8","doi-asserted-by":"crossref","first-page":"901","DOI":"10.2217\/pgs.12.72","article-title":"Next-generation sequencing and large genome assemblies","volume":"13","author":"Henson","year":"2012","journal-title":"Pharmacogenomics"},{"key":"2023020204420832600_btw687-B9","doi-asserted-by":"crossref","first-page":"688","DOI":"10.1101\/gr.168450.113","article-title":"Reconstructing complex regions of genomes using long-read sequencing technology","volume":"24","author":"Huddleston","year":"2014","journal-title":"Genome Res"},{"key":"2023020204420832600_btw687-B10","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1038\/ng.872","article-title":"Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals","volume":"43","author":"Ju","year":"2011","journal-title":"Nat. Genet"},{"key":"2023020204420832600_btw687-B24","first-page":"656","article-title":"BLAT - the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2023020204420832600_btw687-B25","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res."},{"key":"2023020204420832600_btw687-B11","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1101\/gr.113985.110","article-title":"Adaptive seeds tame genomic sequence comparison","volume":"21","author":"Kie\u0142basa","year":"2011","journal-title":"Genome Res"},{"key":"2023020204420832600_btw687-B12","author":"Li","year":"2013"},{"key":"2023020204420832600_btw687-B13","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinforma. Oxf. 
Engl"},{"key":"2023020204420832600_btw687-B14","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1038\/nature08696","article-title":"The sequence and de novo assembly of the giant panda genome","volume":"463","author":"Li","year":"2010","journal-title":"Nature"},{"key":"2023020204420832600_btw687-B15","doi-asserted-by":"crossref","first-page":"e106689","DOI":"10.1371\/journal.pone.0106689","article-title":"Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements","volume":"9","author":"McCoy","year":"2014","journal-title":"PLoS ONE"},{"key":"2023020204420832600_btw687-B16","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A Whole-Genome Assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023020204420832600_btw687-B17","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: a flexible suite of utilities for comparing genomic features","volume":"26","author":"Quinlan","year":"2010","journal-title":"Bioinforma. Oxf. Engl"},{"key":"2023020204420832600_btw687-B18","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1038\/nbt.1754","article-title":"Integrative genomics viewer","volume":"29","author":"Robinson","year":"2011","journal-title":"Nat. 
Biotechnol"},{"key":"2023020204420832600_btw687-B19","doi-asserted-by":"crossref","first-page":"557","DOI":"10.1101\/gr.131383.111","article-title":"GAGE: A critical evaluation of genome assemblies and assembly algorithms","volume":"22","author":"Salzberg","year":"2012","journal-title":"Genome Res"},{"key":"2023020204420832600_btw687-B20","doi-asserted-by":"crossref","first-page":"1165","DOI":"10.1101\/gr.101360.109","article-title":"Assembly of large genomes using second-generation sequencing","volume":"20","author":"Schatz","year":"2010","journal-title":"Genome Res"},{"key":"2023020204420832600_btw687-B21","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1038\/nature08795","article-title":"Complete Khoisan and Bantu genomes from southern Africa","volume":"463","author":"Schuster","year":"2010","journal-title":"Nature"},{"key":"2023020204420832600_btw687-B22","author":"Smit","year":"2013"},{"key":"2023020204420832600_btw687-B23","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1038\/nature07484","article-title":"The diploid genome sequence of an Asian individual","volume":"456","author":"Wang","year":"2008","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/5\/650\/49037585\/bioinformatics_33_5_650.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/5\/650\/49037585\/bioinformatics_33_5_650.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T04:46:39Z","timestamp":1675313199000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/5\/650\/2724087"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2016,12,5]]},"references-count":25,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2017,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw687","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,3,1]]},"published":{"date-parts":[[2016,12,5]]}}}