{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,14]],"date-time":"2026-02-14T10:37:48Z","timestamp":1771065468933,"version":"3.50.1"},"reference-count":16,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2016,10,12]],"date-time":"2016-10-12T00:00:00Z","timestamp":1476230400000},"content-version":"vor","delay-in-days":231,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure that rely on indirect inference methods, e.g. assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), that takes advantage of Pacific Bioscience long-reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly.<\/jats:p>\n               <jats:p>Results: We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. 
The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies.<\/jats:p>\n               <jats:p>Availability and implementation: Documentation and source code for Alpha-CENTAURI are freely available at http:\/\/github.com\/volkansevim\/alpha-CENTAURI.<\/jats:p>\n               <jats:p>Contact: \u00a0ali.bashir@mssm.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw101","type":"journal-article","created":{"date-parts":[[2016,2,26]],"date-time":"2016-02-26T01:23:37Z","timestamp":1456449817000},"page":"1921-1924","source":"Crossref","is-referenced-by-count":51,"title":["Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing"],"prefix":"10.1093","volume":"32","author":[{"given":"Volkan","family":"Sevim","sequence":"first","affiliation":[{"name":"1 Pacific Biosciences, Inc., Menlo Park, CA 94025, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ali","family":"Bashir","sequence":"additional","affiliation":[{"name":"2 Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA"},{"name":"3 Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen-Shan","family":"Chin","sequence":"additional","affiliation":[{"name":"1 Pacific Biosciences, Inc., Menlo Park, CA 94025, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karen H.","family":"Miga","sequence":"additional","affiliation":[{"name":"4 Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA 95064, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2016,2,24]]},"reference":[{"key":"2023020112332275200_btw101-B1","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1007\/s004120100146","article-title":"Alpha-satellite DNA of primates: old and new families","volume":"110","author":"Alexandrov","year":"2001","journal-title":"Chromosoma"},{"key":"2023020112332275200_btw101-B2","doi-asserted-by":"crossref","first-page":"e181","DOI":"10.1371\/journal.pcbi.0030181","article-title":"Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data","volume":"3","author":"Alkan","year":"2007","journal-title":"PLoS Comput. Biol"},{"key":"2023020112332275200_btw101-B3","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023020112332275200_btw101-B4","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1038\/nrg1322","article-title":"An assessment of the sequence gaps: unfinished business in a finished human genome","volume":"5","author":"Eichler","year":"2004","journal-title":"Nat. Rev. Genet"},{"key":"2023020112332275200_btw101-B5","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1159\/000084979","article-title":"Repbase Update, a database of eukaryotic repetitive elements","volume":"110","author":"Jurka","year":"2005","journal-title":"Cytogenet. 
Genome Res"},{"key":"2023020112332275200_btw101-B6","doi-asserted-by":"crossref","first-page":"e254","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol"},{"key":"2023020112332275200_btw101-B7","doi-asserted-by":"crossref","first-page":"2101","DOI":"10.1093\/bioinformatics\/btq343","article-title":"Global sequence characterization of rice centromeric satellite based on oligomer frequency analysis in large-scale sequencing data","volume":"26","author":"Macas","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020112332275200_btw101-B8","first-page":"92","article-title":"Homology between human and simian repeated DNA","author":"Manuelidis","year":"1978"},{"key":"2023020112332275200_btw101-B9","doi-asserted-by":"crossref","first-page":"R10","DOI":"10.1186\/gb-2013-14-1-r10","article-title":"Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution","volume":"14","author":"Melters","year":"2013","journal-title":"Genome Biol"},{"key":"2023020112332275200_btw101-B10","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1007\/BF01840446","article-title":"An O(ND) difference algorithm and its variations","volume":"1","author":"Myers","year":"1986","journal-title":"Algorithmica"},{"key":"2023020112332275200_btw101-B11","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1126\/science.1065042","article-title":"Genomic and genetic definition of a functional human centromere","volume":"294","author":"Schueler","year":"2001","journal-title":"Science"},{"key":"2023020112332275200_btw101-B12","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/S0022-2836(05)80056-7","article-title":"Genomic analysis of sequence variation in tandemly repeated DNA: evidence for localized homogeneous sequence domains within arrays of \u03b1-satellite 
DNA","volume":"216","author":"Warburton","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023020112332275200_btw101-B13","first-page":"6520","article-title":"Nonrandom localization of recombination events in human alpha satellite repeat unit variants: implications for higher-order structural characteristics within centromeric heterochromatin","volume":"13","author":"Warburton","year":"1993","journal-title":"Mol. Cell. Biol"},{"key":"2023020112332275200_btw101-B14","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1016\/0168-9525(90)90302-M","article-title":"Centromeres of mammalian chromosomes","volume":"6","author":"Willard","year":"1990","journal-title":"Trends Genet"},{"key":"2023020112332275200_btw101-B15","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1007\/BF02100014","article-title":"Chromosome-specific subsets of human alpha satellite DNA: analysis of sequence divergence within and between chromosomal subsets and evidence for an ancestral pentameric repeat","volume":"25","author":"Willard","year":"1987","journal-title":"J. Mol. 
Evol"},{"key":"2023020112332275200_btw101-B16","doi-asserted-by":"crossref","first-page":"192","DOI":"10.1016\/0168-9525(87)90232-0","article-title":"Hierarchical order in chromosome-specific human alpha satellite DNA","volume":"3","author":"Willard","year":"1987","journal-title":"Trends Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/13\/1921\/49019628\/bioinformatics_32_13_1921.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/13\/1921\/49019628\/bioinformatics_32_13_1921.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:43:59Z","timestamp":1675291439000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/13\/1921\/1743232"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,2,24]]},"references-count":16,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2016,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw101","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,7,1]]},"published":{"date-parts":[[2016,2,24]]}}}