{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,19]],"date-time":"2026-01-19T12:28:51Z","timestamp":1768825731852,"version":"3.49.0"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2021,7,12]],"date-time":"2021-07-12T00:00:00Z","timestamp":1626048000000},"content-version":"vor","delay-in-days":11,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004285","name":"St. Petersburg State University","doi-asserted-by":"publisher","award":["73023672"],"award-info":[{"award-number":["73023672"]}],"id":[{"id":"10.13039\/501100004285","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,8,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not been yet accompanied by the development of the centromere-specific bioinformatics algorithms, even the fundamental questions (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex questions (e.g. explaining how monomers and high-order repeats evolved) about human centromeres remain open. Moreover, even though there was a four-decade-long series of studies aimed at cataloging all human monomers and high-order repeats, the rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We describe the CentromereArchitect, the first tool for the centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for \u2018live\u2019 centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>CentromereArchitect is publicly available on https:\/\/github.com\/ablab\/stringdecomposer\/tree\/ismb2021<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab265","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T20:31:57Z","timestamp":1619555517000},"page":"i196-i204","source":"Crossref","is-referenced-by-count":24,"title":["CentromereArchitect: inference and analysis of the architecture of centromeres"],"prefix":"10.1093","volume":"37","author":[{"given":"Tatiana","family":"Dvorkina","sequence":"first","affiliation":[{"name":"Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University , Saint Petersburg 199034, Russia"}]},{"given":"Olga","family":"Kunyavskaya","sequence":"additional","affiliation":[{"name":"Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University , Saint Petersburg 199034, Russia"}]},{"given":"Andrey V","family":"Bzikadze","sequence":"additional","affiliation":[{"name":"Graduate Program in Bioinformatics and Systems Biology, University of California , San Diego, CA 92093, USA"}]},{"given":"Ivan","family":"Alexandrov","sequence":"additional","affiliation":[{"name":"Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University , Saint Petersburg 199034, Russia"}]},{"given":"Pavel A","family":"Pevzner","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, University of California , San Diego, CA 92093, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,7,12]]},"reference":[{"key":"2023062410170770100_btab265-B1","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1007\/s004120100146","article-title":"Alpha-satellite DNA of primates: old and new families","volume":"110","author":"Alexandrov","year":"2001","journal-title":"Chromosoma"},{"key":"2023062410170770100_btab265-B2","doi-asserted-by":"crossref","first-page":"911","DOI":"10.3390\/genes11080911","article-title":"Centromeric transcription: a conserved Swiss-Army knife","volume":"11","author":"Arunkumar","year":"2020","journal-title":"Genes"},{"key":"2023062410170770100_btab265-B3","doi-asserted-by":"crossref","first-page":"e181","DOI":"10.1371\/journal.pcbi.0030181","article-title":"Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data","volume":"3","author":"Alkan","year":"2007","journal-title":"PLoS Comput. Biol"},{"key":"2023062410170770100_btab265-B4","doi-asserted-by":"crossref","first-page":"615","DOI":"10.3390\/genes9120615","article-title":"Repetitive fragile sites: centromere satellite DNA as a source of genome instability in human diseases","volume":"9","author":"Black","year":"2018","journal-title":"Genes"},{"key":"2023062410170770100_btab265-B5","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1038\/s41587-020-0582-4","article-title":"centroFlye: assembling centromeres with long error-prone reads","volume":"38","author":"Bzikadze","year":"2020","journal-title":"Nat. Biotechnol"},{"key":"2023062410170770100_btab265-B6","doi-asserted-by":"crossref","DOI":"10.1038\/s41592-020-01056-5","article-title":"Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm","author":"Cheng","year":"2021"},{"key":"2023062410170770100_btab265-B7","doi-asserted-by":"crossref","first-page":"i93","DOI":"10.1093\/bioinformatics\/btaa454","article-title":"The string decomposition problem and its applications to centromere assembly","volume":"36","author":"Dvorkina","year":"2020","journal-title":"Bioinformatics"},{"key":"2023062410170770100_btab265-B8","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1126\/science.1062939","article-title":"The centromere paradox: stable inheritance with rapidly evolving DNA","volume":"293","author":"Henikoff","year":"2001","journal-title":"Science"},{"key":"2023062410170770100_btab265-B9","doi-asserted-by":"crossref","first-page":"e42989","DOI":"10.7554\/eLife.42989","article-title":"Haplotypes spanning centromeric regions reveal persistence of large blocks of archaic DNA","volume":"8","author":"Langley","year":"2019","journal-title":"Elife"},{"key":"2023062410170770100_btab265-B10","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-021-03420-7","article-title":"The structure, function, and evolution of a complete human chromosome 8","author":"Logsdon,G.A., Vollger, M.R., Hsieh, P. Logsdon","year":"2021","journal-title":"Nature"},{"key":"2023062410170770100_btab265-B11","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1016\/j.gde.2018.03.003","article-title":"Satellite DNA evolution: old ideas, new approaches","volume":"49","author":"Lower","year":"2018","journal-title":"Curr. Opin. Genet. Dev"},{"key":"2023062410170770100_btab265-B12","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1016\/j.cell.2009.08.036","article-title":"Major evolutionary transitions in centromere complexity","volume":"138","author":"Malik","year":"2009","journal-title":"Cell"},{"key":"2023062410170770100_btab265-B13","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/s10577-018-9582-3","article-title":"Alpha satellite DNA biology: finding function in the recesses of the genome","volume":"26","author":"McNulty","year":"2018","journal-title":"Chromosome Res"},{"key":"2023062410170770100_btab265-B14","doi-asserted-by":"crossref","first-page":"697","DOI":"10.1101\/gr.159624.113","article-title":"Centromere reference models for human chromosomes X and y satellite arrays","volume":"24","author":"Miga","year":"2014","journal-title":"Genome Res"},{"key":"2023062410170770100_btab265-B15","doi-asserted-by":"crossref","first-page":"352","DOI":"10.3390\/genes10050352","article-title":"Centromeric satellite DNAs: hidden sequence variation in the human population","volume":"10","author":"Miga","year":"2019","journal-title":"Genes"},{"key":"2023062410170770100_btab265-B16","doi-asserted-by":"crossref","first-page":"112127","DOI":"10.1016\/j.yexcr.2020.112127","article-title":"Centromere studies in the era of \u201ctelomere-to-telomere\u201d genomics","volume":"394","author":"Miga","year":"2020","journal-title":"Exp. Cell Res"},{"key":"2023062410170770100_btab265-B17","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1038\/s41586-020-2547-7","article-title":"Telomere-to-telomere assembly of a complete human X chromosome","volume":"585","author":"Miga","year":"2020","journal-title":"Nature"},{"key":"2023062410170770100_btab265-B18","doi-asserted-by":"crossref","first-page":"i75","DOI":"10.1093\/bioinformatics\/btaa440","article-title":"TandemTools: mapping long reads and assessing\/improving assembly quality in extra-long tandem repeats","volume":"36","author":"Mikheenko","year":"2020","journal-title":"Bioinformatics"},{"key":"2023062410170770100_btab265-B19","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1038\/nrg3245","article-title":"Human aneuploidy: mechanisms and new insights into an age-old problem","volume":"13","author":"Nagaoka","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023062410170770100_btab265-B128","doi-asserted-by":"publisher","DOI":"10.1101\/2021.05.26.445798","article-title":"The complete sequence of a human genome","author":"Nurk","year":"2021","journal-title":"bioRxiv"},{"key":"2023062410170770100_btab265-B20","doi-asserted-by":"crossref","first-page":"1291","DOI":"10.1101\/gr.263566.120","article-title":"HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads","volume":"30","author":"Nurk","year":"2020","journal-title":"Genome Res"},{"key":"2023062410170770100_btab265-B21","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1093\/bioinformatics\/bti072","article-title":"ColorHOR\u2014novel graphical algorithm for fast scan of alpha satellite higher-order repeats and HOR annotation for GenBank sequence of human genome","volume":"21","author":"Paar","year":"2005","journal-title":"Bioinformatics"},{"key":"2023062410170770100_btab265-B22","doi-asserted-by":"crossref","first-page":"D670","DOI":"10.1093\/nar\/gku1177","article-title":"The UCSC genome browser database: 2015 update","volume":"43","author":"Rosenbloom","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062410170770100_btab265-B23","doi-asserted-by":"crossref","first-page":"1921","DOI":"10.1093\/bioinformatics\/btw101","article-title":"Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing","volume":"32","author":"Sevim","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062410170770100_btab265-B24","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol. Syst. Biol"},{"key":"2023062410170770100_btab265-B25","doi-asserted-by":"crossref","first-page":"e1000641","DOI":"10.1371\/journal.pgen.1000641","article-title":"The evolutionary origin of man can be traced in the layers of defunct ancestral alpha satellites flanking the active centromeres of human chromosomes","volume":"5","author":"Shepelev","year":"2009","journal-title":"PLoS Genet"},{"key":"2023062410170770100_btab265-B26","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1016\/j.gdata.2015.05.035","article-title":"Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly","volume":"5","author":"Shepelev","year":"2015","journal-title":"Genome Data"},{"key":"2023062410170770100_btab265-B27","doi-asserted-by":"crossref","first-page":"674","DOI":"10.3389\/fgene.2018.00674","article-title":"Centromere and pericentromere transcription: roles and regulation \u2026 in Sickness and in Health","volume":"9","author":"Smurova","year":"2018","journal-title":"Front. Genet"},{"key":"2023062410170770100_btab265-B28","volume-title":"Data Compression: Methods and Theory","author":"Storer","year":"1987"},{"key":"2023062410170770100_btab265-B29","doi-asserted-by":"crossref","first-page":"eabd9230","DOI":"10.1126\/sciadv.abd9230","article-title":"Rapid and ongoing evolution of repetitive sequence structures in human centromeres","volume":"6","author":"Suzuki","year":"2020","journal-title":"Sci. Adv"},{"key":"2023062410170770100_btab265-B30","doi-asserted-by":"crossref","first-page":"103708","DOI":"10.1016\/j.dib.2019.103708","article-title":"Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly","volume":"24","author":"Uralsky","year":"2019","journal-title":"Data Brief"},{"key":"2023062410170770100_btab265-B31","doi-asserted-by":"crossref","first-page":"2731","DOI":"10.1093\/nar\/13.8.2731","article-title":"Chromosome-specific alpha satellite DNA: nucleotide sequence analysis of the 2.0 kilobasepair repeat from the human X chromosome","volume":"13","author":"Waye","year":"1985","journal-title":"Nucleic Acids Res"},{"key":"2023062410170770100_btab265-B32","doi-asserted-by":"crossref","first-page":"842","DOI":"10.1016\/j.molcel.2018.04.023","article-title":"Heterochromatin-encoded satellite RNAs induce breast cancer","volume":"70","author":"Zhu","year":"2018","journal-title":"Mol. Cell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/Supplement_1\/i196\/50694211\/btab265.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/Supplement_1\/i196\/50694211\/btab265.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,25]],"date-time":"2023-06-25T00:16:09Z","timestamp":1687652169000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/Supplement_1\/i196\/6319687"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,1]]},"references-count":33,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2021,8,4]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab265","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,7,1]]},"published":{"date-parts":[[2021,7,1]]}}}