{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T11:45:58Z","timestamp":1777117558274,"version":"3.51.4"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2017,4,7]],"date-time":"2017-04-07T00:00:00Z","timestamp":1491523200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2017,4,7]],"date-time":"2017-04-07T00:00:00Z","timestamp":1491523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Sci Rep"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We address the problem of discovering pairs of symmetric genomic words (i.e., words and the corresponding reversed complements) occurring at distances that are overrepresented. For this purpose, we developed new procedures to identify symmetric word pairs with uncommon empirical distance distribution and with clusters of overrepresented short distances. We speculate that patterns of overrepresentation of short distances between symmetric word pairs may allow the occurrence of non-standard DNA conformations, such as hairpin\/cruciform structures. We focused on the human genome, and analysed both the complete genome as well as a version with known repetitive sequences masked out. We reported several well-defined features in the distributions of distances, which can be classified into three different profiles, showing enrichment in distinct distance ranges. 
We analysed in greater detail certain pairs of symmetric words of length seven, found by our procedure, characterised by the surprising fact that they occur at single distances more frequently than expected.<\/jats:p>","DOI":"10.1038\/s41598-017-00646-2","type":"journal-article","created":{"date-parts":[[2017,4,3]],"date-time":"2017-04-03T14:23:29Z","timestamp":1491229409000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["DNA word analysis based on the distribution of the distances between symmetric words"],"prefix":"10.1038","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4632-3561","authenticated-orcid":false,"given":"Ana H. M. P.","family":"Tavares","sequence":"first","affiliation":[]},{"given":"Armando J.","family":"Pinho","sequence":"additional","affiliation":[]},{"given":"Raquel M.","family":"Silva","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9187-8094","authenticated-orcid":false,"given":"Jo\u00e3o M. O. S.","family":"Rodrigues","sequence":"additional","affiliation":[]},{"given":"Carlos A. C.","family":"Bastos","sequence":"additional","affiliation":[]},{"given":"Paulo J. S. G.","family":"Ferreira","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1051-8084","authenticated-orcid":false,"given":"Vera","family":"Afreixo","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2017,4,7]]},"reference":[{"key":"646_CR1","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1016\/S0378-1119(00)00472-8","volume":"261","author":"DR Forsdyke","year":"2000","unstructured":"Forsdyke, D. R. & Mortimer, J. R. Chargaff\u2019s legacy. 
Gene \n                           261, 127\u2013137 (2000).","journal-title":"Gene"},{"key":"646_CR2","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1093\/dnares\/dsp021","volume":"16","author":"B Powdel","year":"2009","unstructured":"Powdel, B. et al. A study in entire chromosomes of violations of the intra-strand parity of complementary nucleotides (Chargaff\u2019s second parity rule). DNA Research \n                           16, 325\u2013343 (2009).","journal-title":"DNA Research"},{"key":"646_CR3","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1093\/biostatistics\/kxu041","volume":"16","author":"V Afreixo","year":"2015","unstructured":"Afreixo, V., Rodrigues, J. M. & Bastos, C. A. C. Analysis of single-strand exceptional word symmetry in the human genome: new measures. Biostatistics \n                           16, 209\u2013221 (2015).","journal-title":"Biostatistics"},{"key":"646_CR4","first-page":"269","volume":"4","author":"H Zhang","year":"2013","unstructured":"Zhang, H., Zhong, H.-S. & Zhang, S.-H. Conservation vs. variation of dinucleotide frequencies across bacterial and archaeal genomes: evolutionary implications. Frontiers in Microbiology \n                           4, 269 (2013).","journal-title":"Frontiers in Microbiology"},{"key":"646_CR5","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1186\/1471-2199-12-33","volume":"12","author":"V Br\u00e1zda","year":"2011","unstructured":"Br\u00e1zda, V., Laister, R. C., Jagelsk\u00e1, E. B. & Arrowsmith, C. Cruciform structures are a common dna feature important for regulating biological processes. BMC Molecular Biology \n                           12, 33 (2011).","journal-title":"BMC Molecular Biology"},{"key":"646_CR6","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1007\/s10577-009-9039-9","volume":"17","author":"J Kolb","year":"2009","unstructured":"Kolb, J. et al. 
Cruciform-forming inverted repeats appear to have mediated many of the microinversions that distinguish the human and chimpanzee genomes. Chromosome Research \n                           17, 469\u2013483 (2009).","journal-title":"Chromosome Research"},{"key":"646_CR7","doi-asserted-by":"publisher","first-page":"125","DOI":"10.3389\/fgene.2016.00125","volume":"7","author":"H Inagaki","year":"2016","unstructured":"Inagaki, H. et al. Palindrome-mediated translocations in humans: A new mechanistic model for gross chromosomal rearrangements. Frontiers in Genetics \n                           7, 125 (2016).","journal-title":"Frontiers in Genetics"},{"key":"646_CR8","doi-asserted-by":"publisher","first-page":"446","DOI":"10.1186\/1471-2105-7-446","volume":"7","author":"M Hackenberg","year":"2006","unstructured":"Hackenberg, M. et al. CpGcluster: a distance-based algorithm for CpG-island detection. BMC Bioinformatics \n                           7, 446 (2006).","journal-title":"BMC Bioinformatics"},{"key":"646_CR9","doi-asserted-by":"publisher","first-page":"3064","DOI":"10.1093\/bioinformatics\/btp546","volume":"25","author":"V Afreixo","year":"2009","unstructured":"Afreixo, V., Bastos, C. A. C., Pinho, A. J., Garcia, S. P. & Ferreira, P. J. S. G. Genome analysis with inter-nucleotide distances. Bioinformatics \n                           25, 3064\u20133070 (2009).","journal-title":"Bioinformatics"},{"key":"646_CR10","unstructured":"Genome Browser team. GRCh38\/hg38 assembly of the human genome, masked, one file per chromosome. URL http:\/\/hgdownload.cse.ucsc.edu\/goldenPath\/hg38\/bigZips\/hg38.chromFaMasked.tar.gz."},{"key":"646_CR11","unstructured":"Smit, A. F. A., Hubley, R. M. & Green, P. RepeatMasker Open \u2013 4.0. 2013\u20132015 (http:\/\/repeatmasker.org). URL http:\/\/repeatmasker.org."},{"key":"646_CR12","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1093\/nar\/27.2.573","volume":"27","author":"G Benson","year":"1999","unstructured":"Benson, G. 
et al. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research \n                           27, 573\u2013580 (1999).","journal-title":"Nucleic Acids Research"},{"key":"646_CR13","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.1126\/science.1058040","volume":"291","author":"JC Venter","year":"2001","unstructured":"Venter, J. C. et al. The sequence of the human genome. Science \n                           291, 1304\u20131351 (2001).","journal-title":"Science"},{"key":"646_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1371\/journal.pbio.0050254","volume":"5","author":"S Levy","year":"2007","unstructured":"Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol \n                           5, 1\u201332 (2007).","journal-title":"PLoS Biol"},{"key":"646_CR15","unstructured":"Tavares, A. H. M. P. et al. Detection of exceptional genomic words: A comparison between species. In Proceedings of 22nd International Conference on Computational Statistics (COMPSTAT), 255\u2013264 (2016)."},{"key":"646_CR16","first-page":"957","volume":"6","author":"JC Fu","year":"1996","unstructured":"Fu, J. C. Distribution theory of runs and patterns associated with a sequence of multi-state trials. Statistica Sinica \n                           6, 957\u2013974 (1996).","journal-title":"Statistica Sinica"},{"key":"646_CR17","doi-asserted-by":"publisher","first-page":"1277","DOI":"10.1016\/j.febslet.2006.01.045","volume":"580","author":"Y Wang","year":"2006","unstructured":"Wang, Y. & Leung, F. C. Long inverted repeats in eukaryotic genomes: Recombinogenic motifs determine genomic plasticity. FEBS Letters \n                           580, 1277\u20131284 (2006).","journal-title":"FEBS Letters"},{"key":"646_CR18","doi-asserted-by":"publisher","first-page":"D383","DOI":"10.1093\/nar\/gkq1170","volume":"39","author":"RZ Cer","year":"2011","unstructured":"Cer, R. Z. et al. 
Non-b db: a database of predicted non-b dna-forming motifs in mammalian genomes. Nucleic Acids Research \n                           39, D383\u2013D391 (2011).","journal-title":"Nucleic Acids Research"},{"key":"646_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s00239-003-2493-7","volume":"58","author":"J Qi","year":"2004","unstructured":"Qi, J., Wang, B. & Hao, B.-I. Whole proteome prokaryote phylogeny without sequence alignment: A K-string composition approach. Journal of Molecular Evolution \n                           58, 1\u201311 (2004).","journal-title":"Journal of Molecular Evolution"},{"key":"646_CR20","doi-asserted-by":"publisher","first-page":"618","DOI":"10.1016\/j.jtbi.2010.05.027","volume":"265","author":"S Ding","year":"2010","unstructured":"Ding, S., Dai, Q., Liu, H. & Wang, T. A simple feature representation vector for phylogenetic analysis of DNA sequences. Journal of Theoretical Biology \n                           265, 618\u2013623 (2010).","journal-title":"Journal of Theoretical Biology"},{"key":"646_CR21","doi-asserted-by":"crossref","unstructured":"Agresti, A. An Introduction to Categorical Data Analysis (Wiley, 2007).","DOI":"10.1002\/0470114754"},{"key":"646_CR22","unstructured":"Rea, L. M. & Parker, R. A. Designing and Conducting Survey Research (Jossey-Bass, San Francisco, 1992)."},{"key":"646_CR23","doi-asserted-by":"publisher","first-page":"5991","DOI":"10.1093\/nar\/20.22.5991","volume":"20","author":"A Dayn","year":"1992","unstructured":"Dayn, A., Malkhosyan, S. & Mirkin, S. M. Transcriptionally driven cruciform formation in vivo. Nucleic Acids Research \n                           20, 5991\u20135997 (1992).","journal-title":"Nucleic Acids Research"},{"key":"646_CR24","doi-asserted-by":"publisher","first-page":"4343","DOI":"10.1093\/nar\/13.12.4343","volume":"13","author":"DB Haniford","year":"1985","unstructured":"Haniford, D. B. & Pulleyblank, D. E. 
Transition of a cloned d(AT)n-d(AT)n tract to a cruciform in vivo. Nucleic Acids Research \n                           13, 4343\u20134363 (1985).","journal-title":"Nucleic Acids Research"},{"key":"646_CR25","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1016\/j.jmb.2005.03.010","volume":"348","author":"VN Potaman","year":"2005","unstructured":"Potaman, V. N., Shlyakhtenko, L. S., Oussatcheva, E. A., Lyubchenko, Y. L. & Soldatenkov, V. A. Specific binding of poly(ADP-ribose) polymerase-1 to cruciform hairpins. Journal of Molecular Biology \n                           348, 609\u2013615 (2005).","journal-title":"Journal of Molecular Biology"},{"key":"646_CR26","doi-asserted-by":"crossref","unstructured":"Lubliner, S., Keren, L. & Segal, E. Sequence features of yeast and human core promoters that are predictive of maximal promoter activity. Nucleic Acids Research (2013).","DOI":"10.1093\/nar\/gkt256"},{"key":"646_CR27","doi-asserted-by":"publisher","first-page":"1188","DOI":"10.1101\/gr.849004","volume":"14","author":"G Crooks","year":"2004","unstructured":"Crooks, G., Hon, G., Chandonia, J. & Brenner, S. WebLogo: A sequence logo generator. Genome Research \n                           14, 1188\u20131190 (2004).","journal-title":"Genome Research"},{"key":"646_CR28","doi-asserted-by":"crossref","unstructured":"Elbarbary, R. A., Lucas, B. A. & Maquat, L. E. Retrotransposons as regulators of gene expression. Science \n                           351 (2016).","DOI":"10.1126\/science.aac7247"},{"key":"646_CR29","doi-asserted-by":"publisher","first-page":"e64884","DOI":"10.1371\/journal.pone.0064884","volume":"8","author":"A Teixeira-Silva","year":"2013","unstructured":"Teixeira-Silva, A., Silva, R. M., Carneiro, J., Amorim, A. & Azevedo, L. The role of recombination in the origin and evolution of alu subfamilies. 
Plos One \n                           8, e64884 (2013).","journal-title":"Plos One"},{"key":"646_CR30","doi-asserted-by":"publisher","first-page":"860","DOI":"10.1038\/35057062","volume":"409","author":"ES Lander","year":"2001","unstructured":"Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature \n                           409, 860\u2013921 (2001).","journal-title":"Nature"},{"key":"646_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1093\/nar\/gkn923","volume":"37","author":"DW Huang","year":"2009","unstructured":"Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Research \n                           37, 1\u201313 (2009).","journal-title":"Nucleic Acids Research"},{"key":"646_CR32","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1038\/nprot.2008.211","volume":"4","author":"DW Huang","year":"2009","unstructured":"Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols \n                           4, 44\u201357 (2009).","journal-title":"Nature Protocols"},{"key":"646_CR33","doi-asserted-by":"crossref","unstructured":"Pratas, D., Silva, R. M., Pinho, A. J. & Ferreira, P. J. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences. Scientific Reports \n                           5 (2015).","DOI":"10.1038\/srep10203"},{"key":"646_CR34","doi-asserted-by":"publisher","first-page":"977","DOI":"10.1534\/g3.112.003061","volume":"2","author":"MS O\u2019Bleness","year":"2012","unstructured":"O\u2019Bleness, M. S. et al. Evolutionary history and genome organization of duf1220 protein domains. 
G3: Genes\u2014 Genomes\u2014 Genetics \n                           2, 977\u2013986 (2012).","journal-title":"G3: Genes\u2014 Genomes\u2014 Genetics"}],"container-title":["Scientific Reports"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41598-017-00646-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41598-017-00646-2","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41598-017-00646-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,23]],"date-time":"2022-12-23T03:36:00Z","timestamp":1671766560000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41598-017-00646-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,4,7]]},"references-count":34,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2017,12]]}},"alternative-id":["646"],"URL":"https:\/\/doi.org\/10.1038\/s41598-017-00646-2","relation":{},"ISSN":["2045-2322"],"issn-type":[{"value":"2045-2322","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,4,7]]},"assertion":[{"value":"11 August 2016","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 March 2017","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 2017","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare that they have no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing Interests"}}],"article-number":"728"}}