{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T21:40:18Z","timestamp":1761946818250,"version":"3.37.3"},"reference-count":47,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2018,5,14]],"date-time":"2018-05-14T00:00:00Z","timestamp":1526256000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61372138"],"award-info":[{"award-number":["61372138"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018537","name":"National Science and Technology Major Project","doi-asserted-by":"crossref","award":["2018ZX10201002"],"award-info":[{"award-number":["2018ZX10201002"]}],"id":[{"id":"10.13039\/501100018537","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100013223","name":"Chongqing Research Program of Basic Research and Frontier Technology","doi-asserted-by":"crossref","award":["cstc2015jcyjA40026","cstc2016jcyjA0568"],"award-info":[{"award-number":["cstc2015jcyjA40026","cstc2016jcyjA0568"]}],"id":[{"id":"10.13039\/501100013223","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Chinese Chongqing Distinguish Youth Funding","award":["cstc2014jcyjjq40003"],"award-info":[{"award-number":["cstc2014jcyjjq40003"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>This study addresses several important questions related to naturally underrepresented sequences: (i) are there permutations of real genomic DNA sequences in a defined length 
(k-mer) and a given lineage that do not actually exist or are underrepresented? (ii) If there are such sequences, what are their characteristics in terms of k-mer length and base composition? (iii) Are they related to CpG or TpA underrepresentation known for human sequences? We propose that the answers to these questions are of great significance for the study of sequence-associated regulatory mechanisms, such as cytosine methylation and chromosomal structures in physiological or pathological conditions such as cancer.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We empirically defined sequences that were not included in any well-known public databases as lineage-associated underrepresented permutations (LAUPs). Then, we developed a Jellyfish-based LAUPs analysis application (JBLA) to investigate LAUPs for 24 representative species. The present discoveries include: (i) lengths for the shortest LAUPs, ranging from 10 to 14, which collectively constitute a low proportion of the genome. (ii) Common LAUPs showing higher CG content over the analysed mammalian genome and possessing distinct CG*CG motifs. (iii) Neither CpG-containing LAUPs nor CpG island sequences are randomly structured and distributed over the genomes; some LAUPs and most CpG-containing sequences exhibit an opposite trend within the same k and n variants. 
In addition, we demonstrate that the JBLA algorithm is more efficient than the original Jellyfish for computing LAUPs.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>We developed a Jellyfish-based LAUP analysis (JBLA) application by integrating Jellyfish (Mar\u00e7ais and Kingsford, 2011), MEME (Bailey, et al., 2009) and the NCBI genome database (Pruitt, et al., 2007) applications, which are listed as Supplementary Material.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty392","type":"journal-article","created":{"date-parts":[[2018,5,9]],"date-time":"2018-05-09T11:39:17Z","timestamp":1525865957000},"page":"3624-3630","source":"Crossref","is-referenced-by-count":40,"title":["Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA)"],"prefix":"10.1093","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3708-1727","authenticated-orcid":false,"given":"Le","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu, China"},{"name":"School of Computer and Information Science, Southwest University, Chongqing, China"}]},{"given":"Ming","family":"Xiao","sequence":"additional","affiliation":[{"name":"School of Computer and Information Science, Southwest University, Chongqing, China"},{"name":"College of Mobile Telecommunications, Chongqing University of Posts and Telecommunications, Chongqing, China"}]},{"given":"Jingsong","family":"Zhou","sequence":"additional","affiliation":[{"name":"College of Computer Science, Sichuan University, Chengdu, China"}]},{"given":"Jun","family":"Yu","sequence":"additional","affiliation":[{"name":"CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of 
Genomics, Chinese Academy of Sciences, Beijing, China"},{"name":"University of Chinese Academy of Sciences, Beijing, China"}]}],"member":"286","published-online":{"date-parts":[[2018,5,14]]},"reference":[{"key":"2023012712375493600_bty392-B1","doi-asserted-by":"crossref","first-page":"e1022.","DOI":"10.1371\/journal.pone.0001022","article-title":"Nullomers: really a matter of natural selection?","volume":"2","author":"Acquisti","year":"2007","journal-title":"PLos One"},{"key":"2023012712375493600_bty392-B2","doi-asserted-by":"crossref","first-page":"W202","DOI":"10.1093\/nar\/gkp335","article-title":"MEME SUITE: tools for motif discovery and searching","volume":"37","author":"Bailey","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023012712375493600_bty392-B3","first-page":"28","article-title":"Fitting a mixture model by expectation maximization to discover motifs in biopolymers","volume":"2","author":"Bailey","year":"1994","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol"},{"key":"2023012712375493600_bty392-B4","doi-asserted-by":"crossref","first-page":"3.","DOI":"10.1186\/1471-2148-2-3","article-title":"Sequence permutations in the molecular evolution of DNA methyltransferases","volume":"2","author":"Bujnicki","year":"2002","journal-title":"BMC Evol. Biol"},{"key":"2023012712375493600_bty392-B5","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1128\/IAI.67.3.1116-1124.1999","article-title":"Evolutionary relationships of pathogenic clones of Vibrio cholerae by sequence analysis of four housekeeping genes","volume":"67","author":"Byun","year":"1999","journal-title":"Infect. 
Immun"},{"key":"2023012712375493600_bty392-B6","doi-asserted-by":"crossref","first-page":"6228","DOI":"10.1093\/nar\/gkn626","article-title":"A novel DNA sequence periodicity decodes nucleosome positioning","volume":"36","author":"Chen","year":"2008","journal-title":"Nucleic Acids Res"},{"key":"2023012712375493600_bty392-B7","doi-asserted-by":"crossref","first-page":"1997","DOI":"10.1016\/S0006-3495(00)76747-6","article-title":"Mechanical stability of single DNA molecules","volume":"78","author":"Clausen-Schaumann","year":"2000","journal-title":"Biophys. J"},{"key":"2023012712375493600_bty392-B8","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1016\/j.gene.2006.09.018","article-title":"Repetitive sequence environment distinguishes housekeeping genes","volume":"390","author":"Daniel Eller","year":"2007","journal-title":"Gene"},{"key":"2023012712375493600_bty392-B9","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1038\/nbt0406-423","article-title":"What are DNA sequence motifs?","volume":"24","author":"D\u2019Haeseleer","year":"2006","journal-title":"Nat. Biotechnol"},{"key":"2023012712375493600_bty392-B10","doi-asserted-by":"crossref","first-page":"11935","DOI":"10.1073\/pnas.94.22.11935","article-title":"Mechanical separation of the complementary strands of\u2009DNA","volume":"94","author":"Essevaz-Roulet","year":"1997","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023012712375493600_bty392-B11","doi-asserted-by":"crossref","first-page":"R140.","DOI":"10.1186\/gb-2007-8-7-r140","article-title":"Housekeeping genes tend to show reduced upstream sequence conservation","volume":"8","author":"Farr\u00e9","year":"2007","journal-title":"Genome Biol"},{"key":"2023012712375493600_bty392-B12","doi-asserted-by":"crossref","first-page":"2209","DOI":"10.3390\/molecules22122209","article-title":"Developing an agent-based drug model to investigate the synergistic effects of drug combinations","volume":"22","author":"Gao","year":"2017","journal-title":"Molecules"},{"key":"2023012712375493600_bty392-B13","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1016\/0022-2836(87)90689-9","article-title":"CpG islands in vertebrate genomes","volume":"196","author":"Gardiner-Garden","year":"1987","journal-title":"J. Mol. Biol"},{"key":"2023012712375493600_bty392-B14","doi-asserted-by":"crossref","first-page":"505.","DOI":"10.1016\/0022-2836(76)90284-9","article-title":"Limited permutations of the nucleotide sequence in bacteriophage T1 DNA","volume":"104","author":"Gill","year":"1976","journal-title":"J. Mol. Biol"},{"key":"2023012712375493600_bty392-B15","first-page":"355","article-title":"Absent sequences: nullomers and primes","volume":"12","author":"Hampikian","year":"2007","journal-title":"Pac. Symp. 
Biocomput"},{"key":"2023012712375493600_bty392-B16","doi-asserted-by":"crossref","first-page":"R79","DOI":"10.1186\/gb-2008-9-5-r79","article-title":"CpG island density and its correlations with genomic features in mammalian genomes","volume":"9","author":"Han","year":"2008","journal-title":"Genome Biol"},{"key":"2023012712375493600_bty392-B17","doi-asserted-by":"crossref","first-page":"167.","DOI":"10.1186\/1471-2105-9-167","article-title":"Efficient computation of absent words in genomic sequences","volume":"9","author":"Herold","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012712375493600_bty392-B18","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1007\/PL00006529","article-title":"Circular permutations in the molecular evolution of DNA methyltransferases","volume":"49","author":"Jeltsch","year":"1999","journal-title":"J. Mol. Evol"},{"key":"2023012712375493600_bty392-B19","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.matcom.2014.07.003","article-title":"Novel 3D GPU based numerical parallel diffusion algorithms in cylindrical coordinates for health care simulation","volume":"109","author":"Jiang","year":"2015","journal-title":"Math. Comput. Simulat"},{"key":"2023012712375493600_bty392-B20","doi-asserted-by":"crossref","first-page":"1829","DOI":"10.1002\/cnm.1444","article-title":"Employing graphics processing unit technology, alternating direction implicit method and domain decomposition to speed up the numerical diffusion solver for the biomedical engineering research","volume":"27","author":"Jiang","year":"2011","journal-title":"Int. J. Numer. Meth. 
Bio"},{"key":"2023012712375493600_bty392-B21","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1002\/9781118347300.ch6","volume-title":"Asymmetric Synthesis of Natural Products","author":"Koskinen","year":"2012"},{"key":"2023012712375493600_bty392-B22","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.gene.2007.09.017","article-title":"Housekeeping and tissue-specific genes differ in simple sequence repeats in the 5 \u2018-UTR region","volume":"407","author":"Lawson","year":"2008","journal-title":"Gene"},{"key":"2023012712375493600_bty392-B23","doi-asserted-by":"crossref","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012712375493600_bty392-B24","doi-asserted-by":"crossref","first-page":"16.","DOI":"10.1002\/9780470110607.ch2","article-title":"The GenBank sequence database","volume":"39","author":"Ouellette","year":"1998","journal-title":"Methods Biochem. 
Anal"},{"key":"2023012712375493600_bty392-B25","doi-asserted-by":"crossref","first-page":"9164","DOI":"10.1093\/nar\/gkx548","article-title":"CpG and methylation-dependent DNA binding and dynamics of the methylcytosine binding domain 2 protein at the single-molecule level","volume":"45","author":"Pan","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023012712375493600_bty392-B26","doi-asserted-by":"crossref","first-page":"1899","DOI":"10.1093\/bioinformatics\/btu133","article-title":"Characterization of p38 MAPK isoforms for drug resistance study using systems biology approach","volume":"30","author":"Peng","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712375493600_bty392-B27","doi-asserted-by":"crossref","first-page":"512.","DOI":"10.1016\/j.bpj.2016.12.029","article-title":"Optical trapping nanometry of hypermethylated CPG-island DNA","volume":"112","author":"Pongor","year":"2017","journal-title":"Biophys. J"},{"key":"2023012712375493600_bty392-B28","doi-asserted-by":"crossref","first-page":"D61","DOI":"10.1093\/nar\/gkl842","article-title":"NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"35","author":"Pruitt","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023012712375493600_bty392-B29","doi-asserted-by":"crossref","first-page":"67.","DOI":"10.1186\/1471-2164-9-67","article-title":"All and only CpG containing sequences are enriched in promoters abundantly bound by RNA polymerase II in multiple tissues","volume":"9","author":"Rozenberg","year":"2008","journal-title":"BMC Genomics"},{"key":"2023012712375493600_bty392-B30","doi-asserted-by":"crossref","first-page":"6097","DOI":"10.1093\/nar\/18.20.6097","article-title":"Sequence logos: a new way to display consensus sequences","volume":"18","author":"Schneider","year":"1990","journal-title":"Nucleic Acids 
Res"},{"key":"2023012712375493600_bty392-B31","doi-asserted-by":"crossref","first-page":"1863.","DOI":"10.1021\/ja00112a001","article-title":"Hydrophobic, non-hydrogen-bonding bases and base pairs in DNA","volume":"117","author":"Schweitzer","year":"1995","journal-title":"J. Am. Chem. Soc"},{"key":"2023012712375493600_bty392-B32","doi-asserted-by":"crossref","first-page":"94.","DOI":"10.1016\/S0006-291X(86)80084-5","article-title":"Frequent occurrence of short complementary sequences in nucleic acids","volume":"139","author":"Segerst\u00e9en","year":"1986","journal-title":"Biochem. Biophys. Res. Commun"},{"key":"2023012712375493600_bty392-B33","first-page":"29","article-title":"The EMBL nucleotide sequence database","volume":"33","author":"Stoesser","year":"1999","journal-title":"Mol. Biotechnol"},{"key":"2023012712375493600_bty392-B34","doi-asserted-by":"crossref","first-page":"3740","DOI":"10.1073\/pnas.052410099","article-title":"Comprehensive analysis of CpG islands in human chromosomes 21 and 22","volume":"99","author":"Takai","year":"2002","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712375493600_bty392-B35","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/30.1.27","article-title":"DNA data bank of Japan (DDBJ) for genome scale research in life science","volume":"30","author":"Tateno","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023012712375493600_bty392-B36","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/S0168-1656(99)00163-7","article-title":"Housekeeping genes as internal standards: use and limits","volume":"75","author":"Thellin","year":"1999","journal-title":"J. 
Biotechnol"},{"key":"2023012712375493600_bty392-B37","doi-asserted-by":"crossref","first-page":"4385","DOI":"10.1093\/nar\/12.10.4385","article-title":"CG dinucleotide clusters in MHC genes and in 5\u2019 demethylated genes","volume":"12","author":"Tykocinski","year":"1984","journal-title":"Nucleic Acids Res"},{"key":"2023012712375493600_bty392-B38","doi-asserted-by":"crossref","first-page":"e0164540.","DOI":"10.1371\/journal.pone.0164540","article-title":"Nullomers and high order nullomers in genomic sequences","volume":"11","author":"Vergni","year":"2016","journal-title":"PLoS One"},{"key":"2023012712375493600_bty392-B39","doi-asserted-by":"crossref","first-page":"706.","DOI":"10.1093\/nar\/28.3.706","article-title":"Structural analysis of DNA sequence: evidence for lateral gene transfer in Thermotoga maritima","volume":"28","author":"Worning","year":"2000","journal-title":"Nucleic Acids Res"},{"first-page":"S119","year":"2015","author":"Yang","key":"2023012712375493600_bty392-B40"},{"key":"2023012712375493600_bty392-B41","doi-asserted-by":"crossref","first-page":"8452","DOI":"10.1073\/pnas.86.21.8452","article-title":"Concordant evolution of coding and noncoding regions of DNA made possible by the universal rule of TA\/CG deficiency-TG\/CT excess","volume":"86","author":"Yomo","year":"1989","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012712375493600_bty392-B42","doi-asserted-by":"crossref","first-page":"1845.","DOI":"10.1128\/MCB.01124-12","article-title":"Developmentally programmed 3\u2019 CpG island methylation confers tissue- and cell-type-specific transcriptional activation","volume":"33","author":"Yu","year":"2013","journal-title":"Mol. Cell. 
Biol"},{"key":"2023012712375493600_bty392-B44","doi-asserted-by":"crossref","first-page":"14877","DOI":"10.1039\/C6NR01637E","article-title":"Investigation of mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multi-scale agent-based model and experimental optimization\/validation","volume":"8","author":"Zhang","year":"2016","journal-title":"Nanoscale"},{"key":"2023012712375493600_bty392-B43","doi-asserted-by":"crossref","first-page":"477","DOI":"10.1093\/jmcb\/mjx056","article-title":"EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients","volume":"9","author":"Zhang","year":"2017","journal-title":"J. Mol. Cell Biol"},{"key":"2023012712375493600_bty392-B45","first-page":"1","article-title":"Building up a robust risk mathematical platform to predict colorectal cancer","volume":"2017","author":"Zhang","year":"2017","journal-title":"Complexity"},{"key":"2023012712375493600_bty392-B47","doi-asserted-by":"crossref","first-page":"9143","DOI":"10.1038\/srep09143","article-title":"Determination of base binding strength and base stacking interaction of DNA duplex using atomic force microscope.","volume":"5","author":"Zhang","year":"2015","journal-title":"Sci Rep."},{"key":"2023012712375493600_bty392-B46","doi-asserted-by":"crossref","first-page":"481.","DOI":"10.1016\/j.tig.2008.08.004","article-title":"On the nature of human housekeeping genes","volume":"24","author":"Zhu","year":"2008","journal-title":"Trends 
Genet"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/21\/3624\/48921372\/bioinformatics_34_21_3624.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/21\/3624\/48921372\/bioinformatics_34_21_3624.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,2]],"date-time":"2023-09-02T14:28:03Z","timestamp":1693664883000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/21\/3624\/4995845"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,5,14]]},"references-count":47,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2018,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty392","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2018,11,1]]},"published":{"date-parts":[[2018,5,14]]}}}