{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T03:32:04Z","timestamp":1775791924008,"version":"3.50.1"},"reference-count":25,"publisher":"Public Library of Science (PLoS)","issue":"12","license":[{"start":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:00:00Z","timestamp":1607040000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01-HG006677"],"award-info":[{"award-number":["R01-HG006677"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R35-GM130151"],"award-info":[{"award-number":["R35-GM130151"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IOS-1744309"],"award-info":[{"award-number":["IOS-1744309"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an analysis tool for the 15,000+ complete bacterial genomes in NCBI\u2019s Refseq library. In order to analyze all 15,000+ genomes, we introduce a new method, SkewIT (Skew Index Test), that calculates a single metric representing the degree of GC skew for a genome. Using this metric, we demonstrate how GC skew patterns are conserved within certain bacterial phyla, e.g. Firmicutes, but show different patterns in other phylogenetic groups such as Actinobacteria. We also discovered that outlier values of SkewIT highlight potential bacterial mis-assemblies. Using our newly defined metric, we identify multiple mis-assembled chromosomal sequences in previously published complete bacterial genomes. We provide a SkewIT web app\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/jenniferlu717.shinyapps.io\/SkewIT\/\" xlink:type=\"simple\">https:\/\/jenniferlu717.shinyapps.io\/SkewIT\/<\/jats:ext-link>\n                    that calculates SkewI for any user-provided bacterial sequence. The web app also provides an interactive interface for the data generated in this paper, allowing users to further investigate the SkewI values and thresholds of the Refseq-97 complete bacterial genomes. Individual scripts for analysis of bacterial genomes are provided in the following repository:\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/jenniferlu717\/SkewIT\" xlink:type=\"simple\">https:\/\/github.com\/jenniferlu717\/SkewIT<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1008439","type":"journal-article","created":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T13:38:05Z","timestamp":1607089085000},"page":"e1008439","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":62,"title":["SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes"],"prefix":"10.1371","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9167-2002","authenticated-orcid":true,"given":"Jennifer","family":"Lu","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8859-7432","authenticated-orcid":true,"given":"Steven L.","family":"Salzberg","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2020,12,4]]},"reference":[{"issue":"Database issue","key":"pcbi.1008439.ref001","first-page":"D7","article-title":"Database resources of the National Center for Biotechnology Information","volume":"42","author":"NCBI Resource Coordinators","year":"2014","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"pcbi.1008439.ref002","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"NA O\u2019Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"pcbi.1008439.ref003","doi-asserted-by":"crossref","first-page":"954","DOI":"10.1101\/gr.245373.118","article-title":"Human contamination in bacterial genomes has created thousands of spurious proteins","volume":"29","author":"FP Breitwieser","year":"2019","journal-title":"Genome Res"},{"issue":"2","key":"pcbi.1008439.ref004","doi-asserted-by":"crossref","first-page":"e16410","DOI":"10.1371\/journal.pone.0016410","article-title":"Abundant human DNA contamination identified in non-primate genome databases","volume":"6","author":"MS Longo","year":"2011","journal-title":"PLoS One"},{"key":"pcbi.1008439.ref005","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/1944-3277-10-18","article-title":"Large-scale contamination of microbial isolate genomes by Illumina PhiX control","volume":"10","author":"S Mukherjee","year":"2015","journal-title":"Stand Genomic Sci"},{"issue":"9","key":"pcbi.1008439.ref006","doi-asserted-by":"crossref","first-page":"e0162424","DOI":"10.1371\/journal.pone.0162424","article-title":"Human Contamination in Public Genome Assemblies","volume":"11","author":"K Kryukov","year":"2016","journal-title":"PLoS One"},{"key":"pcbi.1008439.ref007","article-title":"Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank","author":"M Steinegger","year":"2020","journal-title":"bioRxiv"},{"issue":"5","key":"pcbi.1008439.ref008","doi-asserted-by":"crossref","first-page":"660","DOI":"10.1093\/oxfordjournals.molbev.a025626","article-title":"Asymmetric substitution patterns in the two DNA strands of bacteria","volume":"13","author":"JR Lobry","year":"1996","journal-title":"Mol Biol Evol"},{"issue":"10","key":"pcbi.1008439.ref009","doi-asserted-by":"crossref","first-page":"2286","DOI":"10.1093\/nar\/26.10.2286","article-title":"Analyzing genomes with cumulative skew diagrams","volume":"26","author":"A Grigoriev","year":"1998","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"pcbi.1008439.ref010","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1073\/pnas.59.2.598","article-title":"Mechanism of DNA chain growth. I. Possible discontinuity and unusual secondary structure of newly synthesized chains","volume":"59","author":"R Okazaki","year":"1968","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"8","key":"pcbi.1008439.ref011","doi-asserted-by":"crossref","first-page":"2176","DOI":"10.1073\/pnas.1522325113","article-title":"Strand-biased cytosine deamination at the replication fork causes cytosine to thymine mutations in Escherichia coli","volume":"113","author":"AS Bhagwat","year":"2016","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"1","key":"pcbi.1008439.ref012","doi-asserted-by":"crossref","first-page":"65","DOI":"10.1016\/S0378-1119(99)00297-8","article-title":"Asymmetric substitution patterns: a review of possible underlying mutational or selective mechanisms","volume":"238","author":"AC Frank","year":"1999","journal-title":"Gene"},{"issue":"2","key":"pcbi.1008439.ref013","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1046\/j.1365-2958.1999.01368.x","article-title":"Physical mapping of an origin of bidirectional replication at the centre of the Borrelia burgdorferi linear chromosome","volume":"32","author":"M Picardeau","year":"1999","journal-title":"Mol Microbiol"},{"issue":"6660","key":"pcbi.1008439.ref014","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/37551","article-title":"Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi","volume":"390","author":"CM Fraser","year":"1997","journal-title":"Nature"},{"issue":"5331","key":"pcbi.1008439.ref015","doi-asserted-by":"crossref","first-page":"1453","DOI":"10.1126\/science.277.5331.1453","article-title":"The complete genome sequence of Escherichia coli K-12","volume":"277","author":"FR Blattner","year":"1997","journal-title":"Science"},{"issue":"6","key":"pcbi.1008439.ref016","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1007\/PL00006428","article-title":"Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes","volume":"47","author":"MJ McLean","year":"1998","journal-title":"J Mol Evol"},{"issue":"1","key":"pcbi.1008439.ref017","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1046\/j.1365-2958.1999.01334.x","article-title":"Universal replication biases in bacteria","volume":"32","author":"EP Rocha","year":"1999","journal-title":"Mol Microbiol"},{"issue":"2","key":"pcbi.1008439.ref018","doi-asserted-by":"crossref","first-page":"e0171408","DOI":"10.1371\/journal.pone.0171408","article-title":"Quantitative analysis of correlation between AT and GC biases among bacterial genomes","volume":"12","author":"G Zhang","year":"2017","journal-title":"PLoS One"},{"key":"pcbi.1008439.ref019","first-page":"808410","article-title":"Accurate and Complete Genomes from Metagenomes","author":"LX Chen","year":"2019","journal-title":"bioRxiv"},{"issue":"8","key":"pcbi.1008439.ref020","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"A Gurevich","year":"2013","journal-title":"Bioinformatics"},{"issue":"R47","key":"pcbi.1008439.ref021","article-title":"REAPR: a universal tool for genome assembly evaluation","volume":"14","author":"M Hunt","year":"2013","journal-title":"Genome Biol"},{"issue":"386","key":"pcbi.1008439.ref022","article-title":"misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads","volume":"16","author":"X Zhu","year":"2015","journal-title":"BMC Bioinformatics"},{"issue":"11","key":"pcbi.1008439.ref023","doi-asserted-by":"crossref","first-page":"2478","DOI":"10.1093\/nar\/30.11.2478","article-title":"Fast algorithms for large-scale genome alignment and comparison","volume":"30","author":"AL Delcher","year":"2002","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"pcbi.1008439.ref024","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"B Langmead","year":"2012","journal-title":"Nat Methods"},{"issue":"2","key":"pcbi.1008439.ref025","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/s41559-017-0425-y","article-title":"Evolutionary Determinants of Genome-Wide Nucleotide Composition","volume":"2","author":"H Long","year":"2018","journal-title":"Nature Ecology & Evolution"}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008439","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T13:38:31Z","timestamp":1607089111000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008439"}},"subtitle":[],"editor":[{"given":"Andrey","family":"Rzhetsky","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,4]]},"references-count":25,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12,4]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008439","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.02.27.968214","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,4]]}}}