{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:15Z","timestamp":1772138055876,"version":"3.50.1"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2022,9,1]],"date-time":"2022-09-01T00:00:00Z","timestamp":1661990400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Greenberg and Ilan Shomorony","award":["ECCB2022"],"award-info":[{"award-number":["ECCB2022"]}]},{"name":"National Science Foundation CAREER Award","award":["CCF-2046991"],"award-info":[{"award-number":["CCF-2046991"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,16]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>The complexity of genome assembly is due in large part to the presence of repeats. In particular, large reverse-complemented repeats can lead to incorrect inversions of large segments of the genome. To detect and correct such inversions in finished bacterial genomes, we propose a statistical test based on tetranucleotide frequency (TNF), which determines whether two segments from the same genome are of the same or opposite orientation. In most cases, the test neatly partitions the genome into two segments of roughly equal length with seemingly opposite orientations. This corresponds to the segments between the DNA replication origin and terminus, which were previously known to have distinct nucleotide compositions. We show that, in several cases where this balanced partition is not observed, the test identifies a potential inverted misassembly, which is validated by the presence of a reverse-complemented repeat at the boundaries of the inversion. After inverting the sequence between the repeat, the balance of the misassembled genome is restored. Our method identifies 31 potential misassemblies in the NCBI database, several of which are further supported by a reassembly of the read data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>A github repository is available at https:\/\/github.com\/gcgreenberg\/Oriented-TNF.git.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac516","type":"journal-article","created":{"date-parts":[[2022,8,24]],"date-time":"2022-08-24T12:52:23Z","timestamp":1661345543000},"page":"ii34-ii41","source":"Crossref","is-referenced-by-count":2,"title":["Improving bacterial genome assembly using a test of strand orientation"],"prefix":"10.1093","volume":"38","author":[{"given":"Grant","family":"Greenberg","sequence":"first","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign"}]},{"given":"Ilan","family":"Shomorony","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign"}]}],"member":"286","published-online":{"date-parts":[[2022,9,18]]},"reference":[{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"Spades: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/bib\/bbx120","article-title":"A review of methods and databases for metagenomic classification and assembly","volume":"20","author":"Breitwieser","year":"2017","journal-title":"Brief. Bioinformat"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read smrt sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat. Methods"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"D67","DOI":"10.1093\/nar\/gkv1276","article-title":"Genbank","volume":"44","author":"Clark","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023041408003799100_","volume-title":"Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing)","author":"Cover","year":"2006"},{"key":"2023041408003799100_","article-title":"The metagenomic binning problem: clustering markov sequences","author":"Greenberg","year":"2019","journal-title":"In:2019 IEEE Information Theory Workshop (ITW), pp. 1\u20135."},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"101389","DOI":"10.1016\/j.isci.2020.101389","article-title":"Haslr: fast hybrid assembly of long reads","volume":"23","author":"Haghshenas","year":"2020","journal-title":"iScience"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1101\/gr.216465.116","article-title":"HINGE: long-read assembly achieves optimal repeat resolution","volume":"27","author":"Kamath","year":"2017","journal-title":"Genome Res"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"e1165","DOI":"10.7717\/peerj.1165","article-title":"MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities","volume":"3","author":"Kang","year":"2015","journal-title":"PeerJ"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s41587-019-0072-8","article-title":"Assembly of long, error-prone reads using repeat graphs","volume":"37","author":"Kolmogorov","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"784","DOI":"10.1186\/s12864-017-4162-z","article-title":"eRP arrangement: a strategy for assembled genomic contig rearrangement based on replication profiling in bacteria","volume":"18","author":"Kono","year":"2017","journal-title":"BMC Genomics"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1038\/nbt.2280","article-title":"Hybrid error correction and de novo assembly of single-molecule sequencing reads","volume":"30","author":"Koren","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"R12","DOI":"10.1186\/gb-2004-5-2-r12","article-title":"Versatile and open software for comparing large genomes","volume":"5","author":"Kurtz","year":"2004","journal-title":"Genome Biol"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1038\/nature12506","article-title":"Richness of human gut microbiome correlates with metabolic markers","volume":"500","author":"Le Chatelier","year":"2013","journal-title":"Nature"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The sequence read archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1126\/science.1181369","article-title":"Comprehensive mapping of long-range interactions reveals folding principles of the human genome","volume":"326","author":"Lieberman-Aiden","year":"2009","journal-title":"Science (New York, NY)"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1093\/bioinformatics\/btw290","article-title":"COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge","volume":"33","author":"Lu","year":"2017","journal-title":"Bioinformatics"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1007\/PL00006428","article-title":"Base composition skews, replication orientation, and gene orientation in 12 prokaryote genomes","volume":"47","author":"McLean","year":"1998","journal-title":"J. Mol. Evol"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"4662","DOI":"10.1038\/s41467-018-07110-3","article-title":"Gene inversion potentiates bacterial evolvability and virulence","volume":"9","author":"Merrikh","year":"2018","journal-title":"Nat. Commun"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","DOI":"10.1017\/9781107185920","volume-title":"Statistical Inference for Engineers and Data Scientists","author":"Moulin","year":"2018"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"1163","DOI":"10.1093\/molbev\/msp032","article-title":"Phylogenetic signals in dna composition: limitations and prospects","volume":"26","author":"Mr\u00e1zek","year":"2009","journal-title":"Mol. Biol. Evol"},{"key":"2023041408003799100_","author":"National Center for Biotechnology Information","year":"1988"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"528","DOI":"10.1002\/elps.1150190412","article-title":"Tetranucleotide frequencies in microbial genomes","volume":"19","author":"Noble","year":"1998","journal-title":"Electrophoresis"},{"key":"2023041408003799100_","author":"Nurk","year":"2017"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An eulerian path approach to dna fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl. Acad. Sci. U S A"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1101\/gr.335003","article-title":"Evolutionary implications of microbial genome tetranucleotide frequency biases","volume":"13","author":"Pride","year":"2003","journal-title":"Genome Res"},{"key":"2023041408003799100_","author":"Public Health England, Pacific Biosciences, and Wellcome Sanger Institute","year":"2014"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1186\/1471-2164-4-17","article-title":"Wavelet to predict bacterial ori and ter: a tendency towards a physical balance","volume":"4","author":"Song","year":"2003","journal-title":"BMC Genomics"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"804","DOI":"10.1038\/nature06244","article-title":"The human microbiome project","volume":"449","author":"Turnbaugh","year":"2007","journal-title":"Nature"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1101\/gr.214270.116","article-title":"Fast and accurate de novo genome assembly from long uncorrected reads","volume":"27","author":"Vaser","year":"2017","journal-title":"Genome Res"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/2049-2618-2-26","article-title":"Maxbin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm","volume":"2","author":"Wu","year":"2014","journal-title":"Microbiome"},{"key":"2023041408003799100_","first-page":"1895","article-title":"Svdetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data","volume":"26","author":"Zeitouni","year":"2010","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2023041408003799100_","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii34\/49886631\/btac516.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii34\/49886631\/btac516.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T23:24:20Z","timestamp":1700954660000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_2\/ii34\/6701993"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,1]]},"references-count":34,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2022,9,16]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac516","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.07.06.499059","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,9,1]]}}}