{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:42:35Z","timestamp":1753875755354,"version":"3.41.2"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T00:00:00Z","timestamp":1687564800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100009619","name":"Japan Agency for Medical Research and Development","doi-asserted-by":"publisher","award":["22tm0424219h0002"],"award-info":[{"award-number":["22tm0424219h0002"]}],"id":[{"id":"10.13039\/100009619","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Diploid assembly, i.e. determining the sequences of homologous chromosomes separately, is essential to elucidate genetic differences between haplotypes. One approach is to call and phase single nucleotide variants (SNVs) on a reference sequence. However, this approach becomes unstable on large segmental duplications (SDs) or structural variations (SVs) because the alignments of reads derived from these regions tend to be unreliable. Another approach is to use highly accurate PacBio HiFi reads to output a diploid assembly directly. Nonetheless, HiFi reads cannot phase homozygous regions longer than their read length and require Oxford Nanopore Technologies (ONT) reads or Hi-C data to produce a fully phased assembly. 
Is a single long-read sequencing technology sufficient to create an accurate diploid assembly?<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we present JTK, a megabase-scale diploid genome assembler. It first randomly samples kilobase-scale sequences (called \u2018chunks\u2019) from the long reads, phases variants found on them, and produces two haplotypes. The key idea of JTK is to use chunks to capture SNVs and SVs simultaneously. From 60-fold ONT reads of HG002 and a Japanese sample, it fully assembled two haplotypes with approximately 99.9% accuracy in the major histocompatibility complex (MHC) and leukocyte receptor complex (LRC) regions, which was not possible with the reference-based approach. In addition, in the LRC region of the Japanese sample, JTK produced an assembly with better contiguity than those built from high-coverage HiFi+Hi-C data. In the coming age of pan-genomics, JTK could complement the reference-based phasing method in assembling regions that are difficult to assemble but medically important.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>JTK is available at https:\/\/github.com\/ban-m\/jtk, and the datasets are available at https:\/\/doi.org\/10.5281\/zenodo.7790310 or JGAS000580 in DDBJ.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad398","type":"journal-article","created":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T20:14:40Z","timestamp":1687637680000},"source":"Crossref","is-referenced-by-count":1,"title":["JTK: targeted diploid genome assembler"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7123-3018","authenticated-orcid":false,"given":"Bansho","family":"Masutani","sequence":"first","affiliation":[{"name":"Department of Computational Biology and 
Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoshihiko","family":"Suzuki","sequence":"additional","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuta","family":"Suzuki","sequence":"additional","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shinichi","family":"Morishita","sequence":"additional","affiliation":[{"name":"Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo , Chiba 277-8562, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,6,24]]},"reference":[{"key":"2023070505193406500_btad398-B1","doi-asserted-by":"crossref","first-page":"e116","DOI":"10.7717\/peerj-cs.116","article-title":"Alitv\u2014interactive visualization of whole genome comparisons","volume":"3","author":"Ankenbrand","year":"2017","journal-title":"PeerJ Comput Sci"},{"key":"2023070505193406500_btad398-B2","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/s41592-020-01056-5","article-title":"Haplotype-resolved de novo assembly using phased assembly graphs with HiFiASM","volume":"18","author":"Cheng","year":"2021","journal-title":"Nat Methods"},{"key":"2023070505193406500_btad398-B3","doi-asserted-by":"crossref","first-page":"1332","DOI":"10.1038\/s41587-022-01261-x","article-title":"Haplotype-resolved assembly of diploid genomes without parental data","volume":"40","author":"Cheng","year":"2022","journal-title":"Nat 
Biotechnol"},{"key":"2023070505193406500_btad398-B4","doi-asserted-by":"crossref","first-page":"563","DOI":"10.1038\/nmeth.2474","article-title":"Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data","volume":"10","author":"Chin","year":"2013","journal-title":"Nat Methods"},{"key":"2023070505193406500_btad398-B5","doi-asserted-by":"crossref","first-page":"4660","DOI":"10.1038\/s41467-019-12493-y","article-title":"Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing","volume":"10","author":"Edge","year":"2019","journal-title":"Nat Commun"},{"key":"2023070505193406500_btad398-B6","doi-asserted-by":"crossref","first-page":"108","DOI":"10.1186\/s13059-022-02670-6","article-title":"Deeprepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing","volume":"23","author":"Fang","year":"2022","journal-title":"Genome Biol"},{"key":"2023070505193406500_btad398-B7","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1038\/s41587-020-0711-0","article-title":"Chromosome-scale, haplotype-resolved assembly of human genomes","volume":"39","author":"Garg","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2023070505193406500_btad398-B8","doi-asserted-by":"crossref","first-page":"2853","DOI":"10.1093\/bioinformatics\/bty1046","article-title":"GfaViz: flexible and interactive visualization of GFA sequence graphs","volume":"35","author":"Gonnella","year":"2019","journal-title":"Bioinformatics"},{"author":"Houwaart","key":"2023070505193406500_btad398-B9"},{"key":"2023070505193406500_btad398-B10","doi-asserted-by":"crossref","first-page":"519","DOI":"10.1038\/s41586-022-05325-5","article-title":"Semi-automated assembly of high-quality diploid human reference 
genomes","volume":"611","author":"Jarvis","year":"2022","journal-title":"Nature"},{"key":"2023070505193406500_btad398-B11","doi-asserted-by":"crossref","first-page":"1597","DOI":"10.1101\/gr.218891.116","article-title":"Assembly and analysis of 100 full MHC haplotypes from the Danish population","volume":"27","author":"Jensen","year":"2017","journal-title":"Genome Res"},{"key":"2023070505193406500_btad398-B12","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1186\/s13059-020-02107-y","article-title":"Long-read-based human genomic structural variation detection with cuteSV","volume":"21","author":"Jiang","year":"2020","journal-title":"Genome Biol"},{"key":"2023070505193406500_btad398-B13","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1038\/s41587-019-0072-8","article-title":"Assembly of long, error-prone reads using repeat graphs","volume":"37","author":"Kolmogorov","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023070505193406500_btad398-B14","doi-asserted-by":"crossref","first-page":"1026","DOI":"10.1093\/bioinformatics\/btm039","article-title":"Gepard: a rapid and sensitive tool for creating dotplots on genome scale","volume":"23","author":"Krumsiek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023070505193406500_btad398-B15","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1038\/s41587-019-0054-x","article-title":"Best practices for benchmarking germline small-variant calls in human genomes","volume":"37","author":"Krusche","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023070505193406500_btad398-B16","doi-asserted-by":"crossref","first-page":"2555","DOI":"10.1093\/molbev\/msw127","article-title":"Excess of deleterious mutations around HLA genes reveals evolutionary cost of balancing selection","volume":"33","author":"Lenz","year":"2016","journal-title":"Mol Biol 
Evol"},{"key":"2023070505193406500_btad398-B17","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1038\/s41592-018-0054-7","article-title":"A synthetic-diploid benchmark for accurate variant-calling evaluation","volume":"15","author":"Li","year":"2018","journal-title":"Nat Methods"},{"key":"2023070505193406500_btad398-B18","doi-asserted-by":"crossref","first-page":"1816","DOI":"10.1093\/bioinformatics\/btac058","article-title":"LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants","volume":"38","author":"Lin","year":"2022","journal-title":"Bioinformatics"},{"key":"2023070505193406500_btad398-B19","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1186\/s13059-021-02512-x","article-title":"Phasebook: haplotype-aware de novo assembly of diploid genomes from long reads","volume":"22","author":"Luo","year":"2021","journal-title":"Genome Biol"},{"key":"2023070505193406500_btad398-B20","doi-asserted-by":"crossref","first-page":"4664","DOI":"10.1182\/blood-2009-10-251157","article-title":"Impact of highly conserved HLA haplotype on acute graft-versus-host disease","volume":"115","author":"Morishima","year":"2010","journal-title":"Blood"},{"year":"2022","author":"Nie","key":"2023070505193406500_btad398-B21"},{"key":"2023070505193406500_btad398-B22","doi-asserted-by":"crossref","first-page":"1291","DOI":"10.1101\/gr.263566.120","article-title":"HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads","volume":"30","author":"Nurk","year":"2020","journal-title":"Genome Res"},{"key":"2023070505193406500_btad398-B23","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/nbt.4235","article-title":"A universal SNP and small-indel variant caller using deep neural networks","volume":"36","author":"Poplin","year":"2018","journal-title":"Nat 
Biotechnol"},{"author":"Porubsky","key":"2023070505193406500_btad398-B24"},{"key":"2023070505193406500_btad398-B25","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J Am Stat Assoc"},{"author":"Rautiainen","key":"2023070505193406500_btad398-B26"},{"key":"2023070505193406500_btad398-B27","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1093\/bioinformatics\/btaa1016","article-title":"Liftoff: accurate mapping of gene annotations","volume":"37","author":"Shumate","year":"2021","journal-title":"Bioinformatics"},{"key":"2023070505193406500_btad398-B28","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1016\/j.coi.2005.07.015","article-title":"HLA genomics in the third millennium","volume":"17","author":"Trowsdale","year":"2005","journal-title":"Curr Opin Immunol"},{"key":"2023070505193406500_btad398-B29","doi-asserted-by":"crossref","first-page":"eabj6965","DOI":"10.1126\/science.abj6965","article-title":"Segmental duplications and their variation in a complete human genome","volume":"376","author":"Vollger","year":"2022","journal-title":"Science"},{"key":"2023070505193406500_btad398-B30","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/gigascience\/gix010","article-title":"NanoSim: nanopore sequence read simulator based on statistical 
characterization","volume":"6","author":"Yang","year":"2017","journal-title":"Gigascience"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad398\/50696362\/btad398.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad398\/50800914\/btad398.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/7\/btad398\/50800914\/btad398.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,5]],"date-time":"2023-07-05T13:37:23Z","timestamp":1688564243000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad398\/7206882"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2023,6,24]]},"references-count":30,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2023,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad398","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2023,7,1]]},"published":{"date-parts":[[2023,6,24]]},"article-number":"btad398"}}