{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:11:12Z","timestamp":1772165472022,"version":"3.50.1"},"reference-count":35,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,16]],"date-time":"2025-01-16T00:00:00Z","timestamp":1736985600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,16]],"date-time":"2025-01-16T00:00:00Z","timestamp":1736985600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Pacific Biosciences (PacBio) circular consensus sequencing (CCS), also known as high fidelity (HiFi) technology, has revolutionized modern genomics by producing long (10\u2009+\u2009kb) and highly accurate reads. This is achieved by sequencing circularized DNA molecules multiple times and combining them into a consensus sequence. Currently, the accuracy and quality value estimation provided by HiFi technology are more than sufficient for applications such as genome assembly and germline variant calling. However, there are limitations in the accuracy of the estimated quality scores when it comes to somatic variant calling on single reads.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To address the challenge of inaccurate quality scores for somatic variant calling, we introduce TopoQual, a novel tool designed to enhance the accuracy of base quality predictions. TopoQual leverages techniques including partial order alignments (POA), topologically parallel bases, and deep learning algorithms to polish consensus sequences. Our results demonstrate that TopoQual corrects approximately 31.9% of errors in PacBio consensus sequences. Additionally, it validates base qualities up to q59, which corresponds to one error in 0.9 million bases. These improvements will significantly enhance the reliability of somatic variant calling using HiFi data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusion<\/jats:title>\n                    <jats:p>TopoQual represents a significant advancement in genomics by improving the accuracy of base quality predictions for PacBio HiFi sequencing data. By correcting a substantial proportion of errors and achieving high base quality validation, TopoQual enables confident and accurate somatic variant calling. This tool not only addresses a critical limitation of current HiFi technology but also opens new possibilities for precise genomic analysis in various research and clinical applications.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-024-06020-0","type":"journal-article","created":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T22:17:21Z","timestamp":1736979441000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["TopoQual polishes circular consensus sequencing data and accurately predicts quality scores"],"prefix":"10.1186","volume":"26","author":[{"given":"Minindu","family":"Weerakoon","sequence":"first","affiliation":[]},{"given":"Sangjin","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Emily","family":"Mitchell","sequence":"additional","affiliation":[]},{"given":"Haynes","family":"Heaton","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,16]]},"reference":[{"issue":"6127","key":"6020_CR1","doi-asserted-by":"publisher","first-page":"1546","DOI":"10.1126\/science.1235122","volume":"339","author":"B Vogelstein","year":"2013","unstructured":"Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW. Cancer genome landscapes. Science. 2013;339(6127):1546\u201358.","journal-title":"Science"},{"issue":"6255","key":"6020_CR2","doi-asserted-by":"publisher","first-page":"1483","DOI":"10.1126\/science.aab4082","volume":"349","author":"I Martincorena","year":"2015","unstructured":"Martincorena I, Campbell PJ. Somatic mutation in cancer and normal cells. Science. 2015;349(6255):1483\u20139.","journal-title":"Science"},{"issue":"1","key":"6020_CR3","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/S0092-8674(00)81683-9","volume":"100","author":"D Hanahan","year":"2000","unstructured":"Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57\u201370.","journal-title":"Cell"},{"issue":"11","key":"6020_CR4","doi-asserted-by":"publisher","first-page":"2586","DOI":"10.1038\/nprot.2014.170","volume":"9","author":"SR Kennedy","year":"2014","unstructured":"Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc. 2014;9(11):2586\u2013606.","journal-title":"Nat Protoc"},{"key":"6020_CR5","doi-asserted-by":"publisher","unstructured":"Benjamin D, Sato T, Cibulskis K, Getz G, Stewart C, Lichtenstein L. Calling Somatic SNVs and Indels with Mutect2 [Internet]. bioRxiv. 2019 [cited 2024 Jan 8]. p. 861054. Available from: https:\/\/www.biorxiv.org\/content\/https:\/\/doi.org\/10.1101\/861054","DOI":"10.1101\/861054"},{"issue":"4","key":"6020_CR6","doi-asserted-by":"publisher","first-page":"1176","DOI":"10.1073\/pnas.0710982105","volume":"105","author":"J Korlach","year":"2008","unstructured":"Korlach J, Marks PJ, Cicero RL, Gray JJ, Murphy DL, Roitman DB, et al. Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc Natl Acad Sci U S A. 2008;105(4):1176\u201381.","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"10","key":"6020_CR7","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","volume":"37","author":"AM Wenger","year":"2019","unstructured":"Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol. 2019;37(10):1155\u201362.","journal-title":"Nat Biotechnol"},{"issue":"12","key":"6020_CR8","doi-asserted-by":"publisher","first-page":"5463","DOI":"10.1073\/pnas.74.12.5463","volume":"74","author":"F Sanger","year":"1977","unstructured":"Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A. 1977;74(12):5463\u20137.","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"7218","key":"6020_CR9","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1038\/nature07517","volume":"456","author":"DR Bentley","year":"2008","unstructured":"Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456(7218):53\u20139.","journal-title":"Nature"},{"issue":"7","key":"6020_CR10","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0181128","volume":"12","author":"V Potapov","year":"2017","unstructured":"Potapov V, Ong JL. Correction: examining sources of error in PCR by single-molecule sequencing. PLoS ONE. 2017;12(7): e0181128.","journal-title":"PLoS ONE"},{"issue":"1","key":"6020_CR11","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkab855","volume":"50","author":"K Xiong","year":"2022","unstructured":"Xiong K, Shea D, Rhoades J, Blewett T, Liu R, Bae JH, et al. Duplex-repair enables highly accurate sequencing, despite DNA damage. Nucleic Acids Res. 2022;50(1): e1.","journal-title":"Nucleic Acids Res"},{"issue":"54","key":"6020_CR12","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1146\/annurev-genet-040620-022145","volume":"23","author":"JN Wells","year":"2020","unstructured":"Wells JN, Feschotte C. A field guide to eukaryotic transposable elements. Annu Rev Genet. 2020;23(54):539\u201361.","journal-title":"Annu Rev Genet"},{"issue":"1","key":"6020_CR13","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1006\/geno.1995.1224","volume":"29","author":"SS Arcot","year":"1995","unstructured":"Arcot SS, Wang Z, Weber JL, Deininger PL, Batzer MA. Alu repeats: a source for the genesis of primate microsatellites. Genomics. 1995;29(1):136\u201344.","journal-title":"Genomics"},{"issue":"12","key":"6020_CR14","doi-asserted-by":"publisher","first-page":"1050","DOI":"10.1038\/nmeth.4035","volume":"13","author":"CS Chin","year":"2016","unstructured":"Chin CS, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13(12):1050\u20134.","journal-title":"Nat Methods"},{"issue":"5","key":"6020_CR15","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/j.gpb.2015.08.002","volume":"13","author":"A Rhoads","year":"2015","unstructured":"Rhoads A, Au KF. PacBio sequencing and its applications. Genom Proteomics Bioinform. 2015;13(5):278\u201389.","journal-title":"Genom Proteomics Bioinform"},{"key":"6020_CR16","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1007\/978-3-662-44753-6_5","volume-title":"Algorithms in Bioinformatics","author":"G Myers","year":"2014","unstructured":"Myers G. Efficient Local Alignment Discovery amongst Noisy Long Reads. In: Algorithms in Bioinformatics. Berlin: Springer; 2014. p. 52\u201367."},{"issue":"2","key":"6020_CR17","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1038\/s41592-020-01056-5","volume":"18","author":"H Cheng","year":"2021","unstructured":"Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 2021;18(2):170\u20135.","journal-title":"Nat Methods"},{"issue":"1","key":"6020_CR18","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1186\/s13059-021-02328-9","volume":"22","author":"S Garg","year":"2021","unstructured":"Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol. 2021;22(1):101.","journal-title":"Genome Biol"},{"issue":"1","key":"6020_CR19","doi-asserted-by":"publisher","first-page":"1358","DOI":"10.1038\/s41467-023-36689-5","volume":"14","author":"S Garg","year":"2023","unstructured":"Garg S. Towards routine chromosome-scale haplotype-resolved reconstruction in cancer genomics. Nat Commun. 2023;14(1):1358.","journal-title":"Nat Commun"},{"key":"6020_CR20","doi-asserted-by":"publisher","unstructured":"Zhang Z, Zhang J, Kang L, Qiu X, Niu B, Bi A, et al. Genotyping of structural variation using PacBio high-fidelity sequencing [Internet]. bioRxiv. 2021 [cited 2024 Jan 11]. p. 2021.10.28.466362. Available from: https:\/\/www.biorxiv.org\/content\/https:\/\/doi.org\/10.1101\/2021.10.28.466362","DOI":"10.1101\/2021.10.28.466362"},{"issue":"9","key":"6020_CR21","doi-asserted-by":"publisher","first-page":"1291","DOI":"10.1101\/gr.263566.120","volume":"30","author":"S Nurk","year":"2020","unstructured":"Nurk S, Walenz BP, Rhie A, Vollger MR, Logsdon GA, Grothe R, et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 2020;30(9):1291\u2013305.","journal-title":"Genome Res"},{"issue":"10","key":"6020_CR22","doi-asserted-by":"publisher","first-page":"1474","DOI":"10.1038\/s41587-023-01662-6","volume":"41","author":"M Rautiainen","year":"2023","unstructured":"Rautiainen M, Nurk S, Walenz BP, Logsdon GA, Porubsky D, Rhie A, et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat Biotechnol. 2023;41(10):1474\u201382.","journal-title":"Nat Biotechnol"},{"issue":"10","key":"6020_CR23","doi-asserted-by":"publisher","first-page":"958","DOI":"10.1016\/j.cels.2021.08.009","volume":"12","author":"B Ekim","year":"2021","unstructured":"Ekim B, Berger B, Chikhi R. Minimizer-space de Bruijn graphs: Whole-genome assembly of long reads in minutes on a personal computer. Cell Syst. 2021;12(10):958-68.e6.","journal-title":"Cell Syst"},{"key":"6020_CR24","doi-asserted-by":"publisher","unstructured":"Bankevich A, Bzikadze A, Kolmogorov M, Antipov D, Pevzner PA. LJA: Assembling long and accurate reads using multiplex de Bruijn graphs [Internet]. bioRxiv; 2020. Available from: https:\/\/doi.org\/10.1101\/2020.12.10.420448","DOI":"10.1101\/2020.12.10.420448"},{"issue":"2","key":"6020_CR25","first-page":"232","volume":"41","author":"G Baid","year":"2023","unstructured":"Baid G, Cook DE, Shafin K, Yun T, Llinares-L\u00f3pez F, Berthet Q, et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nat Biotechnol. 2023;41(2):232\u20138.","journal-title":"Nat Biotechnol"},{"key":"6020_CR26","doi-asserted-by":"publisher","DOI":"10.1101\/2023.08.17.553778v1","author":"Z Zheng","year":"2023","unstructured":"Zheng Z, Su J, Chen L, Lee YL, Lam TW, Luo R. ClairS: a deep-learning method for long-read somatic small variant calling. Bioinformatics. 2023. https:\/\/doi.org\/10.1101\/2023.08.17.553778v1.","journal-title":"Bioinformatics"},{"key":"6020_CR27","doi-asserted-by":"crossref","unstructured":"Park J, Cook DE, Chang PC, Kolesnikov A, Brambrink L, Mier JC, et al. DeepSomatic: Accurate somatic small variant discovery for multiple sequencing technologies. bioRxivorg [Internet]. 2024 Aug 19; Available from: https:\/\/pubmed.ncbi.nlm.nih.gov\/39229187\/","DOI":"10.1101\/2024.08.16.608331"},{"issue":"8","key":"6020_CR28","doi-asserted-by":"publisher","first-page":"591","DOI":"10.1038\/s41592-018-0051-x","volume":"15","author":"S Kim","year":"2018","unstructured":"Kim S, Scheffler K, Halpern AL, Bekritsky MA, Noh E, K\u00e4llberg M, et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat Methods. 2018;15(8):591\u20134.","journal-title":"Nat Methods"},{"key":"6020_CR29","unstructured":"Base Quality Score Recalibration (BQSR) [Internet]. GATK. [cited 2024 Oct 17]. Available from: https:\/\/gatk.broadinstitute.org\/hc\/en-us\/articles\/360035890531-Base-Quality-Score-Recalibration-BQSR"},{"key":"6020_CR30","unstructured":"Data pre-processing for variant discovery [Internet]. GATK. [cited 2024 Oct 17]. Available from: https:\/\/gatk.broadinstitute.org\/hc\/en-us\/articles\/360035535912-Data-pre-processing-for-variant-discovery"},{"issue":"3","key":"6020_CR31","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1093\/bioinformatics\/18.3.452","volume":"18","author":"C Lee","year":"2002","unstructured":"Lee C, Grasso C, Sharlow MF. Multiple sequence alignment using partial order graphs. Bioinformatics. 2002;18(3):452\u201364.","journal-title":"Bioinformatics"},{"issue":"1","key":"6020_CR32","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"TF Smith","year":"1981","unstructured":"Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981;147(1):195\u20137.","journal-title":"J Mol Biol"},{"issue":"3","key":"6020_CR33","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","volume":"48","author":"SB Needleman","year":"1970","unstructured":"Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48(3):443\u201353.","journal-title":"J Mol Biol"},{"key":"6020_CR34","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1016\/S0076-6879(10)72001-2","volume":"472","author":"J Korlach","year":"2010","unstructured":"Korlach J, Bjornson KP, Chaudhuri BP, Cicero RL, Flusberg BA, Gray JJ, et al. Real-time DNA sequencing from single polymerase molecules. Methods Enzymol. 2010;472:431\u201355.","journal-title":"Methods Enzymol"},{"issue":"9","key":"6020_CR35","doi-asserted-by":"publisher","first-page":"2308","DOI":"10.1016\/j.celrep.2018.11.014","volume":"25","author":"FG Osorio","year":"2018","unstructured":"Osorio FG, Rosendahl Huber A, Oka R, Verheul M, Patel SH, Hasaart K, et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 2018;25(9):2308-16.e4.","journal-title":"Cell Rep"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-024-06020-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-024-06020-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-024-06020-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T22:17:24Z","timestamp":1736979444000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-024-06020-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,16]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["6020"],"URL":"https:\/\/doi.org\/10.1186\/s12859-024-06020-0","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.02.08.579541","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,16]]},"assertion":[{"value":"6 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"We acquired a cord blood sample (ID: PD47269d) from a newborn female patient through the Cambridge Blood and Stem Cell Biobank (CBSB). This sample was collected at Addenbrooke's Hospital with informed consent from the parents and approved by the Cambridge East Ethics Committee under reference 18\/EE\/0199.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Sangjin Lee is an employee of Pacific Biosciences (PacBio).","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interest"}}],"article-number":"17"}}