{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,9]],"date-time":"2026-05-09T21:29:05Z","timestamp":1778362145062,"version":"3.51.4"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T00:00:00Z","timestamp":1732492800000},"content-version":"vor","delay-in-days":3,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Hong Kong Research Grants Council","award":["17113721"],"award-info":[{"award-number":["17113721"]}]},{"name":"TRS","award":["T21-705\/20-N and T12-703\/19-R"],"award-info":[{"award-number":["T21-705\/20-N and T12-703\/19-R"]}]},{"name":"Shenzhen Municipal Government General Program","award":["JCYJ20210324134405015"],"award-info":[{"award-number":["JCYJ20210324134405015"]}]},{"DOI":"10.13039\/501100003803","name":"HKU","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003803","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010890","name":"Oxford Nanopore Technologies","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100010890","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Ensuring a unified variant representation aligning the sequencing data is critical for downstream analysis as variant representation may differ across platforms and sequencing conditions. Current approaches typically treat variant unification as a post-step following variant calling and are incapable of measuring the correct variant representation from the outset. 
Aligning variant representations with the alignment before variant calling has benefits like providing reliable training labels for deep learning-based variant caller model training and enabling direct assessment of alignment quality. However, it also poses challenges due to the large number of candidates to handle. Here, we present Repun, a haplotype-aware variant-alignment unification algorithm that harmonizes the variant representation between provided variants and alignments in different sequencing platforms. Repun leverages phasing to facilitate equivalent haplotype matches between variants and alignments. Our approach reduced the comparisons between variant haplotypes and candidate haplotypes by utilizing haplotypes with read evidence to speed up the unification process. Repun achieved &gt;99.99% precision and\u2009&gt;\u200999.5% recall through extensive evaluations of various Genome in a Bottle Consortium samples encompassing three sequencing platforms: Oxford Nanopore Technology, Pacific Biosciences, and Illumina. 
Repun is open-source and available at (https:\/\/github.com\/zhengzhenxian\/Repun).<\/jats:p>","DOI":"10.1093\/bib\/bbae613","type":"journal-article","created":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T11:46:39Z","timestamp":1732535199000},"source":"Crossref","is-referenced-by-count":4,"title":["Repun: an accurate small variant representation unification method for multiple sequencing platforms"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6546-2324","authenticated-orcid":false,"given":"Zhenxian","family":"Zheng","sequence":"first","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yingxuan","family":"Ren","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lei","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Angel On Ki","family":"Wong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shumin","family":"Li","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1917-143X","authenticated-orcid":false,"given":"Xian","family":"Yu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The 
University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]},{"name":"Faculty of Computing, Harbin Institute of Technology , 92 Xidazhi Street, Nangang District, Harbin, Heilongjiang 150001 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tak-Wah","family":"Lam","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9711-6533","authenticated-orcid":false,"given":"Ruibang","family":"Luo","sequence":"additional","affiliation":[{"name":"Department of Computer Science, The University of Hong Kong , Pok Fu Lam Road, Hong Kong, 999077 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2024,11,25]]},"reference":[{"key":"2024112511462488800_ref1","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1038\/s41576-023-00590-0","article-title":"Variant calling and benchmarking in an era of complete human genome sequences","volume":"24","author":"Olson","year":"2023","journal-title":"Nat Rev Genet"},{"key":"2024112511462488800_ref2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/sdata.2016.25","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"Zook","year":"2016","journal-title":"Scientific data"},{"key":"2024112511462488800_ref3","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1038\/s41587-021-00993-6","article-title":"Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing","volume":"39","author":"Fang","year":"2021","journal-title":"Nat 
Biotechnol"},{"key":"2024112511462488800_ref4","doi-asserted-by":"publisher","first-page":"964","DOI":"10.1093\/bioinformatics\/btw748","article-title":"Improved VCF normalization for accurate VCF comparison","volume":"33","author":"Bayat","year":"2017","journal-title":"Bioinformatics"},{"key":"2024112511462488800_ref5","doi-asserted-by":"publisher","first-page":"023754","DOI":"10.1093\/bioinformatics\/btw748","article-title":"Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines","author":"Cleary","year":"2015","journal-title":"BioRxiv"},{"key":"2024112511462488800_ref6","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1038\/s41587-019-0054-x","article-title":"Best practices for benchmarking germline small-variant calls in human genomes","volume":"37","author":"the Global Alliance for Genomics and Health Benchmarking Team","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2024112511462488800_ref7","doi-asserted-by":"publisher","DOI":"10.1016\/j.xgen.2021.100027","article-title":"The GA4GH variation representation specification: a computational framework for variation representation and federated identification","volume":"1","author":"Wagner","year":"2021","journal-title":"Cell genomics"},{"key":"2024112511462488800_ref8","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1038\/s42256-020-0167-4","article-title":"Exploring the limit of using a deep neural network on pileup data for germline variant calling","volume":"2","author":"Luo","year":"2020","journal-title":"Nature Machine Intelligence"},{"key":"2024112511462488800_ref9","doi-asserted-by":"publisher","first-page":"983","DOI":"10.1038\/nbt.4235","article-title":"A universal SNP and small-indel variant caller using deep neural networks","volume":"36","author":"Poplin","year":"2018","journal-title":"Nat 
Biotechnol"},{"key":"2024112511462488800_ref10","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1038\/s43588-022-00387-x","article-title":"Symphonizing pileup and full-alignment for deep learning-based long-read variant calling","volume":"2","author":"Zheng","year":"2022","journal-title":"Nature Computational Science"},{"key":"2024112511462488800_ref11","doi-asserted-by":"publisher","first-page":"2017.553778","DOI":"10.1038\/s43588-022-00387-x","article-title":"ClairS: a deep-learning method for long-read somatic small variant calling","volume":"2023","author":"Zheng","year":"2008","journal-title":"bioRxiv 2023"},{"key":"2024112511462488800_ref12","doi-asserted-by":"publisher","first-page":"100128","DOI":"10.1016\/j.xgen.2022.100128","article-title":"Benchmarking challenging small variants with linked and long reads","volume":"2","author":"Wagner","year":"2022","journal-title":"Cell genomics"},{"key":"2024112511462488800_ref13","doi-asserted-by":"publisher","author":"Guppy basecalling software","DOI":"10.1016\/j.xgen.2022.100128"},{"key":"2024112511462488800_ref14","author":"Dorado basecaller"},{"key":"2024112511462488800_ref15","author":"Nanopore Q20+ chemistry"},{"key":"2024112511462488800_ref16","doi-asserted-by":"publisher","first-page":"1044","DOI":"10.1038\/s41587-020-0503-6","article-title":"Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes","volume":"38","author":"Shafin","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2024112511462488800_ref17","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","article-title":"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome","volume":"37","author":"Wenger","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2024112511462488800_ref18","doi-asserted-by":"publisher","first-page":"100129","DOI":"10.1016\/j.xgen.2022.100129","article-title":"PrecisionFDA truth 
challenge V2: calling variants from short and long reads in difficult-to-map regions","volume":"2","author":"Olson","year":"2022","journal-title":"Cell genomics"},{"key":"2024112511462488800_ref19","doi-asserted-by":"publisher","first-page":"422022","DOI":"10.1016\/j.xgen.2022.100129","article-title":"An extensive sequence dataset of gold-standard samples for benchmarking and development","author":"Baid","year":"2011","journal-title":"bioRxiv 2020:20202012"},{"key":"2024112511462488800_ref20","first-page":"085050","article-title":"nHap: fast and accurate read-based phasing","author":"Martin","year":"2016","journal-title":"BioRxiv"},{"key":"2024112511462488800_ref21","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1089\/cmb.2014.0157","article-title":"WhatsHap: weighted haplotype assembly for future-generation sequencing reads","volume":"22","author":"Patterson","year":"2015","journal-title":"J Comput Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbae613\/60805802\/bbae613.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbae613\/60805802\/bbae613.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,25]],"date-time":"2024-11-25T11:46:46Z","timestamp":1732535206000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae613\/7908003"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae613","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject
":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,11,22]]},"article-number":"bbae613"}}