{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T13:57:23Z","timestamp":1778162243003,"version":"3.51.4"},"reference-count":25,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2022,12,22]],"date-time":"2022-12-22T00:00:00Z","timestamp":1671667200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing for per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads. Detecting the SNP variants on low-coverage Nanopore sequencing data is still a challenging problem.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We developed a novel deep learning-based SNP calling method, NanoSNP, to identify the SNP sites (excluding short indels) based on low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP utilizes the naive pileup feature to predict a subset of SNP sites with a Bi-long short-term memory (LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Finally, the long-range haplotype feature and short-range pileup feature are extracted from each haplotype. The haplotype model combines two features and predicts the genotype for the candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on the low-coverage (\u223c16\u00d7) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes HG002\u2013HG007, respectively. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and major histocompatibility complex regions in the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16\u00d7.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/huangnengCSU\/NanoSNP.git.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac824","type":"journal-article","created":{"date-parts":[[2022,12,22]],"date-time":"2022-12-22T18:47:08Z","timestamp":1671734828000},"source":"Crossref","is-referenced-by-count":14,"title":["NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data"],"prefix":"10.1093","volume":"39","author":[{"given":"Neng","family":"Huang","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Minghua","family":"Xu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fan","family":"Nie","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Ni","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chuan-Le","family":"Xiao","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University , Guangzhou 510060, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4813-2403","authenticated-orcid":false,"given":"Feng","family":"Luo","sequence":"additional","affiliation":[{"name":"School of Computing, Clemson University , Clemson, SC 29634, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1516-0480","authenticated-orcid":false,"given":"Jianxin","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University , Changsha 410083, China"},{"name":"Hunan Provincial Key Lab on Bioinformatics, Central South University , Changsha 410083, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,12,22]]},"reference":[{"key":"2023010620565696700_btac824-B1","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023010620565696700_btac824-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-021-02472-2","article-title":"Nanocaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks","volume":"22","author":"Ahsan","year":"2021","journal-title":"Genome Biol"},{"key":"2023010620565696700_btac824-B3","first-page":"1","article-title":"Efficient assembly of nanopore reads via highly accurate and intact error correction","volume":"12","author":"Chen","year":"2021","journal-title":"Nat. Commun"},{"key":"2023010620565696700_btac824-B4","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023010620565696700_btac824-B5","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023010620565696700_btac824-B6","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1101\/gr.214007.116","article-title":"Discovery and genotyping of structural variation from long-read haploid genome sequence data","volume":"27","author":"Huddleston","year":"2017","journal-title":"Genome Res"},{"key":"2023010620565696700_btac824-B7","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1038\/nbt.4060","article-title":"Nanopore sequencing and assembly of a human genome with ultra-long reads","volume":"36","author":"Jain","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023010620565696700_btac824-B8","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2023010620565696700_btac824-B9","author":"Luo","year":"2019"},{"key":"2023010620565696700_btac824-B10","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1038\/s42256-020-0167-4","article-title":"Exploring the limit of using a deep neural network on pileup data for germline variant calling","volume":"2","author":"Luo","year":"2020","journal-title":"Nat. Mach. Intell"},{"key":"2023010620565696700_btac824-B11","author":"Martin","year":"2016"},{"key":"2023010620565696700_btac824-B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-021-26278-9","article-title":"Genome-wide detection of cytosine methylations in plant from nanopore data using deep learning","volume":"12","author":"Ni","year":"2021","journal-title":"Nat. Commun"},{"key":"2023010620565696700_btac824-B13","author":"Nurk","year":"2022"},{"key":"2023010620565696700_btac824-B14","doi-asserted-by":"crossref","first-page":"498","DOI":"10.1089\/cmb.2014.0157","article-title":"WhatsHap: weighted haplotype assembly for future-generation sequencing reads","volume":"22","author":"Patterson","year":"2015","journal-title":"J. Comput. Biol"},{"key":"2023010620565696700_btac824-B15","doi-asserted-by":"crossref","first-page":"2193","DOI":"10.1093\/bioinformatics\/bty841","article-title":"Bulkvis: a graphical viewer for oxford nanopore bulk fast5 files","volume":"35","author":"Payne","year":"2019","journal-title":"Bioinformatics"},{"key":"2023010620565696700_btac824-B16","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/nbt.4235","article-title":"A universal SNP and small-indel variant caller using deep neural networks","volume":"36","author":"Poplin","year":"2018","journal-title":"Nat. Biotechnol"},{"key":"2023010620565696700_btac824-B17","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1038\/s41587-020-0719-5","article-title":"Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads","volume":"39","author":"Porubsky","year":"2021","journal-title":"Nat. Biotechnol"},{"key":"2023010620565696700_btac824-B18","doi-asserted-by":"crossref","first-page":"1483","DOI":"10.1038\/s41588-018-0196-7","article-title":"Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk","volume":"50","author":"Reshef","year":"2018","journal-title":"Nat. Genet"},{"key":"2023010620565696700_btac824-B19","doi-asserted-by":"crossref","first-page":"1322","DOI":"10.1038\/s41592-021-01299-w","article-title":"Haplotype-aware variant calling with pepper-margin-deepvariant enables high accuracy in nanopore long-reads","volume":"18","author":"Shafin","year":"2021","journal-title":"Nat. Methods"},{"key":"2023010620565696700_btac824-B20","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1007\/s10038-007-0200-z","article-title":"Snps in disease gene mapping, medicinal drug development and evolution","volume":"52","author":"Shastry","year":"2007","journal-title":"J. Hum. Genet"},{"key":"2023010620565696700_btac824-B21","doi-asserted-by":"crossref","first-page":"1348","DOI":"10.1038\/s41587-021-01108-x","article-title":"Nanopore sequencing technology, bioinformatics and applications","volume":"39","author":"Wang","year":"2021","journal-title":"Nat. Biotechnol"},{"key":"2023010620565696700_btac824-B22","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","article-title":"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome","volume":"37","author":"Wenger","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023010620565696700_btac824-B23","author":"Wright","year":"2021"},{"key":"2023010620565696700_btac824-B24","author":"Zheng","year":"2022"},{"key":"2023010620565696700_btac824-B25","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1038\/s41587-019-0074-6","article-title":"An open resource for accurately benchmarking small variant and reference calls","volume":"37","author":"Zook","year":"2019","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac824\/48353335\/btac824.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac824\/48482875\/btac824.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/1\/btac824\/48482875\/btac824.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,6]],"date-time":"2023-01-06T20:59:51Z","timestamp":1673038791000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btac824\/6957086"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,12,22]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,1,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac824","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,1,1]]},"published":{"date-parts":[[2022,12,22]]},"article-number":"btac824"}}