{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,27]],"date-time":"2025-12-27T07:33:27Z","timestamp":1766820807328,"version":"build-2065373602"},"reference-count":69,"publisher":"Walter de Gruyter GmbH","issue":"2","license":[{"start":{"date-parts":[[2025,6,1]],"date-time":"2025-06-01T00:00:00Z","timestamp":1748736000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"publisher","award":["CNS2022-135101"],"award-info":[{"award-number":["CNS2022-135101"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>The rapid advancement of Next-Generation Sequencing (NGS) technologies has revolutionized the field of genomics, producing large volumes of data that necessitate sophisticated analytical techniques. This paper introduces a Deep Learning model designed to predict the pathogenicity of genetic variants, a vital component in advancing personalized medicine. The model is trained on a dataset derived from the analysis of NGS outputs, containing a combination of well-defined and ambiguous genetic variants. By employing a semi-supervised learning approach, the model efficiently utilizes both confidently labeled and less certain data. At the core of the methodology is the Feature Tokenizer Transformer architecture, which processes both numerical and categorical genomic information. The preprocessing pipeline includes key steps such as data imputation, scaling, and encoding to ensure high data quality. 
The results highlight the model\u2019s impressive accuracy, particularly in detecting confidently labeled variants, while also addressing the impact of its predictions on less certain (soft-labeled) data.<\/jats:p>","DOI":"10.1515\/jib-2024-0047","type":"journal-article","created":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T02:59:32Z","timestamp":1750388372000},"source":"Crossref","is-referenced-by-count":1,"title":["Leveraging transformers for semi-supervised pathogenicity prediction with soft labels"],"prefix":"10.1515","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-5328-7661","authenticated-orcid":false,"given":"Pablo Enrique","family":"Guillem","sequence":"first","affiliation":[{"name":"AIR Institute, IoT Digital Innovation Hub , Salamanca , Spain"},{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6381-5149","authenticated-orcid":false,"given":"Marco","family":"Zurdo-Tabernero","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"},{"name":"16779 Institute of Biomedical Research of Salamanca (IBSAL), University of Salamanca , Salamanca , Spain"}]},{"given":"Noelia","family":"Egido Iglesias","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"}]},{"given":"\u00c1ngel","family":"Canal-Alonso","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"},{"name":"16779 Institute of Biomedical Research of Salamanca (IBSAL), University of Salamanca , Salamanca , Spain"},{"name":"James Watt School of Engineering , University of Glasgow , Glasgow , UK"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8039-3731","authenticated-orcid":false,"given":"Liliana","family":"Dur\u00f3n Figueroa","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 
University of Salamanca , Salamanca , Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7481-5961","authenticated-orcid":false,"given":"Guillermo","family":"Hern\u00e1ndez","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"},{"name":"16779 Institute of Biomedical Research of Salamanca (IBSAL), University of Salamanca , Salamanca , Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4726-7103","authenticated-orcid":false,"given":"Ang\u00e9lica","family":"Gonz\u00e1lez-Arrieta","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"},{"name":"16779 Institute of Biomedical Research of Salamanca (IBSAL), University of Salamanca , Salamanca , Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8239-5020","authenticated-orcid":false,"given":"Fernando","family":"de la Prieta","sequence":"additional","affiliation":[{"name":"BISITE Research Group , 16779 University of Salamanca , Salamanca , Spain"}]}],"member":"374","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"key":"2025102907580412138_j_jib-2024-0047_ref_001","doi-asserted-by":"crossref","unstructured":"Wadman, M. James Watson\u2019s genome sequenced at high speed. Nature 2008;452:788\u20139. https:\/\/doi.org\/10.1038\/452788b.","DOI":"10.1038\/452788b"},{"key":"2025102907580412138_j_jib-2024-0047_ref_002","doi-asserted-by":"crossref","unstructured":"Gunning, AC, Fryer, V, Fasham, J, Crosby, AH, Ellard, S, Baple, EL, et al.. Assessing performance of pathogenicity predictors using clinically relevant variant datasets. J Med Genet 2021;58:547\u201355. https:\/\/doi.org\/10.1136\/jmedgenet-2020-107003.","DOI":"10.1136\/jmedgenet-2020-107003"},{"key":"2025102907580412138_j_jib-2024-0047_ref_003","doi-asserted-by":"crossref","unstructured":"Qi, H, Zhang, H, Zhao, Y, Chen, C, Long, JJ, Chung, WK, et al.. MVP predicts the pathogenicity of missense variants by deep learning. 
Nat Commun 2021;12:510. https:\/\/doi.org\/10.1038\/s41467-020-20847-0.","DOI":"10.1038\/s41467-020-20847-0"},{"key":"2025102907580412138_j_jib-2024-0047_ref_004","doi-asserted-by":"crossref","unstructured":"Alarcon, JLC, Enriquez, JA, S\u00e1nchez-Cabo, F. Frequency Conservation Score (FCS): the power of conservation and allele frequency for variant pathogenic prediction. bioRxiv 2019:805051.","DOI":"10.1101\/805051"},{"key":"2025102907580412138_j_jib-2024-0047_ref_005","doi-asserted-by":"crossref","unstructured":"LeCun, Y, Bengio, Y, Hinton, G. Deep learning. Nature 2015;521:436\u201344. https:\/\/doi.org\/10.1038\/nature14539.","DOI":"10.1038\/nature14539"},{"key":"2025102907580412138_j_jib-2024-0047_ref_006","unstructured":"Gorishniy, Y, Rubachev, I, Khrulkov, V, Babenko, A. Revisiting deep learning models for tabular data. Adv Neural Inf Process Syst 2021;34:18932\u201343."},{"key":"2025102907580412138_j_jib-2024-0047_ref_007","unstructured":"Zhu, X, Goldberg, AB. Introduction to semi-supervised learning. Cham, Switzerland: Springer Nature; 2022."},{"key":"2025102907580412138_j_jib-2024-0047_ref_008","doi-asserted-by":"crossref","unstructured":"Guillem, PE, Zurdo-Tabernero, M, Dur\u00f3n Figueroa, L, Canal-Alonso, \u00c1, Hern\u00e1ndez, G, Gonz\u00e1lez-Arrieta, A, et al.. Transformer-enhanced pathogenicity prediction with soft labels in a semi-supervised setup. In: International Conference on Practical Applications of Computational Biology & Bioinformatics. Cham, Switzerland: Springer; 2024:41\u201350 pp.","DOI":"10.1007\/978-3-031-87873-2_5"},{"key":"2025102907580412138_j_jib-2024-0047_ref_009","doi-asserted-by":"crossref","unstructured":"Alirezaie, N, Kernohan, KD, Hartley, T, Majewski, J, Hocking, TD. ClinPred: prediction tool to identify disease-relevant nonsynonymous single-nucleotide variants. Am J Hum Genet 2018;103:474\u201383. 
https:\/\/doi.org\/10.1016\/j.ajhg.2018.08.005.","DOI":"10.1016\/j.ajhg.2018.08.005"},{"key":"2025102907580412138_j_jib-2024-0047_ref_010","doi-asserted-by":"crossref","unstructured":"Ioannidis, NM, Rothstein, JH, Pejaver, V, Middha, S, McDonnell, SK, Baheti, S, et al.. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet 2016;99:877\u201385. https:\/\/doi.org\/10.1016\/j.ajhg.2016.08.016.","DOI":"10.1016\/j.ajhg.2016.08.016"},{"key":"2025102907580412138_j_jib-2024-0047_ref_011","doi-asserted-by":"crossref","unstructured":"Dong, C, Wei, P, Jian, X, Gibbs, R, Boerwinkle, E, Wang, K, et al.. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet 2015;24:2125\u201337. https:\/\/doi.org\/10.1093\/hmg\/ddu733.","DOI":"10.1093\/hmg\/ddu733"},{"key":"2025102907580412138_j_jib-2024-0047_ref_012","doi-asserted-by":"crossref","unstructured":"Liu, Y, Zhang, T, You, N, Wu, S, Shen, N. MAGPIE: accurate pathogenic prediction for multiple variant types using machine learning approach. Genome Med 2024;16:3. https:\/\/doi.org\/10.1186\/s13073-023-01274-4.","DOI":"10.1186\/s13073-023-01274-4"},{"key":"2025102907580412138_j_jib-2024-0047_ref_013","doi-asserted-by":"crossref","unstructured":"Carter, H, Douville, C, Stenson, PD, Cooper, DN, Karchin, R. Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genom 2013;14:1\u201316. https:\/\/doi.org\/10.1186\/1471-2164-14-s3-s3.","DOI":"10.1186\/1471-2164-14-S3-S3"},{"key":"2025102907580412138_j_jib-2024-0047_ref_014","doi-asserted-by":"crossref","unstructured":"Jagadeesh, KA, Wenger, AM, Berger, MJ, Guturu, H, Stenson, PD, Cooper, DN, et al.. M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet 2016;48:1581\u20136. 
https:\/\/doi.org\/10.1038\/ng.3703.","DOI":"10.1038\/ng.3703"},{"key":"2025102907580412138_j_jib-2024-0047_ref_015","doi-asserted-by":"crossref","unstructured":"Reva, B, Antipin, Y, Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 2011;39:e118. https:\/\/doi.org\/10.1093\/nar\/gkr407.","DOI":"10.1093\/nar\/gkr407"},{"key":"2025102907580412138_j_jib-2024-0047_ref_016","doi-asserted-by":"crossref","unstructured":"Sundaram, L, Gao, H, Padigepati, SR, McRae, JF, Li, Y, Kosmicki, JA, et al.. Predicting the clinical impact of human mutation with deep neural networks. Nat Genet 2018;50:1161\u201370. https:\/\/doi.org\/10.1038\/s41588-018-0167-z.","DOI":"10.1038\/s41588-018-0167-z"},{"key":"2025102907580412138_j_jib-2024-0047_ref_017","doi-asserted-by":"crossref","unstructured":"Schmidt, A, R\u00f6ner, S, Mai, K, Klinkhammer, H, Kircher, M, Ludwig, KU. Predicting the pathogenicity of missense variants using features derived from AlphaFold2. Bioinformatics 2023;39:btad280. https:\/\/doi.org\/10.1093\/bioinformatics\/btad280.","DOI":"10.1093\/bioinformatics\/btad280"},{"key":"2025102907580412138_j_jib-2024-0047_ref_018","doi-asserted-by":"crossref","unstructured":"Vaser, R, Adusumalli, S, Leng, SN, Sikic, M, Ng, PC. SIFT missense predictions for genomes. Nat Protoc 2016;11:1\u20139. https:\/\/doi.org\/10.1038\/nprot.2015.123.","DOI":"10.1038\/nprot.2015.123"},{"key":"2025102907580412138_j_jib-2024-0047_ref_019","doi-asserted-by":"crossref","unstructured":"Malhis, N, Jacobson, M, Jones, SJ, Gsponer, J. LIST-S2: taxonomy based sorting of deleterious missense mutations across species. Nucleic Acids Res 2020;48:W154\u201361. https:\/\/doi.org\/10.1093\/nar\/gkaa288.","DOI":"10.1093\/nar\/gkaa288"},{"key":"2025102907580412138_j_jib-2024-0047_ref_020","doi-asserted-by":"crossref","unstructured":"Quang, D, Chen, Y, Xie, X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. 
Bioinformatics 2015;31:761\u20133. https:\/\/doi.org\/10.1093\/bioinformatics\/btu703.","DOI":"10.1093\/bioinformatics\/btu703"},{"key":"2025102907580412138_j_jib-2024-0047_ref_021","doi-asserted-by":"crossref","unstructured":"Landrum, MJ, Lee, JM, Benson, M, Brown, G, Chao, C, Chitipiralla, S, et al.. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 2016;44:D862\u20138. https:\/\/doi.org\/10.1093\/nar\/gkv1222.","DOI":"10.1093\/nar\/gkv1222"},{"key":"2025102907580412138_j_jib-2024-0047_ref_022","doi-asserted-by":"crossref","unstructured":"Slatko, BE, Gardner, AF, Ausubel, FM. Overview of next-generation sequencing technologies. Curr Protoc Mol Biol 2018;122:e59. https:\/\/doi.org\/10.1002\/cpmb.59.","DOI":"10.1002\/cpmb.59"},{"key":"2025102907580412138_j_jib-2024-0047_ref_023","doi-asserted-by":"crossref","unstructured":"Di Tommaso, P, Chatzou, M, Floden, EW, Barja, PP, Palumbo, E, Notredame, C. Nextflow enables reproducible computational workflows. Nat Biotechnol 2017;35:316\u20139. https:\/\/doi.org\/10.1038\/nbt.3820.","DOI":"10.1038\/nbt.3820"},{"key":"2025102907580412138_j_jib-2024-0047_ref_024","doi-asserted-by":"crossref","unstructured":"Sarmento, C, Guimar\u00e3es, S, K\u0131l\u0131n\u00e7, GM, G\u00f6therstr\u00f6m, A, Pires, AE, Ginja, C, et al.. A study on Burrows-Wheeler Aligner\u2019s performance optimization for Ancient DNA mapping. In: Practical Applications of Computational Biology & Bioinformatics, 15th International Conference (PACBB 2021). Cham, Switzerland: Springer; 2022:105\u201314 pp.","DOI":"10.1007\/978-3-030-86258-9_11"},{"key":"2025102907580412138_j_jib-2024-0047_ref_025","doi-asserted-by":"crossref","unstructured":"McLaren, W, Gil, L, Hunt, SE, Riat, HS, Ritchie, GRS, Thormann, A, et al.. The Ensembl variant effect predictor. Genome Biol 2016;17:122. 
https:\/\/doi.org\/10.1186\/s13059-016-0974-4.","DOI":"10.1186\/s13059-016-0974-4"},{"key":"2025102907580412138_j_jib-2024-0047_ref_026","doi-asserted-by":"crossref","unstructured":"Pagel, KA, Kim, R, Moad, K, Busby, B, Zheng, L, Tokheim, C, et al.. Integrated informatics analysis of cancer-related variants. JCO Clin Cancer Inform 2020;4:310\u20137. https:\/\/doi.org\/10.1200\/cci.19.00132.","DOI":"10.1200\/CCI.19.00132"},{"key":"2025102907580412138_j_jib-2024-0047_ref_027","doi-asserted-by":"crossref","unstructured":"Xavier, A, Scott, RJ, Talseth-Palmer, BA. TAPES: a tool for assessment and prioritisation in exome studies. PLoS Comput Biol 2019;15:1\u20139. https:\/\/doi.org\/10.1371\/journal.pcbi.1007453.","DOI":"10.1371\/journal.pcbi.1007453"},{"key":"2025102907580412138_j_jib-2024-0047_ref_028","doi-asserted-by":"crossref","unstructured":"Rentzsch, P, Schubach, M, Shendure, J, Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res 2019;47:D886\u201394. https:\/\/doi.org\/10.1093\/nar\/gky1016.","DOI":"10.1093\/nar\/gky1016"},{"key":"2025102907580412138_j_jib-2024-0047_ref_029","doi-asserted-by":"crossref","unstructured":"Shihab, HA, Gough, J, Cooper, DN, Stenson, PD, Barker, GL, Edwards, KJ, et al.. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 2013;34:57\u201365. https:\/\/doi.org\/10.1002\/humu.22225.","DOI":"10.1002\/humu.22225"},{"key":"2025102907580412138_j_jib-2024-0047_ref_030","doi-asserted-by":"crossref","unstructured":"Shihab, HA, Rogers, MF, Gough, J, Mort, M, Cooper, DN, Day, INM, et al.. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 2015;31:1536\u201343. 
https:\/\/doi.org\/10.1093\/bioinformatics\/btv009.","DOI":"10.1093\/bioinformatics\/btv009"},{"key":"2025102907580412138_j_jib-2024-0047_ref_031","doi-asserted-by":"crossref","unstructured":"Karczewski, KJ, Francioli, LC, Tiao, G, Cummings, BB, Alf\u00f6ldi, J, Wang, Q, et al.. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 2020;581:434\u201343. https:\/\/doi.org\/10.1038\/s41586-020-2308-7.","DOI":"10.1038\/s41586-020-2308-7"},{"key":"2025102907580412138_j_jib-2024-0047_ref_032","doi-asserted-by":"crossref","unstructured":"Fadista, JA, Oskolkov, N, Hansson, O, Groop, L. LoFtool: a gene intolerance score based on loss-of-function variants in 60 706 individuals. Bioinformatics 2016;33:471\u20134. https:\/\/doi.org\/10.1093\/bioinformatics\/btv602.","DOI":"10.1093\/bioinformatics\/btv602"},{"key":"2025102907580412138_j_jib-2024-0047_ref_033","doi-asserted-by":"crossref","unstructured":"Jaganathan, K, Kyriazopoulou Panagiotopoulou, S, McRae, JF, Darbandi, SF, Knowles, D, Li, YI, et al.. Predicting splicing from primary sequence with deep learning. Cell 2019;176:535\u201348.e24. https:\/\/doi.org\/10.1016\/j.cell.2018.12.015.","DOI":"10.1016\/j.cell.2018.12.015"},{"key":"2025102907580412138_j_jib-2024-0047_ref_034","doi-asserted-by":"crossref","unstructured":"Vilella, AJ, Severin, J, Ureta-Vidal, A, Heng, L, Durbin, R, Birney, E. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res 2009;19:327\u201335. https:\/\/doi.org\/10.1101\/gr.073585.107.","DOI":"10.1101\/gr.073585.107"},{"key":"2025102907580412138_j_jib-2024-0047_ref_035","doi-asserted-by":"crossref","unstructured":"Thormann, A, Halachev, M, McLaren, W, Moore, DJ, Svinti, V, Campbell, A, et al.. Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP. Nat Commun 2019;10:2373. 
https:\/\/doi.org\/10.1038\/s41467-019-10016-3.","DOI":"10.1038\/s41467-019-10016-3"},{"key":"2025102907580412138_j_jib-2024-0047_ref_036","doi-asserted-by":"crossref","unstructured":"Pi\u00f1ero, J, Bravo, \u00c0, Queralt-Rosinach, N, Guti\u00e9rrez-Sacrist\u00e1n, A, Deu-Pons, J, Centeno, E, et al.. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res 2017;45:D833\u20139. https:\/\/doi.org\/10.1093\/nar\/gkw943.","DOI":"10.1093\/nar\/gkw943"},{"key":"2025102907580412138_j_jib-2024-0047_ref_037","doi-asserted-by":"crossref","unstructured":"Chunn, LM, Nefcy, DC, Scouten, RW, Tarpey, RP, Chauhan, G, Lim, MS, et al.. Mastermind: a comprehensive genomic association search engine for empirical evidence curation and genetic variant interpretation. Front Genet 2020;11:577152. https:\/\/doi.org\/10.3389\/fgene.2020.577152.","DOI":"10.3389\/fgene.2020.577152"},{"key":"2025102907580412138_j_jib-2024-0047_ref_038","doi-asserted-by":"crossref","unstructured":"Ashburner, M, Ball, CA, Blake, JA, Botstein, D, Butler, H, Cherry, JM, et al.. Gene Ontology: tool for the unification of biology. Nat Genet 2000;25:25\u20139. https:\/\/doi.org\/10.1038\/75556.","DOI":"10.1038\/75556"},{"key":"2025102907580412138_j_jib-2024-0047_ref_039","doi-asserted-by":"crossref","unstructured":"Yeo, G, Burge, CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 2004;11:377\u201394. https:\/\/doi.org\/10.1089\/1066527041410418.","DOI":"10.1089\/1066527041410418"},{"key":"2025102907580412138_j_jib-2024-0047_ref_040","doi-asserted-by":"crossref","unstructured":"Pertea, M, Lin, X, Salzberg, SL. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res 2001;29:1185\u201390. 
https:\/\/doi.org\/10.1093\/nar\/29.5.1185.","DOI":"10.1093\/nar\/29.5.1185"},{"key":"2025102907580412138_j_jib-2024-0047_ref_041","doi-asserted-by":"crossref","unstructured":"Liu, X, Jian, X, Boerwinkle, E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 2011;32:894\u20139. https:\/\/doi.org\/10.1002\/humu.21517.","DOI":"10.1002\/humu.21517"},{"key":"2025102907580412138_j_jib-2024-0047_ref_042","doi-asserted-by":"crossref","unstructured":"Chen, S, Francioli, LC, Goodrich, JK, Collins, RL, Kanai, M, Wang, Q, et al.. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 2024;625:92\u2013100. https:\/\/doi.org\/10.1038\/s41586-023-06045-0.","DOI":"10.1038\/s41586-023-06045-0"},{"key":"2025102907580412138_j_jib-2024-0047_ref_043","doi-asserted-by":"crossref","unstructured":"del Toro, N, Shrivastava, A, Ragueneau, E, Meldal, B, Combe, C, Barrera, E, et al.. The IntAct database: efficient access to fine-grained molecular interaction data. Nucleic Acids Res 2021;50:D648\u201353. https:\/\/doi.org\/10.1093\/nar\/gkab1006.","DOI":"10.1093\/nar\/gkab1006"},{"key":"2025102907580412138_j_jib-2024-0047_ref_044","doi-asserted-by":"crossref","unstructured":"Rogers, MF, Shihab, HA, Gaunt, TR, Campbell, C. CScape: a tool for predicting oncogenic single-point mutations in the cancer genome. Sci Rep 2017;7:11597. https:\/\/doi.org\/10.1038\/s41598-017-11746-4.","DOI":"10.1038\/s41598-017-11746-4"},{"key":"2025102907580412138_j_jib-2024-0047_ref_045","doi-asserted-by":"crossref","unstructured":"Jian, X, Boerwinkle, E, Liu, X. In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 2014;42:13534\u201344. https:\/\/doi.org\/10.1093\/nar\/gku1206.","DOI":"10.1093\/nar\/gku1206"},{"key":"2025102907580412138_j_jib-2024-0047_ref_046","doi-asserted-by":"crossref","unstructured":"Tokheim, C, Karchin, R. 
CHASMplus reveals the scope of somatic missense mutations driving human cancers. Cell Syst 2019;9:9\u201323.e8. https:\/\/doi.org\/10.1016\/j.cels.2019.05.005.","DOI":"10.1016\/j.cels.2019.05.005"},{"key":"2025102907580412138_j_jib-2024-0047_ref_047","doi-asserted-by":"crossref","unstructured":"Griffith, M, Spies, NC, Krysiak, K, McMichael, JF, Coffman, AC, Danos, AM, et al.. CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer. Nat Genet 2017;49:170\u20134. https:\/\/doi.org\/10.1038\/ng.3774.","DOI":"10.1038\/ng.3774"},{"key":"2025102907580412138_j_jib-2024-0047_ref_048","doi-asserted-by":"crossref","unstructured":"Tate, JG, Bamford, S, Jubb, HC, Sondka, Z, Beare, DM, Bindal, N, et al.. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res 2019;47:D941\u20137. https:\/\/doi.org\/10.1093\/nar\/gky1015.","DOI":"10.1093\/nar\/gky1015"},{"key":"2025102907580412138_j_jib-2024-0047_ref_049","doi-asserted-by":"crossref","unstructured":"Tamborero, D, Rubio-Perez, C, Deu-Pons, J, Schroeder, MP, Vivancos, A, Rovira, A, et al.. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med 2018;10:25. https:\/\/doi.org\/10.1186\/s13073-018-0531-8.","DOI":"10.1186\/s13073-018-0531-8"},{"key":"2025102907580412138_j_jib-2024-0047_ref_050","doi-asserted-by":"crossref","unstructured":"Sollis, E, Mosaku, A, Abid, A, Buniello, A, Cerezo, M, Gil, L, et al.. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 2023;51:D977\u201385. https:\/\/doi.org\/10.1093\/nar\/gkac1010.","DOI":"10.1093\/nar\/gkac1010"},{"key":"2025102907580412138_j_jib-2024-0047_ref_051","doi-asserted-by":"crossref","unstructured":"Leslie, R, O\u2019Donnell, CJ, Johnson, AD. GRASP: analysis of genotype-phenotype results from 1,390 genome-wide association studies and corresponding open access database. Bioinformatics 2014;30:i185\u201394. 
https:\/\/doi.org\/10.1093\/bioinformatics\/btu273.","DOI":"10.1093\/bioinformatics\/btu273"},{"key":"2025102907580412138_j_jib-2024-0047_ref_052","doi-asserted-by":"crossref","unstructured":"Huang, YF, Gulko, B, Siepel, A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat Genet 2017;49:618\u201324. https:\/\/doi.org\/10.1038\/ng.3810.","DOI":"10.1038\/ng.3810"},{"key":"2025102907580412138_j_jib-2024-0047_ref_053","doi-asserted-by":"crossref","unstructured":"Petrovski, S, Wang, Q, Heinzen, EL, Allen, AS, Goldstein, DB. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 2013;9:e1003709. https:\/\/doi.org\/10.1371\/journal.pgen.1003709.","DOI":"10.1371\/journal.pgen.1003709"},{"key":"2025102907580412138_j_jib-2024-0047_ref_054","unstructured":"Gene [Internet]. Bethesda (MD): National Center for Biotechnology Information, National Library of Medicine (US); 2004 [cited 2024 Sep 5]. Available from: https:\/\/www.ncbi.nlm.nih.gov\/gene\/"},{"key":"2025102907580412138_j_jib-2024-0047_ref_055","doi-asserted-by":"crossref","unstructured":"Zerbino, DR, Wilder, SP, Johnson, N, Juettemann, T, Flicek, PR. The Ensembl regulatory build. Genome Biol 2015;16:56. https:\/\/doi.org\/10.1186\/s13059-015-0621-5.","DOI":"10.1186\/s13059-015-0621-5"},{"key":"2025102907580412138_j_jib-2024-0047_ref_056","doi-asserted-by":"crossref","unstructured":"Sherry, ST, Ward, MH, Kholodov, M, Baker, J, Phan, L, Smigielski, E, et al.. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308\u201311. https:\/\/doi.org\/10.1093\/nar\/29.1.308.","DOI":"10.1093\/nar\/29.1.308"},{"key":"2025102907580412138_j_jib-2024-0047_ref_057","doi-asserted-by":"crossref","unstructured":"Niknafs, N, Kim, D, Kim, R, Diekhans, M, Ryan, M, Stenson, PD, et al.. MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures. Hum Genet 2013;132:1235\u201343. 
https:\/\/doi.org\/10.1007\/s00439-013-1325-0.","DOI":"10.1007\/s00439-013-1325-0"},{"key":"2025102907580412138_j_jib-2024-0047_ref_058","doi-asserted-by":"crossref","unstructured":"Allot, A, Peng, Y, Wei, CH, Lee, K, Phan, L, Lu, Z. LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Nucleic Acids Res 2018;46:W530\u20136. https:\/\/doi.org\/10.1093\/nar\/gky355.","DOI":"10.1093\/nar\/gky355"},{"key":"2025102907580412138_j_jib-2024-0047_ref_059","doi-asserted-by":"crossref","unstructured":"Richards, S, Aziz, N, Bale, S, Bick, D, Das, S, Gastier-Foster, J, et al.. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405\u201324. https:\/\/doi.org\/10.1038\/gim.2015.30.","DOI":"10.1038\/gim.2015.30"},{"key":"2025102907580412138_j_jib-2024-0047_ref_060","doi-asserted-by":"crossref","unstructured":"Ghosh, R, Harrison, SM, Rehm, HL, Plon, SE, Biesecker, LG, ClinGen Sequence Variant Interpretation Working Group. Updated recommendation for the benign stand-alone ACMG\/AMP criterion. Hum Mutat 2018;39:1633\u201341. https:\/\/doi.org\/10.1002\/humu.23642.","DOI":"10.1002\/humu.23642"},{"key":"2025102907580412138_j_jib-2024-0047_ref_061","unstructured":"Devlin, J, Chang, MW, Lee, K, Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805; 2018."},{"key":"2025102907580412138_j_jib-2024-0047_ref_062","unstructured":"Lee, D. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: ICML 2013 Workshop: Challenges in Representation Learning. Atlanta, GA: Workshop Proceedings; 2013:1\u20136 pp."},{"key":"2025102907580412138_j_jib-2024-0047_ref_063","unstructured":"Xie, Q, Dai, Z, Hovy, E, Luong, T, Le, Q. Unsupervised data augmentation for consistency training. 
Adv Neural Inf Process Syst 2020;33:6256\u201368."},{"key":"2025102907580412138_j_jib-2024-0047_ref_064","doi-asserted-by":"crossref","unstructured":"Fan, Y, Kukleva, A, Dai, D, Schiele, B. Revisiting consistency regularization for semi-supervised learning. Int J Comput Vis 2023;131:626\u201343. https:\/\/doi.org\/10.1007\/s11263-022-01723-4.","DOI":"10.1007\/s11263-022-01723-4"},{"key":"2025102907580412138_j_jib-2024-0047_ref_065","unstructured":"Loshchilov, I, Hutter, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101; 2017."},{"key":"2025102907580412138_j_jib-2024-0047_ref_066","unstructured":"Srivastava, N, Hinton, G, Krizhevsky, A, Sutskever, I, Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15:1929\u201358."},{"key":"2025102907580412138_j_jib-2024-0047_ref_067","doi-asserted-by":"crossref","unstructured":"Garbin, C, Zhu, X, Marques, O. Dropout vs. batch normalization: an empirical study of their impact to deep learning. Multimed Tools Appl 2020;79:12777\u2013815. https:\/\/doi.org\/10.1007\/s11042-019-08453-9.","DOI":"10.1007\/s11042-019-08453-9"},{"key":"2025102907580412138_j_jib-2024-0047_ref_068","doi-asserted-by":"crossref","unstructured":"Szegedy, C, Vanhoucke, V, Ioffe, S, Shlens, J, Wojna, Z. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). Las Vegas, NV: IEEE Computer Society; 2016:2818\u201326 pp.","DOI":"10.1109\/CVPR.2016.308"},{"key":"2025102907580412138_j_jib-2024-0047_ref_069","doi-asserted-by":"crossref","unstructured":"Zhang, CB, Jiang, PT, Hou, Q, Wei, Y, Han, Q, Li, Z, et al.. Delving deep into label smoothing. IEEE Trans Image Process 2021;30:5984\u201396. 
https:\/\/doi.org\/10.1109\/tip.2021.3089942.","DOI":"10.1109\/TIP.2021.3089942"}],"container-title":["Journal of Integrative Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jib-2024-0047\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jib-2024-0047\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T07:59:52Z","timestamp":1761724792000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyterbrill.com\/document\/doi\/10.1515\/jib-2024-0047\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,1]]},"references-count":69,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,7,14]]},"published-print":{"date-parts":[[2025,10,30]]}},"alternative-id":["10.1515\/jib-2024-0047"],"URL":"https:\/\/doi.org\/10.1515\/jib-2024-0047","relation":{},"ISSN":["1613-4516"],"issn-type":[{"type":"electronic","value":"1613-4516"}],"subject":[],"published":{"date-parts":[[2025,6,1]]},"article-number":"20240047"}}