{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:52:27Z","timestamp":1740135147186,"version":"3.37.3"},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"S6","license":[{"start":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T00:00:00Z","timestamp":1622592000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T00:00:00Z","timestamp":1622592000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["19H03213","20K06606"],"award-info":[{"award-number":["19H03213","20K06606"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic difficulty in dealing with the scarcity of data leads to the necessity to further improve the algorithms. 
In this work, we propose a novel method, employing a semi-supervised deep-learning model with pseudo labels, which takes advantage of learning from both experimentally annotated and unannotated data.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in GM12878, HepG2, and K562 cell lines. Applying our method to the dataset demonstrated its outstanding performance, compared with that of existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained with the data in a certain cell line is unlikely to succeed in other cell lines, which implies the cell-type-specific nature of the non-coding variants. Remarkably, we found that DNA accessibility significantly contributes to the functional consequence of variants, which suggests the importance of open chromatin conformation prior to establishing the interaction of non-coding variants with gene regulation.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>The semi-supervised deep learning model coupled with pseudo labeling has advantages in studying with limited datasets, which is not unusual in biology. 
Our study provides an effective approach in finding non-coding mutations potentially associated with various biological phenomena, including human diseases.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-021-03999-8","type":"journal-article","created":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T11:07:12Z","timestamp":1622632032000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations"],"prefix":"10.1186","volume":"22","author":[{"given":"Hao","family":"Jia","sequence":"first","affiliation":[]},{"given":"Sung-Joon","family":"Park","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8721-8883","authenticated-orcid":false,"given":"Kenta","family":"Nakai","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,6,2]]},"reference":[{"key":"3999_CR1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tig.2019.09.006","author":"H Lee","year":"2019","unstructured":"Lee H, et al. Long noncoding RNAs and repetitive elements: Junk or intimate evolutionary partners? Trends Genet. 2019. https:\/\/doi.org\/10.1016\/j.tig.2019.09.006.","journal-title":"Trends Genet"},{"key":"3999_CR2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0903103106","author":"LA Hindorff","year":"2009","unstructured":"Hindorff LA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. 2009. https:\/\/doi.org\/10.1073\/pnas.0903103106.","journal-title":"Proc Natl Acad Sci USA"},{"key":"3999_CR3","doi-asserted-by":"publisher","DOI":"10.1038\/nrg.2015.17","author":"E Khurana","year":"2016","unstructured":"Khurana E, et al. Role of non-coding sequence variants in cancer. Nat Rev Genet. 2016. 
https:\/\/doi.org\/10.1038\/nrg.2015.17.","journal-title":"Nat Rev Genet"},{"key":"3999_CR4","doi-asserted-by":"publisher","DOI":"10.1038\/nature11247","author":"I Dunham","year":"2012","unstructured":"Dunham I, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012. https:\/\/doi.org\/10.1038\/nature11247.","journal-title":"Nature"},{"key":"3999_CR5","doi-asserted-by":"publisher","DOI":"10.1038\/nbt1010-1045","author":"BE Bernstein","year":"2010","unstructured":"Bernstein BE, et al. The NIH roadmap epigenomics mapping consortium. Nat Biotechnol. 2010. https:\/\/doi.org\/10.1038\/nbt1010-1045.","journal-title":"Nat Biotechnol"},{"key":"3999_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.ajhg.2018.03.026","author":"D Backenroth","year":"2018","unstructured":"Backenroth D, et al. FUN-LDA: a latent Dirichlet allocation model for predicting tissue-specific functional effects of noncoding variation: methods and applications. Am J Hum Genet. 2018. https:\/\/doi.org\/10.1016\/j.ajhg.2018.03.026.","journal-title":"Am J Hum Genet"},{"key":"3999_CR7","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pgen.1005947","author":"Q Lu","year":"2016","unstructured":"Lu Q, et al. Integrative tissue-specific functional annotations in the human genome provide novel insights on many complex traits and improve signal prioritization in genome wide association studies. PLoS Genet. 2016. https:\/\/doi.org\/10.1371\/journal.pgen.1005947.","journal-title":"PLoS Genet"},{"key":"3999_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/ng.3477","volume":"1","author":"I Ionita-Laza","year":"2016","unstructured":"Ionita-Laza I, et al. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet. 2016;1:1. 
https:\/\/doi.org\/10.1038\/ng.3477.","journal-title":"Nat Genet"},{"key":"3999_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/ng.3331","volume":"1","author":"D Lee","year":"2015","unstructured":"Lee D, et al. A method to predict the impact of regulatory variants from DNA sequence. Nat Genet. 2015;1:1. https:\/\/doi.org\/10.1038\/ng.3331.","journal-title":"Nat Genet"},{"key":"3999_CR10","doi-asserted-by":"publisher","DOI":"10.1038\/ng.2892","author":"M Kircher","year":"2014","unstructured":"Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014. https:\/\/doi.org\/10.1038\/ng.2892.","journal-title":"Nat Genet"},{"key":"3999_CR11","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btu703","author":"D Quang","year":"2015","unstructured":"Quang D, et al. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics. 2015. https:\/\/doi.org\/10.1093\/bioinformatics\/btu703.","journal-title":"Bioinformatics"},{"key":"3999_CR12","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.3547","author":"J Zhou","year":"2015","unstructured":"Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods. 2015. https:\/\/doi.org\/10.1038\/nmeth.3547.","journal-title":"Nat Methods"},{"key":"3999_CR13","first-page":"555","volume":"1","author":"Q Liu","year":"2000","unstructured":"Liu Q, et al. Interactive and incremental learning via a mixture of supervised and unsupervised learning strategies. Proc Joint Conf Inf Sci. 2000;1:555\u20138.","journal-title":"Proc Joint Conf Inf Sci"},{"key":"3999_CR14","unstructured":"Zhu X. Semi-supervised learning literature survey. Technical report 1530, Computer Sciences, University of Wisconsin-Madison. 2005."},{"key":"3999_CR15","unstructured":"Joachims T. Transductive inference for text classification using support vector machines. 
In: Proceedings of the 16th international conference on machine learning; 1999. p. 200\u20139."},{"key":"3999_CR16","first-page":"465","volume":"16","author":"N Shental","year":"2004","unstructured":"Shental N, et al. Computing Gaussian mixture models with EM using equivalence constraints. Adv Neural Inf Process Syst. 2004;16:465\u201372.","journal-title":"Adv Neural Inf Process Syst"},{"key":"3999_CR17","unstructured":"Lee D-H. Pseudo-label: the simple and efficient semi-supervised learning method for deep neural networks. In: ICML 2013 workshop: challenges in representation learning; 2013."},{"key":"3999_CR18","doi-asserted-by":"publisher","unstructured":"Iscen A, et al. Label propagation for deep semi-supervised learning. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition; 2019. https:\/\/doi.org\/10.1109\/CVPR.2019.00521.","DOI":"10.1109\/CVPR.2019.00521"},{"key":"3999_CR19","doi-asserted-by":"publisher","unstructured":"Li Z, et al. Naive semi-supervised deep learning using pseudo-label. In: Peer-to-peer networking and applications; 2019. https:\/\/doi.org\/10.1007\/s12083-018-0702-9.","DOI":"10.1007\/s12083-018-0702-9"},{"key":"3999_CR20","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-018-07349-w","author":"Z He","year":"2018","unstructured":"He Z, et al. A semi-supervised approach for predicting cell-type specific functional consequences of non-coding variation using MPRAs. Nat Commun. 2018. https:\/\/doi.org\/10.1038\/s41467-018-07349-w.","journal-title":"Nat Commun"},{"key":"3999_CR21","doi-asserted-by":"publisher","DOI":"10.1016\/j.cell.2007.12.014","author":"AP Boyle","year":"2008","unstructured":"Boyle AP, et al. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008. 
https:\/\/doi.org\/10.1016\/j.cell.2007.12.014.","journal-title":"Cell"},{"key":"3999_CR22","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15:1929\u201358.","journal-title":"J Mach Learn Res"},{"key":"3999_CR23","first-page":"315","volume":"15","author":"X Glorot","year":"2011","unstructured":"Glorot X, et al. Deep sparse rectifier neural networks. J Mach Learn Res. 2011;15:315\u201323.","journal-title":"J Mach Learn Res"},{"key":"3999_CR24","unstructured":"Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: 32nd international conference on machine learning, ICML 2015; 2015. p. 448\u201356."},{"key":"3999_CR25","doi-asserted-by":"publisher","unstructured":"Bottou L. Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT 2010\u201419th international conference on computational statistics; 2010. 
https:\/\/doi.org\/10.1007\/978-3-7908-2604-3-16.","DOI":"10.1007\/978-3-7908-2604-3-16"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-03999-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-03999-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-03999-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,4]],"date-time":"2022-08-04T16:07:48Z","timestamp":1659629268000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-03999-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,2]]},"references-count":25,"journal-issue":{"issue":"S6","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["3999"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-03999-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"type":"electronic","value":"1471-2105"}],"subject":[],"published":{"date-parts":[[2021,6,2]]},"assertion":[{"value":"21 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 February 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"128"}}