{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T14:38:31Z","timestamp":1780411111943,"version":"3.54.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2021,1,18]],"date-time":"2021-01-18T00:00:00Z","timestamp":1610928000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62076109"],"award-info":[{"award-number":["62076109"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["32000464"],"award-info":[{"award-number":["32000464"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007847","name":"Natural Science Foundation of Jilin Province","doi-asserted-by":"publisher","award":["20190103006JH"],"award-info":[{"award-number":["20190103006JH"]}],"id":[{"id":"10.13039\/100007847","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005847","name":"Health and Medical Research Fund","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100005847","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005407","name":"Food and Health Bureau","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100005407","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Government of the Hong Kong Special Administrative Region","award":["07181426"],"award-info":[{"award-number":["07181426"]}]},{"DOI":"10.13039\/100017449","name":"Hong Kong Institute for Data Science","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100017449","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["CityU 11202219"],"award-info":[{"award-number":["CityU 11202219"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007567","name":"City University of Hong Kong","doi-asserted-by":"publisher","award":["CityU 11203520"],"award-info":[{"award-number":["CityU 11203520"]}],"id":[{"id":"10.13039\/100007567","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Haploinsufficiency, wherein a single allele is not enough to maintain normal functions, can lead to many diseases including cancers and neurodevelopmental disorders. Recently, computational methods for identifying haploinsufficiency have been developed. However, most of those computational methods suffer from study bias, experimental noise and instability, resulting in unsatisfactory identification of haploinsufficient genes. To address those challenges, we propose a deep forest model, called HaForest, to identify haploinsufficient genes. The multiscale scanning is proposed to extract local contextual representations from input features under Linear Discriminant Analysis. After that, the cascade forest structure is applied to obtain the concatenated features directly by integrating decision-tree-based forests. Meanwhile, to exploit the complex dependency structure among haploinsufficient genes, the LightGBM library is embedded into HaForest to reveal the highly expressive features. To validate the effectiveness of our method, we compared it to several computational methods and four deep learning algorithms on five epigenomic data sets. The results reveal that HaForest achieves superior performance over the other algorithms, demonstrating its unique and complementary performance in identifying haploinsufficient genes. The standalone tool is available at https:\/\/github.com\/yangyn533\/HaForest.<\/jats:p>","DOI":"10.1093\/bib\/bbaa393","type":"journal-article","created":{"date-parts":[[2020,12,2]],"date-time":"2020-12-02T13:09:22Z","timestamp":1606914562000},"source":"Crossref","is-referenced-by-count":4,"title":["Identification of haploinsufficient genes from epigenomic data using deep forest"],"prefix":"10.1093","volume":"22","author":[{"given":"Yuning","family":"Yang","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, Jilin University and School of Information Science and Technology, Northeast Normal University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shaochuan","family":"Li","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northeast Normal University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yunhe","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northeast Normal University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhiqiang","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, Northeast Normal University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ka-Chun","family":"Wong","sequence":"additional","affiliation":[{"name":"City University of Hong Kong, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiangtao","family":"Li","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, Jilin University, Changchun, Jilin, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2021,1,18]]},"reference":[{"issue":"11","key":"2021090907254301100_ref1","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1038\/ejhg.2008.111","article-title":"Identification of human haploinsufficient genes and their genomic proximity to segmental duplications","volume":"16","author":"Dang","year":"2008","journal-title":"Eur J Hum Genet"},{"issue":"4","key":"2021090907254301100_ref2","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1172\/JCI0215043","article-title":"Transcription factor haploinsufficiency: when half a loaf is not enough","volume":"109","author":"Seidman","year":"2002","journal-title":"J Clin Invest"},{"issue":"2","key":"2021090907254301100_ref3","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1002\/bies.10023","article-title":"Exploring the etiology of haploinsufficiency","volume":"24","author":"Veitia","year":"2002","journal-title":"Bioessays"},{"issue":"12","key":"2021090907254301100_ref4","doi-asserted-by":"crossref","first-page":"1751","DOI":"10.1093\/bioinformatics\/btx028","article-title":"HIPred: an integrative approach to predicting haploinsufficient genes","volume":"33","author":"Shihab","year":"2017","journal-title":"Bioinformatics"},{"issue":"10","key":"2021090907254301100_ref5","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pgen.1001154","article-title":"Characterising and predicting haploinsufficiency in the human genome","volume":"6","author":"Huang","year":"2010","journal-title":"PLoS Genet"},{"issue":"15","key":"2021090907254301100_ref6","doi-asserted-by":"crossref","first-page":"e101","DOI":"10.1093\/nar\/gkv474","article-title":"Haploinsufficiency predictions without study bias","volume":"43","author":"Steinberg","year":"2015","journal-title":"Nucleic Acids Res"},{"issue":"7539","key":"2021090907254301100_ref7","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nature14248","article-title":"Integrative analysis of 111 reference human epigenomes","volume":"518","author":"Kundaje","year":"2015","journal-title":"Nature"},{"issue":"7414","key":"2021090907254301100_ref8","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"Consortium","year":"2012","journal-title":"Nature"},{"issue":"1","key":"2021090907254301100_ref9","doi-asserted-by":"crossref","first-page":"2138","DOI":"10.1038\/s41467-018-04552-7","article-title":"Distinct epigenomic patterns are associated with haploinsufficiency and predict risk genes of developmental disorders","volume":"9","author":"Han","year":"2018","journal-title":"Nat Commun"},{"key":"2021090907254301100_ref10","article-title":"Deep Forest","author":"Zhou","year":"2017","journal-title":"arXiv preprint arXiv:170208835"},{"issue":"9","key":"2021090907254301100_ref11","doi-asserted-by":"crossref","first-page":"1682","DOI":"10.1101\/gr.083501.108","article-title":"High-resolution mapping and analysis of copy number variations in the human genome: a data resource for clinical and research applications","volume":"19","author":"Shaikh","year":"2009","journal-title":"Genome Res"},{"issue":"1","key":"2021090907254301100_ref12","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat Genet"},{"issue":"D1","key":"2021090907254301100_ref13","doi-asserted-by":"crossref","first-page":"D109","DOI":"10.1093\/nar\/gkr988","article-title":"KEGG for integration and interpretation of large-scale molecular data sets","volume":"40","author":"Kanehisa","year":"2011","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2021090907254301100_ref14","article-title":"Metascape provides a biologist-oriented resource for the analysis of systems-level datasets","volume":"10","author":"Zhou","year":"2019","journal-title":"Nat Commun"},{"issue":"8","key":"2021090907254301100_ref15","doi-asserted-by":"crossref","first-page":"e1003709","DOI":"10.1371\/journal.pgen.1003709","article-title":"Genic intolerance to functional variation and the interpretation of personal genomes","volume":"9","author":"Petrovski","year":"2013","journal-title":"PLoS Genet"},{"issue":"2","key":"2021090907254301100_ref16","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.neuron.2012.04.009","article-title":"De novo gene disruptions in children on the autistic spectrum","volume":"74","author":"Iossifov","year":"2012","journal-title":"Neuron"},{"issue":"4","key":"2021090907254301100_ref17","doi-asserted-by":"crossref","first-page":"948","DOI":"10.1016\/j.cell.2013.10.011","article-title":"Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome","volume":"155","author":"Davoli","year":"2013","journal-title":"Cell"},{"issue":"3","key":"2021090907254301100_ref18","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1016\/j.cell.2014.06.027","article-title":"H3K4me3 breadth is linked to cell identity and transcriptional consistency","volume":"158","author":"Benayoun","year":"2014","journal-title":"Cell"},{"key":"2021090907254301100_ref19","first-page":"3146","article-title":"Lightgbm: a highly efficient gradient boosting decision tree","volume-title":"Advances in Neural Information Processing Systems","author":"Ke","year":"2017"},{"issue":"11","key":"2021090907254301100_ref20","doi-asserted-by":"crossref","first-page":"1965","DOI":"10.1534\/g3.113.008144","article-title":"Characterization and prediction of haploinsufficiency using systems-level gene properties in yeast","volume":"3","author":"Norris","year":"2013","journal-title":"G3"},{"issue":"4","key":"2021090907254301100_ref21","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1016\/j.ajhg.2017.09.001","article-title":"DOMINO: using machine learning to predict genes associated with dominant disorders","volume":"101","author":"Quinodoz","year":"2017","journal-title":"Am J Hum Genet"},{"key":"2021090907254301100_ref22","doi-asserted-by":"crossref","first-page":"785","DOI":"10.1145\/2939672.2939785","article-title":"Xgboost: A scalable tree boosting system","volume-title":"Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chen","year":"2016"},{"key":"2021090907254301100_ref23","doi-asserted-by":"crossref","DOI":"10.1201\/b12207","volume-title":"Ensemble Methods: Foundations and Algorithms","author":"Zhou","year":"2012"},{"key":"2021090907254301100_ref24","first-page":"265","article-title":"Tensorflow: a system for large-scale machine learning","volume-title":"12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)","author":"Abadi","year":"2016"},{"key":"2021090907254301100_ref25","author":"Chollet","year":"2015"},{"key":"2021090907254301100_ref26","article-title":"Adam: a method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"arXiv preprint arXiv:14126980"},{"key":"2021090907254301100_ref27","first-page":"2825","article-title":"Scikit-learn: machine learning in python","author":"Pedregosa","year":"2011","journal-title":"J Mach Learning Res"},{"issue":"1","key":"2021090907254301100_ref28","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1186\/s12863-017-0495-5","article-title":"Network-based regularization for high dimensional SNP data in the case-control study of type 2 diabetes","volume":"18","author":"Ren","year":"2017","journal-title":"BMC Genet"},{"key":"2021090907254301100_ref29","first-page":"29","article-title":"Deep neural networks for acoustic modeling in speech recognition","author":"Hinton","year":"2012","journal-title":"IEEE Signal Process Mag"},{"key":"2021090907254301100_ref30","first-page":"1097","article-title":"Imagenet classification with deep convolutional neural networks","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky","year":"2012"},{"issue":"5","key":"2021090907254301100_ref31","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1093\/bib\/bbu046","article-title":"A selective review of robust variable selection with applications in bioinformatics","volume":"16","author":"Wu","year":"2015","journal-title":"Brief Bioinform"},{"issue":"2","key":"2021090907254301100_ref32","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1023\/A:1022859003006","article-title":"Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy","volume":"51","author":"Kuncheva","year":"2003","journal-title":"Mach Learn"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/5\/bbaa393\/40328001\/bbaa393.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/5\/bbaa393\/40328001\/bbaa393.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,14]],"date-time":"2023-10-14T12:34:24Z","timestamp":1697286864000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa393\/6102676"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,18]]},"references-count":32,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2021,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa393","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,9]]},"published":{"date-parts":[[2021,1,18]]},"article-number":"bbaa393"}}