{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T22:04:31Z","timestamp":1774649071876,"version":"3.50.1"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2017,2,15]],"date-time":"2017-02-15T00:00:00Z","timestamp":1487116800000},"content-version":"vor","delay-in-days":16,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000265","name":"Medical Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000265","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000266","name":"Engineering and Physical Sciences Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100000266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A major cause of autosomal dominant disease is haploinsufficiency, whereby a single copy of a gene is not sufficient to maintain the normal function of the gene. A large proportion of existing methods for predicting haploinsufficiency incorporate biological networks, e.g. protein-protein interaction networks that have recently been shown to introduce study bias. As a result, these methods tend to perform best on well-studied genes, but underperform on less studied genes. The advent of large genome sequencing consortia, such as the 1000 genomes project, NHLBI Exome Sequencing Project and the Exome Aggregation Consortium creates an urgent need for unbiased haploinsufficiency prediction methods.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we describe a machine learning approach, called HIPred, that integrates genomic and evolutionary information from ENSEMBL, with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency, without the study bias described earlier. We benchmark HIPred using several datasets and show that our unbiased method performs as well as, and in most cases, outperforms existing biased algorithms.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>HIPred scores for all gene identifiers are available at: https:\/\/github.com\/HAShihab\/HIPred.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx028","type":"journal-article","created":{"date-parts":[[2017,1,30]],"date-time":"2017-01-30T13:28:22Z","timestamp":1485782902000},"page":"1751-1757","source":"Crossref","is-referenced-by-count":43,"title":["HIPred: an integrative approach to predicting haploinsufficient genes"],"prefix":"10.1093","volume":"33","author":[{"given":"Hashem A","family":"Shihab","sequence":"first","affiliation":[{"name":"MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, UK"}]},{"given":"Mark F","family":"Rogers","sequence":"additional","affiliation":[{"name":"Intelligent Systems Laboratory, University of Bristol, Bristol, UK"}]},{"given":"Colin","family":"Campbell","sequence":"additional","affiliation":[{"name":"Intelligent Systems Laboratory, University of Bristol, Bristol, UK"}]},{"given":"Tom R","family":"Gaunt","sequence":"additional","affiliation":[{"name":"MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, UK"}]}],"member":"286","published-online":{"date-parts":[[2017,1,30]]},"reference":[{"key":"2023020205315726800_btx028-B1","first-page":"1","article-title":"Learning with support vector machines","volume":"5","author":"Campbell","year":"2011","journal-title":"Synth. Lect. Artif. Intell. Mach. Learn"},{"key":"2023020205315726800_btx028-B2","author":"Chen","year":"2016"},{"key":"2023020205315726800_btx028-B3","doi-asserted-by":"crossref","first-page":"e46688.","DOI":"10.1371\/journal.pone.0046688","article-title":"Predicting the functional effect of amino acid substitutions and indels","volume":"7","author":"Choi","year":"2012","journal-title":"PLoS One"},{"key":"2023020205315726800_btx028-B4","doi-asserted-by":"crossref","first-page":"1350","DOI":"10.1038\/ejhg.2008.111","article-title":"Identification of human haploinsufficient genes and their genomic proximity to segmental duplications","volume":"16","author":"Dang","year":"2008","journal-title":"Eur. J. Hum. Genet"},{"key":"2023020205315726800_btx028-B5","doi-asserted-by":"crossref","first-page":"e1001154.","DOI":"10.1371\/journal.pgen.1001154","article-title":"Characterising and predicting haploinsufficiency in the human genome","volume":"6","author":"Huang","year":"2010","journal-title":"PLoS Genet"},{"key":"2023020205315726800_btx028-B6","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1016\/j.neuron.2012.04.009","article-title":"De novo gene disruptions in children on the autistic spectrum","volume":"74","author":"Iossifov","year":"2012","journal-title":"Neuron"},{"key":"2023020205315726800_btx028-B7","doi-asserted-by":"crossref","first-page":"e1002886.","DOI":"10.1371\/journal.pcbi.1002886","article-title":"Interpretation of genomic variants using a unified biological network approach","volume":"9","author":"Khurana","year":"2013","journal-title":"PLoS Comput. Biol"},{"key":"2023020205315726800_btx028-B8","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/ng.2892","article-title":"A general framework for estimating the relative pathogenicity of human genetic variants","volume":"46","author":"Kircher","year":"2014","journal-title":"Nat. Genet"},{"key":"2023020205315726800_btx028-B9","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1038\/nature19057","article-title":"Analysis of protein-coding genetic variation in 60,706 humans","volume":"536","author":"Lek","year":"2016","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B10","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1126\/science.1215040","article-title":"A systematic survey of loss-of-function variants in human protein-coding genes","volume":"335","author":"MacArthur","year":"2012","journal-title":"Science"},{"key":"2023020205315726800_btx028-B11","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/nature11011","article-title":"Patterns and rates of exonic de novo mutations in autism spectrum disorders","volume":"485","author":"Neale","year":"2012","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B12","doi-asserted-by":"crossref","first-page":"e1000160.","DOI":"10.1371\/journal.pgen.1000160","article-title":"Genetic variation in an individual human exome","volume":"4","author":"Ng","year":"2008","journal-title":"PLoS Genet"},{"key":"2023020205315726800_btx028-B13","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nature10989","article-title":"Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations","volume":"485","author":"O\u2019Roak","year":"2012","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B14","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"2023020205315726800_btx028-B15","doi-asserted-by":"crossref","first-page":"e1001111.","DOI":"10.1371\/journal.pgen.1001111","article-title":"The characterization of twenty sequenced human genomes","volume":"6","author":"Pelak","year":"2010","journal-title":"PLoS Genet"},{"key":"2023020205315726800_btx028-B16","doi-asserted-by":"crossref","first-page":"e1003709.","DOI":"10.1371\/journal.pgen.1003709","article-title":"Genic intolerance to functional variation and the interpretation of personal genomes","volume":"9","author":"Petrovski","year":"2013","journal-title":"PLoS Genet"},{"key":"2023020205315726800_btx028-B17","doi-asserted-by":"crossref","first-page":"e33","DOI":"10.1093\/nar\/gku1322","article-title":"EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization","volume":"43","author":"Rackham","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020205315726800_btx028-B18","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1038\/nmeth.2832","article-title":"Functional annotation of noncoding sequence variants","volume":"11","author":"Ritchie","year":"2014","journal-title":"Nat. Methods"},{"key":"2023020205315726800_btx028-B19","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/nature14248","article-title":"Integrative analysis of 111 reference human epigenomes","volume":"518","author":"Roadmap Epigenomics Consortium","year":"2015","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B20","first-page":"639","author":"Rogers","year":"2015"},{"key":"2023020205315726800_btx028-B21","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1038\/nature10945","article-title":"De novo mutations revealed by whole-exome sequencing are strongly associated with autism","volume":"485","author":"Sanders","year":"2012","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B22","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1002\/humu.22225","article-title":"Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using Hidden Markov Models","volume":"34","author":"Shihab","year":"2013","journal-title":"Hum. Mutat"},{"key":"2023020205315726800_btx028-B23","doi-asserted-by":"crossref","first-page":"1536\u20131543","DOI":"10.1093\/bioinformatics\/btv009","article-title":"An integrative approach to predicting the functional effects of non-coding and coding sequence variation","volume":"31","author":"Shihab","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020205315726800_btx028-B24","first-page":"1799","article-title":"The SHOGUN machine learning toolbox","volume":"11","author":"Sonnenburg","year":"2010","journal-title":"J. Mach. Learn. Res"},{"key":"2023020205315726800_btx028-B25","doi-asserted-by":"crossref","first-page":"e101","DOI":"10.1093\/nar\/gkv474","article-title":"Haploinsufficiency predictions without study bias","volume":"43","author":"Steinberg","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023020205315726800_btx028-B26","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"The 1000 Genomes Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B27","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1038\/nature11247","article-title":"An integrated encyclopedia of DNA elements in the human genome","volume":"489","author":"The ENCODE Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023020205315726800_btx028-B28","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1002\/path.2623","article-title":"Dominance and gene dosage balance in health and disease: why levels matter!","volume":"220","author":"Veitia","year":"2010","journal-title":"J. Pathol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/12\/1751\/49039929\/bioinformatics_33_12_1751.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/12\/1751\/49039929\/bioinformatics_33_12_1751.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T05:35:43Z","timestamp":1675316143000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/12\/1751\/2964486"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,1,30]]},"references-count":28,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2017,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx028","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,6,15]]},"published":{"date-parts":[[2017,1,30]]}}}