{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T19:34:43Z","timestamp":1774380883577,"version":"3.50.1"},"reference-count":44,"publisher":"Springer Science and Business Media LLC","issue":"S16","license":[{"start":{"date-parts":[[2019,12,1]],"date-time":"2019-12-01T00:00:00Z","timestamp":1575158400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2019,12,2]],"date-time":"2019-12-02T00:00:00Z","timestamp":1575244800000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Essential proteins are crucial for cellular life and thus, identification of essential proteins is an important topic and a challenging problem for researchers. Recently lots of computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem; but few current shallow machine learning-based methods are designed to handle the imbalanced characteristics.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We develop DeepEP based on a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in protein-protein interaction (PPI) network. 
Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the class imbalance. The sampling method draws the same number of majority and minority samples in a training epoch, so the training process is not biased toward either class. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP performs better than shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by the node2vec technique contribute substantially to the improved performance. The node2vec technique effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>We demonstrate that DeepEP improves prediction performance by integrating multiple deep learning techniques and a sampling method. 
DeepEP is more effective than existing methods.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-019-3076-y","type":"journal-article","created":{"date-parts":[[2019,12,2]],"date-time":"2019-12-02T12:00:35Z","timestamp":1575288035000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":62,"title":["DeepEP: a deep learning framework for identifying essential proteins"],"prefix":"10.1186","volume":"20","author":[{"given":"Min","family":"Zeng","sequence":"first","affiliation":[]},{"given":"Min","family":"Li","sequence":"additional","affiliation":[]},{"given":"Fang-Xiang","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Yaohang","family":"Li","sequence":"additional","affiliation":[]},{"given":"Yi","family":"Pan","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,12,2]]},"reference":[{"issue":"1","key":"3076_CR1","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1038\/msb.2009.89","volume":"5","author":"JI Glass","year":"2009","unstructured":"Glass JI, Hutchison CA, Smith HO, Venter JC. A systems biology tour de force for a near-minimal bacterium. Mol Syst Biol. 2009;5(1):330.","journal-title":"Mol Syst Biol"},{"issue":"9","key":"3076_CR2","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1038\/nchembio.2007.24","volume":"3","author":"AE Clatworthy","year":"2007","unstructured":"Clatworthy AE, Pierson E, Hung DT. Targeting virulence: a new paradigm for antimicrobial therapy. Nat Chem Biol. 2007;3(9):541.","journal-title":"Nat Chem Biol"},{"issue":"1","key":"3076_CR3","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1046\/j.1365-2958.2003.03697.x","volume":"50","author":"T Roemer","year":"2003","unstructured":"Roemer T, Jiang B, Davison J, Ketela T, Veillette K, Breton A, Tandia F, Linteau A, Sillaots S, Marta C. Large-scale essential gene identification in Candida albicans and applications to antifungal drug discovery. 
Mol Microbiol. 2003;50(1):167\u201381.","journal-title":"Mol Microbiol"},{"issue":"3","key":"3076_CR4","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1111\/j.1440-1711.2005.01332.x","volume":"83","author":"LM Cullen","year":"2005","unstructured":"Cullen LM, Arndt GM. Genome-wide screening for gene function using RNAi in mammalian cells. Immunol Cell Biol. 2005;83(3):217\u201323.","journal-title":"Immunol Cell Biol"},{"issue":"6896","key":"3076_CR5","doi-asserted-by":"publisher","first-page":"387","DOI":"10.1038\/nature00935","volume":"418","author":"G Giaever","year":"2002","unstructured":"Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B. Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002;418(6896):387.","journal-title":"Nature"},{"issue":"6833","key":"3076_CR6","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/35075138","volume":"411","author":"H Jeong","year":"2001","unstructured":"Jeong H, Mason SP, Barab\u00e1si A-L, Oltvai ZN. Lethality and centrality in protein networks. Nature. 2001;411(6833):41.","journal-title":"Nature"},{"issue":"4","key":"3076_CR7","doi-asserted-by":"publisher","first-page":"803","DOI":"10.1093\/molbev\/msi072","volume":"22","author":"MW Hahn","year":"2004","unstructured":"Hahn MW, Kern AD. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 2004;22(4):803\u20136.","journal-title":"Mol Biol Evol"},{"issue":"2","key":"3076_CR8","doi-asserted-by":"crossref","first-page":"96","DOI":"10.1155\/JBB.2005.96","volume":"2005","author":"MP Joy","year":"2005","unstructured":"Joy MP, Brock A, Ingber DE, Huang S. High-betweenness proteins in the yeast protein interaction network. Biomed Res Int. 
2005;2005(2):96\u2013103.","journal-title":"Biomed Res Int"},{"issue":"1","key":"3076_CR9","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1016\/S0022-5193(03)00071-7","volume":"223","author":"S Wuchty","year":"2003","unstructured":"Wuchty S, Stadler PF. Centers of complex networks. J Theor Biol. 2003;223(1):45\u201353.","journal-title":"J Theor Biol"},{"issue":"5","key":"3076_CR10","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.71.056103","volume":"71","author":"E Estrada","year":"2005","unstructured":"Estrada E, Rodriguez-Velazquez JA. Subgraph centrality in complex networks. Phys Rev E. 2005;71(5):056103.","journal-title":"Phys Rev E"},{"issue":"4","key":"3076_CR11","doi-asserted-by":"publisher","first-page":"1070","DOI":"10.1109\/TCBB.2011.147","volume":"9","author":"J Wang","year":"2012","unstructured":"Wang J, Li M, Wang H, Pan Y. Identification of essential proteins based on edge clustering coefficient. IEEE\/ACM Trans Comput Biol Bioinform. 2012;9(4):1070\u201380.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"3076_CR12","doi-asserted-by":"publisher","unstructured":"Li G, Li M, Wang J, Li Y, Pan Y. United neighborhood closeness centrality and orthology for predicting essential proteins. IEEE\/ACM Trans Comput Biol Bioinform. 2018. https:\/\/doi.org\/10.1109\/TCBB.2018.2889978.","DOI":"10.1109\/TCBB.2018.2889978"},{"issue":"1","key":"3076_CR13","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/1752-0509-6-15","volume":"6","author":"M Li","year":"2012","unstructured":"Li M, Zhang H, J-x W, Pan Y. A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data. BMC Syst Biol. 2012;6(1):15.","journal-title":"BMC Syst Biol"},{"issue":"2","key":"3076_CR14","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1109\/TCBB.2013.2295318","volume":"11","author":"X Tang","year":"2014","unstructured":"Tang X, Wang J, Zhong J, Pan Y. 
Predicting essential proteins based on weighted degree centrality. IEEE\/ACM Trans Comput Biol Bioinform. 2014;11(2):407\u201318.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"3076_CR15","doi-asserted-by":"publisher","unstructured":"Zhang J, Li W, Zeng M, Meng X, Kurgan L, Wu F, Li M. NetEPD: a network-based essential protein discovery platform. Tsinghua Sci Technol. 2019. https:\/\/doi.org\/10.26599\/TST.2019.9010056.","DOI":"10.26599\/TST.2019.9010056"},{"key":"3076_CR16","doi-asserted-by":"publisher","unstructured":"Zeng M, Li M, Fei Z, Wu F, Li Y, Pan Y, Wang J. A deep learning framework for identifying essential proteins by integrating multiple types of biological information. IEEE\/ACM Trans Comput Biol Bioinform. 2019. https:\/\/doi.org\/10.1109\/TCBB.2019.2897679 .","DOI":"10.1109\/TCBB.2019.2897679"},{"issue":"2","key":"3076_CR17","doi-asserted-by":"publisher","first-page":"276","DOI":"10.1109\/TCBB.2014.2338317","volume":"12","author":"W Peng","year":"2015","unstructured":"Peng W, Wang J, Cheng Y, Lu Y, Wu F, Pan Y. UDoNC: an algorithm for identifying essential proteins based on protein domains and protein-protein interaction networks. IEEE\/ACM Trans Comput Biol Bioinform. 2015;12(2):276\u201388.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"issue":"6","key":"3076_CR18","doi-asserted-by":"publisher","first-page":"668","DOI":"10.1109\/TST.2016.7787009","volume":"21","author":"M Li","year":"2016","unstructured":"Li M, Niu Z, Chen X, Zhong P, Wu F, Pan Y. A reliable neighbor-based method for identifying essential proteins by integrating gene expressions, orthology, and subcellular localization information. Tsinghua Sci Technol. 2016;21(6):668\u201377.","journal-title":"Tsinghua Sci Technol"},{"issue":"8","key":"3076_CR19","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1186\/s12859-016-1115-5","volume":"17","author":"G Li","year":"2016","unstructured":"Li G, Li M, Wang J, Wu J, Wu F-X, Pan Y. 
Predicting essential proteins based on subcellular localization, orthology and PPI networks. BMC Bioinf. 2016;17(8):279.","journal-title":"BMC Bioinf"},{"key":"3076_CR20","doi-asserted-by":"publisher","first-page":"136","DOI":"10.1016\/j.knosys.2018.03.027","volume":"151","author":"X Lei","year":"2018","unstructured":"Lei X, Zhao J, Fujita H, Zhang A. Predicting essential proteins based on RNA-Seq, subcellular localization and GO annotation datasets. Knowl-Based Syst. 2018;151:136\u201348.","journal-title":"Knowl-Based Syst"},{"key":"3076_CR21","doi-asserted-by":"publisher","unstructured":"Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform. 2019. https:\/\/doi.org\/10.1093\/bib\/bbz017.","DOI":"10.1093\/bib\/bbz017"},{"issue":"12","key":"3076_CR22","doi-asserted-by":"publisher","first-page":"1672","DOI":"10.1039\/b900611g","volume":"5","author":"Y-C Hwang","year":"2009","unstructured":"Hwang Y-C, Lin C-C, Chang J-Y, Mori H, Juan H-F, Huang H-C. Predicting essential genes based on network and sequence analysis. Mol BioSyst. 2009;5(12):1672\u20138.","journal-title":"Mol BioSyst"},{"key":"3076_CR23","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/j.compbiolchem.2014.01.011","volume":"50","author":"Y Lu","year":"2014","unstructured":"Lu Y, Deng J, Rhodes JC, Lu H, Lu LJ. Predicting essential genes for identifying potential drug targets in Aspergillus fumigatus. Comput Biol Chem. 2014;50:29\u201340.","journal-title":"Comput Biol Chem"},{"issue":"1","key":"3076_CR24","doi-asserted-by":"publisher","first-page":"e86805","DOI":"10.1371\/journal.pone.0086805","volume":"9","author":"J Cheng","year":"2014","unstructured":"Cheng J, Xu Z, Wu W, Zhao L, Li X, Liu Y, Tao S. Training set selection for the prediction of essential genes. PLoS One. 
2014;9(1):e86805.","journal-title":"PLoS One"},{"issue":"1","key":"3076_CR25","doi-asserted-by":"publisher","first-page":"290","DOI":"10.1186\/1471-2105-10-290","volume":"10","author":"ML Acencio","year":"2009","unstructured":"Acencio ML, Lemke N. Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information. BMC Bioinf. 2009;10(1):290.","journal-title":"BMC Bioinf"},{"issue":"4","key":"3076_CR26","doi-asserted-by":"publisher","first-page":"S7","DOI":"10.1186\/1471-2164-14-S8-S7","volume":"14","author":"J Zhong","year":"2013","unstructured":"Zhong J, Wang J, Peng W, Zhang Z, Pan Y. Prediction of essential proteins based on gene expression programming. BMC Genomics. 2013;14(4):S7.","journal-title":"BMC Genomics"},{"key":"3076_CR27","doi-asserted-by":"publisher","unstructured":"Li M, Gao H, Wang J, Wu F. Control principles for complex biological networks. Brief Bioinform. 2018. https:\/\/doi.org\/10.1093\/bib\/bby088.","DOI":"10.1093\/bib\/bby088"},{"key":"3076_CR28","doi-asserted-by":"publisher","DOI":"10.1002\/pmic.201900019","volume":"19","author":"F Zhang","year":"2019","unstructured":"Zhang F, Song H, Zeng M, Li Y, Kurgan L, Li M. DeepFunc: a deep learning framework for accurate prediction of protein functions from protein sequences and interactions. Proteomics. 2019;19:1900019.","journal-title":"Proteomics"},{"key":"3076_CR29","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1016\/j.neucom.2018.04.081","volume":"324","author":"M Zeng","year":"2019","unstructured":"Zeng M, Li M, Fei Z, Yu Y, Pan Y, Wang J. Automatic ICD-9 coding via deep transfer learning. Neurocomputing. 2019;324:43\u201350.","journal-title":"Neurocomputing"},{"issue":"4","key":"3076_CR30","doi-asserted-by":"publisher","first-page":"1193","DOI":"10.1109\/TCBB.2018.2817488","volume":"16","author":"Min Li","year":"2019","unstructured":"Li M, Fei Z, Zeng M, Wu F, Li Y, Pan Y, Wang J. 
Automated ICD-9 coding via a deep learning approach. IEEE\/ACM Trans Comput Biol Bioinf. 2018. https:\/\/doi.org\/10.1109\/TCBB.2018.2817488.","journal-title":"IEEE\/ACM Transactions on Computational Biology and Bioinformatics"},{"key":"3076_CR31","first-page":"3889","volume-title":"IJCAI","author":"C Tu","year":"2016","unstructured":"Tu C, Zhang W, Liu Z, Sun M. Max-margin DeepWalk: discriminative learning of network representation. In: IJCAI; 2016. p. 3889\u201395."},{"key":"3076_CR32","first-page":"3111","volume-title":"Advances in neural information processing systems","author":"T Mikolov","year":"2013","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems; 2013. p. 3111\u20139."},{"key":"3076_CR33","doi-asserted-by":"publisher","unstructured":"Grover A, Leskovec J. node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM; 2016. p. 855\u201364. https:\/\/doi.org\/10.1145\/2939672.2939754.","DOI":"10.1145\/2939672.2939754"},{"key":"3076_CR34","doi-asserted-by":"crossref","unstructured":"He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data Eng. 2009;21(9):1263\u201384.","DOI":"10.1109\/TKDE.2008.239"},{"key":"3076_CR35","doi-asserted-by":"publisher","unstructured":"Zeng M, Zou B, Wei F, Liu X, Wang L. Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data. In: 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS). Chongqing: IEEE; 2016. p. 225\u20138. 
https:\/\/doi.org\/10.1109\/ICOACS.2016.7563084.","DOI":"10.1109\/ICOACS.2016.7563084"},{"key":"3076_CR36","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321\u201357.","journal-title":"J Artif Intell Res"},{"key":"3076_CR37","doi-asserted-by":"publisher","unstructured":"Zeng M, Zhang F, Wu F, Li Y, Wang J, Li M. Protein-protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics. https:\/\/doi.org\/10.1093\/bioinformatics\/btz699.","DOI":"10.1093\/bioinformatics\/btz699"},{"key":"3076_CR38","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1016\/j.media.2016.10.004","volume":"36","author":"K Kamnitsas","year":"2017","unstructured":"Kamnitsas K, Ledig C, Newcombe VF, Simpson JP, Kane AD, Menon DK, Rueckert D, Glocker B. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med Image Anal. 2017;36:61\u201378.","journal-title":"Med Image Anal"},{"issue":"suppl_1","key":"3076_CR39","doi-asserted-by":"publisher","first-page":"D535","DOI":"10.1093\/nar\/gkj109","volume":"34","author":"C Stark","year":"2006","unstructured":"Stark C, Breitkreutz B-J, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34(suppl_1):D535\u20139.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"3076_CR40","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1093\/nar\/30.1.31","volume":"30","author":"H-W Mewes","year":"2002","unstructured":"Mewes H-W, Frishman D, G\u00fcldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, M\u00fcnsterk\u00f6tter M, Rudd S, Weil B. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 
2002;30(1):31\u20134.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"3076_CR41","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1093\/nar\/26.1.73","volume":"26","author":"JM Cherry","year":"1998","unstructured":"Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M. SGD: Saccharomyces genome database. Nucleic Acids Res. 1998;26(1):73\u20139.","journal-title":"Nucleic Acids Res"},{"issue":"suppl_1","key":"3076_CR42","first-page":"D455","volume":"37","author":"R Zhang","year":"2008","unstructured":"Zhang R, Lin Y. DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res. 2008;37(suppl_1):D455\u20138.","journal-title":"Nucleic Acids Res"},{"key":"3076_CR43","first-page":"265","volume-title":"OSDI","author":"M Abadi","year":"2016","unstructured":"Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M. Tensorflow: a system for large-scale machine learning. In: OSDI; 2016. p. 265\u201383."},{"issue":"4","key":"3076_CR44","doi-asserted-by":"publisher","first-page":"288","DOI":"10.26599\/BDMA.2019.9020007","volume":"2","author":"Ying Yu","year":"2019","unstructured":"Yu Y, Li M, Liu L, Li Y, Wang J. Clinical big data and deep learning: Applications, challenges, and future outlooks. 
Big Data Mining and Analytics, 2019, 2(4): 288-305.","journal-title":"Big Data Mining and Analytics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3076-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3076-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3076-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,27]],"date-time":"2024-07-27T23:32:57Z","timestamp":1722123177000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3076-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12]]},"references-count":44,"journal-issue":{"issue":"S16","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["3076"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3076-y","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,12]]},"assertion":[{"value":"2 December 2019","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"506"}}