{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T17:08:59Z","timestamp":1776532139586,"version":"3.51.2"},"reference-count":30,"publisher":"Oxford University Press (OUP)","issue":"21","license":[{"start":{"date-parts":[[2017,7,7]],"date-time":"2017-07-07T00:00:00Z","timestamp":1499385600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The method is available as a web server at http:\/\/www.cbs.dtu.dk\/services\/DeepLoc. Example code is available at https:\/\/github.com\/JJAlmagro\/subcellular_localization. The dataset is available at http:\/\/www.cbs.dtu.dk\/services\/DeepLoc\/data.php.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx431","type":"journal-article","created":{"date-parts":[[2017,7,3]],"date-time":"2017-07-03T19:15:21Z","timestamp":1499109321000},"page":"3387-3395","source":"Crossref","is-referenced-by-count":1006,"title":["DeepLoc: prediction of protein subcellular localization using deep learning"],"prefix":"10.1093","volume":"33","author":[{"given":"Jos\u00e9 Juan","family":"Almagro Armenteros","sequence":"first","affiliation":[{"name":"Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark"},{"name":"The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen N, Denmark"}]},{"given":"Casper Kaae","family":"S\u00f8nderby","sequence":"additional","affiliation":[{"name":"The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen N, Denmark"}]},{"given":"S\u00f8ren Kaae","family":"S\u00f8nderby","sequence":"additional","affiliation":[{"name":"The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen N, Denmark"}]},{"given":"Henrik","family":"Nielsen","sequence":"additional","affiliation":[{"name":"Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark"}]},{"given":"Ole","family":"Winther","sequence":"additional","affiliation":[{"name":"The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen N, Denmark"},{"name":"DTU Compute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark"}]}],"member":"286","published-online":{"date-parts":[[2017,7,7]]},"reference":[{"key":"2023051506254614200_btx431-B1","author":"Bahdanau","year":"2014"},{"key":"2023051506254614200_btx431-B2","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1093\/bioinformatics\/16.5.412","article-title":"Assessing the accuracy of prediction algorithms for classification: an overview","volume":"16","author":"Baldi","year":"2000","journal-title":"Bioinformatics"},{"key":"2023051506254614200_btx431-B3","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1186\/1471-2105-10-274","article-title":"Multiloc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction","volume":"10","author":"Blum","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023051506254614200_btx431-B4","doi-asserted-by":"crossref","first-page":"5363","DOI":"10.1021\/pr900665y","article-title":"Sherloc2: a high-accuracy hybrid method for predicting subcellular localization of proteins","volume":"8","author":"Briesemeister","year":"2009","journal-title":"J. Proteome Res"},{"key":"2023051506254614200_btx431-B5","doi-asserted-by":"crossref","first-page":"W497","DOI":"10.1093\/nar\/gkq477","article-title":"YLoc\u2013an interpretable web server for predicting subcellular localization","volume":"38","author":"Briesemeister","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023051506254614200_btx431-B6","doi-asserted-by":"crossref","first-page":"e18258.","DOI":"10.1371\/journal.pone.0018258","article-title":"iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins","volume":"6","author":"Chou","year":"2011","journal-title":"PLoS ONE"},{"key":"2023051506254614200_btx431-B7","volume-title":"Lasagne: First Release","author":"Dieleman","year":"2015"},{"key":"2023051506254614200_btx431-B8","doi-asserted-by":"crossref","first-page":"953","DOI":"10.1038\/nprot.2007.131","article-title":"Locating proteins in the cell using TargetP, SignalP and related tools","volume":"2","author":"Emanuelsson","year":"2007","journal-title":"Nature Protoc"},{"key":"2023051506254614200_btx431-B9","doi-asserted-by":"crossref","first-page":"i458","DOI":"10.1093\/bioinformatics\/bts390","article-title":"LocTree2 predicts localization for all domains of life","volume":"28","author":"Goldberg","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051506254614200_btx431-B10","doi-asserted-by":"crossref","first-page":"W350","DOI":"10.1093\/nar\/gku396","article-title":"Loctree3 prediction of localization","volume":"42","author":"Goldberg","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023051506254614200_btx431-B11","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/j.compbiolchem.2004.09.006","article-title":"Comparing two k-category assignments by a k-category correlation coefficient","volume":"28","author":"Gorodkin","year":"2004","journal-title":"Comput. Biol. Chem"},{"key":"2023051506254614200_btx431-B12","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051506254614200_btx431-B13","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1002\/pro.5560010313","article-title":"Selection of representative protein data sets","volume":"1","author":"Hobohm","year":"1992","journal-title":"Protein Sci"},{"key":"2023051506254614200_btx431-B14","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023051506254614200_btx431-B15","doi-asserted-by":"crossref","first-page":"1158","DOI":"10.1093\/bioinformatics\/btl002","article-title":"Multiloc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition","volume":"22","author":"H\u00f6glund","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051506254614200_btx431-B16","doi-asserted-by":"crossref","first-page":"W585","DOI":"10.1093\/nar\/gkm259","article-title":"WoLF PSORT: protein localization predictor","volume":"35","author":"Horton","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023051506254614200_btx431-B17","doi-asserted-by":"crossref","first-page":"3381","DOI":"10.1242\/jcs.089110","article-title":"Protein localization in disease and therapy","volume":"124","author":"Hung","year":"2011","journal-title":"J. Cell Sci"},{"key":"2023051506254614200_btx431-B18","doi-asserted-by":"crossref","first-page":"3970","DOI":"10.1002\/pmic.201000274","article-title":"Prediction of subcellular locations of proteins: where to proceed?","volume":"10","author":"Imai","year":"2010","journal-title":"Proteomics"},{"key":"2023051506254614200_btx431-B19","doi-asserted-by":"crossref","first-page":"924.","DOI":"10.15252\/msb.20177551","article-title":"Automated analysis of high-content microscopy data with deep learning","volume":"13","author":"Kraus","year":"2017","journal-title":"Mol. Syst. Biol"},{"key":"2023051506254614200_btx431-B20","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051506254614200_btx431-B21","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochim. Biophys. Acta (BBA)-Protein Struct"},{"key":"2023051506254614200_btx431-B22","doi-asserted-by":"crossref","first-page":"545","DOI":"10.1093\/protein\/13.8.545","article-title":"Structure-derived substitution matrices for alignment of distantly related sequences","volume":"13","author":"Prli\u0107","year":"2000","journal-title":"Protein Eng"},{"key":"2023051506254614200_btx431-B23","doi-asserted-by":"crossref","first-page":"1410","DOI":"10.1093\/bioinformatics\/btm115","article-title":"Sherloc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data","volume":"23","author":"Shatkay","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051506254614200_btx431-B24","author":"S\u00f8nderby","year":"2015"},{"key":"2023051506254614200_btx431-B25","doi-asserted-by":"crossref","first-page":"D158","DOI":"10.1093\/nar\/gkw1099","article-title":"UniProt: the universal protein knowledgebase","volume":"45","author":"The UniProt Consortium","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023051506254614200_btx431-B26","author":"Theano Development Team","year":"2016"},{"key":"2023051506254614200_btx431-B27","doi-asserted-by":"crossref","first-page":"W401","DOI":"10.1093\/nar\/gkv485","article-title":"The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides","volume":"43","author":"Tsirigos","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051506254614200_btx431-B28","doi-asserted-by":"crossref","DOI":"10.1515\/9781501501500","volume-title":"Machine Learning for Protein Subcellular Localization Prediction","author":"Wan","year":"2015"},{"key":"2023051506254614200_btx431-B29","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1002\/prot.21018","article-title":"Prediction of protein subcellular localization","volume":"64","author":"Yu","year":"2006","journal-title":"Proteins"},{"key":"2023051506254614200_btx431-B30","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1109\/TKDE.2006.17","article-title":"Training cost-sensitive neural networks with methods addressing the class imbalance problem","volume":"18","author":"Zhou","year":"2006","journal-title":"IEEE Trans. Knowledge Data Eng"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/21\/3387\/50315453\/bioinformatics_33_21_3387.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/21\/3387\/50315453\/bioinformatics_33_21_3387.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T06:26:08Z","timestamp":1684131968000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/21\/3387\/3931857"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,7,7]]},"references-count":30,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2017,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx431","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,11,1]]},"published":{"date-parts":[[2017,7,7]]}}}