{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T19:52:48Z","timestamp":1776109968390,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Collaborative Genome Program for Fostering New Post-Genome"},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007431","name":"NRF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007431","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004083","name":"Ministry of Science ICT and Future Planning","doi-asserted-by":"publisher","award":["NRF-2014M3C9A3063541"],"award-info":[{"award-number":["NRF-2014M3C9A3063541"]}],"id":[{"id":"10.13039\/501100004083","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Next-Generation Information Computing Development Program"},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100007431","name":"NRF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007431","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Ministry of Science"},{"DOI":"10.13039\/100010669","name":"ICT","doi-asserted-by":"publisher","award":["NRF-2017M3C4A7065887"],"award-info":[{"award-number":["NRF-2017M3C4A7065887"]}],"id":[{"id":"10.13039\/100010669","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100010418","name":"Institute for Information & communications Technology Promotion","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100010418","id-type":"DOI","asserted-by":"crossref"}]},{"name":"IITP"},{"DOI":"10.13039\/501100003621","name":"MSIP","doi-asserted-by":"publisher","award":["B0717-16-0098"],"award-info":[{"award-number":["B0717-16-0098"]}],"id":[{"id":"10.13039\/501100003621","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A large number of newly sequenced proteins are generated by the next-generation sequencing technologies and the biochemical function assignment of the proteins is an important task. However, biological experiments are too expensive to characterize such a large number of protein sequences, thus protein function prediction is primarily done by computational modeling methods, such as profile Hidden Markov Model (pHMM) and k-mer based methods. Nevertheless, existing methods have some limitations; k-mer based methods are not accurate enough to assign protein functions and pHMM is not fast enough to handle large number of protein sequences from numerous genome projects. Therefore, a more accurate and faster protein function prediction method is needed.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>In this paper, we introduce DeepFam, an alignment-free method that can extract functional information directly from sequences without the need of multiple sequence alignments. In extensive experiments using the Clusters of Orthologous Groups (COGs) and G protein-coupled receptor (GPCR) dataset, DeepFam achieved better performance in terms of accuracy and runtime for predicting functions of proteins compared to the state-of-the-art methods, both alignment-free and alignment-based methods. Additionally, we showed that DeepFam has a power of capturing conserved regions to model protein families. In fact, DeepFam was able to detect conserved regions documented in the Prosite database while predicting functions of proteins. Our deep learning method will be useful in characterizing functions of the ever increasing protein sequences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Codes are available at https:\/\/bhi-kimlab.github.io\/DeepFam.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty275","type":"journal-article","created":{"date-parts":[[2018,4,16]],"date-time":"2018-04-16T11:10:42Z","timestamp":1523877042000},"page":"i254-i262","source":"Crossref","is-referenced-by-count":117,"title":["DeepFam: deep learning based alignment-free method for protein family modeling and prediction"],"prefix":"10.1093","volume":"34","author":[{"given":"Seokjun","family":"Seo","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Seoul National University, Seoul, Korea"}]},{"given":"Minsik","family":"Oh","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Seoul National University, Seoul, Korea"}]},{"given":"Youngjune","family":"Park","sequence":"additional","affiliation":[{"name":"Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea"}]},{"given":"Sun","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Seoul National University, Seoul, Korea"},{"name":"Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea"},{"name":"Bioinformatics Institute, Seoul National University, Seoul, Korea"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604234427400_bty275-B1","first-page":"265","article-title":"TensorFlow: A System for Large-scale Machine Learning","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI-16)","author":"Abadi","year":"2016"},{"key":"2023051604234427400_bty275-B2","doi-asserted-by":"crossref","first-page":"831","DOI":"10.1038\/nbt.3300","article-title":"Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning","volume":"33","author":"Alipanahi","year":"2015","journal-title":"Nat. Biotechnol"},{"key":"2023051604234427400_bty275-B3","doi-asserted-by":"crossref","first-page":"D115","DOI":"10.1093\/nar\/gkh131","article-title":"UniProt: the universal protein knowledgebase","volume":"32","author":"Apweiler","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B4","doi-asserted-by":"crossref","first-page":"e0141287.","DOI":"10.1371\/journal.pone.0141287","article-title":"Continuous distributed representation of biological sequences for deep proteomics and genomics","volume":"10","author":"Asgari","year":"2015","journal-title":"PloS One"},{"key":"2023051604234427400_bty275-B5","doi-asserted-by":"crossref","first-page":"W202","DOI":"10.1093\/nar\/gkp335","article-title":"MEME SUITE: tools for motif discovery and searching","volume":"37","author":"Bailey","year":"2009","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B6","doi-asserted-by":"crossref","first-page":"D138","DOI":"10.1093\/nar\/gkh121","article-title":"The Pfam protein families database","volume":"32","author":"Bateman","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B7","author":"Boureau","year":"2010"},{"key":"2023051604234427400_bty275-B8","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1101\/gr.849004","article-title":"WebLogo: a sequence logo generator","volume":"14","author":"Crooks","year":"2004","journal-title":"Genome Res"},{"key":"2023051604234427400_bty275-B9","doi-asserted-by":"crossref","first-page":"3113","DOI":"10.1093\/bioinformatics\/btm506","article-title":"On the hierarchical classification of G protein-coupled receptors","volume":"23","author":"Davies","year":"2007","journal-title":"Bioinformatics"},{"key":"2023051604234427400_bty275-B10","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"Durbin","year":"1998"},{"key":"2023051604234427400_bty275-B11","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023051604234427400_bty275-B12","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B13","doi-asserted-by":"crossref","first-page":"D261","DOI":"10.1093\/nar\/gku1223","article-title":"Expanded microbial genome coverage and improved protein family annotation in the COG database","volume":"43","author":"Galperin","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B14","author":"Glorot","year":"2010"},{"key":"2023051604234427400_bty275-B15","first-page":"1461","article-title":"Training highly multiclass classifiers","volume":"15","author":"Gupta","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604234427400_bty275-B16","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1093\/nar\/gkg128","article-title":"The TIGRFAMs database of protein families","volume":"31","author":"Haft","year":"2003","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B17","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051604234427400_bty275-B18","first-page":"48","volume-title":"Proceedings of the 32nd International Conference on Machine Learning (ICML-15)","author":"Ioffe","year":"2015"},{"key":"2023051604234427400_bty275-B19","first-page":"655","volume-title":"Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL-2014)","author":"Kalchbrenner","year":"2014"},{"key":"2023051604234427400_bty275-B20","author":"Kingma","year":"2015"},{"key":"2023051604234427400_bty275-B21","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.1093\/bioinformatics\/btl376","article-title":"Remote homology detection based on oligomer distances","volume":"22","author":"Lingner","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051604234427400_bty275-B22","doi-asserted-by":"crossref","first-page":"539.","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega","volume":"7","author":"Sievers","year":"2014","journal-title":"Mol. Syst. Biol"},{"key":"2023051604234427400_bty275-B23","doi-asserted-by":"crossref","first-page":"D344","DOI":"10.1093\/nar\/gks1067","article-title":"New and continuing developments at PROSITE","volume":"41","author":"Sigrist","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B24","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023051604234427400_bty275-B25","doi-asserted-by":"crossref","first-page":"2326","DOI":"10.1093\/bioinformatics\/btl398","article-title":"ARCS: an aggregated related column scoring scheme for aligned sequences","volume":"22","author":"Song","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051604234427400_bty275-B26","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"Srivastava","year":"2014","journal-title":"J. Mach. Learn. Res"},{"key":"2023051604234427400_bty275-B27","doi-asserted-by":"crossref","first-page":"602","DOI":"10.1016\/j.ygeno.2007.01.008","article-title":"Simple alignment-free methods for protein classification: a case study from G-protein-coupled receptors","volume":"89","author":"Strope","year":"2007","journal-title":"Genomics"},{"key":"2023051604234427400_bty275-B28","author":"Szegedy","year":"2015"},{"key":"2023051604234427400_bty275-B29","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023051604234427400_bty275-B30","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1093\/bioinformatics\/btg005","article-title":"Alignment-free sequence comparison\u2014a review","volume":"19","author":"Vinga","year":"2003","journal-title":"Bioinformatics"},{"key":"2023051604234427400_bty275-B31","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1089\/cmb.1994.1.337","article-title":"On the complexity of multiple sequence alignment","volume":"1","author":"Wang","year":"1994","journal-title":"J. Comput. Biol"},{"key":"2023051604234427400_bty275-B32","doi-asserted-by":"crossref","first-page":"186.","DOI":"10.1186\/s13059-017-1319-7","article-title":"Alignment-free sequence comparison: benefits, applications, and tools","volume":"18","author":"Zielezinski","year":"2017","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i254\/50315954\/bioinformatics_34_13_i254.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i254\/50315954\/bioinformatics_34_13_i254.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T04:25:50Z","timestamp":1684211150000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i254\/5045722"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":32,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty275","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}