{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,1]],"date-time":"2026-03-01T00:47:23Z","timestamp":1772326043546,"version":"3.50.1"},"reference-count":34,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Nonnegative matrix factorization (NMF) is a feature extraction method that has the property of intuitive part-based representation of the original features. This unique ability makes NMF a potentially promising method for biological sequence analysis. Here, we apply NMF to fold recognition and remote homolog detection problems. Recent studies have shown that combining support vector machines (SVM) with profile-profile alignments improves performance of fold recognition and remote homolog detection remarkably. However, it is not clear which parts of sequences are essential for the performance improvement.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The performance of fold recognition and remote homolog detection using NMF features is compared to that of the unmodified profile-profile alignment (PPA) features by estimating Receiver Operating Characteristic (ROC) scores. The overall performance is noticeably improved. For fold recognition at the fold level, SVM with NMF features recognize 30% of homolog proteins at &gt; 0.99 ROC scores, while original PPA feature, HHsearch, and PSI-BLAST recognize almost none. For detecting remote homologs that are related at the superfamily level, NMF features also achieve higher performance than the original PPA features. At &gt; 0.90 ROC<jats:sub>50<\/jats:sub> scores, 25% of proteins with NMF features correctly detects remotely related proteins, whereas using original PPA features only 1% of proteins detect remote homologs. In addition, we investigate the effect of number of positive training examples and the number of basis vectors on performance improvement. We also analyze the ability of NMF to extract essential features by comparing NMF basis vectors with functionally important sites and structurally conserved regions of proteins. The results show that NMF basis vectors have significant overlap with functional sites from PROSITE and with structurally conserved regions from the multiple structural alignments generated by MUSTANG. The correlation between NMF basis vectors and biologically essential parts of proteins supports our conjecture that NMF basis vectors can explicitly represent important sites of proteins.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The present work demonstrates that applying NMF to profile-profile alignments can reveal essential features of proteins and that these features significantly improve the performance of fold recognition and remote homolog detection.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-298","type":"journal-article","created":{"date-parts":[[2008,7,1]],"date-time":"2008-07-01T06:15:09Z","timestamp":1214892909000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Application of nonnegative matrix factorization to improve profile-profile alignment features for fold recognition and remote homolog detection"],"prefix":"10.1186","volume":"9","author":[{"given":"Inkyung","family":"Jung","sequence":"first","affiliation":[]},{"given":"Jaehyung","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Soo-Young","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Dongsup","family":"Kim","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,7,1]]},"reference":[{"issue":"6755","key":"2283_CR1","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1038\/44565","volume":"401","author":"DD Lee","year":"1999","unstructured":"Lee DD, Seung HS: Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401(6755):788\u2013791. 10.1038\/44565","journal-title":"Nature"},{"key":"2283_CR2","doi-asserted-by":"publisher","first-page":"366","DOI":"10.1186\/1471-2105-7-366","volume":"7","author":"A Pascual-Montano","year":"2006","unstructured":"Pascual-Montano A, Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Marqui RD: bioNMF: a versatile tool for non-negative matrix factorization in biology. BMC Bioinformatics 2006, 7: 366. 10.1186\/1471-2105-7-366","journal-title":"BMC Bioinformatics"},{"key":"2283_CR3","doi-asserted-by":"publisher","first-page":"121","DOI":"10.1002\/(SICI)1097-0134(1999)37:3+<121::AID-PROT16>3.0.CO;2-Q","volume":"Suppl 3","author":"K Karplus","year":"1999","unstructured":"Karplus K, Barrett C, Cline M, Diekhans M, Grate L, Hughey R: Predicting protein structure using only sequence information. Proteins 1999, Suppl 3: 121\u2013125. Publisher Full Text 10.1002\/(SICI)1097-0134(1999)37:3+%3C121::AID-PROT16%3E3.0.CO;2-Q","journal-title":"Proteins"},{"key":"2283_CR4","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"issue":"2","key":"2283_CR5","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1110\/ps.9.2.232","volume":"9","author":"L Rychlewski","year":"2000","unstructured":"Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 2000, 9(2):232\u2013241.","journal-title":"Protein Sci"},{"issue":"3","key":"2283_CR6","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1093\/bioinformatics\/17.3.272","volume":"17","author":"A Heger","year":"2001","unstructured":"Heger A, Holm L: Picasso: generating a covering set of protein family profiles. Bioinformatics 2001, 17(3):272\u2013279. 10.1093\/bioinformatics\/17.3.272","journal-title":"Bioinformatics"},{"issue":"1","key":"2283_CR7","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1016\/S0022-2836(02)01371-2","volume":"326","author":"R Sadreyev","year":"2003","unstructured":"Sadreyev R, Grishin N: COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance. J Mol Biol 2003, 326(1):317\u2013336. 10.1016\/S0022-2836(02)01371-2","journal-title":"J Mol Biol"},{"issue":"4","key":"2283_CR8","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1006\/jmbi.1999.2583","volume":"287","author":"DT Jones","year":"1999","unstructured":"Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 1999, 287(4):797\u2013815. 10.1006\/jmbi.1999.2583","journal-title":"J Mol Biol"},{"issue":"2","key":"2283_CR9","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1006\/jmbi.2000.3741","volume":"299","author":"LA Kelley","year":"2000","unstructured":"Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 2000, 299(2):499\u2013520. 10.1006\/jmbi.2000.3741","journal-title":"J Mol Biol"},{"issue":"1","key":"2283_CR10","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1006\/jmbi.2001.4762","volume":"310","author":"J Shi","year":"2001","unstructured":"Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001, 310(1):243\u2013257. 10.1006\/jmbi.2001.4762","journal-title":"J Mol Biol"},{"issue":"9","key":"2283_CR11","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1093\/protein\/gzg081","volume":"16","author":"D Kim","year":"2003","unstructured":"Kim D, Xu D, Guo JT, Ellrott K, Xu Y: PROSPECT II: protein structure prediction program for genome-scale applications. Protein Eng 2003, 16(9):641\u2013650. 10.1093\/protein\/gzg081","journal-title":"Protein Eng"},{"issue":"3","key":"2283_CR12","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1002\/1097-0134(20000815)40:3<343::AID-PROT10>3.0.CO;2-S","volume":"40","author":"Y Xu","year":"2000","unstructured":"Xu Y, Xu D: Protein threading using PROSPECT: design and evaluation. Proteins 2000, 40(3):343\u2013354. 10.1002\/1097-0134(20000815)40:3<343::AID-PROT10>3.0.CO;2-S","journal-title":"Proteins"},{"key":"2283_CR13","doi-asserted-by":"publisher","first-page":"152","DOI":"10.1002\/prot.20732","volume":"61 Suppl 7","author":"H Zhou","year":"2005","unstructured":"Zhou H, Zhou Y: SPARKS 2 and SP3 servers in CASP6. Proteins 2005, 61 Suppl 7: 152\u2013156. 10.1002\/prot.20732","journal-title":"Proteins"},{"key":"2283_CR14","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1186\/1471-2105-6-253","volume":"6","author":"T Ohlson","year":"2005","unstructured":"Ohlson T, Elofsson A: ProfNet, a method to derive profile-profile alignment scoring functions that improves the alignments of distantly related proteins. BMC Bioinformatics 2005, 6: 253. 10.1186\/1471-2105-6-253","journal-title":"BMC Bioinformatics"},{"issue":"8","key":"2283_CR15","doi-asserted-by":"publisher","first-page":"1301","DOI":"10.1093\/bioinformatics\/bth090","volume":"20","author":"RC Edgar","year":"2004","unstructured":"Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):1301\u20131308. 10.1093\/bioinformatics\/bth090","journal-title":"Bioinformatics"},{"issue":"7","key":"2283_CR16","doi-asserted-by":"publisher","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","volume":"21","author":"J Soding","year":"2005","unstructured":"Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics 2005, 21(7):951\u2013960. 10.1093\/bioinformatics\/bti125","journal-title":"Bioinformatics"},{"issue":"3","key":"2283_CR17","doi-asserted-by":"publisher","first-page":"518","DOI":"10.1002\/prot.20221","volume":"57","author":"Y Hou","year":"2004","unstructured":"Hou Y, Hsu W, Lee ML, Bystroff C: Remote homolog detection using local sequence-structure correlations. Proteins 2004, 57(3):518\u2013530. 10.1002\/prot.20221","journal-title":"Proteins"},{"issue":"17","key":"2283_CR18","doi-asserted-by":"publisher","first-page":"2294","DOI":"10.1093\/bioinformatics\/btg317","volume":"19","author":"Y Hou","year":"2003","unstructured":"Hou Y, Hsu W, Lee ML, Bystroff C: Efficient remote homology detection using local structure. Bioinformatics 2003, 19(17):2294\u20132301. 10.1093\/bioinformatics\/btg317","journal-title":"Bioinformatics"},{"issue":"6","key":"2283_CR19","doi-asserted-by":"publisher","first-page":"857","DOI":"10.1089\/106652703322756113","volume":"10","author":"L Liao","year":"2003","unstructured":"Liao L, Noble WS: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J Comput Biol 2003, 10(6):857\u2013868. 10.1089\/106652703322756113","journal-title":"J Comput Biol"},{"issue":"1-2","key":"2283_CR20","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1089\/10665270050081405","volume":"7","author":"T Jaakkola","year":"2000","unstructured":"Jaakkola T, Diekhans M, Haussler D: A discriminative framework for detecting remote protein homologies. J Comput Biol 2000, 7(1\u20132):95\u2013114. 10.1089\/10665270050081405","journal-title":"J Comput Biol"},{"issue":"11","key":"2283_CR21","doi-asserted-by":"publisher","first-page":"2667","DOI":"10.1093\/bioinformatics\/bti384","volume":"21","author":"S Han","year":"2005","unstructured":"Han S, Lee BC, Yu ST, Jeong CS, Lee S, Kim D: Fold recognition by combining profile-profile alignment and support vector machine. Bioinformatics 2005, 21(11):2667\u20132673. 10.1093\/bioinformatics\/bti384","journal-title":"Bioinformatics"},{"issue":"11","key":"2283_CR22","doi-asserted-by":"publisher","first-page":"1682","DOI":"10.1093\/bioinformatics\/bth141","volume":"20","author":"H Saigo","year":"2004","unstructured":"Saigo H, Vert JP, Ueda N, Akutsu T: Protein homology detection using string alignment kernels. Bioinformatics 2004, 20(11):1682\u20131689. 10.1093\/bioinformatics\/bth141","journal-title":"Bioinformatics"},{"issue":"23","key":"2283_CR23","doi-asserted-by":"publisher","first-page":"4239","DOI":"10.1093\/bioinformatics\/bti687","volume":"21","author":"H Rangwala","year":"2005","unstructured":"Rangwala H, Karypis G: Profile-based direct kernels for remote homology detection and fold recognition. Bioinformatics 2005, 21(23):4239\u20134247. 10.1093\/bioinformatics\/bti687","journal-title":"Bioinformatics"},{"issue":"15","key":"2283_CR24","doi-asserted-by":"publisher","first-page":"3241","DOI":"10.1093\/bioinformatics\/bti497","volume":"21","author":"J Weston","year":"2005","unstructured":"Weston J, Leslie C, Ie E, Zhou D, Elisseeff A, Noble WS: Semi-supervised protein classification using cluster kernels. Bioinformatics 2005, 21(15):3241\u20133247. 10.1093\/bioinformatics\/bti497","journal-title":"Bioinformatics"},{"key":"2283_CR25","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1186\/1471-2105-7-78","volume":"7","author":"P Carmona-Saez","year":"2006","unstructured":"Carmona-Saez P, Pascual-Marqui RD, Tirado F, Carazo JM, Pascual-Montano A: Biclustering of gene expression data by Non-smooth Non-negative Matrix Factorization. BMC Bioinformatics 2006, 7: 78. 10.1186\/1471-2105-7-78","journal-title":"BMC Bioinformatics"},{"key":"2283_CR26","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1186\/1471-2105-7-41","volume":"7","author":"M Chagoyen","year":"2006","unstructured":"Chagoyen M, Carmona-Saez P, Shatkay H, Carazo JM, Pascual-Montano A: Discovering semantic features in the literature: a foundation for building functional associations. BMC Bioinformatics 2006, 7: 41. 10.1186\/1471-2105-7-41","journal-title":"BMC Bioinformatics"},{"key":"2283_CR27","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1186\/1471-2105-7-175","volume":"7","author":"G Wang","year":"2006","unstructured":"Wang G, Kossenkov AV, Ochs MF: LS-NMF: a modified non-negative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics 2006, 7: 175. 10.1186\/1471-2105-7-175","journal-title":"BMC Bioinformatics"},{"issue":"14","key":"2283_CR28","doi-asserted-by":"publisher","first-page":"1728","DOI":"10.1093\/bioinformatics\/btm247","volume":"23","author":"S Hochreiter","year":"2007","unstructured":"Hochreiter S, Heusel M, Obermayer K: Fast model-based protein homology detection without alignment. Bioinformatics 2007, 23(14):1728\u20131736. 10.1093\/bioinformatics\/btm247","journal-title":"Bioinformatics"},{"issue":"Database issue","key":"2283_CR29","doi-asserted-by":"publisher","first-page":"D227","DOI":"10.1093\/nar\/gkj063","volume":"34","author":"N Hulo","year":"2006","unstructured":"Hulo N, Bairoch A, Bulliard V, Cerutti L, De Castro E, Langendijk-Genevaux PS, Pagni M, Sigrist CJ: The PROSITE database. Nucleic Acids Res 2006, 34(Database issue):D227\u201330. 10.1093\/nar\/gkj063","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"2283_CR30","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1002\/prot.20921","volume":"64","author":"AS Konagurthu","year":"2006","unstructured":"Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM: MUSTANG: a multiple structural alignment algorithm. Proteins 2006, 64(3):559\u2013574. 10.1002\/prot.20921","journal-title":"Proteins"},{"issue":"3","key":"2283_CR31","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1016\/j.sbi.2006.05.006","volume":"16","author":"RL Dunbrack Jr.","year":"2006","unstructured":"Dunbrack RL Jr.: Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006, 16(3):374\u2013384. 10.1016\/j.sbi.2006.05.006","journal-title":"Curr Opin Struct Biol"},{"issue":"Database issue","key":"2283_CR32","doi-asserted-by":"publisher","first-page":"D189","DOI":"10.1093\/nar\/gkh034","volume":"32","author":"JM Chandonia","year":"2004","unstructured":"Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE: The ASTRAL Compendium in 2004. Nucleic Acids Res 2004, 32(Database issue):D189\u201392. 10.1093\/nar\/gkh034","journal-title":"Nucleic Acids Res"},{"key":"2283_CR33","volume-title":"Projected Gradient Methods for Non-negative Matrix Factorization","author":"CJ Lin","year":"2005","unstructured":"Lin CJ: Projected Gradient Methods for Non-negative Matrix Factorization. Volume 352. Department of Computer Science National Taiwan University; 2005."},{"key":"2283_CR34","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1016\/S0097-8485(96)80004-0","volume":"20","author":"M Gribskov","year":"1996","unstructured":"Gribskov M, Robinson NL: The use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Computers Chem 1996, 20: 25\u201334. 10.1016\/S0097-8485(96)80004-0","journal-title":"Computers Chem"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-298.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:00:04Z","timestamp":1630494004000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-298"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,7,1]]},"references-count":34,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2283"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-298","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,7,1]]},"assertion":[{"value":"8 January 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 July 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 July 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"298"}}