{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T23:15:23Z","timestamp":1674861323306},"reference-count":17,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms use the optimization problem formulation which is solved using the dynamic programming technique. Unfortunately, extending these algorithms to multiple genome cases is not trivial due to the rapid increase in time and space complexity.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community to handle the situation where family classification information is not available using interchangeable sets. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene pattern can capture important biological information. 
To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>The discovered gene patterns can be used to detect orthologs and genes that collaborate for a common biological function.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-124","type":"journal-article","created":{"date-parts":[[2008,2,27]],"date-time":"2008-02-27T07:13:36Z","timestamp":1204096416000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A gene pattern mining algorithm using interchangeable gene sets for prokaryotes"],"prefix":"10.1186","volume":"9","author":[{"given":"Meng","family":"Hu","sequence":"first","affiliation":[]},{"given":"Kwangmin","family":"Choi","sequence":"additional","affiliation":[]},{"given":"Wei","family":"Su","sequence":"additional","affiliation":[]},{"given":"Sun","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Jiong","family":"Yang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,2,26]]},"reference":[{"key":"2109_CR1","volume-title":"In Silico Biol","author":"R Overbeek","year":"1998","unstructured":"Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: Use of contiguity on the chromosome to predict functional coupling. In Silico Biol 1998."},{"key":"2109_CR2","doi-asserted-by":"crossref","unstructured":"Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. 
Proc Natl Acad Sci USA 96(6):2896\u20132901. 10.1073\/pnas.96.6.2896","DOI":"10.1073\/pnas.96.6.2896"},{"key":"2109_CR3","doi-asserted-by":"publisher","first-page":"1587","DOI":"10.1002\/pro.5560040817","volume":"4","author":"I Jonassen","year":"1995","unstructured":"Jonassen I, Collins JF, Higgins DG: Finding flexible patterns in unaligned protein sequences. Protein Science 1995, 4: 1587\u20131595.","journal-title":"Protein Science"},{"key":"2109_CR4","first-page":"509","volume":"13","author":"I Jonassen","year":"1997","unstructured":"Jonassen I: Efficient discovery of conserved patterns using a pattern graph. CABIOS 1997, 13: 509\u2013522.","journal-title":"CABIOS"},{"key":"2109_CR5","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1093\/bioinformatics\/14.1.55","volume":"14","author":"I Rigoutsos","year":"1998","unstructured":"Rigoutsos I, Floratos A: Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics 1998, 14: 55\u201367. 10.1093\/bioinformatics\/14.1.55","journal-title":"Bioinformatics"},{"key":"2109_CR6","volume-title":"Proc of the Second International Workshop on Algorithms in Bioinformatics, Lecture Notes In Computer Science","author":"A Bergeron","year":"2002","unstructured":"Bergeron A, Corteel S, Raffinot M: The algorithmic of gene teams. Proc of the Second International Workshop on Algorithms in Bioinformatics, Lecture Notes In Computer Science 2002, 2452."},{"key":"2109_CR7","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1145\/974614.974650","volume-title":"Proc of RECOMB","author":"X He","year":"2004","unstructured":"He X, Goldwasser M: Identifying conserved gene patterns in the presence of orthologous groups. 
Proc of RECOMB 2004, 272\u2013280."},{"issue":"5338","key":"2109_CR8","doi-asserted-by":"publisher","first-page":"631","DOI":"10.1126\/science.278.5338.631","volume":"278","author":"RL Tatusov","year":"1997","unstructured":"Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science 1997, 278(5338):631\u20137. 10.1126\/science.278.5338.631","journal-title":"Science"},{"key":"2109_CR9","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1093\/nar\/29.1.22","volume":"29","author":"RL Tatusov","year":"2001","unstructured":"Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV: The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Research 2001, 29: 22\u201328. 10.1093\/nar\/29.1.22","journal-title":"Nucleic Acids Research"},{"issue":"2","key":"2109_CR10","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1142\/S0219720006001850","volume":"4","author":"S Kim","year":"2006","unstructured":"Kim S, Choi J, Saple A, Yang Y: A hybrid gene team model and its application to genome analysis. Journal of Bioinformatics and Computational Biology 2006, 4(2):171\u2013196. 10.1142\/S0219720006001850","journal-title":"Journal of Bioinformatics and Computational Biology"},{"key":"2109_CR11","volume-title":"CSB","author":"S Kim","year":"2005","unstructured":"Kim S, Choi J, Yang Y: Gene teams with relaxed proximity constraint. CSB 2005."},{"key":"2109_CR12","doi-asserted-by":"publisher","first-page":"74","DOI":"10.1093\/bioinformatics\/btg1008","volume":"19","author":"P Calabrese","year":"2003","unstructured":"Calabrese P, Chakravarty S, Vision TJ: Fast identification and statistical evaluation of segmental homologies in comparative maps. Bioinformatics 2003, 19: 74\u201380. 
10.1093\/bioinformatics\/btg1008","journal-title":"Bioinformatics"},{"key":"2109_CR13","doi-asserted-by":"publisher","first-page":"3643","DOI":"10.1093\/bioinformatics\/bth397","volume":"20","author":"B Haas","year":"2004","unstructured":"Haas B, Delcher A, Wortman J, Salzberg S: DAGchainer: a tool for mining segmental genome duplications and synteny. Bioinformatics 2004, 20: 3643\u20133646. 10.1093\/bioinformatics\/bth397","journal-title":"Bioinformatics"},{"key":"2109_CR14","volume-title":"Int'l Conf on Principles of Knowledge Representation and Reasoning","author":"R Rymon","year":"1992","unstructured":"Rymon R: Search Through Systematic Set Enumeration. Int'l Conf on Principles of Knowledge Representation and Reasoning 1992."},{"key":"2109_CR15","unstructured":"Gene Pattern Website [http:\/\/beijing.case.edu\/genepattern\/4new]"},{"key":"2109_CR16","volume-title":"Bioinformatics","author":"XH Zheng","year":"2005","unstructured":"Zheng XH, Fu L, Wang Z, Zhong F, Hoover J, Mural R: Using shared genomic synteny and shared protein functions to enhance the identification of orthologous gene pairs. Bioinformatics 2005."},{"key":"2109_CR17","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 
10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-124.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:26:18Z","timestamp":1630466778000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-124"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,2,26]]},"references-count":17,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2109"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-124","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,2,26]]},"assertion":[{"value":"3 July 2007","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 February 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"124"}}