{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,18]],"date-time":"2025-10-18T20:37:28Z","timestamp":1760819848347},"reference-count":38,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method.<\/jats:p>\n            <jats:p>These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-213","type":"journal-article","created":{"date-parts":[[2006,4,20]],"date-time":"2006-04-20T15:47:49Z","timestamp":1145548069000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["An analysis of the Sargasso Sea resource and the consequences for database composition"],"prefix":"10.1186","volume":"7","author":[{"given":"Michael L","family":"Tress","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Domenico","family":"Cozzetto","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anna","family":"Tramontano","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2006,4,19]]},"reference":[{"key":"952_CR1","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1126\/science.1093857","volume":"304","author":"JC Venter","year":"2004","unstructured":"Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304: 66\u201374. 10.1126\/science.1093857","journal-title":"Science"},{"key":"952_CR2","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1093\/bib\/5.1.39","volume":"5","author":"A Bairoch","year":"2004","unstructured":"Bairoch A, Boeckmann B, Ferro S, Gasteiger E: Swiss-Prot: Juggling between evolution and stability. Brief Bioinform 2004, 5: 39\u201355. 10.1093\/bib\/5.1.39","journal-title":"Brief Bioinform"},{"key":"952_CR3","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1126\/science.1107851","volume":"308","author":"S Green Tringe","year":"2005","unstructured":"Green Tringe S, von Mering C, Kobayashi A, Salamov AA, Chen K, Chang HW, Podar M, Short JM, Mathur EJ, Detter JC, Bork P, Hugenholtz P, Rubin EM: Comparative Metagenomics of Microbial Communities. Science 2005, 308: 554\u2013557. 10.1126\/science.1107851","journal-title":"Science"},{"key":"952_CR4","doi-asserted-by":"publisher","first-page":"1208","DOI":"10.1038\/sj.embor.7400538","volume":"6","author":"KU Foerstner","year":"2005","unstructured":"Foerstner KU, Mering C, Hooper SD, Bork P: Environments shape the nucleotide composition of genomes. EMBO Reports 2005, 6: 1208\u20131213. 10.1038\/sj.embor.7400538","journal-title":"EMBO Reports"},{"key":"952_CR5","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1016\/S0378-1119(02)00871-5","volume":"297","author":"F Tekaia","year":"2002","unstructured":"Tekaia F, Yeramian E, Dujon B: Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 2002, 297: 51\u201360. 10.1016\/S0378-1119(02)00871-5","journal-title":"Gene"},{"key":"952_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.febslet.2004.06.030","volume":"570","author":"J Meyer","year":"2004","unstructured":"Meyer J: Miraculous catch of iron-sulfur protein sequences in the Sargasso Sea. FEBS Letters 2004, 570: 1\u20136. 10.1016\/j.febslet.2004.06.030","journal-title":"FEBS Letters"},{"key":"952_CR7","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SR Altschul","year":"1997","unstructured":"Altschul SR, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"952_CR8","doi-asserted-by":"crossref","unstructured":"Tramontano A, Morea V: Assessment of homology based predictions in CASP 5. Proteins 2003, (Suppl 6):352\u2013368. 10.1002\/prot.10543","DOI":"10.1002\/prot.10543"},{"key":"952_CR9","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","volume":"28","author":"HM Berman","year":"2000","unstructured":"Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235\u2013242. 10.1093\/nar\/28.1.235","journal-title":"Nucleic Acids Res"},{"key":"952_CR10","doi-asserted-by":"publisher","first-page":"7290","DOI":"10.1073\/pnas.89.16.7290","volume":"89","author":"P Bork","year":"1992","unstructured":"Bork P, Sander C, Valencia A: An ATPase Domain Common to Prokaryotic Cell Cycle Proteins, Sugar Kinases, Actin, and hsp70 Heat Shock Proteins. Proc Natl Acad Sci 1992, 89: 7290\u20137294. 10.1073\/pnas.89.16.7290","journal-title":"Proc Natl Acad Sci"},{"key":"952_CR11","doi-asserted-by":"publisher","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","volume":"22","author":"JD Thompson","year":"1994","unstructured":"Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994, 22: 4673\u20134680.","journal-title":"Nucleic Acids Res"},{"key":"952_CR12","doi-asserted-by":"publisher","first-page":"1792","DOI":"10.1093\/nar\/gkh340","volume":"32","author":"RC Edgar","year":"2004","unstructured":"Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 2004, 32: 1792\u20139. 10.1093\/nar\/gkh340","journal-title":"Nucleic Acids Res"},{"key":"952_CR13","doi-asserted-by":"publisher","first-page":"554","DOI":"10.1016\/S0076-6879(96)66035-2","volume":"266","author":"JC Wootton","year":"1996","unstructured":"Wootton JC, Federhen S: Analysis of compositionally biased regions in sequence databases. Methods Enzymol 1996, 266: 554\u201371.","journal-title":"Methods Enzymol"},{"key":"952_CR14","doi-asserted-by":"publisher","first-page":"282","DOI":"10.1093\/bioinformatics\/17.3.282","volume":"17","author":"W Li","year":"2001","unstructured":"Li W, Jaroszewski L, Godzik A: Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 2001, 17: 282\u2013283. 10.1093\/bioinformatics\/17.3.282","journal-title":"Bioinformatics"},{"key":"952_CR15","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1016\/S0959-440X(98)80072-9","volume":"8","author":"L Holm","year":"1998","unstructured":"Holm L: Unification of protein families. Curr Op Struct Biol 1998, 8: 372\u2013379. 10.1016\/S0959-440X(98)80072-9","journal-title":"Curr Op Struct Biol"},{"key":"952_CR16","doi-asserted-by":"publisher","first-page":"751","DOI":"10.1126\/science.285.5428.751","volume":"285","author":"EM Marcotte","year":"1999","unstructured":"Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285: 751\u2013753. 10.1126\/science.285.5428.751","journal-title":"Science"},{"key":"952_CR17","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","volume":"41","author":"D Devos","year":"2000","unstructured":"Devos D, Valencia A: Practical limits of function prediction. Proteins 2000, 41: 98\u2013107. 10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","journal-title":"Proteins"},{"key":"952_CR18","doi-asserted-by":"publisher","first-page":"705","DOI":"10.1016\/S0022-2836(03)00622-3","volume":"330","author":"ML Tress","year":"2003","unstructured":"Tress ML, Jones DT, Valencia A: Predicting Reliable Regions in Protein Alignments from Sequence Profiles. J Mol Biol 2003, 330: 705\u2013718. 10.1016\/S0022-2836(03)00622-3","journal-title":"J Mol Biol"},{"key":"952_CR19","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1002\/prot.10029","volume":"46","author":"D Przybylski","year":"2002","unstructured":"Przybylski D, Rost B: Alignments grow, secondary structure prediction improves. Proteins 2002, 46: 197\u2013205. 10.1002\/prot.10029","journal-title":"Proteins"},{"key":"952_CR20","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1016\/S0968-0004(01)02039-4","volume":"27","author":"DT Jones","year":"2002","unstructured":"Jones DT, Swindells M: Getting the Most from PSI-BLAST. Trends in Biochemical Sciences 2002, 27: 161\u2013164. 10.1016\/S0968-0004(01)02039-4","journal-title":"Trends in Biochemical Sciences"},{"key":"952_CR21","doi-asserted-by":"crossref","unstructured":"Chen K, Pachter L: Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities. PLOS Computational Biology 2005., 1(2):","DOI":"10.1371\/journal.pcbi.0010024"},{"key":"952_CR22","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1671\/0272-4634(2003)023[0297:ITICAP]2.0.CO;2","volume":"23","author":"JJ Wiens","year":"2003","unstructured":"Wiens JJ: Incomplete taxa, incomplete characters, and phylogenetic accuracy: Is there a missing data problem? J Vertebr Paleontol 2003, 23: 297\u2013310.","journal-title":"J Vertebr Paleontol"},{"key":"952_CR23","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1111\/j.1462-2920.2004.00652.x","volume":"6","author":"MY Galperin","year":"2004","unstructured":"Galperin MY: Metagenomics: from acid mine to shining sea. Environmental Microbiology 2004, 6: 543\u2013545. 10.1111\/j.1462-2920.2004.00652.x","journal-title":"Environmental Microbiology"},{"key":"952_CR24","doi-asserted-by":"publisher","first-page":"D138","DOI":"10.1093\/nar\/gkh121","volume":"32","author":"A Bateman","year":"2004","unstructured":"Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL, Studholme DJ, Yeats C, Eddy SR: The PFAM Protein Families Database. Nucleic Acids Res 2004, 32: D138-D141. 10.1093\/nar\/gkh121","journal-title":"Nucleic Acids Res"},{"key":"952_CR25","doi-asserted-by":"publisher","first-page":"4607","DOI":"10.1128\/JB.187.13.4607-4614.2005","volume":"187","author":"MG Kalyuzhnaya","year":"2005","unstructured":"Kalyuzhnaya MG, Korotkova N, Crowther G, Marx CJ, Lidstrom ME, Chistoserdova M: Analysis of Gene Islands Involved in Methanopterin-Linked C1 Transfer Reactions Reveals New Functions and Provides Evolutionary Insights. Journal of Bacteriology 2005, 187: 4607\u20134614. 10.1128\/JB.187.13.4607-4614.2005","journal-title":"Journal of Bacteriology"},{"key":"952_CR26","doi-asserted-by":"crossref","unstructured":"Sabehi G, Loy A, Jung KH, Partha R, Spudich JL, Isaacson T, Hirschberg J, Wagner M, B\u00e9j\u00e0 O: New Insights into Metabolic Properties of Marine Bacteria Encoding Proteorhodopsins. PLOS Medicine 2005., 3(8):","DOI":"10.1371\/journal.pbio.0030273"},{"key":"952_CR27","volume-title":"BMC Genomics","author":"M Feder","year":"2005","unstructured":"Feder M, Bujnicki JM: BMC Genomics. 2005., 6:"},{"key":"952_CR28","first-page":"2314","volume":"33","author":"MO Dayhoff","year":"1974","unstructured":"Dayhoff MO: Computer analysis of protein sequences. Feder Proc 1974, 33: 2314\u20132316.","journal-title":"Feder Proc"},{"key":"952_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/BF01732178","volume":"7","author":"E Zuckerkandl","year":"1975","unstructured":"Zuckerkandl E: The appearance of new structures and functions in proteins during evolution. J Mol Evol 1975, 7: 1\u201357. 10.1007\/BF01732178","journal-title":"J Mol Evol"},{"key":"952_CR30","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1038\/357543a0","volume":"357","author":"C Chothia","year":"1992","unstructured":"Chothia C: One thousand families for the molecular biologist. Nature 1992, 357: 543\u2013544. 10.1038\/357543a0","journal-title":"Nature"},{"key":"952_CR31","doi-asserted-by":"publisher","first-page":"1029","DOI":"10.1038\/4136","volume":"5","author":"A Sali","year":"1998","unstructured":"Sali A: 100,000 protein structures for the biologist. Nat Struct Biol 1998, 5: 1029\u20131032. 10.1038\/4136","journal-title":"Nat Struct Biol"},{"key":"952_CR32","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1093\/bioinformatics\/18.7.922","volume":"18","author":"J Liu","year":"2002","unstructured":"Liu J, Rost B: Target space for structural genomics revisited. Bioinformatics 2002, 18: 922\u2013933. 10.1093\/bioinformatics\/18.7.922","journal-title":"Bioinformatics"},{"key":"952_CR33","doi-asserted-by":"publisher","first-page":"D23","DOI":"10.1093\/nar\/gkh045","volume":"32","author":"DA Benson","year":"2004","unstructured":"Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank: update. Nucleic Acids Res 2004, 32: D23\u20136. 10.1093\/nar\/gkh045","journal-title":"Nucleic Acids Res"},{"key":"952_CR34","doi-asserted-by":"crossref","unstructured":"Kinch LN, Qi Y, Hubbard TJP, Grishin NV: CASP5 target classification. Proteins 2003, (Suppl 6):340\u2013351. 10.1002\/prot.10555","DOI":"10.1002\/prot.10555"},{"key":"952_CR35","doi-asserted-by":"crossref","unstructured":"Tress ML, Tai, Chin-Hsien, Wang G, Ezkurdia I, L\u00f3pez G, Valencia A, Lee BK, Dunbrack RL: Domain Definition and Target Classification for CASP6. Proteins 2005, (Suppl 7):8\u201318. 10.1002\/prot.20717","DOI":"10.1002\/prot.20717"},{"key":"952_CR36","doi-asserted-by":"crossref","unstructured":"Tramontano A, Leplae R, Morea V: Analysis and Assessment of Comparative Modeling Predictions in CASP4. Proteins 2001, (Suppl 5):22\u201338. 10.1002\/prot.10015","DOI":"10.1002\/prot.10015"},{"key":"952_CR37","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1002\/prot.20284","volume":"58","author":"D Cozzetto","year":"2005","unstructured":"Cozzetto D, Tramontano A: The relationship between multiple sequence alignments and the quality of protein comparative models. Proteins 2005, 58: 151\u2013157. 10.1002\/prot.20284","journal-title":"Proteins"},{"key":"952_CR38","doi-asserted-by":"publisher","first-page":"3370","DOI":"10.1093\/nar\/gkg571","volume":"31","author":"A Zemla","year":"2003","unstructured":"Zemla A: LGA \u2013 a Method for Finding 3D Similarities in Protein Structures. Nucleic Acids Res 2003, 31: 3370\u20133374. 10.1093\/nar\/gkg571","journal-title":"Nucleic Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-213.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:13:56Z","timestamp":1630466036000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-213"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,4,19]]},"references-count":38,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["952"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-213","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,4,19]]},"assertion":[{"value":"22 December 2005","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 April 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 April 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"213"}}