{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,7]],"date-time":"2026-06-07T00:38:52Z","timestamp":1780792732786,"version":"3.54.1"},"reference-count":19,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>For many types of analyses, data about gene structure and locations of non-coding regions of genes are required. Although a vast amount of genomic sequence data is available, precise annotation of genes is lacking behind. Finding the corresponding gene of a given protein sequence by means of conventional tools is error prone, and cannot be completed without manual inspection, which is time consuming and requires considerable experience.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Scipio is a tool based on the alignment program BLAT to determine the precise gene structure given a protein sequence and a genome sequence. It identifies intron-exon borders and splice sites and is able to cope with sequencing errors and genes spanning several contigs in genomes that have not yet been assembled to supercontigs or chromosomes. Instead of producing a set of hits with varying confidence, Scipio gives the user a coherent summary of locations on the genome that code for the query protein. The output contains information about discrepancies that may result from sequencing errors. Scipio has also successfully been used to find homologous genes in closely related species. Scipio was tested with 979 protein queries against 16 arthropod genomes (intra species search). 
For cross-species annotation, Scipio was used to annotate 40 genes from <jats:italic>Homo sapiens<\/jats:italic> in the primates <jats:italic>Pongo pygmaeus abelii<\/jats:italic> and <jats:italic>Callithrix jacchus<\/jats:italic>. The prediction quality of Scipio was tested in a comparative study against that of BLAT and the well established program Exonerate.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Scipio is able to precisely map a protein query onto a genome. Even in cases when there are many sequencing errors, or when incomplete genome assemblies lead to hits that stretch across multiple target sequences, it very often provides the user with the correct determination of intron-exon borders and splice sites, showing an improved prediction accuracy compared to BLAT and Exonerate. Apart from being able to find genes in the genome that encode the query protein, Scipio can also be used to annotate genes in closely related species.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-278","type":"journal-article","created":{"date-parts":[[2008,7,1]],"date-time":"2008-07-01T06:15:09Z","timestamp":1214892909000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":128,"title":["Scipio: Using protein sequences to determine the precise exon\/intron structures of genes and their orthologs in closely related 
species"],"prefix":"10.1186","volume":"9","author":[{"given":"Oliver","family":"Keller","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Florian","family":"Odronitz","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mario","family":"Stanke","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Martin","family":"Kollmar","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Stephan","family":"Waack","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2008,6,13]]},"reference":[{"issue":"2\u20133","key":"2263_CR1","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1023\/A:1024145407467","volume":"118","author":"L Fedorova","year":"2003","unstructured":"Fedorova L, Fedorov A: Introns in gene evolution. Genetica 2003, 118(2\u20133):123\u201331. 10.1023\/A:1024145407467","journal-title":"Genetica"},{"issue":"6","key":"2263_CR2","doi-asserted-by":"publisher","first-page":"424","DOI":"10.1038\/nrg2026","volume":"8","author":"A Sandelin","year":"2007","unstructured":"Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, Hume DA: Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nature reviews 2007, 8(6):424\u201336.","journal-title":"Nature reviews"},{"key":"2263_CR3","doi-asserted-by":"publisher","first-page":"188","DOI":"10.1186\/1471-2148-7-188","volume":"7","author":"M Irimia","year":"2007","unstructured":"Irimia M, Rukov J, Penny D, Roy S: Functional and evolutionary analysis of alternatively spliced genes is consistent with an early eukaryotic origin of alternative splicing. BMC Evol Biol 2007, 7: 188. 
10.1186\/1471-2148-7-188","journal-title":"BMC Evol Biol"},{"key":"2263_CR4","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1186\/1471-2164-8-103","volume":"8","author":"F Odronitz","year":"2007","unstructured":"Odronitz F, Hellkamp M, Kollmar M: diArk-a resource for eukaryotic genome research. BMC Genomics 2007, 8: 103. 10.1186\/1471-2164-8-103","journal-title":"BMC Genomics"},{"issue":"9","key":"2263_CR5","doi-asserted-by":"publisher","first-page":"R196","DOI":"10.1186\/gb-2007-8-9-r196","volume":"8","author":"F Odronitz","year":"2007","unstructured":"Odronitz F, Kollmar M: Drawing the tree of eukaryotic life based on the analysis of 2269 manually annotated myosins from 328 species. Genome Biol 2007, 8(9):R196. 10.1186\/gb-2007-8-9-r196","journal-title":"Genome Biol"},{"issue":"16","key":"2263_CR6","doi-asserted-by":"publisher","first-page":"2848","DOI":"10.1093\/bioinformatics\/bth287","volume":"20","author":"F Lazzarato","year":"2004","unstructured":"Lazzarato F, Franceschinis G, Botta M, Cordero F, Calogero RA: RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets. Bioinformatics (Oxford, England) 2004, 20(16):2848\u201350. 10.1093\/bioinformatics\/bth287","journal-title":"Bioinformatics (Oxford, England)"},{"key":"2263_CR7","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1471-2105-8-94","volume":"8","author":"ST Doh","year":"2007","unstructured":"Doh ST, Zhang Y, Temple MH, Cai L: Non-coding sequence retrieval system for comparative genomic analysis of gene regulatory elements. BMC bioinformatics 2007, 8: 94. 
10.1186\/1471-2105-8-94","journal-title":"BMC bioinformatics"},{"key":"2263_CR8","first-page":"D707","volume-title":"Nucleic Acids Res","author":"P Flicek","year":"2008","unstructured":"Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Eyre T, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Holland R, Howe KL, Howe K, Johnson N, Jenkinson A, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Megy K, Meidl P, Overduin B, Parker A, Pritchard B, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Slater G, Smedley D, Spudich G, Trevanion S, Vilella AJ, Vogel J, White S, Wood M, Birney E, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Herrero J, Hubbard TJP, Kasprzyk A, Proctor G, Smith J, Ureta-Vidal A, Searle S: Ensembl 2008. Nucleic Acids Res 2008, (36 Database):D707\u201314."},{"key":"2263_CR9","first-page":"D773","volume-title":"Nucleic Acids Res","author":"D Karolchik","year":"2008","unstructured":"Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2008, (36 Database):D773\u20139."},{"issue":"1","key":"2263_CR10","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/1745-6150-3-20","volume":"3","author":"Y Kapustin","year":"2008","unstructured":"Kapustin Y, Souvorov A, Tatusova T, Lipman D: Splign: algorithms for computing spliced alignments with identification of paralogs. Biol Direct 2008, 3(1):20. 10.1186\/1745-6150-3-20","journal-title":"Biol Direct"},{"issue":"5","key":"2263_CR11","doi-asserted-by":"publisher","first-page":"988","DOI":"10.1101\/gr.1865504","volume":"14","author":"E Birney","year":"2004","unstructured":"Birney E, Clamp M, Durbin R: GeneWise and Genomewise. 
Genome Res 2004, 14(5):988\u2013995. 10.1101\/gr.1865504","journal-title":"Genome Res"},{"key":"2263_CR12","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1186\/1471-2105-6-31","volume":"6","author":"GSC Slater","year":"2005","unstructured":"Slater GSC, Birney E: Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 2005, 6: 31. 10.1186\/1471-2105-6-31","journal-title":"BMC Bioinformatics"},{"issue":"4","key":"2263_CR13","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1101\/gr.229202","volume":"12","author":"WJ Kent","year":"2002","unstructured":"Kent WJ: BLAT-the BLAST-like alignment tool. Genome research 2002, 12(4):656\u201364. 10.1101\/gr.229202. Article published online before March 2002","journal-title":"Genome research"},{"issue":"3","key":"2263_CR14","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","volume":"215","author":"SF Altschul","year":"1990","unstructured":"Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. 
J Mol Biol 1990, 215(3):403\u2013410.","journal-title":"J Mol Biol"},{"issue":"10B","key":"2263_CR15","doi-asserted-by":"publisher","first-page":"2121","DOI":"10.1101\/gr.2596504","volume":"14","author":"DS Gerhard","year":"2004","unstructured":"Gerhard DS, Wagner L, Feingold EA, Shenmen CM, Grouse LH, Schuler G, Klein SL, Old S, Rasooly R, Good P, Guyer M, Peck AM, Derge JG, Lipman D, Collins FS, Jang W, Sherry S, Feolo M, Misquitta L, Lee E, Rotmistrovsky K, Greenhut SF, Schaefer CF, Buetow K, Bonner TI, Haussler D, Kent J, Kiekhaus M, Furey T, Brent M, Prange C, Schreiber K, Shapiro N, Bhat NK, Hopkins RF, Hsie F, Driscoll T, Soares MB, Casavant TL, Scheetz TE, Brown-stein MJ, Usdin TB, Toshiyuki S, Carninci P, Piao Y, Dudekula DB, Ko MSH, Kawakami K, Suzuki Y, Sugano S, Gruber CE, Smith MR, Simmons B, Moore T, Waterman R, Johnson SL, Ruan Y, Wei CL, Mathavan S, Gunaratne PH, Wu J, Garcia AM, Hulyk SW, Fuh E, Yuan Y, Sneed A, Kowis C, Hodgson A, Muzny DM, McPherson J, Gibbs RA, Fahey J, Helton E, Ketteman M, Madan A, Rodrigues S, Sanchez A, Whiting M, Madari A, Young AC, Wetherby KD, Granite SJ, Kwong PN, Brinkley CP, Pearson RL, Bouffard GG, Blakesly RW, Green ED, Dickson MC, Rodriguez AC, Grimwood J, Schmutz J, Myers RM, Butterfield YSN, Griffith M, Griffith OL, Krzywinski MI, Liao N, Morin R, Palmquist D, Petrescu AS, Skalska U, Smailus DE, Stott JM, Schnerch A, Schein JE, Jones SJM, Holt RA, Baross A, Marra MA, Clifton S, Makowski KA, Bosak S, Malek J: The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome Res 2004, 14(10B):2121\u20132127. 
10.1101\/gr.2596504","journal-title":"Genome Res"},{"key":"2263_CR16","doi-asserted-by":"publisher","first-page":"40","DOI":"10.1038\/ng1285","volume":"36","author":"T Ota","year":"2004","unstructured":"Ota T, Suzuki Y, Nishikawa T, Otsuki T, Sugiyama T, Irie R, Wakamatsu A, Hayashi K, Sato H, Nagai K, Kimura K, Makita H, Sekine M, Obayashi M, Nishi T, Shibahara T, Tanaka T, Ishii S, Yamamoto Ji, Saito K, Kawai Y, Isono Y, Nakamura Y, Nagahari K, Murakami K, Yasuda T, Iwayanagi T, Wagatsuma M, Shiratori A, Sudo H, Hosoiri T, Kaku Y, Kodaira H, Kondo H, Sugawara M, Takahashi M, Kanda K, Yokoi T, Furuya T, Kikkawa E, Omura Y, Abe K, Kamihara K, Katsuta N, Sato K, Tanikawa M, Yamazaki M, Ninomiya K, Ishibashi T, Yamashita H, Murakawa K, Fujimori K, Tanai H, Kimata M, Watanabe M, Hiraoka S, Chiba Y, Ishida S, Ono Y, Takiguchi S, Watanabe S, Yosida M, Hotuta T, Kusano J, Kanehori K, Takahashi-Fujii A, Hara H, Tanase To, Nomura Y, Togiya S, Komai F, Hara R, Takeuchi K, Arita M, Imose N, Musashino K, Yuuki H, Oshima A, Sasaki N, Aotsuka S, Yoshikawa Y, Matsunawa H, Ichihara T, Shiohata N, Sano S, Moriya S, Momiyama H, Satoh N, Takami S, Terashima Y, Suzuki O, Nakagawa S, Senoh A, Mizoguchi H, Goto Y, Shimizu F, Wakebe H, Hishigaki H, Watanabe T, Sugiyama A, Takemoto M, Kawakami B, Yamazaki M, Watanabe K, Kumagai A, Itakura S, Fukuzumi Y, Fujimori Y, Komiyama M, Tashiro H, Tanigami A, Fujiwara T, Ono T, Yamada K, Fujii Y, Ozaki K, Hirao M, Ohmori Y, Kawabata A, Hikiji T, Kobatake N, Inagaki H, Ikema Y, Okamoto S, Okitani R, Kawakami T, Noguchi S, Itoh T, Shigeta K, Senba T, Matsumura K, Nakajima Y, Mizuno T, Morinaga M, Sasaki M, Togashi T, Oyama M, Hata H, Watanabe M, Komatsu T, Mizushima-Sugano J, Satoh T, Shirai Y, Takahashi Y, Nakagawa K, Okumura K, Nagase T, Nomura N, Kikuchi H, Masuho Y, Yamashita R, Nakai K, Yada T, Nakamura Y, Ohara O, Isogai T, Sugano S: Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet 2004, 36: 40\u201345. 
10.1038\/ng1285","journal-title":"Nat Genet"},{"key":"2263_CR17","unstructured":"Genome sequencing centre at the Washington University School of Medicine[http:\/\/genome.wustl.edu]"},{"key":"2263_CR18","doi-asserted-by":"publisher","first-page":"300","DOI":"10.1186\/1471-2164-7-300","volume":"7","author":"F Odronitz","year":"2006","unstructured":"Odronitz F, Kollmar M: Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase). BMC Genomics 2006, 7: 300. [http:\/\/www.cymobase.org] 10.1186\/1471-2164-7-300","journal-title":"BMC Genomics"},{"key":"2263_CR19","unstructured":"CyMoBase \u2013 a database for cytoskeletal and motor proteins[http:\/\/www.cymobase.org]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-278.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T03:24:19Z","timestamp":1630466659000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-278"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,6,13]]},"references-count":19,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2263"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-278","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,6,13]]},"assertion":[{"value":"8 February 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 June 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 June 2008","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"278"}}