{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T13:50:30Z","timestamp":1767016230522},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Despite significant improvements in computational annotation of genomes, sequences of abnormal, incomplete or incorrectly predicted genes and proteins remain abundant in public databases. Since the majority of incomplete, abnormal or mispredicted entries are not annotated as such, these errors seriously affect the reliability of these databases. Here we describe the MisPred approach that may provide an efficient means for the quality control of databases. 
The current version of the MisPred approach uses five distinct routines for identifying abnormal, incomplete or mispredicted entries based on the principle that a sequence is likely to be incorrect if some of its features conflict with our current knowledge about protein-coding genes and proteins: (i) conflict between the predicted subcellular localization of proteins and the absence of the corresponding sequence signals; (ii) presence of extracellular and cytoplasmic domains and the absence of transmembrane segments; (iii) co-occurrence of extracellular and nuclear domains; (iv) violation of domain integrity; (v) chimeras encoded by two or more genes located on different chromosomes.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Analyses of predicted EnsEMBL protein sequences of nine deuterostome (<jats:italic>Homo sapiens, Mus musculus, Rattus norvegicus, Monodelphis domestica, Gallus gallus, Xenopus tropicalis, Fugu rubripes, Danio rerio<\/jats:italic> and <jats:italic>Ciona intestinalis<\/jats:italic>) and two protostome species (<jats:italic>Caenorhabditis elegans<\/jats:italic> and <jats:italic>Drosophila melanogaster<\/jats:italic>) have revealed that the absence of expected signal peptides and violation of domain integrity account for the majority of mispredictions. Analyses of sequences predicted by NCBI's GNOMON annotation pipeline show that the rates of mispredictions are comparable to those of EnsEMBL. 
Interestingly, even the manually curated UniProtKB\/Swiss-Prot dataset is contaminated with mispredicted or abnormal proteins, although to a much lesser extent than UniProtKB\/TrEMBL or the EnsEMBL or GNOMON-predicted entries.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>MisPred works efficiently in identifying errors in predictions generated by the most reliable gene prediction tools such as the EnsEMBL and NCBI's GNOMON pipelines and also guides the correction of errors. We suggest that application of the MisPred approach will significantly improve the quality of gene predictions and the associated databases.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-353","type":"journal-article","created":{"date-parts":[[2008,8,27]],"date-time":"2008-08-27T18:27:45Z","timestamp":1219861665000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["Identification and correction of abnormal, incomplete and mispredicted proteins in public 
databases"],"prefix":"10.1186","volume":"9","author":[{"given":"Alinda","family":"Nagy","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"H\u00e9di","family":"Hegyi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Krisztina","family":"Farkas","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hedvig","family":"Tordai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Evelin","family":"Kozma","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"L\u00e1szl\u00f3","family":"B\u00e1nyai","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"L\u00e1szl\u00f3","family":"Patthy","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2008,8,27]]},"reference":[{"key":"2338_CR1","doi-asserted-by":"publisher","first-page":"860","DOI":"10.1038\/35057062","volume":"409","author":"ES Lander","year":"2001","unstructured":"Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, 
Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Bl\u00f6cker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, 
International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 2001, 409: 860\u2013921. 10.1038\/35057062","journal-title":"Nature"},{"key":"2338_CR2","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.1126\/science.1058040","volume":"291","author":"JC Venter","year":"2001","unstructured":"Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, 
Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guig\u00f3 R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X: The sequence of the human genome. Science 2001, 291: 1304\u20131351. 10.1126\/science.1058040","journal-title":"Science"},{"key":"2338_CR3","doi-asserted-by":"publisher","first-page":"931","DOI":"10.1038\/nature03001","volume":"431","author":"International Human Genome Sequencing Consortium","year":"2004","unstructured":"International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature 2004, 431: 931\u2013945. 10.1038\/nature03001","journal-title":"Nature"},{"issue":"5828","key":"2338_CR4","doi-asserted-by":"publisher","first-page":"1113","DOI":"10.1126\/science.316.5828.1113a","volume":"316","author":"E Pennisi","year":"2007","unstructured":"Pennisi E: Working the (gene count) numbers: finally, a firm answer? Science 2007, 316(5828):1113. 
10.1126\/science.316.5828.1113a","journal-title":"Science"},{"issue":"Suppl 1","key":"2338_CR5","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/gb-2006-7-s1-s2","volume":"7","author":"R Guig\u00f3","year":"2006","unstructured":"Guig\u00f3 R, Flicek P, Abril JF, Reymond A, Lagarde J, Denoeud F, Antonarakis S, Ashburner M, Bajic VB, Birney E, Castelo R, Eyras E, Ucla C, Gingeras TR, Harrow J, Hubbard T, Lewis SE, Reese MG: EGASP: the human ENCODE genome annotation assessment project. Genome Biol 2006, 7(Suppl 1):S2. 10.1186\/gb-2006-7-s1-s2","journal-title":"Genome Biol"},{"key":"2338_CR6","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1038\/nrg2220","volume":"9","author":"MR Brent","year":"2008","unstructured":"Brent MR: Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat Rev Genet 2008, 9: 62\u201373. 10.1038\/nrg2220","journal-title":"Nat Rev Genet"},{"key":"2338_CR7","doi-asserted-by":"publisher","first-page":"3097","DOI":"10.1093\/hmg\/11.24.3097","volume":"11","author":"T Wang","year":"2002","unstructured":"Wang T, Waters CT, Rothman AM, Jakins TJ, Romisch K, Trump D: Intracellular retention of mutant retinoschisin is the pathological mechanism underlying X-linked retinoschisis. Hum Mol Genet 2002, 11: 3097\u2013105. 10.1093\/hmg\/11.24.3097","journal-title":"Hum Mol Genet"},{"key":"2338_CR8","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1016\/j.bbrc.2003.09.072","volume":"310","author":"A Ohnishi","year":"2003","unstructured":"Ohnishi A, Emi Y: Rapid proteasomal degradation of translocation-deficient UDP-glucuronosyltransferase 1A1 proteins in patients with Crigler-Najjar type II. Biochem Biophys Res Commun 2003, 310: 735\u201341. 
10.1016\/j.bbrc.2003.09.072","journal-title":"Biochem Biophys Res Commun"},{"key":"2338_CR9","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1002\/humu.9276","volume":"24","author":"J Saarela","year":"2004","unstructured":"Saarela J, von Schantz C, Peltonen L, Jalanko A: A novel aspartylglucosaminuria mutation affects translocation of aspartylglucosaminidase. Hum Mutat 2004, 24: 350\u20131. 10.1002\/humu.9276","journal-title":"Hum Mutat"},{"key":"2338_CR10","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1016\/j.abb.2004.12.012","volume":"435","author":"A Jayakumar","year":"2005","unstructured":"Jayakumar A, Kang Y, Henderson Y, Mitsudo K, Liu X, Briggs K, Wang M, Frederick MJ, El-Naggar AK, Bebok Z, Clayman GL: Consequences of C-terminal domains and N-terminal signal peptide deletions on LEKTI secretion, stability, and subcellular distribution. Arch Biochem Biophys 2005, 435: 89\u2013102. 10.1016\/j.abb.2004.12.012","journal-title":"Arch Biochem Biophys"},{"key":"2338_CR11","first-page":"1033","volume":"12","author":"L Hansen","year":"2006","unstructured":"Hansen L, Yao W, Eiberg H, Funding M, Riise R, Kjaer KW, Hejtmancik JF, Rosenberg T: The congenital \"ant-egg\" cataract phenotype is caused by a missense mutation in connexin46. Mol Vis 2006, 12: 1033\u20139.","journal-title":"Mol Vis"},{"key":"2338_CR12","doi-asserted-by":"publisher","first-page":"314","DOI":"10.1002\/ana.20963","volume":"60","author":"O Mukherjee","year":"2006","unstructured":"Mukherjee O, Pastor P, Cairns NJ, Chakraverty S, Kauwe JS, Shears S, Behrens MI, Budde J, Hinrichs AL, Norton J, Levitch D, Taylor-Reinwald L, Gitcho M, Tu PH, Tenenholz Grinberg L, Liscic RM, Armendariz J, Morris JC, Goate AM: HDDD2 is a familial frontotemporal lobar degeneration with ubiquitin-positive, tau-negative inclusions caused by a missense mutation in the signal peptide of progranulin. Ann Neurol 2006, 60: 314\u201322. 
10.1002\/ana.20963","journal-title":"Ann Neurol"},{"key":"2338_CR13","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1038\/sj.jid.5700551","volume":"127","author":"B Favre","year":"2007","unstructured":"Favre B, Plantard L, Aeschbach L, Brakch N, Christen-Zaech S, de Viragh PA, Sergeant A, Huber M, Hohl D: SLURP1 is a late marker of epidermal differentiation and is absent in Mal de Meleda. J Invest Dermatol 2007, 127: 301\u20138. 10.1038\/sj.jid.5700551","journal-title":"J Invest Dermatol"},{"key":"2338_CR14","doi-asserted-by":"crossref","first-page":"24109","DOI":"10.1016\/S0021-9258(18)54400-8","volume":"266","author":"RM Hudziak","year":"1991","unstructured":"Hudziak RM, Ullrich A: Cell transformation potential of a HER2 transmembrane domain deletion mutant retained in the endoplasmic reticulum. J Biol Chem 1991, 266: 24109\u201315.","journal-title":"J Biol Chem"},{"key":"2338_CR15","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1073\/pnas.89.3.922","volume":"89","author":"C Brenner","year":"1992","unstructured":"Brenner C, Fuller RS: Structural and enzymatic characterization of a purified prohormone-processing enzyme: secreted, soluble Kex2 protease. Proc Natl Acad Sci USA 1992, 89: 922\u20136. 10.1073\/pnas.89.3.922","journal-title":"Proc Natl Acad Sci USA"},{"key":"2338_CR16","doi-asserted-by":"publisher","first-page":"895","DOI":"10.1038\/nature02263","volume":"426","author":"AL Goldberg","year":"2003","unstructured":"Goldberg AL: Protein degradation and protection against misfolded or damaged proteins. Nature 2003, 426: 895\u20139. 10.1038\/nature02263","journal-title":"Nature"},{"key":"2338_CR17","doi-asserted-by":"publisher","first-page":"1168","DOI":"10.1101\/gr.96802","volume":"12","author":"R Mott","year":"2002","unstructured":"Mott R, Schultz J, Bork P, Ponting CP: Predicting protein cellular localization using a domain projection method. Genome Res 2002, 12: 1168\u201374. 
10.1101\/gr.96802","journal-title":"Genome Res"},{"key":"2338_CR18","doi-asserted-by":"publisher","first-page":"5064","DOI":"10.1111\/j.1742-4658.2005.04917.x","volume":"272","author":"H Tordai","year":"2005","unstructured":"Tordai H, Nagy A, Farkas K, Banyai L, Patthy L: Modules, multidomain proteins and organismic complexity. FEBS J 2005, 272: 5064\u20135078. 10.1111\/j.1742-4658.2005.04917.x","journal-title":"FEBS J"},{"key":"2338_CR19","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1093\/bioinformatics\/16.7.613","volume":"16","author":"SJ Wheelan","year":"2000","unstructured":"Wheelan SJ, Marchler-Bauer A, Bryant SH: Domain size distributions can predict domain boundaries. Bioinformatics 2000, 16: 613\u20138. 10.1093\/bioinformatics\/16.7.613","journal-title":"Bioinformatics"},{"key":"2338_CR20","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1186\/1471-2148-7-19","volume":"7","author":"Y Wolf","year":"2007","unstructured":"Wolf Y, Madej T, Babenko V, Shoemaker B, Panchenko AR: Long-term trends in evolution of indels in protein sequences. BMC Evol Biol 2007, 7: 19. 10.1186\/1471-2148-7-19","journal-title":"BMC Evol Biol"},{"key":"2338_CR21","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1016\/j.cell.2006.12.042","volume":"128","author":"AL Watters","year":"2007","unstructured":"Watters AL, Deka P, Corrent C, Callender D, Varani G, Sosnick T, Baker D: The highly cooperative folding of small naturally occurring proteins is likely the result of natural selection. Cell 2007, 128: 613\u201324. 10.1016\/j.cell.2006.12.042","journal-title":"Cell"},{"key":"2338_CR22","doi-asserted-by":"publisher","first-page":"349","DOI":"10.1093\/protein\/gzh037","volume":"17","author":"JD Bendtsen","year":"2004","unstructured":"Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S: Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng Des Sel 2004, 17: 349\u201356. 
10.1093\/protein\/gzh037","journal-title":"Protein Eng Des Sel"},{"key":"2338_CR23","doi-asserted-by":"publisher","first-page":"109","DOI":"10.1016\/j.febslet.2004.08.045","volume":"575","author":"H Tordai","year":"2004","unstructured":"Tordai H, Patthy L: Insertion of spliceosomal introns in proto-splice sites: the case of secretory signal peptides. FEBS Lett 2004, 575: 109\u201311. 10.1016\/j.febslet.2004.08.045","journal-title":"FEBS Lett"},{"key":"2338_CR24","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1016\/j.febslet.2004.03.088","volume":"565","author":"L B\u00e1nyai","year":"2004","unstructured":"B\u00e1nyai L, Patthy L: Evidence that human genes of modular proteins have retained significantly more ancestral introns than their fly or worm orthologues. FEBS Lett 2004, 565: 127\u201332. 10.1016\/j.febslet.2004.03.088","journal-title":"FEBS Lett"},{"key":"2338_CR25","doi-asserted-by":"publisher","first-page":"2012","DOI":"10.1126\/science.282.5396.2012","volume":"282","author":"C. elegans Sequencing Consortium","year":"1998","unstructured":"C. elegans Sequencing Consortium: Genome sequence of the nematode C. elegans : a platform for investigating biology. Science 1998, 282: 2012\u20132018. 
10.1126\/science.282.5396.2012","journal-title":"Science"},{"key":"2338_CR26","doi-asserted-by":"publisher","first-page":"2185","DOI":"10.1126\/science.287.5461.2185","volume":"287","author":"MD Adams","year":"2000","unstructured":"Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, George RA, Lewis SE, Richards S, Ashburner M, Henderson SN, Sutton GG, Wortman JR, Yandell MD, Zhang Q, Chen LX, Brandon RC, Rogers YH, Blazej RG, Champe M, Pfeiffer BD, Wan KH, Doyle C, Baxter EG, Helt G, Nelson CR, Gabor GL, Abril JF, Agbayani A, An HJ, Andrews-Pfannkoch C, Baldwin D, Ballew RM, Basu A, Baxendale J, Bayraktaroglu L, Beasley EM, Beeson KY, Benos PV, Berman BP, Bhandari D, Bolshakov S, Borkova D, Botchan MR, Bouck J, Brokstein P, Brottier P, Burtis KC, Busam DA, Butler H, Cadieu E, Center A, Chandra I, Cherry JM, Cawley S, Dahlke C, Davenport LB, Davies P, de Pablos B, Delcher A, Deng Z, Mays AD, Dew I, Dietz SM, Dodson K, Doup LE, Downes M, Dugan-Rocha S, Dunkov BC, Dunn P, Durbin KJ, Evangelista CC, Ferraz C, Ferriera S, Fleischmann W, Fosler C, Gabrielian AE, Garg NS, Gelbart WM, Glasser K, Glodek A, Gong F, Gorrell JH, Gu Z, Guan P, Harris M, Harris NL, Harvey D, Heiman TJ, Hernandez JR, Houck J, Hostin D, Houston KA, Howland TJ, Wei MH, Ibegwam C, Jalali M, Kalush F, Karpen GH, Ke Z, Kennison JA, Ketchum KA, Kimmel BE, Kodira CD, Kraft C, Kravitz S, Kulp D, Lai Z, Lasko P, Lei Y, Levitsky AA, Li J, Li Z, Liang Y, Lin X, Liu X, Mattei B, McIntosh TC, McLeod MP, McPherson D, Merkulov G, Milshina NV, Mobarry C, Morris J, Moshrefi A, Mount SM, Moy M, Murphy B, Murphy L, Muzny DM, Nelson DL, Nelson DR, Nelson KA, Nixon K, Nusskern DR, Pacleb JM, Palazzolo M, Pittman GS, Pan S, Pollard J, Puri V, Reese MG, Reinert K, Remington K, Saunders RD, Scheeler F, Shen H, Shue BC, Sid\u00e9n-Kiamos I, Simpson M, Skupski MP, Smith T, Spier E, Spradling AC, Stapleton M, Strong R, Sun E, Svirskas R, Tector C, Turner R, Venter 
E, Wang AH, Wang X, Wang ZY, Wassarman DA, Weinstock GM, Weissenbach J, Williams SM, Woodage T, Worley KC, Wu D, Yang S, Yao QA, Ye J, Yeh RF, Zaveri JS, Zhan M, Zhang G, Zhao Q, Zheng L, Zheng XH, Zhong FN, Zhong W, Zhou X, Zhu S, Zhu X, Smith HO, Gibbs RA, Myers EW, Rubin GM, Venter JC: The genome sequence of Drosophila melanogaster . Science 2000, 287: 2185\u20132195. 10.1126\/science.287.5461.2185","journal-title":"Science"},{"key":"2338_CR27","doi-asserted-by":"publisher","first-page":"D142","DOI":"10.1093\/nar\/gkh088","volume":"32","author":"I Letunic","year":"2004","unstructured":"Letunic I, Copley RR, Schmidt S, Ciccarelli FD, Doerks T, Schultz J, Ponting CP, Bork P: SMART 4.0: towards genomic data integration. Nucl Acids Res 2004, 32: D142\u20134. 10.1093\/nar\/gkh088","journal-title":"Nucl Acids Res"},{"key":"2338_CR28","doi-asserted-by":"publisher","first-page":"1301","DOI":"10.1126\/science.1072104","volume":"297","author":"S Aparicio","year":"2002","unstructured":"Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, Gelpke MD, Roach J, Oh T, Ho IY, Wong M, Detter C, Verhoef F, Predki P, Tay A, Lucas S, Richardson P, Smith SF, Clark MS, Edwards YJ, Doggett N, Zharkikh A, Tavtigian SV, Pruss D, Barnstead M, Evans C, Baden H, Powell J, Glusman G, Rowen L, Hood L, Tan YH, Elgar G, Hawkins T, Venkatesh B, Rokhsar D, Brenner S: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes . Science 2002, 297: 1301\u20131310. 10.1126\/science.1072104","journal-title":"Science"},{"key":"2338_CR29","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1002\/cne.20391","volume":"481","author":"B Chen","year":"2005","unstructured":"Chen B, Bixby JL: Neuronal pentraxin with chromo domain (NPCD) is a novel class of protein expressed in multiple neuronal domains. J Comp Neurol 2005, 481: 391\u2013402. 
10.1002\/cne.20391","journal-title":"J Comp Neurol"},{"key":"2338_CR30","doi-asserted-by":"publisher","first-page":"880","DOI":"10.1523\/JNEUROSCI.4365-04.2005","volume":"25","author":"B Chen","year":"2005","unstructured":"Chen B, Bixby JL: A novel substrate of receptor tyrosine phosphatase PTPRO is required for nerve growth factor-induced process outgrowth. J Neurosci 2005, 25: 880\u2013888. 10.1523\/JNEUROSCI.4365-04.2005","journal-title":"J Neurosci"},{"key":"2338_CR31","doi-asserted-by":"publisher","first-page":"988","DOI":"10.1101\/gr.1865504","volume":"14","author":"E Birney","year":"2004","unstructured":"Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res 2004, 14: 988\u2013995. 10.1101\/gr.1865504","journal-title":"Genome Res"},{"key":"2338_CR32","doi-asserted-by":"publisher","first-page":"636","DOI":"10.1126\/science.1105136","volume":"306","author":"ENCODE project consortium","year":"2004","unstructured":"ENCODE project consortium: The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 2004, 306: 636\u2013640. 10.1126\/science.1105136","journal-title":"Science"},{"key":"2338_CR33","doi-asserted-by":"publisher","first-page":"5495","DOI":"10.1073\/pnas.0700800104","volume":"104","author":"ML Tress","year":"2007","unstructured":"Tress ML, Martelli PL, Frankish A, Reeves GA, Wesselink JJ, Yeats C, Olason PL, Albrecht M, Hegyi H, Giorgetti A, Raimondo D, Lagarde J, Laskowski RA, Lopez G, Sadowski MI, Watson JD, Fariselli P, Rossi I, Nagy A, Kai W, Storling Z, Orsini M, Assenov Y, Blankenburg H, Huthmacher C, Ramirez F, Schlicker A, Denoeud F, Jones P, Kerrien S, Orchard S, Antonarakis SE, Reymond A, Birney E, Brunak S, Casadio R, Guigo R, Harrow J, Hermjakob H, Jones DT, Lengauer T, Orengo CA, Patthy L, Thornton JM, Tramontano A, Valencia A: The implications of alternative splicing in the ENCODE protein complement. Proc Natl Acad Sci USA 2007, 104: 5495\u2013500. 
10.1073\/pnas.0700800104","journal-title":"Proc Natl Acad Sci USA"},{"key":"2338_CR34","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1101\/gr.4137606","volume":"16","author":"P Akiva","year":"2006","unstructured":"Akiva P, Toporik A, Edelheit S, Peretz Y, Diber A, Shemesh R, Novik A, Sorek R: Transcription-mediated gene fusion in the human genome. Genome Res 2006, 16: 30\u20136. 10.1101\/gr.4137606","journal-title":"Genome Res"},{"key":"2338_CR35","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1101\/gr.4145906","volume":"16","author":"G Parra","year":"2006","unstructured":"Parra G, Reymond A, Dabbouseh N, Dermitzakis ET, Castelo R, Thomson TM, Antonarakis SE, Guigo R: Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res 2006, 16: 37\u201344. 10.1101\/gr.4145906","journal-title":"Genome Res"},{"key":"2338_CR36","doi-asserted-by":"publisher","first-page":"e254","DOI":"10.1371\/journal.pone.0000254","volume":"2","author":"P Unneberg","year":"2007","unstructured":"Unneberg P, Claverie JM: Tentative Mapping of Transcription-Induced Interchromosomal Interaction using Chimeric EST and mRNA Data. PLoS ONE 2007, 2: e254. 10.1371\/journal.pone.0000254","journal-title":"PLoS ONE"},{"key":"2338_CR37","doi-asserted-by":"publisher","first-page":"D193","DOI":"10.1093\/nar\/gkl929","volume":"35","author":"The UniProt Consortium","year":"2007","unstructured":"The UniProt Consortium: The Universal Protein Resource (UniProt). Nucl Acids Res 2007, 35: D193-D197. 
10.1093\/nar\/gkl929","journal-title":"Nucl Acids Res"},{"key":"2338_CR38","doi-asserted-by":"publisher","first-page":"D610","DOI":"10.1093\/nar\/gkl996","volume":"35","author":"TJ Hubbard","year":"2007","unstructured":"Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E: Ensembl 2007. Nucl Acids Res 2007, 35: D610-D617. 10.1093\/nar\/gkl996","journal-title":"Nucl Acids Res"},{"key":"2338_CR39","doi-asserted-by":"publisher","first-page":"D5","DOI":"10.1093\/nar\/gkl1031","volume":"35","author":"DL Wheeler","year":"2007","unstructured":"Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucl Acids Res 2007, 35: D5-D12. 10.1093\/nar\/gkl1031","journal-title":"Nucl Acids Res"},{"key":"2338_CR40","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 1997, 25: 3389\u20133402. 
10.1093\/nar\/25.17.3389","journal-title":"Nucl Acids Res"},{"key":"2338_CR41","doi-asserted-by":"publisher","first-page":"D247","DOI":"10.1093\/nar\/gkj149","volume":"34","author":"RD Finn","year":"2006","unstructured":"Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A: Pfam: clans, web tools and services. Nucl Acids Res 2006, 34: D247\u201351. 10.1093\/nar\/gkj149","journal-title":"Nucl Acids Res"},{"key":"2338_CR42","doi-asserted-by":"publisher","first-page":"W375","DOI":"10.1093\/nar\/gkh378","volume":"32","author":"K Hiller","year":"2004","unstructured":"Hiller K, Grote A, Scheer M, Munch R, Jahn D: PrediSi: prediction of signal peptides and their cleavage positions. Nucl Acids Res 2004, 32: W375\u20139. 10.1093\/nar\/gkh378","journal-title":"Nucl Acids Res"},{"key":"2338_CR43","doi-asserted-by":"publisher","first-page":"567","DOI":"10.1006\/jmbi.2000.4315","volume":"305","author":"A Krogh","year":"2001","unstructured":"Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. J Mol Biol 2001, 305: 567\u2013580. 10.1006\/jmbi.2000.4315","journal-title":"J Mol Biol"},{"key":"2338_CR44","doi-asserted-by":"publisher","first-page":"656","DOI":"10.1101\/gr.229202","volume":"12","author":"WJ Kent","year":"2002","unstructured":"Kent WJ: BLAT \u2013 the BLAST-like alignment tool. Genome Res 2002, 12: 656\u2013664.","journal-title":"Genome Res"},{"key":"2338_CR45","doi-asserted-by":"publisher","first-page":"R15","DOI":"10.1186\/gb-2008-9-1-r15","volume":"9","author":"JL Fink","year":"2008","unstructured":"Fink JL, Karunaratne S, Mittal A, Gardiner DM, Hamilton N, Mahony D, Kai C, Suzuki H, Hayashizaki Y, Teasdale RD: Towards defining the nuclear proteome. Genome Biol 2008, 9: R15. 
10.1186\/gb-2008-9-1-r15","journal-title":"Genome Biol"},{"key":"2338_CR46","doi-asserted-by":"publisher","first-page":"783","DOI":"10.1016\/j.jmb.2004.05.028","volume":"340","author":"JD Bendtsen","year":"2004","unstructured":"Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004, 340: 783\u2013795. 10.1016\/j.jmb.2004.05.028","journal-title":"J Mol Biol"},{"key":"2338_CR47","doi-asserted-by":"publisher","first-page":"W429","DOI":"10.1093\/nar\/gkm256","volume":"35","author":"L Kall","year":"2007","unstructured":"Kall L, Krogh A, Sonnhammer EL: Advantages of combined transmembrane topology and signal peptide prediction \u2013 the Phobius web server. Nucl Acids Res 2007, 35: W429\u201332. 10.1093\/nar\/gkm256","journal-title":"Nucl Acids Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-353.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:04:28Z","timestamp":1630494268000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-353"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,27]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2008,12]]}},"alternative-id":["2338"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-353","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,8,27]]},"assertion":[{"value":"23 April 2008","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 August 2008","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 
August 2008","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"353"}}