{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T18:42:37Z","timestamp":1774550557995,"version":"3.50.1"},"reference-count":40,"publisher":"Oxford University Press (OUP)","issue":"Supplement_2","license":[{"start":{"date-parts":[[2022,9,18]],"date-time":"2022-09-18T00:00:00Z","timestamp":1663459200000},"content-version":"vor","delay-in-days":17,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"National Human Genome Research Institute of the National Institutes of Health","award":["ECCB2022"],"award-info":[{"award-number":["ECCB2022"]}]},{"name":"National Human Genome Research Institute of the National Institutes of Health","award":["U24HG007234"],"award-info":[{"award-number":["U24HG007234"]}]},{"name":"National Human Genome Research Institute of the National Institutes of Health","award":["PGC2018-097019-B-I00"],"award-info":[{"award-number":["PGC2018-097019-B-I00"]}]},{"DOI":"10.13039\/100014440","name":"Ministry of Science, Innovation and Universities","doi-asserted-by":"publisher","award":["IPT17\/0019"],"award-info":[{"award-number":["IPT17\/0019"]}],"id":[{"id":"10.13039\/100014440","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Carlos III Institute of Health-Fondo de Investigaci\u00f3n Sanitaria","award":["HR17-00247"],"award-info":[{"award-number":["HR17-00247"]}]},{"name":"\u2018la Caixa\u2019 Foundation"},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,16]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biological important reference splice variants for large-scale analyses.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https:\/\/appris.bioinfo.cnio.es), GENCODE genes (https:\/\/www.gencodegenes.org\/) and the Ensembl website (https:\/\/www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https:\/\/www.ncbi.nlm.nih.gov\/refseq\/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac473","type":"journal-article","created":{"date-parts":[[2022,9,20]],"date-time":"2022-09-20T09:22:48Z","timestamp":1663665768000},"page":"ii89-ii94","source":"Crossref","is-referenced-by-count":21,"title":["APPRIS principal isoforms and MANE Select transcripts define reference splice variants"],"prefix":"10.1093","volume":"38","author":[{"given":"Fernando","family":"Pozo","sequence":"first","affiliation":[{"name":"Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO) , 28029 Madrid, Spain"}]},{"given":"Jos\u00e9 Manuel","family":"Rodriguez","sequence":"additional","affiliation":[{"name":"Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC) , 28029 Madrid, Spain"}]},{"given":"Laura","family":"Mart\u00ednez G\u00f3mez","sequence":"additional","affiliation":[{"name":"Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO) , 28029 Madrid, Spain"}]},{"given":"Jes\u00fas","family":"V\u00e1zquez","sequence":"additional","affiliation":[{"name":"Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC) , 28029 Madrid, Spain"},{"name":"CIBER de Investigaciones Cardiovasculares (CIBERCV) , 28029 Madrid, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9046-6370","authenticated-orcid":false,"given":"Michael L","family":"Tress","sequence":"additional","affiliation":[{"name":"Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO) , 28029 Madrid, Spain"}]}],"member":"286","published-online":{"date-parts":[[2022,9,18]]},"reference":[{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"1000 Genomes Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pcbi.1004325","article-title":"Alternatively spliced homologous exons have ancient origins and are highly expressed at the protein level","volume":"11","author":"Abascal","year":"2015","journal-title":"PLoS Comp. Biol"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"7070","DOI":"10.1093\/nar\/gky587","article-title":"Loose ends: almost one in five human genes still have unresolved coding status","volume":"46","author":"Abascal","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1016\/j.cels.2017.05.009","article-title":"An optimized shotgun strategy for the rapid generation of comprehensive human proteomes","volume":"4","author":"Bekker-Jensen","year":"2017","journal-title":"Cell Syst"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1186\/s12864-018-5013-2","article-title":"Systematic evaluation of isoform function in literature reports of alternative splicing","volume":"19","author":"Bhuiyan","year":"2018","journal-title":"BMC Genomics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1007\/978-1-4939-7000-1_26","article-title":"Protein data bank (PDB): the single global macromolecular structure archive","volume":"1607","author":"Burley","year":"2017","journal-title":"Methods Mol. Biol"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1787","DOI":"10.1038\/s41593-017-0011-2","article-title":"A multiregional proteomic survey of the postnatal human brain","volume":"20","author":"Carlyle","year":"2017","journal-title":"Nat. Neurosci"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"D988","DOI":"10.1093\/nar\/gkab1049","article-title":"Ensembl 2022","volume":"50","author":"Cunningham","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"D1100","DOI":"10.1093\/nar\/gkw936","article-title":"The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition","volume":"45","author":"Deutsch","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1002\/pmic.201200439","article-title":"Comet: an open-source MS\/MS sequence database search tool","volume":"13","author":"Eng","year":"2013","journal-title":"Proteomics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1880","DOI":"10.1021\/pr501286b","article-title":"Most highly expressed protein-coding genes have a single dominant isoform","volume":"14","author":"Ezkurdia","year":"2015","journal-title":"J. Proteome Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"579","DOI":"10.1586\/14789450.2015.1103186","article-title":"The potential clinical impact of the release of two drafts of the human proteome","volume":"12","author":"Ezkurdia","year":"2015","journal-title":"Exp. Rev. Proteomics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"D916","DOI":"10.1093\/nar\/gkaa1087","article-title":"Gencode 2021","volume":"49","author":"Frankish","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"e108","DOI":"10.1002\/cpbi.108","article-title":"Protein sequence analysis using the MPI bioinformatics toolkit","volume":"72","author":"Gabler","year":"2020","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"R70","DOI":"10.1186\/gb-2013-14-7-r70","article-title":"Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene","volume":"14","author":"Gonz\u00e0lez-Porta","year":"2013","journal-title":"Genome Biol"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.gene.2012.07.083","article-title":"Function of alternative splicing","volume":"514","author":"Kelemen","year":"2013","journal-title":"Gene"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"575","DOI":"10.1038\/nature13302","article-title":"A draft map of the human proteome","volume":"509","author":"Kim","year":"2014","journal-title":"Nature"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"3484","DOI":"10.1021\/acs.jproteome.5b00494","article-title":"Functional networks of highest-connected splice isoforms: from the chromosome 17 human proteome project","volume":"14","author":"Li","year":"2015","journal-title":"J Proteome Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1378","DOI":"10.1039\/C5MB00132C","article-title":"The distribution pattern of genetic variation in the transcript isoforms of the alternatively spliced protein-coding genes in the human genome","volume":"11","author":"Liu","year":"2015","journal-title":"Mol. Biosyst"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"W235","DOI":"10.1093\/nar\/gkr437","article-title":"Firestar\u2014advances in the prediction of functionally important residues","volume":"39","author":"Lopez","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"8232","DOI":"10.1093\/nar\/gkab623","article-title":"The clinical importance of tandem exon duplication-derived substitutions","volume":"49","author":"Martinez Gomez","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"D412","DOI":"10.1093\/nar\/gkaa913","article-title":"Pfam: the protein families database in 2021","volume":"49","author":"Mistry","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1038\/s41586-022-04558-8","article-title":"A joint NCBI and EMBL-EBI transcript set for clinical genomics and research","volume":"604","author":"Morales","year":"2022","journal-title":"Nature"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1093\/bioinformatics\/btab542","article-title":"Ranked choice voting for representative transcripts with TRaCE","volume":"38","author":"Olson","year":"2021","journal-title":"Bioinformatics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"lqab044","DOI":"10.1093\/nargab\/lqab044","article-title":"Assessing the functional relevance of splice isoforms","volume":"3","author":"Pozo","year":"2021","journal-title":"NAR Genom. Bioinformatics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"e1707","DOI":"10.1002\/wrna.1707","article-title":"Uncovering the impacts of alternative splicing on the proteome with current omics techniques","author":"Reixachs-Sol\u00e9","year":"2022","journal-title":"Wiley Interdiscip. Rev"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"e1008287","DOI":"10.1371\/journal.pcbi.1008287","article-title":"An analysis of tissue-specific alternative splicing at the protein level","volume":"16","author":"Rodriguez","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"D54","DOI":"10.1093\/nar\/gkab1058","article-title":"APPRIS: selecting functionally important isoforms","volume":"50","author":"Rodriguez","year":"2022","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"D10","DOI":"10.1093\/nar\/gkaa892","article-title":"Database resources of the national center for biotechnology information","volume":"49","author":"Sayers","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1074\/mcp.RA118.001170","article-title":"Identification of TEX101-associated proteins through proteomic measurement of human spermatozoa homozygous for the missense variant rs35033974","volume":"18","author":"Schiza","year":"2019","journal-title":"Mol. Cell. Proteomics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1007\/s13361-016-1460-7","article-title":"Fast and accurate protein false discovery rates on large-scale proteomics data sets with percolator 3.0","volume":"27","author":"The","year":"2016","journal-title":"J. Am. Soc. Mass Spectrom"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1016\/j.tibs.2016.08.008","article-title":"Alternative splicing may not be the key to proteome complexity","volume":"42","author":"Tress","year":"2017","journal-title":"Trends Biochem. Sci"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1260419","DOI":"10.1126\/science.1260419","article-title":"Proteomics. Tissue-based map of the human proteome","volume":"347","author":"Uhl\u00e9n","year":"2015","journal-title":"Science"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"4637","DOI":"10.1021\/bi00233a001","article-title":"The ras protein family: evolutionary tree and role of conserved amino acids","volume":"30","author":"Valencia","year":"1991","journal-title":"Biochemistry"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"e8503","DOI":"10.15252\/msb.20188503","article-title":"A deep proteome and transcriptome abundance atlas of 29 healthy human tissues","volume":"15","author":"Wang","year":"2019","journal-title":"Mol. Syst. Biol"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"422","DOI":"10.1074\/mcp.RA117.000155","article-title":"Detection of proteome diversity resulted from alternative splicing is limited by trypsin cleavage specificity","volume":"17","author":"Wang","year":"2018","journal-title":"Mol. Cell. Proteomics"},{"key":"2023041408001228600_","doi-asserted-by":"crossref","first-page":"1491","DOI":"10.1038\/sj.emboj.7600643","article-title":"Structural basis for recruitment of RILP by small GTPase Rab7","volume":"24","author":"Wu","year":"2005","journal-title":"EMBO J"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii89\/49885990\/btac473.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/Supplement_2\/ii89\/49885990\/btac473.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,26]],"date-time":"2023-11-26T04:31:56Z","timestamp":1700973116000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/Supplement_2\/ii89\/6701991"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,1]]},"references-count":40,"journal-issue":{"issue":"Supplement_2","published-print":{"date-parts":[[2022,9,16]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac473","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,9,1]]}}}