{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,1]],"date-time":"2026-01-01T14:05:36Z","timestamp":1767276336959,"version":"3.37.3"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2019,10,4]],"date-time":"2019-10-04T00:00:00Z","timestamp":1570147200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"China National Natural Science Foundation","doi-asserted-by":"crossref","award":["61471139"],"award-info":[{"award-number":["61471139"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005046","name":"Natural Science Foundation of Heilongjiang Province","doi-asserted-by":"publisher","award":["F2016006"],"award-info":[{"award-number":["F2016006"]}],"id":[{"id":"10.13039\/501100005046","id-type":"DOI","asserted-by":"publisher"}]},{"name":"HEU Fundamental Research Funds for the Central University","award":["3072019CFG0401","HEUCFP201722"],"award-info":[{"award-number":["3072019CFG0401","HEUCFP201722"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,2,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Generally, bottom-up and top-down are two complementary approaches for proteoforms identification. The inference of proteoforms relies on searching mass spectra against an accurate proteoform sequence database. A customized protein sequence database derived by RNA-Seq data can be used to better identify the proteoform existed in a studied species. 
However, the quality of sequences in customized databases which constructed by different strategies affect the performances of mass spectrometry (MS) identification. Additionally, performances of identifications between bottom-up and top-down using customized databases are also needed to be evaluated<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Three customized databases were constructed with different strategies separately. Two of them were based on translating assembled transcripts with or without genomic annotation, and the third one is a variant-extending protein database. By testing with bottom-up and top-down MS data separately, a variant-extending protein database could identify not only the most number of spectra but also the alleles expressed at the same time in diploid cells. An assembled database could identify the spectrum missed in reference database and amino acid (AA) alterations existed in studied species.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>Experimental results demonstrated that the proteoform sequences in an annotated database are more suitable for identifying AA alterations and peptide sequences missed in reference database. An unannotated database instead of a reference proteome database gets an enough high sensitivity of identifying mass spectra. 
The variant-extending reference database is the most sensitive to identify mass spectra and single AA variants<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz733","type":"journal-article","created":{"date-parts":[[2019,9,26]],"date-time":"2019-09-26T11:28:48Z","timestamp":1569497328000},"page":"1030-1036","source":"Crossref","is-referenced-by-count":6,"title":["Evaluation of bottom-up and top-down mass spectrum identifications with different customized protein sequences databases"],"prefix":"10.1093","volume":"36","author":[{"given":"Ziwei","family":"Li","sequence":"first","affiliation":[{"name":"College of Automation, Harbin Engineering University , Harbin, Heilongjiang 150001, China"}]},{"given":"Bo","family":"He","sequence":"additional","affiliation":[{"name":"College of Automation, Harbin Engineering University , Harbin, Heilongjiang 150001, China"}]},{"given":"Weixing","family":"Feng","sequence":"additional","affiliation":[{"name":"College of Automation, Harbin Engineering University , Harbin, Heilongjiang 150001, China"}]}],"member":"286","published-online":{"date-parts":[[2019,10,4]]},"reference":[{"key":"2023013110104322800_btz733-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023013110104322800_btz733-B2","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.jprot.2015.09.021","article-title":"Genomic variability and protein species - Improving sequence coverage for proteogenomics","volume":"134","author":"Bischoff","year":"2016","journal-title":"J. 
Proteomics"},{"key":"2023013110104322800_btz733-B3","doi-asserted-by":"crossref","first-page":"918","DOI":"10.1038\/nbt.2377","article-title":"A cross-platform toolkit for mass spectrometry and proteomics","volume":"30","author":"Chambers","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023013110104322800_btz733-B4","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1038\/nature08989","article-title":"Genome remodelling in a basal-like breast cancer metastasis and xenograft","volume":"464","author":"Ding","year":"2010","journal-title":"Nature"},{"key":"2023013110104322800_btz733-B5","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"Dobin","year":"2013","journal-title":"Bioinformatics"},{"key":"2023013110104322800_btz733-B6","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1126\/science.1124619","article-title":"Mass spectrometry and protein analysis","volume":"312","author":"Domon","year":"2006","journal-title":"Science"},{"key":"2023013110104322800_btz733-B7","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1021\/acs.jproteome.5b00997","article-title":"Quantitation and identification of thousands of human proteoforms below 30 kDa","volume":"15","author":"Durbin","year":"2016","journal-title":"J. Proteome Res"},{"key":"2023013110104322800_btz733-B8","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. 
Mass Spectrom"},{"key":"2023013110104322800_btz733-B9","doi-asserted-by":"crossref","first-page":"1207","DOI":"10.1038\/nmeth.2227","article-title":"De novo derivation of proteomes from transcriptomes for transcript and protein identification","volume":"9","author":"Evans","year":"2012","journal-title":"Nat. Methods"},{"key":"2023013110104322800_btz733-B10","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1038\/nbt.1883","article-title":"Full-length transcriptome assembly from RNA-Seq data without a reference genome","volume":"29","author":"Grabherr","year":"2011","journal-title":"Nat. Biotechnol"},{"key":"2023013110104322800_btz733-B11","first-page":"656","article-title":"BLAT\u2013the BLAST-like alignment tool","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2023013110104322800_btz733-B12","first-page":"221","article-title":"Database searching in mass spectrometry based proteomics","volume-title":"Curr. Bioinform.","author":"Kertesz-Farkas","year":"2012"},{"key":"2023013110104322800_btz733-B13","doi-asserted-by":"crossref","first-page":"5277","DOI":"10.1038\/ncomms6277","article-title":"MS-GF+ makes progress towards a universal database search tool for proteomics","volume":"5","author":"Kim","year":"2014","journal-title":"Nat. Commun"},{"key":"2023013110104322800_btz733-B14","doi-asserted-by":"crossref","first-page":"3495","DOI":"10.1093\/bioinformatics\/btw398","article-title":"TopPIC: a software tool for top-down mass spectrometry-based proteoform identification and characterization","volume":"32","author":"Kou","year":"2016","journal-title":"Bioinformatics"},{"key":"2023013110104322800_btz733-B15","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. 
Methods"},{"key":"2023013110104322800_btz733-B16","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1016\/j.celrep.2013.08.022","article-title":"Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts","volume":"4","author":"Li","year":"2013","journal-title":"Cell Rep"},{"key":"2023013110104322800_btz733-B17","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1186\/s12859-018-2462-1","article-title":"Evaluation of top-down mass spectral identification with homologous protein sequences","volume":"19","author":"Li","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2023013110104322800_btz733-B18","doi-asserted-by":"crossref","first-page":"2772","DOI":"10.1074\/mcp.M110.002766","article-title":"Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach","volume":"9","author":"Liu","year":"2010","journal-title":"Mol. Cell. Proteomics"},{"key":"2023013110104322800_btz733-B19","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023013110104322800_btz733-B20","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1074\/mcp.M114.047480","article-title":"Integrated bottom-up and top-down proteomics of patient-derived breast tumor xenografts","volume":"15","author":"Ntai","year":"2016","journal-title":"Mol. Cell. 
Proteomics"},{"key":"2023013110104322800_btz733-B21","doi-asserted-by":"crossref","first-page":"D733","DOI":"10.1093\/nar\/gkv1189","article-title":"Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation","volume":"44","author":"O'Leary","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023013110104322800_btz733-B22","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1038\/nmeth.4388","article-title":"Informed-proteomics: open-source software package for top-down proteomics","volume":"14","author":"Park","year":"2017","journal-title":"Nat. Methods"},{"key":"2023013110104322800_btz733-B23","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2","article-title":"Probability-based protein identification by searching sequence databases using mass spectrometry data","volume":"20","author":"Perkins","year":"1999","journal-title":"Electrophoresis"},{"key":"2023013110104322800_btz733-B24","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: a flexible suite of utilities for comparing genomic features","volume":"26","author":"Quinlan","year":"2010","journal-title":"Bioinformatics"},{"key":"2023013110104322800_btz733-B25","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1038\/nmeth725","article-title":"Large-scale database searching using tandem mass spectra: looking up the answer in the back of the book","volume":"1","author":"Sadygov","year":"2004","journal-title":"Nat. 
Methods"},{"key":"2023013110104322800_btz733-B26","doi-asserted-by":"crossref","first-page":"D158","DOI":"10.1093\/nar\/gkw1099","article-title":"UniProt: the universal protein knowledgebase","volume":"45","author":"The UniProt","year":"2017","journal-title":"Nucleic Acids Res"},{"key":"2023013110104322800_btz733-B27","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1038\/nature10575","article-title":"Mapping intact protein isoforms in discovery mode using top-down proteomics","volume":"480","author":"Tran","year":"2011","journal-title":"Nature"},{"key":"2023013110104322800_btz733-B28","doi-asserted-by":"crossref","first-page":"e164","DOI":"10.1093\/nar\/gkq603","article-title":"ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data","volume":"38","author":"Wang","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023013110104322800_btz733-B29","doi-asserted-by":"crossref","first-page":"1009","DOI":"10.1021\/pr200766z","article-title":"Protein identification using customized protein sequence databases derived from RNA-Seq data","volume":"11","author":"Wang","year":"2012","journal-title":"J. Proteome Res"},{"key":"2023013110104322800_btz733-B30","doi-asserted-by":"crossref","first-page":"244","DOI":"10.1186\/s12859-016-1133-3","article-title":"PGA: an R\/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq","volume":"17","author":"Wen","year":"2016","journal-title":"BMC Bioinformatics"},{"key":"2023013110104322800_btz733-B31","first-page":"242","article-title":"Shotgun proteomics: tools for the analysis of complex biological systems","volume":"4","author":"Wu","year":"2002","journal-title":"Curr. Opin. Mol. 
Ther"},{"key":"2023013110104322800_btz733-B32","doi-asserted-by":"crossref","first-page":"D710","DOI":"10.1093\/nar\/gkv1157","article-title":"Ensembl 2016","volume":"44","author":"Yates","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023013110104322800_btz733-B33","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1146\/annurev-bioeng-061008-124934","article-title":"Proteomics by mass spectrometry: approaches, advances, and applications","volume":"11","author":"Yates","year":"2009","journal-title":"Annu. Rev. Biomed. Eng"},{"key":"2023013110104322800_btz733-B34","doi-asserted-by":"crossref","first-page":"W701","DOI":"10.1093\/nar\/gkm371","article-title":"ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry","volume":"35","author":"Zamdborg","year":"2007","journal-title":"Nucleic Acids Res"},{"key":"2023013110104322800_btz733-B35","doi-asserted-by":"crossref","first-page":"2343","DOI":"10.1021\/cr3003533","article-title":"Protein analysis by shotgun\/bottom-up proteomics","volume":"113","author":"Zhang","year":"2013","journal-title":"Chem. 
Rev"},{"key":"2023013110104322800_btz733-B36","doi-asserted-by":"crossref","first-page":"i106","DOI":"10.1093\/bioinformatics\/btv236","article-title":"MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms","volume":"31","author":"Zickmann","year":"2015","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btz733\/30246315\/btz733.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/4\/1030\/48982300\/bioinformatics_36_4_1030.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/4\/1030\/48982300\/bioinformatics_36_4_1030.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,20]],"date-time":"2023-09-20T22:44:24Z","timestamp":1695249864000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/4\/1030\/5581398"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,10,4]]},"references-count":36,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,2,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz733","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2020,2,15]]},"published":{"date-parts":[[2019,10,4]]}}}