{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T21:51:42Z","timestamp":1775857902332,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"22-23","license":[{"start":{"date-parts":[[2020,12,16]],"date-time":"2020-12-16T00:00:00Z","timestamp":1608076800000},"content-version":"vor","delay-in-days":15,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Germany\u2019s Excellence Strategy\u2013EXC-2048\/1\u2013Project","award":["390686111"],"award-info":[{"award-number":["390686111"]}]},{"name":"BMBF-funded de.NBI Cloud"},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A537B"],"award-info":[{"award-number":["031A537B"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A533A"],"award-info":[{"award-number":["031A533A"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A538A"],"award-info":[{"award-number":["031A538A"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A533B"],"award-info":[{"award-number":["031A533B"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for 
Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A535A"],"award-info":[{"award-number":["031A535A"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A537C"],"award-info":[{"award-number":["031A537C"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A534A"],"award-info":[{"award-number":["031A534A"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018929","name":"German Network for Bioinformatics Infrastructure","doi-asserted-by":"publisher","award":["031A532B"],"award-info":[{"award-number":["031A532B"]}],"id":[{"id":"10.13039\/501100018929","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Current state-of-the-art tools for the de novo annotation of genes in eukaryotic genomes have to be specifically fitted for each species and still often produce annotations that can be improved much further. The fundamental algorithmic architecture for these tools has remained largely unchanged for about two decades, limiting learning capabilities. Here, we set out to improve the cross-species annotation of genes from DNA sequence alone with the help of deep learning. 
The goal is to eliminate the dependency on a closely related gene model while also improving the predictive quality in general with a fundamentally new architecture.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present Helixer, a framework for the development and usage of a cross-species deep learning model that improves significantly on performance and generalizability when compared to more traditional methods. We evaluate our approach by building a single vertebrate model for the base-wise annotation of 186 animal genomes and a separate land plant model for 51 plant genomes. Our predictions are shown to be much less sensitive to the length of the genome than those of a current state-of-the-art tool. We also present two novel post-processing techniques that each worked to further strengthen our annotations and show in-depth results of an RNA-Seq based comparison of our predictions. Our method does not yet produce comprehensive gene models but rather outputs base pair wise probabilities.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The source code of this work is available at https:\/\/github.com\/weberlab-hhu\/Helixer under the GNU General Public License v3.0. 
The trained models are available at https:\/\/doi.org\/10.5281\/zenodo.3974409<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa1044","type":"journal-article","created":{"date-parts":[[2020,12,8]],"date-time":"2020-12-08T03:55:05Z","timestamp":1607399705000},"page":"5291-5298","source":"Crossref","is-referenced-by-count":137,"title":["Helixer: cross-species gene annotation of large eukaryotic genomes using deep learning"],"prefix":"10.1093","volume":"36","author":[{"given":"Felix","family":"Stiehler","sequence":"first","affiliation":[{"name":"Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich-Heine-University, Dusseldorf 40225, Germany"}]},{"given":"Marvin","family":"Steinborn","sequence":"additional","affiliation":[{"name":"Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich-Heine-University, Dusseldorf 40225, Germany"}]},{"given":"Stephan","family":"Scholz","sequence":"additional","affiliation":[]},{"given":"Daniela","family":"Dey","sequence":"additional","affiliation":[{"name":"Institute of Human Genetics, Medical Faculty, RWTH Aachen University, Aachen 52062, Germany"}]},{"given":"Andreas P M","family":"Weber","sequence":"additional","affiliation":[{"name":"Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich-Heine-University, Dusseldorf 40225, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1132-0224","authenticated-orcid":false,"given":"Alisandra K","family":"Denton","sequence":"additional","affiliation":[{"name":"Institute of Plant Biochemistry, Faculty of Mathematics and Natural Sciences, Heinrich-Heine-University, Dusseldorf 40225, 
Germany"}]}],"member":"286","published-online":{"date-parts":[[2020,12,16]]},"reference":[{"key":"2023062708410424200_btaa1044-B1","author":"Abadi","year":"2016"},{"key":"2023062708410424200_btaa1044-B2","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baw093","article-title":"The ensembl gene annotation system","volume":"2016","author":"Aken","year":"2016","journal-title":"Database"},{"key":"2023062708410424200_btaa1044-B3","first-page":"254","author":"Amin","year":"2018"},{"key":"2023062708410424200_btaa1044-B4","author":"Ba","year":"2016"},{"key":"2023062708410424200_btaa1044-B5","first-page":"2546","author":"Bergstra","year":"2011"},{"key":"2023062708410424200_btaa1044-B6","doi-asserted-by":"crossref","first-page":"2114","DOI":"10.1093\/bioinformatics\/btu170","article-title":"Trimmomatic: a flexible trimmer for illumina sequence data","volume":"30","author":"Bolger","year":"2014","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B7","doi-asserted-by":"crossref","first-page":"7570","DOI":"10.1128\/JVI.79.12.7570-7596.2005","article-title":"Predicting coding potential from genome sequence: application to betaherpesviruses infecting rats and mice","volume":"79","author":"Brocchieri","year":"2005","journal-title":"J. Virol"},{"key":"2023062708410424200_btaa1044-B8","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1006\/jmbi.1997.0951","article-title":"Prediction of complete gene structures in human genomic dna","volume":"268","author":"Burge","year":"1997","journal-title":"J. Mol. 
Biol"},{"key":"2023062708410424200_btaa1044-B9","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1101\/gr.6743907","article-title":"Maker: an easy-to-use annotation pipeline designed for emerging model organism genomes","volume":"18","author":"Cantarel","year":"2007","journal-title":"Genome Res"},{"key":"2023062708410424200_btaa1044-B10","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1038\/ng1789","article-title":"Genome-wide analysis of mammalian promoter architecture and evolution","volume":"38","author":"Carninci","year":"2006","journal-title":"Nat. Genet"},{"key":"2023062708410424200_btaa1044-B11","doi-asserted-by":"crossref","first-page":"20170387","DOI":"10.1098\/rsif.2017.0387","article-title":"Opportunities and obstacles for deep learning in biology and medicine","volume":"15","author":"Ching","year":"2018","journal-title":"J. R. Soc. Interface"},{"key":"2023062708410424200_btaa1044-B12","author":"Choudhary","year":"2017"},{"key":"2023062708410424200_btaa1044-B13","doi-asserted-by":"crossref","first-page":"3047","DOI":"10.1093\/bioinformatics\/btw354","article-title":"MultiQC: summarize analysis results for multiple tools and samples in a single report","volume":"32","author":"Ewels","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B14","doi-asserted-by":"crossref","first-page":"D1178","DOI":"10.1093\/nar\/gkr944","article-title":"Phytozome: a comparative platform for green plant genomics","volume":"40","author":"Goodstein","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062708410424200_btaa1044-B15","doi-asserted-by":"crossref","first-page":"5654","DOI":"10.1093\/nar\/gkg770","article-title":"Improving the arabidopsis genome annotation using maximal transcript alignment assemblies","volume":"31","author":"Haas","year":"2003","journal-title":"Nucleic Acids 
Res"},{"key":"2023062708410424200_btaa1044-B16","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput"},{"key":"2023062708410424200_btaa1044-B17","doi-asserted-by":"crossref","first-page":"1936","DOI":"10.1093\/nar\/gks1271","article-title":"Quantification of stochastic noise of splicing and polyadenylation in entamoeba histolytica","volume":"41","author":"Hon","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023062708410424200_btaa1044-B18","doi-asserted-by":"crossref","first-page":"D689","DOI":"10.1093\/nar\/gkz890","article-title":"Ensembl genomes 2020-enabling non-vertebrate genomic research","volume":"48","author":"Howe","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023062708410424200_btaa1044-B19","doi-asserted-by":"crossref","first-page":"535","DOI":"10.1016\/j.cell.2018.12.015","article-title":"Predicting splicing from primary sequence with deep learning","volume":"176","author":"Jaganathan","year":"2019","journal-title":"Cell"},{"key":"2023062708410424200_btaa1044-B20","doi-asserted-by":"crossref","first-page":"2938","DOI":"10.1093\/bioinformatics\/btn564","article-title":"Snap: a web-based tool for identification and annotation of proxy snps using hapmap","volume":"24","author":"Johnson","year":"2008","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B21","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1038\/s41587-019-0201-4","article-title":"Graph-based genome alignment and genotyping with hisat2 and hisat-genotype","volume":"37","author":"Kim","year":"2019","journal-title":"Nat. 
Biotechnol"},{"key":"2023062708410424200_btaa1044-B22","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The sequence read archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023062708410424200_btaa1044-B23","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and samtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B24","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1093\/bioinformatics\/btv643","article-title":"De novo identification of replication-timing domains in the human genome by deep learning","volume":"32","author":"Liu","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B25","doi-asserted-by":"crossref","first-page":"28517","DOI":"10.1038\/srep28517","article-title":"Pedla: predicting enhancers with a deep learning-based algorithmic framework","volume":"6","author":"Liu","year":"2016","journal-title":"Scientific Rep"},{"key":"2023062708410424200_btaa1044-B26","doi-asserted-by":"crossref","first-page":"286","DOI":"10.3389\/fgene.2019.00286","article-title":"Deepromoter: robust promoter predictor using deep learning","volume":"10","author":"Oubounyt","year":"2019","journal-title":"Front. 
Genet"},{"key":"2023062708410424200_btaa1044-B27","doi-asserted-by":"crossref","first-page":"e107","DOI":"10.1093\/nar\/gkw226","article-title":"DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences","volume":"44","author":"Quang","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023062708410424200_btaa1044-B01196143","doi-asserted-by":"crossref","first-page":"3210","DOI":"10.1093\/bioinformatics\/btv351","article-title":"BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs","volume":"31","author":"Sim\u00e3o","year":"2015","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B28","doi-asserted-by":"crossref","first-page":"ii215","DOI":"10.1093\/bioinformatics\/btg1080","article-title":"Gene prediction with a hidden Markov model and a new intron submodel","volume":"19","author":"Stanke","year":"2003","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B29","first-page":"1","volume-title":"The NCBI Handbook [Internet]","author":"Thibaud-Nissen","year":"2013","edition":"2nd edn"},{"key":"2023062708410424200_btaa1044-B30","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1186\/s12864-016-2646-x","article-title":"A robust (re-) annotation approach to generate unbiased mapping references for rna-seq-based analyses of differential expression across closely related species","volume":"17","author":"Torres-Oliva","year":"2016","journal-title":"BMC Genomics"},{"key":"2023062708410424200_btaa1044-B31","doi-asserted-by":"crossref","first-page":"i269","DOI":"10.1093\/bioinformatics\/btz339","article-title":"Comprehensive evaluation of deep learning architectures for prediction of DNA\/RNA sequence binding 
specificities","volume":"35","author":"Trabelsi","year":"2019","journal-title":"Bioinformatics"},{"key":"2023062708410424200_btaa1044-B32","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1186\/s12859-019-3306-3","article-title":"Splicefinder: ab initio prediction of splice sites using convolutional neural network","volume":"20","author":"Wang","year":"2019","journal-title":"BMC Bioinformatics"},{"key":"2023062708410424200_btaa1044-B33","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/S1672-0229(04)02028-5","article-title":"A brief review of computational gene prediction methods","volume":"2","author":"Wang","year":"2004","journal-title":"Genomics Proteomics Bioinf"},{"key":"2023062708410424200_btaa1044-B34","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1186\/s12864-015-1308-8","article-title":"A comprehensive evaluation of ensembl, refseq, and ucsc annotations in the context of rna-seq read mapping and gene quantification","volume":"16","author":"Zhao","year":"2015","journal-title":"BMC 
Genomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa1044\/36215443\/btaa1044.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/22-23\/5291\/50716045\/btaa1044.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/22-23\/5291\/50716045\/btaa1044.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T08:41:41Z","timestamp":1687855301000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/22-23\/5291\/6039118"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,12,1]]},"references-count":35,"journal-issue":{"issue":"22-23","published-print":{"date-parts":[[2021,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa1044","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,12,1]]},"published":{"date-parts":[[2020,12,1]]}}}