{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,16]],"date-time":"2026-03-16T12:49:49Z","timestamp":1773665389956,"version":"3.50.1"},"reference-count":46,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2014,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score.<\/jats:p>\n               <jats:p>Results: We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two \u2018post-genomic\u2019 applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.<\/jats:p>\n               <jats:p>Availability and implementation: The statistical calculation is available in FALP ( http:\/\/www.ncbi.nlm.nih.gov\/CBBresearch\/Spouge\/html_ncbi\/html\/index\/software.html ), and giga-scale frameshift alignment is available in LAST ( http:\/\/last.cbrc.jp\/falp ).<\/jats:p>\n               <jats:p>Contact: \u00a0spouge@ncbi.nlm.nih.gov or martin@cbrc.jp<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu576","type":"journal-article","created":{"date-parts":[[2014,8,30]],"date-time":"2014-08-30T00:28:26Z","timestamp":1409358506000},"page":"3575-3582","source":"Crossref","is-referenced-by-count":37,"title":["Frameshift alignment: statistics and post-genomic applications"],"prefix":"10.1093","volume":"30","author":[{"given":"Sergey L.","family":"Sheetlin","sequence":"first","affiliation":[{"name":"1 National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and 2 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan"}]},{"given":"Yonil","family":"Park","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and 2 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan"}]},{"given":"Martin C.","family":"Frith","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and 2 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan"}]},{"given":"John L.","family":"Spouge","sequence":"additional","affiliation":[{"name":"1 National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and 2 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan"}]}],"member":"286","published-online":{"date-parts":[[2014,8,28]]},"reference":[{"key":"2023012712055554800_btu576-B1","doi-asserted-by":"crossref","first-page":"460","DOI":"10.1016\/S0076-6879(96)66029-7","article-title":"Local alignment statistics","volume":"266","author":"Altschul","year":"1996","journal-title":"Methods Enzymol."},{"key":"2023012712055554800_btu576-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023012712055554800_btu576-B3","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped blast and psi-blast: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012712055554800_btu576-B4","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1093\/nar\/29.2.351","article-title":"The estimation of statistical parameters for local alignment score distributions","volume":"29","author":"Altschul","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023012712055554800_btu576-B5","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1214\/aoap\/1177005208","article-title":"A phase transition for the score in matching random sequences allowing deletions","volume":"4","author":"Arratia","year":"1994","journal-title":"Ann. Appl. Probab."},{"key":"2023012712055554800_btu576-B6","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1093\/bib\/3.2.181","article-title":"Exact mapping of prokaryotic gene starts","volume":"3","author":"Baytaluk","year":"2002","journal-title":"Brief. Bioinformatics"},{"key":"2023012712055554800_btu576-B7","doi-asserted-by":"crossref","first-page":"1321","DOI":"10.1126\/science.1098119","article-title":"Ultraconserved elements in the human genome","volume":"304","author":"Bejerano","year":"2004","journal-title":"Science"},{"key":"2023012712055554800_btu576-B8","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1089\/10665270252935449","article-title":"Rapid significance estimation in local sequence alignment with gaps","volume":"9","author":"Bundschuh","year":"2002","journal-title":"J. Comput. Biol."},{"key":"2023012712055554800_btu576-B9","doi-asserted-by":"crossref","first-page":"375","DOI":"10.1186\/1471-2164-13-375","article-title":"Pacific biosciences sequencing technology for genotyping and variation discovery in human data","volume":"13","author":"Carneiro","year":"2012","journal-title":"BMC Genomics"},{"key":"2023012712055554800_btu576-B10","doi-asserted-by":"crossref","first-page":"e243","DOI":"10.7717\/peerj.243","article-title":"Phylosift: Phylogenetic analysis of genomes and metagenomes","volume":"2","author":"Darling","year":"2014","journal-title":"Peer J."},{"key":"2023012712055554800_btu576-B11","article-title":"A model of evolutionary change in proteins","volume-title":"Atlas of protein sequence and structure","author":"Dayhoff","year":"1978"},{"key":"2023012712055554800_btu576-B12","doi-asserted-by":"crossref","first-page":"2022","DOI":"10.1214\/aop\/1176988493","article-title":"Limit distributions of maximal non-aligned two-sequence segmental score","volume":"22","author":"Dembo","year":"1994","journal-title":"Ann. Probab."},{"key":"2023012712055554800_btu576-B13","doi-asserted-by":"crossref","first-page":"2460","DOI":"10.1093\/bioinformatics\/btq461","article-title":"Search and clustering orders of magnitude faster than blast","volume":"26","author":"Edgar","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012712055554800_btu576-B14","doi-asserted-by":"crossref","first-page":"e23","DOI":"10.1093\/nar\/gkq1212","article-title":"A new repeat-masking method enables specific detection of homologous sequences","volume":"39","author":"Frith","year":"2011","journal-title":"Nucleic Acids Res."},{"key":"2023012712055554800_btu576-B15","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1186\/1741-7007-4-41","article-title":"Composition-based statistics and translated nucleotide searches: improving the tblastn module of blast","volume":"4","author":"Gertz","year":"2006","journal-title":"BMC Biol."},{"key":"2023012712055554800_btu576-B16","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1186\/1748-7188-5-6","article-title":"Back-translation for discovering distant protein homologies in the presence of frameshift mutations","volume":"5","author":"Girdea","year":"2010","journal-title":"Algorithms Mol. Biol."},{"key":"2023012712055554800_btu576-B17","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1038\/ng0393-266","article-title":"Identification of protein coding regions by database similarity search","volume":"3","author":"Gish","year":"1993","journal-title":"Nat. Genet."},{"key":"2023012712055554800_btu576-B18","first-page":"31","article-title":"Alignments of DNA and protein sequences containing frameshift errors","volume":"12","author":"Guan","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"2023012712055554800_btu576-B19","doi-asserted-by":"crossref","DOI":"10.1007\/978-94-009-5819-7","article-title":"Monte Carlo methods","volume-title":"Monographs on Applied Probability & Statistics","author":"Hammersley","year":"1964"},{"key":"2023012712055554800_btu576-B20","doi-asserted-by":"crossref","first-page":"1760","DOI":"10.1101\/gr.135350.111","article-title":"Gencode: The reference human genome annotation for the encode project","volume":"22","author":"Harrow","year":"2012","journal-title":"Genome Res."},{"key":"2023012712055554800_btu576-B21","doi-asserted-by":"crossref","first-page":"10915","DOI":"10.1073\/pnas.89.22.10915","article-title":"Amino acid substitution matrices from protein blocks","volume":"89","author":"Henikoff","year":"1992","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012712055554800_btu576-B22","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1093\/bioinformatics\/btt254","article-title":"A poor man's blastx\u2014high-throughput metagenomic protein database search using pauda","volume":"30","author":"Huson","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712055554800_btu576-B23","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1101\/gr.113985.110","article-title":"Adaptive seeds tame genomic sequence comparison","volume":"21","author":"Kielbasa","year":"2011","journal-title":"Genome Res."},{"key":"2023012712055554800_btu576-B24","doi-asserted-by":"crossref","DOI":"10.1093\/database\/bau062","article-title":"UCbase 2.0: ultraconserved sequences database (2014 update)","volume":"2014","author":"Lomonaco","year":"2014","journal-title":"Database"},{"key":"2023012712055554800_btu576-B25","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1186\/1471-2105-13-230","article-title":"Highly improved homopolymer aware nucleotide-protein alignments with 454 data","volume":"13","author":"Lysholm","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023012712055554800_btu576-B26","doi-asserted-by":"crossref","first-page":"140","DOI":"10.1093\/gbe\/evs128","article-title":"Vertebrate paralogous conserved noncoding sequences may be related to gene expressions in brain","volume":"5","author":"Matsunami","year":"2013","journal-title":"Genome Biol. Evol."},{"key":"2023012712055554800_btu576-B27","doi-asserted-by":"crossref","first-page":"e1000762","DOI":"10.1371\/journal.pgen.1000762","article-title":"Early evolution of conserved regulatory sequences associated with development in vertebrates","volume":"5","author":"Mcewen","year":"2009","journal-title":"PLoS Genet."},{"key":"2023012712055554800_btu576-B28","doi-asserted-by":"crossref","first-page":"D64","DOI":"10.1093\/nar\/gks1048","article-title":"The ucsc genome browser database: extensions and updates 2013","volume":"41","author":"Meyer","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023012712055554800_btu576-B29","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1093\/bioinformatics\/17.1.13","article-title":"Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors","volume":"17","author":"Mironov","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012712055554800_btu576-B30","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1186\/1756-0500-5-286","article-title":"New finite-size correction for local alignment score distributions","volume":"5","author":"Park","year":"2012","journal-title":"BMC Res. Notes"},{"key":"2023012712055554800_btu576-B31","doi-asserted-by":"crossref","first-page":"3697","DOI":"10.1214\/08-AOS663","article-title":"Estimating the gumbel scale parameter for local alignment of random sequences by importance sampling with stopping times","volume":"37","author":"Park","year":"2009","journal-title":"Ann. Stat."},{"key":"2023012712055554800_btu576-B32","doi-asserted-by":"crossref","first-page":"24","DOI":"10.1006\/geno.1997.4995","article-title":"Comparison of DNA sequences with protein sequences","volume":"46","author":"Pearson","year":"1997","journal-title":"Genomics"},{"key":"2023012712055554800_btu576-B33","doi-asserted-by":"crossref","first-page":"8880","DOI":"10.1073\/pnas.88.20.8880","article-title":"Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins","volume":"88","author":"Robinson","year":"1991","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012712055554800_btu576-B34","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1007\/s00248-013-0253-9","article-title":"Taxonomic profiling and metagenome analysis of a microbial community from a habitat contaminated with industrial discharges","volume":"66","author":"Shah","year":"2013","journal-title":"Microb. Ecol."},{"key":"2023012712055554800_btu576-B35","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1186\/1742-4690-10-18","article-title":"Analysis of 454 sequencing error rate, error sources, and artifact recombination for detection of low-frequency drug resistance mutations in hiv-1 DNA","volume":"10","author":"Shao","year":"2013","journal-title":"Retrovirology"},{"key":"2023012712055554800_btu576-B36","doi-asserted-by":"crossref","first-page":"4987","DOI":"10.1093\/nar\/gki800","article-title":"The gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment","volume":"33","author":"Sheetlin","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012712055554800_btu576-B37","doi-asserted-by":"crossref","first-page":"1135","DOI":"10.1038\/nbt1486","article-title":"Next-generation DNA sequencing","volume":"26","author":"Shendure","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"2023012712055554800_btu576-B38","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"Uniref: Comprehensive and non-redundant uniprot reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012712055554800_btu576-B39","doi-asserted-by":"crossref","first-page":"e36060","DOI":"10.1371\/journal.pone.0036060","article-title":"Ghostm: a gpu-accelerated homology search tool for metagenomics","volume":"7","author":"Suzuki","year":"2012","journal-title":"Plos One"},{"key":"2023012712055554800_btu576-B40","first-page":"42","volume-title":"GNU Parallel: The Command-Line Power Tool.;login: The USENIX Magazine","author":"Tange","year":"2011"},{"key":"2023012712055554800_btu576-B41","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1186\/1471-2105-13-185","article-title":"Estimation of sequencing error rates in short reads","volume":"13","author":"Wang","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023012712055554800_btu576-B42","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1016\/0001-8708(76)90202-4","article-title":"Some biological sequence metrics","volume":"20","author":"Waterman","year":"1976","journal-title":"Adv. Math."},{"key":"2023012712055554800_btu576-B43","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1186\/1471-2105-12-198","article-title":"Hmm-frame: accurate protein domain classification for metagenomic sequences containing frameshift errors","volume":"12","author":"Zhang","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023012712055554800_btu576-B44","doi-asserted-by":"crossref","first-page":"339","DOI":"10.1089\/cmb.1997.4.339","article-title":"Aligning a DNA sequence with a protein sequence","volume":"4","author":"Zhang","year":"1997","journal-title":"J. Comput. Biol."},{"key":"2023012712055554800_btu576-B45","doi-asserted-by":"crossref","first-page":"2541","DOI":"10.1101\/gr.1429003","article-title":"Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome","volume":"13","author":"Zhang","year":"2003","journal-title":"Genome Res."},{"key":"2023012712055554800_btu576-B46","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1093\/bioinformatics\/btr595","article-title":"Rapsearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data","volume":"28","author":"Zhao","year":"2012","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/24\/3575\/48932228\/bioinformatics_30_24_3575.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/30\/24\/3575\/48932228\/bioinformatics_30_24_3575.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T13:04:43Z","timestamp":1674824683000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/30\/24\/3575\/2422185"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,8,28]]},"references-count":46,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2014,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu576","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2014,12,15]]},"published":{"date-parts":[[2014,8,28]]}}}