{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,3]],"date-time":"2025-05-03T18:25:23Z","timestamp":1746296723391},"reference-count":25,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2007,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Whole genome shotgun sequencing produces increasingly higher coverage of a genome with random sequence reads. Progressive whole genome assembly and eventual finishing sequencing is a process that typically takes several years for large eukaryotic genomes. In the interim, all sequence reads of public sequencing projects are made available in repositories such as the NCBI Trace Archive. For a particular locus, sequencing coverage may be high enough early on to produce a reliable local genome assembly. We have developed software, <jats:italic>Tracembler<\/jats:italic>, that facilitates <jats:italic>in silico<\/jats:italic> chromosome walking by recursively assembling reads of a selected species from the NCBI Trace Archive starting with reads that significantly match sequence seeds supplied by the user.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>\n              <jats:italic>Tracembler<\/jats:italic> takes one or multiple DNA or protein sequence(s) as input to the NCBI Trace Archive BLAST engine to identify matching sequence reads from a species of interest. The BLAST searches are carried out recursively such that BLAST matching sequences identified in previous rounds of searches are used as new queries in subsequent rounds of BLAST searches. The recursive BLAST search stops when either no more new matching sequences are found, a given maximal number of queries is exhausted, or a specified maximum number of rounds of recursion is reached. All the BLAST matching sequences are then assembled into contigs based on significant sequence overlaps using the CAP3 program. We demonstrate the validity of the concept and software implementation with an example of successfully recovering a full-length <jats:italic>Chrm2<\/jats:italic> gene as well as its upstream and downstream genomic regions from <jats:italic>Rattus norvegicus<\/jats:italic> reads. In a second example, a query with two adjacent <jats:italic>Medicago truncatula<\/jats:italic> genes as seeds resulted in a contig that likely identifies the microsyntenic homologous soybean locus.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>\n              <jats:italic>Tracembler<\/jats:italic> streamlines the process of recursive database searches, sequence assembly, and gene identification in resulting contigs in attempts to identify homologous loci of genes of interest in species with emerging whole genome shotgun reads. A web server hosting <jats:italic>Tracembler<\/jats:italic> is provided at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"http:\/\/www.plantgdb.org\/tool\/tracembler\/\" ext-link-type=\"uri\">http:\/\/www.plantgdb.org\/tool\/tracembler\/<\/jats:ext-link>, and the software is also freely available from the authors for local installations.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-8-151","type":"journal-article","created":{"date-parts":[[2007,5,9]],"date-time":"2007-05-09T18:13:18Z","timestamp":1178734398000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Tracembler \u2013 software for in-silico chromosome walking in unassembled genomes"],"prefix":"10.1186","volume":"8","author":[{"given":"Qunfeng","family":"Dong","sequence":"first","affiliation":[]},{"given":"Matthew D","family":"Wilkerson","sequence":"additional","affiliation":[]},{"given":"Volker","family":"Brendel","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2007,5,9]]},"reference":[{"issue":"17","key":"1523_CR1","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Sch\u00e4ffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 1997, 25(17):3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic acids research"},{"key":"1523_CR2","unstructured":"NCBI Entrez Genome Project[http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?db=genomeprj]"},{"key":"1523_CR3","unstructured":"JGI Sequencing Plans and Progress[http:\/\/www.jgi.doe.gov\/sequencing\/seqplans.html]"},{"key":"1523_CR4","unstructured":"NCBI Trace Archive[http:\/\/www.ncbi.nlm.nih.gov\/Traces\/trace.cgi]"},{"key":"1523_CR5","unstructured":"National Center for Biotechnology Information Newsletter 15(1), Summer 2006[http:\/\/www.ncbi.nlm.nih.gov\/Web\/Newsltr\/V15N1\/trace.html]"},{"key":"1523_CR6","unstructured":"NCBI Trace Archive discontiguous Mega BLAST Server[http:\/\/www.ncbi.nlm.nih.gov\/blast\/tracemb.shtml]"},{"key":"1523_CR7","unstructured":"Ensembl Trace Server[http:\/\/trace.ensembl.org\/cgi-bin\/tracesearch]"},{"key":"1523_CR8","doi-asserted-by":"publisher","first-page":"W20","DOI":"10.1093\/nar\/gkh435","volume":"32","author":"S McGinnis","year":"2004","unstructured":"McGinnis S, Madden TL: BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic acids research 2004, 32: W20-W25. 10.1093\/nar\/gkh435","journal-title":"Nucleic acids research"},{"key":"1523_CR9","unstructured":"NHGRI Rapid Data Release Policy[http:\/\/www.genome.gov\/10506376]"},{"issue":"9","key":"1523_CR10","doi-asserted-by":"publisher","first-page":"868","DOI":"10.1101\/gr.9.9.868","volume":"9","author":"X Huang","year":"1999","unstructured":"Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome research 1999, 9(9):868\u2013877. 10.1101\/gr.9.9.868","journal-title":"Genome research"},{"key":"1523_CR11","doi-asserted-by":"publisher","first-page":"965","DOI":"10.1016\/j.infsof.2005.09.005","volume":"47","author":"G Gremme Brendel, V","year":"2005","unstructured":"Gremme G Brendel, V., Sparks, M.E. & Kurtz, S.: Engineering a software tool for gene structure prediction in higher organisms. Information Software Technol 2005, 47: 965\u2013978. 10.1016\/j.infsof.2005.09.005","journal-title":"Information Software Technol"},{"key":"1523_CR12","unstructured":"Rattus norvegicus Chrm2, cholinergic receptor muscarinic 2 gene sequence[http:\/\/www.ncbi.nlm.nih.gov\/entrez\/viewer.fcgi?val=NC_005103.2amp;from=63911288&to=63913359&dopt=fasta]"},{"issue":"6982","key":"1523_CR13","doi-asserted-by":"publisher","first-page":"493","DOI":"10.1038\/nature02426","volume":"428","author":"RA Gibbs","year":"2004","unstructured":"Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, Glodek A, Gu Z, Jennings D, Kraft CL, Nguyen T, Pfannkoch CM, Sitter C, Sutton GG, Venter JC, Woodage T, Smith D, Lee HM, Gustafson E, Cahill P, Kana A, Doucette-Stamm L, Weinstock K, Fechtel K, Weiss RB, Dunn DM, Green ED, Blakesley RW, Bouffard GG, De Jong PJ, Osoegawa K, Zhu B, Marra M, Schein J, Bosdet I, Fjell C, Jones S, Krzywinski M, Mathewson C, Siddiqui A, Wye N, McPherson J, Zhao S, Fraser CM, Shetty J, Shatsman S, Geer K, Chen Y, Abramzon S, Nierman WC, Havlak PH, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Li B, Liu Y, Qin X, Cawley S, Worley KC, Cooney AJ, D'Souza LM, Martin K, Wu JQ, Gonzalez-Garay ML, Jackson AR, Kalafus KJ, McLeod MP, Milosavljevic A, Virk D, Volkov A, Wheeler DA, Zhang Z, Bailey JA, Eichler EE, Tuzun E, Birney E, Mongin E, Ureta-Vidal A, Woodwark C, Zdobnov E, Bork P, Suyama M, Torrents D, Alexandersson M, Trask BJ, Young JM, Huang H, Wang H, Xing H, Daniels S, Gietzen D, Schmidt J, Stevens K, Vitt U, Wingrove J, Camara F, Mar Alba M, Abril JF, Guigo R, Smit A, Dubchak I, Rubin EM, Couronne O, Poliakov A, Hubner N, Ganten D, Goesele C, Hummel O, Kreitler T, Lee YA, Monti J, Schulz H, Zimdahl H, Himmelbauer H, Lehrach H, Jacob HJ, Bromberg S, Gullings-Handley J, Jensen-Seaman MI, Kwitek AE, Lazar J, Pasko D, Tonellato PJ, Twigger S, Ponting CP, Duarte JM, Rice S, Goodstadt L, Beatson SA, Emes RD, Winter EE, Webber C, Brandt P, Nyakatura G, Adetobi M, Chiaromonte F, Elnitski L, Eswara P, Hardison RC, Hou M, Kolbe D, Makova K, Miller W, Nekrutenko A, Riemer C, Schwartz S, Taylor J, Yang S, Zhang Y, Lindpaintner K, Andrews TD, Caccamo M, Clamp M, Clarke L, Curwen V, Durbin R, Eyras E, Searle SM, Cooper GM, Batzoglou S, Brudno M, Sidow A, Stone EA, Venter JC, Payseur BA, Bourque G, Lopez-Otin C, Puente XS, Chakrabarti K, Chatterji S, Dewey C, Pachter L, Bray N, Yap VB, Caspi A, Tesler G, Pevzner PA, Haussler D, Roskin KM, Baertsch R, Clawson H, Furey TS, Hinrichs AS, Karolchik D, Kent WJ, Rosenbloom KR, Trumbower H, Weirauch M, Cooper DN, Stenson PD, Ma B, Brent M, Arumugam M, Shteynberg D, Copley RR, Taylor MS, Riethman H, Mudunuri U, Peterson J, Guyer M, Felsenfeld A, Old S, Mockrin S, Collins F: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 2004, 428(6982):493\u2013521. 10.1038\/nature02426","journal-title":"Nature"},{"key":"1523_CR14","unstructured":"NCBI bl2seq Web Server[http:\/\/www.ncbi.nlm.nih.gov\/blast\/bl2seq\/wblast2.cgi]"},{"key":"1523_CR15","unstructured":"NCBI Entrez Gene Chrm2 Region[http:\/\/www.ncbi.nlm.nih.gov\/entrez\/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=81645]"},{"key":"1523_CR16","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/1471-2229-5-15","volume":"5","author":"J Mudge","year":"2005","unstructured":"Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, Town CD, Young ND: Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC plant biology 2005, 5: 15. 10.1186\/1471-2229-5-15","journal-title":"BMC plant biology"},{"issue":"4","key":"1523_CR17","doi-asserted-by":"publisher","first-page":"1174","DOI":"10.1104\/pp.104.057034","volume":"137","author":"ND Young","year":"2005","unstructured":"Young ND, Cannon SB, Sato S, Kim D, Cook DR, Town CD, Roe BA, Tabata S: Sequencing the genespaces of Medicago truncatula and Lotus japonicus. Plant physiology 2005, 137(4):1174\u20131181. 10.1104\/pp.104.057034","journal-title":"Plant physiology"},{"key":"1523_CR18","unstructured":"Department of Energy Press Release[http:\/\/www.energy.gov\/news\/2979.htm]"},{"key":"1523_CR19","volume-title":"Legume Crop Genomics","author":"RF Wilson","year":"2004","unstructured":"Wilson RF, Stalker HT, Brummer C: Legume Crop Genomics. Champaign, IL, U.S.A. , Am. Oil Chem. Soc. Press ; 2004."},{"key":"1523_CR20","unstructured":"Medicago truncatula IMGAG Genome Annotation[http:\/\/www.tigr.org\/tigr-scripts\/medicago\/IMGAG\/tab_delimited_output?word=&locus=AC146590_10.2&accession=]"},{"issue":"10","key":"1523_CR21","doi-asserted-by":"publisher","first-page":"1396","DOI":"10.1093\/bioinformatics\/18.10.1396","volume":"18","author":"E Berezikov","year":"2002","unstructured":"Berezikov E, Plasterk RH, Cuppen E: GENOTRACE: cDNA-based local GENOme assembly from TRACE archives. Bioinformatics (Oxford, England) 2002, 18(10):1396\u20131397. 10.1093\/bioinformatics\/18.10.1396","journal-title":"Bioinformatics (Oxford, England)"},{"issue":"10","key":"1523_CR22","doi-asserted-by":"publisher","first-page":"1725","DOI":"10.1101\/gr.194201","volume":"11","author":"Z Ning","year":"2001","unstructured":"Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA databases. Genome research 2001, 11(10):1725\u20131729. 10.1101\/gr.194201","journal-title":"Genome research"},{"key":"1523_CR23","unstructured":"Phrap[http:\/\/www.phrap.org]"},{"key":"1523_CR24","unstructured":"NCBI Trace Archive Statistics[http:\/\/www.ncbi.nlm.nih.gov\/Traces\/trace.cgi?cmd=show&f=graph_query&m=stat&s=graph]"},{"key":"1523_CR25","unstructured":"NCBI QBlast's URL API. User's Guide[http:\/\/www.ncbi.nlm.nih.gov\/BLAST\/Doc\/urlapi.html]"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-8-151.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T10:14:45Z","timestamp":1630491285000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-8-151"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,5,9]]},"references-count":25,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2007,12]]}},"alternative-id":["1523"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-8-151","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,5,9]]},"assertion":[{"value":"15 January 2007","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 May 2007","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 May 2007","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"151"}}