{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:30:21Z","timestamp":1773275421537,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The increasing availability of second-generation high-throughput sequencing (HTS) technologies has sparked a growing interest in de novo genome sequencing. This in turn has fueled the need for reliable means of obtaining high-quality draft genomes from short-read sequencing data. The millions of reads usually involved in HTS experiments are first assembled into longer fragments called contigs, which are then scaffolded, i.e. ordered and oriented using additional information, to produce even longer sequences called scaffolds. Most existing scaffolders of HTS genome assemblies are not suited for using information other than paired reads to perform scaffolding. They use this limited information to construct scaffolds, often preferring scaffold length over accuracy, when faced with the tradeoff.<\/jats:p>\n               <jats:p>Results: We present GRASS (GeneRic ASsembly Scaffolder)\u2014a novel algorithm for scaffolding second-generation sequencing assemblies capable of using diverse information sources. GRASS offers a mixed-integer programming formulation of the contig scaffolding problem, which combines contig order, distance and orientation in a single optimization objective. The resulting optimization problem is solved using an expectation\u2013maximization procedure and an unconstrained binary quadratic programming approximation of the original problem. We compared GRASS with existing HTS scaffolders using Illumina paired reads of three bacterial genomes. Our algorithm constructs a comparable number of scaffolds, but makes fewer errors. This result is further improved when additional data, in the form of related genome sequences, are used.<\/jats:p>\n               <jats:p>Availability: GRASS source code is freely available from http:\/\/code.google.com\/p\/tud-scaffolding\/.<\/jats:p>\n               <jats:p>Contact: \u00a0a.gritsenko@tudelft.nl<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts175","type":"journal-article","created":{"date-parts":[[2012,4,8]],"date-time":"2012-04-08T08:06:13Z","timestamp":1333872373000},"page":"1429-1437","source":"Crossref","is-referenced-by-count":44,"title":["GRASS: a generic algorithm for scaffolding next-generation sequencing assemblies"],"prefix":"10.1093","volume":"28","author":[{"given":"Alexey A.","family":"Gritsenko","sequence":"first","affiliation":[{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"}]},{"given":"Jurgen F.","family":"Nijkamp","sequence":"additional","affiliation":[{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"}]},{"given":"Marcel J.T.","family":"Reinders","sequence":"additional","affiliation":[{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"}]},{"given":"Dick de","family":"Ridder","sequence":"additional","affiliation":[{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"},{"name":"1 The Delft Bioinformatics Lab, Department of Mediamatics, Delft University of Technology, Mekelweg 4, 2628 CD Delft, 2Platform Green Synthetic Biology and 3Kluyver Centre for Genomics of Industrial Fermentation, P.O. Box 5057, 2600 GA Delft, The Netherlands"}]}],"member":"286","published-online":{"date-parts":[[2012,4,6]]},"reference":[{"key":"2023012512313706400_B1","doi-asserted-by":"crossref","first-page":"142","DOI":"10.4056\/sigs.541628","article-title":"Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs","volume":"2","author":"Auch","year":"2010","journal-title":"Stand. Genomic Sci."},{"key":"2023012512313706400_B2","doi-asserted-by":"crossref","first-page":"1691","DOI":"10.1093\/bioinformatics\/btr174","article-title":"BamTools: a C++ API and toolkit for analyzing and managing BAM files","volume":"27","author":"Barnett","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B3","article-title":"Heuristic algorithms for the unconstrained binary quadratic programming problem","volume-title":"Technical Report.","author":"Beasley","year":"1998"},{"key":"2023012512313706400_B4","doi-asserted-by":"crossref","first-page":"578","DOI":"10.1093\/bioinformatics\/btq683","article-title":"Scaffolding pre-assembled contigs using SSPACE","volume":"27","author":"Boetzer","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B5","volume-title":"Linear Programming and Extensions","author":"Dantzig","year":"1998"},{"key":"2023012512313706400_B6","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1186\/1471-2105-11-345","article-title":"SOPRA: scaffolding algorithm for paired reads via statistical optimization","volume":"11","author":"Dayarian","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512313706400_B7","doi-asserted-by":"crossref","first-page":"2478","DOI":"10.1093\/nar\/30.11.2478","article-title":"Fast algorithms for large-scale genome alignment and comparison","volume":"30","author":"Delcher","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023012512313706400_B8","doi-asserted-by":"crossref","first-page":"1681","DOI":"10.1089\/cmb.2011.0170","article-title":"Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences","volume":"18","author":"Gao","year":"2011","journal-title":"J. Comput. Biol."},{"key":"2023012512313706400_B9","doi-asserted-by":"crossref","first-page":"2329","DOI":"10.1093\/bioinformatics\/bth324","article-title":"Whole-genome prokaryotic phylogeny","volume":"21","author":"Henz","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B10","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1145\/360825.360861","article-title":"A linear space algorithm for computing maximal common subsequences","volume":"18","author":"Hirschberg","year":"1975","journal-title":"Commun. ACM"},{"key":"2023012512313706400_B11","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1145\/585265.585267","article-title":"The greedy path-merging algorithm for contig scaffolding","volume":"49","author":"Huson","year":"2002","journal-title":"J. ACM"},{"key":"2023012512313706400_B12","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1093\/molbev\/msj030","article-title":"Application of phylogenetic networks in evolutionary studies","volume":"23","author":"Huson","year":"2006","journal-title":"Mol. Biol. Evol."},{"key":"2023012512313706400_B13","author":"IBM |ILOG","year":"2011","journal-title":"ILOG CPLEX: high-performance software for mathematical programming and optimization."},{"key":"2023012512313706400_B14","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1007\/BF01188580","article-title":"Combinatorial algorithms for DNA sequence assembly","volume":"13","author":"Kececioglu","year":"1995","journal-title":"Algorithmica"},{"key":"2023012512313706400_B15","doi-asserted-by":"crossref","first-page":"1541","DOI":"10.1101\/gr.183201","article-title":"Assembly of the working draft of the human genome with GigAssembler","volume":"11","author":"Kent","year":"2001","journal-title":"Genome Res."},{"key":"2023012512313706400_B16","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol."},{"key":"2023012512313706400_B17","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B18","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B19","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.biosystems.2004.08.002","article-title":"Memetic algorithms for the unconstrained binary quadratic programming problem","volume":"78","author":"Merz","year":"2004","journal-title":"BioSystems"},{"key":"2023012512313706400_B20","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B21","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023012512313706400_B22","article-title":"Biological Sequence Data Model","volume-title":"The NCBI C++ Toolkit Book (Internet).","author":"National Center for Biotechnology Information","year":"2011"},{"key":"2023012512313706400_B23","doi-asserted-by":"crossref","first-page":"1229","DOI":"10.1093\/bioinformatics\/btn102","article-title":"Scaffolding and validation of bacterial genome assemblies using optical restriction maps","volume":"10","author":"Nagarajan","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B24","doi-asserted-by":"crossref","DOI":"10.1002\/9781118627372","volume-title":"Integer and combinatorial optimization.","author":"Nemhauser","year":"1988"},{"key":"2023012512313706400_B25","article-title":"Quality of semidefinite relaxation for nonconvex quadratic optimization","volume-title":"CORE Discussion Papers 1997019.","author":"Nesterov","year":"1997"},{"key":"2023012512313706400_B26","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1080\/10556780701550083","article-title":"Global equilibrium search applied to the unconstrained binary quadratic optimization problem","volume":"14","author":"Pardalos","year":"2008","journal-title":"Optim. Meth. Softw."},{"key":"2023012512313706400_B27","first-page":"149","article-title":"IDBA \u2013 a practical iterative de Bruijn graph de novo assembler","volume":"13","author":"Peng","year":"2010","journal-title":"Genome Res."},{"key":"2023012512313706400_B28","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1101\/gr.1536204","article-title":"Hierarchical scaffolding with Bambus","volume":"14","author":"Pop","year":"2004","journal-title":"Genome Res."},{"key":"2023012512313706400_B29","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: a flexible suite of utilities for comparing genomic features","volume":"26","author":"Quinlan","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B30","doi-asserted-by":"crossref","first-page":"3259","DOI":"10.1093\/bioinformatics\/btr562","article-title":"Fast scaffolding with small independent mixed integer programs","volume":"27","author":"Salmela","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512313706400_B31","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."},{"key":"2023012512313706400_B32","article-title":"Genome assembly and comparison","volume-title":"PhD Thesis","author":"Zerbino","year":"2009"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/11\/1429\/48869328\/bioinformatics_28_11_1429.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/11\/1429\/48869328\/bioinformatics_28_11_1429.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T16:00:48Z","timestamp":1674662448000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/11\/1429\/267020"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,4,6]]},"references-count":32,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2012,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts175","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,6,1]]},"published":{"date-parts":[[2012,4,6]]}}}