{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T23:35:39Z","timestamp":1769556939771,"version":"3.49.0"},"reference-count":62,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,10,20]],"date-time":"2023-10-20T00:00:00Z","timestamp":1697760000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010663","name":"European Research Council","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100010663","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Union\u2019s Horizon 2020 research and innovation programme"},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["322595"],"award-info":[{"award-number":["322595"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["352821"],"award-info":[{"award-number":["352821"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["346968"],"award-info":[{"award-number":["346968"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Many important problems in Bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding \u201csafe\u201d partial solutions (e.g. contigs) which are common to all solutions. Previous research on safety has focused on polynomially time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of \u201csafety tools\u201d for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, \u201cminimum flow decomposition\u201d (MFD). We obtain our results by developing a \u201csafety test\u201d for paths based on a general integer linear programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Experimental results on transcriptome datasets show that all safe paths for MFDs correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27\u2009000 non-trivial graphs of this dataset in only 1.5\u2009h. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/algbio\/mfd-safety.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad640","type":"journal-article","created":{"date-parts":[[2023,10,20]],"date-time":"2023-10-20T17:54:54Z","timestamp":1697824494000},"source":"Crossref","is-referenced-by-count":2,"title":["A safety framework for flow decomposition problems via integer linear programming"],"prefix":"10.1093","volume":"39","author":[{"given":"Fernando H C","family":"Dias","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00560, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0235-6951","authenticated-orcid":false,"given":"Manuel","family":"C\u00e1ceres","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00560, Finland"}]},{"given":"Lucia","family":"Williams","sequence":"additional","affiliation":[{"name":"School of Computing, Montana State University , Bozeman, MT 59717, United States"}]},{"given":"Brendan","family":"Mumey","sequence":"additional","affiliation":[{"name":"School of Computing, Montana State University , Bozeman, MT 59717, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5747-8350","authenticated-orcid":false,"given":"Alexandru I","family":"Tomescu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Helsinki , Helsinki 00560, Finland"}]}],"member":"286","published-online":{"date-parts":[[2023,10,20]]},"reference":[{"key":"2023110707090866600_btad640-B1","doi-asserted-by":"crossref","DOI":"10.21236\/ADA594171","volume-title":"Network Flows","author":"Ahuja","year":"1988"},{"key":"2023110707090866600_btad640-B2","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1186\/s13059-020-1935-5","article-title":"Opportunities and challenges in long-read sequencing data analysis","volume":"21","author":"Amarasinghe","year":"2020","journal-title":"Genome Biol"},{"key":"2023110707090866600_btad640-B3","doi-asserted-by":"crossref","first-page":"5086","DOI":"10.1093\/bioinformatics\/btz443","article-title":"Full-length de novo viral quasispecies assembly through variation graph construction","volume":"35","author":"Baaijens","year":"2019","journal-title":"Bioinformatics"},{"key":"2023110707090866600_btad640-B4","first-page":"221","author":"Baaijens","year":"2020"},{"key":"2023110707090866600_btad640-B5","volume-title":"Flows with Path Restrictions","author":"Baier","year":"2004"},{"key":"2023110707090866600_btad640-B6","doi-asserted-by":"crossref","first-page":"2447","DOI":"10.1093\/bioinformatics\/btu317","article-title":"Efficient RNA isoform identification and quantification from RNA-Seq data with network flows","volume":"30","author":"Bernard","year":"2014","journal-title":"Bioinformatics"},{"key":"2023110707090866600_btad640-B7","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/j.tcs.2021.06.039","article-title":"Spectral concepts in genome informational analysis","volume":"894","author":"Bonnici","year":"2021","journal-title":"Theoretical Computer Science"},{"key":"2023110707090866600_btad640-B8","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1109\/TCBB.2021.3131203","article-title":"Safety in multi-assembly via paths appearing in all path covers of a dag","volume":"19","author":"C\u00e1ceres","year":"2022","journal-title":"IEEE\/ACM Trans Comput Biol Bioinf"},{"key":"2023110707090866600_btad640-B9","first-page":"1","volume-title":"30th Annual European Symposium on Algorithms, ESA 2022, 5\u20139 September 2022, Berlin\/Potsdam, Germany, Vol. 244 of LIPIcs","author":"C\u00e1ceres","year":"2022"},{"key":"2023110707090866600_btad640-B10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3341731","article-title":"An optimal O(nm) algorithm for enumerating all walks common to all closed edge-covering walks of a graph","volume":"15","author":"Cairo","year":"2019","journal-title":"ACM Trans Algorithms"},{"key":"2023110707090866600_btad640-B11","first-page":"1","volume-title":"40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023), Dagstuhl, Germany","author":"Cairo","year":"2023"},{"key":"2023110707090866600_btad640-B12","first-page":"387","article-title":"Locating well-conserved regions within a pairwise alignment","volume":"9","author":"Chao","year":"1993","journal-title":"Comput Appl Biosci"},{"key":"2023110707090866600_btad640-B13","doi-asserted-by":"crossref","first-page":"2927","DOI":"10.1093\/bioinformatics\/bty202","article-title":"De novo haplotype reconstruction in viral quasispecies using paired-end read guided path finding","volume":"34","author":"Chen","year":"2018","journal-title":"Bioinformatics"},{"key":"2023110707090866600_btad640-B14","first-page":"1734","author":"Cohen","year":"2014"},{"key":"2023110707090866600_btad640-B15","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1016\/0167-6377(94)90049-3","article-title":"Persistency in maximum cardinality bipartite matchings","volume":"15","author":"Costa","year":"1994","journal-title":"Oper Res Lett"},{"key":"2023110707090866600_btad640-B16","author":"Dias","year":"2022"},{"key":"2023110707090866600_btad640-B17","first-page":"230","author":"Dias","year":"2022"},{"key":"2023110707090866600_btad640-B18","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1007\/s10878-019-00464-4","article-title":"Efficient algorithms for measuring the funnel-likeness of DAGs","volume":"39","author":"Garlet Millani","year":"2020","journal-title":"J Comb Optim"},{"key":"2023110707090866600_btad640-B19","doi-asserted-by":"crossref","first-page":"10073","DOI":"10.1093\/nar\/gks666","article-title":"Modelling and simulating generic RNA-seq experiments with the flux simulator","volume":"40","author":"Griebel","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023110707090866600_btad640-B20","author":"Gurobi Optimization, LLC","year":"2021"},{"key":"2023110707090866600_btad640-B21","author":"Hagberg","year":"2008"},{"key":"2023110707090866600_btad640-B22","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1137\/0603052","article-title":"Vertices belonging to all or to no maximum stable sets of a graph","volume":"3","author":"Hammer","year":"1982","journal-title":"SIAM J Algebra Discret Methods"},{"key":"2023110707090866600_btad640-B23","first-page":"828","author":"Hartman","year":"2012"},{"key":"2023110707090866600_btad640-B24","first-page":"15","author":"Hong","year":"2013"},{"key":"2023110707090866600_btad640-B25","author":"Jackson","year":"2009"},{"key":"2023110707090866600_btad640-B26","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1007\/BF01188580","article-title":"Combinatorial algorithms for DNA sequence assembly","volume":"13","author":"Kececioglu","year":"1995","journal-title":"Algorithmica"},{"key":"2023110707090866600_btad640-B27","volume-title":"30th Annual European Symposium on Algorithms (ESA 2022)","author":"Khan","year":"2022"},{"key":"2023110707090866600_btad640-B28","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1089\/cmb.2022.0261","article-title":"Improving RNA assembly via safety and completeness in flow decompositions","volume":"29","author":"Khan","year":"2022","journal-title":"J Comput Biol"},{"key":"2023110707090866600_btad640-B29","first-page":"177","volume-title":"International Conference on Research in Computational Molecular Biology","author":"Khan","year":"2022"},{"key":"2023110707090866600_btad640-B30","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1038\/s41587-019-0201-4","article-title":"Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype","volume":"37","author":"Kim","year":"2019","journal-title":"Nat Biotechnol"},{"key":"2023110707090866600_btad640-B31","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/1471-2105-11-21","article-title":"Assembly complexity of prokaryotic genomes using short reads","volume":"11","author":"Kingsford","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023110707090866600_btad640-B32","first-page":"75","author":"Kloster","year":"2018"},{"key":"2023110707090866600_btad640-B33","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1186\/s13059-019-1910-1","article-title":"Transcriptome assembly from long-read RNA-seq alignments with StringTie2","volume":"20","author":"Kovaka","year":"2019","journal-title":"Genome Biol"},{"key":"2023110707090866600_btad640-B34","doi-asserted-by":"crossref","first-page":"19867","DOI":"10.1073\/pnas.1113972108","article-title":"Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation","volume":"108","author":"Li","year":"2011","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023110707090866600_btad640-B35","author":"Li","year":"2014"},{"key":"2023110707090866600_btad640-B36","doi-asserted-by":"crossref","first-page":"1693","DOI":"10.1089\/cmb.2011.0171","article-title":"IsoLasso: a LASSO regression approach to RNA-Seq based transcriptome assembly","volume":"18","author":"Li","year":"2011","journal-title":"J Comput Biol"},{"key":"2023110707090866600_btad640-B37","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s13015-021-00184-7","article-title":"Exact transcript quantification over splice graphs","volume":"16","author":"Ma","year":"2021","journal-title":"Algorithms Mol Biol"},{"key":"2023110707090866600_btad640-B38","first-page":"289","author":"Medvedev","year":"2007"},{"key":"2023110707090866600_btad640-B39","first-page":"1","volume-title":"2015 IEEE Globecom Workshops (GC Wkshps), San Diego, CA, USA","author":"Mumey","year":"2015"},{"key":"2023110707090866600_btad640-B40","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1089\/cmb.1994.1.349","article-title":"On near-optimal alignments of biological sequences","volume":"1","author":"Naor","year":"1994","journal-title":"J Comput Biol"},{"key":"2023110707090866600_btad640-B41","author":"Ohst","year":"2015"},{"key":"2023110707090866600_btad640-B42","doi-asserted-by":"crossref","first-page":"883","DOI":"10.1007\/s10100-020-00705-6","article-title":"A study on flow decomposition methods for scheduling of electric buses in public transport based on aggregated time-space network models","volume":"30","author":"Olsen","year":"2022","journal-title":"Cent Eur J Oper Res"},{"key":"2023110707090866600_btad640-B43","first-page":"021592","article-title":"Salmon: accurate, versatile and ultrafast quantification from RNA-seq data using lightweight-alignment","author":"Patro","year":"2015","journal-title":"BioRxiv"},{"key":"2023110707090866600_btad640-B44","doi-asserted-by":"crossref","first-page":"290","DOI":"10.1038\/nbt.3122","article-title":"StringTie enables improved reconstruction of a transcriptome from RNA-seq reads","volume":"33","author":"Pertea","year":"2015","journal-title":"Nat Biotechnol"},{"key":"2023110707090866600_btad640-B45","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc Natl Acad Sci USA"},{"key":"2023110707090866600_btad640-B46","doi-asserted-by":"crossref","first-page":"414","DOI":"10.1016\/j.ejor.2015.06.012","article-title":"Integral flow decomposition with minimum longest path length","volume":"247","author":"Pie\u0144kosz","year":"2015","journal-title":"Eur J Oper Res"},{"key":"2023110707090866600_btad640-B47","doi-asserted-by":"crossref","first-page":"1673","DOI":"10.1093\/bioinformatics\/btab015","article-title":"V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data","volume":"37","author":"Posada-C\u00e9spedes","year":"2021","journal-title":"Bioinformatics"},{"key":"2023110707090866600_btad640-B48","doi-asserted-by":"crossref","first-page":"1746","DOI":"10.1101\/gr.276601.122","article-title":"Assembler artifacts include misassembly because of unsafe unitigs and under-assembly because of bidirected graphs","volume":"32","author":"Rahman","year":"2022","journal-title":"Genome Res"},{"key":"2023110707090866600_btad640-B49","doi-asserted-by":"crossref","first-page":"6728","DOI":"10.1038\/s41467-021-26944-y","article-title":"Jumper enables discontinuous transcript assembly in coronaviruses","volume":"12","author":"Sashittal","year":"2021","journal-title":"Nat Commun"},{"key":"2023110707090866600_btad640-B50","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1038\/nbt.4020","article-title":"Accurate assembly of transcripts through phase-preserving graph decomposition","volume":"35","author":"Shao","year":"2017","journal-title":"Nat Biotechnol"},{"key":"2023110707090866600_btad640-B51","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1109\/TCBB.2017.2779509","article-title":"Theory and a heuristic for the minimum path flow decomposition problem","volume":"16","author":"Shao","year":"2017","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023110707090866600_btad640-B52","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.ejor.2016.01.003","article-title":"Integer programming formulations for the elementary shortest path problem","volume":"252","author":"Taccari","year":"2016","journal-title":"Eur J Oper Res"},{"key":"2023110707090866600_btad640-B53","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1089\/cmb.2016.0141","article-title":"Safe and complete contig assembly through omnitigs","volume":"24","author":"Tomescu","year":"2017","journal-title":"J Comput Biol"},{"key":"2023110707090866600_btad640-B54","first-page":"S15:1","volume-title":"BMC Bioinformatics","author":"Tomescu","year":"2013"},{"key":"2023110707090866600_btad640-B55","doi-asserted-by":"crossref","first-page":"1390","DOI":"10.1016\/j.ejor.2006.05.043","article-title":"Simple bounds and greedy algorithms for decomposing a flow into a minimal set of paths","volume":"185","author":"Vatinlen","year":"2008","journal-title":"Eur J Oper Res"},{"key":"2023110707090866600_btad640-B56","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1093\/protein\/3.7.565","article-title":"Determination of reliable regions in protein sequence alignments","volume":"3","author":"Vingron","year":"1990","journal-title":"Protein Eng"},{"key":"2023110707090866600_btad640-B57","first-page":"15","article-title":"Next-generation transcriptome assembly: strategies and performance analysis","author":"Voshall","year":"2018","journal-title":"Bioinformatics in the Era of Post Genomics and Big Data"},{"key":"2023110707090866600_btad640-B58","author":"Williams","year":"2021"},{"key":"2023110707090866600_btad640-B59","doi-asserted-by":"crossref","first-page":"360","DOI":"10.1109\/TCBB.2022.3147697","article-title":"Flow decomposition with subpath constraints","volume":"20","author":"Williams","year":"2022","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"2023110707090866600_btad640-B60","first-page":"D682","article-title":"Ensembl 2020","volume":"48","author":"Yates","year":"2020","journal-title":"Nucleic Acids Res"},{"key":"2023110707090866600_btad640-B61","author":"Zhang","year":"2021"},{"key":"2023110707090866600_btad640-B62","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1089\/cmb.2021.0444","article-title":"Deriving ranges of optimal estimated transcript expression due to nonidentifiability","volume":"29","author":"Zheng","year":"2022","journal-title":"J Comput Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad640\/52306218\/btad640.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad640\/52771898\/btad640.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad640\/52771898\/btad640.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,31]],"date-time":"2024-10-31T04:50:47Z","timestamp":1730350247000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad640\/7325350"}},"subtitle":[],"editor":[{"given":"Christina","family":"Kendziorski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,10,20]]},"references-count":62,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad640","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,10,20]]},"article-number":"btad640"}}