{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T00:01:03Z","timestamp":1755216063794,"version":"3.43.0"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,8,3]],"date-time":"2025-08-03T00:00:00Z","timestamp":1754179200000},"content-version":"vor","delay-in-days":33,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>While there is explosive growth in the creation of biological data, researchers rely on ad hoc verification methods such as testing with small simulated datasets. Due to their importance in biology and biomedicine, there is a critical need to verify these algorithms as well as their implementations to ensure that the results and conclusions are trustworthy. In this paper, we explore an effective combination of model checking and theorem proving of bioinformatics software, including BiopLib, BWA, Jellyfish, SDSL, Dashing, SPAdes, and MUMmer. We provide results for model checking for bioinfomatics software libraries and theorem proving for specific properties. Our model checking framework found several potential flaws in the two tools (BiopLib and BWA). We have also detected several failing cases in Succinct Data Structures Library (SDSL).<\/jats:p>","DOI":"10.1093\/bib\/bbaf383","type":"journal-article","created":{"date-parts":[[2025,8,3]],"date-time":"2025-08-03T14:28:38Z","timestamp":1754231318000},"source":"Crossref","is-referenced-by-count":0,"title":["Formal verification of bioinformatics software using model checking and theorem proving"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7107-8137","authenticated-orcid":false,"given":"Hansika","family":"Weerasena","sequence":"first","affiliation":[{"name":"CISE Department , University of Florida, 1889 Museum Road, Gainesville, FL 32611,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8347-5065","authenticated-orcid":false,"given":"Aruna","family":"Jayasena","sequence":"additional","affiliation":[{"name":"CISE Department , University of Florida, 1889 Museum Road, Gainesville, FL 32611,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9509-9725","authenticated-orcid":false,"given":"Christina","family":"Boucher","sequence":"additional","affiliation":[{"name":"CISE Department , University of Florida, 1889 Museum Road, Gainesville, FL 32611,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3653-6221","authenticated-orcid":false,"given":"Prabhat","family":"Mishra","sequence":"additional","affiliation":[{"name":"CISE Department , University of Florida, 1889 Museum Road, Gainesville, FL 32611,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,8,3]]},"reference":[{"key":"2025080310283040800_ref1","doi-asserted-by":"crossref","first-page":"e01607","DOI":"10.1128\/mbio.01607-23","article-title":"Major data analysis errors invalidate cancer microbiome findings","volume":"14","author":"Gihawi","year":"2023","journal-title":"mBio"},{"key":"2025080310283040800_ref2","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-10-24","article-title":"An innovative approach for testing bioinformatics programs using metamorphic testing","volume":"10","author":"Chen","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2025080310283040800_ref3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-15-S16-S15","article-title":"Verification and validation of bioinformatics software without a gold standard: a case study of BWA and bowtie","volume":"15","author":"Giannoulatou","year":"2014","journal-title":"BMC Bioinformatics"},{"year":"2023","author":"Pereira","article-title":"Verification of distributed artificial intelligence systems in bioinformatics","key":"2025080310283040800_ref4"},{"key":"2025080310283040800_ref5","doi-asserted-by":"publisher","first-page":"3187","DOI":"10.1093\/bioinformatics\/bty282","article-title":"Automatic selection of verification tools for efficient analysis of biochemical models","volume":"34","author":"Bakir","year":"2018","journal-title":"Bioinformatics"},{"key":"2025080310283040800_ref6","doi-asserted-by":"publisher","first-page":"362","DOI":"10.3389\/fmicb.2019.00362","article-title":"Validation of a bioinformatics workflow for routine analysis of whole-genome sequencing data and related challenges for pathogen typing in a European national reference center: Neisseria meningitidis as a proof-of-concept","volume":"10","author":"Bogaerts","year":"2019","journal-title":"Front Microbiol"},{"key":"2025080310283040800_ref7","doi-asserted-by":"publisher","first-page":"1118","DOI":"10.5858\/arpa.2019-0476-RA","article-title":"Assembling and validating bioinformatic pipelines for next-generation sequencing clinical assays","volume":"144","author":"SoRelle","year":"2020","journal-title":"Arch Pathol Lab Med"},{"key":"2025080310283040800_ref8","doi-asserted-by":"publisher","first-page":"514","DOI":"10.3892\/ijmm.2018.3640","article-title":"Bioinformatics analysis and verification of key genes associated with recurrent respiratory tract infections","volume":"42","author":"Jin","year":"2018","journal-title":"Int J Mol Med"},{"key":"2025080310283040800_ref9","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1007\/BFb0058022","article-title":"Model checking","volume-title":"International Conference on Foundations of Software Technology and Theoretical Computer Science","author":"Clarke","year":"1997"},{"key":"2025080310283040800_ref10","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1023\/A:1011276507260","article-title":"Bounded model checking using satisfiability solving","volume":"19","author":"Clarke","year":"2001","journal-title":"Formal Methods Syst Des"},{"key":"2025080310283040800_ref11","doi-asserted-by":"publisher","first-page":"852","DOI":"10.1109\/TC.2011.49","article-title":"Property learning techniques for efficient generation of directed tests","volume":"60","author":"Chen","year":"2011","journal-title":"IEEE Trans Comput"},{"key":"2025080310283040800_ref12","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1007\/BF01257083","article-title":"The temporal logic of branching time","volume":"20","author":"Ben-Ari","year":"1983","journal-title":"Acta Informatica"},{"key":"2025080310283040800_ref13","doi-asserted-by":"crossref","first-page":"241","DOI":"10.1016\/0167-6423(83)90017-5","article-title":"Using branching time temporal logic to synthesize synchronization skeletons","volume":"2","author":"Allen Emerson","year":"1982","journal-title":"Sci Comput Program"},{"key":"2025080310283040800_ref14","first-page":"1","article-title":"User-friendly formal specification languages-conclusions drawn from industrial experience on model checking","volume-title":"Proceedings of the IEEE 21st International Conference on Emerging Technologies and Factory Automation","author":"Pakonen","year":"2016"},{"key":"2025080310283040800_ref15","first-page":"28:1","article-title":"On the expressiveness of QCTL","volume-title":"27th International Conference on Concurrency Theory (CONCUR 2016), volume 59 of Leibniz International Proceedings in Informatics (LIPIcs)","author":"David","year":"2016"},{"key":"2025080310283040800_ref16","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1016\/j.csbj.2017.07.002","article-title":"Scalability and validation of big data bioinformatics software","volume":"15","author":"Yang","year":"2017","journal-title":"Comput Struct Biotechnol"},{"key":"2025080310283040800_ref17","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1016\/j.jmoldx.2017.11.003","article-title":"Standards and guidelines for validating next-generation sequencing bioinformatics pipelines: a joint recommendation of the Association for Molecular Pathology and the College of American Pathologists","volume":"20","author":"Roy","year":"2018","journal-title":"J Mol Diagn"},{"key":"2025080310283040800_ref18","first-page":"1","article-title":"Formal verification for natural and engineered biological systems","volume-title":"Proceedings of Formal Methods in Computer Aided Design (FMCAD)","author":"Kugler","year":"2020"},{"key":"2025080310283040800_ref19","first-page":"1","article-title":"Valection: design optimization for validation and verification studies","volume":"19","author":"Christopher","year":"2018","journal-title":"BMC Bioinformatics"},{"key":"2025080310283040800_ref20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1101\/gr.260604.119","article-title":"Data structures based on k-mers for querying large collections of sequencing data sets","volume":"31","author":"Marchet","year":"2021","journal-title":"Genome Res"},{"key":"2025080310283040800_ref21","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1109\/SFCS.2000.892127","article-title":"Opportunistic data structures with applications","volume-title":"Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS)","author":"Ferragina","year":"2000"},{"key":"2025080310283040800_ref22","doi-asserted-by":"publisher","first-page":"407","DOI":"10.1145\/382780.382782","article-title":"An analysis of the Burrows\u2013Wheeler transform","volume":"48","author":"Manzini","year":"2001","journal-title":"J ACM"},{"key":"2025080310283040800_ref23","first-page":"1365","article-title":"HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm","volume-title":"Proceedings of the Conference on Analysis of Algorithms (AoA)","author":"Flajolet","year":"2007"},{"key":"2025080310283040800_ref24","doi-asserted-by":"publisher","first-page":"279","DOI":"10.1016\/j.entcs.2005.04.017","article-title":"Efficient model checking of hardware using conditioned slicing","volume":"128","author":"Shobha Vasudevan","year":"2005","journal-title":"Electron Notes Theor Comput Sci"},{"key":"2025080310283040800_ref25","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1007\/10722167_15","article-title":"Counterexample-guided abstraction refinement","volume-title":"Proceedings of the 12th International Conference on Computer Aided Verification (CAV)","author":"Clarke","year":"2000"},{"key":"2025080310283040800_ref26","doi-asserted-by":"publisher","first-page":"752","DOI":"10.1145\/876638.876643","article-title":"Counterexample-guided abstraction refinement for symbolic model checking","volume":"50","author":"Clarke","year":"2003","journal-title":"J ACM"},{"volume-title":"Rosette Language Guide","key":"2025080310283040800_ref27"},{"key":"2025080310283040800_ref28"},{"key":"2025080310283040800_ref29","doi-asserted-by":"publisher","first-page":"4017","DOI":"10.1093\/bioinformatics\/btv482","article-title":"BiopLib and BiopTools\u2014a C programming library and toolset for manipulating protein structure","volume":"31","author":"Porter","year":"2015","journal-title":"Bioinformatics"},{"key":"2025080310283040800_ref30","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2025080310283040800_ref31","first-page":"389","article-title":"CBMC\u2013C bounded model checker: (competition contribution)","volume-title":"Proceedings of the 20th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS)","author":"Kroening","year":"2014"},{"key":"2025080310283040800_ref32","doi-asserted-by":"publisher","first-page":"764","DOI":"10.1093\/bioinformatics\/btr011","article-title":"A fast, lock-free approach for efficient parallel counting of occurrences of k-mers","volume":"27","author":"Mar\u00e7ais","year":"2011","journal-title":"Bioinformatics"},{"key":"2025080310283040800_ref33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1875-0","article-title":"Dashing: fast and accurate genomic distances with HyperLogLog","volume":"20","author":"Baker","year":"2019","journal-title":"Genome Biol"},{"key":"2025080310283040800_ref34","doi-asserted-by":"publisher","DOI":"10.1002\/cpbi.102","article-title":"Using SPAdes de novo assembler","volume":"70","author":"Prjibelski","year":"2020","journal-title":"Curr Protoc Bioinformatics"},{"key":"2025080310283040800_ref35","doi-asserted-by":"publisher","first-page":"e1005944","DOI":"10.1371\/journal.pcbi.1005944","article-title":"MUMmer4: a fast and versatile genome alignment system","volume":"14","author":"Mar\u00e7ais","year":"2018","journal-title":"PLoS Comput Biol"},{"key":"2025080310283040800_ref36"},{"key":"2025080310283040800_ref37","first-page":"337","article-title":"Z3: An efficient SMT solver","volume-title":"Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS)","author":"De Moura","year":"2008"},{"key":"2025080310283040800_ref38","doi-asserted-by":"publisher","first-page":"e70031","DOI":"10.1002\/cpz1.70031","article-title":"The TELCoMB protocol for high-sensitivity detection of ARG-MGE colocalizations in complex microbial communities","volume":"4","author":"Bravo","year":"2024","journal-title":"Curr Protoc"},{"key":"2025080310283040800_ref39","doi-asserted-by":"crossref","first-page":"gix038","DOI":"10.1093\/gigascience\/gix038","article-title":"Alignment of 1000 genomes project reads to reference assembly GRCh38","volume":"6","author":"Zheng-Bradley","year":"2017","journal-title":"Gigascience"},{"key":"2025080310283040800_ref40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12864-017-3770-y","article-title":"Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls","volume":"18","author":"Buckley","year":"2017","journal-title":"BMC Genomics"},{"key":"2025080310283040800_ref41","doi-asserted-by":"publisher","first-page":"1569290","DOI":"10.3389\/fgene.2025.1569290","article-title":"TCGADownloadHelper: Simplifying TCGA data extraction and preprocessing","volume":"16","author":"Baumann","year":"2025","journal-title":"Front Genet"},{"key":"2025080310283040800_ref42","first-page":"4","article-title":"Subset wavelet trees","volume-title":"Proceedings of the 21st International Symposium on Experimental Algorithms (SEA)","author":"Alanko","year":"2023"},{"key":"2025080310283040800_ref43","article-title":"A natural encoding of genetic variation in a Burrows\u2013Wheeler transform to enable mapping and genome inference","volume-title":"Proceedings of the Workshop on Algorithms in Bioinformatics (WABI)","author":"Iqbal","year":"2016"},{"key":"2025080310283040800_ref44","doi-asserted-by":"publisher","first-page":"1950008","DOI":"10.1142\/S0219720019500082","article-title":"Fast and memory efficient approach for mapping NGS reads to a reference genome","volume":"17","author":"Kumar","year":"2019","journal-title":"J Bioinform Comput Biol"},{"key":"2025080310283040800_ref45","first-page":"12","article-title":"A succinct solution to Rmap alignment","volume-title":"Proceedings of the 18th International Workshop on Algorithms in Bioinformatics","author":"Muggli","year":"2018"},{"key":"2025080310283040800_ref46","first-page":"383","article-title":"Variable-order De Bruijn graphs","volume-title":"Proceedings of the Data Compression Conference (DCC)","author":"Boucher","year":"2015"},{"key":"2025080310283040800_ref47","first-page":"68","article-title":"Efficient indexed alignment of contigs to optical maps","volume-title":"Proceedings of the 14th International Workshop Algorithms in Bioinformatics (WABI)","author":"Muggli","year":"2014"},{"key":"2025080310283040800_ref48","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13015-019-0160-9","article-title":"Kohdista: An efficient method to index and query possible Rmap alignments","volume":"14","author":"Muggli","year":"2019","journal-title":"Algorithms Mol Biol"},{"key":"2025080310283040800_ref49","doi-asserted-by":"publisher","first-page":"188","DOI":"10.1089\/cmb.2021.0445","article-title":"Finding maximal exact matches using the r-index","volume":"29","author":"Rossi","year":"2022","journal-title":"J Comput Biol"},{"year":"2013","author":"Li","article-title":"Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM","key":"2025080310283040800_ref50"},{"key":"2025080310283040800_ref51","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1561\/2400000035","article-title":"Algorithms for verifying deep neural networks","volume":"4","author":"Liu","year":"2021","journal-title":"Foundations and Trendsin Optimization"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf383\/63924917\/bbaf383.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf383\/63924917\/bbaf383.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,3]],"date-time":"2025-08-03T14:28:42Z","timestamp":1754231322000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf383\/8221466"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":51,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf383","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7]]},"article-number":"bbaf383"}}