{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T12:29:18Z","timestamp":1766579358421,"version":"3.41.0"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"OOPSLA","license":[{"start":{"date-parts":[[2019,10,10]],"date-time":"2019-10-10T00:00:00Z","timestamp":1570665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Program. Lang."],"published-print":{"date-parts":[[2019,10,10]]},"abstract":"<jats:p>The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100\u2014a factor of over 10<jats:sup>6<\/jats:sup>\u2014and the amount of data to be analyzed has increased proportionally. Yet, as Moore\u2019s Law continues to slow, computational biologists can no longer rely on computing hardware to compensate for the ever-increasing size of biological datasets. In a field where many researchers are primarily focused on biological analysis over computational optimization, the unfortunate solution to this problem is often to simply buy larger and faster machines.<\/jats:p>\n<jats:p>Here, we introduce <jats:bold>Seq<\/jats:bold>, the first language tailored specifically to bioinformatics, which marries the ease and productivity of Python with C-like performance. 
Seq starts with a subset of Python\u2014and is in many cases a drop-in replacement\u2014yet also incorporates novel bioinformatics- and computational genomics-oriented data types, language constructs and optimizations. Seq enables users to write high-level, Pythonic code without having to worry about low-level or domain-specific optimizations, and allows for the seamless expression of the algorithms, idioms and patterns found in many genomics or bioinformatics applications. We evaluated Seq on several standard computational genomics tasks like reverse complementation, <jats:italic>k<\/jats:italic>-mer manipulation, sequence pattern matching and large genomic index queries. On equivalent CPython code, Seq attains a performance improvement of up to two orders of magnitude, and a 160\u00d7 improvement once domain-specific language features and optimizations are used. With parallelism, we demonstrate up to a 650\u00d7 improvement. Compared to optimized C++ code, which is already difficult for most biologists to produce, Seq frequently attains up to a 2\u00d7 improvement, and with shorter, cleaner code. 
Thus, Seq opens the door to an age of democratization of highly-optimized bioinformatics software.<\/jats:p>","DOI":"10.1145\/3360551","type":"journal-article","created":{"date-parts":[[2019,10,11]],"date-time":"2019-10-11T14:53:33Z","timestamp":1570805613000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["Seq: a high-performance language for bioinformatics"],"prefix":"10.1145","volume":"3","author":[{"given":"Ariya","family":"Shajii","sequence":"first","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Ibrahim","family":"Numanagi\u0107","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Riyadh","family":"Baghdadi","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]},{"given":"Saman","family":"Amarasinghe","sequence":"additional","affiliation":[{"name":"Massachusetts Institute of Technology, USA"}]}],"member":"320","published-online":{"date-parts":[[2019,10,10]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature09534"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2018.00050"},{"key":"e_1_2_1_3_1","volume-title":"magrittr: A forward-pipe operator for R. R package version 1, 1","author":"Bache Stefan Milton","year":"2014","unstructured":"Stefan Milton Bache and Hadley Wickham. 2014. magrittr: A forward-pipe operator for R. R package version 1, 1 (2014). 
https:\/\/cran.r-project.org\/web\/packages\/magrittr\/vignettes\/magrittr.html"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2019.8661197"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1101\/gr.187101"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1038\/533452a"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2010.118"},{"key":"e_1_2_1_8_1","volume-title":"Julia: A fast dynamic language for technical computing. arXiv","author":"Bezanson Jeff","year":"2012","unstructured":"Jeff Bezanson, Stefan Karpinski, Viral B Shah, and Alan Edelman. 2012. Julia: A fast dynamic language for technical computing. arXiv (2012), 1209.5145."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380180902"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1565824.1565827"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2038037.1941561"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1272743.1272747"},{"volume-title":"Acm sigplan notices","author":"Chiw Charisee","key":"e_1_2_1_13_1","unstructured":"Charisee Chiw, Gordon Kindlmann, John Reppy, Lamont Samuels, and Nick Seltzer. 2012. Diderot: a parallel DSL for image analysis and visualization. In Acm sigplan notices, Vol. 47. ACM, 111\u2013120."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp163"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-11"},{"volume-title":"Shed skin: An optimizing python-to-c++ compiler. 
Master\u2019s thesis","author":"Dufour Mark","key":"e_1_2_1_16_1","unstructured":"Mark Dufour. 2006. Shed skin: An optimizing python-to-c++ compiler. Master\u2019s thesis. Delft University of Technology."},{"key":"e_1_2_1_17_1","volume-title":"Striped Smith\u2013Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23, 2 (11","author":"Farrar Michael","year":"2006","unstructured":"Michael Farrar. 2006. Striped Smith\u2013Waterman speeds database searches six times over other SIMD implementations. Bioinformatics 23, 2 (11 2006), 156\u2013161."},{"key":"e_1_2_1_18_1","volume-title":"Compression Boosting in Optimal Linear Time Using the Burrows-Wheeler Transform. In SODA","author":"Ferragina Paolo","year":"2004","unstructured":"Paolo Ferragina and Giovanni Manzini. 2004. Compression Boosting in Optimal Linear Time Using the Burrows-Wheeler Transform. In SODA 2004. 655\u2013663."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1038\/507294a"},{"key":"e_1_2_1_20_1","unstructured":"K Hayen. 2012. Nuitka. (2012). 
http:\/\/nuitka.net"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3390\/ijms18020308"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1177\/1176934318758650"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243185"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2017.8115709"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2866569"},{"key":"e_1_2_1_26_1","volume-title":"Spaced seeds improve k-mer-based metagenomic classification. Bioinformatics 31, 22 (07","author":"Kucherov Gregory","year":"2015","unstructured":"Gregory Kucherov, Karel B\u0159inda, and Maciej Sykulski. 2015. Spaced seeds improve k-mer-based metagenomic classification. Bioinformatics 31, 22 (07 2015), 3584\u20133592."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2833157.2833162"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp324"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp352"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp352"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbq015"},{"key":"e_1_2_1_33_1","volume-title":"Proteomics &amp","author":"Lu Hengyun","year":"2016","unstructured":"Hengyun Lu, Francesca Giordano, and Zemin Ning. 2016. Oxford Nanopore MinION sequencing and genome assembly. 
Genomics, Proteomics &amp; Bioinformatics 14, 5 (2016), 265\u2013279."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2925426.2926283"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1172\/JCI34772"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1038\/nprot.2016.182"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1101\/gr.107524.110"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-016-0917-0"},{"key":"e_1_2_1_39_1","unstructured":"Gor Nishanov. 2017. ISO\/IEC TS 22277:2017. (Dec 2017). https:\/\/www.iso.org\/standard\/73008.html"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13059-016-0997-x"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.1213847"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2499370.2462176"},{"key":"e_1_2_1_43_1","volume-title":"Dov Greenbaum, Raymond K Auerbach, and Mark B Gerstein.","author":"Sboner Andrea","year":"2011","unstructured":"Andrea Sboner, Xinmeng Jasmine Mu, Dov Greenbaum, Raymond K Auerbach, and Mark B Gerstein. 2011. The real cost of sequencing: higher than you think! 
Genome biology 12, 8 (2011), 125."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3155284.3018758"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cels.2018.07.005"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1101\/gr.126953.111"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1321152111"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-018-2014-8"},{"key":"e_1_2_1_49_1","unstructured":"Guido van Rossum. 2015. The Python Library Reference Release 3.5. Fred L. Drake Jr."},{"key":"e_1_2_1_50_1","volume-title":"18th Annual Bioinformatics Open Source Conference . poster.","author":"Voss K","year":"2017","unstructured":"K Voss, J Gentry, and G Van der Auwera. 2017. Full-stack genomics pipelining with GATK4 +WDL +Cromwell. In 18th Annual Bioinformatics Open Source Conference. poster."},{"key":"e_1_2_1_51_1","volume-title":"Edlib: a C\/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 33, 9 (01","author":"\u0160o\u0161i\u0107 Martin","year":"2017","unstructured":"Martin \u0160o\u0161i\u0107 and Mile \u0160iki\u0107. 2017. Edlib: a C\/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics 33, 9 (01 2017), 1394\u20131395."},{"key":"e_1_2_1_52_1","volume-title":"Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture. 
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops &amp","author":"Wang Wendi","year":"2012","unstructured":"Wendi Wang, Wen Tang, Linchuan Li, Guangming Tan, Peiheng Zhang, and Ninghui Sun. 2012. Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture. 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops &amp; PhD Forum (2012), 665\u2013674."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1002\/mgg3.281"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1038\/nbt.3511"},{"key":"e_1_2_1_55_1","volume-title":"Faster and More Accurate Sequence Alignment with SNAP. CoRR abs\/1111.5572","author":"Zaharia Matei","year":"2011","unstructured":"Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David A. Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, and Taylor Sittler. 2011. Faster and More Accurate Sequence Alignment with SNAP. CoRR abs\/1111.5572 (2011). 
arXiv: 1111.5572 http:\/\/arxiv.org\/abs\/1111.5572"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.5555\/1763653.1763669"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"}],"container-title":["Proceedings of the ACM on Programming Languages"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3360551","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3360551","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:22:58Z","timestamp":1750202578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3360551"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,10]]},"references-count":57,"journal-issue":{"issue":"OOPSLA","published-print":{"date-parts":[[2019,10,10]]}},"alternative-id":["10.1145\/3360551"],"URL":"https:\/\/doi.org\/10.1145\/3360551","relation":{},"ISSN":["2475-1421"],"issn-type":[{"type":"electronic","value":"2475-1421"}],"subject":[],"published":{"date-parts":[[2019,10,10]]},"assertion":[{"value":"2019-10-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}