{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T20:29:04Z","timestamp":1768249744213,"version":"3.49.0"},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation:\u2002The exponential growth of sequence databases poses a major challenge to bioinformatics tools for querying alignment and annotation databases. There is a pressing need for methods for finding overlapping sequence intervals that are highly scalable to database size, query interval size, result size and construction\/updating of the interval database.<\/jats:p><jats:p>Results:\u2002We have developed a new interval database representation, the Nested Containment List (NCList), whose query time is O(n + log N), where N is the database size and n is the size of the result set. In all cases tested, this query algorithm is 5\u2013500-fold faster than other indexing methods tested in this study, such as MySQL multi-column indexing, MySQL binning and R-Tree indexing. We provide performance comparisons both in simulated datasets and real-world genome alignment databases, across a wide range of database sizes and query interval widths. We also present an in-place NCList construction algorithm that yields database construction times that are \u223c100-fold faster than other methods available. The NCList data structure appears to provide a useful foundation for highly scalable interval database applications.<\/jats:p><jats:p>Availability:\u2002NCList data structure is part of Pygr, a bioinformatics graph database library, available at http:\/\/sourceforge.net\/projects\/pygr<\/jats:p><jats:p>Contact: leec@chem.ucla.edu<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl647","type":"journal-article","created":{"date-parts":[[2007,1,19]],"date-time":"2007-01-19T01:13:12Z","timestamp":1169169192000},"page":"1386-1393","source":"Crossref","is-referenced-by-count":53,"title":["Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases"],"prefix":"10.1093","volume":"23","author":[{"given":"Alexander V.","family":"Alekseyenko","sequence":"first","affiliation":[{"name":"1 Department of Biomathematics, David Geffen School of Medicine and 2Molecular Biology Institute, Center for Computational Biology, Institute for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, CA 90095-1570, USA"}]},{"given":"Christopher J.","family":"Lee","sequence":"additional","affiliation":[{"name":"1 Department of Biomathematics, David Geffen School of Medicine and 2Molecular Biology Institute, Center for Computational Biology, Institute for Genomics and Proteomics, Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, CA 90095-1570, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,1,18]]},"reference":[{"issue":"(Database issue)","key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"556","DOI":"10.1093\/nar\/gkj133","article-title":"Ensembl 2006","volume":"34","author":"Birney","year":"2006","journal-title":"Nucleic Acids Res"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"708","DOI":"10.1101\/gr.1933104","article-title":"Aligning multiple genomic sequences with the threaded blockset aligner","volume":"14","author":"Blanchette","year":"2004","journal-title":"Genome Res"},{"key":"2023041104583300500_","first-page":"311","article-title":"Chapter 14.3","volume-title":"Introduction to Algorithms","author":"Cormen","year":"2001","edition":"2nd edn"},{"key":"2023041104583300500_","article-title":"Dynamic rectangle intersection searching","volume-title":"Technical Report","author":"Edelsbrunner","year":"1980"},{"key":"2023041104583300500_","first-page":"683","article-title":"Joining interval data in relational databases","author":"Enderle","year":"2004","journal-title":"Proc. ACM SIGMOD Intnl. Conf. on Management of Data"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"732","DOI":"10.1101\/gr.603103","article-title":"Gala, a database for genomic sequence alignments and annotations","volume":"13","author":"Giardine","year":"2003","journal-title":"Genome Res"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1145\/971697.602266","article-title":"R-Trees: a dynamic index structure for spatial searching","author":"Guttman","year":"1984","journal-title":"Proc. ACM SIGMOD Intnl. Conf. on Management of Data"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at ucsc","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"138","DOI":"10.1145\/119995.115807","article-title":"Segment indexes: dynamic indexing techniques for multi-dimensional interval data","author":"Kolovson","year":"1991","journal-title":"SIGMOD Conference"},{"key":"2023041104583300500_","first-page":"407","article-title":"Managing intervals efficiently in object-relational databases","author":"Kriegel","year":"2000","journal-title":"Proc. 26th Intl. Conf. on Very Large Data Bases (VLDB)"},{"key":"2023041104583300500_","article-title":"R-trees: Theory and Applications","author":"Manolopoulos","year":"2005"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","DOI":"10.1002\/0471223921.ch2","article-title":"The NCBI data model","volume-title":"Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins","author":"Ostell","year":"2001","edition":"2nd edn"},{"key":"2023041104583300500_","volume-title":"Computational Geometry: An Introduction","author":"Preparata","year":"1993","edition":"5th edn"},{"key":"2023041104583300500_","first-page":"431","article-title":"Mv3r-tree: a spatio-temporal access method for timestamp and interval queries","author":"Tao","year":"2001","journal-title":"VLDB 2001"},{"key":"2023041104583300500_","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1016\/S0169-023X(98)00046-9","article-title":"Overlapping b+-trees: an implementation of a transaction time access method","volume":"29","author":"Tzouramanis","year":"1999","journal-title":"Data Knowl. Eng"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/11\/1386\/49812386\/bioinformatics_23_11_1386.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/11\/1386\/49812386\/bioinformatics_23_11_1386.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,10]],"date-time":"2024-02-10T09:03:56Z","timestamp":1707555836000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/11\/1386\/199545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,1,18]]},"references-count":15,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2007,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl647","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,6]]},"published":{"date-parts":[[2007,1,18]]}}}