{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T12:34:22Z","timestamp":1778675662689,"version":"3.51.4"},"reference-count":42,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2020,10,30]],"date-time":"2020-10-30T00:00:00Z","timestamp":1604016000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [\u03b1,\u03b2], return a shortest substring T[i,j] of T with exactly one occurrence in [\u03b1,\u03b2]. We present an O(nlogn)-word data structure with O(logwn) query time, where w=\u03a9(logn) is the word size. Our construction is based on a non-trivial reduction allowing for us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(nlog\u03f5n) query time, where \u03f5&gt;0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].<\/jats:p>","DOI":"10.3390\/a13110276","type":"journal-article","created":{"date-parts":[[2020,10,30]],"date-time":"2020-10-30T09:29:32Z","timestamp":1604050172000},"page":"276","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Efficient Data Structures for Range Shortest Unique Substring Queries"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7854-7960","authenticated-orcid":false,"given":"Paniz","family":"Abedin","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arnab","family":"Ganguly","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Wisconsin - Whitewater, Whitewater, WI 53190, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Solon P.","family":"Pissis","sequence":"additional","affiliation":[{"name":"Life Sciences and Health, CWI, 1098 XG Amsterdam, The Netherlands"},{"name":"Center for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sharma V.","family":"Thankachan","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,10,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Lothaire, M. (2005). Applied Combinatorics on Words, Cambridge University Press.","DOI":"10.1017\/CBO9781107341005"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4633","DOI":"10.1093\/nar\/29.22.4633","article-title":"REPuter: The manifold applications of repeat analysis on a genomic scale","volume":"29","author":"Schleiermacher","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Haubold, B., Pierstorff, N., M\u00f6ller, F., and Wiehe, T. (2005). Genome comparison without alignment using shortest unique substrings. BMC Bioinform., 6.","DOI":"10.1186\/1471-2105-6-123"},{"key":"ref_4","unstructured":"Pei, J., Wu, W.C., and Yeh, M. (2013, January 8\u201312). On shortest unique substring queries. Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE 2013), Brisbane, Australia."},{"key":"ref_5","unstructured":"Khmelev, D.V., and Teahan, W.J. (August, January 28). A Repetition Based Measure for Verification of Text Collections and for Text Categorization. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, ON, Canada."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Gusfield, D. (1997). Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press.","DOI":"10.1017\/CBO9780511574931"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Weiner, P. (1973, January 15\u201317). Linear Pattern Matching Algorithms. Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), Iowa City, IA, USA.","DOI":"10.1109\/SWAT.1973.13"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Ileri, A.M., K\u00fclekci, M.O., and Xu, B. (2014, January 16\u201318). Shortest Unique Substring Query Revisited. Proceedings of the Combinatorial Pattern Matching\u201425th Annual Symposium (CPM 2014), Moscow, Russia.","DOI":"10.1007\/978-3-319-07566-2_18"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Tsuruta, K., Inenaga, S., Bannai, H., and Takeda, M. (2014, January 26\u201329). Shortest Unique Substrings Queries in Optimal Time. Proceedings of the 40th International Conference on Current Trends in Theory and Practice of Computer Science, Nov\u00fd Smokovec, Slovakia.","DOI":"10.1007\/978-3-319-04298-5_44"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Abedin, P., K\u00fclekci, M.O., and V Thankachan, S. (2020). A Survey on Shortest Unique Substring Queries. Algorithms, 13.","DOI":"10.3390\/a13090224"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Allen, D.R., Thankachan, S.V., and Xu, B. (September, January 29). A Practical and Efficient Algorithm for the k-mismatch Shortest Unique Substring Finding Problem. Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (BCB 2018),  Washington, DC, USA.","DOI":"10.1145\/3233547.3233564"},{"key":"ref_12","unstructured":"Ganguly, A., Hon, W., Shah, R., and Thankachan, S.V. (2016, January 12\u201314). Space-Time Trade-Offs for the Shortest Unique Substring Problem. Proceedings of the 27th International Symposium on Algorithms and Computation (ISAAC), Sydney, Australia."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.tcs.2017.08.002","article-title":"Space-time trade-offs for finding shortest unique substrings and maximal unique matches","volume":"700","author":"Ganguly","year":"2017","journal-title":"Theor. Comput. Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1016\/j.jda.2018.11.009","article-title":"Algorithms and combinatorial properties on shortest unique palindromic substrings","volume":"52","author":"Inoue","year":"2018","journal-title":"J. Discret. Algorithms"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.tcs.2017.05.032","article-title":"In-place algorithms for exact and approximate shortest unique substring problems","volume":"690","author":"Hon","year":"2017","journal-title":"Theor. Comput. Sci."},{"key":"ref_16","unstructured":"Mieno, T., Inenaga, S., Bannai, H., and Takeda, M. (2016, January 22\u201326). Shortest Unique Substring Queries on Run-Length Encoded Strings. Proceedings of the 41st International Symposium on Mathematical Foundations of Computer Science MFCS, Krak\u00f3w, Poland."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Schultz, D.W., and Xu, B. (2018, January 8\u201311). On k-Mismatch Shortest Unique Substring Queries Using GPU. Proceedings of the 14th International Symposium, Bioinformatics Research and Applications, Beijing, China.","DOI":"10.1007\/978-3-319-94968-0_18"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mieno, T., K\u00f6ppl, D., Nakashima, Y., Inenaga, S., Bannai, H., and Takeda, M. (2019, January 7\u20139). Compact Data Structures for Shortest Unique Substring Queries. Proceedings of the 26th International Symposium, String Processing and Information Retrieval, Segovia, Spain.","DOI":"10.1007\/978-3-030-32686-9_8"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Watanabe, K., Nakashima, Y., Inenaga, S., Bannai, H., and Takeda, M. (2019, January 23\u201325). Shortest Unique Palindromic Substring Queries on Run-Length Encoded Strings. Proceedings of the 30th International Workshop Combinatorial Algorithms, Pisa, Italy.","DOI":"10.1007\/978-3-030-25005-8_35"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yao, A.C. (1982, January 5\u20137). Space-time Tradeoff for Answering Range Queries (Extended Abstract). Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing (STOC \u201982), San Francisco, CA, USA.","DOI":"10.1145\/800070.802185"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1137\/0222017","article-title":"Recursive Star-Tree Parallel Data Structure","volume":"22","author":"Berkman","year":"1993","journal-title":"SIAM J. Comput."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bender, M.A., and Farach-Colton, M. (2000, January 10\u201314). The LCA Problem Revisited. Proceedings of the 4th Latin American Symposium, LATIN 2000: Theoretical Informatics, Punta del Este, Uruguay.","DOI":"10.1007\/10719839_9"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1016\/j.jcss.2014.02.010","article-title":"Range LCP","volume":"80","author":"Amir","year":"2014","journal-title":"J. Comput. Syst. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Amir, A., Lewenstein, M., and Thankachan, S.V. (2015, January 1\u20134). Range LCP Queries Revisited. Proceedings of the 22nd International Symposium, String Processing and Information Retrieval, London, UK.","DOI":"10.1007\/978-3-319-23826-5_33"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Abedin, P., Ganguly, A., Hon, W., Nekrich, Y., Sadakane, K., Shah, R., and Thankachan, S.V. (2018, January 2\u20134). A Linear-Space Data Structure for Range-LCP Queries in Poly-Logarithmic Time. Proceedings of the 24th International Conference, Computing and Combinatorics, Qing Dao, China.","DOI":"10.1007\/978-3-319-94776-1_51"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"245","DOI":"10.3233\/FI-2018-1741","article-title":"A Linear Space Data Structure for Range LCP Queries","volume":"163","author":"Ganguly","year":"2018","journal-title":"Fundam. Inform."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Pissis, S.P. (2014). MoTeX-II: Structured MoTif eXtraction from large-scale datasets. BMC Bioinform., 15.","DOI":"10.1186\/1471-2105-15-235"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"5:1","DOI":"10.1186\/s13015-017-0094-z","article-title":"On avoided words, absent words, and their application to biological sequence analysis","volume":"12","author":"Almirantis","year":"2017","journal-title":"Algorithms Mol. Biol."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"i743","DOI":"10.1093\/bioinformatics\/bty601","article-title":"CNEFinder: Finding conserved non-coding elements in genomes","volume":"34","author":"Ayad","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Iliopoulos, C.S., Mohamed, M., Pissis, S.P., and Vayani, F. (2018, January 9\u201311). Maximal Motif Discovery in a Sliding Window. Proceedings of the 25th International Symposium, String Processing and Information Retrieval, Lima, Peru.","DOI":"10.1007\/978-3-030-00479-8_16"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1016\/j.tcs.2018.09.011","article-title":"On overabundant words and their application to biological sequence analysis","volume":"792","author":"Almirantis","year":"2019","journal-title":"Theor. Comput. Sci."},{"key":"ref_32","unstructured":"Matsuda, K., Sadakane, K., Starikovskaya, T., and Tateshita, M. (2020, January 17\u201319). Compressed Orthogonal Search on Suffix Arrays with Applications to Range LCP. Proceedings of the 31st Annual Symposium on Combinatorial Pattern Matching, Copenhagen, Denmark."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Abedin, P., Ganguly, A., Pissis, S.P., and Thankachan, S.V. (2019, January 7\u20139). Range Shortest Unique Substring Queries. Proceedings of the 26th International Symposium, String Processing and Information Retrieval, Segovia, Spain.","DOI":"10.1007\/978-3-030-32686-9_18"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Sleator, D.D., and Tarjan, R.E. (1981, January 11\u201313). A Data Structure for Dynamic Trees. Proceedings of the 13th Annual ACM Symposium on Theory of Computing, Milwaukee, WI, USA.","DOI":"10.1145\/800076.802464"},{"key":"ref_35","unstructured":"Chan, T.M., Nekrich, Y., Rahul, S., and Tsakalidis, K. (2018, January 9\u201313). Orthogonal Point Location and Rectangle Stabbing Queries in 3-d. Proceedings of the 45th International Colloquium on Automata, Languages, and Programming, Prague, Czech Republic."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1137\/0213024","article-title":"Fast Algorithms for Finding Nearest Common Ancestors","volume":"13","author":"Harel","year":"1984","journal-title":"SIAM J. Comput."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"935","DOI":"10.1137\/0222058","article-title":"Suffix Arrays: A New Method for On-Line String Searches","volume":"22","author":"Manber","year":"1993","journal-title":"SIAM J. Comput."},{"key":"ref_38","unstructured":"Farach, M. (1997, January 19\u201322). Optimal Suffix Tree Construction with Large Alphabets. Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS \u201997), Miami Beach, FL, USA."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"918","DOI":"10.1145\/1217856.1217858","article-title":"Linear work suffix array construction","volume":"53","author":"Sanders","year":"2006","journal-title":"J. ACM"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Nekrich, Y., and Navarro, G. (2012, January 4\u20136). Sorted Range Reporting. Proceedings of the 13th Scandinavian Symposium and Workshops (SWAT 2012), Helsinki, Finland.","DOI":"10.1007\/978-3-642-31155-0_24"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Thankachan, S.V., Aluru, C., Chockalingam, S.P., and Aluru, S. (2018, January 21\u201324). Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis. Proceedings of the 22nd Annual International Conference, Research in Computational Molecular Biology (RECOMB 2018), Paris, France.","DOI":"10.1007\/978-3-319-89929-9_14"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Barton, C., H\u00e9liou, A., Mouchard, L., and Pissis, S.P. (2014). Linear-time computation of minimal absent words using suffix array. BMC Bioinform., 15.","DOI":"10.1186\/s12859-014-0388-9"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/11\/276\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:27:05Z","timestamp":1760178425000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/11\/276"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,30]]},"references-count":42,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,11]]}},"alternative-id":["a13110276"],"URL":"https:\/\/doi.org\/10.3390\/a13110276","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,10,30]]}}}