{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T21:40:11Z","timestamp":1775079611774,"version":"3.50.1"},"reference-count":117,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T00:00:00Z","timestamp":1737504000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T00:00:00Z","timestamp":1737504000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"MUR","award":["FSE REACT EU - PON R\\&I 2014-2020"],"award-info":[{"award-number":["FSE REACT EU - PON R\\&I 2014-2020"]}]},{"DOI":"10.13039\/501100000781","name":"European Research Council","doi-asserted-by":"publisher","award":["Advanced Grant 788893 AMDROMA"],"award-info":[{"award-number":["Advanced Grant 788893 AMDROMA"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"name":"NWO","award":["OCENW.GROOT.2019.015 ``Optimization for and with Machine Learning (OPTIMAL)''"],"award-info":[{"award-number":["OCENW.GROOT.2019.015 ``Optimization for and with Machine Learning (OPTIMAL)''"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2025,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Missing values arise routinely in real-world sequential (string) datasets due to: (1) imprecise data measurements; (2) flexible sequence modeling, such as binding profiles of molecular sequences; or (3) the existence of confidential information in a dataset which has been deleted deliberately for privacy protection. 
In order to analyze such datasets, it is often important to replace each missing value with one or more <jats:italic>valid<\/jats:italic> letters, in an efficient and effective way. Here we formalize this task as a combinatorial optimization problem: the set of constraints includes the <jats:italic>context<\/jats:italic> of the missing value (i.e., its vicinity) as well as a finite set of user-defined <jats:italic>forbidden<\/jats:italic> patterns, modeling, for instance, implausible or confidential patterns; and the objective function seeks to <jats:italic>minimize the number of new letters<\/jats:italic> we introduce. Algorithmically, our problem translates to finding shortest paths in special graphs that contain <jats:italic>forbidden edges<\/jats:italic> representing the forbidden patterns. Our work makes the following contributions: (1) we design a linear-time algorithm to solve this problem for strings over constant-sized alphabets; (2) we show how our algorithm can be effortlessly applied to <jats:italic>fully<\/jats:italic> sanitize a private string in the presence of a set of fixed-length forbidden patterns [Bernardini et al. 2021a]; (3) we propose a methodology for sanitizing and clustering a collection of private strings that utilizes our algorithm and an effective and efficiently computable distance measure; and (4) we present extensive experimental results showing that our methodology can efficiently sanitize a collection of private strings while preserving clustering quality, outperforming the state of the art and baselines. 
To arrive at our theoretical results, we employ techniques from formal languages and combinatorial pattern matching.<\/jats:p>","DOI":"10.1007\/s10618-024-01074-3","type":"journal-article","created":{"date-parts":[[2025,1,22]],"date-time":"2025-01-22T15:00:03Z","timestamp":1737558003000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Missing value replacement in strings and applications"],"prefix":"10.1007","volume":"39","author":[{"given":"Giulia","family":"Bernardini","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chang","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Grigorios","family":"Loukides","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alberto","family":"Marchetti-Spaccamela","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Solon P.","family":"Pissis","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leen","family":"Stougie","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michelle","family":"Sweering","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,1,22]]},"reference":[{"key":"1074_CR1","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1016\/j.datak.2011.10.002","volume":"72","author":"O Abul","year":"2012","unstructured":"Abul O, G\u00f6k\u00e7e H (2012) Knowledge hiding from tree and graph databases. Data Knowl Eng 72:148\u2013171. 
https:\/\/doi.org\/10.1016\/j.datak.2011.10.002","journal-title":"Data Knowl Eng"},{"issue":"12","key":"1074_CR2","doi-asserted-by":"publisher","first-page":"1709","DOI":"10.1109\/TKDE.2009.213","volume":"22","author":"O Abul","year":"2010","unstructured":"Abul O, Bonchi F, Giannotti F (2010) Hiding sequential and spatiotemporal patterns. IEEE Trans Knowl Data Eng 22(12):1709\u20131723. https:\/\/doi.org\/10.1109\/TKDE.2009.213","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"4","key":"1074_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1824777.1824779","volume":"6","author":"MR Ackermann","year":"2010","unstructured":"Ackermann MR, Bl\u00f6mer J, Sohler C (2010) Clustering for metric and nonmetric distance measures. ACM Trans Algorithms 6(4):1\u201326. https:\/\/doi.org\/10.1145\/1824777.1824779","journal-title":"ACM Trans Algorithms"},{"key":"1074_CR4","doi-asserted-by":"publisher","unstructured":"Aggarwal CC (2008) On unifying privacy and uncertain data models. In: Proceedings of the 24th International Conference on Data Engineering (ICDE). IEEE Computer Society, pp 386\u2013395, https:\/\/doi.org\/10.1109\/ICDE.2008.4497447","DOI":"10.1109\/ICDE.2008.4497447"},{"key":"1074_CR5","doi-asserted-by":"publisher","unstructured":"Aggarwal CC (2009) Managing and Mining Uncertain Data, Advances in Database Systems, vol 35. Kluwer. https:\/\/doi.org\/10.1007\/978-0-387-09690-2","DOI":"10.1007\/978-0-387-09690-2"},{"key":"1074_CR6","doi-asserted-by":"publisher","unstructured":"Aggarwal CC, Parthasarathy S (2001) Mining massively incomplete data sets by conceptual reconstruction. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 
ACM, pp 227\u2013232, https:\/\/doi.org\/10.1145\/502512.502543","DOI":"10.1145\/502512.502543"},{"issue":"5","key":"1074_CR7","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1109\/TKDE.2008.190","volume":"21","author":"CC Aggarwal","year":"2009","unstructured":"Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609\u2013623. https:\/\/doi.org\/10.1109\/TKDE.2008.190","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1074_CR8","doi-asserted-by":"publisher","unstructured":"Aggarwal CC, Zhai C (2012) A survey of text clustering algorithms. In: Aggarwal CC, Zhai C (eds) Mining Text Data. Springer, pp 77\u2013128, https:\/\/doi.org\/10.1007\/978-1-4614-3223-4_4","DOI":"10.1007\/978-1-4614-3223-4_4"},{"key":"1074_CR9","doi-asserted-by":"publisher","first-page":"474","DOI":"10.1177\/1073110516667943","volume":"44","author":"I Ajunwa","year":"2016","unstructured":"Ajunwa I, Crawford K, Ford J (2016) Health and big data: an ethical framework for health information collection by corporate wellness programs. J Law Med Ethics 44:474\u2013480","journal-title":"J Law Med Ethics"},{"key":"1074_CR10","unstructured":"Allard T, B\u00e9ziaud L, Gambs S (2020) Online publication of court records: circumventing the privacy-transparency trade-off. In: 1st International Workshop on Law and Machine Learning LML2020, in conjunction with ICML 2020"},{"issue":"1\u20134","key":"1074_CR11","doi-asserted-by":"publisher","first-page":"41","DOI":"10.3233\/FI-2020-1947","volume":"175","author":"M Alzamel","year":"2020","unstructured":"Alzamel M, Ayad LAK, Bernardini G et al (2020) Comparing degenerate strings. Fundam Inform 175(1\u20134):41\u201358. 
https:\/\/doi.org\/10.3233\/FI-2020-1947","journal-title":"Fundam Inform"},{"issue":"1","key":"1074_CR12","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1109\/TCBB.2021.3136792","volume":"20","author":"N Anjum","year":"2023","unstructured":"Anjum N, Nabil RL, Rafi RI et al (2023) CD-MAWS: an alignment-free phylogeny estimation method using cosine distance on minimal absent word sets. IEEE ACM Trans Comput Biol Bioinform 20(1):196\u2013205. https:\/\/doi.org\/10.1109\/TCBB.2021.3136792","journal-title":"IEEE ACM Trans Comput Biol Bioinform"},{"issue":"3","key":"1074_CR13","doi-asserted-by":"publisher","first-page":"1087","DOI":"10.1137\/15M1053128","volume":"47","author":"A Backurs","year":"2018","unstructured":"Backurs A, Indyk P (2018) Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). SIAM J Comput 47(3):1087\u20131097. https:\/\/doi.org\/10.1137\/15M1053128","journal-title":"SIAM J Comput"},{"key":"1074_CR14","doi-asserted-by":"crossref","unstructured":"Bansal P, Deshpande P, Sarawagi S (2021) Missing value imputation on multidimensional time series. Proc VLDB Endow 14(11), 2533\u20132545. https:\/\/doi.org\/10.14778\/3476249.3476300","DOI":"10.14778\/3476249.3476300"},{"key":"1074_CR15","doi-asserted-by":"publisher","unstructured":"Bernardini G, Chen H, Conte A, et\u00a0al (2019) String sanitization: A combinatorial approach. In: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD, Proceedings, Part I, Lecture Notes in Computer Science, vol 11906. Springer, pp 627\u2013644, https:\/\/doi.org\/10.1007\/978-3-030-46150-8_37","DOI":"10.1007\/978-3-030-46150-8_37"},{"key":"1074_CR16","doi-asserted-by":"publisher","unstructured":"Bernardini G, Chen H, Loukides G, et\u00a0al (2020a) String sanitization under edit distance. In: 31st Annual Symposium on Combinatorial Pattern Matching, (CPM), LIPIcs, vol 161. 
Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik, pp 7:1\u20137:14, https:\/\/doi.org\/10.4230\/LIPIcs.CPM.2020.7","DOI":"10.4230\/LIPIcs.CPM.2020.7"},{"key":"1074_CR17","doi-asserted-by":"publisher","unstructured":"Bernardini G, Conte A, Gourdel G, et\u00a0al (2020b) Hide and mine in strings: Hardness and algorithms. In: ICDM. IEEE, pp 924\u2013929, https:\/\/doi.org\/10.1109\/ICDM50108.2020.00103","DOI":"10.1109\/ICDM50108.2020.00103"},{"issue":"1","key":"1074_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3418683","volume":"15","author":"G Bernardini","year":"2020","unstructured":"Bernardini G, Chen H, Conte A, Grossi R, Loukides G, Pisanti N, Pissis SP, Rosone G, Sweering M (2020) Combinatorial algorithms for string sanitization. ACM Trans Knowl Discov Data 15(1):1\u201334. https:\/\/doi.org\/10.1145\/3418683","journal-title":"ACM Trans Knowl Discov Data"},{"key":"1074_CR19","doi-asserted-by":"publisher","unstructured":"Bernardini G, Marchetti-Spaccamela A, Pissis SP, et\u00a0al (2021b) Constructing strings avoiding forbidden substrings. In: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM), LIPIcs, vol 191. Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik, pp 9:1\u20139:18, https:\/\/doi.org\/10.4230\/LIPIcs.CPM.2021.9","DOI":"10.4230\/LIPIcs.CPM.2021.9"},{"issue":"6","key":"1074_CR20","doi-asserted-by":"publisher","first-page":"5948","DOI":"10.1109\/TKDE.2022.3158063","volume":"35","author":"G Bernardini","year":"2023","unstructured":"Bernardini G, Conte A, Gourdel G et al (2023) Hide and mine in strings: hardness, algorithms, and experiments. IEEE Trans Knowl Data Eng 35(6):5948\u20135963. https:\/\/doi.org\/10.1109\/TKDE.2022.3158063","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1074_CR21","doi-asserted-by":"publisher","unstructured":"Bie\u00dfmann F, Salinas D, Schelter S, et\u00a0al (2018) \"deep\" learning for missing value imputation in tables with non-numerical data. 
In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, (CIKM). ACM, pp 2017\u20132025, https:\/\/doi.org\/10.1145\/3269206.3272005","DOI":"10.1145\/3269206.3272005"},{"key":"1074_CR22","doi-asserted-by":"publisher","unstructured":"Bonomi L, Fan L, Jin H (2016) An information-theoretic approach to individual sequential data sanitization. In: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. ACM, pp 337\u2013346, https:\/\/doi.org\/10.1145\/2835776.2835828","DOI":"10.1145\/2835776.2835828"},{"issue":"1","key":"1074_CR23","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332. https:\/\/doi.org\/10.1023\/A:1010933404324","journal-title":"Mach Learn"},{"issue":"1","key":"1074_CR24","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1080\/07391102.1986.10507643","volume":"4","author":"V Brendel","year":"1986","unstructured":"Brendel V, Beckmann JS, Trifonov EN (1986) Linguistics of nucleotide sequences: morphology and comparison of vocabularies. J Biomol Struct Dyn 4(1):11\u201321","journal-title":"J Biomol Struct Dyn"},{"key":"1074_CR25","doi-asserted-by":"publisher","unstructured":"Breve B, Caruccio L, Deufemia V, et\u00a0al (2022) RENUVER: A missing value imputation algorithm based on relaxed functional dependencies. In: Proceedings of the 25th International Conference on Extending Database Technology (EDBT). OpenProceedings.org, pp 1:52\u20131:64, https:\/\/doi.org\/10.5441\/002\/edbt.2022.05","DOI":"10.5441\/002\/edbt.2022.05"},{"key":"1074_CR26","first-page":"758","volume":"49","author":"NG de Bruijn","year":"1946","unstructured":"de Bruijn NG (1946) A combinatorial problem. 
Koninklijke Nederlandse Akademie V Wetenschappen 49:758\u2013764","journal-title":"Koninklijke Nederlandse Akademie V Wetenschappen"},{"key":"1074_CR27","doi-asserted-by":"crossref","unstructured":"Van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. Journal of statistical software. 2011 Dec, 12(45), pp. 1\u201367. https:\/\/doi.org\/10.18637\/jss.v045.i03","DOI":"10.18637\/jss.v045.i03"},{"key":"1074_CR28","doi-asserted-by":"publisher","unstructured":"Calders T, Goethals B, Mampaey M (2007) Mining itemsets in the presence of missing values. In: Proceedings of the 2007 ACM Symposium on Applied Computing (SAC). ACM, pp 404\u2013408, https:\/\/doi.org\/10.1145\/1244002.1244097","DOI":"10.1145\/1244002.1244097"},{"key":"1074_CR29","unstructured":"Cormen TH, Leiserson CE, Rivest RL, et\u00a0al (2009) Introduction to Algorithms, 3rd Edition. MIT Press, http:\/\/mitpress.mit.edu\/books\/introduction-algorithms"},{"issue":"3","key":"1074_CR30","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/S0020-0190(98)00104-5","volume":"67","author":"M Crochemore","year":"1998","unstructured":"Crochemore M, Mignosi F, Restivo A (1998) Automata and forbidden words. Inf Process Lett 67(3):111\u2013117. https:\/\/doi.org\/10.1016\/S0020-0190(98)00104-5","journal-title":"Inf Process Lett"},{"key":"1074_CR31","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511546853","volume-title":"Algorithms on strings","author":"M Crochemore","year":"2007","unstructured":"Crochemore M, Hancart C, Lecroq T (2007) Algorithms on strings. Cambridge University Press, Cambridge"},{"key":"1074_CR32","unstructured":"log dataset H (2022) https:\/\/github.com\/logpai\/loghub\/tree\/master\/HPC"},{"issue":"11","key":"1074_CR33","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.1093\/nar\/27.11.2369","volume":"27","author":"AL Delcher","year":"1999","unstructured":"Delcher AL, Kasif S, Fleischmann RD et al (1999) Alignment of whole genomes. 
Nucleic Acids Res 27(11):2369\u20132376. https:\/\/doi.org\/10.1093\/nar\/27.11.2369","journal-title":"Nucleic Acids Res"},{"key":"1074_CR34","doi-asserted-by":"publisher","unstructured":"Dong B, Xie S, Gao J, et\u00a0al (2015) Onlinecm: Real-time consensus classification with missing values. In: Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, pp 685\u2013693, https:\/\/doi.org\/10.1137\/1.9781611974010.77","DOI":"10.1137\/1.9781611974010.77"},{"key":"1074_CR35","volume-title":"Applied missing data analysis","author":"CK Enders","year":"2010","unstructured":"Enders CK (2010) Applied missing data analysis. Guilford Press, New York"},{"key":"1074_CR36","doi-asserted-by":"publisher","unstructured":"Farach M (1997) Optimal suffix tree construction with large alphabets. In: 38th Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society, pp 137\u2013143, https:\/\/doi.org\/10.1109\/SFCS.1997.646102","DOI":"10.1109\/SFCS.1997.646102"},{"issue":"1","key":"1074_CR37","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1016\/j.jda.2007.01.004","volume":"6","author":"A Figueroa","year":"2008","unstructured":"Figueroa A, Goldstein A, Jiang T et al (2008) Approximate clustering of incomplete fingerprints. J Discrete Algorithms 6(1):103\u2013108. https:\/\/doi.org\/10.1016\/j.jda.2007.01.004","journal-title":"J Discrete Algorithms"},{"key":"1074_CR38","doi-asserted-by":"publisher","unstructured":"Fiot C, Laurent A, Teisseire M (2007) Approximate sequential patterns for incomplete sequence database mining. In: FUZZ-IEEE 2007, IEEE International Conference on Fuzzy Systems, Proceedings. 
IEEE, pp 1\u20136, https:\/\/doi.org\/10.1109\/FUZZY.2007.4295445","DOI":"10.1109\/FUZZY.2007.4295445"},{"issue":"3","key":"1074_CR39","doi-asserted-by":"publisher","first-page":"538","DOI":"10.1145\/828.1884","volume":"31","author":"ML Fredman","year":"1984","unstructured":"Fredman ML, Koml\u00f3s J, Szemer\u00e9di E (1984) Storing a sparse table with O(1) worst case access time. J ACM 31(3):538\u2013544. https:\/\/doi.org\/10.1145\/828.1884","journal-title":"J ACM"},{"key":"1074_CR40","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/j.ssresearch.2014.11.003","volume":"50","author":"S Fuller","year":"2015","unstructured":"Fuller S, Stecy-Hildebrandt N (2015) Career pathways for temporary workers: exploring heterogeneous mobility dynamics with sequence analysis. Soc Sci Res 50:76\u201399. https:\/\/doi.org\/10.1016\/j.ssresearch.2014.11.003","journal-title":"Soc Sci Res"},{"issue":"6","key":"1074_CR41","doi-asserted-by":"publisher","first-page":"552","DOI":"10.1016\/J.DATAK.2008.12.001","volume":"68","author":"BCM Fung","year":"2009","unstructured":"Fung BCM, Wang K, Wang L et al (2009) Privacy-preserving data publishing for cluster analysis. Data Knowl Eng 68(6):552\u2013575. https:\/\/doi.org\/10.1016\/J.DATAK.2008.12.001","journal-title":"Data Knowl Eng"},{"key":"1074_CR42","doi-asserted-by":"crossref","unstructured":"Girgis HZ (2022) Meshclust v3.0: high-quality clustering of DNA  sequences using the mean shift algorithm and alignment-free identity scores","DOI":"10.1101\/2022.01.15.476464"},{"key":"1074_CR43","doi-asserted-by":"publisher","unstructured":"Gkoulalas-Divanis A, Loukides G (2011) Revisiting sequential pattern hiding to enhance utility. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 
ACM, pp 1316\u20131324, https:\/\/doi.org\/10.1145\/2020408.2020605","DOI":"10.1145\/2020408.2020605"},{"issue":"5","key":"1074_CR44","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1109\/TKDE.2008.199","volume":"21","author":"A Gkoulalas-Divanis","year":"2009","unstructured":"Gkoulalas-Divanis A, Verykios VS (2009) Exact knowledge hiding through database extension. IEEE Trans Knowl Data Eng 21(5):699\u2013713. https:\/\/doi.org\/10.1109\/TKDE.2008.199","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1074_CR45","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s13015-016-0076-6","volume":"11","author":"R Grossi","year":"2016","unstructured":"Grossi R, Iliopoulos CS, Mercas R et al (2016) Circular sequence comparison: algorithms and applications. Algorithms Mol Biol 11:12. https:\/\/doi.org\/10.1186\/s13015-016-0076-6","journal-title":"Algorithms Mol Biol"},{"key":"1074_CR46","doi-asserted-by":"publisher","unstructured":"Gwadera R, Gkoulalas-Divanis A, Loukides G (2013) Permutation-based sequential pattern hiding. In: 2013 IEEE 13th International Conference on Data Mining. IEEE Computer Society, pp 241\u2013250, https:\/\/doi.org\/10.1109\/ICDM.2013.57","DOI":"10.1109\/ICDM.2013.57"},{"key":"1074_CR47","unstructured":"Halpin B (2012) Multiple imputation for life-course sequence data. Tech. rep., University of Limerick, Technical Report WP2012-01"},{"key":"1074_CR48","unstructured":"Halpin B (2013) Imputing sequence data: Extensions to initial and terminal gaps. Tech. rep., Stata\u2019s Working Paper WP2013-01"},{"issue":"3","key":"1074_CR49","doi-asserted-by":"publisher","first-page":"590","DOI":"10.1177\/1536867X1601600303","volume":"16","author":"B Halpin","year":"2016","unstructured":"Halpin B (2016) Multiple imputation for categorical time series. Stand Genomic Sci 16(3):590\u2013612. 
https:\/\/doi.org\/10.1177\/1536867X1601600303","journal-title":"Stand Genomic Sci"},{"key":"1074_CR50","doi-asserted-by":"crossref","unstructured":"Hilbe J (2009) Logistic Regression Models. Chapman and Hall\/CRC","DOI":"10.1201\/9781420075779"},{"key":"1074_CR51","doi-asserted-by":"publisher","unstructured":"Hong Y, Vaidya J, Lu H, et\u00a0al (2012) Differentially private search log sanitization with optimal output utility. In: 15th International Conference on Extending Database Technology, EDBT. ACM, pp 50\u201361, https:\/\/doi.org\/10.1145\/2247596.2247604","DOI":"10.1145\/2247596.2247604"},{"issue":"4","key":"1074_CR52","doi-asserted-by":"publisher","first-page":"512","DOI":"10.1006\/jcss.2001.1774","volume":"63","author":"R Impagliazzo","year":"2001","unstructured":"Impagliazzo R, Paturi R, Zane F (2001) Which problems have strongly exponential complexity? J Comput Syst Sci 63(4):512\u2013530. https:\/\/doi.org\/10.1006\/jcss.2001.1774","journal-title":"J Comput Syst Sci"},{"key":"1074_CR53","doi-asserted-by":"publisher","unstructured":"Italiano GF, Prezza N, Sinaimeri B, et\u00a0al (2021) Compressed Weighted de Bruijn Graphs. In: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM), Leibniz International Proceedings in Informatics (LIPIcs), vol 191. Schloss Dagstuhl \u2013 Leibniz-Zentrum f\u00fcr Informatik, Dagstuhl, Germany, pp 16:1\u201316:16, https:\/\/doi.org\/10.4230\/LIPIcs.CPM.2021.16","DOI":"10.4230\/LIPIcs.CPM.2021.16"},{"key":"1074_CR54","doi-asserted-by":"publisher","unstructured":"Jha S, Kruger L, McDaniel PD (2005) Privacy preserving clustering. In: di\u00a0Vimercati SDC, Syverson PF, Gollmann D (eds) Computer Security - ESORICS 2005, 10th European Symposium on Research in Computer Security, Milan, Italy, September 12-14, 2005, Proceedings, Lecture Notes in Computer Science, vol 3679. 
Springer, pp 397\u2013417, https:\/\/doi.org\/10.1007\/11555827_23","DOI":"10.1007\/11555827_23"},{"issue":"3","key":"1074_CR55","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1137\/0137041","volume":"37","author":"O Kariv","year":"1979","unstructured":"Kariv O, Hakimi SL (1979) An algorithmic approach to network location problems. ii: the p-medians. SIAM J Appl Math 37(3):539\u2013560","journal-title":"SIAM J Appl Math"},{"issue":"4","key":"1074_CR56","doi-asserted-by":"publisher","first-page":"1889","DOI":"10.1109\/TKDE.2020.3001694","volume":"34","author":"N Karmitsa","year":"2022","unstructured":"Karmitsa N, Taheri S, Bagirov AM et al (2022) Missing value imputation via clusterwise linear regression. IEEE Trans Knowl Data Eng 34(4):1889\u20131901. https:\/\/doi.org\/10.1109\/TKDE.2020.3001694","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"2","key":"1074_CR57","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1147\/rd.312.0249","volume":"31","author":"RM Karp","year":"1987","unstructured":"Karp RM, Rabin MO (1987) Efficient randomized pattern-matching algorithms. IBM J Res Dev 31(2):249\u2013260. https:\/\/doi.org\/10.1147\/rd.312.0249","journal-title":"IBM J Res Dev"},{"key":"1074_CR58","doi-asserted-by":"publisher","unstructured":"Kaufman L, Rousseeuw PJ (1990) Partitioning Around Medoids (Program PAM). John Wiley & Sons Ltd, chap 2:68\u2013125. https:\/\/doi.org\/10.1002\/9780470316801.ch2","DOI":"10.1002\/9780470316801.ch2"},{"key":"1074_CR59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-8-286","volume":"8","author":"A Kelil","year":"2007","unstructured":"Kelil A, Wang S, Brzezinski R et al (2007) CLUSS: clustering of protein sequences based on a new similarity measure. BMC Bioinform 8:1\u20139. 
https:\/\/doi.org\/10.1186\/1471-2105-8-286","journal-title":"BMC Bioinform"},{"issue":"1\u20132","key":"1074_CR60","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1093\/biomet\/30.1-2.81","volume":"30","author":"MG Kendall","year":"1938","unstructured":"Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1\u20132):81\u201393. https:\/\/doi.org\/10.1093\/biomet\/30.1-2.81","journal-title":"Biometrika"},{"issue":"1","key":"1074_CR61","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/j.cell.2013.09.006","volume":"155","author":"DC Koboldt","year":"2013","unstructured":"Koboldt DC, Steinberg KM, Larson DE et al (2013) The next-generation sequencing revolution and its impact on genomics. Cell 155(1):27\u201338","journal-title":"Cell"},{"issue":"3","key":"1074_CR62","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1109\/TKDE.2016.2628180","volume":"29","author":"B Li","year":"2017","unstructured":"Li B, Vorobeychik Y, Li M et al (2017) Scalable iterative classification for sanitizing large-scale datasets. IEEE Trans Knowl Data Eng 29(3):698\u2013711. https:\/\/doi.org\/10.1109\/TKDE.2016.2628180","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"5","key":"1074_CR63","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","volume":"26","author":"H Li","year":"2010","unstructured":"Li H, Durbin R (2010) Fast and accurate long-read alignment with burrows-wheeler transform. Bioinform 26(5):589\u2013595. https:\/\/doi.org\/10.1093\/bioinformatics\/btp698","journal-title":"Bioinform"},{"key":"1074_CR64","doi-asserted-by":"publisher","unstructured":"Li Q, Zheng Y, Xie X, et\u00a0al (2008) Mining user similarity based on location history. In: Aref WG, Mokbel MF, Schneider M (eds) 16th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2008, November 5-7, 2008, Irvine, California, USA, Proceedings. 
ACM, p\u00a034, https:\/\/doi.org\/10.1145\/1463434.1463477","DOI":"10.1145\/1463434.1463477"},{"issue":"15","key":"1074_CR65","doi-asserted-by":"publisher","first-page":"1966","DOI":"10.1093\/bioinformatics\/btp336","volume":"25","author":"R Li","year":"2009","unstructured":"Li R, Yu C, Li Y et al (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966\u20131967. https:\/\/doi.org\/10.1093\/bioinformatics\/btp336","journal-title":"Bioinformatics"},{"issue":"1","key":"1074_CR66","doi-asserted-by":"publisher","first-page":"12226","DOI":"10.1038\/s41598-017-12493-2","volume":"7","author":"Y Li","year":"2017","unstructured":"Li Y, He L, Lucy He R, Yau SS (2017) A novel fast vector method for genetic sequence comparison. Sci Rep 7(1):12226. https:\/\/doi.org\/10.1038\/s41598-017-12493-2","journal-title":"Sci Rep"},{"issue":"9","key":"1074_CR67","doi-asserted-by":"publisher","first-page":"2550","DOI":"10.1109\/TKDE.2015.2411276","volume":"27","author":"Z Li","year":"2015","unstructured":"Li Z, Qin L, Cheng H et al (2015) TRIP: an interactive retrieving-inferring data imputation approach. IEEE Trans Knowl Data Eng 27(9):2550\u20132563. https:\/\/doi.org\/10.1109\/TKDE.2015.2411276","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1074_CR68","doi-asserted-by":"publisher","unstructured":"Lin JC, Zhang Y, Fournier-Viger P, et\u00a0al (2018) A metaheuristic algorithm for hiding sensitive itemsets. In: Database and Expert Systems Applications - 29th International Conference, DEXA, Lecture Notes in Computer Science, vol 11030. Springer, pp 492\u2013498, https:\/\/doi.org\/10.1007\/978-3-319-98812-2_45","DOI":"10.1007\/978-3-319-98812-2_45"},{"key":"1074_CR69","doi-asserted-by":"publisher","unstructured":"Lin S, Wu X, Mart\u00ednez GJ, et\u00a0al (2020) Filling Missing Values on Wearable-Sensory Time Series Data, SIAM, pp 46\u201354. 
https:\/\/doi.org\/10.1137\/1.9781611976236.6","DOI":"10.1137\/1.9781611976236.6"},{"key":"1074_CR70","volume-title":"Statistical Analysis with Missing Data","author":"RJ Little","year":"2019","unstructured":"Little RJ, Rubin DB (2019) Statistical Analysis with Missing Data, 3rd edn. John Wiley & Sons Inc, USA","edition":"3"},{"key":"1074_CR71","doi-asserted-by":"publisher","unstructured":"Liu A, Zheng K, Li L, et\u00a0al (2015) Efficient secure similarity computation on encrypted trajectory data. In: 31st IEEE International Conference on Data Engineering (ICDE). IEEE Computer Society, pp 66\u201377, https:\/\/doi.org\/10.1109\/ICDE.2015.7113273","DOI":"10.1109\/ICDE.2015.7113273"},{"key":"1074_CR72","doi-asserted-by":"publisher","unstructured":"Loukides G, Gwadera R (2015) Optimal event sequence sanitization. In: Proceedings of the 2015 SIAM International Conference on Data Mining. SIAM, pp 775\u2013783, https:\/\/doi.org\/10.1137\/1.9781611974010.87","DOI":"10.1137\/1.9781611974010.87"},{"key":"1074_CR73","doi-asserted-by":"publisher","unstructured":"Loukides G, Pissis SP (2021) Bidirectional string anchors: A new string sampling mechanism. In: 29th Annual European Symposium on Algorithms (ESA), LIPIcs, vol 204. Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik, pp 64:1\u201364:21, https:\/\/doi.org\/10.4230\/LIPIcs.ESA.2021.64","DOI":"10.4230\/LIPIcs.ESA.2021.64"},{"issue":"6","key":"1074_CR74","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3412364","volume":"14","author":"Q Ma","year":"2020","unstructured":"Ma Q, Gu Y, Lee WC, Yu G, Liu H, Wu X (2020) REMIAN: real-time and error-tolerant missing value imputation. ACM Trans Knowl Discov Data 14(6):1\u201338. 
https:\/\/doi.org\/10.1145\/3412364","journal-title":"ACM Trans Knowl Discov Data"},{"key":"1074_CR75","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1016\/j.alcr.2015.06.002","volume":"26","author":"A McMunn","year":"2015","unstructured":"McMunn A, Lacey R, Worts D et al (2015) De-standardization and gender convergence in work-family life courses in great britain: a multi-channel sequence analysis. Adv Life Course Res 26:60\u201375. https:\/\/doi.org\/10.1016\/j.alcr.2015.06.002","journal-title":"Adv Life Course Res"},{"issue":"5","key":"1074_CR76","doi-asserted-by":"publisher","first-page":"873","DOI":"10.1016\/j.jmva.2006.11.013","volume":"98","author":"M Meila","year":"2007","unstructured":"Meila M (2007) Comparing clusterings-an information based distance. J Multivar Anal 98(5):873\u2013895. https:\/\/doi.org\/10.1016\/j.jmva.2006.11.013","journal-title":"J Multivar Anal"},{"key":"1074_CR77","doi-asserted-by":"publisher","unstructured":"Mieno T, Pissis SP, Stougie L, et\u00a0al (2021) String sanitization under edit distance: Improved and generalized. In: 32nd Annual Symposium on Combinatorial Pattern Matching (CPM), LIPIcs, vol 191. Schloss Dagstuhl - Leibniz-Zentrum f\u00fcr Informatik, pp 19:1\u201319:18, https:\/\/doi.org\/10.4230\/LIPIcs.CPM.2021.19","DOI":"10.4230\/LIPIcs.CPM.2021.19"},{"key":"1074_CR78","doi-asserted-by":"publisher","DOI":"10.1017\/cbo9780511814075","volume-title":"Randomized Algorithms","author":"R Motwani","year":"1995","unstructured":"Motwani R, Raghavan P (1995) Randomized Algorithms. Cambridge University Press, Cambridge. https:\/\/doi.org\/10.1017\/cbo9780511814075"},{"key":"1074_CR79","doi-asserted-by":"publisher","unstructured":"Nguyen D, Luo W, Nguyen TD, et\u00a0al (2018a) Learning graph representation via frequent subgraphs. In: Proceedings of the 2018 SIAM International Conference on Data Mining (SDM). 
SIAM, pp 306\u2013314, https:\/\/doi.org\/10.1137\/1.9781611975321.35","DOI":"10.1137\/1.9781611975321.35"},{"key":"1074_CR80","doi-asserted-by":"publisher","unstructured":"Nguyen D, Luo W, Nguyen TD, et\u00a0al (2018b) Sqn2vec: Learning sequence representation via sequential patterns with a gap constraint. In: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD, Proceedings, Part II, Lecture Notes in Computer Science, vol 11052. Springer, pp 569\u2013584, https:\/\/doi.org\/10.1007\/978-3-030-10928-8_34","DOI":"10.1007\/978-3-030-10928-8_34"},{"key":"1074_CR81","doi-asserted-by":"publisher","first-page":"2837","DOI":"10.5555\/1756006.1953024","volume":"11","author":"XV Nguyen","year":"2010","unstructured":"Nguyen XV, Epps J, Bailey J (2010) Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J Mach Learn Res 11:2837\u20132854. https:\/\/doi.org\/10.5555\/1756006.1953024","journal-title":"J Mach Learn Res"},{"key":"1074_CR82","unstructured":"package MR (2022a) https:\/\/cran.r-project.org\/web\/packages\/missForest\/index.html"},{"key":"1074_CR83","unstructured":"package SR (2022b) https:\/\/cran.r-project.org\/web\/packages\/seqimpute\/index.html"},{"issue":"1","key":"1074_CR84","first-page":"1","volume":"19","author":"TE Raghunathan","year":"2003","unstructured":"Raghunathan TE, Reiter JP, Rubin DB (2003) Multiple imputation for statistical disclosure limitation. J Off Stat 19(1):1","journal-title":"J Off Stat"},{"issue":"2","key":"1074_CR85","doi-asserted-by":"publisher","first-page":"537","DOI":"10.1142\/S0219720006002028","volume":"4","author":"M R\u00e9gnier","year":"2006","unstructured":"R\u00e9gnier M, Vandenbogaert M (2006) Comparison of statistical significance criteria. J Bioinform Comput Biol 4(2):537\u2013552. 
https:\/\/doi.org\/10.1142\/S0219720006002028","journal-title":"J Bioinform Comput Biol"},{"key":"1074_CR86","doi-asserted-by":"crossref","unstructured":"Rekatsinas T, Chu X, Ilyas IF, et al. (2017) Holoclean: Holistic data repairs with probabilistic inference. Proc VLDB Endow 10(11), 1190\u20131201. https:\/\/doi.org\/10.14778\/3137628.3137631","DOI":"10.14778\/3137628.3137631"},{"key":"1074_CR87","volume-title":"Statistical Analysis with Missing Data","author":"D Rubin","year":"2019","unstructured":"Rubin D, Little RJA (2019) Statistical Analysis with Missing Data. John Wiley & Sons, Hoboken"},{"key":"1074_CR88","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316696","volume-title":"Multiple Imputation for Nonresponse in Surveys","author":"DB Rubin","year":"1987","unstructured":"Rubin DB (1987) Multiple Imputation for Nonresponse in Surveys. Wiley, Hoboken"},{"issue":"2","key":"1074_CR89","first-page":"461","volume":"9","author":"DB Rubin","year":"1993","unstructured":"Rubin DB (1993) Statistical disclosure limitation. J Off Stat 9(2):461\u2013468","journal-title":"J Off Stat"},{"key":"1074_CR90","doi-asserted-by":"publisher","first-page":"179","DOI":"10.4153\/CJM-1961-015-3","volume":"13","author":"C Schensted","year":"1961","unstructured":"Schensted C (1961) Longest increasing and decreasing subsequences. Can J Math 13:179\u2013191. https:\/\/doi.org\/10.4153\/CJM-1961-015-3","journal-title":"Can J Math"},{"issue":"5","key":"1074_CR91","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1101\/072116","volume":"27","author":"VA Schneider","year":"2017","unstructured":"Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS (2017) Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 27(5):849\u201364. 
https:\/\/doi.org\/10.1101\/072116","journal-title":"Genome Res"},{"key":"1074_CR92","doi-asserted-by":"publisher","first-page":"101804","DOI":"10.1016\/j.is.2021.101804","volume":"101","author":"E Schubert","year":"2021","unstructured":"Schubert E, Rousseeuw PJ (2021) Fast and eager k-medoids clustering: O(k) runtime improvement of the pam, clara, and CLARANS algorithms. Inf Syst 101:101804. https:\/\/doi.org\/10.1016\/j.is.2021.101804","journal-title":"Inf Syst"},{"key":"1074_CR93","doi-asserted-by":"publisher","first-page":"101804","DOI":"10.1016\/J.IS.2021.101804","volume":"101","author":"E Schubert","year":"2021","unstructured":"Schubert E, Rousseeuw PJ (2021) Fast and eager k-medoids clustering: O(k) runtime improvement of the pam, clara, and CLARANS algorithms. Inf Syst 101:101804. https:\/\/doi.org\/10.1016\/J.IS.2021.101804","journal-title":"Inf Syst"},{"issue":"1","key":"1074_CR94","doi-asserted-by":"publisher","first-page":"505","DOI":"10.1093\/nar\/12.1Part2.505","volume":"12","author":"R Staden","year":"1984","unstructured":"Staden R (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res 12(1):505\u2013519. https:\/\/doi.org\/10.1093\/nar\/12.1Part2.505","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"1074_CR95","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1007\/s10115-015-0862-3","volume":"47","author":"EC Stavropoulos","year":"2016","unstructured":"Stavropoulos EC, Verykios VS, Kagklis V (2016) A transversal hypergraph approach for the frequent itemset hiding problem. Knowl Inf Syst 47(3):625\u2013645. https:\/\/doi.org\/10.1007\/s10115-015-0862-3","journal-title":"Knowl Inf Syst"},{"issue":"1","key":"1074_CR96","doi-asserted-by":"publisher","first-page":"2542","DOI":"10.1038\/s41467-018-04964-5","volume":"9","author":"M Steinegger","year":"2018","unstructured":"Steinegger M, S\u00f6ding J (2018) Clustering huge protein sequence sets in linear time. Nat Commun 9(1):2542. 
https:\/\/doi.org\/10.1038\/s41467-018-04964-5","journal-title":"Nat Commun"},{"key":"1074_CR97","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1002\/ajmg.1571","volume":"106","author":"OK Steinlein","year":"2001","unstructured":"Steinlein OK (2001) Genes and mutations in idiopathic epilepsy. Am J Med Genet 106:139\u2013145. https:\/\/doi.org\/10.1002\/ajmg.1571","journal-title":"Am J Med Genet"},{"issue":"1","key":"1074_CR98","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1093\/bioinformatics\/btr597","volume":"28","author":"DJ Stekhoven","year":"2012","unstructured":"Stekhoven DJ, B\u00fchlmann P (2012) Missforest - non-parametric missing value imputation for mixed-type data. Bioinform 28(1):112\u2013118. https:\/\/doi.org\/10.1093\/bioinformatics\/btr597","journal-title":"Bioinform"},{"issue":"6","key":"1074_CR99","doi-asserted-by":"publisher","first-page":"926","DOI":"10.1093\/bioinformatics\/btu739","volume":"31","author":"BE Suzek","year":"2015","unstructured":"Suzek BE, Wang Y, Huang H et al (2015) Uniref clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinform 31(6):926\u2013932. https:\/\/doi.org\/10.1093\/bioinformatics\/btu739","journal-title":"Bioinform"},{"issue":"6","key":"1074_CR100","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1002\/sam.11348","volume":"10","author":"F Tang","year":"2017","unstructured":"Tang F, Ishwaran H (2017) Random forest missing data algorithms. Stat Anal Data Min 10(6):363\u2013377. https:\/\/doi.org\/10.1002\/sam.11348","journal-title":"Stat Anal Data Min"},{"issue":"6","key":"1074_CR101","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","volume":"17","author":"O Troyanskaya","year":"2001","unstructured":"Troyanskaya O, Cantor M, Sherlock G et al (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17(6):520\u2013525. 
https:\/\/doi.org\/10.1093\/bioinformatics\/17.6.520","journal-title":"Bioinformatics"},{"key":"1074_CR102","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2105-9-202","volume":"9","author":"J Tuikkala","year":"2008","unstructured":"Tuikkala J, Elo LL, Nevalainen OS et al (2008) Missing value imputation improves clustering and interpretation of gene expression microarray data. BMC Bioinform 9:1\u20134. https:\/\/doi.org\/10.1186\/1471-2105-9-202","journal-title":"BMC Bioinform"},{"issue":"1","key":"1074_CR103","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/0304-3975(92)90143-4","volume":"92","author":"E Ukkonen","year":"1992","unstructured":"Ukkonen E (1992) Approximate string matching with q-grams and maximal matches. Theor Comput Sci 92(1):191\u2013211. https:\/\/doi.org\/10.1016\/0304-3975(92)90143-4","journal-title":"Theor Comput Sci"},{"issue":"3","key":"1074_CR104","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1007\/BF01206331","volume":"14","author":"E Ukkonen","year":"1995","unstructured":"Ukkonen E (1995) On-line construction of suffix trees. Algorithmica 14(3):249\u2013260. https:\/\/doi.org\/10.1007\/BF01206331","journal-title":"Algorithmica"},{"key":"1074_CR105","doi-asserted-by":"publisher","unstructured":"Vreeken J, Siebes A (2008) Filling in the blanks - krimp minimisation for missing data. In: Proceedings of the 8th IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, pp 1067\u20131072, https:\/\/doi.org\/10.1109\/ICDM.2008.40","DOI":"10.1109\/ICDM.2008.40"},{"key":"1074_CR106","doi-asserted-by":"publisher","unstructured":"Wellenzohn K, B\u00f6hlen MH, Dign\u00f6s A, et\u00a0al (2017) Continuous imputation of missing values in streams of pattern-determining time series. In: Proceedings of the 20th International Conference on Extending Database Technology (EDBT). 
OpenProceedings.org, pp 330\u2013341, https:\/\/doi.org\/10.5441\/002\/edbt.2017.30","DOI":"10.5441\/002\/edbt.2017.30"},{"key":"1074_CR107","doi-asserted-by":"publisher","first-page":"10024","DOI":"10.1109\/ACCESS.2017.2702281","volume":"5","author":"JM Wu","year":"2017","unstructured":"Wu JM, Zhan J, Lin JC (2017) Ant colony system sanitization approach to hiding sensitive itemsets. IEEE Access 5:10024\u201310039. https:\/\/doi.org\/10.1109\/ACCESS.2017.2702281","journal-title":"IEEE Access"},{"issue":"1","key":"1074_CR108","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1109\/TKDE.2007.250583","volume":"19","author":"Y Wu","year":"2007","unstructured":"Wu Y, Chiang C, Chen ALP (2007) Hiding sensitive association rules with limited side effects. IEEE Trans Knowl Data Eng 19(1):29\u201342. https:\/\/doi.org\/10.1109\/TKDE.2007.250583","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"6","key":"1074_CR109","doi-asserted-by":"publisher","first-page":"2526","DOI":"10.1073\/pnas.74.6.2526","volume":"74","author":"C Wuilmart","year":"1977","unstructured":"Wuilmart C, Urbain J, Givol D (1977) On the location of palindromes in immunoglobulin genes. Proc Natl Acad Sci 74(6):2526\u20132530. https:\/\/doi.org\/10.1073\/pnas.74.6.2526","journal-title":"Proc Natl Acad Sci"},{"key":"1074_CR110","doi-asserted-by":"publisher","unstructured":"Yang J, Wang W (2003) CLUSEQ: efficient and effective sequence clustering. In: Proceedings of the 19th International Conference on Data Engineering. IEEE Computer Society, pp 101\u2013112, https:\/\/doi.org\/10.1109\/ICDE.2003.1260785","DOI":"10.1109\/ICDE.2003.1260785"},{"key":"1074_CR111","doi-asserted-by":"publisher","unstructured":"Ying JJC, Lee WC, Weng TC, et\u00a0al (2011) Semantic trajectory mining for location prediction. In: 19th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, ACM-GIS 2011, November 1-4, 2011, Chicago, IL, USA, Proceedings. 
ACM, pp 34\u201343, https:\/\/doi.org\/10.1145\/2093973.2093980","DOI":"10.1145\/2093973.2093980"},{"issue":"5","key":"1074_CR112","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3409382","volume":"53","author":"K Yu","year":"2020","unstructured":"Yu K, Guo X, Liu L, Li J, Wang H, Ling Z, Wu X (2020) Causality-based feature selection: methods and evaluations. ACM Comput Surv 53(5):1\u201336. https:\/\/doi.org\/10.1145\/3409382","journal-title":"ACM Comput Surv"},{"issue":"4","key":"1074_CR113","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3488055","volume":"16","author":"K Yu","year":"2022","unstructured":"Yu K, Yang Y, Ding W (2022) Causal feature selection with missing data. ACM Trans Knowl Discov Data 16(4):1\u201324. https:\/\/doi.org\/10.1145\/3488055","journal-title":"ACM Trans Knowl Discov Data"},{"key":"1074_CR114","doi-asserted-by":"publisher","unstructured":"Zhang H, Zhang Q (2017) Embedjoin: Efficient edit similarity joins via embeddings. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 585\u2013594, https:\/\/doi.org\/10.1145\/3097983.3098003","DOI":"10.1145\/3097983.3098003"},{"issue":"4","key":"1074_CR115","doi-asserted-by":"publisher","first-page":"6618","DOI":"10.1109\/JIOT.2019.2909038","volume":"6","author":"Y Zhang","year":"2019","unstructured":"Zhang Y, Thorburn PJ, Xiang W et al (2019) SSIM - A deep learning approach for recovering missing time series sensor data. IEEE Internet Things J 6(4):6618\u20136628. https:\/\/doi.org\/10.1109\/JIOT.2019.2909038","journal-title":"IEEE Internet Things J"},{"issue":"5","key":"1074_CR116","doi-asserted-by":"publisher","first-page":"1285","DOI":"10.1109\/TKDE.2015.2510010","volume":"28","author":"C Zhou","year":"2016","unstructured":"Zhou C, Cule B, Goethals B (2016) Pattern based sequence classification. IEEE Trans Knowl Data Eng 28(5):1285\u20131298. 
https:\/\/doi.org\/10.1109\/TKDE.2015.2510010","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"6","key":"1074_CR117","doi-asserted-by":"publisher","first-page":"2425","DOI":"10.1109\/TKDE.2019.2956530","volume":"33","author":"X Zhu","year":"2021","unstructured":"Zhu X, Yang J, Zhang C et al (2021) Efficient utilization of missing data in cost-sensitive learning. IEEE Trans Knowl Data Eng 33(6):2425\u20132436. https:\/\/doi.org\/10.1109\/TKDE.2019.2956530","journal-title":"IEEE Trans Knowl Data Eng"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-024-01074-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-024-01074-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-024-01074-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T02:48:30Z","timestamp":1741574910000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-024-01074-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,22]]},"references-count":117,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3]]}},"alternative-id":["1074"],"URL":"https:\/\/doi.org\/10.1007\/s10618-024-01074-3","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,22]]},"assertion":[{"value":"25 October 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 September 
2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"12"}}