{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T19:05:09Z","timestamp":1767035109270,"version":"build-2065373602"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2013,7]]},"abstract":"<jats:p>Semi-structured data such as XML are popular for data interchange and storage. However, many XML documents have improper nesting where open - and close-tags are unmatched. Since some semi-structured data (e.g., Latex) have a flexible grammar and since many XML documents lack an accompanying DTD or XSD, we focus on computing a syntactic repair via the edit distance.<\/jats:p>\n          <jats:p>To solve this problem, we propose a dynamic programming algorithm which takes cubic time. While this algorithm is not scalable, well-formed substrings of the data can be pruned to enable faster computation. Unfortunately, there are still cases where the dynamic program could be very expensive; hence, we give branch-and-bound algorithms based on various combinations of two heuristics, called MinCost and MaxBenefit, that trade off between accuracy and efficiency. Finally, we experimentally demonstrate the performance of these algorithms on real data.<\/jats:p>","DOI":"10.14778\/2536360.2536361","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:17:57Z","timestamp":1403612277000},"page":"601-612","source":"Crossref","is-referenced-by-count":20,"title":["On repairing structural problems in semi-structured data"],"prefix":"10.14778","volume":"6","author":[{"given":"Flip","family":"Korn","sequence":"first","affiliation":[{"name":"AT&amp;T Labs-Research"}]},{"given":"Barna","family":"Saha","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs-Research"}]},{"given":"Divesh","family":"Srivastava","sequence":"additional","affiliation":[{"name":"AT&amp;T Labs-Research"}]},{"given":"Shanshan","family":"Ying","sequence":"additional","affiliation":[{"name":"Nat'l Univ Singapore"}]}],"member":"320","published-online":{"date-parts":[[2013,7]]},"reference":[{"issue":"4","key":"e_1_2_1_1_1","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1137\/0201022","article-title":"A Minimum Distance Error-Correcting Parser for Context-Free Languages","volume":"1","author":"Aho A. V.","year":"1972","unstructured":"A. V. Aho , T. G. Peterson : A Minimum Distance Error-Correcting Parser for Context-Free Languages . SIAM J. Comput. 1 ( 4 ): 305 - 312 ( 1972 ). A. V. Aho, T. G. Peterson: A Minimum Distance Error-Correcting Parser for Context-Free Languages. SIAM J. Comput. 1(4): 305-312 (1972).","journal-title":"SIAM J. Comput."},{"issue":"1","key":"e_1_2_1_2_1","first-page":"197","article-title":"Sampling the Repairs of Functional Dependency Violations under Hard Constraints","volume":"3","author":"Beskales G.","year":"2010","unstructured":"G. Beskales , I. F. Ilyas and L. Golab : Sampling the Repairs of Functional Dependency Violations under Hard Constraints . PVLDB 3 ( 1 ): 197 - 207 ( 2010 ). G. Beskales, I. F. Ilyas and L. Golab: Sampling the Repairs of Functional Dependency Violations under Hard Constraints. PVLDB 3(1): 197-207 (2010).","journal-title":"PVLDB"},{"key":"e_1_2_1_3_1","unstructured":"G. J.\n      Bex F.\n      Neven T.\n      Schwentick K.\n  Tuyls: Inference of Concise DTDs from XML Data VLDB\n  2006\n  :  \n  115\n  -\n  126\n  .   G. J. Bex F. Neven T. Schwentick K. Tuyls: Inference of Concise DTDs from XML Data VLDB 2006 : 115-126."},{"key":"e_1_2_1_4_1","unstructured":"G. J.\n      Bex F.\n      Neven S.\n  Vansummeren: Inferring XML Schema Definitions from XML Data VLDB\n  2007\n  :  \n  998\n  -\n  1009\n  .   G. J. Bex F. Neven S. Vansummeren: Inferring XML Schema Definitions from XML Data VLDB 2007 : 998-1009."},{"key":"e_1_2_1_5_1","first-page":"97","article-title":"Correctors for XML Data","volume":"2004","author":"Boobna U.","unstructured":"U. Boobna and M. de Rougemont : Correctors for XML Data . XSym 2004 : 97 - 111 . U. Boobna and M. de Rougemont: Correctors for XML Data. XSym 2004: 97-111.","journal-title":"XSym"},{"volume-title":"SIGMOD 2005:  143-154","author":"Bohannon P.","key":"e_1_2_1_6_1","unstructured":"P. Bohannon , M. Flaster , W. Fan and R. Rastogi : A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification . SIGMOD 2005: 143-154 . P. Bohannon, M. Flaster, W. Fan and R. Rastogi: A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification. SIGMOD 2005: 143-154."},{"key":"e_1_2_1_7_1","first-page":"43","article-title":"What are real DTDs like?","volume":"2002","author":"Choi B.","unstructured":"B. Choi : What are real DTDs like? WebDB 2002 : 43 - 48 . B. Choi: What are real DTDs like? WebDB 2002: 43-48.","journal-title":"WebDB"},{"volume-title":"A. Marian: Detecting Changes in XML Documents. ICDE 2002:  41-52","author":"Cobena G.","key":"e_1_2_1_8_1","unstructured":"G. Cobena , S. Abiteboul , A. Marian: Detecting Changes in XML Documents. ICDE 2002: 41-52 . G. Cobena, S. Abiteboul, A. Marian: Detecting Changes in XML Documents. ICDE 2002: 41-52."},{"issue":"3","key":"e_1_2_1_9_1","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.is.2004.11.009","article-title":"A Methodology for Clustering XML Documents by Structure","volume":"31","author":"Dalamagas T.","year":"2006","unstructured":"T. Dalamagas , T. Cheng , K. Winkel , T. K. Sellis : A Methodology for Clustering XML Documents by Structure . Inf. Syst. 31 ( 3 ): 187 - 228 ( 2006 ). T. Dalamagas, T. Cheng, K. Winkel, T. K. Sellis: A Methodology for Clustering XML Documents by Structure. Inf. Syst. 31(3): 187-228 (2006).","journal-title":"Inf. Syst."},{"volume-title":"SIGMOD 2011:  469-480","author":"Fan W.","key":"e_1_2_1_10_1","unstructured":"W. Fan , J. Li , S. Ma , N. Tang and W. Yu : Interaction between record matching and data repairing . SIGMOD 2011: 469-480 . W. Fan, J. Li, S. Ma, N. Tang and W. Yu: Interaction between record matching and data repairing. SIGMOD 2011: 469-480."},{"issue":"3","key":"e_1_2_1_11_1","first-page":"19","article-title":"DTD Inference from XML Documents","volume":"26","author":"Garofalakis M. N.","year":"2003","unstructured":"M. N. Garofalakis , A. Gionis , R. Rastogi , S. Seshadri and K. Shim : DTD Inference from XML Documents : The XTRACT Approach. IEEE Data Eng. Bull. 26 ( 3 ): 19 - 25 ( 2003 ). M. N. Garofalakis, A. Gionis, R. Rastogi, S. Seshadri and K. Shim: DTD Inference from XML Documents: The XTRACT Approach. IEEE Data Eng. Bull. 26(3): 19-25 (2003).","journal-title":"The XTRACT Approach. IEEE Data Eng. Bull."},{"issue":"5","key":"e_1_2_1_12_1","doi-asserted-by":"crossref","first-page":"447","DOI":"10.1016\/S0019-9958(67)91165-5","article-title":"Language Identification in the Limit","volume":"10","author":"Gold E. M.","year":"1967","unstructured":"E. M. Gold : Language Identification in the Limit , Information and Control 10 ( 5 ): 447 - 474 ( 1967 ). E. M. Gold: Language Identification in the Limit, Information and Control 10(5): 447-474 (1967).","journal-title":"Information and Control"},{"volume-title":"T. Yu: Approximate XML Joins. SIGMOD 2002:  287-298","author":"Guha S.","key":"e_1_2_1_13_1","unstructured":"S. Guha , H. V. Jagadish , N. Koudas , D. Srivastava , T. Yu: Approximate XML Joins. SIGMOD 2002: 287-298 . S. Guha, H. V. Jagadish, N. Koudas, D. Srivastava, T. Yu: Approximate XML Joins. SIGMOD 2002: 287-298."},{"issue":"4","key":"e_1_2_1_14_1","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1137\/0202025","article-title":"The Hardest Context-Free Language","volume":"2","author":"Greibach S. A.","year":"1973","unstructured":"S. A. Greibach : The Hardest Context-Free Language , SIAM J. Comput. 2 ( 4 ): 304 - 310 ( 1973 ). S. A. Greibach: The Hardest Context-Free Language, SIAM J. Comput. 2(4): 304-310 (1973).","journal-title":"SIAM J. Comput."},{"volume-title":"CIKM 2011:  1719-1724","author":"Grijzenhout S.","key":"e_1_2_1_15_1","unstructured":"S. Grijzenhout and M. Marx : The quality of the XML web . CIKM 2011: 1719-1724 . S. Grijzenhout and M. Marx: The quality of the XML web. CIKM 2011: 1719-1724."},{"issue":"1","key":"e_1_2_1_16_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505241.505242","article-title":"Fast context-free grammar parsing requires fast boolean matrix multiplication","volume":"49","author":"Lee L.","year":"2002","unstructured":"L. Lee : Fast context-free grammar parsing requires fast boolean matrix multiplication , J. ACM 49 ( 1 ): 1 - 15 ( 2002 ). L. Lee: Fast context-free grammar parsing requires fast boolean matrix multiplication, J. ACM 49(1): 1-15 (2002).","journal-title":"J. ACM"},{"issue":"8","key":"e_1_2_1_17_1","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions, and reversals","volume":"10","author":"Levenshtein V. I.","year":"1966","unstructured":"V. I. Levenshtein : Binary codes capable of correcting deletions, insertions, and reversals . Soviet Physics Doklady 10 ( 8 ): 707 - 710 ( 1966 ). V. I. Levenshtein: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10(8): 707-710 (1966).","journal-title":"Soviet Physics Doklady"},{"key":"e_1_2_1_18_1","doi-asserted-by":"crossref","unstructured":"F.\n      Magniez C.\n      Mathieu\n     and \n      A.\n      Nayak Recognizing well-parenthesized expressions in the streaming model STOC\n  2010\n  :  \n  261\n  -\n  270\n  .   F. Magniez C. Mathieu and A. Nayak Recognizing well-parenthesized expressions in the streaming model STOC 2010 : 261-270.","DOI":"10.1145\/1806689.1806727"},{"issue":"2","key":"e_1_2_1_19_1","first-page":"85","volume":"54","author":"Myers G.","year":"1995","unstructured":"G. Myers : Approximately Matching Context-Free Languages. Inf. Process. Lett. 54 ( 2 ): 85 - 92 ( 1995 ). G. Myers: Approximately Matching Context-Free Languages. Inf. Process. Lett. 54(2): 85-92 (1995).","journal-title":"Approximately Matching Context-Free Languages. Inf. Process. Lett."},{"issue":"1","key":"e_1_2_1_20_1","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1145\/375360.375365","article-title":"A guided tour to approximate string matching","volume":"33","author":"Navarro G.","year":"2001","unstructured":"G. Navarro . A guided tour to approximate string matching . ACM Comput. Surv. , 33 ( 1 ): 31 - 88 ( 2001 ). G. Navarro. A guided tour to approximate string matching. ACM Comput. Surv., 33(1):31-88 (2001).","journal-title":"ACM Comput. Surv."},{"issue":"4","key":"e_1_2_1_21_1","first-page":"334","article-title":"RTED","volume":"5","author":"Pawlik M.","year":"2011","unstructured":"M. Pawlik , N. Augsten : RTED : A Robust Algorithm for the Tree Edit Distance. PVLDB 5 ( 4 ): 334 - 345 ( 2011 ). M. Pawlik, N. Augsten: RTED: A Robust Algorithm for the Tree Edit Distance. PVLDB 5(4): 334-345 (2011).","journal-title":"A Robust Algorithm for the Tree Edit Distance. PVLDB"},{"key":"e_1_2_1_22_1","volume-title":"Automated Repair of HTML Generation Errors in PHP Applications Using String Constraint Solving. Int'l Conf. Software Engineering","author":"Samimi H.","year":"2012","unstructured":"H. Samimi , M. Schaefer , S. Artzi , T. Millstein , F. Tip and L. Hendren : Automated Repair of HTML Generation Errors in PHP Applications Using String Constraint Solving. Int'l Conf. Software Engineering 2012 . H. Samimi, M. Schaefer, S. Artzi, T. Millstein, F. Tip and L. Hendren: Automated Repair of HTML Generation Errors in PHP Applications Using String Constraint Solving. Int'l Conf. Software Engineering 2012."},{"volume-title":"Sirangelo: Constant-Memory Validation of Streaming XML Documents Against DTDs. ICDT 2007:  299-313","author":"Segoufin L.","key":"e_1_2_1_23_1","unstructured":"L. Segoufin and C. Sirangelo: Constant-Memory Validation of Streaming XML Documents Against DTDs. ICDT 2007: 299-313 . L. Segoufin and C. Sirangelo: Constant-Memory Validation of Streaming XML Documents Against DTDs. ICDT 2007: 299-313."},{"volume-title":"Vianu: Validating Streaming XML Documents. PODS 2002:  53-64","author":"Segoufin L.","key":"e_1_2_1_24_1","unstructured":"L. Segoufin and V. Vianu: Validating Streaming XML Documents. PODS 2002: 53-64 . L. Segoufin and V. Vianu: Validating Streaming XML Documents. PODS 2002: 53-64."},{"volume-title":"Chomicki: Validity-Sensitive Querying of XML Databases. EDBT Workshops 2006:  164-177","author":"Staworko S.","key":"e_1_2_1_25_1","unstructured":"S. Staworko and J. Chomicki: Validity-Sensitive Querying of XML Databases. EDBT Workshops 2006: 164-177 . S. Staworko and J. Chomicki: Validity-Sensitive Querying of XML Databases. EDBT Workshops 2006: 164-177."},{"volume-title":"SAC 2005:  647-653","author":"Suzuki N.","key":"e_1_2_1_26_1","unstructured":"N. Suzuki : Finding an optimum edit script between an XML document and a DTD . SAC 2005: 647-653 . N. Suzuki: Finding an optimum edit script between an XML document and a DTD. SAC 2005: 647-653."},{"key":"e_1_2_1_27_1","volume-title":"Kruskal: Time Warps, String Edits and Macromolecules Theory and Practice of Sequence Comparison","author":"Sankoff D.","year":"1983","unstructured":"D. Sankoff , J. B. Kruskal: Time Warps, String Edits and Macromolecules Theory and Practice of Sequence Comparison , 1983 . Addison Wesley . D. Sankoff, J. B. Kruskal: Time Warps, String Edits and Macromolecules Theory and Practice of Sequence Comparison, 1983. Addison Wesley."},{"volume-title":"ICER 2011:  85-92","author":"Tabanao E. S.","key":"e_1_2_1_28_1","unstructured":"E. S. Tabanao , M. M. T. Rodrigo , M. C. Jadud : Predicting at-risk novice Java programmers through the analysis of online protocols . ICER 2011: 85-92 . E. S. Tabanao, M. M. T. Rodrigo, M. C. Jadud: Predicting at-risk novice Java programmers through the analysis of online protocols. ICER 2011: 85-92."},{"volume-title":"J. Cai: X-Diff: An Effective Change Detection Algorithm for XML Documents. ICDE 2003:  519-530","author":"Wang Y.","key":"e_1_2_1_29_1","unstructured":"Y. Wang , D. J. DeWitt , J. Cai: X-Diff: An Effective Change Detection Algorithm for XML Documents. ICDE 2003: 519-530 . Y. Wang, D. J. DeWitt, J. Cai: X-Diff: An Effective Change Detection Algorithm for XML Documents. ICDE 2003: 519-530."},{"issue":"1","key":"e_1_2_1_30_1","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1145\/321796.321811","article-title":"The string-to-string correction problem","volume":"21","author":"Wagner R. A.","year":"1974","unstructured":"R. A. Wagner and M. J. Fischer . The string-to-string correction problem . Journal of the ACM , 21 ( 1 ): 168 - 173 ( 1974 ). R. A. Wagner and M. J. Fischer. The string-to-string correction problem. Journal of the ACM, 21(1):168-173 (1974).","journal-title":"Journal of the ACM"},{"issue":"6","key":"e_1_2_1_31_1","doi-asserted-by":"crossref","first-page":"1245","DOI":"10.1137\/0218082","article-title":"Simple Fast Algorithms for the Editing distance between Trees and Related Problems","volume":"18","author":"Zhang K.","year":"1989","unstructured":"K. Zhang and D. Shasha : Simple Fast Algorithms for the Editing distance between Trees and Related Problems . SIAM J. Computing , 18 ( 6 ): 1245 - 1262 ( 1989 ). K. Zhang and D. Shasha: Simple Fast Algorithms for the Editing distance between Trees and Related Problems. SIAM J. Computing, 18(6): 1245-1262 (1989).","journal-title":"SIAM J. Computing"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2536360.2536361","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:49:33Z","timestamp":1672220973000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2536360.2536361"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,7]]},"references-count":31,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2013,7]]}},"alternative-id":["10.14778\/2536360.2536361"],"URL":"https:\/\/doi.org\/10.14778\/2536360.2536361","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2013,7]]}}}