{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,26]],"date-time":"2025-03-26T05:16:20Z","timestamp":1742966180738,"version":"3.40.3"},"publisher-location":"Cham","reference-count":16,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783031170294"},{"type":"electronic","value":"9783031170300"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:00:00Z","timestamp":1675296000000},"content-version":"vor","delay-in-days":397,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent years, a rapidly increasing amount of information has been made publicly available in tabular form on the Web. Many of these data are not usable due to their poor quality (e.g., misspelled or missing values, missing or incomplete metadata, and missing meaningful columns). Solutions have been proposed in the literature to address these data quality issues, but there is still a lack of all-in-one approaches that can fully solve them. Therefore, users need to use several methods to solve these data quality issues. In this paper, we present an all-in-one and automatic approach called SINATRA that helps to bridge this gap by providing the following features: <jats:italic>data annotation<\/jats:italic> (to address misspelled and incomplete metadata issues), <jats:italic>data repair<\/jats:italic> (to address missing values (data) issues), and <jats:italic>data augmentation<\/jats:italic> (to dynamically add meaningful columns and corresponding cell values to the dataset). 
An evaluation of the SINATRA approach based on datasets from a state-of-the-art benchmark shows promising results in terms of F1-measure and precision.<\/jats:p>","DOI":"10.1007\/978-3-031-17030-0_6","type":"book-chapter","created":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T13:40:37Z","timestamp":1675258837000},"page":"65-77","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Improving the\u00a0Usability of\u00a0Tabular Data Through Data Annotation, Repair and\u00a0Augmentation"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4005-2633","authenticated-orcid":false,"given":"Rabeb","family":"Abida","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anthony","family":"Cleve","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,2,2]]},"reference":[{"key":"6_CR1","unstructured":"Abdelmageed, N., Schindler, S.: JenTab meets SemTab 2021\u2019s new challenges. In: SemTab@ ISWC, pp. 42\u201353 (2021)"},{"key":"6_CR2","doi-asserted-by":"crossref","unstructured":"Abida, R., Belghith, E.H., Cleve, A.: An end-to-end framework for integrating and publishing linked open government data. In: 2020 IEEE 29th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp. 257\u2013262. IEEE (2020)","DOI":"10.1109\/WETICE49692.2020.00057"},{"key":"6_CR3","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"722","DOI":"10.1007\/978-3-540-76298-0_52","volume-title":"The Semantic Web","author":"S Auer","year":"2007","unstructured":"Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC\/ISWC -2007. LNCS, vol. 4825, pp. 722\u2013735. 
Springer, Heidelberg (2007). https:\/\/doi.org\/10.1007\/978-3-540-76298-0_52"},{"key":"6_CR4","unstructured":"Azzi, R., et al.: AMALGAM: making tabular dataset explicit with knowledge graph. In: SemTab@ ISWC, pp. 9\u201316 (2020)"},{"key":"6_CR5","unstructured":"Benedetti, F., Bergamaschi, S., Po, L.: Online index extraction from linked open data sources. In: Second International Workshop on Linked Data for Information Extraction (LD4IE), vol. 1267, pp. 9\u201320. DEU (2014)"},{"key":"6_CR6","doi-asserted-by":"crossref","unstructured":"Chen, J., Jim\u00e9nez-Ruiz, E., Horrocks, I., Sutton, C.: ColNet: embedding the semantics of web tables for column type prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 29\u201336 (2019)","DOI":"10.1609\/aaai.v33i01.330129"},{"key":"6_CR7","doi-asserted-by":"crossref","unstructured":"Chen, J., Jim\u00e9nez-Ruiz, E., Horrocks, I., Sutton, C.: Learning semantic annotations for tabular data. arXiv preprint arXiv:1906.00781 (2019)","DOI":"10.24963\/ijcai.2019\/289"},{"key":"6_CR8","unstructured":"Cremaschi, M., Avogadro, R., Chieregato, D.: MantisTable: an automatic approach for the semantic table interpretation. In: SemTab@ ISWC 2019, pp. 15\u201324 (2019)"},{"key":"6_CR9","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"514","DOI":"10.1007\/978-3-030-49461-2_30","volume-title":"The Semantic Web","author":"E Jim\u00e9nez-Ruiz","year":"2020","unstructured":"Jim\u00e9nez-Ruiz, E., Hassanzadeh, O., Efthymiou, V., Chen, J., Srinivas, K.: SemTab 2019: resources to benchmark tabular data to knowledge graph matching systems. In: Harth, A., et al. (eds.) ESWC 2020. LNCS, vol. 12123, pp. 514\u2013530. Springer, Cham (2020). https:\/\/doi.org\/10.1007\/978-3-030-49461-2_30"},{"key":"6_CR10","unstructured":"Jim\u00e9nez-Ruiz, E., Hassanzadeh, O., Efthymiou, V., Chen, J., Srinivas, K., Cutrona, V.: Results of SemTab 2020. In: CEUR Workshop Proceedings, vol. 
2775, pp. 1\u20138 (2020)"},{"key":"6_CR11","unstructured":"Knap, T.: Towards odalic, a semantic table interpretation tool in the ADEQUATe project. In: LD4IE@ ISWC, pp. 26\u201337 (2017)"},{"key":"6_CR12","unstructured":"Nguyen, P., Kertkeidkachorn, N., Ichise, R., Takeda, H.: MTab: matching tabular data to knowledge graph using probability models. arXiv preprint arXiv:1910.00246 (2019)"},{"key":"6_CR13","unstructured":"Ongenae, F.: MAGIC: mining an augmented graph using INK, starting from a CSV (2021)"},{"issue":"4","key":"6_CR14","doi-asserted-by":"publisher","first-page":"393","DOI":"10.3233\/SW-170263","volume":"9","author":"D Roman","year":"2018","unstructured":"Roman, D., et al.: DataGraft: one-stop-shop for open data management. Semant. Web 9(4), 393\u2013411 (2018)","journal-title":"Semant. Web"},{"issue":"10","key":"6_CR15","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1145\/2629489","volume":"57","author":"D Vrande\u010di\u0107","year":"2014","unstructured":"Vrande\u010di\u0107, D., Kr\u00f6tzsch, M.: Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10), 78\u201385 (2014)","journal-title":"Commun. ACM"},{"key":"6_CR16","doi-asserted-by":"crossref","unstructured":"Zhang, S., Balog, K.: Ad hoc table retrieval using semantic similarity. In: Proceedings of the 2018 World Wide Web Conference, pp. 
1553\u20131562 (2018)","DOI":"10.1145\/3178876.3186067"}],"container-title":["Communications in Computer and Information Science","Nordic Artificial Intelligence Research and Development"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-17030-0_6","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T14:14:16Z","timestamp":1675260856000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-17030-0_6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783031170294","9783031170300"],"references-count":16,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-17030-0_6","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"type":"print","value":"1865-0929"},{"type":"electronic","value":"1865-0937"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"2 February 2023","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"NAIS","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Symposium of the Norwegian AI Society","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Oslo","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Norway","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"31 May 2022","order":7,"name":"conference_start_date","label":"Conference 
Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1 June 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"4","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"nais12022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/aisociety.no\/nais2022\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Easy chair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"17","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"11","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"65% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole 
number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0.5","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Yes","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}