{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T22:35:18Z","timestamp":1776206118962,"version":"3.50.1"},"publisher-location":"Cham","reference-count":31,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783030783068","type":"print"},{"value":"9783030783075","type":"electronic"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T00:00:00Z","timestamp":1651190400000},"content-version":"vor","delay-in-days":118,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Data enrichment is a critical task in the data preparation process in which a dataset is extended with additional information from various sources to perform analyses or add meaningful context. Facilitating the enrichment process design for data workers and supporting its execution on large datasets are only supported to a limited extent by existing solutions. Harnessing semantics at scale can be a crucial factor in effectively addressing this challenge. This chapter presents a comprehensive approach covering both design- and run-time aspects of tabular data enrichment and discusses our experience in making this process scalable. We illustrate how data enrichment steps of a Big Data pipeline can be implemented via tabular transformations exploiting semantic table annotation methods and discuss techniques devised to support the enactment of the resulting process on large tabular datasets. Furthermore, we present results from experimental evaluations in which we tested the scalability and run-time efficiency of the proposed cloud-based approach, enriching massive datasets with promising performance.<\/jats:p>","DOI":"10.1007\/978-3-030-78307-5_2","type":"book-chapter","created":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T07:03:15Z","timestamp":1651129395000},"page":"19-39","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Supporting Semantic Data Enrichment at Scale"],"prefix":"10.1007","author":[{"given":"Michele","family":"Ciavotta","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vincenzo","family":"Cutrona","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Flavio","family":"De Paoli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Nikolay","family":"Nikolov","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matteo","family":"Palmonari","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dumitru","family":"Roman","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,4,29]]},"reference":[{"key":"2_CR1","unstructured":"IDC. (2019). Worldwide semiannual big data and analytics spending guide. https:\/\/www.idc.com\/getdoc.jsp?containerId=IDC_P33195"},{"key":"2_CR2","unstructured":"Zillner, S., Curry, E., Metzger, A., Auer, S., & Seidl, R. (Eds.). (2017). European big data value strategic research & innovation agenda."},{"key":"2_CR3","unstructured":"Lohr, S. (2014). For big-data scientists, \u2018janitor work\u2019 is key hurdle to insights. NY Times, 17."},{"key":"2_CR4","unstructured":"Furche, T., Gottlob, G., Libkin, L., Orsi, G., & Paton, N. W. (2016). Data wrangling for big data: Challenges and opportunities. In EDBT (pp. 473\u2013478)."},{"key":"2_CR5","doi-asserted-by":"crossref","unstructured":"\u010creslovnik, D., Ko\u0161merlj, A., & Ciavotta, M. (2018). Using historical and weather data for marketing and category management in ecommerce: The experience of EW-shopp. In Proceedings of ECSA \u201918 (pp. 31:1\u201331:5). ACM.","DOI":"10.1145\/3241403.3241436"},{"key":"2_CR6","doi-asserted-by":"crossref","unstructured":"Beneventano, D., & Vincini, M. (2019). Foreword to the special issue: \u201cSemantics for big data integration\u201d. Information, 10, 68.","DOI":"10.3390\/info10020068"},{"key":"2_CR7","doi-asserted-by":"crossref","unstructured":"Koutsomitropoulos, D., Likothanassis, S., & Kalnis, P. (2019). Semantics in the deep: Semantic analytics for big data. Data, 4, 63.","DOI":"10.3390\/data4020063"},{"key":"2_CR8","doi-asserted-by":"crossref","unstructured":"Zhuge, H., & Sun, X. (2019). Semantics, knowledge, and grids at the age of big data and AI. Concurrency Computation, 31.","DOI":"10.1002\/cpe.v31.3"},{"key":"2_CR9","doi-asserted-by":"crossref","unstructured":"Knoblock, C. A., Szekely, P., Ambite, J. L., Goel, A., Gupta, S., Lerman, K., Muslea, M., Taheriyan, M., & Mallick, P. (2012). Semi-automatically mapping structured sources into the semantic web. In The semantic web: Research and applications (pp. 375\u2013390).","DOI":"10.1007\/978-3-642-30284-8_32"},{"key":"2_CR10","doi-asserted-by":"crossref","unstructured":"Ritze, D., Lehmberg, O., Bizer, C. (2015). Matching HTML tables to dbpedia. In Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics, WIMS 2015, Larnaca, Cyprus, July 13\u201315, 2015 (pp. 10:1\u201310:6).","DOI":"10.1145\/2797115.2797118"},{"key":"2_CR11","doi-asserted-by":"crossref","unstructured":"Ermilov, I., & Ngomo, A. C. N. (2016). Taipan: Automatic property mapping for tabular data. In Knowledge engineering and knowledge management (pp. 163\u2013179).","DOI":"10.1007\/978-3-319-49004-5_11"},{"key":"2_CR12","doi-asserted-by":"crossref","unstructured":"Kruit, B., Boncz, P., & Urbani, J. (2019). Extracting novel facts from tables for knowledge graph completion. In The semantic web \u2013 ISWC 2019 (pp. 364\u2013381). Springer.","DOI":"10.1007\/978-3-030-30793-6_21"},{"key":"2_CR13","unstructured":"Chabot, Y., Labb\u00e9, T., Liu, J., & Troncy, R. (2019). DAGOBAH: An end-to-end context-free tabular data semantic annotation system. In Proceedings of SemTab@ISWC 2019. CEUR Workshop Proceedings (Vol. 2553, pp. 41\u201348). CEUR-WS.org."},{"key":"2_CR14","doi-asserted-by":"crossref","unstructured":"Nikolov, N., Ciavotta, M., & De Paoli, F. (2018). Data wrangling at scale: The experience of ew-shopp. In Proceedings of the 12th European Conference on Software Architecture: Companion Proceedings (pp. 32:1\u201332:4). ECSA \u201918, ACM.","DOI":"10.1145\/3241403.3241437"},{"key":"2_CR15","unstructured":"Zillner, S., Bisset, D., Milano, M., Curry, E., Garc\u00eca Robles, A., Hahn, T., Irgens, M., Lafrenz, R., Liepert, B., O\u2019Sullivan, B., & Smeulders, A. (Eds.). (2020). Strategic research, innovation and deployment agenda \u2013 AI, data and robotics partnership. third release. Brussels. BDVA, EU-Robotics, ELLIS, EurAI and CLAIRE (September 2020)."},{"key":"2_CR16","doi-asserted-by":"crossref","unstructured":"Sukhobok, D., Nikolov, N., Pultier, A., Ye, X., Berre, A., Moynihan, R., Roberts, B., Elves\u00e6ter, B., Mahasivam, N., & Roman, D. (2016). Tabular data cleaning and linked data generation with grafterizer. In ISWC (pp. 134\u2013139). Springer.","DOI":"10.1007\/978-3-319-47602-5_27"},{"key":"2_CR17","unstructured":"Cutrona, V., Ciavotta, M., Paoli, F. D., & Palmonari, M. (2019). ASIA: A tool for assisted semantic interpretation and annotation of tabular data. In Proceedings of the ISWC 2019 Satellite Tracks. CEUR Workshop Proceedings (Vol. 2456, pp. 209\u2013212)."},{"issue":"4","key":"2_CR18","doi-asserted-by":"publisher","first-page":"393","DOI":"10.3233\/SW-170263","volume":"9","author":"D Roman","year":"2018","unstructured":"Roman, D., Nikolov, N., Putlier, A., Sukhobok, D., Elves\u00e6ter, B., Berre, A., Ye, X., Dimitrov, M., Simov, A., Zarev, M., Moynihan, R., Roberts, B., Berlocher, I., Kim, S., Lee, T., Smith, A., & Heath, T. (2018). Datagraft: One-stop-shop for open data management. Semantic Web, 9(4), 393\u2013411.","journal-title":"Semantic Web"},{"key":"2_CR19","doi-asserted-by":"crossref","unstructured":"Palmonari, M., Rula, A., Porrini, R., Maurino, A., Spahiu, B., & Ferme, V. (2015). ABSTAT: Linked data summaries with abstraction and statistics. In ISWC (pp. 128\u2013132).","DOI":"10.1007\/978-3-319-25639-9_25"},{"issue":"1","key":"2_CR20","first-page":"4","volume":"9","author":"M Stonebraker","year":"1986","unstructured":"Stonebraker, M. (1986). The case for shared nothing. IEEE Database Engineering Bulletin, 9(1), 4\u20139.","journal-title":"IEEE Database Engineering Bulletin"},{"key":"2_CR21","doi-asserted-by":"crossref","unstructured":"Dessalk, Y.D., Nikolov, N., Matskin, M., Soylu, A., & Roman, D. (2020). Scalable execution of big data workflows using software containers. In Proceedings of the 12th International Conference on Management of Digital EcoSystems (pp. 76\u201383).","DOI":"10.1145\/3415958.3433082"},{"key":"2_CR22","unstructured":"Wind, D. (2013). Instant effective caching with ehcache. Packt Publishing."},{"key":"2_CR23","doi-asserted-by":"crossref","unstructured":"Fette, I., & Melnikov, A. (2011). The websocket protocol. Technical Report RFC 6455, IETF.","DOI":"10.17487\/rfc6455"},{"key":"2_CR24","doi-asserted-by":"crossref","unstructured":"Sumaray, A., & Makki, S. K. (2012). A comparison of data serialization formats for optimal efficiency on a mobile platform. In Proceedings of ICUIMC \u201912.","DOI":"10.1145\/2184751.2184810"},{"key":"2_CR25","doi-asserted-by":"crossref","unstructured":"Sukhobok, D., Nikolov, N., & Roman, D. (2017). Tabular data anomaly patterns. In 2017 International Conference on Big Data Innovations and Applications (Innovate-Data) (pp. 25\u201334).","DOI":"10.1109\/Innovate-Data.2017.10"},{"issue":"4","key":"2_CR26","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1145\/2935694.2935702","volume":"44","author":"H Wang","year":"2015","unstructured":"Wang, H., Li, M., Bu, Y., Li, J., Gao, H., & Zhang, J. (2015). Cleanix: a parallel big data cleaning system. SIGMOD Record, 44(4), 35\u201340.","journal-title":"SIGMOD Record"},{"issue":"1","key":"2_CR27","first-page":"1338","volume":"3","author":"G Limaye","year":"2010","unstructured":"Limaye, G., Sarawagi, S., & Chakrabarti, S. (2010). Annotating and searching web tables using entities, types and relationships. PVLDB, 3(1), 1338\u20131347.","journal-title":"PVLDB"},{"issue":"1","key":"2_CR28","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1109\/MIS.2018.111144556","volume":"33","author":"M Kejriwal","year":"2018","unstructured":"Kejriwal, M., Szekely, P. A., & Knoblock, C. A. (2018). Investigative knowledge discovery for combating illicit activities. IEEE Intelligent Systems, 33(1), 53\u201363.","journal-title":"IEEE Intelligent Systems"},{"key":"2_CR29","unstructured":"Sutton, L., Nikolov, N., Ciavotta, M., & Ko\u0161merlj, A. (2019). D3.5 EW-Shopp components as a service: Final Release. https:\/\/www.ew-shopp.eu\/wp-content\/uploads\/2020\/02\/EW-Shopp_D3.5_Components-as-a-service_release_v1.1-SUBMITTED_Low.pdf"},{"key":"2_CR30","doi-asserted-by":"crossref","unstructured":"Cutrona, V., Bianchi, F., Jim\u00e9nez-Ruiz, E., & Palmonari, M. (2020). Tough tables: Carefully evaluating entity linking for tabular data. In ISWC.","DOI":"10.1007\/978-3-030-62466-8_21"},{"issue":"4","key":"2_CR31","doi-asserted-by":"publisher","first-page":"463","DOI":"10.3233\/SW-150205","volume":"7","author":"IF Cruz","year":"2016","unstructured":"Cruz, I. F., Palmonari, M., Loprete, F., Stroe, C., & Taheri, A. (2016). Quality-based model for effective and robust multi-user pay-as-you-go ontology matching. Semantic Web, 7(4), 463\u2013479.","journal-title":"Semantic Web"}],"container-title":["Technologies and Applications for Big Data Value"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-78307-5_2","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,28]],"date-time":"2022-04-28T07:11:12Z","timestamp":1651129872000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-030-78307-5_2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783030783068","9783030783075"],"references-count":31,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-78307-5_2","relation":{},"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"29 April 2022","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}}]}}