{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T06:27:31Z","timestamp":1763706451841,"version":"build-2065373602"},"reference-count":79,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,12,5]],"date-time":"2022-12-05T00:00:00Z","timestamp":1670198400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Web systems have become a valuable source of semi-structured and streaming data. In this sense, Entity Resolution (ER) has become a key solution for integrating multiple data sources or identifying similarities between data items, namely entities. To avoid the quadratic costs of the ER task and improve efficiency, blocking techniques are usually applied. Beyond the traditional challenges faced by ER and, consequently, by the blocking techniques, there are also challenges related to streaming data, incremental processing, and noisy data. To address them, we propose a schema-agnostic blocking technique capable of handling noisy and streaming data incrementally through a distributed computational infrastructure. To the best of our knowledge, there is a lack of blocking techniques that address these challenges simultaneously. This work proposes two strategies (attribute selection and top-n neighborhood entities) to minimize resource consumption and improve blocking efficiency. Moreover, this work presents a noise-tolerant algorithm, which minimizes the impact of noisy data (e.g., typos and misspellings) on blocking effectiveness. In our experimental evaluation, we use real-world pairs of data sources, including a case study that involves data from Twitter and Google News. 
The proposed technique achieves better results regarding effectiveness and efficiency compared to the state-of-the-art technique (metablocking). More precisely, the application of the two strategies over the proposed technique alone improves efficiency by 56%, on average.<\/jats:p>","DOI":"10.3390\/info13120568","type":"journal-article","created":{"date-parts":[[2022,12,6]],"date-time":"2022-12-06T01:43:48Z","timestamp":1670291028000},"page":"568","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":4,"title":["Incremental Entity Blocking over Heterogeneous Streaming Data"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6339-9117","authenticated-orcid":false,"given":"Tiago Brasileiro","family":"Ara\u00fajo","sequence":"first","affiliation":[{"name":"Academic Unit of Systems and Computing, Federal University of Campina Grande, Campina Grande 58429-900, Brazil"},{"name":"Federal Institute of Para\u00edba, Monteiro 58500-000, Brazil"},{"name":"Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1317-8062","authenticated-orcid":false,"given":"Kostas","family":"Stefanidis","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7743-899X","authenticated-orcid":false,"given":"Carlos Eduardo Santos","family":"Pires","sequence":"additional","affiliation":[{"name":"Academic Unit of Systems and Computing, Federal University of Campina Grande, Campina Grande 58429-900, 
Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7476-7840","authenticated-orcid":false,"given":"Jyrki","family":"Nummenmaa","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Thiago Pereira","family":"da N\u00f3brega","sequence":"additional","affiliation":[{"name":"Academic Unit of Systems and Computing, Federal University of Campina Grande, Campina Grande 58429-900, Brazil"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,5]]},"reference":[{"key":"ref_1","unstructured":"Gentile, A.L., Ristoski, P., Eckel, S., Ritze, D., and Paulheim, H. (2017, January 21\u201324). Entity Matching on Web Tables: A Table Embeddings approach for Blocking. Proceedings of the 20th International Conference on Extending Database Technology, EDBT, Venice, Italy."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1504\/IJWGS.2017.085167","article-title":"Stream-based live entity resolution approach with adaptive duplicate count strategy","volume":"13","author":"Ma","year":"2017","journal-title":"Int. J. Web Grid Serv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ipm.2018.08.006","article-title":"AHAB: Aligning heterogeneous knowledge bases via iterative blocking","volume":"56","author":"Chen","year":"2019","journal-title":"Inf. Process. Manag."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"644","DOI":"10.1007\/s11390-017-1731-1","article-title":"EntityManager: Managing dirty data based on entity resolution","volume":"32","author":"Liu","year":"2017","journal-title":"J. Comput. Sci. 
Technol."},{"key":"ref_5","first-page":"1","article-title":"Entity Resolution in the Web of Data","volume":"5","author":"Christophides","year":"2015","journal-title":"Synth. Lect. Semant. Web"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1016\/j.ins.2014.02.135","article-title":"Entity resolution for probabilistic data","volume":"277","author":"Ayat","year":"2014","journal-title":"Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Christen, P. (2012). Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, Springer Science & Business Media.","DOI":"10.1007\/978-3-642-31164-2"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"312","DOI":"10.14778\/2856318.2856326","article-title":"Schema-agnostic vs schema-based configurations for blocking methods on homogeneous data","volume":"9","author":"Papadakis","year":"2015","journal-title":"Proc. VLDB Endow."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1016\/j.jss.2017.11.074","article-title":"Heuristic-based approaches for speeding up incremental record linkage","volume":"137","author":"Pires","year":"2018","journal-title":"J. Syst. Softw."},{"key":"ref_10","unstructured":"Ren, X., and Cur\u00e9, O. Strider: A hybrid adaptive distributed RDF stream processing engine. Proceedings of the International Semantic Web Conference."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1016\/j.is.2019.03.006","article-title":"Scaling entity resolution: A loosely schema-aware approach","volume":"83","author":"Simonini","year":"2019","journal-title":"Inf. Syst."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1103","DOI":"10.1016\/j.ipm.2018.04.010","article-title":"An unsupervised aspect extraction strategy for monitoring real-time reviews stream","volume":"56","author":"Dragoni","year":"2019","journal-title":"Inf. Process. 
Manag."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"769","DOI":"10.1007\/s11390-020-0350-4","article-title":"A Survey on Blocking Technology of Entity Resolution","volume":"35","author":"Li","year":"2020","journal-title":"J. Comput. Sci. Technol."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"931","DOI":"10.1002\/asi.23726","article-title":"Incremental author name disambiguation by exploiting domain-specific heuristics","volume":"68","author":"Santana","year":"2017","journal-title":"J. Assoc. Inf. Sci. Technol."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Christophides, V., Efthymiou, V., Palpanas, T., Papadakis, G., and Stefanidis, K. (2019). End-to-End Entity Resolution for Big Data: A Survey. arXiv.","DOI":"10.1145\/3418896"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Ara\u00fajo, T.B., Pires, C.E.S., and da N\u00f3brega, T.P. (2017, January 3\u20136). Spark-based Streamlined Metablocking. Proceedings of the 2017 IEEE Symposium on Computers and Communications (ISCC), Heraklion, Greece.","DOI":"10.1109\/ISCC.2017.8024632"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Agarwal, S., Godbole, S., Punjani, D., and Roy, S. (2007, January 28\u201331). How much noise is too much: A study in automatic text classification. Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA.","DOI":"10.1109\/ICDM.2007.21"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Garc\u00eda, S., Luengo, J., and Herrera, F. (2015). Data Preprocessing in Data Mining, Springer.","DOI":"10.1007\/978-3-319-10247-4"},{"key":"ref_19","unstructured":"Ara\u00fajo, T.B., Stefanidis, K., Santos Pires, C.E., Nummenmaa, J., and da N\u00f3brega, T.P. (April, January 30). Schema-Agnostic Blocking for Streaming Data. 
Proceedings of the 35th Annual ACM Symposium on Applied Computing, SAC \u201920, Brno, Czech Republic."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.14778\/2994509.2994533","article-title":"BLAST: A loosely schema-aware meta-blocking approach for entity resolution","volume":"9","author":"Simonini","year":"2016","journal-title":"Proc. VLDB Endow."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1946","DOI":"10.1109\/TKDE.2013.54","article-title":"Meta-blocking: Taking entity resolution to the next level","volume":"26","author":"Papadakis","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1016\/j.is.2016.12.001","article-title":"Parallel meta-blocking for scaling entity resolution over big heterogeneous data","volume":"65","author":"Efthymiou","year":"2017","journal-title":"Inf. Syst."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1109\/TBDATA.2016.2576463","article-title":"Benchmarking Blocking Algorithms for Web Entities","volume":"6","author":"Efthymiou","year":"2020","journal-title":"IEEE Trans. Big Data"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Efthymiou, V., Stefanidis, K., and Christophides, V. (November, January 29). Big data entity resolution: From highly to somehow similar entity descriptions in the Web. Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7363781"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Burys, J., Awan, A.J., and Heinis, T. (2018, January 12). Large-Scale Clustering Using MPI-Based Canopy. 
Proceedings of the 2018 IEEE\/ACM Machine Learning in HPC Environments (MLHPC), Dallas, TX, USA.","DOI":"10.1109\/MLHPC.2018.8638632"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.5626\/JCSE.2016.10.1.1","article-title":"An improved hybrid Canopy-Fuzzy C-means clustering algorithm based on MapReduce model","volume":"10","author":"Dai","year":"2016","journal-title":"J. Comput. Sci. Eng."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1007\/s00450-011-0177-x","article-title":"Multi-pass sorted neighborhood blocking with MapReduce","volume":"27","author":"Kolb","year":"2012","journal-title":"Comput.-Sci.-Res. Dev."},{"key":"ref_28","first-page":"15","article-title":"Dynamic sorted neighborhood indexing for real-time entity resolution","volume":"6","author":"Ramadan","year":"2015","journal-title":"J. Data Inf. Qual. (JDIQ)"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.jss.2017.03.003","article-title":"An efficient spark-based adaptive windowing for entity matching","volume":"128","author":"Mestre","year":"2017","journal-title":"J. Syst. Softw."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ma, Y., and Tran, T. (2013, January 4\u20138). Typimatch: Type-specific unsupervised learning of keys and key values for heterogeneous web data integration. Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, Rome, Italy.","DOI":"10.1145\/2433396.2433439"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"2665","DOI":"10.1109\/TKDE.2012.150","article-title":"A blocking framework for entity resolution in highly heterogeneous information spaces","volume":"25","author":"Papadakis","year":"2013","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ara\u00fajo, T.B., Pires, C.E.S., Mestre, D.G., N\u00f3brega, T.P.d., Nascimento, D.C.d., and Stefanidis, K. (2019, January 8\u201312). 
A noise tolerant and schema-agnostic blocking technique for entity resolution. Proceedings of the 34th ACM\/SIGAPP Symposium on Applied Computing, Limassol, Cyprus.","DOI":"10.1145\/3297280.3299730"},{"key":"ref_33","unstructured":"Papadakis, G., Papastefanatos, G., Palpanas, T., and Koubarakis, M. (2016, January 15\u201318). Scaling entity resolution to large, heterogeneous data with enhanced meta-blocking. Proceedings of the 19th International Conference on Extending Database Technology (EDBT), Bordeaux, France."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Yang, Y., Sun, Y., Tang, J., Ma, B., and Li, J. (2015, January 10\u201313). Entity matching across heterogeneous sources. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia.","DOI":"10.1145\/2783258.2783353"},{"key":"ref_35","unstructured":"Gagliardelli, L., Simonini, G., Beneventano, D., and Bergamaschi, S. (2019, January 26\u201329). SparkER: Scaling Entity Resolution in Spark. Proceedings of the Advances in Database Technology\u201422nd International Conference on Extending Database Technology, EDBT, Lisbon, Portugal."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liang, H., Wang, Y., Christen, P., and Gayler, R. (2014, January 13\u201316). Noise-tolerant approximate blocking for dynamic real-time entity resolution. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Tainan, Taiwan.","DOI":"10.1007\/978-3-319-06605-9_37"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.jbi.2015.04.008","article-title":"Automated misspelling detection and correction in clinical free-text records","volume":"55","author":"Lai","year":"2015","journal-title":"J. Biomed. Inform."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Sohail, A., and Qounain, W.u. (2022). Locality sensitive blocking (LSB): A robust blocking technique for data deduplication. 
J. Inf. Sci.","DOI":"10.1177\/01655515221121963"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"3212","DOI":"10.1109\/TKDE.2020.2967722","article-title":"Discovering relaxed functional dependencies based on multi-attribute dominance","volume":"33","author":"Caruccio","year":"2020","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3397462","article-title":"Incremental discovery of imprecise functional dependencies","volume":"12","author":"Caruccio","year":"2020","journal-title":"J. Data Inf. Qual. (JDIQ)"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"189","DOI":"10.14778\/3149193.3149199","article-title":"Synthesizing entity matching rules by examples","volume":"11","author":"Singh","year":"2017","journal-title":"Proc. VLDB Endow."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"712","DOI":"10.14778\/3377369.3377379","article-title":"MDedup: Duplicate detection with matching dependencies","volume":"13","author":"Koumarelas","year":"2020","journal-title":"Proc. VLDB Endow."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"697","DOI":"10.14778\/2732939.2732943","article-title":"Incremental record linkage","volume":"7","author":"Gruenheid","year":"2014","journal-title":"Proc. VLDB Endow."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Nentwig, M., and Rahm, E. (2018, January 17\u201320). Incremental clustering on linked data. Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore.","DOI":"10.1109\/ICDMW.2018.00084"},{"key":"ref_45","unstructured":"Saeedi, A., Peukert, E., and Rahm, E. Incremental Multi-source Entity Resolution for Knowledge Graph Completion. 
Proceedings of the European Semantic Web Conference."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1007\/s11704-016-5485-7","article-title":"iGraph: An incremental data processing system for dynamic graph","volume":"10","author":"Ju","year":"2016","journal-title":"Front. Comput. Sci."},{"key":"ref_47","first-page":"361","article-title":"TPStream: Low-latency and high-throughput temporal pattern matching on event streams","volume":"39","author":"Glombiewski","year":"2019","journal-title":"Distrib. Parallel Databases"},{"key":"ref_48","unstructured":"Opitz, B., Sztyler, T., Jess, M., Knip, F., Bikar, C., Pfister, B., and Scherp, A. (2014, January 27). An Approach for Incremental Entity Resolution at the Example of Social Media Data. Proceedings of the AI Mashup Challenge 2014 Co-Located with 11th Extended Semantic Web Conference (ESWC 2014), Crete, Greece."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ao, F., Yan, Y., Huang, J., and Huang, K. (2007, January 18\u201320). Mining maximal frequent itemsets in data streams based on fp-tree. Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany.","DOI":"10.1007\/978-3-540-73499-4_36"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Kumar, A., Singh, A., and Singh, R. (2017, January 4\u20137). An efficient hybrid-clustream algorithm for stream mining. Proceedings of the 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Jaipur, India.","DOI":"10.1109\/SITIS.2017.77"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1621","DOI":"10.1007\/s11277-016-3275-z","article-title":"Entity Resolution Approach of Data Stream Management Systems","volume":"91","author":"Kim","year":"2016","journal-title":"Wirel. Pers. 
Commun."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-3-031-01853-4","article-title":"Big data integration","volume":"7","author":"Dong","year":"2015","journal-title":"Synth. Lect. Data Manag."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Garofalakis, M., Gehrke, J., and Rastogi, R. (2016). Data Stream Management: Processing High-Speed Data Streams, Springer.","DOI":"10.1007\/978-3-540-28608-0"},{"key":"ref_54","unstructured":"Zikopoulos, P., Eaton, C., and Zikopoulos, P. (2011). Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Osborne Media."},{"key":"ref_55","unstructured":"Kreps, J., Narkhede, N., and Rao, J. (2011, January 12). Kafka: A distributed messaging system for log processing. Proceedings of the NetDB, Athens, Greece."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Andoni, A., Indyk, P., Nguyen, H.L., and Razenshteyn, I. (2014, January 5\u20137). Beyond locality-sensitive hashing. Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Portland, OR, USA.","DOI":"10.1137\/1.9781611973402.76"},{"key":"ref_57","unstructured":"Efthymiou, V., Papadakis, G., Stefanidis, K., and Christophides, V. (2019, January 26\u201329). MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities. Proceedings of the Advances in Database Technology\u201422nd International Conference on Extending Database Technology, EDBT 2019, Lisbon, Portugal."},{"key":"ref_58","unstructured":"Efthymiou, V., Stefanidis, K., and Christophides, V. (2016, January 15\u201316). Minoan ER: Progressive Entity Resolution in the Web of Data. Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Golab, L., and \u00d6zsu, M.T. (2003, January 9\u201312). 
Processing sliding window multi-joins in continuous queries over data streams. Proceedings of the 2003 VLDB Conference, Berlin, Germany.","DOI":"10.1016\/B978-012722442-8\/50051-3"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1007\/s13042-014-0313-6","article-title":"Concepts reduction in formal concept analysis with fuzzy setting using Shannon entropy","volume":"8","author":"Singh","year":"2017","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Dasgupta, A., Kumar, R., and Sarl\u00f3s, T. (2011, January 21\u201324). Fast locality-sensitive hashing. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA.","DOI":"10.1145\/2020408.2020578"},{"key":"ref_62","first-page":"28","article-title":"Apache Flink: Stream and batch processing in a single engine","volume":"36","author":"Carbone","year":"2015","journal-title":"Bull. IEEE Comput. Soc. Tech. Comm. Data Eng."},{"key":"ref_63","doi-asserted-by":"crossref","unstructured":"Lopez, M.A., Lobato, A.G.P., and Duarte, O.C.M. (2016, January 4\u20138). A performance comparison of open-source stream processing platforms. Proceedings of the 2016 IEEE Global Communications Conference (GLOBECOM), Washington, DC, USA.","DOI":"10.1109\/GLOCOM.2016.7841533"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Veiga, J., Exp\u00f3sito, R.R., Pardo, X.C., Taboada, G.L., and Touri\u00f1o, J. (2016, January 5\u20138). Performance evaluation of big data frameworks for large-scale data analytics. Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA.","DOI":"10.1109\/BigData.2016.7840633"},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Stefanidis, K., Christophides, V., and Efthymiou, V. (2017, January 19\u201322). Web-scale blocking, iterative and progressive entity resolution. 
Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA.","DOI":"10.1109\/ICDE.2017.214"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Hassani, M. (2019). Overview of Efficient Clustering Methods for High-Dimensional Big Data Streams. Clustering Methods for Big Data Analytics, Springer.","DOI":"10.1007\/978-3-319-97864-2_2"},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Fedoryszak, M., Frederick, B., Rajaram, V., and Zhong, C. (2019, January 4\u20138). Real-time Event Detection on Social Data Streams. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA.","DOI":"10.1145\/3292500.3330689"},{"key":"ref_68","unstructured":"(2017, January 07). Global Social Journalism Study. Available online: https:\/\/www.cision.com\/us\/resources\/white-papers\/2019-sotm\/?sf=false."},{"key":"ref_69","first-page":"e2297v1","article-title":"TwitterNews: Real time event detection from the Twitter data stream","volume":"4","author":"Hasan","year":"2016","journal-title":"PeerJ Prepr."},{"key":"ref_70","first-page":"807","article-title":"Sourcing the Sources: An analysis of the use of Twitter and Facebook as a journalistic source over 10 years in The New York Times, The Guardian, and S\u00fcddeutsche Zeitung","volume":"6","author":"Boczek","year":"2018","journal-title":"Digit. J."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"21","DOI":"10.3233\/SW-200410","article-title":"Network metrics for assessing the quality of entity resolution between multiple datasets","volume":"12","author":"Idrissou","year":"2021","journal-title":"Semant. Web"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1177\/0165551517698564","article-title":"A survey on real-time event detection from the twitter data stream","volume":"44","author":"Hasan","year":"2018","journal-title":"J. Inf. 
Sci."},{"key":"ref_73","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/j.ins.2019.09.013","article-title":"YAKE! Keyword extraction from single documents using multiple local features","volume":"509","author":"Campos","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Lee, P., Lakshmanan, L.V., and Milios, E.E. (April, January 31). Incremental cluster evolution tracking from highly dynamic network data. Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, Chicago, IL, USA.","DOI":"10.1109\/ICDE.2014.6816635"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Sharma, S. (2020). Dynamic hashtag interactions and recommendations: An implementation using apache spark streaming and GraphX. Data Management, Analytics and Innovation, Springer.","DOI":"10.1007\/978-981-32-9949-8_51"},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Altowim, Y., and Mehrotra, S. (2017, January 19\u201322). Parallel progressive approach to entity resolution using mapreduce. Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA.","DOI":"10.1109\/ICDE.2017.139"},{"key":"ref_77","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1016\/j.ins.2017.12.024","article-title":"A novel approach for entity resolution in scientific documents using context graphs","volume":"432","author":"Huang","year":"2018","journal-title":"Inf. Sci."},{"key":"ref_78","doi-asserted-by":"crossref","first-page":"217","DOI":"10.1007\/s10723-015-9359-2","article-title":"A survey on resource scheduling in cloud computing: Issues and challenges","volume":"14","author":"Singh","year":"2016","journal-title":"J. 
Grid Comput."},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"3045","DOI":"10.1016\/j.cor.2013.06.012","article-title":"Optimized task scheduling and resource allocation on cloud computing environment using improved differential evolution algorithm","volume":"40","author":"Tsai","year":"2013","journal-title":"Comput. Oper. Res."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/12\/568\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:34:37Z","timestamp":1760146477000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/13\/12\/568"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,5]]},"references-count":79,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["info13120568"],"URL":"https:\/\/doi.org\/10.3390\/info13120568","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2022,12,5]]}}}