{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T07:47:38Z","timestamp":1772264858464,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T00:00:00Z","timestamp":1628812800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T00:00:00Z","timestamp":1628812800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100020639","name":"Bayerisches Staatsministerium f\u00fcr Wirtschaft, Landesentwicklung und Energie","doi-asserted-by":"crossref","award":["IUK-1709-0011\/\/ IUK530\/010"],"award-info":[{"award-number":["IUK-1709-0011\/\/ IUK530\/010"]}],"id":[{"id":"10.13039\/501100020639","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004895","name":"European Social Fund","doi-asserted-by":"publisher","award":["WiT-HuB\/2014-2020"],"award-info":[{"award-number":["WiT-HuB\/2014-2020"]}],"id":[{"id":"10.13039\/501100004895","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Hochschule f\u00fcr angewandte Wissenschaften W\u00fcrzburg-Schweinfurt"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Evolving Systems"],"published-print":{"date-parts":[[2022,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent years social media became an important part of everyday life for many people. A big challenge of social media is, to find posts, that are interesting for the user. Many social networks like Twitter handle this problem with so-called hashtags. A user can label his own Tweet (post) with a hashtag, while other users can search for posts containing a specified hashtag. 
But what about finding posts which are not labeled by the creator? We provide a way of completing hashtags for unlabeled posts using classification on a novel real-world Twitter data stream. New posts are created every second, so this setting is well suited to non-stationary data analysis. Our goal is to show how labels (hashtags) of social media posts can be predicted by stream classifiers. In particular, we employ random projection (RP) as a preprocessing step in calculating streaming models. Also, we provide a novel real-world data set for streaming analysis called NSDQ with a comprehensive data description. We show that this dataset is a real challenge for state-of-the-art stream classifiers. While RP has been widely used and evaluated in stationary data analysis scenarios, its use in non-stationary environments has not been well analyzed. In this paper, we provide a use case of RP on real-world streaming data, especially on the NSDQ dataset. We discuss why RP can be used in this scenario and how it can handle stream-specific situations like concept drift. We also provide experiments with RP on streaming data, using state-of-the-art stream classifiers like adaptive random forest and concept drift detectors. Additionally, we experimentally evaluate an online principal component analysis (PCA) approach in the same fashion as we do for RP. 
To obtain higher dimensional synthetic streams, we use random Fourier features (RFF) in an online manner which allows us, to increase the number of dimensions of low dimensional streams.<\/jats:p>","DOI":"10.1007\/s12530-021-09396-z","type":"journal-article","created":{"date-parts":[[2021,8,13]],"date-time":"2021-08-13T18:02:41Z","timestamp":1628877761000},"page":"387-401","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Dimensionality reduction in the context of dynamic social media data streams"],"prefix":"10.1007","volume":"13","author":[{"given":"Moritz","family":"Heusinger","sequence":"first","affiliation":[]},{"given":"Christoph","family":"Raab","sequence":"additional","affiliation":[]},{"given":"Frank-Michael","family":"Schleif","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,8,13]]},"reference":[{"key":"9396_CR1","doi-asserted-by":"crossref","unstructured":"Achlioptas D (2001) Database-friendly random projections. In: Proc. of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, pp 274\u2013281","DOI":"10.1145\/375551.375608"},{"key":"9396_CR2","doi-asserted-by":"publisher","first-page":"671","DOI":"10.1016\/S0022-0000(03)00025-4","volume":"66","author":"D Achlioptas","year":"2003","unstructured":"Achlioptas D (2003) Database-friendly random projections: Johnson\u2013Lindenstrauss with binary coins. J Comput Syst Sci 66:671\u2013687","journal-title":"J Comput Syst Sci"},{"key":"9396_CR3","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1201\/b17320","volume-title":"Data classification: algorithms and applications","author":"CC Aggarwal","year":"2014","unstructured":"Aggarwal CC (2014) A survey of stream classification algorithms. In: Aggarwal CC (ed) Data classification: algorithms and applications. 
CRC Press, Boca Raton, pp 245\u2013274"},{"key":"9396_CR4","unstructured":"Baena-Garc\u0131a M, del Campo-\u00c1vila J, Fidalgo R, Bifet A, Gavalda R, Morales-Bueno R (2006) Early drift detection method. In: Fourth international workshop on knowledge discovery from data streams, vol 6, pp 77\u201386"},{"key":"9396_CR5","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/10654.001.0001","volume-title":"Machine learning for data streams with practical examples in MOA","author":"A Bifet","year":"2018","unstructured":"Bifet A, Gavald\u00e0 R, Holmes G, Pfahringer B (2018) Machine learning for data streams with practical examples in MOA. MIT Press, Cambridge"},{"key":"9396_CR6","doi-asserted-by":"crossref","unstructured":"Bifet A, Gavald\u00e0 R (2007) Learning from time-changing data with adaptive windowing. In: Proc. of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA, pp 443\u2013448","DOI":"10.1137\/1.9781611972771.42"},{"key":"9396_CR7","doi-asserted-by":"crossref","unstructured":"Bifet A, Gavald\u00e0 R (2009) Adaptive learning from evolving data streams. In: Advances in Intelligent Data Analysis VIII, IDA 2009. Proceedings. Lecture Notes in Computer Science, vol. 5772. Springer, pp 249\u2013260","DOI":"10.1007\/978-3-642-03915-7_22"},{"key":"9396_CR8","unstructured":"Bifet A, Hammer B, Schleif FM (2019) Recent trends in streaming data analysis, concept drift and analysis of dynamic data sets. In: 27th European Symposium on Artificial Neural Networks, ESANN 2019, Bruges, Belgium, April 24-26, 2019, pp 421\u2013430"},{"key":"9396_CR9","doi-asserted-by":"crossref","unstructured":"Carraher LA, Wilsey PA, Moitra A, Dey S (2016) Random projection clustering on streaming data. 
In: 2016 IEEE 16th ICDMW, pp 708\u2013715","DOI":"10.1109\/ICDMW.2016.0105"},{"key":"9396_CR10","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1002\/rsa.10073","volume":"22","author":"S Dasgupta","year":"2003","unstructured":"Dasgupta S, Gupta A (2003) An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct Algorithms 22:60\u201365","journal-title":"Random Struct Algorithms"},{"key":"9396_CR11","first-page":"79","volume":"16","author":"S Edosomwan","year":"2011","unstructured":"Edosomwan S, Prakasan S, Kouame D, Watson J, Seymour T (2011) The history of social media and its impact on business. J Appl Manag Entrep 16:79\u201391","journal-title":"J Appl Manag Entrep"},{"key":"9396_CR12","doi-asserted-by":"publisher","first-page":"209","DOI":"10.1007\/978-981-10-7200-0_18","volume-title":"Advances in big data and cloud computing","author":"DP Francis","year":"2018","unstructured":"Francis DP, Raimond K (2018) A random Fourier features based streaming algorithm for anomaly detection in large datasets. In: Rajsingh EB, Veerasamy J, Alavi AH, Peter JD (eds) Advances in big data and cloud computing. Springer Singapore, Singapore, pp 209\u2013217"},{"issue":"4","key":"9396_CR13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2523813","volume":"46","author":"J Gama","year":"2014","unstructured":"Gama J, Zliobaite I, Bifet A, Pechenizkiy M, Bouchachia A (2014) A survey on concept drift adaptation. ACM Comput Surv 46(4):1\u201337","journal-title":"ACM Comput Surv"},{"key":"9396_CR14","doi-asserted-by":"crossref","unstructured":"Golbeck J, Robles C, Turner K (2011) Predicting personality with social media. In: CHI\u201911 extended abstracts on human factors in computing systems. 
ACM, pp 253\u2013262","DOI":"10.1145\/1979742.1979614"},{"issue":"9\u201310","key":"9396_CR15","doi-asserted-by":"publisher","first-page":"1469","DOI":"10.1007\/s10994-017-5642-8","volume":"106","author":"HM Gomes","year":"2017","unstructured":"Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfharinger B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106(9\u201310):1469\u20131495","journal-title":"Mach Learn"},{"key":"9396_CR16","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1007\/978-3-030-00840-6_15","volume-title":"Computer and Information Sciences","author":"M Grabowska","year":"2018","unstructured":"Grabowska M, Kot\u0142owski W (2018) Online principal component analysis for evolving data streams. Computer and Information Sciences. Springer, Berlin, pp 130\u2013137"},{"key":"9396_CR17","unstructured":"Habernal I, Pt\u00e1\u010dek T, Steinberger J (2013) Sentiment analysis in Czech social media using supervised machine learning. In: WASSA@NAACL-HLT, pp 65\u201374"},{"key":"9396_CR18","first-page":"200","volume-title":"Advances in SOM, LVQ, clustering and data visualization","author":"M Heusinger","year":"2020","unstructured":"Heusinger M, Raab C, Schleif FM (2020) Passive concept drift handling via momentum based robust soft learning vector quantization. Advances in SOM, LVQ, clustering and data visualization. Springer, Berlin, pp 200\u2013209"},{"key":"9396_CR19","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1090\/conm\/026\/737400","volume":"26","author":"WB Johnson","year":"1984","unstructured":"Johnson WB, Lindenstrauss J (1984) Extensions of Lipchitz mappings into a Hilbert space. Contemp Math 26:189\u2013206","journal-title":"Contemp Math"},{"key":"9396_CR20","doi-asserted-by":"crossref","unstructured":"Kaban A (2015) Improved bounds on the dot product under random projection and random sign projection. In: Proc. of the 21th ACM SIGKDD, KDD \u201915, pp. 487\u2013496. 
ACM, New York, NY, USA","DOI":"10.1145\/2783258.2783364"},{"key":"9396_CR21","doi-asserted-by":"crossref","unstructured":"Kaski S (1998) Dimensionality reduction by random mapping: fast similarity computation for clustering. In: IJCNN98, vol.\u00a01, pp. 413\u2013418. Piscataway, NJ, IEEE","DOI":"10.1109\/IJCNN.1998.682302"},{"issue":"1","key":"9396_CR22","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1016\/j.jfa.2004.10.009","volume":"225","author":"B Klartag","year":"2005","unstructured":"Klartag B, Mendelson S (2005) Empirical processes and random projections. J Funct Anal 225(1):229\u2013245","journal-title":"J Funct Anal"},{"key":"9396_CR23","unstructured":"Kouloumpis E, Wilson T, Moore J (2011) Twitter sentiment analysis: the good the bad and the omg! In: Fifth International AAAI conference on weblogs and social media"},{"key":"9396_CR24","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-39351-3","volume-title":"Nonlinear dimensionality reduction","author":"JA Lee","year":"2007","unstructured":"Lee JA, Verleysen M (2007) Nonlinear dimensionality reduction, 1st edn. Springer Publishing Company, Incorporated","edition":"1"},{"issue":"8","key":"9396_CR25","doi-asserted-by":"publisher","first-page":"1371","DOI":"10.1109\/83.855432","volume":"9","author":"A Levey","year":"2000","unstructured":"Levey A, Lindenbaum M (2000) Sequential Karhunen\u2013Loeve basis extraction and its application to images. IEEE Trans Image Process 9(8):1371\u20131374. https:\/\/doi.org\/10.1109\/83.855432","journal-title":"IEEE Trans Image Process"},{"key":"9396_CR26","doi-asserted-by":"crossref","unstructured":"Li P, Hastie TJ, Church KW(2006) Very sparse random projections. In: Proc. of the 12th ACM SIGKDD. 
ACM, pp 287\u2013296","DOI":"10.1145\/1150402.1150436"},{"issue":"1","key":"9396_CR27","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s10115-017-1137-y","volume":"54","author":"V Losing","year":"2018","unstructured":"Losing V, Hammer B, Wersing H (2018) Tackling heterogeneous concept drift with the self-adjusting memory (SAM). Knowl Inform Syst 54(1):171\u2013201","journal-title":"Knowl Inform Syst"},{"key":"9396_CR28","doi-asserted-by":"crossref","unstructured":"Losing V, Hammer B, Wersing H (2017) KNN classifier with self adjusting memory for heterogeneous concept drift. in: Proc. IEEE, ICDM, pp 291\u2013300","DOI":"10.1109\/ICDM.2016.0040"},{"key":"9396_CR29","doi-asserted-by":"crossref","unstructured":"Losing V, Hammer B, Wersing H (2017) Self-adjusting memory: How to deal with diverse drift types. In: Proc. IJCAI 2017, pp 4899\u20134903","DOI":"10.24963\/ijcai.2017\/690"},{"key":"9396_CR30","first-page":"2579","volume":"9","author":"Maaten Lvd","year":"2008","unstructured":"Lvd Maaten, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579\u20132605","journal-title":"J Mach Learn Res"},{"key":"9396_CR31","unstructured":"Mikolov T, Sutskever I, Chen K, Corrado G, Dean J(2013)Distributed representations of words and phrases and their compositionality. In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS\u201913. Curran Associates Inc., Red Hook, NY, USA, pp 3111\u20133119"},{"issue":"72","key":"9396_CR32","first-page":"1","volume":"19","author":"J Montiel","year":"2018","unstructured":"Montiel J, Read J, Bifet A, Abdessalem T (2018) Scikit-multiflow: a multi-output streaming framework. J Mach Learn Res 19(72):1\u20135","journal-title":"J Mach Learn Res"},{"key":"9396_CR33","unstructured":"Park M, Li, H, Kim J (2016) Harrison: A benchmark on hashtag recommendation for real-world images in social networks. 
CoRR arXiv: abs\/1605.05054"},{"key":"9396_CR34","first-page":"1","volume":"2017","author":"XC Pham","year":"2017","unstructured":"Pham XC, Dang MT, Dinh SV, Hoang S, Nguyen TT, Liew AW (2017) Learning from data stream based on random projection and Hoeffding tree classifier. DICTA 2017:1\u20138","journal-title":"DICTA"},{"key":"9396_CR35","unstructured":"Raab C, Heusinger M, Schleif FM (2019) Reactive soft prototype computing for frequent reoccurring concept drift. In: Proc. of the 27. ESANN, pp 437\u2013442"},{"key":"9396_CR36","doi-asserted-by":"crossref","unstructured":"Raab C, Heusinger M, Schleif FM (2020) Reactive soft prototype computing for concept drift streams. Neurocomputing","DOI":"10.1016\/j.neucom.2019.11.111"},{"key":"9396_CR37","unstructured":"Rahimi A, Recht B (2008) Random features for large-scale kernel machines. In: Platt JC, Koller D, Singer Y, Roweis ST (eds) Advances in neural information processing systems 20, pp 1177\u20131184. Curran Associates, Inc. http:\/\/papers.nips.cc\/paper\/3182-random-features-for-large-scale-kernel-machines.pdf"},{"issue":"1\u20133","key":"9396_CR38","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1007\/s11263-007-0075-7","volume":"77","author":"DA Ross","year":"2008","unstructured":"Ross DA, Lim J, Lin RS, Yang MH (2008) Incremental learning for robust visual tracking. Int J Comput Vis 77(1\u20133):125\u2013141","journal-title":"Int J Comput Vis"},{"issue":"1","key":"9396_CR39","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1109\/TVCG.2016.2598495","volume":"23","author":"D Sacha","year":"2017","unstructured":"Sacha D, Zhang L, Sedlmair M, Lee JA, Peltonen J, Weiskopf D, North SC, Keim DA (2017) Visual interaction with dimensionality reduction: a structured literature analysis. 
IEEE Trans Vis Comput Graph 23(1):241\u2013250","journal-title":"IEEE Trans Vis Comput Graph"},{"key":"9396_CR40","doi-asserted-by":"crossref","unstructured":"Schoeneman F, Mahapatra S, Chandola V, Napp N, Zola J (2017) Error metrics for learning reliable manifolds from streaming data. In: Proc. of the 2017 SIAM International Conference on Data Mining, SIAM (2017), pp 750\u2013758","DOI":"10.1137\/1.9781611974973.84"},{"key":"9396_CR41","doi-asserted-by":"publisher","unstructured":"Shao J, Ahmadi Z, Kramer S(2014) Prototype-based learning on concept-drifting data streams. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD \u201914, pp 412\u2013421. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/2623330.2623609","DOI":"10.1145\/2623330.2623609"},{"key":"9396_CR42","first-page":"1144","volume-title":"Advances in neural information processing systems","author":"B Sriperumbudur","year":"2015","unstructured":"Sriperumbudur B, Szabo Z (2015) Optimal rates for random Fourier features. In: Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates Inc, Red Hook, pp 1144\u20131152"},{"key":"9396_CR43","doi-asserted-by":"publisher","unstructured":"Street WN, Kim Y (2001)A streaming ensemble algorithm (sea) for large-scale classification. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD \u201901, pp 377\u2013382. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/502512.502568","DOI":"10.1145\/502512.502568"},{"key":"9396_CR44","unstructured":"Ullah E, Mianjy P, Marinov TV, Arora R(2018) Streaming kernel PCA with $${\\tilde{O}}(\\sqrt{n})$$ random features. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS\u201918, pp 7322\u20137332. 
Curran Associates Inc., Red Hook, NY, USA"},{"key":"9396_CR45","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/0169-7439(87)80084-9","volume":"2","author":"S Wold","year":"1987","unstructured":"Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2:37\u201352","journal-title":"Chemom Intell Lab Syst"},{"issue":"5","key":"9396_CR46","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1109\/LSP.2019.2907480","volume":"26","author":"K Xiong","year":"2019","unstructured":"Xiong K, Wang S (2019) The online random Fourier features conjugate gradient algorithm. IEEE Signal Process Lett 26(5):740\u2013744. https:\/\/doi.org\/10.1109\/LSP.2019.2907480","journal-title":"IEEE Signal Process Lett"}],"container-title":["Evolving Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12530-021-09396-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12530-021-09396-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12530-021-09396-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,6]],"date-time":"2024-09-06T13:19:18Z","timestamp":1725628758000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12530-021-09396-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,13]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["9396"],"URL":"https:\/\/doi.org\/10.1007\/s12530-021-09396-z","relation":{},"ISSN":["1868-6478","1868-6486"],"issn-type":[{"value":"1868-6478","type":"print"},{"value":"1868-6486","type":"electronic"}],"subject":[],"published":
{"date-parts":[[2021,8,13]]},"assertion":[{"value":"13 January 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 July 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 August 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}