{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,4]],"date-time":"2025-10-04T07:48:37Z","timestamp":1759564117915,"version":"3.41.2"},"reference-count":45,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,1,28]],"date-time":"2021-01-28T00:00:00Z","timestamp":1611792000000},"content-version":"vor","delay-in-days":27,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Complexity"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>Twitter integrates with streaming data technologies and machine learning to add new value to healthcare. This paper presented a real\u2010time system to predict breast cancer based on streaming patient\u2019s health data from Twitter. The proposed system consists of two major components: developing an offline building model and an online prediction pipeline. For the first component, we made a correlation between the features to determine the correlation between features and reduce the number of features from the Breast Cancer Wisconsin Diagnostic dataset. Two feature selection algorithms are recursive feature elimination and univariate feature selection algorithms which are applied to features after correlation to select the essential features. Four decision trees, logistic regression, support vector machine, and random forest classifier have been used on features after correlation and feature selection. Also, hyperparameter tuning and cross\u2010validation have been applied with machine learning to optimize models and enhance accuracy. Apache Spark, Apache Kafka, and Twitter Streaming API are used to develop the second component. The best model with the highest accuracy obtained from the first component predicts breast cancer in real time from tweets\u2019 streaming. The results showed that the best model is the random forest classifier which achieved the best accuracy.<\/jats:p>","DOI":"10.1155\/2021\/6653508","type":"journal-article","created":{"date-parts":[[2021,1,28]],"date-time":"2021-01-28T19:35:21Z","timestamp":1611862521000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Breast Cancer Identification from Patients\u2019 Tweet Streaming Using Machine Learning Solution on Spark"],"prefix":"10.1155","volume":"2021","author":[{"given":"Nahla F.","family":"Omran","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8362-4350","authenticated-orcid":false,"given":"Sara F.","family":"Abd-el Ghany","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6184-7107","authenticated-orcid":false,"given":"Hager","family":"Saleh","sequence":"additional","affiliation":[]},{"given":"Ayman","family":"Nabil","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,1,28]]},"reference":[{"volume-title":"Special Requirements for QA in Mammography with Respect to CR-Systems","year":"2013","author":"Rodr\u00edguez Larumbe L.","key":"e_1_2_10_1_2"},{"volume-title":"Breast Cancer","year":"2020","author":"Clinic M.","key":"e_1_2_10_2_2"},{"volume-title":"Breast Cancer Treatment (Adult)(pdq\u00ae): Patient Version","year":"2002","author":"PDQ Adult Treatment Editorial Board","key":"e_1_2_10_3_2"},{"volume-title":"United States Cancer Statistics: 1999\u20132011 Incidence and Mortality Web-Based Report","year":"2014","author":"Centers for Disease Control and Prevention","key":"e_1_2_10_4_2"},{"key":"e_1_2_10_5_2","unstructured":"World Health Organization Breast cancer 2020 World Health Organization Geneva Switzerland https:\/\/www.who.int\/cancer\/prevention\/diagnosis-screening\/breast-cancer\/en\/."},{"key":"e_1_2_10_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.breast.2009.12.002"},{"key":"e_1_2_10_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.04.224"},{"key":"e_1_2_10_8_2","doi-asserted-by":"publisher","DOI":"10.5121\/ijaia.2012.3603"},{"key":"e_1_2_10_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2008.01.009"},{"key":"e_1_2_10_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.08.044"},{"key":"e_1_2_10_11_2","unstructured":"HadoopA. Apache Hadoop 2020 https:\/\/hadoop.apache.org\/."},{"key":"e_1_2_10_12_2","unstructured":"KafkaA. Apache Kafka 2020 https:\/\/spark.apache.org\/."},{"key":"e_1_2_10_13_2","unstructured":"StormA. Apache Storm 2020 https:\/\/storm.apache.org\/."},{"key":"e_1_2_10_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2014.06.009"},{"key":"e_1_2_10_15_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compeleceng.2017.03.009"},{"key":"e_1_2_10_16_2","doi-asserted-by":"publisher","DOI":"10.1080\/21675511.2015.1083145"},{"key":"e_1_2_10_17_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0205855"},{"key":"e_1_2_10_18_2","doi-asserted-by":"publisher","DOI":"10.1177\/1932296818811679"},{"key":"e_1_2_10_19_2","doi-asserted-by":"crossref","unstructured":"PlachourasV. LeidnerJ. L. andGarrowA. G. Quantifying self-reported adverse drug events on twitter: signal and topic analysis Proceedings of the 7th 2016 International Conference on Social Media & Society July 2016 London UK 1\u201310.","DOI":"10.1145\/2930971.2930977"},{"key":"e_1_2_10_20_2","unstructured":"ClarkE. M. JamesT. JonesC. A.et al. A sentiment analysis of breast cancer treatment experiences and healthcare perceptions across Twitter 2018 https:\/\/arxiv.org\/abs\/1805.09959."},{"key":"e_1_2_10_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2019.09.056"},{"key":"e_1_2_10_22_2","doi-asserted-by":"publisher","DOI":"10.3390\/healthcare8020111"},{"key":"e_1_2_10_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2004.07.002"},{"key":"e_1_2_10_24_2","doi-asserted-by":"crossref","unstructured":"AgarapA. F. M. On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset Proceedings of the 2nd International Conference on Machine Learning and Soft Computing February 2018 Phu Quoc Island Vietnam 5\u20139.","DOI":"10.1145\/3184066.3184080"},{"key":"e_1_2_10_25_2","doi-asserted-by":"publisher","DOI":"10.19072\/ijet.280563"},{"key":"e_1_2_10_26_2","doi-asserted-by":"crossref","unstructured":"BenbrahimH. HachimiH. andAmineA. Comparative study of machine learning algorithms using the breast cancer dataset Proceedings of the International Conference on Advanced Intelligent Systems for Sustainable Development July 2019 Marrakech Morocco Springer 83\u201391.","DOI":"10.1007\/978-3-030-36664-3_10"},{"key":"e_1_2_10_27_2","article-title":"Using three machine learning techniques for predicting breast cancer recurrence","volume":"4","author":"Eshlaghy A. T.","year":"2013","journal-title":"Journal of Health and Medical Informatics"},{"key":"e_1_2_10_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2018.10.014"},{"key":"e_1_2_10_29_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10916-010-9518-8"},{"key":"e_1_2_10_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2011.01.120"},{"key":"e_1_2_10_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2019.2927080"},{"key":"e_1_2_10_32_2","unstructured":"SparkA. Apache Spark 2020 https:\/\/spark.apache.org\/."},{"key":"e_1_2_10_33_2","unstructured":"Breast Cancer Wisconsin Breast cancer Wisconsin (diagnostic) data set 2020 https:\/\/www.kaggle.com\/uciml\/breast-cancer-wisconsin-data."},{"key":"e_1_2_10_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11548-016-1437-9"},{"key":"e_1_2_10_35_2","doi-asserted-by":"publisher","DOI":"10.5120\/15399-4026"},{"key":"e_1_2_10_36_2","unstructured":"pandas.dataframe.corr 2020 https:\/\/pandas.pydata.org\/pandas-docs\/stable\/reference\/api\/pandas.DataFrame.corr.html."},{"key":"e_1_2_10_37_2","doi-asserted-by":"publisher","DOI":"10.1023\/a:1012487302797"},{"key":"e_1_2_10_38_2","doi-asserted-by":"crossref","unstructured":"JinX. XuA. BieR. andGuoP. Machine learning techniques and chi-square feature selection for cancer classification using sage gene expression profiles Proceedings of the International Workshop on Data Mining for Biomedical Applications April 2006 Singapore Springer 106\u2013115.","DOI":"10.1007\/11691730_11"},{"key":"e_1_2_10_39_2","doi-asserted-by":"crossref","unstructured":"UtamaH. Sentiment analysis in airline tweets using mutual information for feature selection Proceedings of the 2019 4th International Conference on Information Technology Information Systems and Electrical Engineering (ICITISEE) November 2019 Yogyakarta Indonesia IEEE 295\u2013300.","DOI":"10.1109\/ICITISEE48480.2019.9003903"},{"key":"e_1_2_10_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19425-7_13"},{"key":"e_1_2_10_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/21.97458"},{"key":"e_1_2_10_42_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2004.03.017"},{"key":"e_1_2_10_43_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bti319"},{"key":"e_1_2_10_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/NNSP.2000.890157"},{"key":"e_1_2_10_45_2","unstructured":"AppT. Twitter STREAMING API 2019 https:\/\/developer.twitter.com\/en\/docs\/tweets\/filter-realtime\/guides\/connecting.html."}],"container-title":["Complexity"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6653508.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6653508.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/6653508","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T21:59:57Z","timestamp":1723240797000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/6653508"}},"subtitle":[],"editor":[{"given":"Ahmed Mostafa","family":"Khalil","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/6653508"],"URL":"https:\/\/doi.org\/10.1155\/2021\/6653508","archive":["Portico"],"relation":{},"ISSN":["1076-2787","1099-0526"],"issn-type":[{"type":"print","value":"1076-2787"},{"type":"electronic","value":"1099-0526"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2020-12-24","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-13","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"6653508"}}