{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T02:09:15Z","timestamp":1776132555988,"version":"3.50.1"},"reference-count":39,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,11,9]],"date-time":"2020-11-09T00:00:00Z","timestamp":1604880000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100007891","name":"Ryerson University","doi-asserted-by":"publisher","award":["DRF_URO"],"award-info":[{"award-number":["DRF_URO"]}],"id":[{"id":"10.13039\/100007891","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>During the COVID-19 pandemic, many research studies have been conducted to examine the impact of the outbreak on the financial sector, especially on cryptocurrencies. Social media, such as Twitter, plays a significant role as a meaningful indicator in forecasting the Bitcoin (BTC) prices. However, there is a research gap in determining the optimal preprocessing strategy in BTC tweets to develop an accurate machine learning prediction model for bitcoin prices. This paper develops different text preprocessing strategies for correlating the sentiment scores of Twitter text with Bitcoin prices during the COVID-19 pandemic. We explore the effect of different preprocessing functions, features, and time lengths of data on the correlation results. Out of 13 strategies, we discover that splitting sentences, removing Twitter-specific tags, or their combination generally improve the correlation of sentiment scores and volume polarity scores with Bitcoin prices. The prices only correlate well with sentiment scores over shorter timespans. Selecting the optimum preprocessing strategy would prompt machine learning prediction models to achieve better accuracy as compared to the actual prices.<\/jats:p>","DOI":"10.3390\/bdcc4040033","type":"journal-article","created":{"date-parts":[[2020,11,10]],"date-time":"2020-11-10T10:47:28Z","timestamp":1605005248000},"page":"33","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":154,"title":["A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19"],"prefix":"10.3390","volume":"4","author":[{"given":"Toni","family":"Pano","sequence":"first","affiliation":[{"name":"Electrical, Computer, and Biomedical Engineering, Ryerson University, Toronto, ON M5B 2K3, Canada"}]},{"given":"Rasha","family":"Kashef","sequence":"additional","affiliation":[{"name":"Electrical, Computer, and Biomedical Engineering, Ryerson University, Toronto, ON M5B 2K3, Canada"}]}],"member":"1968","published-online":{"date-parts":[[2020,11,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"101607","DOI":"10.1016\/j.frl.2020.101607","article-title":"Safe haven or risky hazard? Bitcoin during the COVID-19 bear market","volume":"35","author":"Conlon","year":"2020","journal-title":"Financ. Res. Lett."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kristoufek, L. (2020). Grandpa, Grandpa, Tell Me the One About Bitcoin Being a Safe Haven: New Evidence from the COVID-19 Pandemic. Front. Phys., 8.","DOI":"10.3389\/fphy.2020.00296"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Corbet, S., Charles, L., and Brian, L. (2020). The contagion effects of the COVID-19 pandemic: Evidence from gold and cryptocurrencies. Financ. Res. Lett.","DOI":"10.2139\/ssrn.3564443"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"109936","DOI":"10.1016\/j.chaos.2020.109936","article-title":"The impact of COVID-19 pandemic upon stability and sequential irregularity of equity and cryptocurrency markets","volume":"138","author":"Lahmiri","year":"2020","journal-title":"Chaos Solitons Fractals"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Grobys, K. (2020). When Bitcoin has the flu: On Bitcoin\u2019s performance to hedge equity risk in the early wake of the COVID-19 outbreak. Appl. Econ. Lett., in press.","DOI":"10.2139\/ssrn.3565844"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Goodell, J., and Goutte, S. (2020). Co-movement of COVID-19 and Bitcoin: Evidence from wavelet coherence analysis. Financ. Res. Lett.","DOI":"10.2139\/ssrn.3597144"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Yarovaya, L., Matkovskyy, R., and Jalan, A. (2020). The Effects of a Black Swan Event (COVID-19) on Herding Behavior in Cryptocurrency Markets: Evidence from Cryptocurrency USD, EUR, JPY and KRW Markets. SSRN Electron. J.","DOI":"10.2139\/ssrn.3586511"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"101597","DOI":"10.1016\/j.frl.2020.101597","article-title":"Infected Markets: Novel Coronavirus, Government Interventions, and Stock Return Volatility around the Globe","volume":"35","author":"Zaremba","year":"2020","journal-title":"Financ. Res. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Jain, A., Tripathi, S., Dwivedi, H.D., and Saxena, P. (2018, January 2\u20134). Forecasting Price of Cryptocurrencies Using Tweets Sentiment Analysis. Proceedings of the 2018 Eleventh International Conference on Contemporary Computing (IC3) Institute of Electrical and Electronics Engineers (IEEE), Noida, India.","DOI":"10.1109\/IC3.2018.8530659"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1016\/j.eswa.2018.06.022","article-title":"A comparative evaluation of pre-processing techniques and their interactions for twitter sentiment analysis","volume":"110","author":"Symeonidis","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Ibrahim, A., Kashef, R., Li, M., Valencia, E., and Huang, E. (2020). Bitcoin Network Mechanics: Forecasting the BTC Closing Price Using Vector Auto-Regression Models Based on Endogenous and Exogenous Feature Variables. J. Risk Fin. Manag., 13.","DOI":"10.3390\/jrfm13090189"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tan, X., and Kashef, R. (2019, January 2\u20136). Predicting the closing price of cryptocurrencies. Proceedings of the Second International Conference on Data Science, E-Learning and Information Systems-DATA \u201919, Association for Computing Machinery (ACM), Dubai, United Arab Emirates.","DOI":"10.1145\/3368691.3368728"},{"key":"ref_13","unstructured":"Hutto, C.J. (2020, July 24). VADER-Sentiment-Analysis, GitHub. Available online: https:\/\/github.com\/cjhutto\/vaderSentiment."},{"key":"ref_14","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1108\/00220410410560573","article-title":"A statistical interpretation of term specificity and its application in retrieval","volume":"60","author":"Jones","year":"2004","journal-title":"J. Document."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1016\/j.eswa.2016.03.028","article-title":"Classification of sentiment reviews using n-gram machine learning approach","volume":"57","author":"Tripathy","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_17","unstructured":"Hutto, C.J., and Gilbert, E. (2020, July 24). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Available online: https:\/\/www.aaai.org\/ocs\/index.php\/ICWSM\/ICWSM14\/paper\/view\/8109."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1080\/03081079.2017.1291635","article-title":"A simple probabilistic explanation of term frequency-inverse document frequency (tf-idf) heuristic (and variations motivated by this explanation)","volume":"46","author":"Havrlant","year":"2017","journal-title":"Int. J. Gen. Syst."},{"key":"ref_19","unstructured":"Stenqvist, E., and L\u00f6nn\u00f6, J. (2017). Predicting Bitcoin Price Fluctuation with Twitter Sentiment Analysis. [Bachelor\u2019 Thesis, School of Computer Science and Communication (CSC), KTH]."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"101188","DOI":"10.1016\/j.intfin.2020.101188","article-title":"The predictive power of public Twitter sentiment for forecasting cryptocurrency prices","volume":"65","author":"Kraaijeveld","year":"2020","journal-title":"J. Int. Financial Mark. Inst. Money"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Li, T.R., Chamrajnagar, A.S., Fong, X.R., Rizik, N.R., and Fu, F. (2019). Sentiment-Based Prediction of Alternative Cryptocurrency Price Fluctuations Using Gradient Boosting Tree Model. Front. Phys., 7.","DOI":"10.3389\/fphy.2019.00098"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Mohapatra, S., Ahmed, N., and Alencar, P. (2019). KryptoOracle: A Real-Time Cryptocurrency Price Prediction Platform Using Twitter Sentiments. arXiv.","DOI":"10.1109\/BigData47090.2019.9006554"},{"key":"ref_23","unstructured":"Kaplan, C., Aslan, C., and Bulbul, A. (2020, May 20). Cryptocurrency Word-of-Mouth Analysis via Twitter, ResearchGate. Available online: https:\/\/www.researchgate.net\/publication\/327988035_Cryptocurrency_Word-of-Mouth_Analysis_viaTwitter."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"101003","DOI":"10.1016\/j.jocs.2019.05.009","article-title":"Emotion and sentiment analysis from Twitter text","volume":"36","author":"Sailunaz","year":"2019","journal-title":"J. Comput. Sci."},{"key":"ref_25","unstructured":"Rosen, A. (2020, July 24). Tweeting Made Easier, Twitter. Available online: https:\/\/blog.twitter.com\/en_us\/topics\/product\/2017\/tweetingmadeeasier.html."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lyu, H., Chen, L., Wang, Y., and Luo, J. (2020). Sense and Sensibility: Characterizing Social Media Users Regarding the Use of Controversial Terms for COVID-19. IEEE Trans. Big Data, 1.","DOI":"10.1109\/TBDATA.2020.2996401"},{"key":"ref_27","unstructured":"(2020, May 19). The Twitter Rules, Twitter. Available online: https:\/\/help.twitter.com\/en\/rules-and-policies\/twitter-rules."},{"key":"ref_28","unstructured":"(2020, May 19). Automation rules, Twitter. Available online: https:\/\/help.twitter.com\/en\/rules-and-policies\/twitter-automation."},{"key":"ref_29","unstructured":"(2020, May 19). Tweepy. Available online: http:\/\/www.tweepy.org\/."},{"key":"ref_30","unstructured":"(2020, July 24). Counting Characters, Twitter. Available online: https:\/\/developer.twitter.com\/en\/docs\/basics\/counting-characters."},{"key":"ref_31","unstructured":"(2020, July 24). Search Tweets-Overview-Search API, Twitter. Available online: https:\/\/developer.twitter.com\/en\/docs\/tweets\/search\/overview\/standard."},{"key":"ref_32","unstructured":"(2020, July 24). Search Tweets-API Reference-Standard search API, Twitter. Available online: https:\/\/developer.twitter.com\/en\/docs\/tweets\/search\/api-reference\/get-search-tweets."},{"key":"ref_33","unstructured":"(2020, July 24). Choose Your Plan, CryptoCompare. Available online: https:\/\/min-api.cryptocompare.com\/pricing."},{"key":"ref_34","unstructured":"Hutto, C.J. (2020, July 24). vaderSentiment\/vaderSentiment\/vader_lexicon.txt, GitHub. Available online: https:\/\/github.com\/cjhutto\/vaderSentiment\/blob\/master\/vaderSentiment\/vader_lexicon.txt."},{"key":"ref_35","unstructured":"(2020, July 24). \u201c5. Built-in Types\u201d\u2014Python 2.7.18 Documentation. Available online: https:\/\/docs.python.org\/2\/library\/stdtypes.html."},{"key":"ref_36","unstructured":"Hutto, C.J. (2020, July 24). vaderSentiment\/vaderSentiment\/vaderSentiment.py, GitHub. Available online: https:\/\/github.com\/cjhutto\/vaderSentiment\/blob\/master\/vaderSentiment\/vaderSentiment.py."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Pano, T., and Kashef, R. (2020, January 9\u201312). A Corpus of BTC Tweets in the Era of COVID-19. Proceedings of the 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), Institute of Electrical and Electronics Engineers (IEEE), Vancouver, BC, Canada.","DOI":"10.1109\/IEMTRONICS51293.2020.9216427"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Stella, M. (2019). Modelling Early Word Acquisition through Multiplex Lexical Networks and Machine Learning. Big Data Cogn. Comput., 3.","DOI":"10.3390\/bdcc3010010"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, D., and Summers-Stay, D. (2019). Mapping Distributional Semantics to Property Norms with Deep Neural Networks. Big Data Cogn. Comput., 3.","DOI":"10.3390\/bdcc3020030"}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/4\/4\/33\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:30:56Z","timestamp":1760178656000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/4\/4\/33"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,9]]},"references-count":39,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["bdcc4040033"],"URL":"https:\/\/doi.org\/10.3390\/bdcc4040033","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,9]]}}}