{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T21:03:03Z","timestamp":1771707783759,"version":"3.50.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"S11","license":[{"start":{"date-parts":[[2020,12,1]],"date-time":"2020-12-01T00:00:00Z","timestamp":1606780800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T00:00:00Z","timestamp":1609286400000},"content-version":"vor","delay-in-days":29,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100002790","name":"Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"publisher","award":["RGPIN-2017-05377"],"award-info":[{"award-number":["RGPIN-2017-05377"]}],"id":[{"id":"10.13039\/501100002790","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Med Inform Decis Mak"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The collection and examination of social media has become a useful mechanism for studying the mental activity and behavior tendencies of users. Through the analysis of a collected set of Twitter data, a model will be developed for predicting positively referenced, drug-related tweets. From this, trends and correlations can be determined.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>Social media data (tweets and attributes) were collected and processed using topic pertaining keywords, such as drug slang and use-conditions (methods of drug consumption). Potential candidates were preprocessed resulting in a dataset of 3,696,150 rows. The predictive classification power of multiple methods was compared including SVM, XGBoost, BERT and CNN-based classifiers. For the latter, a deep learning approach was implemented to screen and analyze the semantic meaning of the tweets.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>To test the predictive capability of the model, SVM and XGBoost were first employed. The results calculated from the models respectively displayed an accuracy of 59.33% and 54.90%, with AUC\u2019s of 0.87 and 0.71. The values show a low predictive capability with little discrimination. Conversely, the CNN-based classifiers presented a significant improvement, between the two models tested. The first was trained with 2661 manually labeled samples, while the other included synthetically generated tweets culminating in 12,142 samples. The accuracy scores were 76.35% and 82.31%, with an AUC of 0.90 and 0.91. Using association rule mining in conjunction with the CNN-based classifier showed a high likelihood for keywords such as \u201csmoke\u201d, \u201ccocaine\u201d, and \u201cmarijuana\u201d triggering a drug-positive classification.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>Predictive analysis with a CNN is promising, whereas attribute-based models presented little predictive capability and were not suitable for analyzing text of data. This research found that the commonly mentioned drugs had a level of correspondence with frequently used illicit substances, proving the practical usefulness of this system. Lastly, the synthetically generated set provided increased accuracy scores and improves the predictive capability.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12911-020-01335-3","type":"journal-article","created":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T07:02:24Z","timestamp":1609311744000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Utilizing deep learning and graph mining to identify drug use on Twitter data"],"prefix":"10.1186","volume":"20","author":[{"given":"Joseph","family":"Tassone","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peizhi","family":"Yan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mackenzie","family":"Simpson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chetan","family":"Mendhe","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9741-3463","authenticated-orcid":false,"given":"Vijay","family":"Mago","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Salimur","family":"Choudhury","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,12,30]]},"reference":[{"key":"1335_CR1","doi-asserted-by":"publisher","DOI":"10.1155\/2014\/923290","author":"T Johnson","year":"2014","unstructured":"Johnson T. Sources of error in substance use prevalence surveys. Int Schol Res Not. 2014. https:\/\/doi.org\/10.1155\/2014\/923290.","journal-title":"Int Schol Res Not"},{"issue":"3","key":"1335_CR2","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1007\/s40264-015-0379-4","volume":"39","author":"A Sarker","year":"2016","unstructured":"Sarker A, O\u2019Connor K, Ginn R, Scotch M, Smith K, Malone D, Gonzalez G. Social media mining for toxicovigilance: automatic monitoring of prescription medication abuse from twitter. Drug Saf. 2016;39(3):231\u201340.","journal-title":"Drug Saf"},{"issue":"4","key":"1335_CR3","doi-asserted-by":"publisher","first-page":"98","DOI":"10.2196\/jmir.3970","volume":"17","author":"S Gittelman","year":"2015","unstructured":"Gittelman S, Lange V, Crawford CAG, Okoro CA, Lieb E, Dhingra SS, Trimarchi E. A new source of data for public health surveillance: Facebook likes. J Med Internet Res. 2015;17(4):98.","journal-title":"J Med Internet Res"},{"issue":"3","key":"1335_CR4","doi-asserted-by":"publisher","first-page":"63","DOI":"10.2196\/publichealth.8060","volume":"3","author":"A Kim","year":"2017","unstructured":"Kim A, Miano T, Chew R, Eggers M, Nonnemaker J. Classification of Twitter users who tweet about e-cigarettes. JMIR Public Health Surv. 2017;3(3):63.","journal-title":"JMIR Public Health Surv"},{"key":"1335_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3389\/fpubh.2019.00001","volume":"7","author":"N Shah","year":"2019","unstructured":"Shah N, Srivastava G, Savage DW, Mago V. Assessing Canadians health activity and nutritional habits through social media. Front Public Health. 2019;7:1.","journal-title":"Front Public Health"},{"issue":"2","key":"1335_CR6","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1186\/s12911-018-0632-8","volume":"18","author":"J Du","year":"2018","unstructured":"Du J, Zhang Y, Luo J, Jia Y, Wei Q, Tao C, Xu H. Extracting psychiatric stressors for suicide from social media using deep learning. BMC Med Inform Decis Mak. 2018;18(2):43.","journal-title":"BMC Med Inform Decis Mak"},{"key":"1335_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13638-017-1011-3","volume":"2018","author":"K Robinson","year":"2018","unstructured":"Robinson K, Mago V. Birds of prey: identifying lexical irregularities in spam on Twitter. Wirel Netw. 2018;2018:1\u20138.","journal-title":"Wirel Netw"},{"key":"1335_CR8","doi-asserted-by":"crossref","unstructured":"Kim Y. Convolutional neural networks for sentence classification. Preprint. 2014; arXiv:1408.5882.","DOI":"10.3115\/v1\/D14-1181"},{"key":"1335_CR9","unstructured":"Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. Preprint. 2013; arXiv:1301.3781."},{"key":"1335_CR10","doi-asserted-by":"crossref","unstructured":"Lampos V, De\u00a0Bie T, Cristianini N. Flu detector-tracking epidemics on Twitter. In: Joint European conference on machine learning and knowledge discovery in databases. London: Springer; 2010. p. 599\u2013602.","DOI":"10.1007\/978-3-642-15939-8_42"},{"key":"1335_CR11","doi-asserted-by":"crossref","unstructured":"Paul MJ, Dredze M. You are what you tweet: analyzing Twitter for public health. In: 5th International AAAI conference on weblogs and social media; 2011.","DOI":"10.1609\/icwsm.v5i1.14137"},{"issue":"11","key":"1335_CR12","doi-asserted-by":"publisher","first-page":"14118","DOI":"10.1371\/journal.pone.0014118","volume":"5","author":"C Chew","year":"2010","unstructured":"Chew C, Eysenbach G. Pandemics in the age of Twitter: content analysis of tweets during the 2009 H1N1 outbreak. PLoS ONE. 2010;5(11):14118.","journal-title":"PLoS ONE"},{"issue":"9","key":"1335_CR13","doi-asserted-by":"publisher","first-page":"1047","DOI":"10.1177\/0022034511415273","volume":"90","author":"N Heaivilin","year":"2011","unstructured":"Heaivilin N, Gerbert B, Page J, Gibbs J. Public health surveillance of dental pain via Twitter. J Dent Res. 2011;90(9):1047\u201351.","journal-title":"J Dent Res"},{"key":"1335_CR14","doi-asserted-by":"crossref","unstructured":"Coppersmith G, Harman C, Dredze M. Measuring post traumatic stress disorder in Twitter. In: 8th international AAAI conference on weblogs and social media; 2014.","DOI":"10.1609\/icwsm.v8i1.14574"},{"issue":"6","key":"1335_CR15","doi-asserted-by":"publisher","first-page":"985","DOI":"10.1016\/j.jbi.2013.07.007","volume":"46","author":"D Cameron","year":"2013","unstructured":"Cameron D, Smith GA, Daniulaityte R, Sheth AP, Dave D, Chen L, Anand G, Carlson R, Watkins KZ, Falck R. Predose: a semantic web platform for drug abuse epidemiology using social media. J Biomed Inform. 2013;46(6):985\u201397.","journal-title":"J Biomed Inform"},{"key":"1335_CR16","doi-asserted-by":"crossref","unstructured":"Kursuncu U, Gaur M, Lokala U, Illendula A, Thirunarayan K, Daniulaityte R, Sheth A, Arpinar IB. What\u2019s UR type? Contextualized classification of user types in marijuana-related communications using compositional multiview embedding. In: 2018 IEEE\/WIC\/ACM international conference on web intelligence (WI). New York: IEEE; 2018. p. 474\u20139.","DOI":"10.1109\/WI.2018.00-50"},{"key":"1335_CR17","doi-asserted-by":"crossref","unstructured":"Huang X, Di\u00a0Lorio S, Dinh T, Chun SA. Deep self-taught learning for detecting drug abuse risk behavior in tweets. In: Computational data and social networks: 7th international conference, CSoNet 2018, Shanghai, China, December 18\u201320, 2018, proceedings, vol. 11280. London: Springer; 2018. p. 330.","DOI":"10.1007\/978-3-030-04648-4_28"},{"key":"1335_CR18","doi-asserted-by":"crossref","unstructured":"Serrat O. Social network analysis. In: Knowledge solutions. London: Springer; 2017. p. 39\u201343.","DOI":"10.1007\/978-981-10-0983-9_9"},{"key":"1335_CR19","doi-asserted-by":"crossref","unstructured":"Sawhney R, Manchanda P, Mathur P, Shah R, Singh R. Exploring and learning suicidal ideation connotations on social media with deep learning. In: Proceedings of the 9th workshop on computational approaches to subjectivity, sentiment and social media analysis. Brussels: Association for Computational Linguistics; 2018. p. 167\u201375.","DOI":"10.18653\/v1\/W18-6223"},{"key":"1335_CR20","doi-asserted-by":"crossref","unstructured":"Severyn A, Moschitti A. Twitter sentiment analysis with deep convolutional neural networks. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, Santiago, Chile 2015. p. 959\u201362.","DOI":"10.1145\/2766462.2767830"},{"key":"1335_CR21","unstructured":"Twitter: Developer Agreement and Policy. https:\/\/developer.twitter.com\/en\/developer-terms\/agreement-and-policy.html#f-be-a-good-partner-to-twitter. Accessed 25 May 2018."},{"key":"1335_CR22","unstructured":"Agency DE. Drugs of abuse: A DEA resource guide. US Department of Justice; 2017."},{"key":"1335_CR23","unstructured":"Agency DE. Slang terms and code words: a reference for law enforcement personnel. US Department of Justice; 2016."},{"issue":"7","key":"1335_CR24","doi-asserted-by":"publisher","first-page":"e0158450","DOI":"10.1371\/journal.pone.0158450","volume":"11","author":"J Bian","year":"2016","unstructured":"Bian J, Yoshigoe K, Modave F. Mining Twitter to assess the public perception of the \u201cInternet of Things\u201d. PLoS ONE. 2016;11(7):e0158450.","journal-title":"PLoS ONE"},{"issue":"7","key":"1335_CR25","doi-asserted-by":"publisher","first-page":"67863","DOI":"10.1371\/journal.pone.0067863","volume":"8","author":"Q Wei","year":"2013","unstructured":"Wei Q, Dunbrack RL Jr. The role of balanced training and testing data sets for binary classifiers in bioinformatics. PLoS ONE. 2013;8(7):67863.","journal-title":"PLoS ONE"},{"key":"1335_CR26","doi-asserted-by":"publisher","unstructured":"McHugh M. Interrater reliability: the kappa statistic. Biochemia medica : C\u0306asopis Hrvatskoga drus\u0306tva medicinskih biokemic\u0306ara \/ HDMB 22:276\u201382; 2012. https:\/\/doi.org\/10.11613\/BM.2012.031.","DOI":"10.11613\/BM.2012.031"},{"issue":"5","key":"1335_CR27","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1037\/h0031619","volume":"76","author":"JL Fleiss","year":"1971","unstructured":"Fleiss JL, et al. Measuring nominal scale agreement among many raters. Psychol Bull. 1971;76(5):378\u201382.","journal-title":"Psychol Bull"},{"key":"1335_CR28","doi-asserted-by":"publisher","unstructured":"Ma J, Gao W, Wong K-F. Rumor detection on Twitter with tree-structured recursive neural networks. In: Proceedings of the 56th annual meeting of the association for computational linguistics (volume 1: long papers), pp. 1980\u20131989. Association for Computational Linguistics, Melbourne, Australia; 2018. https:\/\/doi.org\/10.18653\/v1\/P18-1184. https:\/\/www.aclweb.org\/anthology\/P18-1184.","DOI":"10.18653\/v1\/P18-1184"},{"key":"1335_CR29","doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C. Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM Sigkdd international conference on knowledge discovery and data mining. New York: ACM; 2016. p. 785\u201394.","DOI":"10.1145\/2939672.2939785"},{"key":"1335_CR30","doi-asserted-by":"crossref","unstructured":"Godin F, Vandersmissen B, De\u00a0Neve, W, Van\u00a0de Walle R: Multimedia lab @ acl wnut ner shared task: named entity recognition for Twitter microposts using distributed word representations. In: Proceedings of the workshop on noisy user-generated text. Association for Computational Linguistics, Beijing; 2015. p. 146\u201353.","DOI":"10.18653\/v1\/W15-4322"},{"key":"1335_CR31","doi-asserted-by":"crossref","unstructured":"Chaturvedi I, Cambria E, Poria S, Bajpai R. Bayesian deep convolution belief networks for subjectivity detection. In: 2016 IEEE 16th international conference on data mining workshops (ICDMW). New York: IEEE; 2016. pp. 916\u201323.","DOI":"10.1109\/ICDMW.2016.0134"},{"key":"1335_CR32","unstructured":"Kingma D, Ba J. Adam: a method for stochastic optimization. In: International conference on learning representations 2014."},{"key":"1335_CR33","unstructured":"Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th international conference on artificial intelligence and statistics; 2010. p. 249\u201356."},{"key":"1335_CR34","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. Bert: pre-training of deep bidirectional transformers for language understanding. Preprint. 2018; arXiv:1810.04805."},{"key":"1335_CR35","unstructured":"SAMHSA: 2017 National Survey on Drug Use and Health (NSDUH). US Department of Health & Human Services; 2018."},{"issue":"5","key":"1335_CR36","doi-asserted-by":"publisher","first-page":"2014","DOI":"10.1007\/s11227-016-1714-y","volume":"72","author":"E Belyi","year":"2016","unstructured":"Belyi E, Giabbanelli PJ, Patel I, Balabhadrapathruni NH, Abdallah AB, Hameed W, Mago VK. Combining association rule mining and network analysis for pharmacosurveillance. J Supercomput. 2016;72(5):2014\u201334. https:\/\/doi.org\/10.1007\/s11227-016-1714-y.","journal-title":"J Supercomput"},{"key":"1335_CR37","doi-asserted-by":"publisher","unstructured":"Li L, Shang Y, Zhang W. Improvement of hits-based algorithms on web documents. In: Proceedings of the 11th international conference on world wide web. WWW\u201902. Association for Computing Machinery, New York, NY, USA; 2002. p. 527\u201335. https:\/\/doi.org\/10.1145\/511446.511514.","DOI":"10.1145\/511446.511514"},{"key":"1335_CR38","unstructured":"Zhang X, Zhao J, LeCun Y. Character-level convolutional networks for text classification. In: Advances in neural information processing systems. NIPS\u201915 Proceedings of the 28th international conference on neural information processing systems, Montreal, Canada; 2015. p. 649\u201357."},{"key":"1335_CR39","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1162\/tacl_a_00051","volume":"5","author":"P Bojanowski","year":"2016","unstructured":"Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Trans Assoc Comput Linguist. 2016;5:135\u201346.","journal-title":"Trans Assoc Comput Linguist"},{"key":"1335_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13638-018-1318-8","volume":"2019","author":"N Shah","year":"2019","unstructured":"Shah N, Willick D, Mago V. A framework for social media data analytics using elasticsearch and Kibana. Wirel Netw. 2019;2019:1\u20139.","journal-title":"Wirel Netw"}],"container-title":["BMC Medical Informatics and Decision Making"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-020-01335-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12911-020-01335-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12911-020-01335-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,9]],"date-time":"2022-12-09T23:30:39Z","timestamp":1670628639000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcmedinformdecismak.biomedcentral.com\/articles\/10.1186\/s12911-020-01335-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":40,"journal-issue":{"issue":"S11","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["1335"],"URL":"https:\/\/doi.org\/10.1186\/s12911-020-01335-3","relation":{},"ISSN":["1472-6947"],"issn-type":[{"value":"1472-6947","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]},"assertion":[{"value":"11 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 November 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 December 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Dr. Vijay Mago has been granted approval to conduct this research by the Lakehead University Research Ethics Board (FWA00012950). There is no personally identifiable data (biomedical, clinical, or biometric) being collected from the participants in this research. Therefore, consent to participate is not required for this publication.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"304"}}