{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,8]],"date-time":"2025-12-08T22:33:44Z","timestamp":1765233224891,"version":"3.41.0"},"reference-count":97,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2021,11,17]],"date-time":"2021-11-17T00:00:00Z","timestamp":1637107200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"U.S. National Science Foundation","doi-asserted-by":"crossref","award":["CNS 1951411"],"award-info":[{"award-number":["CNS 1951411"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]},{"name":"LSU Economic Development Assistantships awards"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Softw. Eng. Methodol."],"published-print":{"date-parts":[[2022,4,30]]},"abstract":"<jats:p>Modern application stores enable developers to classify their apps by choosing from a set of generic categories, or genres, such as health, games, and music. These categories are typically static\u2014new categories do not necessarily emerge over time to reflect innovations in the mobile software landscape. With thousands of apps classified under each category, locating apps that match a specific consumer interest can be a challenging task. To overcome this challenge, in this article, we propose an automated approach for classifying mobile apps into more focused categories of functionally related application domains. Our aim is to enhance apps visibility and discoverability. Specifically, we employ word embeddings to generate numeric semantic representations of app descriptions. These representations are then classified to generate more cohesive categories of apps. Our empirical investigation is conducted using a dataset of 600 apps, sampled from the Education, Health&amp;Fitness, and Medical categories of the Apple App Store. The results show that our classification algorithms achieve their best performance when app descriptions are vectorized using GloVe, a count-based model of word embeddings. Our findings are further validated using a dataset of Sharing Economy apps and the results are evaluated by 12 human subjects. The results show that GloVe combined with Support Vector Machines can produce app classifications that are aligned to a large extent with human-generated classifications.<\/jats:p>","DOI":"10.1145\/3474827","type":"journal-article","created":{"date-parts":[[2021,11,17]],"date-time":"2021-11-17T21:57:08Z","timestamp":1637186228000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Classifying Mobile Applications Using Word Embeddings"],"prefix":"10.1145","volume":"31","author":[{"given":"Fahimeh","family":"Ebrahimi","sequence":"first","affiliation":[{"name":"The Division of Computer Science and Engineering Louisiana State University Baton Rouge, LA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Miroslav","family":"Tushev","sequence":"additional","affiliation":[{"name":"The Division of Computer Science and Engineering Louisiana State University Baton Rouge, LA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anas","family":"Mahmoud","sequence":"additional","affiliation":[{"name":"The Division of Computer Science and Engineering Louisiana State University Baton Rouge, LA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,11,17]]},"reference":[{"key":"e_1_3_2_2_2","unstructured":"Statista. 2019. Mobile app usage. Retrieved from https:\/\/www.statista.com\/topics\/1002\/mobile-app-usage\/."},{"key":"e_1_3_2_3_2","article-title":"Docbert: Bert for document classification","author":"Adhikari Ashutosh","year":"2019","unstructured":"Ashutosh Adhikari, Achyudh Ram, Raphael Tang, and Jimmy Lin. 2019. Docbert: Bert for document classification. Retrieved from https:\/\/arXiv:1904.08398.","journal-title":"Retrieved from https:\/\/arXiv:1904.08398"},{"key":"e_1_3_2_4_2","first-page":"1","volume-title":"Empirical Software Engineering","author":"Al-Subaihin Afnan","year":"2019","unstructured":"Afnan Al-Subaihin, Federica Sarro, Sue Black, and Licia Capra. 2019. Empirical comparison of text-based mobile apps similarity measurement techniques. In Empirical Software Engineering. Springer, 1\u201326."},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/2961111.2962600"},{"key":"e_1_3_2_6_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Arora Sanjeev","year":"2016","unstructured":"Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2016. A simple but tough-to-beat baseline for sentence embeddings. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_2_7_2","article-title":"Author\u2019s sentiment prediction","author":"Bastan Mohaddeseh","year":"2020","unstructured":"Mohaddeseh Bastan, Mahnaz Koupaee, Youngseo Son, Richard Sicoli, and Niranjan Balasubramanian. 2020. Author\u2019s sentiment prediction. Retrieved from https:\/\/arXiv:2011.06128.","journal-title":"Retrieved from https:\/\/arXiv:2011.06128"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2695664.2695997"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2063576.2063663"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"issue":"2","key":"e_1_3_2_12_2","first-page":"158","article-title":"Cleaning up that mess: A framework for classifying educational apps","volume":"14","author":"Cherner Todd","year":"2014","unstructured":"Todd Cherner, Judy Dix, and Corey Lee. 2014. Cleaning up that mess: A framework for classifying educational apps. Contemp. Issues Technol. Teacher Edu. 14, 2 (2014), 158\u2013193.","journal-title":"Contemp. Issues Technol. Teacher Edu."},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1177\/001316446002000104"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1037\/0033-2909.88.2.322"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.2196\/jmir.2583"},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3134673"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.tourman.2018.11.008"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/1414004.1414034"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2017.03.002"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/1015330.1015356"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/AICCSA.2018.8612816"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.5555\/646943.712093"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1937.10503522"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.infsof.2016.04.017"},{"key":"e_1_3_2_25_2","article-title":"Emerging app issue identification via online joint sentiment-topic tracing","author":"Gao Cuiyun","year":"2020","unstructured":"Cuiyun Gao, Jichuan Zeng, Zhiyuan Wen, David Lo, Xin Xia, Irwin King, and Michael Lyu. 2020. Emerging app issue identification via online joint sentiment-topic tracing. Retrieved from https:\/\/arXiv:2008.09976.","journal-title":"Retrieved from https:\/\/arXiv:2008.09976"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/2568225.2568276"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0307752101"},{"key":"e_1_3_2_28_2","volume-title":"Effect Sizes for Research: A Broad Practical Approach","author":"Grissom Robert","year":"2005","unstructured":"Robert Grissom and John Kim. 2005. Effect Sizes for Research: A Broad Practical Approach. Lawrence Erlbaum Associates Publishers."},{"key":"e_1_3_2_29_2","first-page":"132","volume-title":"Proceedings of the International Conference on Software Process Improvement","author":"Guti\u00e9rrez Luis","year":"2018","unstructured":"Luis Guti\u00e9rrez and Brian Keith. 2018. A systematic literature review on word embeddings. In Proceedings of the International Conference on Software Process Improvement. 132\u2013141."},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2015.88"},{"key":"e_1_3_2_31_2","first-page":"406","volume-title":"Data Mining: Supervised Learning","author":"Hancock Monte","year":"2016","unstructured":"Monte Hancock. 2016. Data Mining: Supervised Learning. Taylor & Francis Group, 406\u2013421."},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1080\/00437956.1954.11659520"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/UEMCON51285.2020.9298158"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW51313.2020.00071"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/IEMTRONICS52119.2021.9422605"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/SMAP49528.2020.9248443"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3130348.3130370"},{"key":"e_1_3_2_38_2","first-page":"139","article-title":"Classification of health related applications","volume":"266","author":"H\u00f6hn Matthias","year":"2016","unstructured":"Matthias H\u00f6hn, Ute Jan, Theodor Framke, and Urs-Vito Albrecht. 2016. Classification of health related applications. Studies Health Technol. Inform. 266 (2016), 139\u2013142.","journal-title":"Studies Health Technol. Inform."},{"issue":"2","key":"e_1_3_2_39_2","first-page":"65","article-title":"A simple sequentially rejective multiple test procedure","volume":"6","author":"Holm Sture","year":"1979","unstructured":"Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 2 (1979), 65\u201370.","journal-title":"Scand. J. Stat."},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/1964858.1964870"},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2677832.2677842"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/645326.649721"},{"key":"e_1_3_2_43_2","article-title":"Siamese cbow: Optimizing word embeddings for sentence representations","author":"Kenter Tom","year":"2016","unstructured":"Tom Kenter, Alexey Borisov, and Maarten Rijke. 2016. Siamese cbow: Optimizing word embeddings for sentence representations. Retrieved from https:\/\/arXiv:1606.04640.","journal-title":"Retrieved from https:\/\/arXiv:1606.04640"},{"key":"e_1_3_2_44_2","volume-title":"Proceedings of the Natural Legal Language Processing Workshop at KDD","author":"Keymanesh Moniba","year":"2020","unstructured":"Moniba Keymanesh, Micha Elsner, and Srinivasan Parthasarathy. 2020. Toward domain-guided controllable summarization of privacy policies. In Proceedings of the Natural Legal Language Processing Workshop at KDD."},{"key":"e_1_3_2_45_2","first-page":"197","volume-title":"Complex Networks","author":"Keymanesh Moniba","year":"2020","unstructured":"Moniba Keymanesh, Saket Gurukar, Bethany Boettner, Christopher Browning, Catherine Calder, and Srinivasan Parthasarathy. 2020. Twitter watch: Leveraging social media to monitor and predict collective-efficacy of neighborhoods. In Complex Networks. 197\u2013211."},{"key":"e_1_3_2_46_2","volume-title":"Logistic Regression","author":"Kleinbaum David","year":"2002","unstructured":"David Kleinbaum and Mitchel Klein. 2002. Logistic Regression. Springer."},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.5555\/1867135.1867170"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.5555\/3044805.3045025"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.2753\/MIS0742-1222310206"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2050"},{"issue":"3","key":"e_1_3_2_51_2","first-page":"18","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw Andy","year":"2002","unstructured":"Andy Liaw and Matthew Wiener. 2002. Classification and regression by randomForest. R News 2, 3 (2002), 18\u201322.","journal-title":"R News"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2013.6557892"},{"key":"e_1_3_2_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2014.2360674"},{"key":"e_1_3_2_54_2","doi-asserted-by":"publisher","DOI":"10.3115\/1225403.1225421"},{"key":"e_1_3_2_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/2449396.2449434"},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/RE.2015.7320414"},{"key":"e_1_3_2_57_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ecolecon.2015.11.027"},{"key":"e_1_3_2_58_2","doi-asserted-by":"publisher","DOI":"10.5555\/2820518.2820535"},{"key":"e_1_3_2_59_2","volume-title":"Proceedings of the Workshop of 1st International Conference on Learning Representations","author":"Mikolov Tomas","year":"2013","unstructured":"Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the Workshop of 1st International Conference on Learning Representations."},{"key":"e_1_3_2_60_2","doi-asserted-by":"publisher","DOI":"10.5555\/2999792.2999959"},{"key":"e_1_3_2_61_2","doi-asserted-by":"publisher","DOI":"10.5555\/541177"},{"key":"e_1_3_2_62_2","first-page":"527","volume-title":"Proceedings of the International Conference on Web Information Systems and Technologies","author":"Mokarizadeh Shahab","year":"2013","unstructured":"Shahab Mokarizadeh, Mohammad Rahman, and Mihhail Matskin. 2013. Mining and analysis of apps in google play. In Proceedings of the International Conference on Web Information Systems and Technologies. 527\u2013535."},{"key":"e_1_3_2_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASONAM49781.2020.9381386"},{"key":"e_1_3_2_64_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2017.08.009"},{"key":"e_1_3_2_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/2993259.2993266"},{"key":"e_1_3_2_66_2","doi-asserted-by":"publisher","DOI":"10.5555\/2534766.2534812"},{"key":"e_1_3_2_67_2","doi-asserted-by":"publisher","DOI":"10.7599\/hmr.2015.35.1.44"},{"key":"e_1_3_2_68_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_2_69_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_2_70_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2020.110657"},{"key":"e_1_3_2_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPC.2017.26"},{"key":"e_1_3_2_72_2","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_3_2_73_2","article-title":"The sharing economy: Consumer intelligence series","year":"2015","unstructured":"PwC. 2015. The sharing economy: Consumer intelligence series. PricewaterhouseCoopers LLP.","journal-title":"PricewaterhouseCoopers LLP"},{"key":"e_1_3_2_74_2","doi-asserted-by":"publisher","DOI":"10.1145\/2872427.2874815"},{"key":"e_1_3_2_75_2","volume-title":"Proceedings of the Workshop on New Challenges for NLP Frameworks","author":"Rehurek Radim","year":"2010","unstructured":"Radim Rehurek and Petr Sojka. 2010. Software framework for topic modelling with large corpora. In Proceedings of the Workshop on New Challenges for NLP Frameworks."},{"key":"e_1_3_2_76_2","doi-asserted-by":"publisher","DOI":"10.1108\/00220410410560582"},{"key":"e_1_3_2_77_2","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188561"},{"key":"e_1_3_2_78_2","volume-title":"International Conference on Information Systems - Making Digital Inclusive: Blending the Local and the Global","author":"Sabzehzar Amin","year":"2020","unstructured":"Amin Sabzehzar, Yili Hong, and Raghu Santanam. 2020. People don\u2019t change, their priorities do: Evidence of value homophily for disaster relief. In International Conference on Information Systems - Making Digital Inclusive: Blending the Local and the Global."},{"key":"e_1_3_2_79_2","unstructured":"Amir Sadeghian and Alireza Sharafat. 2015. Bag of words meets bags of popcorn. Retrieved on October 7th 2021 from https:\/\/cs224d.stanford.edu\/reports\/SadeghianAmir.pdf."},{"key":"e_1_3_2_80_2","doi-asserted-by":"publisher","DOI":"10.2139\/ssrn.3558236"},{"key":"e_1_3_2_81_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCNC.2012.6181075"},{"key":"e_1_3_2_82_2","volume-title":"Proceedings of the International Communication of Association for Computing Machinery Conference","author":"Sch\u00fctze Hinrich","year":"2008","unstructured":"Hinrich Sch\u00fctze, Christopher Manning, and Prabhakar Raghavan. 2008. Introduction to information retrieval. In Proceedings of the International Communication of Association for Computing Machinery Conference."},{"key":"e_1_3_2_83_2","doi-asserted-by":"publisher","DOI":"10.1109\/CIS.2010.77"},{"key":"e_1_3_2_84_2","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/52.3-4.591"},{"key":"e_1_3_2_85_2","doi-asserted-by":"publisher","DOI":"10.5555\/2986459.2986549"},{"key":"e_1_3_2_86_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSM.2015.7332476"},{"key":"e_1_3_2_87_2","doi-asserted-by":"publisher","DOI":"10.1136\/eb-2018-102891"},{"key":"e_1_3_2_88_2","doi-asserted-by":"publisher","DOI":"10.1109\/RE48521.2020.00031"},{"key":"e_1_3_2_89_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICSME52107.2021.00016"},{"key":"e_1_3_2_90_2","first-page":"1","volume-title":"Proceedings of the International Conference on Information Systems","author":"Vakulenko Svitlana","year":"2014","unstructured":"Svitlana Vakulenko, Oliver M\u00fcller, and Jan Brocke. 2014. Enriching iTunes app store categories via topic modeling. In Proceedings of the International Conference on Information Systems. 1\u201311."},{"key":"e_1_3_2_91_2","article-title":"Decoding the alphabet soup of degrees in the united states postsecondary education system through hybrid method: Database and text mining","author":"Voghoei Sahar","year":"2021","unstructured":"Sahar Voghoei, James Byars, Khaled Rasheed, and Hamid Arabnia. 2021. Decoding the alphabet soup of degrees in the united states postsecondary education system through hybrid method: Database and text mining. Advances in Data Science and Information Engineering. Springer.","journal-title":"Advances in Data Science and Information Engineering"},{"key":"e_1_3_2_92_2","doi-asserted-by":"publisher","DOI":"10.5555\/2390665.2390688"},{"key":"e_1_3_2_93_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10515-020-00274-7"},{"key":"e_1_3_2_94_2","doi-asserted-by":"publisher","DOI":"10.5555\/2349018"},{"key":"e_1_3_2_95_2","first-page":"175","volume-title":"Studies in Health Technology and Informatics","author":"Yasini Mobin","year":"2015","unstructured":"Mobin Yasini and Guillaume Marchand. 2015. Toward a use case based classification of mobile health applications. In Studies in Health Technology and Informatics. IOS Press, 175\u2013179."},{"key":"e_1_3_2_96_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1056"},{"key":"e_1_3_2_97_2","doi-asserted-by":"publisher","DOI":"10.1145\/2396761.2398484"},{"key":"e_1_3_2_98_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2013.113"}],"container-title":["ACM Transactions on Software Engineering and Methodology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474827","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3474827","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:46Z","timestamp":1750191106000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3474827"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,17]]},"references-count":97,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4,30]]}},"alternative-id":["10.1145\/3474827"],"URL":"https:\/\/doi.org\/10.1145\/3474827","relation":{},"ISSN":["1049-331X","1557-7392"],"issn-type":[{"type":"print","value":"1049-331X"},{"type":"electronic","value":"1557-7392"}],"subject":[],"published":{"date-parts":[[2021,11,17]]},"assertion":[{"value":"2020-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}