{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T22:00:03Z","timestamp":1761948003448,"version":"3.41.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2014,10,7]],"date-time":"2014-10-07T00:00:00Z","timestamp":1412640000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["#0812551 and IIS-1217466"],"award-info":[{"award-number":["#0812551 and IIS-1217466"]}],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2014,10,7]]},"abstract":"<jats:p>Feature selection is widely used in preparing high-dimensional data for effective data mining. The explosive popularity of social media produces massive and high-dimensional data at an unprecedented rate, presenting new challenges to feature selection. Social media data consists of (1) traditional high-dimensional, attribute-value data such as posts, tweets, comments, and images, and (2) linked data that provides social context for posts and describes the relationships between social media users as well as who generates the posts, and so on. The nature of social media also determines that its data is massive, noisy, and incomplete, which exacerbates the already challenging problem of feature selection. In this article, we study a novel feature selection problem of selecting features for social media data with its social context. 
In detail, we illustrate the differences between attribute-value data and social media data, investigate if linked data can be exploited in a new feature selection framework by taking advantage of social science theories. We design and conduct experiments on datasets from real-world social media Web sites, and the empirical results demonstrate that the proposed framework can significantly improve the performance of feature selection. Further experiments are conducted to evaluate the effects of user--user and user--post relationships manifested in linked data on feature selection, and research issues for future work will be discussed.<\/jats:p>","DOI":"10.1145\/2629587","type":"journal-article","created":{"date-parts":[[2014,11,4]],"date-time":"2014-11-04T13:18:31Z","timestamp":1415107111000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Feature Selection for Social Media Data"],"prefix":"10.1145","volume":"8","author":[{"given":"Jiliang","family":"Tang","sequence":"first","affiliation":[{"name":"Arizona State University, Tempe, AZ"}]},{"given":"Huan","family":"Liu","sequence":"additional","affiliation":[{"name":"Arizona State University, Tempe, AZ"}]}],"member":"320","published-online":{"date-parts":[[2014,10,7]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1995.7.4.639"},{"volume":"19","volume-title":"Neural Information Processing Systems","author":"Argyriou A.","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btl386"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-006-0054-6"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.111"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143880"},{"key":"e_1_2_1_7_1","unstructured":"R. Duda P. Hart and D. Stork. 2001. Pattern Classification (2nd ed.). 
Wiley New York NY."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1016787"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1390681.1442794"},{"volume-title":"Proceedings of the 6th International AAAI Conference on Weblogs and Social Media.","author":"Gao H.","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0895479897326432"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1012487302797"},{"volume":"18","volume-title":"Neural Processing Information Systems","author":"He X.","key":"e_1_2_1_13_1"},{"volume-title":"Proceedings of the 19th International Conference on Machine Learning. 259--266","author":"Jensen D.","key":"e_1_2_1_14_1"},{"volume-title":"Proceedings of the 3rd International Conference on Weblogs and Social Media.","author":"Kahanda I.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1557019.1557080"},{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"H. Liu and H. Motoda. 2008. Computational Methods of Feature Selection. Chapman and Hall.","DOI":"10.1201\/9781584888796"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2005.66"},{"volume-title":"Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. 
339--348","author":"Liu J.","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/1248659.1248693"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1177\/0049124193022001006"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev.soc.27.1.415"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.v56:12"},{"volume-title":"Proceedings of the Neural Information Processing Systems Conference.","author":"Nie F.","key":"e_1_2_1_24_1"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.159"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1025667309714"},{"volume":"16","volume-title":"Neural Processing Information Systems","author":"Roth V.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2124295.2124309"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3120657.3120658"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/1645953.1646094"},{"volume-title":"Proceedings of the IJCAI Workshop on Learning Statistical Models from Relational Data.","author":"Taskar B.","key":"e_1_2_1_31_1"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2010.48"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1361684.1361686"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772790"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/2283516.2283660"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273641"},{"key":"e_1_2_1_37_1","first-page":"36","article-title":"Multi-source feature selection via geometry-dependent covariance analysis","volume":"4","author":"Zhao Z.","year":"2008","journal-title":"Journal of Machine Learning Research, Workshop and Conference 
Proceedings."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.53"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629587","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2629587","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T06:13:29Z","timestamp":1750227209000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2629587"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,7]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,10,7]]}},"alternative-id":["10.1145\/2629587"],"URL":"https:\/\/doi.org\/10.1145\/2629587","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2014,10,7]]},"assertion":[{"value":"2012-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-10-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}