{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T08:48:52Z","timestamp":1775638132167,"version":"3.50.1"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,12,15]],"date-time":"2016-12-15T00:00:00Z","timestamp":1481760000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation for Distinguished Young Scholars of China","award":["61325010"],"award-info":[{"award-number":["61325010"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61403358 and 61602147"],"award-info":[{"award-number":["61403358 and 61602147"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Program of China","award":["2016YFB1000904"],"award-info":[{"award-number":["2016YFB1000904"]}]},{"name":"Pinnacle Lab for Analytics @ Singapore Management University"},{"name":"Media Development Authority"},{"name":"Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Web"],"published-print":{"date-parts":[[2016,12,27]]},"abstract":"<jats:p>With the booming popularity of online social networks like Twitter and Weibo, online user footprints are accumulating rapidly on the social web. Simultaneously, the question of how to leverage the large-scale user-generated social media data for personal credit scoring comes into the sight of both researchers and practitioners. It has also become a topic of great importance and growing interest in the P2P lending industry. 
However, compared with traditional financial data, heterogeneous social data presents both opportunities and challenges for personal credit scoring. In this article, we seek a deep understanding of how to learn users\u2019 credit labels from social data in a comprehensive and efficient way. Particularly, we explore the social-data-based credit scoring problem under the micro-blogging setting for its open, simple, and real-time nature. To identify credit-related evidence hidden in social data, we choose to conduct an analytical and empirical study on a large-scale dataset from Weibo, the largest and most popular tweet-style website in China. Summarizing results from existing credit scoring literature, we first propose three social-data-based credit scoring principles as guidelines for in-depth exploration. In addition, we glean six credit-related insights arising from empirical observations of the testbed dataset. Based on the proposed principles and insights, we extract prediction features mainly from three categories of users\u2019 social data, including demographics, tweets, and networks. To harness this broad range of features, we put forward a two-tier stacking and boosting enhanced ensemble learning framework. Quantitative investigation of the extracted features shows that online social media data does have good potential in discriminating good credit users from bad. Furthermore, we perform experiments on the real-world Weibo dataset consisting of more than 7.3 million tweets and 200,000 users whose credit labels are known through our third-party partner. 
Experimental results show that (i) our approach achieves a roughly 0.625 AUC value with all the proposed social features as input, and (ii) our learning algorithm can outperform traditional credit scoring methods by as much as 17% for social-data-based personal credit scoring.<\/jats:p>","DOI":"10.1145\/2996465","type":"journal-article","created":{"date-parts":[[2016,12,15]],"date-time":"2016-12-15T17:50:23Z","timestamp":1481824223000},"page":"1-38","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["From Footprint to Evidence"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1373-5527","authenticated-orcid":false,"given":"Guangming","family":"Guo","sequence":"first","affiliation":[{"name":"University of Science and Technology of China, Anhui, China"}]},{"given":"Feida","family":"Zhu","sequence":"additional","affiliation":[{"name":"Singapore Management University, Singapore"}]},{"given":"Enhong","family":"Chen","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Anhui, China"}]},{"given":"Qi","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Anhui, China"}]},{"given":"Le","family":"Wu","sequence":"additional","affiliation":[{"name":"Hefei University of Technology, Anhui, China"}]},{"given":"Chu","family":"Guan","sequence":"additional","affiliation":[{"name":"University of Science and Technology of China, Anhui, China"}]}],"member":"320","published-online":{"date-parts":[[2016,12,15]]},"reference":[{"key":"e_1_2_1_4_1","volume-title":"Analyzing credit risk data: A comparison of logistic discrimination, classification tree analysis, and feedforward networks. 
Computational Statistics 12, 2","author":"Arminger Gerhard","year":"1997"},{"key":"e_1_2_1_5_1","first-page":"1","article-title":"Online peer-to-peer lending -- a literature review","volume":"16","author":"Bachmann Alexander","year":"2011","journal-title":"Journal of Internet Banking and Commerce"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772698"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1057\/palgrave.jors.2601545"},{"key":"e_1_2_1_8_1","unstructured":"Shane Bergsma and Benjamin Van Durme. 2013. Using conceptual class attributes to characterize social media users. ACL (1). 710--720."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/944919.944937"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbankfin.2005.07.014"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jocs.2010.12.007"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1080\/1369118X.2012.678878"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(96)00142-2"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_2_1_15_1","unstructured":"John D. Burger John C. Henderson George Kim and Guido Zarrella. 2011. Discriminating gender on twitter. In EMNLP. 1301--1309."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/509907.509965"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1468-0262.2007.00806.x"},{"key":"e_1_2_1_18_1","volume-title":"SMOTE: Synthetic minority over-sampling technique. 
Journal of Artificial Intelligence Research","author":"Chawla Nitesh V.","year":"2002"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2637483"},{"key":"e_1_2_1_20_1","volume-title":"Social Informatics","author":"Chen Zhuohua"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1871437.1871535"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2006.09.100"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623703"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1111\/1756-2171.12019"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/0378-4266(78)90012-2"},{"key":"e_1_2_1_26_1","unstructured":"Clayton Fink Jonathon Kopecky and Maksym Morawski. 2012. Inferring gender from the content of tweets: A region specific example. In ICWSM."},{"key":"e_1_2_1_27_1","volume-title":"Greedy function approximation: A gradient boosting machine. 
Annals of Statistics","author":"Friedman Jerome H.","year":"2001"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.33.6.1203"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1879141.1879147"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2530540"},{"key":"e_1_2_1_31_1","volume-title":"Stylometric analysis of bloggers","author":"Goswami Sumit"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0307752101"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-31750-2_11"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-985X.1997.00078.x"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.5430\/air.v2n4p49"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2014.08.029"},{"key":"e_1_2_1_38_1","volume-title":"Hand","author":"Henley W. E.","year":"1996"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964858.1964870"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2009.05.059"},{"key":"e_1_2_1_41_1","doi-asserted-by":"crossref","unstructured":"Wen Hua Zhongyuan Wang Haixun Wang Kai Zheng and Xiaofang Zhou. 2015. Short text understanding through lexical-semantic analysis. In ICDE. 
495--506.","DOI":"10.1109\/ICDE.2015.7113309"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2006.07.007"},{"key":"e_1_2_1_43_1","first-page":"1","article-title":"Internet based social lending: Past, present and future","volume":"11","author":"Hulme Michael K.","year":"2006","journal-title":"Social Futures Observatory"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-00528-2_7"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb013696"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.3982\/ECTA5781"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956769"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2014.11.028"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.03.019"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/2566486.2568045"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2339530.2339692"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1016\/0005-2795(75)90109-9"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1718487.1718519"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbusvent.2013.06.005"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Dong Nguyen Rilana Gravel Dolf Trieschnigg and Theo Meder. 2013. \u201cHow old do you think I am?\u201d A study of language and age in twitter. 
In ICWSM.","DOI":"10.1145\/2528272.2528276"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2005.01.003"},{"key":"e_1_2_1_57_1","volume-title":"Paul and Mark Dredze","author":"Michael","year":"2011"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/2065023.2065035"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2020408.2020477"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1871985.1871993"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-009-9124-7"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1287\/opre.42.4.589"},{"key":"e_1_2_1_63_1","unstructured":"Sara Rosenthal and Kathleen McKeown. 2011. Age prediction in blogs: A study of style content and online behavior in pre- and post-social media generations. In ACL. 763--772."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772690.1772777"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1057\/palgrave.jors.2602023"},{"key":"e_1_2_1_66_1","volume-title":"Crook","author":"Thomas Lyn C.","year":"2002"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:ETIN.0000047476.05912.3d"},{"key":"e_1_2_1_68_1","volume-title":"Consumer credit: Learning your customer\u2019s default risk from what (s)he buys. Available at SSRN: http:\/\/ssrn.com\/abstract=2023238","author":"Vissing-Jorgensen Annette","year":"2011"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.2307\/2330408"},{"key":"e_1_2_1_70_1","doi-asserted-by":"crossref","unstructured":"Bing Xiang and Liang Zhou. 2014. 
Improving twitter sentiment analysis with topic-based mixture modeling and semi-supervised training. In ACL. 434--439.","DOI":"10.3115\/v1\/P14-2071"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2011.04.147"},{"key":"e_1_2_1_72_1","unstructured":"Guangxiang Zeng Ping Luo Enhong Chen and Min Wang. 2013. From social user activities to people affiliation. In ICDM."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939861"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684822.2685287"}],"container-title":["ACM Transactions on the Web"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2996465","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2996465","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:11Z","timestamp":1750220591000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2996465"}},"subtitle":["An Exploratory Study of Mining Social Data for Credit Scoring"],"short-title":[],"issued":{"date-parts":[[2016,12,15]]},"references-count":71,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2016,12,27]]}},"alternative-id":["10.1145\/2996465"],"URL":"https:\/\/doi.org\/10.1145\/2996465","relation":{},"ISSN":["1559-1131","1559-114X"],"issn-type":[{"value":"1559-1131","type":"print"},{"value":"1559-114X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,12,15]]},"assertion":[{"value":"2015-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2016-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2016-12-15","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}