{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T03:48:29Z","timestamp":1747194509699,"version":"3.40.5"},"reference-count":21,"publisher":"SAGE Publications","issue":"4","license":[{"start":{"date-parts":[[2018,4,1]],"date-time":"2018-04-01T00:00:00Z","timestamp":1522540800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["International Journal of Distributed Sensor Networks"],"published-print":{"date-parts":[[2018,4]]},"abstract":"<jats:p> In order to rapidly process large amounts of sensor stream data, it is effective to extract and use samples that reflect the characteristics and patterns of the data stream well. In this article, we focus on improving the uniformity confidence of KSample, which has the characteristics of random sampling in the stream environment. For this, we first analyze the uniformity confidence of KSample and then derive two uniformity confidence degradation problems: (1) initial degradation, which rapidly decreases the uniformity confidence in the initial stage, and (2) continuous degradation, which gradually decreases the uniformity confidence in the later stages. We note that the initial degradation is caused by the sample range limitation and the past sample invariance, and the continuous degradation by the sampling range increase. For each problem, we present a corresponding solution, that is, we provide the sample range extension for sample range limitation, the past sample change for past sample invariance, and the use of UC-window for sampling range increase. By reflecting these solutions, we then propose a novel sampling method, named UC-KSample, which largely improves the uniformity confidence. Experimental results show that UC-KSample improves the uniformity confidence over KSample by 2.2 times on average, and it always keeps the uniformity confidence higher than the user-specified threshold. We also note that the sampling accuracy of UC-KSample is higher than that of KSample in both numeric sensor data and text data. The uniformity confidence is an important sampling metric in sensor data streams, and this is the first attempt to apply uniformity confidence to KSample. We believe that the proposed UC-KSample is an excellent approach that adopts an advantage of KSample, dynamic sampling over a fixed sampling ratio, while improving the uniformity confidence. <\/jats:p>","DOI":"10.1177\/1550147718773999","type":"journal-article","created":{"date-parts":[[2018,4,30]],"date-time":"2018-04-30T09:33:42Z","timestamp":1525080822000},"page":"155014771877399","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":1,"title":["Variable size sampling to support high uniformity confidence in sensor data streams"],"prefix":"10.1177","volume":"14","author":[{"given":"Hajin","family":"Kim","sequence":"first","affiliation":[{"name":"Department of Computer Science, Kangwon National University, Chuncheon, South Korea"}]},{"given":"Myeong-Seon","family":"Gil","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Kangwon National University, Chuncheon, South Korea"}]},{"given":"Yang-Sae","family":"Moon","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Kangwon National University, Chuncheon, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9062-4604","authenticated-orcid":false,"given":"Mi-Jung","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Kangwon National University, Chuncheon, South Korea"}]}],"member":"179","published-online":{"date-parts":[[2018,4,30]]},"reference":[{"first-page":"1","volume-title":"Proceedings of the 21st ACM symposium on principles of database systems","author":"Babcock B","key":"bibr1-1550147718773999"},{"key":"bibr2-1550147718773999","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139924801"},{"first-page":"140","volume-title":"Proceedings of the 22nd IEEE international conference on data engineering","author":"Jeffery SR","key":"bibr3-1550147718773999"},{"key":"bibr4-1550147718773999","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.01.078"},{"volume-title":"Data-stream sampling: basic techniques and results, data stream management","year":"2016","author":"Haas PJ","key":"bibr5-1550147718773999"},{"key":"bibr6-1550147718773999","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-2010-0453"},{"key":"bibr7-1550147718773999","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4939-0378-8_12"},{"key":"bibr8-1550147718773999","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2012.03.005"},{"first-page":"22","volume-title":"Proceedings of the 19th international conference on scientific and statistical database management","author":"Al-Kateb M","key":"bibr9-1550147718773999"},{"issue":"32","key":"bibr10-1550147718773999","first-page":"32","volume":"6","author":"Kepe TR","year":"2015","journal-title":"J Inform Data Manage"},{"first-page":"1975","volume-title":"Proceedings of the 20th international conference on knowledge discovery and data mining","author":"Cormode G","key":"bibr11-1550147718773999"},{"issue":"2","key":"bibr12-1550147718773999","first-page":"182","volume":"32","author":"Mcleod AI","year":"1983","journal-title":"J R Stat Soc Series C"},{"key":"bibr13-1550147718773999","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipl.2005.11.003"},{"first-page":"215","volume-title":"Proceedings of Korea computer congress","author":"Kim H","key":"bibr14-1550147718773999"},{"key":"bibr15-1550147718773999","volume-title":"Sampling techniques","author":"Cochran WG","year":"1977","edition":"3"},{"key":"bibr16-1550147718773999","doi-asserted-by":"publisher","DOI":"10.17487\/rfc5475"},{"first-page":"23","volume-title":"Proceedings of the 19th international conference on scientific and statistical database management","author":"Al-Kateb M","key":"bibr17-1550147718773999"},{"first-page":"6608","volume-title":"Proceedings of advances in neural information processing systems","author":"Raghunathan A","key":"bibr21-1550147718773999"},{"key":"bibr22-1550147718773999","volume-title":"Introductory statistics","author":"Mann PS","year":"2016","edition":"9"},{"key":"bibr23-1550147718773999","first-page":"6841216","volume":"2017","author":"Kim S-P","year":"2017","journal-title":"Sec Comm Networks"},{"key":"bibr24-1550147718773999","doi-asserted-by":"publisher","DOI":"10.1145\/1324185.1324190"}],"container-title":["International Journal of Distributed Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1550147718773999","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1550147718773999","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1550147718773999","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,8]],"date-time":"2020-12-08T20:55:06Z","timestamp":1607460906000},"score":1,"resource":{"primary":{"URL":"http:\/\/journals.sagepub.com\/doi\/10.1177\/1550147718773999"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,4]]},"references-count":21,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,4]]}},"alternative-id":["10.1177\/1550147718773999"],"URL":"https:\/\/doi.org\/10.1177\/1550147718773999","relation":{},"ISSN":["1550-1477","1550-1477"],"issn-type":[{"type":"print","value":"1550-1477"},{"type":"electronic","value":"1550-1477"}],"subject":[],"published":{"date-parts":[[2018,4]]}}}