{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,20]],"date-time":"2025-06-20T04:08:55Z","timestamp":1750392535284,"version":"3.41.0"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"FSE","funder":[{"name":"the National Key Research and Development Program of China","award":["No. 2022YFF0902701"],"award-info":[{"award-number":["No. 2022YFF0902701"]}]},{"DOI":"10.13039\/501100001809","name":"the National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["No. 62032016"],"award-info":[{"award-number":["No. 62032016"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"the Shenzhen Science and Technology Program","award":["No. CJGJZD20230724091659002"],"award-info":[{"award-number":["No. CJGJZD20230724091659002"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Softw. Eng."],"published-print":{"date-parts":[[2025,6,19]]},"abstract":"<jats:p>Distributed tracing is a pivotal technique for software operators to understand and diagnose issues within microservice-based systems, offering a comprehensive view of user requests propagated through various services. However, the unprecedented volume of traces imposes expensive storage and analytical burdens on online systems. Conventional tracing approaches typically rely on random sampling with a fixed probability for each trace, which risks missing valuable traces. Several tail-based sampling methods have thus been proposed to sample traces based on their content. Nevertheless, these methods primarily evaluate traces on an individual basis, neglecting the collective attributes of the sample set in terms of comprehensiveness, balance, and consistency. To address these issues, we propose TracePicker, an optimization-based online sampler designed to enhance the quality of sampled data while mitigating storage burden. 
TracePicker employs a streaming anomaly detector to capture and retain anomalous traces that are crucial for troubleshooting. For normal traces, the sampling process is segmented into quota allocation and group sampling, both formulated as integer programming problems. By solving these problems using dynamic programming and evolution algorithms, TracePicker selects a high-quality subset of data, minimizing overall information loss. Experimental results demonstrate that TracePicker outperforms existing tail-based sampling methods in terms of both sampling quality and time consumption.<\/jats:p>","DOI":"10.1145\/3729351","type":"journal-article","created":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:15:34Z","timestamp":1750346134000},"page":"1802-1823","source":"Crossref","is-referenced-by-count":0,"title":["TracePicker: Optimization-Based Trace Sampling for Microservice-Based Systems"],"prefix":"10.1145","volume":"2","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7925-3788","authenticated-orcid":false,"given":"Shuaiyu","family":"Xie","sequence":"first","affiliation":[{"name":"Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1559-9314","authenticated-orcid":false,"given":"Jian","family":"Wang","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"},{"name":"Zhongguancun Laboratory, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0116-474X","authenticated-orcid":false,"given":"Maodong","family":"Li","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1219-4729","authenticated-orcid":false,"given":"Peiran","family":"Chen","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2968-3496","authenticated-orcid":false,"given":"Jifeng","family":"Xuan","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2165-2636","authenticated-orcid":false,"given":"Bing","family":"Li","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"},{"name":"Zhongguancun Laboratory, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,19]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Apache. 2024. Skywalking. https:\/\/skywalking.apache.org\/"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3472883.3486999"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2024.3361209"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA59077.2024.00039"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3613864"},{"key":"e_1_2_1_6_1","unstructured":"eBPF. 2024. ebpf. https:\/\/ebpf.io\/"},{"key":"e_1_2_1_7_1","volume-title":"4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 07)","author":"Fonseca Rodrigo","year":"2007","unstructured":"Rodrigo Fonseca, George Porter, Randy H Katz, and Scott Shenker. 2007. X-Trace: A pervasive network tracing framework. In 4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 07)."},{"key":"e_1_2_1_8_1","unstructured":"C. N. C. Foundation. 2024. Jaeger. https:\/\/www.jaegertracing.io\/"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304013"},{"key":"e_1_2_1_10_1","unstructured":"Geatpy-dev. 2024. Geatpy2. 
https:\/\/github.com\/geatpy-dev\/geatpy\/"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3611643.3613881"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3643748"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICWS53863.2021.00063"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3543507.3583338"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3267841"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357223.3362736"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/IWQOS52092.2021.9521340"},{"key":"e_1_2_1_18_1","unstructured":"microservices demo. 2024. Sock Shop. https:\/\/github.com\/microservices-demo\/microservices-demo"},{"key":"e_1_2_1_19_1","unstructured":"OpenTelemetry. 2024. Sampling. https:\/\/opentelemetry.io\/docs\/concepts\/sampling\/"},{"key":"e_1_2_1_20_1","unstructured":"OpenTracing. 2024. OpenTracing. https:\/\/opentracing.io\/specification"},{"key":"e_1_2_1_21_1","unstructured":"OpenZipkin. 2024. Zipkin. https:\/\/zipkin.io\/"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485983.3494866"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3558951"},{"key":"e_1_2_1_24_1","unstructured":"Pinpoint-APM. 2024. Pinpoint. https:\/\/pinpoint-apm.github.io\/pinpoint\/"},{"key":"e_1_2_1_25_1","unstructured":"Google Cloud Platform. 2024. Online Boutique. 
https:\/\/github.com\/GoogleCloudPlatform\/microservices-demo"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1080\/00031305.1994.10476030"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2024.3354457"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3603269.3604823"},{"key":"e_1_2_1_29_1","author":"Sigelman Benjamin H","year":"2010","unstructured":"Benjamin H Sigelman, Luiz Andre Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag. 2010. Dapper, a large-scale distributed systems tracing infrastructure. Google Technical Report dapper-2010-1."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/NAFIPS.1996.534789"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1008202821328"},{"key":"e_1_2_1_32_1","unstructured":"TracePicker. 2024. TracePicker. https:\/\/github.com\/WHU-AISE\/TracePicker"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","unstructured":"Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, and Bing Li. 2024. TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data. 
arXiv preprint arXiv:2407.19711 https:\/\/doi.org\/10.48550\/arXiv.2407.19711","DOI":"10.48550\/arXiv.2407.19711"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSC.2024.3376202"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.aay5853"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2311.09032"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449905"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCC.2020.2985352"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3540250.3549146"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE55969.2022.00032"},{"key":"e_1_2_1_41_1","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)","author":"Zhang Lei","year":"2023","unstructured":"Lei Zhang, Zhiqiang Xie, Vaastav Anand, Ymir Vigfusson, and Jonathan Mace. 2023. The Benefit of Hindsight: Tracing Edge-Cases in Distributed Systems. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23). 321\u2013339. 
https:\/\/www.usenix.org\/conference\/nsdi23\/presentation\/zhang-lei"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620678.3624787"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267809.3267823"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSRE59848.2023.00033"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.2018.2887384"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183440.3194991"}],"container-title":["Proceedings of the ACM on Software Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729351","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T15:26:24Z","timestamp":1750346784000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729351"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,19]]},"references-count":46,"journal-issue":{"issue":"FSE","published-print":{"date-parts":[[2025,6,19]]}},"alternative-id":["10.1145\/3729351"],"URL":"https:\/\/doi.org\/10.1145\/3729351","relation":{},"ISSN":["2994-970X"],"issn-type":[{"value":"2994-970X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,19]]}}}