Paper title
SampleHST: Efficient On-the-Fly Selection of Distributed Traces
Paper authors
Abstract
Since only a small fraction of the traces generated by distributed tracing help in troubleshooting, the storage requirement of tracing can be significantly reduced by biasing the selection towards anomalous traces. To this end, we propose SampleHST, a novel approach that samples on-the-fly from a stream of traces in an unsupervised manner. SampleHST adjusts the storage quota of normal and anomalous traces depending on the size of its budget. Initially, it uses a forest of Half Space Trees (HSTs) for trace scoring. The scoring is based on the distribution of mass scores across the trees, which characterizes the probability of observing different traces. The mass distribution from the HSTs is subsequently used to cluster the traces online, leveraging a variant of the mean-shift algorithm. This trace-cluster association eventually drives the sampling decision. We have compared the performance of SampleHST with a recently suggested method using data from a cloud data center and demonstrated that SampleHST improves sampling performance by up to 9.5x.
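To make the scoring step more concrete, the following is a minimal sketch of how a forest of half-space trees can assign per-tree mass scores to a point. It is not the SampleHST implementation: the class `HalfSpaceTree`, the helper `mass_profile`, the tree depth, the forest size, and the encoding of each trace as a feature vector in the unit square are all assumptions made for illustration. The mean-shift clustering and the budget-driven sampling decision described in the abstract are not shown.

```python
import random
from collections import defaultdict


class HalfSpaceTree:
    """Illustrative half-space tree over the unit hypercube (hypothetical helper).

    Each node halves the current range of one randomly chosen dimension.
    The mass of a leaf is the number of observed points that fell into it.
    """

    def __init__(self, dims, depth, seed):
        self.dims = dims
        self.depth = depth
        self.rng = random.Random(seed)
        self.split_dim = {}           # node path -> dimension split at that node
        self.mass = defaultdict(int)  # leaf path -> number of points seen

    def _leaf(self, x):
        lo = [0.0] * self.dims
        hi = [1.0] * self.dims
        path = ()
        for _ in range(self.depth):
            # Pick the split dimension lazily, once per node.
            if path not in self.split_dim:
                self.split_dim[path] = self.rng.randrange(self.dims)
            d = self.split_dim[path]
            mid = (lo[d] + hi[d]) / 2.0
            if x[d] < mid:
                hi[d] = mid
                path += (0,)
            else:
                lo[d] = mid
                path += (1,)
        return path

    def update(self, x):
        self.mass[self._leaf(x)] += 1

    def score(self, x):
        # Low mass means few similar points were seen, i.e. more anomalous.
        return self.mass.get(self._leaf(x), 0)


def mass_profile(forest, x):
    """Per-tree mass scores of x: the 'mass distribution across the trees'."""
    return [tree.score(x) for tree in forest]


# Assumption: each trace is already encoded as a 2-D feature vector in [0, 1]^2.
forest = [HalfSpaceTree(dims=2, depth=4, seed=s) for s in range(10)]
rng = random.Random(0)
for _ in range(500):
    point = (rng.uniform(0.4, 0.6), rng.uniform(0.4, 0.6))
    for tree in forest:
        tree.update(point)

normal = mass_profile(forest, (0.5, 0.5))      # inside the dense region
outlier = mass_profile(forest, (0.95, 0.05))   # far from every observed point
```

In this sketch the point in the dense region lands in well-populated leaves, while the distant point lands in empty ones, so their mass profiles differ sharply. A profile of this kind is the sort of per-trace signal that an online clustering step could consume.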